What technology area does this patent fall under?

Primary CPC classification G06F9/3013. Mapped technology areas include Physics.

When was this patent published?

Publication date Dec 6, 2022 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 5 related publications on this page, drawn from citations in our corpus or from other publications sharing the same primary CPC.

Prefetch store preallocation in an effective address-based cache directory

US11520585B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11520585-B2
Application number	US-202117220115-A
Country	US
Kind code	B2
Filing date	Apr 1, 2021
Priority date	May 4, 2020
Publication date	Dec 6, 2022
Grant date	Dec 6, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In at least one embodiment, a processing unit includes a processor core and a vertical cache hierarchy including at least a store-through upper-level cache and a store-in lower-level cache. The upper-level cache includes a data array and an effective address (EA) directory. The processor core includes an execution unit, an address translation unit, and a prefetch unit configured to initiate allocation of a directory entry in the EA directory for a store target EA without prefetching a cache line of data into the corresponding data entry in the data array. The processor core caches in the directory entry an EA-to-RA address translation information for the store target EA, such that a subsequent demand store access that hits in the directory entry can avoid a performance penalty associated with address translation by the translation unit.
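The mechanism in the abstract can be modeled in a few lines. The following is a minimal Python sketch, not IBM's implementation. The class names (`EADirectory`, `TranslationUnit`, `DirEntry`), the 128-byte line and 4 KiB page sizes, the directory geometry, the FIFO replacement, and the choice not to allocate on a demand store miss are all assumptions made for illustration. A store prefetch that misses allocates a directory entry and caches its EA-to-RA translation while leaving the data array untouched. A later demand store to the same line then obtains its real address without another trip through the translation unit.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

LINE_SIZE = 128   # bytes per cache line (assumed)
PAGE_SIZE = 4096  # bytes per page (assumed)
NUM_SETS = 64     # directory sets (assumed geometry)
WAYS = 8          # associativity (assumed)


@dataclass
class DirEntry:
    ea_line: int      # effective line number (EA // LINE_SIZE)
    ra_line: int      # cached EA-to-RA translation for that line
    data_valid: bool  # False: entry preallocated, no data in the data array


class TranslationUnit:
    """Toy translator; counts calls to stand in for the translation penalty."""

    def __init__(self, page_table: Dict[int, int]):
        self.page_table = page_table  # EA page number -> RA page number
        self.translations = 0

    def translate(self, ea: int) -> int:
        self.translations += 1
        page, offset = divmod(ea, PAGE_SIZE)
        return self.page_table[page] * PAGE_SIZE + offset


class EADirectory:
    """Hypothetical set-associative EA directory of a store-through cache."""

    def __init__(self, tu: TranslationUnit):
        self.tu = tu
        self.sets: List[List[DirEntry]] = [[] for _ in range(NUM_SETS)]

    def lookup(self, ea: int) -> Optional[DirEntry]:
        line = ea // LINE_SIZE
        for entry in self.sets[line % NUM_SETS]:
            if entry.ea_line == line:
                return entry
        return None

    def store_prefetch(self, ea: int) -> None:
        """On a prefetch miss, allocate an entry and cache its translation,
        but do not fetch the cache line into the data array."""
        if self.lookup(ea) is not None:
            return
        line = ea // LINE_SIZE
        ways = self.sets[line % NUM_SETS]
        if len(ways) == WAYS:
            ways.pop(0)  # FIFO victim choice (simplification)
        ra_line = self.tu.translate(line * LINE_SIZE) // LINE_SIZE
        ways.append(DirEntry(line, ra_line, data_valid=False))

    def demand_store(self, ea: int) -> int:
        """Return the RA for a demand store. A hit reuses the cached
        translation; a miss goes through the translation unit."""
        entry = self.lookup(ea)
        if entry is None:
            return self.tu.translate(ea)
        return entry.ra_line * LINE_SIZE + ea % LINE_SIZE


if __name__ == "__main__":
    tu = TranslationUnit({0x10: 0x7A, 0x11: 0x3C})
    l1 = EADirectory(tu)
    l1.store_prefetch(0x10000)       # one translation, no data fetched
    ra = l1.demand_store(0x10008)    # hits the preallocated entry
    print(hex(ra), tu.translations)  # 0x7a008 1
```

Because the upper-level cache is store-through, the demand store only needs the RA to forward its data to the store-in lower-level cache. This is why, in this sketch, a directory entry holding a valid translation but no data is still useful.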

First claim

Opening claim text (preview).

What is claimed is:

1. A method of data processing in a processing unit, the method comprising: prefetching operand data likely to be accessed by a processor core of the processing unit through the execution of demand memory access instructions into a vertical cache hierarchy including at least a set-associative store-through upper-level data cache and a store-in lower-level cache, wherein the set-associative upper-level cache includes a set-associative data array and a set-associative effective address (EA) directory having a plurality of directory entries each corresponding to a respective data entry among a plurality of data entries in the data array; processing, in an execution of the processor core, memory access instructions and, based on processing the memory access instructions, initiating accesses to the vertical cache hierarchy; initiating a store prefetch stream, and based on a prefetch miss of store target EA of the store prefetch stream in the set-associative EA directory, allocating a directory entry in the set-associative EA directory for the store target EA without prefetching an associated cache line of operand data identified by the store target EA into the corresponding data entry in the data array; and translating the store target EA into real address (RA) and caching in the directory entry EA-to-RA address translation information for the store target EA, such that a subsequent demand store access that hits in the directory entry can avoid a performance penalty associated with address translation.

2. The method of claim 1, and further comprising prefetching data associated with the store target effective EA into the lower-level cache.

3. The method of claim 1, wherein: the processor core includes a real address (RA) directory of the set-associative upper-level data cache; and the EA-to-RA address translation information includes a pointer to a directory entry in the RA directory buffering an RA corresponding to the store target EA.

4. The method of claim 1, and further comprising: allocating a queue entry among a plurality of queue entries in a prefetch queue (PRQ) to the store prefetch stream including the store target EA; and indicating in the queue entry a direction and stride for the store prefetch stream.

5. The method of claim 4, and further comprising indicating in the queue entry that prefetching of operand data for the prefetch store stream into the upper-level cache is inhibited.

6. The method of claim 1, wherein: the store target EA is a first store target EA; and based on a hit of a second store target EA of a demand store access in the directory entry in the EA directory, utilizing the cached EA-to-RA address translation information to obtain the RA without translation of the second store target EA by the translation unit.

7. A processing unit, comprising: a vertical cache hierarchy including at least a store-through set-associative upper-level data cache and a store-in lower-level cache, wherein the set-associative upper-level data cache includes a set-associative data array and a set-associative effective address (EA) directory having a plurality of directory entries each corresponding to a respective data entry among a plurality of data entries in the data array; a processor core including: an execution unit configured to process memory access instructions and, based on processing the memory access instructions, initiate accesses to the vertical cache hierarchy; a translation unit configured to translate EAs to real addresses (RAs); an operand data prefetch unit that prefetches, into the vertical cache hierarchy, operand data likely to be accessed by the processor core through execution of demand memory access instructions by the execution unit, wherein the operand data prefetch unit is configured, based on a prefetch miss in the set-associative EA directory for a store target EA, to initiate allocation of a directory entry in the set-associative EA directory for the store target EA without prefetching an associated cache line of operand data identified by the store target EA into the corresponding data entry in the data array; and wherein the processor core caches in the directory entry EA-to-RA address translation information for the store target EA, such that a subsequent demand store access that hits in the directory entry can avoid a performance penalty associated with address translation by the translation unit.

8. The processor of claim 7, wherein the operand data prefetch unit is configured to prefetch operand data associated with the store target effective EA into the lower-level cache.

9. The processor of claim 7, wherein: the processor core includes a real address (RA) directory of the set-associative upper-level data cache; and the EA-to-RA address translation information includes a pointer to a directory entry in the RA directory buffering an RA corresponding to the store target EA.

10. The processor of claim 7, wherein: the operand data prefetch unit includes a prefetch queue (PRQ) including a plurality of queue entries; the operand data prefetch unit allocates a queue entry among the plurality of queue entries to a store prefetch stream including the store target EA; and the queue entry indicates a direction and stride for the store prefetch stream.

11. The processor of claim 10, wherein the queue entry further indicates that prefetching of operand data for the prefetch store stream into the upper-level cache is inhibited.

12. The processor of claim 7, wherein: the store target EA is a first store target EA; and the processor core, based on a hit of a second store target EA of a demand store access in the directory entry in the EA directory, utilizes the cached EA-to-RA address translation information to obtain the RA without translation of the second store target EA by the translation unit.

13. A data processing system, comprising: multiple processing units, including the processing unit of claim 7; a shared memory; and a system interconnect communicatively coupling the shared memory and the multiple processing units.

14. A design structure tangibly embodied in a machine-readable storage device for designing, manufacturing, or testing an integrated circuit, the design structure comprising: a processing unit, including: a vertical cache hierarchy including at least a set-associative store-through upper-level data cache and a store-in lower-level cache, wherein the set-associative upper-level data cache includes a set-associative data array and a set-associative effective address (EA) directory having a plurality of directory entries each corresponding to a respective data entry among a plurality of data entries in the data array; a processor core including: an execution unit configured to process memory access instructions and, based on processing the memory access instructions, initiate accesses to the vertical cache hierarchy; a translation unit configured to translate EAs to real addresses (RAs); an operand data prefetch unit that prefetches, into the vertical cache hierarchy, operand data likely to be accessed by the processor core through execution of demand memory access instructions by the execution unit, wherein the operand data prefetch unit is configured, based on a prefetch miss in the set-associative EA directory for a store target EA, to initiate allocation of a directory entry in the set-associative EA directory for the store target EA without prefetching an associated cache line of operand data identified by the store target EA into the corresponding data entry in the data array; and wherein the processor core caches in t
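Claims 4, 5, 10, and 11 describe a prefetch queue (PRQ) entry that records a store stream's direction and stride and can mark the stream as inhibited from prefetching operand data into the upper-level cache. Claims 2 and 8 add prefetching the data into the lower-level cache instead. The following Python sketch illustrates how such an entry might step through its targets. It is hypothetical: `PRQEntry`, `advance`, the request labels, the 128-byte line size, and the handling of non-inhibited streams are assumptions, not details given on this page. The sketch also leaves the EA directory's hit/miss check to whatever consumes the `l1_dir_prealloc` requests.

```python
from dataclasses import dataclass
from typing import List, Tuple

LINE_SIZE = 128  # bytes per cache line (assumed)


@dataclass
class PRQEntry:
    """Hypothetical prefetch-queue entry for one prefetch stream."""
    next_ea: int           # next target EA of the stream
    direction: int         # +1 for ascending addresses, -1 for descending
    stride: int            # distance between targets, in cache lines
    is_store: bool         # stream was detected from store accesses
    inhibit_l1_data: bool  # do not prefetch operand data into the upper-level cache


def advance(entry: PRQEntry, count: int) -> List[Tuple[str, int]]:
    """Issue the next `count` prefetch requests for a stream and step it."""
    requests: List[Tuple[str, int]] = []
    for _ in range(count):
        ea = entry.next_ea
        if entry.is_store and entry.inhibit_l1_data:
            # Upper level: directory entry + translation only, no data.
            requests.append(("l1_dir_prealloc", ea))
            # Lower level: prefetch the line into the store-in cache.
            requests.append(("l2_data_prefetch", ea))
        else:
            # Simplified placeholder for ordinary (non-inhibited) streams.
            requests.append(("l1_data_prefetch", ea))
        entry.next_ea += entry.direction * entry.stride * LINE_SIZE
    return requests


if __name__ == "__main__":
    stream = PRQEntry(next_ea=0x2000, direction=-1, stride=1,
                      is_store=True, inhibit_l1_data=True)
    for kind, ea in advance(stream, 2):
        print(kind, hex(ea))
    # l1_dir_prealloc 0x2000
    # l2_data_prefetch 0x2000
    # l1_dir_prealloc 0x1f80
    # l2_data_prefetch 0x1f80
```

Keeping the inhibit flag per queue entry lets store streams and ordinary streams share one queue, with only the flag deciding whether the upper-level cache receives data or just a preallocated directory entry.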

Assignees

IBM

Classifications

Code	CPC title
G06F12/0862	with prefetch
G06F9/3013 (primary)	according to data content, e.g. floating-point registers, address registers
G06F9/3824	Operand accessing
G06F9/30032	Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE
G06F9/3012	Organisation of register space, e.g. banked or distributed register file

Patent family

Related publications grouped by family.

Patent family ID: 78292852

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11520585B2 cover?: In at least one embodiment, a processing unit includes a processor core and a vertical cache hierarchy including at least a store-through upper-level cache and a store-in lower-level cache. The upper-level cache includes a data array and an effective address (EA) directory. The processor core includes an execution unit, an address translation unit, and a prefetch unit configured to initiate all…
Who is the assignee on this patent?: IBM

Related patents

Systems, apparatuses, and methods for chained fused multiply add

Managing memory access requests with prefetch for streams

Local processing apparatus and data transceiving method thereof

Multi-petascale highly efficient parallel supercomputer

Prefetching across page boundaries in hierarchically cached processors