Systems, apparatuses, and methods for chained fused multiply add
US-10146535-B2 · Dec 4, 2018 · US
US11520585B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11520585-B2 |
| Application number | US-202117220115-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 1, 2021 |
| Priority date | May 4, 2020 |
| Publication date | Dec 6, 2022 |
| Grant date | Dec 6, 2022 |
Abstract
In at least one embodiment, a processing unit includes a processor core and a vertical cache hierarchy including at least a store-through upper-level cache and a store-in lower-level cache. The upper-level cache includes a data array and an effective address (EA) directory. The processor core includes an execution unit, an address translation unit, and a prefetch unit configured to initiate allocation of a directory entry in the EA directory for a store target EA without prefetching a cache line of data into the corresponding data entry in the data array. The processor core caches effective-to-real address (EA-to-RA) translation information for the store target EA in that directory entry, so that a subsequent demand store access that hits in the directory entry can avoid the performance penalty associated with address translation by the translation unit.
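A minimal Python sketch of the abstract's mechanism, under stated assumptions: a toy set-associative EA directory in which a store prefetch miss allocates an entry and caches a translation without fetching any data, so that a later demand store that hits the entry reuses the cached translation. The names `EADirectory`, `store_prefetch`, and `demand_store`, the 128-byte line, 4 KiB page, 64-set by 8-way geometry, LRU replacement, and no-write-allocate behavior on a demand store miss are all illustrative assumptions, not details from the patent. For simplicity each entry stores the real line address directly. Claims 3 and 9 instead describe caching a pointer to an entry in a separate RA directory.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Illustrative parameters (assumed; the patent does not specify them).
LINE_SIZE = 128   # bytes per cache line
PAGE_SIZE = 4096  # bytes per translation page
NUM_SETS = 64     # congruence classes in the EA directory
NUM_WAYS = 8      # associativity


@dataclass
class DirEntry:
    ea_tag: int       # EA bits above the set index
    ra_line: int      # cached EA-to-RA translation, as a real line address
    data_valid: bool  # False when allocated by a store prefetch without data


class EADirectory:
    """Toy set-associative EA directory for a store-through upper-level cache."""

    def __init__(self, translate_page: Callable[[int], int]):
        self.translate_page = translate_page  # hypothetical translation unit: EA page -> RA page
        self.sets: List[List[DirEntry]] = [[] for _ in range(NUM_SETS)]  # LRU order, MRU last
        self.translations = 0                 # how often the translation unit was invoked

    def _index_tag(self, ea: int):
        line = ea // LINE_SIZE
        return line % NUM_SETS, line // NUM_SETS

    def _lookup(self, ea: int) -> Optional[DirEntry]:
        idx, tag = self._index_tag(ea)
        ways = self.sets[idx]
        for entry in ways:
            if entry.ea_tag == tag:
                ways.remove(entry)
                ways.append(entry)  # move to the MRU position
                return entry
        return None

    def _translate_line(self, ea: int) -> int:
        self.translations += 1  # stands in for the address-translation penalty
        ra = self.translate_page(ea // PAGE_SIZE) * PAGE_SIZE + ea % PAGE_SIZE
        return ra // LINE_SIZE

    def store_prefetch(self, ea: int) -> None:
        """On a prefetch miss, allocate an entry and cache its translation, but no data."""
        if self._lookup(ea) is not None:
            return
        idx, tag = self._index_tag(ea)
        ways = self.sets[idx]
        if len(ways) == NUM_WAYS:
            ways.pop(0)  # evict the LRU way
        ways.append(DirEntry(tag, self._translate_line(ea), data_valid=False))

    def demand_store(self, ea: int) -> int:
        """Return the store's real address; only a directory miss pays for translation."""
        entry = self._lookup(ea)
        if entry is not None:
            ra_line = entry.ra_line  # reuse the cached translation
        else:
            # Assumed no-write-allocate on a store miss in the store-through cache.
            ra_line = self._translate_line(ea)
        return ra_line * LINE_SIZE + ea % LINE_SIZE


if __name__ == "__main__":
    # Hypothetical page table: every EA page maps to (EA page + 0x100).
    d = EADirectory(lambda ea_page: ea_page + 0x100)
    d.store_prefetch(0x2000)            # prefetch miss: allocate entry, translate once
    print(hex(d.demand_store(0x2008)))  # 0x102008 (hit, no new translation)
    print(d.translations)               # 1
```

The `translations` counter is the point of the sketch: the store prefetch and any directory miss increment it, while a demand store that hits the preallocated entry does not.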
Claims (preview)
What is claimed is:

1. A method of data processing in a processing unit, the method comprising: prefetching operand data likely to be accessed by a processor core of the processing unit through the execution of demand memory access instructions into a vertical cache hierarchy including at least a set-associative store-through upper-level data cache and a store-in lower-level cache, wherein the set-associative upper-level cache includes a set-associative data array and a set-associative effective address (EA) directory having a plurality of directory entries each corresponding to a respective data entry among a plurality of data entries in the data array; processing, in an execution of the processor core, memory access instructions and, based on processing the memory access instructions, initiating accesses to the vertical cache hierarchy; initiating a store prefetch stream, and based on a prefetch miss of store target EA of the store prefetch stream in the set-associative EA directory, allocating a directory entry in the set-associative EA directory for the store target EA without prefetching an associated cache line of operand data identified by the store target EA into the corresponding data entry in the data array; and translating the store target EA into real address (RA) and caching in the directory entry EA-to-RA address translation information for the store target EA, such that a subsequent demand store access that hits in the directory entry can avoid a performance penalty associated with address translation.

2. The method of claim 1, and further comprising prefetching data associated with the store target effective EA into the lower-level cache.

3. The method of claim 1, wherein: the processor core includes a real address (RA) directory of the set-associative upper-level data cache; and the EA-to-RA address translation information includes a pointer to a directory entry in the RA directory buffering an RA corresponding to the store target EA.

4. The method of claim 1, and further comprising: allocating a queue entry among a plurality of queue entries in a prefetch queue (PRQ) to the store prefetch stream including the store target EA; and indicating in the queue entry a direction and stride for the store prefetch stream.

5. The method of claim 4, and further comprising indicating in the queue entry that prefetching of operand data for the prefetch store stream into the upper-level cache is inhibited.

6. The method of claim 1, wherein: the store target EA is a first store target EA; and based on a hit of a second store target EA of a demand store access in the directory entry in the EA directory, utilizing the cached EA-to-RA address translation information to obtain the RA without translation of the second store target EA by the translation unit.

7. A processing unit, comprising: a vertical cache hierarchy including at least a store-through set-associative upper-level data cache and a store-in lower-level cache, wherein the set-associative upper-level data cache includes a set-associative data array and a set-associative effective address (EA) directory having a plurality of directory entries each corresponding to a respective data entry among a plurality of data entries in the data array; a processor core including: an execution unit configured to process memory access instructions and, based on processing the memory access instructions, initiate accesses to the vertical cache hierarchy; a translation unit configured to translate EAs to real addresses (RAs); an operand data prefetch unit that prefetches, into the vertical cache hierarchy, operand data likely to be accessed by the processor core through execution of demand memory access instructions by the execution unit, wherein the operand data prefetch unit is configured, based on a prefetch miss in the set-associative EA directory for a store target EA, to initiate allocation of a directory entry in the set-associative EA directory for the store target EA without prefetching an associated cache line of operand data identified by the store target EA into the corresponding data entry in the data array; and wherein the processor core caches in the directory entry EA-to-RA address translation information for the store target EA, such that a subsequent demand store access that hits in the directory entry can avoid a performance penalty associated with address translation by the translation unit.

8. The processor of claim 7, wherein the operand data prefetch unit is configured to prefetch operand data associated with the store target effective EA into the lower-level cache.

9. The processor of claim 7, wherein: the processor core includes a real address (RA) directory of the set-associative upper-level data cache; and the EA-to-RA address translation information includes a pointer to a directory entry in the RA directory buffering an RA corresponding to the store target EA.

10. The processor of claim 7, wherein: the operand data prefetch unit includes a prefetch queue (PRQ) including a plurality of queue entries; the operand data prefetch unit allocates a queue entry among the plurality of queue entries to a store prefetch stream including the store target EA; and the queue entry indicates a direction and stride for the store prefetch stream.

11. The processor of claim 10, wherein the queue entry further indicates that prefetching of operand data for the prefetch store stream into the upper-level cache is inhibited.

12. The processor of claim 7, wherein: the store target EA is a first store target EA; and the processor core, based on a hit of a second store target EA of a demand store access in the directory entry in the EA directory, utilizes the cached EA-to-RA address translation information to obtain the RA without translation of the second store target EA by the translation unit.

13. A data processing system, comprising: multiple processing units, including the processing unit of claim 7; a shared memory; and a system interconnect communicatively coupling the shared memory and the multiple processing units.

14. A design structure tangibly embodied in a machine-readable storage device for designing, manufacturing, or testing an integrated circuit, the design structure comprising: a processing unit, including: a vertical cache hierarchy including at least a set-associative store-through upper-level data cache and a store-in lower-level cache, wherein the set-associative upper-level data cache includes a set-associative data array and a set-associative effective address (EA) directory having a plurality of directory entries each corresponding to a respective data entry among a plurality of data entries in the data array; a processor core including: an execution unit configured to process memory access instructions and, based on processing the memory access instructions, initiate accesses to the vertical cache hierarchy; a translation unit configured to translate EAs to real addresses (RAs); an operand data prefetch unit that prefetches, into the vertical cache hierarchy, operand data likely to be accessed by the processor core through execution of demand memory access instructions by the execution unit, wherein the operand data prefetch unit is configured, based on a prefetch miss in the set-associative EA directory for a store target EA, to initiate allocation of a directory entry in the set-associative EA directory for the store target EA without prefetching an associated cache line of operand data identified by the store target EA into the corresponding data entry in the data array; and wherein the processor core caches in t…
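Claims 4, 5, 10, and 11 describe tracking a store prefetch stream in a prefetch queue (PRQ) entry that records a direction and a stride and can mark operand-data prefetching into the upper-level cache as inhibited; claims 2 and 8 add optional prefetching of the data into the lower-level cache. The Python sketch below models that bookkeeping under assumptions: `PrefetchQueue`, `PRQEntry`, `issue_prefetches`, the request labels, the 128-byte line size, a stride measured in cache lines, and oldest-first replacement of queue entries are hypothetical choices for illustration, not the patent's design.

```python
from dataclasses import dataclass
from typing import List, Tuple

LINE_SIZE = 128  # bytes per cache line (assumed)


@dataclass
class PRQEntry:
    """One hypothetical prefetch-queue entry tracking a store prefetch stream."""
    next_ea: int                            # next store target EA to prefetch
    direction: int                          # +1 ascending addresses, -1 descending
    stride: int                             # distance between prefetches, in cache lines
    upper_level_data_inhibited: bool = True  # no operand data into the upper-level cache


class PrefetchQueue:
    def __init__(self, num_entries: int = 8):
        self.num_entries = num_entries
        self.entries: List[PRQEntry] = []

    def allocate_store_stream(self, ea: int, direction: int, stride: int) -> PRQEntry:
        if direction not in (1, -1) or stride < 1:
            raise ValueError("direction must be +1 or -1 and stride must be >= 1")
        if len(self.entries) == self.num_entries:
            self.entries.pop(0)  # replace the oldest stream (assumed policy)
        entry = PRQEntry(ea, direction, stride)
        self.entries.append(entry)
        return entry


def issue_prefetches(entry: PRQEntry, depth: int) -> List[Tuple[int, str]]:
    """Emit hypothetical (EA, request) pairs for the next `depth` lines of a stream."""
    requests = []
    for _ in range(depth):
        ea = entry.next_ea
        if entry.upper_level_data_inhibited:
            # Directory entry plus cached translation only, no upper-level data.
            requests.append((ea, "alloc_ea_dir_entry"))
            # Optional data prefetch into the lower-level cache (claims 2 and 8).
            requests.append((ea, "prefetch_to_lower_level"))
        else:
            requests.append((ea, "prefetch_to_upper_level"))
        entry.next_ea += entry.direction * entry.stride * LINE_SIZE
    return requests


if __name__ == "__main__":
    prq = PrefetchQueue()
    stream = prq.allocate_store_stream(0x1000, direction=+1, stride=2)
    for ea, req in issue_prefetches(stream, depth=2):
        print(hex(ea), req)
```

Keeping the inhibit flag per queue entry, rather than globally, lets load streams in the same queue continue to prefetch data into the upper-level cache while store streams only preallocate directory entries.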
CPC classification titles:
with prefetch
according to data content, e.g. floating-point registers, address registers
Operand accessing
Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE
Organisation of register space, e.g. banked or distributed register file