Self-aware, peer-to-peer cache transfers between local, shared cache memories in a multi-processor system
US-2017371783-A1 · Dec 28, 2017 · US
US11928060B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11928060-B2 |
| Application number | US-202217666950-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 8, 2022 |
| Priority date | Dec 2, 2019 |
| Publication date | Mar 12, 2024 |
| Grant date | Mar 12, 2024 |
Official abstract text for this publication.
A processing system includes a plurality of compute units, with each compute unit having an associated first cache of a plurality of first caches, and a second cache shared by the plurality of compute units. The second cache operates to manage transfers of cachelines between the first caches of the plurality of first caches such that when multiple candidate first caches contain a valid copy of a requested cacheline, the second cache selects the candidate first cache having the shortest total path from the second cache to the candidate first cache and from the candidate first cache to the compute unit issuing a request for the requested cacheline.
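The selection rule described in the abstract can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the patented implementation: the function name `select_source_cache`, the cache identifiers, and the distance values are all hypothetical, and distances are plain numbers in arbitrary units.

```python
def select_source_cache(candidates, shared_to_cache, cache_to_requester):
    """Pick the candidate first cache with the shortest total path.

    candidates: ids of first caches holding a valid copy (hypothetical ids).
    shared_to_cache: distance from the shared (second) cache to each candidate.
    cache_to_requester: distance from each candidate to the requesting compute unit.
    """
    candidates = list(candidates)
    if not candidates:
        return None  # no first cache holds a valid copy
    # Total path = (shared cache -> candidate) + (candidate -> requester).
    return min(candidates, key=lambda c: shared_to_cache[c] + cache_to_requester[c])


# Hypothetical distances for three candidate caches.
shared_to_cache = {"L2_0": 1, "L2_1": 2, "L2_3": 1}
cache_to_requester = {"L2_0": 4, "L2_1": 1, "L2_3": 3}

source = select_source_cache(["L2_0", "L2_1", "L2_3"], shared_to_cache, cache_to_requester)
```

In this made-up example the totals are 5, 3, and 4, so `L2_1` is chosen even though it is not the candidate closest to the shared cache; both legs of the path count toward the decision.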
Opening claim text (preview).
What is claimed is:
1. A processing system comprising: a plurality of compute units, each compute unit having a private cache; and a shared cache coupled to the plurality of compute units, wherein the shared cache is configured to: transfer a valid copy of a requested cacheline from a private cache of a compute unit of the plurality of compute units having a lowest transfer cost to a private cache of a requesting compute unit of the plurality of compute units, the lowest transfer cost is identified based on each distance between the requesting compute unit and the private caches having a valid copy of the requested cacheline and based on each corresponding distance between the shared cache and the private caches having a valid copy of the requested cacheline.
2. The processing system of claim 1, wherein: the lowest transfer cost is based on a combination of a first distance metric result calculated based on each distance between the requesting compute unit and the private caches having a valid copy of the requested cacheline and a second distance metric result calculated based on each distance between the shared cache and the private caches having a valid copy of the requested cacheline.
3. The processing system of claim 2, wherein: the first distance metric result and the second distance metric result each indicate a corresponding number of clock cycles associated with the processing system.
4. The processing system of claim 2, wherein: the lowest transfer cost is based on topology information representing a topology of the compute units.
5. The processing system of claim 4, wherein: the topology information further represents one or more policies regarding transfer of cachelines via one or more interconnects.
6. The processing system of claim 4, wherein: the topology information is implemented as a look-up table accessible by the shared cache.
7. The processing system of claim 4, wherein: the topology information is implemented as hardware logic accessible by the shared cache.
8. The processing system of claim 7, wherein: the hardware logic is one of hard-coded logic or programmable logic.
9. The processing system of claim 4, wherein: the topology information includes information representing at least one of: a representation of a physical topology of paths between the plurality of compute units; characteristics of one or more interconnects; and at least one policy for transferring cachelines.
10. The processing system of claim 1, further comprising: a shadow tag memory accessible by the shared cache, wherein the shared cache is to identify a subset of private caches of the plurality of compute units having a valid copy of the requested cacheline using the shadow tag memory.
11. A method comprising: in response to a request for an identified cacheline from a requesting compute unit of a plurality of compute units, each of the plurality of compute units associated with a corresponding private cache: identifying, at a shared cache, a private cache having a lowest transfer cost for providing a valid copy of the identified cacheline to the requesting compute unit, the lowest transfer cost is identified based on each distance between the requesting compute unit and the private caches having a valid copy of the identified cacheline and based on each corresponding distance between the shared cache and the private caches having a valid copy of the identified cacheline; and transferring the valid copy of the identified cacheline from the identified private cache.
12. The method of claim 11, wherein: the lowest transfer cost is further based on a summation of a first distance metric and a second distance metric.
13. The method of claim 12, wherein: the first distance metric and the second distance metric each indicate a corresponding number of clock cycles associated with a processing system that includes the plurality of compute units.
14. The method of claim 12, wherein: the lowest transfer cost is based on topology information representing a topology of the compute units.
15. The method of claim 14, wherein: the topology information further represents one or more policies regarding transfer of cachelines via one or more interconnects.
16. The method of claim 14, wherein: the topology information includes information representing at least one of: a representation of a physical topology of paths between the plurality of compute units; characteristics of one or more interconnects; and at least one policy for transferring cachelines.
17. The method of claim 11, further comprising: identifying a subset of corresponding private caches of the plurality of compute units having a valid copy of the identified cacheline.
18. The method of claim 17, wherein: identifying the subset of corresponding private caches of the plurality of compute units comprises: identifying the subset using a shadow tag memory accessible to the shared cache.
19. A processing system comprising: a plurality of compute units, each compute unit having an associated cache of a plurality of first caches; and a cache shared by the plurality of compute units, wherein a third cache is configured to transfer a valid copy of a requested cacheline from a first cache of the plurality of first caches having a lowest transfer cost to a second cache of the plurality of first caches, wherein the second cache is associated with a requesting compute unit, and wherein the lowest transfer cost is identified based on each distance between the requesting compute unit and the plurality of first caches having a valid copy of the requested cacheline and based on each corresponding distance between the third cache and the plurality of first caches having a valid copy of the requested cacheline.
20. The processing system of claim 19, wherein: the lowest transfer cost is based on one or more policies associated with interconnect segments of a path between the first cache and the second cache of the plurality of first caches.
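As a rough illustration of how the claimed elements could fit together, the following Python sketch combines a shadow tag lookup, a look-up table of per-path costs in clock cycles, and a simple transfer policy. Everything here is an assumption made for illustration: the class `SharedCache`, its fields, the `blocked` policy set, the alphabetical tie-break, and the cycle counts are hypothetical and are not taken from the patent's description.

```python
from dataclasses import dataclass, field


@dataclass
class SharedCache:
    # Hypothetical look-up table: cycles[(src, dst)] = cost in clock cycles.
    cycles: dict
    # Hypothetical shadow tag memory: address -> private caches with a valid copy.
    shadow_tags: dict = field(default_factory=dict)
    # Hypothetical policy: (source, requester) pairs not allowed to transfer.
    blocked: set = field(default_factory=set)

    SELF = "shared"  # key used for the shared cache in the look-up table

    def holders(self, address):
        """Subset of private caches holding a valid copy, per the shadow tags."""
        return self.shadow_tags.get(address, set())

    def transfer_cost(self, candidate, requester):
        """Sum of the two distance metrics, both expressed in clock cycles."""
        return (self.cycles[(self.SELF, candidate)]
                + self.cycles[(candidate, requester)])

    def pick_source(self, address, requester):
        """Return the permitted holder with the lowest transfer cost, or None."""
        allowed = [c for c in self.holders(address)
                   if c != requester and (c, requester) not in self.blocked]
        if not allowed:
            return None
        # Sorting first makes ties resolve alphabetically (an assumption).
        return min(sorted(allowed), key=lambda c: self.transfer_cost(c, requester))


# Hypothetical topology: four private caches P0..P3, with P3 requesting.
cycles = {
    ("shared", "P0"): 2, ("shared", "P1"): 3, ("shared", "P2"): 5,
    ("P0", "P3"): 6, ("P1", "P3"): 2, ("P2", "P3"): 1,
}
cache = SharedCache(cycles=cycles, shadow_tags={0x1000: {"P0", "P1", "P2"}})
best = cache.pick_source(0x1000, "P3")

# Same topology, but a policy forbids transfers from P1 to P3.
restricted = SharedCache(cycles=cycles,
                         shadow_tags={0x1000: {"P0", "P1", "P2"}},
                         blocked={("P1", "P3")})
fallback = restricted.pick_source(0x1000, "P3")
```

With these invented numbers the summed costs to P3 are 8, 5, and 6 cycles for P0, P1, and P2, so `best` is `P1`; once the policy excludes P1, the next-cheapest holder `P2` is used. A dictionary stands in for the look-up table of claim 6; hardware logic, as in claim 7, would be a different realization of the same cost function.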
with a shared cache · CPC title
Hit rate improvement · CPC title
Resource optimization · CPC title
with multilevel cache hierarchies · CPC title
using directory methods · CPC title