Transfer of cachelines in a processing system based on transfer costs

US11928060B2 · US · B2

Patent metadata
Publication number: US-11928060-B2
Application number: US-202217666950-A
Country: US
Kind code: B2
Filing date: Feb 8, 2022
Priority date: Dec 2, 2019
Publication date: Mar 12, 2024
Grant date: Mar 12, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A processing system includes a plurality of compute units, with each compute unit having an associated first cache of a plurality of first caches, and a second cache shared by the plurality of compute units. The second cache operates to manage transfers of caches between the first caches of the plurality of first caches such that when multiple candidate first caches contain a valid copy of a requested cacheline, the second cache selects the candidate first cache having the shortest total path from the second cache to the candidate first cache and from the candidate first cache to the compute unit issuing a request for the requested cacheline.
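The selection rule in the abstract can be illustrated with a minimal sketch. It assumes that each path length is available as a single integer; the function name, cache and compute-unit labels, and distance values below are hypothetical and do not come from the patent.

```python
from typing import Dict, Iterable, Tuple


def select_source_cache(
    candidates: Iterable[str],
    requester: str,
    shared_to_cache: Dict[str, int],
    cache_to_unit: Dict[Tuple[str, str], int],
) -> str:
    """Return the candidate first cache with the shortest total path.

    Total path = (second cache -> candidate first cache)
               + (candidate first cache -> requesting compute unit).
    Raises ValueError if there are no candidates.
    """
    def total_path(cache: str) -> int:
        return shared_to_cache[cache] + cache_to_unit[(cache, requester)]

    return min(candidates, key=total_path)


# Hypothetical distances (arbitrary units), chosen only for illustration.
shared_to_cache = {"L2_0": 2, "L2_1": 3, "L2_2": 5}
cache_to_unit = {
    ("L2_0", "CU3"): 6,  # total 2 + 6 = 8
    ("L2_1", "CU3"): 2,  # total 3 + 2 = 5
    ("L2_2", "CU3"): 1,  # total 5 + 1 = 6
}

best = select_source_cache(
    ["L2_0", "L2_1", "L2_2"], "CU3", shared_to_cache, cache_to_unit
)
print(best)  # L2_1
```

Note that L2_2 is closest to the requester but is not selected, because the sum of both path segments decides the outcome, not either segment alone.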

First claim

Opening claim text (preview).

What is claimed is:

1. A processing system comprising: a plurality of compute units, each compute unit having a private cache; and a shared cache coupled to the plurality of compute units, wherein the shared cache is configured to: transfer a valid copy of a requested cacheline from a private cache of a compute unit of the plurality of compute units having a lowest transfer cost to a private cache of a requesting compute unit of the plurality of compute units, the lowest transfer cost is identified based on each distance between the requesting compute unit and the private caches having a valid copy of the requested cacheline and based on each corresponding distance between the shared cache and the private caches having a valid copy of the requested cacheline.

2. The processing system of claim 1, wherein: the lowest transfer cost is based on a combination of a first distance metric result calculated based on each distance between the requesting compute unit and the private caches having a valid copy of the requested cacheline and a second distance metric result calculated based on each distance between the shared cache and the private caches having a valid copy of the requested cacheline.

3. The processing system of claim 2, wherein: the first distance metric result and the second distance metric result each indicate a corresponding number of clock cycles associated with the processing system.

4. The processing system of claim 2, wherein: the lowest transfer cost is based on topology information representing a topology of the compute units.

5. The processing system of claim 4, wherein: the topology information further represents one or more policies regarding transfer of cachelines via one or more interconnects.

6. The processing system of claim 4, wherein: the topology information is implemented as a look-up table accessible by the shared cache.

7. The processing system of claim 4, wherein: the topology information is implemented as hardware logic accessible by the shared cache.

8. The processing system of claim 7, wherein: the hardware logic is one of hard-coded logic or programmable logic.

9. The processing system of claim 4, wherein: the topology information includes information representing at least one of: a representation of a physical topology of paths between the plurality of compute units; characteristics of one or more interconnects; and at least one policy for transferring cachelines.

10. The processing system of claim 1, further comprising: a shadow tag memory accessible by the shared cache, wherein the shared cache is to identify a subset of private caches of the plurality of compute units having a valid copy of the requested cacheline using the shadow tag memory.

11. A method comprising: in response to a request for an identified cacheline from a requesting compute unit of a plurality of compute units, each of the plurality of compute units associated with a corresponding private cache: identifying, at a shared cache, a private cache having a lowest transfer cost for providing a valid copy of the identified cacheline to the requesting compute unit, the lowest transfer cost is identified based on each distance between the requesting compute unit and the private caches having a valid copy of the identified cacheline and based on each corresponding distance between the shared cache and the private caches having a valid copy of the identified cacheline; and transferring the valid copy of the identified cacheline from the identified private cache.

12. The method of claim 11, wherein: the lowest transfer cost is further based on a summation of a first distance metric and a second distance metric.

13. The method of claim 12, wherein: the first distance metric and the second distance metric each indicate a corresponding number of clock cycles associated with a processing system that includes the plurality of compute units.

14. The method of claim 12, wherein: the lowest transfer cost is based on topology information representing a topology of the compute units.

15. The method of claim 14, wherein: the topology information further represents one or more policies regarding transfer of cachelines via one or more interconnects.

16. The method of claim 14, wherein: the topology information includes information representing at least one of: a representation of a physical topology of paths between the plurality of compute units; characteristics of one or more interconnects; and at least one policy for transferring cachelines.

17. The method of claim 11, further comprising: identifying a subset of corresponding private caches of the plurality of compute units having a valid copy of the identified cacheline.

18. The method of claim 17, wherein: identifying the subset of corresponding private caches of the plurality of compute units comprises: identifying the subset using a shadow tag memory accessible to the shared cache.

19. A processing system comprising: a plurality of compute units, each compute unit having an associated cache of a plurality of first caches; and a cache shared by the plurality of compute units, wherein a third cache is configured to transfer a valid copy of a requested cacheline from a first cache of the plurality of first caches having a lowest transfer cost to a second cache of the plurality of first caches, wherein the second cache is associated with a requesting compute unit, and wherein the lowest transfer cost is identified based on each distance between the requesting compute unit and the plurality of first caches having a valid copy of the requested cacheline and based on each corresponding distance between the third cache and the plurality of first caches having a valid copy of the requested cacheline.

20. The processing system of claim 19, wherein: the lowest transfer cost is based on one or more policies associated with interconnect segments of a path between the first cache and the second cache of the plurality of first caches.
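To make the interplay of claims 6, 10, 12 and 13 more concrete, the following is a rough software model, not the claimed hardware. It assumes the shadow tag memory can be modeled as a mapping from cacheline address to the set of private caches holding a valid copy, and that the topology look-up table stores clock-cycle counts per path segment. The class name, method names, node labels, and cycle counts are all hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

SHARED = "L3"  # hypothetical label for the shared cache


class SharedCacheModel:
    """Toy model: shadow tags plus a look-up table of clock-cycle distances."""

    def __init__(self, cycles: Dict[Tuple[str, str], int]) -> None:
        # Look-up table (assumed layout): (source node, destination node) -> cycles.
        self.cycles = cycles
        # Shadow tags (assumed layout): address -> private caches with a valid copy.
        self.shadow_tags: Dict[int, Set[str]] = {}

    def record_fill(self, address: int, cache_id: str) -> None:
        self.shadow_tags.setdefault(address, set()).add(cache_id)

    def record_evict(self, address: int, cache_id: str) -> None:
        self.shadow_tags.get(address, set()).discard(cache_id)

    def pick_source(self, address: int, requester: str) -> Optional[str]:
        """Return the holder with the lowest summed cycle count, or None."""
        holders = self.shadow_tags.get(address, set())
        if not holders:
            return None

        def cost(cache_id: str) -> int:
            # Summation of the two distance metrics, both in clock cycles.
            return self.cycles[(SHARED, cache_id)] + self.cycles[(cache_id, requester)]

        # Sorting first makes tie-breaking deterministic in this sketch.
        return min(sorted(holders), key=cost)


# Hypothetical cycle counts for three private caches and one requester.
table = {
    (SHARED, "C0"): 4, ("C0", "CU3"): 9,  # total 13
    (SHARED, "C1"): 4, ("C1", "CU3"): 3,  # total 7
    (SHARED, "C2"): 6, ("C2", "CU3"): 2,  # total 8
}
model = SharedCacheModel(table)
for cache in ("C0", "C1", "C2"):
    model.record_fill(0x1000, cache)

first_choice = model.pick_source(0x1000, "CU3")   # C1
model.record_evict(0x1000, "C1")
second_choice = model.pick_source(0x1000, "CU3")  # C2
no_holder = model.pick_source(0x2000, "CU3")      # None
```

In this model the shadow tags narrow the search to caches that actually hold the line, so the cost lookup runs only over that subset. Interconnect policies (claims 5, 9, 15, 16 and 20) are not modeled here; one possible extension would be to fold them into the table values.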

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11928060B2 cover?
A processing system includes a plurality of compute units, with each compute unit having an associated first cache of a plurality of first caches, and a second cache shared by the plurality of compute units. The second cache operates to manage transfers of caches between the first caches of the plurality of first caches such that when multiple candidate first caches contain a valid copy of a re…
Who is the assignee on this patent?
Advanced Micro Devices Inc
What technology area does this patent fall under?
Primary CPC classification G06F12/084. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 12, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).