What technology area does this patent fall under?

Primary CPC classification G06F12/084. Mapped technology areas include Physics.

When was this patent published?

Publication date Mar 15, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Transfer of cachelines in a processing system based on transfer costs

US11275688B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11275688-B2
Application number	US-201916700671-A
Country	US
Kind code	B2
Filing date	Dec 2, 2019
Priority date	Dec 2, 2019
Publication date	Mar 15, 2022
Grant date	Mar 15, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A processing system includes a plurality of compute units, with each compute unit having an associated first cache of a plurality of first caches, and a second cache shared by the plurality of compute units. The second cache operates to manage transfers of caches between the first caches of the plurality of first caches such that when multiple candidate first caches contain a valid copy of a requested cacheline, the second cache selects the candidate first cache having the shortest total path from the second cache to the candidate first cache and from the candidate first cache to the compute unit issuing a request for the requested cacheline.
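
The selection rule described in the abstract can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the function name, the dictionary layout, and the path lengths are hypothetical, and the "total path" is assumed to be the plain sum of the two path lengths.

```python
def select_source_cache(candidates, dist_from_shared, dist_to_requester):
    """Return the candidate first cache with the shortest total path.

    candidates: ids of first caches holding a valid copy (assumed known).
    dist_from_shared[c]: path length from the second (shared) cache to c.
    dist_to_requester[c]: path length from c to the requesting compute unit.
    """
    if not candidates:
        return None
    # Total path = shared cache -> candidate, plus candidate -> requester.
    return min(candidates, key=lambda c: dist_from_shared[c] + dist_to_requester[c])


# Hypothetical path lengths (for example, interconnect hops) for three candidates.
dist_from_shared = {"cu1": 2, "cu2": 1, "cu3": 3}
dist_to_requester = {"cu1": 1, "cu2": 4, "cu3": 1}
best = select_source_cache(["cu1", "cu2", "cu3"], dist_from_shared, dist_to_requester)
print(best)
```

Note that the candidate closest to the shared cache ("cu2" in these made-up numbers) is not necessarily the one chosen, because the second leg of the path to the requester also counts.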

First claim

Opening claim text (preview).

What is claimed is:

1. A processing system comprising: a plurality of compute units, each compute unit including at least one processor core and at least one private cache of a plurality of private caches, each private cache configured to store a corresponding set of cachelines; a shared cache that is shared by the plurality of compute units and coupled to the plurality of compute units via one or more interconnects, wherein the shared cache is configured to: in response to receipt of a request for an identified cacheline from a requesting compute unit, identify a subset of the plurality of private caches that has a valid copy of the identified cacheline; identify the private cache of the subset having a lowest transfer cost for providing a valid copy of the identified cacheline to the requesting compute unit; and transmit a probe request to a target compute unit having the identified private cache via at least one interconnect of the one or more interconnects; and wherein, in response to receipt of the probe request, the target compute unit is configured to transfer a valid copy of the identified cacheline to the requesting compute unit via at least one interconnect of the one or more interconnects; and wherein the shared cache is configured to identify which private cache of the subset has the lowest transfer cost by: determining, for each private cache of the subset, a corresponding transfer cost metric based on a first distance metric and a second distance metric, the first distance metric representing a distance between the shared cache and the private cache via the one or more interconnects and the second distance metric representing a distance between the private cache and the requesting compute unit; and identifying the private cache having the lowest corresponding transfer cost metric as the private cache with the lowest transfer cost.

2. The processing system of claim 1, wherein the lowest transfer cost is based on a sum of the first distance metric and the second distance metric.

3. The processing system of claim 2, wherein the first distance metric and the second distance metric are expressed in terms of clock cycles.

4. The processing system of claim 2, wherein: the shared cache is configured to determine the transfer cost metrics for the private caches of the subset based on topology information representing a topology of the compute units, the shared cache, and the one or more interconnects.

5. The processing system of claim 4, wherein the topology information further represents one or more policies regarding transfer of cachelines via the one or more interconnects.

6. The processing system of claim 4, wherein: the topology information is implemented as a look-up table accessible by the shared cache, the look-up table configured to receive as inputs an identifier of the requesting compute unit and an identifier of the compute unit having a corresponding private cache, and to provide as an output a corresponding transfer cost metric.

7. The processing system of claim 4, wherein: the topology information is implemented as hardware logic accessible by the shared cache, the hardware logic configured to receive as inputs an identifier of the requesting compute unit and an identifier of the compute unit having a corresponding private cache, and to provide as an output a corresponding transfer cost metric.

8. The processing system of claim 7, wherein the hardware logic is one of: hard-coded logic or programmable logic.

9. The processing system of claim 4, wherein: the topology information includes information representing at least one of: a representation of a physical topology of paths between the plurality of compute units via the one or more interconnects; characteristics of the one or more interconnects; and at least one policy for transferring cachelines; and the shared cache is configured to determine the transfer cost metrics based on calculations performed using the information.

10. The processing system of claim 1, further comprising: a shadow tag memory accessible by the shared cache, the shadow tag memory comprising a plurality of entries, each entry storing state and address information for a corresponding cacheline of one of the private caches; and wherein the shared cache is to identify the subset of the plurality of private caches that has a valid copy of the identified cacheline using the shadow tag memory.

11. The processing system of claim 1, wherein: the probe request includes at least one of an identifier of the requesting compute unit and an identifier for the request.

12. The processing system of claim 1, wherein: the shared cache is configured to store a separate set of cachelines; and responsive to determining the separate set of cachelines includes a valid copy of the identified cacheline, the shared cache is to transfer a copy of the identified cacheline to the requesting compute unit to satisfy the request for the identified cacheline in place of identifying a subset of the plurality of private caches, identifying a private cache, and transmitting a probe request.

13. A method for cacheline transfers in a system comprising a plurality of compute units and a shared cache, each compute unit including at least one private cache of a plurality of private caches, the method comprising: in response to a request for an identified cacheline from a requesting compute unit, identifying, at the shared cache, a subset of the compute units that have a valid copy of the identified cacheline; identifying, at the shared cache, the private cache of the subset having a lowest transfer cost for providing a valid copy of the identified cacheline to the requesting compute unit; transmitting a probe request from the shared cache to a target compute unit having the identified private cache via at least one interconnect of the one or more interconnects; and in response to receipt of the probe request, transmitting a valid copy of the identified cacheline from the target compute unit to the requesting compute unit via at least one interconnect of the one or more interconnects; and wherein identifying which private cache of the subset has the lowest transfer cost comprises: determining, for each private cache of the subset, a corresponding transfer cost metric based on a first distance metric and a second distance metric, the first distance metric representing a distance between the shared cache and the private cache via the one or more interconnects and the second distance metric representing a distance between the private cache and the requesting compute unit; and identifying the private cache having the lowest corresponding transfer cost metric as the private cache with the lowest transfer cost.

14. The method of claim 13, wherein the lowest transfer cost is based on a sum of the first distance metric and the second distance metric.

15. The method of claim 14, wherein the first distance metric and the second distance metric are expressed in terms of clock cycles.

16. The method of claim 14, wherein: determining a corresponding transfer cost metric comprises determining the corresponding transfer cost metric based on topology information representing a topology of the compute units, the shared cache, and the one or more interconnects.

17. The method of claim 16, wherein: the topology information is implemented as at least one of: a look-up table accessible by the shared cache, the look-up table configured to receive as inputs an identifier of the requesting compute unit and an identifier of the target compu

Assignees

Advanced Micro Devices Inc

Inventors

Classifications

G06F2212/502
using adaptive policy · CPC title
G06F2212/1048
Scalability · CPC title
G06F2212/1041
Resource optimization · CPC title
G06F2212/1021
Hit rate improvement · CPC title
G06F12/0888
using selective caching, e.g. bypass · CPC title

Patent family

Related publications grouped by family.

View patent family 76091475

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11275688B2 cover?: A processing system includes a plurality of compute units, with each compute unit having an associated first cache of a plurality of first caches, and a second cache shared by the plurality of compute units. The second cache operates to manage transfers of caches between the first caches of the plurality of first caches such that when multiple candidate first caches contain a valid copy of a re…
Who is the assignee on this patent?: Advanced Micro Devices Inc