Light-weight cache coherence for data processors with limited data sharing

US10042762B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10042762-B2
Application number  US-201615264804-A
Country             US
Kind code           B2
Filing date         Sep 14, 2016
Priority date       Sep 14, 2016
Publication date    Aug 7, 2018
Grant date          Aug 7, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A data processing system includes a plurality of processors, local memories associated with a corresponding processor, and at least one inter-processor link. In response to a first processor performing a load or store operation on an address of a corresponding local memory that is not currently in the local cache, a local cache allocates a first cache line and encodes a local state with the first cache line. In response to a load operation from an address of a remote memory that is not currently in the local cache, the local cache allocates a second cache line and encodes a remote state with the second cache line. The first processor performs subsequent loads and stores on the first cache line in the local cache in response to the local state, and subsequent loads from the second cache line in the local cache in response to the remote state.
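
The allocation policy in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the names `LocalCache`, `LineState`, and `local_range` are hypothetical, each cache line holds a single word keyed by its address, and one Python dict stands in for all memories. Stores to remote addresses are left out because the abstract does not describe them.

```python
from dataclasses import dataclass, field
from enum import Enum


class LineState(Enum):
    LOCAL = "local"    # address belongs to this processor's own memory
    REMOTE = "remote"  # address belongs to another processor's memory


@dataclass
class LocalCache:
    local_range: range                         # hypothetical: addresses owned locally
    lines: dict = field(default_factory=dict)  # addr -> [state, value]

    def _allocate(self, addr, state, value):
        self.lines[addr] = [state, value]

    def load(self, addr, memory):
        if addr not in self.lines:
            # Miss: allocate a line and tag it by which memory owns the address.
            is_local = addr in self.local_range
            state = LineState.LOCAL if is_local else LineState.REMOTE
            self._allocate(addr, state, memory.get(addr, 0))
        # Hit (or freshly allocated line): serve the load from the cache.
        return self.lines[addr][1]

    def store(self, addr, value):
        if addr not in self.local_range:
            # Out of scope for this sketch; the abstract covers only remote loads.
            raise NotImplementedError("remote stores are not described in the abstract")
        if addr not in self.lines:
            self._allocate(addr, LineState.LOCAL, value)  # miss: allocate in local state
        else:
            self.lines[addr][1] = value                   # hit: update the cached line
```

In this model a remote line keeps serving loads from the cached copy even if the owning memory changes afterwards, which is why the claims pair the remote state with invalidation on an acquire operation.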

First claim

Opening claim text (preview).

What is claimed is:

1. A data processing system, comprising: a plurality of processors each comprising a local cache; a plurality of local memories each associated with and coupled to a corresponding one of said plurality of processors; and at least one inter-processor link between a corresponding pair of said plurality of processors, wherein in response to a first processor performing a store operation on an address of a corresponding local memory that is currently not in said local cache, said local cache allocates a first cache line and encodes a local state with said first cache line, in response to said first processor performing a load operation from an address of a remote memory that is currently not in said local cache, said local cache allocates a second cache line and encodes a remote state with said second cache line, and in response to said first processor performing a store operation to an address of said remote memory that is currently not in said local cache, said local cache performs a write-through of data of said store operation to said remote memory without allocating a cache line in said local cache, and wherein said first processor performs subsequent loads and stores on said first cache line in said local cache in response to said local state, and performs subsequent loads from said second cache line in said local cache in response to said remote state.

2. The data processing system of claim 1, wherein in response to said first processor performing a store access to said second cache line, said local cache updates its copy of said second cache line and also performs a write-through operation of said store access to said remote memory.

3. The data processing system of claim 1, wherein in response to an acquire operation from said first processor, said local cache invalidates cache lines that are in said remote state.

4. The data processing system of claim 1, wherein in response to said first processor performing a write operation to an address of said remote memory that is currently not in said local cache, said local cache allocates a third cache line to store data associated with said write operation and tracks a location of said third cache line within said local cache in a hardware data structure.

5. The data processing system of claim 4, wherein said local cache flushes contents of cache lines in said local cache corresponding to tracked addresses in said hardware data structure in response to one of an acquire operation and a release operation.

6. The data processing system of claim 1, wherein each of said plurality of processors comprises a processor-in-memory, and said local memory comprises a plurality of vertically stacked high bandwidth memory chips.

7. The data processing system of claim 1, wherein said local cache maintains a cache state of each cache line using a plurality of bits, wherein said plurality of bits comprises a valid bit and a remote bit, wherein said local cache encodes said local state when said valid bit is true and said remote bit is false, and encodes said remote state when said valid bit is false and said remote bit is true.

8. The data processing system of claim 7, wherein in response to an acquire operation from said first processor, said local cache clears all remote bits while leaving any other bits of said plurality of bits unchanged.

9. The data processing system of claim 1, wherein: each of said plurality of processors is combined with one or more corresponding local memories in a respective memory module; and the data processing system further comprises a host processor coupled to each of said plurality of processors.

10. The data processing system of claim 1, wherein in response to said load operation to said remote memory, said local cache indicates said second cache line is in said remote state only if said load operation is to a coherent portion of an address space of said remote memory.

11. The data processing system of claim 1, wherein in response to said first processor performing a load operation on an address of said corresponding local memory that is currently not in said local cache, said local cache allocates a third cache line and encodes a local state with said third cache line.

12. A memory module, comprising: a local processor comprising a local cache and having an inter-processor link; and a local memory comprising a plurality of memory chips attached to said local processor, wherein in response to said local processor performing a store operation on an address of said local memory that is currently not in said local cache, said local cache allocates a first cache line and encodes a local state with said first cache line, in response to said local processor performing a load operation from an uncached address of a remote memory that is currently not in said local cache, said local cache allocates a second cache line and encodes a remote state with said second cache line, and in response to said local processor performing a store operation to an address of said remote memory that is currently not in said local cache, said local cache performs a write-through of data of said store operation to said remote memory without allocating a cache line in said local cache, and wherein said local processor performs subsequent loads and stores to said first cache line in said local cache in response to said local state, and performs subsequent loads from said second cache line in said local cache in response to said remote state.

13. The memory module of claim 12, wherein in response to said local processor performing a store operation to said second cache line, said local cache updates its cache copy of said second cache line, and also performs a write-through operation of said store operation to said remote memory.

14. The memory module of claim 12, wherein in response to an acquire operation from said local processor, said local cache invalidates cache lines that are in said remote state.

15. The memory module of claim 12, wherein in response to said local processor performing a write operation an address of said remote memory that is currently not in said local cache to said second cache line, said local cache allocates a third cache line to store data associated with said write operation and tracks a location of said third cache line within said local cache in a hardware data structure.

16. The memory module of claim 15, wherein said local cache flushes contents of cache lines in said local cache corresponding to tracked addresses in said hardware data structure in response to one of an acquire operation and a release operation.

17. The memory module of claim 12, wherein said local processor comprises a processor-in-memory, and said local memory comprises a plurality of vertically stacked high bandwidth memory chips.

18. The memory module of claim 12, wherein said local cache maintains a cache state of each cache line using a plurality of bits, wherein said plurality of bits comprises a valid bit and a remote bit, wherein said local cache encodes said local state when said valid bit is true and said remote bit is false, and encodes said remote state when said valid bit is false and said remote bit is true.

19. The memory module of claim 18, wherein in response to an acquire operation from said local processor, said local cache clears all remote bits while leaving any other bits of said plurality of bits unchanged.

20. The memory module of claim 12, wherein in response to said local processor performing a load operation on an address
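
To see how the two-bit encoding of claims 7 and 18 interacts with the acquire operation of claims 8 and 19, a minimal software model may help. It is a sketch under stated assumptions, not the claimed hardware: `Line`, `LocalCache`, and `local_range` are hypothetical names, each line holds one word keyed by address, a single dict stands in for all memories, and any bit combination other than the two claimed encodings is treated as "invalid". The tracked write allocation of claims 4, 5, 15, and 16 is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Line:
    valid: bool = False   # valid bit
    remote: bool = False  # remote bit
    value: int = 0

    def state(self) -> str:
        if self.valid and not self.remote:
            return "local"    # claims 7/18: valid true, remote false
        if self.remote and not self.valid:
            return "remote"   # claims 7/18: valid false, remote true
        return "invalid"      # assumption: every other combination is a miss


@dataclass
class LocalCache:
    local_range: range                         # hypothetical: addresses owned locally
    lines: dict = field(default_factory=dict)  # addr -> Line

    def _hit(self, addr):
        line = self.lines.get(addr)
        return line if line is not None and line.state() != "invalid" else None

    def load(self, addr, memory):
        line = self._hit(addr)
        if line is None:
            # Miss: allocate in local or remote state depending on the owner.
            is_local = addr in self.local_range
            line = Line(valid=is_local, remote=not is_local, value=memory.get(addr, 0))
            self.lines[addr] = line
        return line.value

    def store(self, addr, value, memory):
        if addr in self.local_range:
            # Local store: allocate on a miss or update on a hit, in local state.
            self.lines[addr] = Line(valid=True, remote=False, value=value)
            return
        memory[addr] = value          # remote store: always write through
        line = self._hit(addr)
        if line is not None:
            line.value = value        # claims 2/13: also update the cached copy
        # Remote store miss: no line is allocated (claims 1/12).

    def acquire(self):
        # Claims 8/19: clear every remote bit and leave all other bits unchanged.
        for line in self.lines.values():
            line.remote = False
```

Because the remote state is encoded with the valid bit false, clearing only the remote bits turns every remote line into an invalid one while local lines are untouched, so the next load of a remote address after `acquire()` refetches from the owning memory.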

Assignees

Advanced Micro Devices Inc

Inventors

Classifications

  • Plural cache memories · CPC title

  • Coherency control relating to peripheral accessing, e.g. from DMA or I/O device · CPC title

  • with multilevel cache hierarchies · CPC title

  • Details of cache specific to multiprocessor cache arrangements · CPC title

  • Cache consistency protocols · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10042762B2 cover?
A data processing system includes a plurality of processors, local memories associated with a corresponding processor, and at least one inter-processor link. In response to a first processor performing a load or store operation on an address of a corresponding local memory that is not currently in the local cache, a local cache allocates a first cache line and encodes a local state with the fir…
Who is the assignee on this patent?
Advanced Micro Devices Inc
What technology area does this patent fall under?
Primary CPC classification G06F12/0811. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 7, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).