Direct store to coherence point
US-2018004660-A1 · Jan 4, 2018 · US
US10303603B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10303603-B2 |
| Application number | US-201715621870-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 13, 2017 |
| Priority date | Jun 13, 2017 |
| Publication date | May 28, 2019 |
| Grant date | May 28, 2019 |
Official abstract text for this publication.
A special class of loads and stores accesses a user-defined memory region where coherency and memory ordering are enforced only at the coherence point. Coherent memory requests, which are limited to the user-defined memory region, are dispatched to the common memory ordering buffer. Non-coherent memory requests (e.g., all other memory requests) can be routed via non-coherent lower-level caches to the shared last-level cache. By assigning private, non-overlapping address spaces to each of the processor cores, the lower-level caches do not need to implement the logic necessary to maintain cache coherency. This can reduce power consumption and integrated circuit die area. This can also improve memory bandwidth and performance for applications with predominantly non-coherent memory accesses while still providing memory coherence for the specific memory range(s) or applications that demand it.
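The two store paths described in the abstract can be illustrated with a small software model. This is a minimal sketch, not the patented hardware: the class names (`LastLevelCache`, `SharedMemoryOrderBuffer`, `Core`), the use of a Python `range` for the user-defined region, and the drain and write-back methods are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LastLevelCache:
    """Shared last-level cache: the point where both store paths converge."""
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def write(self, addr, value, source):
        self.data[addr] = value
        self.log.append((source, addr, value))


@dataclass
class SharedMemoryOrderBuffer:
    """Common buffer that orders coherent stores from all cores (hypothetical model)."""
    llc: LastLevelCache
    pending: list = field(default_factory=list)

    def receive(self, core_id, addr, value):
        self.pending.append((core_id, addr, value))

    def drain(self):
        # Forward coherent stores to the last-level cache in arrival order.
        for core_id, addr, value in self.pending:
            self.llc.write(addr, value, f"shared_mob(core{core_id})")
        self.pending.clear()


@dataclass
class Core:
    core_id: int
    coherent_range: range               # assumed user-defined coherent region
    shared_mob: SharedMemoryOrderBuffer
    llc: LastLevelCache
    l1: dict = field(default_factory=dict)  # private, non-coherent cache

    def store(self, addr, value):
        if addr in self.coherent_range:
            # Coherent store: bypass the lower-level cache entirely.
            self.shared_mob.receive(self.core_id, addr, value)
            return "shared_mob"
        # Non-coherent store: stays in the private cache until written back.
        self.l1[addr] = value
        return "l1"

    def writeback(self):
        for addr, value in self.l1.items():
            self.llc.write(addr, value, f"l1(core{self.core_id})")
        self.l1.clear()
```

In this model, two cores storing to the same coherent address are serialized by the single shared buffer, while stores outside the region never leave the issuing core's private cache until write-back, so the private caches need no coherency logic between them.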
Opening claim text (preview).
What is claimed is:
1. An integrated circuit, comprising: a plurality of processor cores that share a common last-level cache, the plurality of processor cores each including a non-coherent memory order buffer, a first processor core being a one of the plurality of processor cores; and, a shared memory order buffer directly coupled to each of the plurality of processor cores such that coherent store transactions sent by the plurality of processor cores are directly received at the shared memory order buffer without being processed by at least one lower-level cache; the common last-level cache to receive store transactions sent by the non-coherent memory order buffers of the plurality of processor cores, the common last-level cache to also receive store transactions, from the shared memory order buffer, that correspond to the coherent store transactions sent by the plurality of processor cores.
2. The integrated circuit of claim 1, wherein the store transactions sent by the non-coherent memory order buffers of the plurality of plurality of processor cores include store transactions that have been processed by the at least one lower-level cache before being sent to the last-level cache.
3. The integrated circuit of claim 1, wherein the coherent store transactions sent by the plurality of processor cores are to be sent directly to the shared memory order buffer based at least in part on addresses targeted by the coherent store transactions being within a configured address range.
4. The integrated circuit of claim 1, wherein the store transactions sent by the non-coherent memory order buffers are to be processed by the at least one lower-level cache before being sent to the last-level cache based at least in part on addresses targeted by the store transactions sent by the non-coherent memory order buffers being within a configured address range.
5. The integrated circuit of claim 1, wherein the coherent store transactions sent by the plurality of processor cores are to be sent directly to the shared memory order buffer based at least in part on addresses targeted by the coherent store transactions being within an address range specified by at least one register that is writable by the first processor core.
6. The integrated circuit of claim 3, wherein the configured address range corresponds to at least one memory page.
7. The integrated circuit of claim 4, wherein the configured address range corresponds to at least one memory page.
8. A method of operating a processing system, comprising: receiving, from a plurality of processor cores, a plurality of non-coherent store transactions at a common last-level cache, a first processor core being one of the plurality of processor cores; receiving, from the plurality of processor cores, a plurality of coherent store transactions directly at a shared memory order buffer directly coupled to each of the plurality of processor cores; issuing, by the first processor core and directly to the shared memory order buffer, at least a first coherent store transaction, the first coherent store transaction to be processed by the shared memory order buffer before being sent to the last-level cache and without being processed by at least one lower-level cache; issuing, by the first processor core, at least a first non-coherent store transaction, the first non-coherent store transaction to be processed by the at least one lower-level cache before being sent to the last-level cache; and, receiving, at the last-level cache, the non-coherent store transaction and data stored by the coherent store transaction.
9. The method of claim 8, wherein the first processor core issues the first coherent store transaction based on an address corresponding to the target of a store instruction being executed by the first processor core falling within a configured address range.
10. The method of claim 9, wherein the configured address range corresponds to at least one memory page.
11. The method of claim 10, wherein a page table entry associated with the at least one memory page includes an indicator that the first processor core is to issue the first coherent store transaction.
12. The method of claim 9, further comprising: receiving, from a register written by a one of the plurality of processors, an indicator that corresponds to at least one limit of the configured address range.
13. The method of claim 8, wherein the first processor core issues the first non-coherent store transaction based on an address corresponding to the target of a store instruction being executed by the first processor core falling within a configured address range.
14. The method of claim 13, wherein the configured address range corresponds to at least one memory page.
15. The method of claim 14, wherein a page table entry associated with the at least one memory page includes an indicator that the first processor core is to issue the first non-coherent store transaction.
16. The method of claim 11, further comprising: receiving, from a register written by a one of the plurality of processors, an indicator that corresponds to at least one limit of the configured address range.
17. A processing system, comprising: a plurality of processing cores each coupled to at least a respective first level cache; a last-level cache, separate from the first level caches, to receive a block of non-coherent store data from the first level caches; a shared memory order buffer, directly coupled to each of the plurality of processing cores and to the last-level cache, to receive a block of coherent store data from a first processing core of the plurality of processing cores without the block of coherent store data being processed by the first level caches.
18. The processing system of claim 17, wherein an address range determines whether the block of coherent store data is to be sent to the shared memory order buffer without being processed by the first level caches.
19. The processing system of claim 17, wherein an indicator in a page table entry determines whether the block of coherent store data is to be sent to the shared memory order buffer without being processed by the first level caches.
20. The processing system of claim 17, wherein an indicator in a page table entry determines whether the block of non-coherent store data is to be sent to the last-level cache without being processed by the shared memory order buffer.
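The dependent claims name two ways a store can be classified as coherent: a configured address range held in a core-writable register (claims 5, 12) and an indicator in a page table entry (claims 11, 15, 19, 20). The sketch below models both selectors in software. The names `RangeRegister`, `PageTableEntry`, and `select_path`, the 4 KiB page size, and the choice to combine the two selectors with a logical OR are assumptions for illustration; the claims present the mechanisms separately and do not fix any of these details.

```python
from dataclasses import dataclass

PAGE_SHIFT = 12  # assumed 4 KiB pages; the claims do not specify a page size


@dataclass
class RangeRegister:
    """Hypothetical core-writable register pair bounding the coherent range."""
    base: int = 0
    limit: int = 0  # exclusive upper bound

    def write(self, base, limit):
        if limit < base:
            raise ValueError("limit must not be below base")
        self.base, self.limit = base, limit

    def contains(self, addr):
        return self.base <= addr < self.limit


@dataclass
class PageTableEntry:
    frame: int
    coherent: bool = False  # hypothetical indicator bit


def select_path(addr, page_table, range_reg):
    """Return 'shared_mob' for coherent stores, else 'lower_level_cache'."""
    pte = page_table.get(addr >> PAGE_SHIFT)
    if pte is not None and pte.coherent:
        return "shared_mob"
    if range_reg.contains(addr):
        return "shared_mob"
    return "lower_level_cache"
```

With a page table that marks virtual page `0x10` as coherent and a range register covering `0x20000` to `0x21000`, `select_path(0x10008, ...)` and `select_path(0x20010, ...)` both select the shared memory order buffer, while an address on an unmarked page outside the range takes the lower-level cache path.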
using a bus scheme, e.g. with bus monitoring or watching means · CPC title
with cache invalidating means (G06F12/0815 takes precedence) · CPC title
Address translation · CPC title
with a shared cache · CPC title
Cache consistency protocols · CPC title