Caching data in a memory system having memory nodes at different hierarchical levels
US-2015378913-A1 · Dec 31, 2015 · US
US11768771B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11768771-B2 |
| Application number | US-202117547148-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 9, 2021 |
| Priority date | Sep 19, 2016 |
| Publication date | Sep 26, 2023 |
| Grant date | Sep 26, 2023 |
Official abstract text for this publication.
The techniques described herein improve cache traffic performance in the context of contended lock instructions. More specifically, each core maintains a lock address contention table that stores addresses corresponding to contended lock instructions. The lock address contention table also includes a state value that indicates progress through a series of states meant to track whether a load by the core in a spin-loop associated with semaphore acquisition has obtained the semaphore in an exclusive state. Upon detecting that a load in a spin-loop has obtained the semaphore in an exclusive state, the core responds to incoming requests for access to the semaphore with negative acknowledgments. This allows the core to maintain the semaphore cache line in an exclusive state, which allows it to acquire the semaphore faster and to avoid transmitting that cache line to other cores unnecessarily.
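The state tracking described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the class name `LockAddressContentionTable`, the state and event names, and the rule that any unexpected event resets an entry to its initial state are all hypothetical choices made for illustration. The model follows the order of events given in the claims (a non-lock load in a spin-loop hits the address, the line is evicted, the line is reloaded in an exclusive state) and answers external requests with a negative acknowledgment only once that whole sequence has been seen.

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical state names; the source only speaks of "state values".
    IDLE = auto()            # address is contended, nothing observed yet
    SPIN_LOAD_SEEN = auto()  # a non-lock load in a spin-loop hit the address
    RELEASED = auto()        # the cache line was evicted (semaphore released)
    EXCLUSIVE = auto()       # the line was reloaded in an exclusive state


class Event(Enum):
    # Hypothetical event names for the conditions listed in the claims.
    SPIN_LOAD_HIT = auto()
    LINE_EVICTED = auto()
    LINE_LOADED_EXCLUSIVE = auto()
    LOCK_CMPXCHG_DONE = auto()


# In-order transitions. Any (state, event) pair not listed here is treated
# as out of sequence and resets the entry to IDLE (an assumption).
_NEXT = {
    (State.IDLE, Event.SPIN_LOAD_HIT): State.SPIN_LOAD_SEEN,
    (State.SPIN_LOAD_SEEN, Event.SPIN_LOAD_HIT): State.SPIN_LOAD_SEEN,
    (State.SPIN_LOAD_SEEN, Event.LINE_EVICTED): State.RELEASED,
    (State.RELEASED, Event.LINE_LOADED_EXCLUSIVE): State.EXCLUSIVE,
}


class LockAddressContentionTable:
    """Per-core table mapping contended lock addresses to a state value."""

    def __init__(self):
        self._states = {}

    def mark_contended(self, address):
        # Start tracking an address that has been judged contended.
        self._states.setdefault(address, State.IDLE)

    def observe(self, address, event):
        # Untracked addresses are ignored; tracked ones advance or reset.
        if address not in self._states:
            return
        current = self._states[address]
        self._states[address] = _NEXT.get((current, event), State.IDLE)

    def handle_external_request(self, address):
        # Refuse to give up the line only after the full in-order sequence.
        if self._states.get(address) is State.EXCLUSIVE:
            return "NACK"
        return "ACK"
```

In this model, completing the lock-compare-and-exchange (`LOCK_CMPXCHG_DONE`) is not an expected transition out of `EXCLUSIVE`, so it resets the entry and the core goes back to acknowledging external requests for that address.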
Opening claim text (preview).
What is claimed is:
1. A method, comprising: in response to a load in a spin-loop executing a load instruction for a contended semaphore, the contended semaphore being released, and a cache line storing the contended semaphore being loaded in an exclusive state, setting, for a first state machine, a first state machine value corresponding to an address storing the contended semaphore and preventing external requests for the cache line from being satisfied; and executing a lock-compare-and-exchange instruction.
2. The method of claim 1, further comprising determining that an address for the semaphore is considered contended.
3. The method of claim 2, wherein the address for the semaphore is considered contended due to two or more threads attempting to access the address in a given amount of time.
4. The method of claim 1, further comprising detecting that the load in the spin-loop executes the load instruction for the contended semaphore by detecting that a non-lock load hits an address storing the contended semaphore.
5. The method of claim 1, further comprising detecting that the contended semaphore is released by detecting that a cache line associated with an address storing the contended semaphore has been evicted.
6. The method of claim 1, further comprising setting a second state machine value for the state machine for the address in response to the contended semaphore being released.
7. The method of claim 6, further comprising setting a third state machine value for the state machine for the address in response to detecting that the cache line is loaded in an exclusive state.
8. The method of claim 7, wherein preventing external requests for the cache line from being satisfied occurs in response to no out-of-sequence events occurring for the state machine.
9. A system comprising: a processing core including a load/store unit; and a cache, wherein the load/store unit is configured to handle cache coherency traffic for a contended semaphore by: in response to a load in a spin-loop executing a load instruction for a contended semaphore, the contended semaphore being released, and a cache line storing the contended semaphore being loaded in an exclusive state, setting, for a first state machine, a first state machine value corresponding to an address storing the contended semaphore and preventing external requests for the cache line from being satisfied; and executing a lock-compare-and-exchange instruction.
10. The system of claim 9, wherein the load/store unit is further configured to determine that an address for the semaphore is considered contended.
11. The system of claim 10, wherein the address for the semaphore is considered contended due to two or more threads attempting to access the address in a given amount of time.
12. The system of claim 9, further comprising detecting that the load in the spin-loop executes the load instruction for the contended semaphore by detecting that a non-lock load hits an address storing the contended semaphore.
13. The system of claim 9, further comprising detecting that the contended semaphore is released by detecting that a cache line associated with an address storing the contended semaphore has been evicted.
14. The system of claim 9, wherein the load/store unit is further configured to set a second state machine value for the state machine for the address in response to the contended semaphore being released.
15. The system of claim 14, wherein the load/store unit is further configured to set a third state machine value for the state machine for the address in response to detecting that the cache line is loaded in an exclusive state.
16. The system of claim 15, wherein preventing external requests for the cache line from being satisfied occurs in response to no out-of-sequence events occurring for the state machine.
17. A processor, comprising: a plurality of processing cores coupled together, each processing core including a load/store unit; and a plurality of caches, each cache associated with a respective processing core of the plurality of processing cores, wherein the load/store unit of each processing core of the plurality of processing cores is configured to handle cache coherency traffic for a contended semaphore by: in response to a load in a spin-loop executing a load instruction for a contended semaphore, the contended semaphore being released, and a cache line storing the contended semaphore being loaded in an exclusive state, setting, for a first state machine, a first state machine value corresponding to an address storing the contended semaphore and preventing external requests for the cache line from being satisfied; and executing a lock-compare-and-exchange instruction.
18. The processor of claim 17, wherein each load/store unit is further configured to determine that an address for the semaphore is considered contended.
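Claims 3 and 11 treat an address as contended when two or more threads attempt to access it in a given amount of time. One way to picture that test is a sliding-window counter of distinct threads per address. The sketch below is an illustrative assumption rather than the claimed mechanism: the class `ContentionDetector`, its `window` and `min_threads` parameters, and the use of caller-supplied, non-decreasing timestamps are all hypothetical.

```python
from collections import defaultdict, deque


class ContentionDetector:
    """Flags an address as contended when enough distinct threads
    access it within a sliding time window (hypothetical model)."""

    def __init__(self, window, min_threads=2):
        self.window = window            # length of the time window
        self.min_threads = min_threads  # distinct threads needed to flag
        # address -> deque of (timestamp, thread_id), oldest first
        self._accesses = defaultdict(deque)

    def record_access(self, address, thread_id, now):
        """Record one access and return True if the address is contended.

        Assumes `now` never decreases between calls.
        """
        accesses = self._accesses[address]
        accesses.append((now, thread_id))
        # Drop accesses that have fallen out of the window.
        while accesses and now - accesses[0][0] > self.window:
            accesses.popleft()
        distinct_threads = {tid for _, tid in accesses}
        return len(distinct_threads) >= self.min_threads
```

For example, with `window=10`, an access by thread 0 at time 0 followed by an access by thread 1 at time 5 to the same address would be reported as contended, while repeated accesses by a single thread would not.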
Multiple simultaneous or quasi-simultaneous cache accessing · CPC title
Barrier synchronisation · CPC title
Cache consistency protocols · CPC title
Cache access modes · CPC title
Correctness of operation, e.g. memory ordering · CPC title