Techniques for handling cache coherency traffic for contended semaphores

US11216378B2 · US · B2

Patent metadata
Publication number: US-11216378-B2
Application number: US-201615268798-A
Country: US
Kind code: B2
Filing date: Sep 19, 2016
Priority date: Sep 19, 2016
Publication date: Jan 4, 2022
Grant date: Jan 4, 2022

Abstract

Official abstract text for this publication.

The techniques described herein improve cache traffic performance in the context of contended lock instructions. More specifically, each core maintains a lock address contention table that stores addresses corresponding to contended lock instructions. The lock address contention table also includes a state value that indicates progress through a series of states meant to track whether a load by the core in a spin-loop associated with semaphore acquisition has obtained the semaphore in an exclusive state. Upon detecting that a load in a spin-loop has obtained the semaphore in an exclusive state, the core responds to incoming requests for access to the semaphore with negative acknowledgments. This allows the core to maintain the semaphore cache line in an exclusive state, which allows it to acquire the semaphore faster and to avoid transmitting that cache line to other cores unnecessarily.
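
The state tracking described in the abstract can be pictured as a small per-core table keyed by semaphore address. The following Python sketch is only an illustrative software model, not the hardware disclosed in the patent. The class name, method names, state names, and the "ACK"/"NAK" strings are assumptions introduced here, and the model ignores timeouts and simply disregards events that arrive in the wrong order.

```python
from enum import IntEnum


class State(IntEnum):
    TRACKED = 0         # address is in the table, no progress yet
    SPIN_LOAD_SEEN = 1  # a spin-loop load to the semaphore was seen
    LINE_EVICTED = 2    # the semaphore cache line was then evicted
    EXCLUSIVE = 3       # the line was then refilled in an exclusive state


class LockAddressContentionTable:
    """Hypothetical per-core model: contended lock address -> state value."""

    def __init__(self):
        self._states = {}

    def track(self, address):
        # Record an address used by a contended lock instruction.
        self._states.setdefault(address, State.TRACKED)

    def _advance(self, address, expected, new):
        # Move forward only when the entry is in the expected prior state.
        if self._states.get(address) == expected:
            self._states[address] = new

    def on_spin_load(self, address):
        self._advance(address, State.TRACKED, State.SPIN_LOAD_SEEN)

    def on_evict(self, address):
        self._advance(address, State.SPIN_LOAD_SEEN, State.LINE_EVICTED)

    def on_exclusive_fill(self, address):
        self._advance(address, State.LINE_EVICTED, State.EXCLUSIVE)

    def respond_to_request(self, address):
        # Incoming requests for a protected line get a negative acknowledgment.
        if self._states.get(address) == State.EXCLUSIVE:
            return "NAK"
        return "ACK"


table = LockAddressContentionTable()
table.track(0x1000)
table.on_spin_load(0x1000)
table.on_evict(0x1000)
print(table.respond_to_request(0x1000))  # ACK: line not yet held exclusively
table.on_exclusive_fill(0x1000)
print(table.respond_to_request(0x1000))  # NAK: the core keeps the line
```

In this model, a request for any address that is not in the table, or whose entry has not reached the final state, is answered normally.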

First claim

Opening claim text (preview).

What is claimed is:

1. A method for handling cache coherency traffic for a contended semaphore, the method comprising: a first detection, comprising detecting a non-lock load to an address associated with the contended semaphore, wherein the address associated with the contended semaphore is stored in a lock address contention table; responsive to the first detection, associating a first state value with the address in the lock address contention table; a second detection, comprising detecting that a cache line associated with the contended semaphore is evicted; responsive to the second detection, associating a second state value with the address in the lock address contention table; a third detection, comprising detecting a fill of the cache line in an exclusive state; responsive to the third detection, associating a third state value with the address in the lock address contention table; and responsive to the first detection, the second detection, and the third detection, entering a semaphore cache line protection mode in which requests for access to the cache line associated with the contended semaphore are responded to with negative acknowledgments that prevent access to the cache line associated with the contended semaphore.

2. The method of claim 1, further comprising: entering the semaphore cache line protection mode occurs responsive to the first detection, the second detection, and the third detection occurring in order and without an out-of-sequence event occurring therebetween.

3. The method of claim 2, wherein the out-of-sequence event comprises one or more of: any non-lock load to the address associated with the contended semaphore after the first detection but before the second detection or after the second detection but before the third detection, or any fill of the cache line in an exclusive state after the first detection but before the second detection.

4. The method of claim 1, wherein entering the semaphore cache line protection mode comprises entering the semaphore cache line protection mode for a first number of cycles, and, after the first number of cycles, leaving the semaphore cache line protection mode.

5. The method of claim 1, wherein: the non-lock load is included within a spin-loop of a semaphore acquisition sequence.

6. The method of claim 1, wherein: the cache line associated with the contended semaphore is evicted responsive to a core that owns the contended semaphore requesting to write a value to the contended semaphore indicating that the contended semaphore is available.

7. The method of claim 1, wherein the fill of the cache line in the exclusive state occurs responsive to a lock instruction for acquiring the semaphore.

8. A processing unit comprising: a processing core including a load/store unit; and a cache, wherein the load/store unit is configured to handle cache coherency traffic for a contended semaphore by: performing a first detection, comprising detecting a non-lock load to an address associated with the contended semaphore, wherein the address associated with the contended semaphore is stored in a lock address contention table; responsive to the first detection, associating a first state value with the address in the lock address contention table; performing a second detection, comprising detecting that a cache line associated with the contended semaphore is evicted; responsive to the second detection, associating a second state value with the address in the lock address contention table; performing a third detection, comprising detecting a fill of the cache line in an exclusive state; responsive to the third detection, associating a third state value with the address in the lock address contention table; and responsive to the first detection, the second detection, and the third detection, entering a semaphore cache line protection mode in which requests for access to the cache line associated with the contended semaphore are responded to with negative acknowledgments that prevent access to the cache line associated with the contended semaphore.

9. The processing unit of claim 8, wherein the load/store unit is configured to: enter the semaphore cache line protection mode responsive to the first detection, the second detection, and the third detection occurring in order and without an out-of-sequence event occurring therebetween.

10. The processing unit of claim 8, wherein the out-of-sequence event comprises one or more of: any non-lock load to the address associated with the contended semaphore after the first detection but before the second detection or after the second detection but before the third detection, or any fill of the cache line in an exclusive state after the first detection but before the second detection.

11. The processing unit of claim 8, wherein the load/store unit is configured to: leave the semaphore cache line protection mode after a first number of cycles has elapsed subsequent to entering the semaphore cache line protection mode.

12. The processing unit of claim 8, wherein: the non-lock load is included within a spin-loop of a semaphore acquisition sequence.

13. The processing unit of claim 8, wherein: the cache line associated with the contended semaphore is evicted responsive to a core that owns the contended semaphore requesting to write a value to the contended semaphore indicating that the contended semaphore is available.

14. The processing unit of claim 8, wherein the fill of the cache line in the exclusive state occurs responsive to a lock instruction for acquiring the semaphore.

15. A processor, comprising: a plurality of processing cores coupled together, each processing core including a load/store unit; and a plurality of caches, each cache associated with a respective processing core of the plurality of processing cores, wherein the load/store unit of each processing core of the plurality of processing cores is configured to handle cache coherency traffic for a contended semaphore by: performing a first detection, comprising detecting a non-lock load to an address associated with the contended semaphore, wherein the address associated with the contended semaphore is stored in a lock address contention table; responsive to the first detection, associating a first state value with the address in the lock address contention table; performing a second detection, comprising detecting that a cache line associated with the contended semaphore is evicted; responsive to the second detection, associating a second state value with the address in the lock address contention table; performing a third detection, comprising detecting a fill of the cache line in an exclusive state; responsive to the third detection, associating a third state value with the address in the lock address contention table; and responsive to the first detection, the second detection, and the third detection, entering a semaphore cache line protection mode in which requests for access to the cache line associated with the contended semaphore are responded to with negative acknowledgments that prevent access to the cache line associated with the contended semaphore.
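
The dependent claims add two refinements to the basic sequence: the three detections must occur in order with no out-of-sequence event in between, and the protection mode lasts only for a limited number of cycles. A minimal Python sketch of both ideas follows. It is a software illustration under stated assumptions, not the claimed hardware: the event names, the function and class names, the choice to start over after an out-of-sequence event, and the `protect_cycles` parameter are all hypothetical, since the claims do not give a cycle count or say how the state value changes after an out-of-sequence event.

```python
from enum import Enum, auto


class Event(Enum):
    NON_LOCK_LOAD = auto()   # plain load to the semaphore address
    EVICT = auto()           # semaphore cache line evicted from this core
    EXCLUSIVE_FILL = auto()  # cache line filled in an exclusive state


def detections_in_order(events):
    """Return True once load, eviction, and exclusive fill are seen in order
    with none of the listed out-of-sequence events in between."""
    progress = 0  # number of detections made so far
    for event in events:
        if progress == 0:
            if event is Event.NON_LOCK_LOAD:
                progress = 1
        elif progress == 1:
            if event is Event.EVICT:
                progress = 2
            else:
                # A further non-lock load or an exclusive fill before the
                # eviction is out of sequence. Assumption: start over.
                progress = 0
        elif progress == 2:
            if event is Event.EXCLUSIVE_FILL:
                return True
            if event is Event.NON_LOCK_LOAD:
                # A non-lock load before the fill is out of sequence.
                progress = 0
    return False


class ProtectionWindow:
    """Answers requests with NAK for a fixed number of cycles, then ACK."""

    def __init__(self, protect_cycles):
        self.remaining = protect_cycles  # hypothetical cycle budget

    def tick(self):
        # Called once per cycle while in protection mode.
        if self.remaining > 0:
            self.remaining -= 1

    def respond(self):
        return "NAK" if self.remaining > 0 else "ACK"


in_order = [Event.NON_LOCK_LOAD, Event.EVICT, Event.EXCLUSIVE_FILL]
extra_load = [Event.NON_LOCK_LOAD, Event.NON_LOCK_LOAD,
              Event.EVICT, Event.EXCLUSIVE_FILL]
print(detections_in_order(in_order))    # True
print(detections_in_order(extra_load))  # False
```

Bounding the protection window means a core that has entered protection mode cannot refuse requests from other cores indefinitely.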

Assignees

Advanced Micro Devices Inc

Inventors

Classifications

  • Barrier synchronisation

  • Multiple simultaneous or quasi-simultaneous cache accessing

  • Correctness of operation, e.g. memory ordering

  • Cache access modes

  • Performance improvement

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11216378B2 cover?
The techniques described herein improve cache traffic performance in the context of contended lock instructions. More specifically, each core maintains a lock address contention table that stores addresses corresponding to contended lock instructions. The lock address contention table also includes a state value that indicates progress through a series of states meant to track whether a load by…
Who is the assignee on this patent?
Advanced Micro Devices Inc
What technology area does this patent fall under?
Primary CPC classification G06F12/0844. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 4, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).