Relaxed coherency between different caches

US8930636B2 · US · B2

Patent metadata
Publication number: US-8930636-B2
Application number: US-201213555048-A
Country: US
Kind code: B2
Filing date: Jul 20, 2012
Priority date: Jul 20, 2012
Publication date: Jan 6, 2015
Grant date: Jan 6, 2015

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

One embodiment sets forth a technique for ensuring relaxed coherency between different caches. Two different execution units may be configured to access different caches that may store one or more cache lines corresponding to the same memory address. During time periods between memory barrier instructions, relaxed coherency is maintained between the different caches. More specifically, writes to a cache line in a first cache that corresponds to a particular memory address are not necessarily propagated to a cache line in a second cache before the second cache receives a read or write request that also corresponds to the particular memory address. Therefore, the first cache and the second cache are not necessarily coherent during time periods of relaxed coherency. Execution of a memory barrier instruction ensures that the different caches will be coherent before a new period of relaxed coherency begins.
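The behavior described in the abstract can be illustrated with a small software model. This is a minimal sketch and not the patented hardware design: the `Cache` class, its method names, the write-through policy, and the dictionary standing in for memory are all assumptions made for illustration. It shows that a sibling cache may return stale data while an invalidate is still pending, and that a memory barrier forces the pending invalidates to execute.

```python
from collections import deque


class Cache:
    """Toy cache model (hypothetical; names are not from the patent)."""

    def __init__(self, memory):
        self.memory = memory      # shared backing store: address -> value
        self.lines = {}           # locally cached lines: address -> value
        self.pending = deque()    # invalidates received but not yet executed
        self.siblings = []        # other caches that may hold the same lines

    def read(self, addr):
        # A hit returns the local copy even if it is stale.
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.memory[addr] = value          # write-through is an assumption here
        for sibling in self.siblings:
            sibling.pending.append(addr)   # invalidate is queued, not executed

    def drain(self):
        # Execute every pending invalidate by dropping the cached line.
        while self.pending:
            self.lines.pop(self.pending.popleft(), None)

    def membar(self):
        # The barrier completes only once siblings have executed the invalidates.
        for sibling in self.siblings:
            sibling.drain()


memory = {0x10: 1}
first, second = Cache(memory), Cache(memory)
first.siblings, second.siblings = [second], [first]

second.read(0x10)            # second cache now holds the old value
first.write(0x10, 2)         # relaxed period: invalidate is only pending
stale = second.read(0x10)    # may still observe the old value
first.membar()               # barrier forces the invalidate to execute
fresh = second.read(0x10)    # refetched from memory after invalidation
```

In this model `stale` holds the value from before the write and `fresh` holds the value written by the first cache, which mirrors the two phases the abstract describes: relaxed coherency before the barrier and coherent caches after it.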

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for maintaining relaxed coherency between a first cache and a second cache, the method comprising: receiving a write request corresponding to a first cache line of the first cache during a first time period when relaxed coherency is maintained between the first cache and the second cache; transmitting to the second cache an invalidate command that is based on the write request to generate a pending invalidate command for execution by the second cache, wherein the pending invalidate command is executed by the second cache at any time during the first time period; receiving a memory barrier instruction configured to ensure that data written by the write request can be read by a read request that is received after the barrier instruction; and determining that the pending invalidate command is executed by the second cache to complete execution of the memory barrier instruction and end the first time period before the second cache accepts either a new read request or a new write request.

2. The method of claim 1, further comprising receiving a sibling cache mask identifying at least the second cache and a third cache as sibling caches of the first cache.

3. The method of claim 2, further comprising transmitting the invalidate command to the third cache to generate a second pending invalidate command for execution by the third cache that is executed by the third cache at any time during the first time period.

4. The method of claim 3, further comprising determining that the second pending invalidate command is executed by the third cache before the third cache accepts either a new read request or a new write request to complete execution of the memory barrier instruction and end the first time period.

5. The method of claim 1, wherein a first memory barrier command corresponding to the memory barrier instruction is generated and output to the second cache following the invalidate command.

6. The method of claim 5, further comprising tracking, by the first cache, a difference between a number of memory barrier instructions received by the first cache including the memory barrier instruction and a number of memory barrier commands that have been executed by the second cache including the first memory barrier command.

7. The method of claim 1, further comprising: determining that the pending invalidate command is configured to invalidate a first cache line of the second cache; determining that a previously pending invalidate command is also configured to invalidate the first cache line of the second cache; and combining the pending invalidate command with the previously pending invalidate command.

8. The method of claim 1, further comprising determining that the write request specifies a location in a global memory space, and wherein the new read request or the new write request that specifies a location in the global memory space is not accepted before the pending invalidate command is executed by the second cache.

9. The method of claim 1, further comprising: determining that the write request specifies a location in a global memory space; receiving, after the memory barrier instruction, an additional read request or an additional write request that specifies a location that is not within the global memory space; and accepting the additional read request or the additional write request before the pending invalidate command is executed by the second cache.

10. The method of claim 1, further comprising invalidating a cache line in the second cache to execute the pending invalidate command and complete execution of the memory barrier instruction.

11. A processing subsystem comprising: a first cache that is configured to: receive a write request corresponding to a first cache line of the first cache during a first time period when relaxed coherency is maintained between the first cache and a second cache; transmit, to the second cache, an invalidate command that is based on the write request to generate a pending invalidate command for execution by the second cache, wherein the pending invalidate command is executed by the second cache at any time during the first time period; receive a memory barrier instruction configured to ensure that data written by the write request can be read by a read request that is received after the barrier instruction; and determine that the pending invalidate command is executed by the second cache to complete execution of the memory barrier instruction and end the first time period before the second cache accepts either a new read request or a new write request; and the second cache that is configured to execute the pending invalidate command at any time during the first time period.

12. The processing subsystem of claim 11, further comprising a third cache, wherein the first cache is further configured to receive a sibling cache mask identifying at least the second cache and the third cache as sibling caches of the first cache.

13. The processing subsystem of claim 12, wherein the first cache is further configured to transmit the invalidate command to the third cache to generate a second pending invalidate command for execution by the third cache that is executed by the third cache at any time during the first time period.

14. The processing subsystem of claim 13, wherein the first cache is further configured to determine that the second pending invalidate command is executed by the third cache before the third cache accepts either a new read request or a new write request to complete execution of the memory barrier instruction and end the first time period.

15. The processing subsystem of claim 11, wherein the first cache is further configured to: determine that the pending invalidate command is configured to invalidate a first cache line of the second cache; determine that a previously pending invalidate command is also configured to invalidate the first cache line of the second cache; and combining the pending invalidate command with the previously pending invalidate command.

16. The processing subsystem of claim 11, wherein the first cache is further configured to determine that the write request specifies a location in a global memory space, and wherein the new read request or the new write request that specifies a location in the global memory space is not accepted before the pending invalidate command is executed by the second cache.

17. The processing subsystem of claim 11, wherein the first cache is further configured to: determine that the write request specifies a location in a global memory space; receive, after the memory barrier instruction, an additional read request or an additional write request that specifies a location that is not within the global memory space; and accept the additional read request or the additional write request before the pending invalidate command is executed by the second cache.

18. The processing subsystem of claim 11, wherein the first cache is further configured to invalidate a cache line in the second cache to execute the pending invalidate command and complete execution of the memory barrier instruction.

19. The processing subsystem of claim 11, further comprising an invalidation unit that is coupled to the first cache and the second cache and is configured to track a difference between pending memory barrier instructions that have been received by the first cache and not completed execution by the second cache.

20. A computing system, comprising: a parallel
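Several dependent claims name concrete mechanisms: a sibling cache mask (claim 2), a barrier command queued behind the invalidates (claim 5), a count of barriers received versus barrier commands executed (claim 6), and combining of invalidates that target the same line (claim 7). The following is a rough software sketch of how these could fit together. All class names, method names, and the queue-based command model are hypothetical and chosen for illustration; the claims do not prescribe this structure.

```python
class SiblingCache:
    """Receives invalidate and barrier commands and executes them lazily."""

    def __init__(self, valid_lines):
        self.valid_lines = set(valid_lines)
        self.queue = []               # pending commands, oldest first
        self.barriers_executed = 0

    def post_invalidate(self, line):
        # Claim 7 (sketch): combine with an identical invalidate still pending.
        if ("inv", line) not in self.queue:
            self.queue.append(("inv", line))

    def post_barrier(self):
        # Claim 5 (sketch): the barrier command follows earlier invalidates.
        self.queue.append(("membar", None))

    def step(self):
        """Execute the oldest pending command, if any."""
        if not self.queue:
            return
        kind, line = self.queue.pop(0)
        if kind == "inv":
            self.valid_lines.discard(line)
        else:
            self.barriers_executed += 1


class FirstCache:
    def __init__(self, all_caches, sibling_mask):
        # Claim 2 (sketch): bit i of the mask marks all_caches[i] as a sibling.
        self.siblings = [c for i, c in enumerate(all_caches)
                         if (sibling_mask >> i) & 1]
        self.barriers_received = 0

    def write(self, line):
        for sibling in self.siblings:
            sibling.post_invalidate(line)

    def membar(self):
        self.barriers_received += 1
        for sibling in self.siblings:
            sibling.post_barrier()

    def outstanding(self, sibling):
        # Claim 6 (sketch): barriers received minus barrier commands executed.
        return self.barriers_received - sibling.barriers_executed


second, third, other = SiblingCache({7}), SiblingCache({7}), SiblingCache({7})
first = FirstCache([second, third, other], sibling_mask=0b011)

first.write(7)
first.write(7)                        # combined with the pending invalidate
first.membar()
queued = len(second.queue)            # one invalidate plus one barrier command
before = first.outstanding(second)

while first.outstanding(second):      # barrier completes when the count is zero
    second.step()
after = first.outstanding(second)
```

Because the barrier command sits behind the invalidates in the same ordered queue, a sibling that has executed the barrier command has necessarily executed every earlier invalidate, so a single counter per sibling is enough to decide when the barrier is complete. The cache excluded by the mask (`other`) receives no commands and keeps its line.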

Classifications

  • Cache consistency protocols · CPC title

  • with software control, e.g. non-cacheable data · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US8930636B2 cover?
One embodiment sets forth a technique for ensuring relaxed coherency between different caches. Two different execution units may be configured to access different caches that may store one or more cache lines corresponding to the same memory address. During time periods between memory barrier instructions, relaxed coherency is maintained between the different caches. More specifically, writes to…
Who is the assignee on this patent?
Joel James McCormack, Rajesh Kota, Olivier Giroux, and 2 more
What technology area does this patent fall under?
Primary CPC classification G06F12/0837. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 6, 2015 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).