US12118355B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12118355-B2 |
| Application number | US-202117506122-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 20, 2021 |
| Priority date | Oct 20, 2021 |
| Publication date | Oct 15, 2024 |
| Grant date | Oct 15, 2024 |
Official abstract text for this publication.
Methods and systems for validating cache coherence in a data processing system are described. A processing element may detect a load instruction requesting the processing element to transfer data from a global memory location to a local memory location. The processing element may apply, in response to detecting the load instruction requesting the processing element to transfer data from the global memory location to the local memory location, a delay to the transfer of the data from the global memory location to the local memory location. The processing element may execute the load instruction and transfer the data from the global memory location to the local memory location with the applied delay. The processing element may validate, in response to executing the load instruction and transferring the data with the applied delay, a cache coherence of the data processing system.
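The flow in the abstract (detect a load, delay its transfer, execute it, then validate) can be illustrated with a small software model. The sketch below is not the patented hardware: `ToyCore`, `run_delayed_load_test`, the cycle loop, and the value comparison used to spot the hazard are all hypothetical simplifications chosen for illustration. It assumes an older load is delayed while a younger load to the same line completes first, and treats the system as coherent only if the younger load is re-executed after another element writes the line inside the delay window.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCore:
    """Hypothetical model of one processing element (not from the patent)."""
    replay_logic_enabled: bool = True
    l1: dict = field(default_factory=dict)        # local memory: address -> value
    replayed: list = field(default_factory=list)  # loads that were re-executed


def run_delayed_load_test(core, l2, addr, delay_cycles, writer_cycle, new_value):
    """Delay an older load so a younger load to the same line completes first.

    Returns True when the model behaves coherently, i.e. every detected
    hazard led to a re-execution of the younger load.
    """
    younger_value = None
    older_value = None
    for cycle in range(delay_cycles + 1):
        if cycle == 0:
            # The younger load is not delayed and reads the line right away.
            younger_value = l2[addr]
        if cycle == writer_cycle:
            # Another processing element writes the target line.
            l2[addr] = new_value
        if cycle == delay_cycles:
            # The delayed older load finally transfers the data to local memory.
            older_value = l2[addr]
            core.l1[addr] = older_value

    # Assumption: a hazard is modelled as the two loads observing different
    # values; real hardware would detect it through its coherence protocol.
    hazard = older_value != younger_value
    if hazard and core.replay_logic_enabled:
        core.replayed.append("younger_load")
        younger_value = l2[addr]  # re-execution picks up the new value

    # Validation step: a hazard without a re-execution is a failure.
    return (not hazard) or ("younger_load" in core.replayed)
```

With a delay of 5 cycles and a write at cycle 2, a core whose replay logic works passes the check, while a core with `replay_logic_enabled=False` fails it. With no delay the write never lands inside the window, so no hazard is exposed, which is the motivation for widening the window in the first place.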
Opening claim text (preview).
What is claimed is:

1. A method for validating cache coherence in a data processing system, the method comprising: detecting a load instruction requesting a processing element to transfer data from a target cache line in a global memory location to a local memory location, the processing element being among a plurality of processing elements of the data processing system; applying, in response to detecting the load instruction requesting the processing element to transfer the data from the target cache line in the global memory location to the local memory location, a delay in the global memory location to delay the transfer of the data from the target cache line in the global memory location to the local memory location, wherein applying the delay to the transfer of the data increases a window of opportunity for another processing element to make changes to the target cache line in the global memory location to create an out-of-order hazard for the data processing system, and wherein the delay is based on a distance between the processing element and other processing elements in the data processing system; executing the load instruction and transferring the data from the target cache line in the global memory location to the local memory location with the applied delay; and validating, in response to executing the load instruction and transferring the data with the applied delay, a cache coherence of the data processing system.

2. The method of claim 1, wherein: the global memory location is a memory location in a level two (L2) cache connected to the plurality of processing elements in the data processing system; and the local memory location is a memory location in a level one (L1) cache of the processing element.

3. The method of claim 1, wherein the load instruction is a first load instruction, and applying the delay causes the execution of the first load instruction to be completed later than an execution of a second load instruction, the second load instruction being younger than the first load instruction, wherein the first load instruction and the second load instruction each loads from the target cache line.

4. The method of claim 3, wherein validating the cache coherence of the data processing system comprises: in response to completing the execution of the first load instruction, detecting whether the second load instruction is being re-executed; in response to the second load instruction being re-executed, determining the cache coherence of the data processing system is successful; and in response to the second load instruction not being re-executed, determining the cache coherence of the data processing system has failed.

5. The method of claim 3, wherein: validating the cache coherence of the data processing system comprises: executing a logic that is programmed to re-execute the second load instruction after the first load instruction is executed with the applied delay; in response to the second load instruction being re-executed after the first load instruction is executed with the applied delay, determining that the logic is functioning correctly; and in response to the second load instruction not being re-executed after the first load instruction is executed with the applied delay, determining that the logic is not functioning correctly.

6. The method of claim 1, wherein the delay is based on a random number.

7. The method of claim 1, further comprising: detecting a data forwarding request to transfer data from a load store unit of the processing element to the local memory location; and executing the data forwarding request and transferring the data from the load store unit of the processing element to the local memory location without the delay.

8. The method of claim 1, wherein the changes to the target cache line comprises writing to the target cache line.

9. The method of claim 1, further comprising controlling one or more linear feedback shift registers to generate the delay.

10. A computing system comprising: a first processing element; a second processing element; an interconnect connected to the first processing element and the second processing element; the first processing element being configured to: detect a load instruction requesting a processing element to transfer data from a target cache line in a global memory location to a local memory location, the global memory location being accessible by the first processing element and the second processing element, and the local memory location being accessible by the first processing element; apply, in response to detecting the load instruction requesting the first processing element to transfer the data from the target cache line in the global memory location to the local memory location, a delay in the global memory location to delay the transfer of the data from the target cache line in the global memory location to the local memory location, wherein application of the delay to the transfer of the data increases a window of opportunity for the second processing element to make changes to the target cache line in the global memory location to create an out-of-order hazard and wherein the delay is based on a distance between the first processing element and other processing elements in the computing system; execute the load instruction and transfer the data from the target cache line in the global memory location to the local memory location with the applied delay; and validate, in response to the execution of the load instruction and transferring the data with the applied delay, a cache coherence of the computing system.

11. The computing system of claim 10, further comprising a level two (L2) cache connected to the first processing element and the second processing element, wherein: the global memory location is a memory location in the L2 cache; and the local memory location is a memory location in a level one (L1) cache of the first processing element.

12. The computing system of claim 10, wherein the load instruction is a first load instruction, and the application of the delay causes the execution of the first load instruction to be completed later than an execution of a second load instruction, the second load instruction being younger than the first load instruction, wherein the first load instruction and the second load instruction each loads from the target cache line.

13. The computing system of claim 12, wherein the first processing element is configured to: in response to completing the execution of the first load instruction, detect whether the second load instruction is being re-executed; in response to the second load instruction being re-executed, determine the cache coherence of the computing system is successful; and in response to the second load instruction not being re-executed, determine the cache coherence of the computing system has failed.

14. The computing system of claim 10, wherein the delay is based on a random number.

15. A processing element comprising: a processor pipeline comprising one or more load store units (LSUs) configured to execute load and store instructions, the one or more LSUs being configured to: detect a load instruction requesting the processing element to transfer data from target cache line in a global memory location to a local memory location, the processing element being among a plurality of processing elements of a data processing system; apply, in response to detecting the load instruction requesting the processing element to transfer the data from the target cache line in the global memory location to the local memory location, a de
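The claims state that the delay may be based on a random number, on a distance between processing elements, and on one or more linear feedback shift registers, and that data forwarded from the load store unit is not delayed. How these inputs are combined is not given in the text shown here, so the following is only a sketch under stated assumptions: `Lfsr16`, `load_delay_cycles`, the 16-bit tap positions, and the formula "random part plus a fixed number of cycles per hop" are all hypothetical choices made for illustration.

```python
class Lfsr16:
    """16-bit Fibonacci LFSR (taps at bits 16, 14, 13, 11).

    The width and tap positions are an assumption for this sketch; the
    claims only say that one or more LFSRs generate the delay.
    """

    def __init__(self, seed=0xACE1):
        if seed & 0xFFFF == 0:
            raise ValueError("seed must be non-zero or the LFSR locks up")
        self.state = seed & 0xFFFF

    def step(self):
        s = self.state
        # XOR of the tapped bits becomes the new most significant bit.
        bit = (s ^ (s >> 2) ^ (s >> 3) ^ (s >> 5)) & 1
        self.state = (s >> 1) | (bit << 15)
        return self.state


def load_delay_cycles(lfsr, distance_hops, is_forwarding=False,
                      max_random=16, cycles_per_hop=4):
    """Return a delay in cycles for one load (hypothetical formula).

    Forwarded data gets no delay; other loads get a pseudo-random part
    plus a part that grows with the distance to other elements.
    """
    if is_forwarding:
        return 0
    random_part = lfsr.step() % max_random
    return random_part + distance_hops * cycles_per_hop
```

Because the LFSR is deterministic for a given seed, a test run that exposes a hazard can be repeated with the same seed to reproduce the same sequence of delays. For example, `load_delay_cycles(Lfsr16(), distance_hops=2)` returns 8 with the default parameters, since the first LFSR state after `0xACE1` is `0x5670` and its low four bits are zero.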
from multiple instruction streams, e.g. multistreaming · CPC title
Maintaining memory consistency · CPC title
Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution · CPC title
Recovery, e.g. branch miss-prediction, exception handling (error detection or correction G06F11/00) · CPC title
with multilevel cache hierarchies · CPC title