US8997103B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-8997103-B2 |
| Application number | US-201213441785-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 6, 2012 |
| Priority date | Sep 25, 2009 |
| Publication date | Mar 31, 2015 |
| Grant date | Mar 31, 2015 |
Official abstract text for this publication.
One embodiment sets forth a technique for N-way memory barrier operation coalescing. When a first memory barrier is received for a first thread group, execution of subsequent memory operations for the first thread group is suspended until the first memory barrier is executed. Subsequent memory barriers for different thread groups may be coalesced with the first memory barrier to produce a coalesced memory barrier that represents memory barrier operations for multiple thread groups. When the coalesced memory barrier is being processed, execution of subsequent memory operations for the different thread groups is also suspended. However, memory operations for other thread groups that are not affected by the coalesced memory barrier may be executed.
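The behavior described in the abstract can be illustrated with a small software model. The sketch below is not the patented hardware design: the class and method names (`BarrierCoalescer`, `barrier`, `memory_op`, `acknowledge`) are hypothetical, only one coalesced barrier is open at a time, and the tagging of transactions and the memory management unit described in the claims are not modeled. It shows only that barriers from several thread groups share one coalescing index, that the affected groups are suspended, and that unaffected groups keep executing.

```python
from dataclasses import dataclass, field


@dataclass
class CoalescedBarrier:
    """One pending barrier that may stand for several thread groups."""
    index: int                                # hypothetical coalescing index
    groups: set = field(default_factory=set)  # thread groups waiting on it


class BarrierCoalescer:
    """Toy model; assumes a single open coalesced barrier at a time."""

    def __init__(self):
        self.open = None      # coalesced barrier currently being processed
        self.next_index = 0
        self.executed = []    # (group, op) pairs that were allowed to run
        self.deferred = []    # (group, op) pairs held back by the barrier

    def barrier(self, group):
        """Receive a barrier for `group`; join the open barrier if any."""
        if self.open is None:
            self.open = CoalescedBarrier(self.next_index)
            self.next_index += 1
        self.open.groups.add(group)
        return self.open.index

    def memory_op(self, group, op):
        """Run `op` unless `group` is suspended behind the open barrier."""
        if self.open is not None and group in self.open.groups:
            self.deferred.append((group, op))
            return False
        self.executed.append((group, op))
        return True

    def acknowledge(self):
        """Model barrier completion: release every coalesced group."""
        released = sorted(self.open.groups)
        self.open = None
        held, self.deferred = self.deferred, []
        for group, op in held:
            self.memory_op(group, op)  # suspended operations now execute
        return released


coalescer = BarrierCoalescer()
coalescer.memory_op("A", "store x")  # before any barrier: executes
first = coalescer.barrier("A")       # group A is now suspended
coalescer.memory_op("A", "load y")   # deferred behind the barrier
second = coalescer.barrier("B")      # coalesced with A's barrier
coalescer.memory_op("B", "store z")  # deferred as well
coalescer.memory_op("C", "load w")   # group C is unaffected: executes
released = coalescer.acknowledge()   # both A and B are released together
```

In this model, `first` and `second` are the same index, which is the sense in which the two barriers are coalesced; a single acknowledgement then releases both groups.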
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method for processing memory barrier instructions, the method comprising:
receiving a first memory barrier instruction for a first thread group that includes multiple parallel execution threads;
blocking the execution of memory transactions for the first thread group that are subsequent to the first memory barrier instruction in program order;
receiving, subsequent to the first memory barrier instruction, a first set of memory transactions and a second memory barrier instruction for at least a second thread group that includes multiple execution threads;
coalescing the first memory barrier instruction and the second memory barrier instruction to generate a coalesced memory barrier instruction;
tagging each transaction in the first set of memory transactions with a first coalescing index associated with the coalesced memory barrier instruction to generate tagged memory commands;
combining the tagged memory commands and the coalesced memory barrier instruction to generate a tagged memory command stream;
transmitting the tagged memory command stream and memory transactions for the first thread group that are prior to the first memory barrier instruction in program order to a memory management unit to process the memory transactions for the first thread group that are prior to the first memory barrier instruction in program order, the first set of memory transactions, the first memory barrier instruction, and the second memory barrier instruction;
determining that the memory transactions for the first thread group that are prior to the first memory barrier instruction in program order and the first set of memory transactions are committed to memory; and
releasing both the first memory barrier instruction to allow the memory transactions for the first thread group that are subsequent to the first memory barrier instruction in program order to be executed and the second memory barrier instruction to allow the memory transactions for the second thread group that are subsequent to the second memory barrier instruction in program order to be executed.

2. The method of claim 1, further comprising, in response to receiving the first memory barrier instruction, outputting a memory barrier accept signal that includes the first coalescing index.

3. The method of claim 2, further comprising:
receiving a third memory barrier instruction for a third thread group that includes multiple parallel execution threads;
blocking the execution of memory transactions for the third thread group that are subsequent to the third memory barrier instruction in program order;
receiving, subsequent to the third memory barrier instruction, a second set of memory transactions and a fourth memory barrier instruction for at least a fourth thread group that includes multiple execution threads;
coalescing the third memory barrier instruction and the fourth memory barrier instruction to generate a second coalesced memory barrier instruction;
tagging each transaction in the second set of memory transactions with a second coalescing index associated with the second coalesced memory barrier instruction to generate second tagged memory commands;
combining the second tagged memory commands and the second coalesced memory barrier instruction to generate a second tagged memory command stream;
transmitting the second tagged memory command stream and memory transactions for the third thread group that are prior to the third memory barrier instruction in program order to the memory management unit to process the memory transactions for the third thread group that are prior to the third memory barrier instruction in program order, the second set of memory transactions, the third memory barrier instruction, and the fourth memory barrier instruction;
determining that the memory transactions for the third thread group that are prior to the third memory barrier instruction in program order and the second set of memory transactions are committed to memory; and
releasing both the third memory barrier instruction to allow the memory transactions for the third thread group that are subsequent to the third memory barrier instruction in program order to be executed and the fourth memory barrier instruction to allow the memory transactions for the fourth thread group that are subsequent to the fourth memory barrier instruction in program order to be executed.

4. The method of claim 3, further comprising, in response to receiving the third memory barrier instruction, outputting a second memory barrier accept signal that includes the second coalescing index.

5. The method of claim 3, wherein the second tagged memory command stream and memory transactions for the third thread group that are prior to the third memory barrier instruction in program order are transmitted to the memory management unit prior to determining that the memory transactions for the first thread group that are prior to the first memory barrier instruction in program order and the first set of memory transactions are committed to memory.

6. The method of claim 1, wherein the step of determining comprises waiting for a memory barrier acknowledgement signal from the memory management unit that indicates that the memory transactions for the first thread group that are prior to the first memory barrier instruction in program order and the first set of memory transactions are committed to memory.

7. The method of claim 1, further receiving an acknowledgement from the memory management unit once the coalesced memory barrier instruction has been processed prior to releasing the first memory barrier instruction and the second memory barrier instruction.

8. A computing system, comprising:
a memory; and
a parallel processing subsystem coupled to the memory and comprising:
an instruction scheduling unit configured to:
issue for execution a first memory barrier instruction for a first thread group that includes multiple parallel execution threads;
issue for execution, subsequent to the first memory barrier instruction, a first set of memory transactions and a second memory barrier instruction for at least a second thread group that includes multiple execution threads;
block the execution of memory transactions for the first thread group that are subsequent to the first memory barrier instruction in program order;
block the execution of memory transactions for the second thread group that are subsequent to the second memory barrier instruction in program order; and
release both the first memory barrier instruction to allow execution of the memory transactions for the first thread group that are subsequent to the first memory barrier instruction in program order and the second memory barrier instruction to allow execution of the memory transactions for the second thread group that are subsequent to the second memory barrier instruction in program order when an acknowledgement signal is received;
a memory management unit configured to process memory transactions and memory barrier instructions; and
a memory barrier instruction execution unit that is configured to:
receive the first memory barrier instruction;
receive the first set of memory transactions and the second memory barrier instruction;
coalesce the first memory barrier instruction and the second memory barrier instruction to generate a coalesced memory barrier instruction;
tag each transaction in the first set of memory transactions with a first coalescing index associated with the coalesced memory barrier instruction to generate tagged memory commands; and
combine the tagged memory commands and the coalesced memory barrier instruction to generate a tagged memory command stream.

9. The computing syst
controlled by a single instruction for multiple threads [SIMT] in parallel · CPC title
from multiple instruction streams, e.g. multistreaming · CPC title
Maintaining memory consistency · CPC title
to perform operations on memory · CPC title
Barrier synchronisation · CPC title