US9223578B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9223578-B2 |
| Application number | US-88708110-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 21, 2010 |
| Priority date | Sep 25, 2009 |
| Publication date | Dec 29, 2015 |
| Grant date | Dec 29, 2015 |
Official abstract text for this publication.
One embodiment of the present invention sets forth a technique for coalescing memory barrier operations across multiple parallel threads. Memory barrier requests from a given parallel thread processing unit are coalesced to reduce the impact to the rest of the system. Additionally, memory barrier requests may specify a level of a set of threads with respect to which the memory transactions are committed. For example, a first type of memory barrier instruction may commit the memory transactions to a level of a set of cooperating threads that share an L1 (level one) cache. A second type of memory barrier instruction may commit the memory transactions to a level of a set of threads sharing a global memory. Finally, a third type of memory barrier instruction may commit the memory transactions to a system level of all threads sharing all system memories. The latency required to execute the memory barrier instruction varies based on the type of memory barrier instruction.
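The three barrier levels in the abstract form an ordered hierarchy, and a coalesced barrier has to satisfy every request it absorbs. The minimal Python sketch below models that under one assumption: the combined barrier takes the widest requested level, which is how the promotion of claim 2 is read here. The names `Scope` and `coalesce` are hypothetical and do not come from the patent.

```python
from enum import IntEnum


class Scope(IntEnum):
    """Hypothetical names for the three barrier levels in the abstract."""
    CTA = 1      # cooperating threads that share an L1 cache
    GLOBAL = 2   # threads sharing a global memory
    SYSTEM = 3   # all threads sharing all system memories


def coalesce(requests):
    """Combine pending barrier requests into one barrier.

    Assumption: the coalesced barrier is issued at the widest level
    requested, so every absorbed request is satisfied by it.
    """
    if not requests:
        raise ValueError("no barrier requests to coalesce")
    return max(requests)


# A global barrier from one thread and a system barrier from another
# collapse into a single system-level barrier.
combined = coalesce([Scope.GLOBAL, Scope.SYSTEM])
```

Using an ordered enum keeps the promotion rule to a single `max` call; the cost, as the abstract notes, is that wider levels take longer to complete, so a thread that asked only for a global barrier may wait longer once its request is promoted.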
Opening claim text (preview).
What is claimed is:
1. A method for coalescing memory barrier instructions across multiple parallel execution threads, comprising: receiving a global memory barrier instruction for a first one of the multiple parallel execution threads; blocking the execution of memory transactions for the first thread that are after the global memory barrier instruction in program order; receiving a system memory barrier instruction for a second one of the multiple parallel execution threads, wherein the second thread is executed independently from the first thread; blocking the execution of memory transactions for the second thread that are after the system memory barrier instruction in program order; combining the global memory barrier instruction with the system memory barrier instruction to produce a coalesced memory barrier instruction; determining that all memory transactions for the first thread and the second thread that occur prior to the coalesced memory barrier instruction are committed to memory; and releasing the coalesced memory barrier to allow execution of the memory transactions for the first thread that are after the global memory barrier instruction in program order and for the second thread that are after the system memory barrier instruction in program order.
2. The method of claim 1, further comprising promoting the coalesced memory barrier instruction from a coalesced global memory barrier instruction to a coalesced system memory barrier instruction.
3. The method of claim 1, wherein the step of determining comprises: issuing the coalesced memory barrier instruction to a memory management unit; and waiting for a memory barrier acknowledgement signal from the memory management unit to determine that all memory transactions for the first thread and the second thread that occur prior to the coalesced memory barrier instruction are committed to memory.
4. The method of claim 3, wherein the memory management unit is further configured to translate memory transactions for accessing the memory and the memory includes a global memory portion and a system memory portion.
5. The method of claim 4, further comprising, prior to issuing the coalesced memory barrier instruction, determining that any pending memory transactions that access the global memory portion and the system memory portion have been acknowledged.
6. The method of claim 1, further comprising the steps of: determining that the global memory barrier instruction is a triggering memory barrier instruction; and opening a coalescing window to generate the coalesced memory barrier instruction.
7. The method of claim 6, further comprising, between the combining and the releasing, the steps of: closing the coalescing window; and deferring any subsequent memory barrier instructions that are received until after the coalesced memory barrier is released.
8. The method of claim 1, further comprising executing a third one of the multiple parallel execution threads that has not reached a memory barrier instruction before the coalesced memory barrier is released.
9. The method of claim 1, wherein the global memory barrier instruction and the system memory barrier instruction are external memory barrier instruction requests that are configured to enforce ordering between dependent groups of threads that include the first thread and the second thread.
10. A computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to coalesce memory barrier instructions across multiple parallel execution threads, by performing the steps of: receiving a global memory barrier instruction for a first one of the multiple parallel execution threads; blocking the execution of memory transactions for the first thread that are after the global memory barrier instruction in program order; receiving a system memory barrier instruction for a second one of the multiple parallel execution threads, wherein the second thread is executed independently from the first thread; blocking the execution of memory transactions for the second thread that are after the system memory barrier instruction in program order; combining the global memory barrier instruction with the system memory barrier instruction to produce a coalesced memory barrier instruction; determining that all memory transactions for the first thread and the second thread that occur prior to the coalesced memory barrier instruction are committed to memory; and releasing the coalesced memory barrier to allow execution of the memory transactions for the first thread that are after the global memory barrier instruction in program order and for the second thread that are after the system memory barrier instruction in program order.
11. A system for coalescing memory barrier instructions across multiple parallel execution threads, the system comprising: a parallel thread processor including a memory barrier instruction execution unit that is configured to: receive a global memory barrier instruction for a first one of the multiple parallel execution threads; block the execution of memory transactions for the first thread that are after the global memory barrier instruction in program order; receive a system memory barrier instruction for a second one of the multiple parallel execution threads, wherein the second thread is executed independently from the first thread; block the execution of memory transactions for the second thread that are after the system memory barrier instruction in program order; combine the global memory barrier instruction with the system memory barrier instruction to produce a coalesced memory barrier instruction; determine that all memory transactions for the first thread and the second thread that occur prior to the coalesced memory barrier instruction are committed to memory; and release the coalesced memory barrier to allow execution of the memory transactions for the first thread that are after the global memory barrier instruction in program order and for the second thread that are after the system memory barrier instruction in program order.
12. The system of claim 11, wherein the memory barrier instruction execution unit is further configured to promote the coalesced memory barrier instruction from a coalesced global memory barrier instruction to a coalesced system memory barrier instruction.
13. The system of claim 11, wherein the system comprises a memory management unit that is coupled to the memory barrier instruction unit, and the memory barrier instruction execution unit is further configured to issue the coalesced memory barrier instruction to the memory management unit and wait for a memory barrier acknowledgement signal from the memory management unit.
14. The system of claim 13, wherein the memory management unit is further configured to translate memory transactions for accessing the memory and the memory includes a global memory portion and a system memory portion.
15. The system of claim 14, further comprising, prior to issuing the coalesced memory barrier instruction, determining that any pending memory transactions that access the global memory portion and the system memory portion have been acknowledged.
16. The system of claim 11, wherein the memory barrier instruction execution unit is further configured to: determine that the global memory barrier instruction is a triggering memory barrier instruction; and open a coalescing window to generate the coalesced memory barrier instruction.
17. The system of claim 16, wherein the memory barrier instruction execution
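The flow recited in the method claims (a triggering barrier opens a coalescing window, later barriers join it, the window closes when the coalesced barrier is issued, and late arrivals are deferred until release) can be sketched as a small state machine. This is an illustrative Python model only, not the patented hardware: the class `BarrierUnit`, its method names, and the string level labels are all hypothetical, and treating deferred barriers as still blocking their threads is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

# Hypothetical ordering of the two levels named in claim 1.
LEVELS = {"global": 0, "system": 1}


@dataclass
class BarrierUnit:
    """Toy model of a per-processor memory barrier execution unit."""
    window_open: bool = False
    issued: bool = False
    level: Optional[str] = None
    blocked: Set[int] = field(default_factory=set)
    deferred: List[Tuple[int, str]] = field(default_factory=list)

    def receive(self, thread_id: int, level: str) -> None:
        """Accept a barrier request from one thread."""
        if self.issued:
            # Window already closed: defer until release (claim 7).
            self.deferred.append((thread_id, level))
            return
        if not self.window_open:
            # Triggering barrier opens the coalescing window (claim 6).
            self.window_open = True
            self.level = level
        elif LEVELS[level] > LEVELS[self.level]:
            # Promote global -> system (claim 2).
            self.level = level
        self.blocked.add(thread_id)

    def is_blocked(self, thread_id: int) -> bool:
        """Threads with no barrier pending keep running (claim 8)."""
        return thread_id in self.blocked or any(
            tid == thread_id for tid, _ in self.deferred)

    def issue(self) -> Optional[str]:
        """Close the window and hand the coalesced barrier to the MMU."""
        self.window_open = False
        self.issued = True
        return self.level

    def acknowledge(self) -> List[int]:
        """MMU acknowledgement: release blocked threads, replay deferred."""
        released = sorted(self.blocked)
        self.blocked.clear()
        self.level = None
        self.issued = False
        pending, self.deferred = self.deferred, []
        for tid, lvl in pending:
            self.receive(tid, lvl)
        return released


unit = BarrierUnit()
unit.receive(0, "global")    # first thread, opens the window
unit.receive(1, "system")    # second thread, promotes the barrier
coalesced_level = unit.issue()
unit.receive(3, "global")    # arrives after the window closed: deferred
released = unit.acknowledge()
```

In this model one acknowledgement from the memory management unit releases every thread that joined the window, which is the saving the coalescing is meant to deliver; the deferred request from thread 3 then opens the next window on its own.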
to perform operations on memory · CPC title
Synchronisation or serialisation instructions · CPC title
Maintaining memory consistency · CPC title
from multiple instruction streams, e.g. multistreaming · CPC title
controlled by a single instruction for multiple threads [SIMT] in parallel · CPC title