Coalescing memory barrier operations across multiple parallel threads

US9223578B2 · US · B2

Patent metadata
Publication number: US-9223578-B2
Application number: US-88708110-A
Country: US
Kind code: B2
Filing date: Sep 21, 2010
Priority date: Sep 25, 2009
Publication date: Dec 29, 2015
Grant date: Dec 29, 2015

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

One embodiment of the present invention sets forth a technique for coalescing memory barrier operations across multiple parallel threads. Memory barrier requests from a given parallel thread processing unit are coalesced to reduce the impact to the rest of the system. Additionally, memory barrier requests may specify a level of a set of threads with respect to which the memory transactions are committed. For example, a first type of memory barrier instruction may commit the memory transactions to a level of a set of cooperating threads that share an L1 (level one) cache. A second type of memory barrier instruction may commit the memory transactions to a level of a set of threads sharing a global memory. Finally, a third type of memory barrier instruction may commit the memory transactions to a system level of all threads sharing all system memories. The latency required to execute the memory barrier instruction varies based on the type of memory barrier instruction.
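The three barrier types in the abstract form an ordered set of scopes: a barrier committed at a wider scope also satisfies any narrower request. The Python sketch below models only that ordering. The names `BarrierLevel`, `levels_to_commit`, and `satisfies`, the label `CTA` for threads sharing an L1 cache, and the idea that a wider barrier waits on more memory levels are illustrative assumptions, not terms taken from the patent. For comparison, CUDA exposes fences with similar scopes: `__threadfence_block()`, `__threadfence()`, and `__threadfence_system()`.

```python
from enum import IntEnum


class BarrierLevel(IntEnum):
    """Barrier scopes from the abstract, ordered from narrowest to widest."""
    CTA = 0      # cooperating threads that share an L1 cache
    GLOBAL = 1   # threads that share global memory
    SYSTEM = 2   # all threads that share all system memories


# Assumption for illustration only: the memory levels whose writes must be
# committed before a barrier of each scope may release. Wider scopes wait
# on more levels, one plausible reason their latency is higher.
_COMMIT_LEVELS = ("l1", "global", "system")


def levels_to_commit(level: BarrierLevel) -> tuple:
    """Return the (hypothetical) memory levels a barrier of this scope waits on."""
    return _COMMIT_LEVELS[: level + 1]


def satisfies(issued: BarrierLevel, requested: BarrierLevel) -> bool:
    """A barrier committed at a wider scope also meets a narrower request."""
    return issued >= requested
```

For example, `levels_to_commit(BarrierLevel.GLOBAL)` returns `("l1", "global")`, and a `SYSTEM` barrier satisfies a `CTA` request but not the other way around.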

First claim

Opening claim text (preview).

What is claimed is:
1. A method for coalescing memory barrier instructions across multiple parallel execution threads, comprising: receiving a global memory barrier instruction for a first one of the multiple parallel execution threads; blocking the execution of memory transactions for the first thread that are after the global memory barrier instruction in program order; receiving a system memory barrier instruction for a second one of the multiple parallel execution threads, wherein the second thread is executed independently from the first thread; blocking the execution of memory transactions for the second thread that are after the system memory barrier instruction in program order; combining the global memory barrier instruction with the system memory barrier instruction to produce a coalesced memory barrier instruction; determining that all memory transactions for the first thread and the second thread that occur prior to the coalesced memory barrier instruction are committed to memory; and releasing the coalesced memory barrier to allow execution of the memory transactions for the first thread that are after the global memory barrier instruction in program order and for the second thread that are after the system memory barrier instruction in program order.
2. The method of claim 1, further comprising promoting the coalesced memory barrier instruction from a coalesced global memory barrier instruction to a coalesced system memory barrier instruction.
3. The method of claim 1, wherein the step of determining comprises: issuing the coalesced memory barrier instruction to a memory management unit; and waiting for a memory barrier acknowledgement signal from the memory management unit to determine that all memory transactions for the first thread and the second thread that occur prior to the coalesced memory barrier instruction are committed to memory.
4. The method of claim 3, wherein the memory management unit is further configured to translate memory transactions for accessing the memory and the memory includes a global memory portion and a system memory portion.
5. The method of claim 4, further comprising, prior to issuing the coalesced memory barrier instruction, determining that any pending memory transactions that access the global memory portion and the system memory portion have been acknowledged.
6. The method of claim 1, further comprising the steps of: determining that the global memory barrier instruction is a triggering memory barrier instruction; and opening a coalescing window to generate the coalesced memory barrier instruction.
7. The method of claim 6, further comprising, between the combining and the releasing, the steps of: closing the coalescing window; and deferring any subsequent memory barrier instructions that are received until after the coalesced memory barrier is released.
8. The method of claim 1, further comprising executing a third one of the multiple parallel execution threads that has not reached a memory barrier instruction before the coalesced memory barrier is released.
9. The method of claim 1, wherein the global memory barrier instruction and the system memory barrier instruction are external memory barrier instruction requests that are configured to enforce ordering between dependent groups of threads that include the first thread and the second thread.
10. A computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to coalesce memory barrier instructions across multiple parallel execution threads, by performing the steps of: receiving a global memory barrier instruction for a first one of the multiple parallel execution threads; blocking the execution of memory transactions for the first thread that are after the global memory barrier instruction in program order; receiving a system memory barrier instruction for a second one of the multiple parallel execution threads, wherein the second thread is executed independently from the first thread; blocking the execution of memory transactions for the second thread that are after the system memory barrier instruction in program order; combining the global memory barrier instruction with the system memory barrier instruction to produce a coalesced memory barrier instruction; determining that all memory transactions for the first thread and the second thread that occur prior to the coalesced memory barrier instruction are committed to memory; and releasing the coalesced memory barrier to allow execution of the memory transactions for the first thread that are after the global memory barrier instruction in program order and for the second thread that are after the system memory barrier instruction in program order.
11. A system for coalescing memory barrier instructions across multiple parallel execution threads, the system comprising: a parallel thread processor including a memory barrier instruction execution unit that is configured to: receive a global memory barrier instruction for a first one of the multiple parallel execution threads; block the execution of memory transactions for the first thread that are after the global memory barrier instruction in program order; receive a system memory barrier instruction for a second one of the multiple parallel execution threads, wherein the second thread is executed independently from the first thread; block the execution of memory transactions for the second thread that are after the system memory barrier instruction in program order; combine the global memory barrier instruction with the system memory barrier instruction to produce a coalesced memory barrier instruction; determine that all memory transactions for the first thread and the second thread that occur prior to the coalesced memory barrier instruction are committed to memory; and release the coalesced memory barrier to allow execution of the memory transactions for the first thread that are after the global memory barrier instruction in program order and for the second thread that are after the system memory barrier instruction in program order.
12. The system of claim 11, wherein the memory barrier instruction execution unit is further configured to promote the coalesced memory barrier instruction from a coalesced global memory barrier instruction to a coalesced system memory barrier instruction.
13. The system of claim 11, wherein the system comprises a memory management unit that is coupled to the memory barrier instruction unit, and the memory barrier instruction execution unit is further configured to issue the coalesced memory barrier instruction to the memory management unit and wait for a memory barrier acknowledgement signal from the memory management unit.
14. The system of claim 13, wherein the memory management unit is further configured to translate memory transactions for accessing the memory and the memory includes a global memory portion and a system memory portion.
15. The system of claim 14, further comprising, prior to issuing the coalesced memory barrier instruction, determining that any pending memory transactions that access the global memory portion and the system memory portion have been acknowledged.
16. The system of claim 11, wherein the memory barrier instruction execution unit is further configured to: determine that the global memory barrier instruction is a triggering memory barrier instruction; and open a coalescing window to generate the coalesced memory barrier instruction.
17. The system of claim 16, wherein the memory barrier instruction execution
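Claims 1 through 16 describe one flow: the first (triggering) barrier opens a coalescing window, barriers from independently executing threads are combined into it, a coalesced global barrier is promoted to system level when a system barrier joins, the window closes before the barrier goes to the memory management unit, barriers arriving while it is in flight are deferred, and threads that have not reached a barrier keep running. The Python sketch below is a hypothetical software model of that flow, not a description of any actual hardware. It assumes a single counter of unacknowledged transactions stands in for the check in claims 5 and 15, and that a call to `ack()` stands in for the memory barrier acknowledgement signal. `BarrierCoalescer`, `Level`, and all method names are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import IntEnum


class Level(IntEnum):
    GLOBAL = 1  # commit with respect to threads sharing global memory
    SYSTEM = 2  # commit with respect to all threads sharing system memories


@dataclass
class BarrierCoalescer:
    """Hypothetical model of the claimed coalescing flow."""
    outstanding: int = 0            # earlier transactions not yet acknowledged
    window_open: bool = False
    in_flight: bool = False         # coalesced barrier issued, awaiting ack
    level: Level | None = None      # scope of the coalesced barrier being built
    blocked: set[int] = field(default_factory=set)
    deferred: list[tuple[int, Level]] = field(default_factory=list)

    def request(self, tid: int, level: Level) -> None:
        """A thread reaches a barrier; its later memory transactions are blocked."""
        if self.in_flight:
            # Window already closed: defer until the current barrier releases.
            self.deferred.append((tid, level))
            return
        if not self.window_open:
            # The triggering barrier opens a new coalescing window.
            self.window_open = True
            self.level = level
        else:
            # Combine: the widest scope wins, which promotes GLOBAL to SYSTEM.
            self.level = max(self.level, level)
        self.blocked.add(tid)

    def issue(self) -> Level:
        """Close the window and send the coalesced barrier to the MMU."""
        if not self.window_open:
            raise RuntimeError("no coalescing window is open")
        if self.outstanding:
            raise RuntimeError("earlier transactions not yet acknowledged")
        self.window_open = False
        self.in_flight = True
        return self.level

    def ack(self) -> set[int]:
        """On acknowledgement, release the coalesced threads, then replay deferrals."""
        released, self.blocked = self.blocked, set()
        self.in_flight = False
        self.level = None
        pending, self.deferred = self.deferred, []
        for tid, lvl in pending:
            self.request(tid, lvl)  # deferred barriers start a fresh window
        return released

    def is_blocked(self, tid: int) -> bool:
        return tid in self.blocked or any(t == tid for t, _ in self.deferred)


c = BarrierCoalescer()
c.request(0, Level.GLOBAL)  # triggering barrier opens the window
c.request(1, Level.SYSTEM)  # merged; coalesced barrier promoted to SYSTEM
issued = c.issue()
c.request(2, Level.GLOBAL)  # arrives while in flight, so it is deferred
released = c.ack()
```

In the usage lines, `issued` is `Level.SYSTEM`, `released` is `{0, 1}`, thread 2 is then waiting in a new window, and a thread 3 that never reached a barrier was never blocked. Using `max` for the combine step makes promotion automatic: the coalesced barrier always takes the widest scope requested, so committing it is enough for every thread it releases.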

Assignees

Inventors

Classifications

  • to perform operations on memory · CPC title

  • Synchronisation or serialisation instructions · CPC title

  • G06F9/3834 (primary)

    Maintaining memory consistency · CPC title

  • from multiple instruction streams, e.g. multistreaming · CPC title

  • controlled by a single instruction for multiple threads [SIMT] in parallel · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9223578B2 cover?
One embodiment of the present invention sets forth a technique for coalescing memory barrier operations across multiple parallel threads. Memory barrier requests from a given parallel thread processing unit are coalesced to reduce the impact to the rest of the system. Additionally, memory barrier requests may specify a level of a set of threads with respect to which the memory transactions are …
Who is the assignee on this patent?
No assignee is listed on this page. The names shown, Nickolls John R, Heinrich Steven James, Coon Brett W, and 2 more, are the credited inventors.
What technology area does this patent fall under?
Primary CPC classification G06F9/3834. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 29, 2015 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page: citations found in our corpus, or other publications that share the same primary CPC.