System and method for synchronizing threads in a divergent region of code

US10013290B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10013290-B2
Application number: US-201213608912-A
Country: US
Kind code: B2
Filing date: Sep 10, 2012
Priority date: Sep 10, 2012
Publication date: Jul 3, 2018
Grant date: Jul 3, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system and method are provided for synchronizing threads in a divergent region of code within a multi-threaded parallel processing system. The method includes, prior to any thread entering a divergent region, generating a count that represents a number of threads that will enter the divergent region. The method also includes using the count within the divergent region to synchronize the threads in the divergent region.
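
The flow in the abstract can be sketched on a CPU with ordinary threads. This is only an illustration under stated assumptions, not the patented implementation: Python's `threading.Barrier` stands in for whatever hardware or software barrier a real parallel processor would use, and the names `run_divergent_region`, `will_enter`, `partial`, and `totals` are hypothetical. Every thread first records its branch decision and waits at an arrival barrier. One thread then computes the count and builds a barrier sized to it, and only afterwards do threads enter the divergent region.

```python
import threading


def run_divergent_region(values, predicate):
    """Simulate threads hitting a data-dependent branch that contains a barrier."""
    n = len(values)
    if n == 0:
        return 0, []

    will_enter = [False] * n   # each thread's branch decision
    partial = [None] * n       # phase-1 results written inside the region
    totals = [None] * n        # phase-2 results read after the inner barrier
    shared = {"count": 0, "inner": None}

    def make_inner_barrier():
        # Called by exactly one thread once all n threads have arrived,
        # before any of them is released into the divergent region.
        shared["count"] = sum(will_enter)
        if shared["count"] > 0:
            shared["inner"] = threading.Barrier(shared["count"])

    # All threads synchronize at the conditional statement.
    arrival = threading.Barrier(n, action=make_inner_barrier)

    def worker(i):
        will_enter[i] = bool(predicate(values[i]))
        arrival.wait()
        if will_enter[i]:
            # Divergent region, phase 1: publish a value.
            partial[i] = values[i] * 2
            # Barrier sized by the count, so only entering threads are awaited.
            shared["inner"].wait()
            # Phase 2: every phase-1 write from entering threads is now visible.
            totals[i] = sum(p for p in partial if p is not None)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["count"], totals


if __name__ == "__main__":
    count, totals = run_divergent_region([1, 2, 3, 4, 5, 6], lambda v: v % 2 == 0)
    print(count, totals)
```

In this sketch, a barrier inside the branch that waited for all `n` threads would never release, because the threads that skip the branch never arrive. Sizing the inner barrier from a count taken before entry avoids that.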

First claim

Opening claim text (preview).

What is claimed is:
1. A method for synchronizing a plurality of threads of a structured control flow program, the method comprising: synchronizing threads arriving at a conditional statement that precedes a divergent region, wherein the conditional statement is a data-dependent conditional test and the divergent region is a section of code that is executed based thereon; generating a count representing a number of the threads that will enter the divergent region, wherein the generating the count is performed once the threads are synchronized and before any of the threads enter the divergent region; and using the count generated before any of the threads enter the divergent region within the divergent region to synchronize the threads in the divergent region.
2. The method as recited in claim 1, wherein the using the count comprises supplying the count to a barrier within the divergent region and the barrier using the count to synchronize the threads in the divergent region.
3. The method of claim 1, wherein the plurality of threads is arranged in a plurality of groups of threads, and wherein the generating the count further comprises: for each group of threads, determining whether any thread in the group will enter the divergent region; and generating the count based on the result of the determining.
4. The method of claim 3, wherein: the determining whether any thread in the group will enter the divergent region further comprises identifying one thread in the group that will enter the divergent region; and the generating the count based on the result of the determining further comprises generating the count based on the identified threads.
5. The method of claim 1, wherein the using the count to synchronize the threads further comprises using a barrier implemented as one of a counting semaphore, a bit field, and a network of logic gates, wherein the barrier is implemented in one of hardware and software.
6. The method of claim 5, wherein the using the barrier further comprises initializing the barrier using a value of the count.
7. The method of claim 1, wherein the count is a first count, the divergent region is a first divergent region, and the first divergent region comprises a second divergent region, the method further comprising: generating a second count representing a number of threads that will enter the second divergent region, the second count generated prior to any thread entering the second divergent region, wherein the second count is generated using the first count; and using the second count within the second divergent region to synchronize the threads in the second divergent region.
8. A non-transitory, computer readable medium storing instructions that, when executed by a multiprocessing unit, cause the multiprocessing unit to synchronize a plurality of threads executing on the multiprocessing unit using a structured control flow, by performing the steps of: synchronizing threads arriving at a conditional statement that precedes a divergent region, wherein the conditional statement is a data-dependent conditional test and the divergent region is a section of code that is executed based thereon; generating a count representing a number of the threads that will enter the divergent region, wherein the generating the count is performed once the threads are synchronized and before any of the threads enter the divergent region; and within the divergent region, using the count to synchronize the threads in the divergent region.
9. The computer-readable medium of claim 8, wherein the step of using the count comprises providing the count to a barrier within the divergent region and the divergent region barrier employing the count to synchronize the threads in the divergent region.
10. The computer-readable medium of claim 8, wherein the plurality of threads is arranged in a plurality of groups of threads, and wherein the generating the count further comprises: for each group of threads, determining whether any thread in the group will enter the divergent region; generating the count based on the result of the determining.
11. The computer readable medium of claim 8, wherein the instructions are generated by a compiler automatically when the compiler encounters a statement producing a divergent region of code that includes a synchronization operation.
12. The computer-readable medium of claim 8, wherein the step of using the count to synchronize the threads further comprises using a barrier implemented as one of a counting semaphore, a bit field, and a network of logic gates, wherein the barrier is implemented in one of hardware and software, the barrier initialized using a value of the count.
13. The computer-readable medium of claim 8, wherein the count is a first count, the divergent region is a first divergent region, and the first divergent region comprises a second divergent region, the steps further comprising: generating a second count representing a number of threads that will enter the second divergent region, the second count generated prior to any thread entering the second divergent region, wherein the second count is generated using the first count; and using the second count within the second divergent region to synchronize the threads in the second divergent region.
14. A computing device, comprising: a multiprocessing unit adapted to synchronize a plurality of threads executing on the multiprocessing unit using a structured control flow, the multiprocessing unit configured to: synchronize the plurality of threads that arrive at a conditional statement that precedes a divergent region, wherein the conditional statement is a data-dependent conditional test and the divergent region is a section of code that is executed based thereon; generate a count representing a number of the plurality of threads that will enter the divergent region, wherein the count is generated after the plurality of threads are synchronized and before any of the plurality of threads enter the divergent region; and use the count within the divergent region to synchronize the threads in the divergent region.
15. The computing device of claim 14, wherein the multiprocessing unit is configured to use the count by providing the count to a barrier within the divergent region and the barrier using the count to synchronize the threads in the divergent region.
16. The computing device of claim 14, wherein the plurality of threads is arranged in a plurality of groups of threads, and wherein generating a count further comprises: for each group of threads, determining whether any thread in the group will enter the divergent region; generating the count based on the result of the determining.
17. The computing device of claim 16, wherein: the determining whether any thread in the group will enter the divergent region further comprises identifying one thread in the group that will enter the divergent region; and the generating the count based on the result of the determining further comprises generating the count based on the identified threads.
18. The computing device of claim 14, wherein the using the count to synchronize the threads further comprises using a barrier implemented as one of a counting semaphore, a bit field, and a network of logic gates, wherein the barrier is implemented in one of hardware and software.
19. The computing device of claim 18, wherein the using a barrier further comprises initializing the barrier using a value of the count.
20. The computing device of claim
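
The group-based counting in claims 3 and 4 and the nested count in claim 7 can be illustrated with a few pure functions. This is a rough sketch under assumptions the claims do not spell out: groups are taken to be fixed-size consecutive slices of thread indices, the "identified" thread is taken to be the lowest-indexed entering thread in each group, and the second count is derived by letting only threads already counted for the first region vote on the inner condition. The names `elect_leaders`, `group_count`, and `nested_group_counts` are hypothetical.

```python
from typing import List, Optional, Tuple


def elect_leaders(flags: List[bool], group_size: int) -> List[Optional[int]]:
    """For each group, return the index of one thread that will enter, or None."""
    leaders: List[Optional[int]] = []
    for start in range(0, len(flags), group_size):
        stop = min(start + group_size, len(flags))
        # Assumption: pick the lowest-indexed entering thread as the group's leader.
        leader = next((i for i in range(start, stop) if flags[i]), None)
        leaders.append(leader)
    return leaders


def group_count(flags: List[bool], group_size: int) -> int:
    """Count groups with at least one entering thread (one identified thread each)."""
    return sum(1 for leader in elect_leaders(flags, group_size) if leader is not None)


def nested_group_counts(
    outer_flags: List[bool], inner_flags: List[bool], group_size: int
) -> Tuple[int, int]:
    """Return (first count, second count) for a region nested inside another."""
    first = group_count(outer_flags, group_size)
    # Assumption: only threads inside the first region can enter the second,
    # so the second count can never exceed the first.
    both = [o and i for o, i in zip(outer_flags, inner_flags)]
    second = group_count(both, group_size)
    return first, second


if __name__ == "__main__":
    outer = [False, True, True, False,
             False, False, False, False,
             True, False, False, False]
    inner = [True, False, True, False,
             True, True, True, True,
             False, False, False, False]
    print(elect_leaders(outer, 4))
    print(nested_group_counts(outer, inner, 4))
```

Counting one identified thread per group keeps the barrier's expected arrivals equal to the number of participating groups, which suits hardware where threads in a group execute together. Whether a real implementation counts groups or individual threads is not settled by this sketch.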

Assignees

Inventors

Classifications

  • G06F9/522 (Primary)

    CPC title: Barrier synchronisation

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10013290B2 cover?
A system and method are provided for synchronizing threads in a divergent region of code within a multi-threaded parallel processing system. The method includes, prior to any thread entering a divergent region, generating a count that represents a number of threads that will enter the divergent region. The method also includes using the count within the divergent region to synchronize the threa…
Who is the assignee on this patent?
Stephen Jones, Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/522. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jul 3, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).