Method and system for resolving thread divergences

US9606808B2 · US · B2

Patent metadata
Publication number: US-9606808-B2
Application number: US-201213348544-A
Country: US
Kind code: B2
Filing date: Jan 11, 2012
Priority date: Jan 11, 2012
Publication date: Mar 28, 2017
Grant date: Mar 28, 2017

Abstract

Official abstract text for this publication.

A computing device detects divergences between threads in a thread group executing on a parallel processing unit. The computing device includes an address divergence unit that identifies a subset of non-divergent threads included in the thread group. The address divergence unit stores instructions related to the subset of non-divergent threads in a multi-issue queue. The address divergence unit causes the instructions related to the subset of non-divergent threads to be retrieved from the multi-issue queue when the parallel processing unit is available. The address divergence unit causes the subset of non-divergent threads to be issued for execution on the parallel processing unit. The address divergence unit repeats the identifying, storing, and causing steps for the remaining threads in the thread group that are not included in the subset of non-divergent threads.
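The identify, store, retrieve, issue, and repeat loop in the abstract can be modeled in software to show how a thread group gets peeled into subsets. The sketch below is a minimal Python model, not the patented hardware. It makes several assumptions. Each thread's memory address is assumed to be known up front; claim 2 obtains this by analyzing the register file. "Non-divergent" is taken to mean "accesses the same address," following claims 2 and 8. The next target address is the one used by the lowest-numbered remaining thread; the text does not say how the target is chosen. The multi-issue queue is modeled with Python's `collections.deque`, and each entry is issued immediately because the parallel processing unit is assumed to be always available. The function name `resolve_divergence` is hypothetical.

```python
from collections import deque


def resolve_divergence(addresses, active=None):
    """Split a thread group into subsets of non-divergent threads.

    addresses[i] is the memory address thread i will access (hypothetical
    input; the patent derives this from the register file).
    active optionally lists the thread IDs still to be issued; defaults to all.
    Returns the subsets, as lists of thread IDs, in the order they were issued.
    """
    remaining = sorted(range(len(addresses)) if active is None else active)
    multi_issue_queue = deque()   # software stand-in for the multi-issue queue
    issued = []                   # subsets that reached the parallel processing unit
    while remaining:
        # Identify: threads that access the same address as the first remaining one
        # (target-selection rule is an assumption of this sketch).
        target = addresses[remaining[0]]
        subset = [t for t in remaining if addresses[t] == target]
        # Store: queue the instruction record for this subset.
        multi_issue_queue.append((target, subset))
        # Retrieve and issue: the unit is assumed always available here,
        # so the entry is dequeued and issued immediately.
        _, ready = multi_issue_queue.popleft()
        issued.append(ready)
        # Repeat for the threads that were not in this subset.
        remaining = [t for t in remaining if addresses[t] != target]
    return issued
```

For per-thread addresses `[0x100, 0x200, 0x100, 0x300, 0x200]`, the sketch issues three subsets: threads 0 and 2, then threads 1 and 4, then thread 3. A thread group whose threads all access one address is issued as a single subset, so no extra passes are needed when there is no divergence.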

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method for resolving divergences between threads in a thread group executing on a parallel processing unit, the method comprising: identifying a subset of non-divergent threads included in the thread group; storing instructions related to the subset of non-divergent threads in a multi-issue queue; retrieving the instructions related to the subset of non-divergent threads from the multi-issue queue when the parallel processing unit is available; causing the subset of non-divergent threads to be issued for execution on the parallel processing unit; and repeating the identifying, storing, retrieving, and causing steps for the remaining threads in the thread group that are not included in the subset of non-divergent threads.

2. The computer-implemented method of claim 1, wherein identifying the subset of non-divergent threads comprises: issuing the threads included in the thread group to a register file coupled to one or more functional units configured to execute threads; analyzing the register file to determine that the threads included in the thread group are configured to access more than one memory location; identifying a first memory location that two or more of the threads included in the thread group are configured to access; and organizing the two or more threads into the subset of non-divergent threads.

3. The computer-implemented method of claim 2, wherein the first memory location comprises an entry in the register file.

4. The computer-implemented method of claim 1, wherein the remaining threads in the thread group comprises a subset of divergent threads, and further comprising: identifying a second subset of non-divergent threads included in the subset of divergent threads; causing the second subset of non-divergent threads to be issued for execution on the parallel processing unit; and repeating the identifying, storing, retrieving, and causing steps for the remaining threads in subset of divergent threads that are not included in the second subset of non-divergent threads.

5. The computer-implemented method of claim 4, for each thread in the thread group, further comprising updating a thread mask associated with the thread group to reflect whether the thread belongs to the subset of non-divergent threads, to the second subset of non-divergent threads, or to the subset of divergent threads.

6. The computer-implemented method of claim 1, further comprising repeating the identifying, storing, retrieving, and causing steps for all threads in the thread group not identified as belonging to a subset of non-divergent threads until H subsets of non-divergent threads are identified, H being an integer greater than one.

7. The computer-implemented method of claim 6, further comprising causing each of the H subsets of non-divergent threads to be issued for execution on the parallel processing unit separately.

8. The computer-implemented method of claim 1, wherein the subset of non-divergent threads comprises one or more threads accessing a common resource or a common aspect or portion of a resource within a computing device.

9. A non-transitory computer-readable medium storing program instructions that, when executed by a processing unit, cause the processing unit to resolve divergences between threads in a thread group executing on a parallel processing unit, by performing the steps of: identifying a subset of non-divergent threads included in the thread group; storing instructions related to the subset of non-divergent threads in a multi-issue queue; retrieving the instructions related to the subset of non-divergent threads from the multi-issue queue when the parallel processing unit is available; and causing the subset of non-divergent threads to be issued for execution on the parallel processing unit.

10. The non-transitory computer-readable medium of claim 9, wherein identifying the subset of non-divergent threads comprises: issuing the threads included in the thread group to a register file coupled to one or more functional units configured to execute threads; analyzing the register file to determine that the threads included in the thread group are configured to access more than one memory location; identifying a first memory location that two or more of the threads included in the thread group are configured to access; and organizing the two or more threads into the subset of non-divergent threads.

11. The non-transitory computer-readable medium of claim 10, wherein the first memory location comprises an entry in the register file.

12. The non-transitory computer-readable medium of claim 9, wherein the remaining threads in the thread group comprises a subset of divergent threads, and further comprising: identifying a second subset of non-divergent threads included in the subset of divergent threads; and causing the second subset of non-divergent threads to be issued for execution on the parallel processing unit.

13. The non-transitory computer-readable medium of claim 12, for each thread in the thread group, further comprising updating a thread mask associated with the thread group to reflect whether the thread belongs to the subset of non-divergent threads, to the second subset of non-divergent threads, or to the subset of divergent threads.

14. The non-transitory computer-readable medium of claim 9, further comprising repeating the identifying, storing, retrieving, and causing steps until H subsets of non-divergent threads are identified, H being an integer greater than one.

15. The non-transitory computer-readable medium of claim 14, further comprising the step of causing each of the H subsets of non-divergent threads to be issued for execution on the parallel processing unit separately.

16. The non-transitory computer-readable medium of claim 9, wherein the subset of non-divergent threads comprises one or more threads accessing a common resource or a common aspect or portion of a resource within a computing device.

17. A computing device configured to resolve divergences between threads in a thread group executing on a parallel processing unit, comprising: a scheduler and instruction unit that includes: an address divergence unit configured to: identify a subset of non-divergent threads included in the thread group, store instructions related to the subset of non-divergent threads in a multi-issue queue, cause the instructions related to the subset of non-divergent threads to be retrieved from the multi-issue queue when the parallel processing unit is available, cause the subset of non-divergent threads to be issued for execution on the parallel processing unit, and repeat the identifying, storing, and causing steps for the remaining threads in the thread group that are not included in the subset of non-divergent threads.

18. The computing device of claim 17, wherein the remaining threads in the thread group comprises a subset of divergent threads, and the address divergence unit is further configured to: identify a second subset of non-divergent threads included in the subset of divergent threads, cause the second subset of non-divergent threads to be issued for execution on the parallel processing unit, and transmit instructions related to all other threads included in the subset of divergent threads to the branch unit; and further comprising a branch unit configured to cause the identifying, storing, retrieving, and causing steps to be repeated for the remaining threads in subset of divergent threads that are not included in the…
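Claims 5 to 7 (and their medium counterparts, claims 13 to 15) describe two things: tracking subset membership with a thread mask, and issuing each of the H subsets separately. The hypothetical Python sketch below shows one way to do this with bitmasks. The thread group's active threads are an integer bitmask, where bit i set means lane i is active. Each iteration peels off one subset mask and clears those lanes from the remaining mask, and H is the number of masks produced. Several details are assumptions for illustration only: the lane numbering, choosing the target address from the lowest active lane, and the helper names `split_by_address` and `lane_labels`. None of these are taken from the patent text.

```python
def split_by_address(addresses, active_mask):
    """Partition the active lanes of a thread group into non-divergent subsets.

    addresses[i] is the memory address lane i will access (hypothetical input).
    Bit i of active_mask is set when lane i is active; every set bit must have
    a matching entry in addresses.
    Returns a list of (address, subset_mask) pairs, one per subset, in issue
    order. Its length plays the role of H in claims 6 and 14.
    """
    subsets = []
    remaining = active_mask
    while remaining:
        # Lowest set bit -> lowest-numbered remaining lane (assumed selection rule).
        leader = (remaining & -remaining).bit_length() - 1
        target = addresses[leader]
        subset_mask = 0
        for lane, address in enumerate(addresses):
            if (remaining >> lane) & 1 and address == target:
                subset_mask |= 1 << lane
        subsets.append((target, subset_mask))
        # Lanes in this subset are handled; the rest remain divergent.
        remaining &= ~subset_mask
    return subsets


def lane_labels(subsets, width):
    """Per-lane subset index (None for inactive lanes), one possible reading
    of the thread mask in claims 5 and 13."""
    labels = [None] * width
    for index, (_, mask) in enumerate(subsets):
        for lane in range(width):
            if (mask >> lane) & 1:
                labels[lane] = index
    return labels
```

With addresses `[0x100, 0x200, 0x100, 0x300]` and all four lanes active (`0b1111`), the sketch yields H = 3 masks: `0b0101`, `0b0010`, and `0b1000`. These masks are disjoint and together cover the original active mask. `lane_labels` turns them into the per-lane subset index `[0, 1, 0, 2]`. Using plain integer masks keeps the "remaining threads" bookkeeping to a single AND-NOT per pass, which mirrors how a hardware thread mask would typically be updated.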

Classifications

  • G06F9/3851 (primary) · CPC title: from multiple instruction streams, e.g. multistreaming
  • G06F9/3887 (primary) · CPC title: controlled by a single instruction for multiple data lanes [SIMD]
  • CPC title: controlled by a single instruction for multiple threads [SIMT] in parallel
  • CPC title: Divergence aspects

Frequently asked questions

Answers are generated from the same data shown on this page.

Who are the inventors on this patent?
Choquette Jack, Qiu Xiaogang, Tuckey Jeff, and 4 more
What technology area does this patent fall under?
Primary CPC classification G06F9/3851. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 28, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).