Parallel Processing Of Data
US-2024338235-A1 · Oct 10, 2024 · US
US9606808B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9606808-B2 |
| Application number | US-201213348544-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 11, 2012 |
| Priority date | Jan 11, 2012 |
| Publication date | Mar 28, 2017 |
| Grant date | Mar 28, 2017 |
Official abstract text for this publication.
A computing device detects divergences between threads in a thread group executing on a parallel processing unit. The computing device includes an address divergence unit that identifies a subset of non-divergent threads included in the thread group. The address divergence unit stores instructions related to the subset of non-divergent threads in a multi-issue queue. The address divergence unit causes the instructions related to the subset of non-divergent threads to be retrieved from the multi-issue queue when the parallel processing unit is available. The address divergence unit causes the subset of non-divergent threads to be issued for execution on the parallel processing unit. The address divergence unit repeats the identifying, storing, and causing steps for the remaining threads in the thread group that are not included in the subset of non-divergent threads.
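The identify, store, retrieve, issue, and repeat cycle described in the abstract can be sketched in software. This is a minimal illustration, not the patented hardware: it assumes that "non-divergent" means "targets the same address", models the multi-issue queue with a `deque`, and treats the parallel processing unit as always available. The function name `resolve_divergence` and the thread-to-address mapping are hypothetical.

```python
from collections import deque


def resolve_divergence(thread_addresses):
    """Return the non-divergent subsets in the order they would be issued.

    thread_addresses maps a thread id to the address that thread targets
    (a hypothetical stand-in for whatever the hardware actually compares).
    """
    remaining = dict(thread_addresses)  # threads not yet issued
    multi_issue_queue = deque()         # assumed model of the multi-issue queue
    issued = []                         # (address, [thread ids]) per pass

    while remaining:
        # Identify: use the first remaining thread's address as the common target.
        first_thread = next(iter(remaining))
        target = remaining[first_thread]
        subset = [t for t, addr in remaining.items() if addr == target]

        # Store: queue the instruction for this subset.
        multi_issue_queue.append((target, subset))

        # Retrieve and issue: this sketch assumes the unit is always available.
        addr, threads = multi_issue_queue.popleft()
        issued.append((addr, threads))

        # Repeat: only threads outside the issued subset stay pending.
        for t in threads:
            del remaining[t]

    return issued


passes = resolve_divergence({0: 0x10, 1: 0x20, 2: 0x10, 3: 0x30})
for addr, threads in passes:
    print(hex(addr), threads)
```

With four threads where threads 0 and 2 share an address, the group is issued in three passes instead of four, which is the benefit the abstract points to.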
Opening claim text (preview).
The invention claimed is:
1. A computer-implemented method for resolving divergences between threads in a thread group executing on a parallel processing unit, the method comprising: identifying a subset of non-divergent threads included in the thread group; storing instructions related to the subset of non-divergent threads in a multi-issue queue; retrieving the instructions related to the subset of non-divergent threads from the multi-issue queue when the parallel processing unit is available; causing the subset of non-divergent threads to be issued for execution on the parallel processing unit; and repeating the identifying, storing, retrieving, and causing steps for the remaining threads in the thread group that are not included in the subset of non-divergent threads.
2. The computer-implemented method of claim 1, wherein identifying the subset of non-divergent threads comprises: issuing the threads included in the thread group to a register file coupled to one or more functional units configured to execute threads; analyzing the register file to determine that the threads included in the thread group are configured to access more than one memory location; identifying a first memory location that two or more of the threads included in the thread group are configured to access; and organizing the two or more threads into the subset of non-divergent threads.
3. The computer-implemented method of claim 2, wherein the first memory location comprises an entry in the register file.
4. The computer-implemented method of claim 1, wherein the remaining threads in the thread group comprises a subset of divergent threads, and further comprising: identifying a second subset of non-divergent threads included in the subset of divergent threads; causing the second subset of non-divergent threads to be issued for execution on the parallel processing unit; and repeating the identifying, storing, retrieving, and causing steps for the remaining threads in subset of divergent threads that are not included in the second subset of non-divergent threads.
5. The computer-implemented method of claim 4, for each thread in the thread group, further comprising updating a thread mask associated with the thread group to reflect whether the thread belongs to the subset of non-divergent threads, to the second subset of non-divergent threads, or to the subset of divergent threads.
6. The computer-implemented method of claim 1, further comprising repeating the identifying, storing, retrieving, and causing steps for all threads in the thread group not identified as belonging to a subset of non-divergent threads until H subsets of non-divergent threads are identified, H being an integer greater than one.
7. The computer-implemented method of claim 6, further comprising causing each of the H subsets of non-divergent threads to be issued for execution on the parallel processing unit separately.
8. The computer-implemented method of claim 1, wherein the subset of non-divergent threads comprises one or more threads accessing a common resource or a common aspect or portion of a resource within a computing device.
9. A non-transitory computer-readable medium storing program instructions that, when executed by a processing unit, cause the processing unit to resolve divergences between threads in a thread group executing on a parallel processing unit, by performing the steps of: identifying a subset of non-divergent threads included in the thread group; storing instructions related to the subset of non-divergent threads in a multi-issue queue; retrieving the instructions related to the subset of non-divergent threads from the multi-issue queue when the parallel processing unit is available; and causing the subset of non-divergent threads to be issued for execution on the parallel processing unit.
10. The non-transitory computer-readable medium of claim 9, wherein identifying the subset of non-divergent threads comprises: issuing the threads included in the thread group to a register file coupled to one or more functional units configured to execute threads; analyzing the register file to determine that the threads included in the thread group are configured to access more than one memory location; identifying a first memory location that two or more of the threads included in the thread group are configured to access; and organizing the two or more threads into the subset of non-divergent threads.
11. The non-transitory computer-readable medium of claim 10, wherein the first memory location comprises an entry in the register file.
12. The non-transitory computer-readable medium of claim 9, wherein the remaining threads in the thread group comprises a subset of divergent threads, and further comprising: identifying a second subset of non-divergent threads included in the subset of divergent threads; and causing the second subset of non-divergent threads to be issued for execution on the parallel processing unit.
13. The non-transitory computer-readable medium of claim 12, for each thread in the thread group, further comprising updating a thread mask associated with the thread group to reflect whether the thread belongs to the subset of non-divergent threads, to the second subset of non-divergent threads, or to the subset of divergent threads.
14. The non-transitory computer-readable medium of claim 9, further comprising repeating the identifying, storing, retrieving, and causing steps until H subsets of non-divergent threads are identified, H being an integer greater than one.
15. The non-transitory computer-readable medium of claim 14, further comprising the step of causing each of the H subsets of non-divergent threads to be issued for execution on the parallel processing unit separately.
16. The non-transitory computer-readable medium of claim 9, wherein the subset of non-divergent threads comprises one or more threads accessing a common resource or a common aspect or portion of a resource within a computing device.
17. A computing device configured to resolve divergences between threads in a thread group executing on a parallel processing unit, comprising: a scheduler and instruction unit that includes: an address divergence unit configured to: identify a subset of non-divergent threads included in the thread group, store instructions related to the subset of non-divergent threads in a multi-issue queue, cause the instructions related to the subset of non-divergent threads to be retrieved from the multi-issue queue when the parallel processing unit is available, cause the subset of non-divergent threads to be issued for execution on the parallel processing unit, and repeat the identifying, storing, and causing steps for the remaining threads in the thread group that are not included in the subset of non-divergent threads.
18. The computing device of claim 17, wherein the remaining threads in the thread group comprises a subset of divergent threads, and the address divergence unit is further configured to: identify a second subset of non-divergent threads included in the subset of divergent threads, cause the second subset of non-divergent threads to be issued for execution on the parallel processing unit, and transmit instructions related to all other threads included in the subset of divergent threads to the branch unit; and further comprising a branch unit configured to cause the identifying, storing, retrieving, and causing steps to be repeated for the remaining threads in subset of divergent threads that are not included in the
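Claims 2, 5, and 6 together describe grouping threads by a shared memory location, recording membership in a thread mask, and stopping once H subsets have been found. The sketch below illustrates that bookkeeping under stated assumptions: the thread mask is a plain integer bitmask with bit `i` standing for thread `i`, `max_subsets` plays the role of H, and a subset may hold a single thread (claim 2 speaks of two or more threads sharing a location). The function name `partition_into_masks` is hypothetical and does not come from the patent.

```python
def partition_into_masks(addresses, max_subsets):
    """Split a thread group into at most max_subsets bitmasks.

    addresses[i] is the (assumed) address targeted by thread i.
    Returns (masks, leftover), where each mask marks one non-divergent
    subset and leftover marks threads still treated as divergent.
    """
    pending = (1 << len(addresses)) - 1  # every thread starts as pending
    masks = []

    while pending and len(masks) < max_subsets:
        # Pick the lowest-numbered pending thread as the reference.
        lead = (pending & -pending).bit_length() - 1

        # Collect every pending thread that targets the same address.
        mask = 0
        for tid, addr in enumerate(addresses):
            if (pending >> tid) & 1 and addr == addresses[lead]:
                mask |= 1 << tid

        masks.append(mask)
        pending &= ~mask  # these threads are no longer pending

    return masks, pending


masks, leftover = partition_into_masks([0x10, 0x20, 0x10, 0x30], max_subsets=2)
print([bin(m) for m in masks], bin(leftover))
```

With `max_subsets=2`, threads 0 and 2 form the first subset, thread 1 forms the second, and thread 3 remains in the leftover mask, mirroring the "subset of divergent threads" that the claims hand on for further processing.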
from multiple instruction streams, e.g. multistreaming · CPC title
controlled by a single instruction for multiple data lanes [SIMD] · CPC title
controlled by a single instruction for multiple threads [SIMT] in parallel · CPC title
Divergence aspects · CPC title