Multi-GPU frame rendering
US-10430915-B2 · Oct 1, 2019 · US
US11847508B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11847508-B2 |
| Application number | US-202217819243-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 11, 2022 |
| Priority date | Sep 11, 2018 |
| Publication date | Dec 19, 2023 |
| Grant date | Dec 19, 2023 |
Official abstract text for this publication.
Convergence of threads executing common code sections is facilitated using instructions inserted at strategic locations in computer code sections. The inserted instructions enable the threads in a warp or other group to cooperate with a thread scheduler to promote thread convergence.
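To make the cooperation between threads and scheduler concrete, here is a minimal sketch of the barrier bookkeeping that the claims in this publication describe as predicting, confirming, and canceling arrival. It is a sequential toy model, not the patented hardware or any real GPU API. The class `ConvergenceBarrier`, its methods `join`, `arrive`, and `cancel`, and the `threshold` field with its "pending count at most threshold" release rule are all hypothetical names and assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ConvergenceBarrier:
    """Toy model of one convergence barrier (all names are hypothetical)."""
    threshold: int = 0                          # release once pending <= threshold
    expected: set = field(default_factory=set)  # threads predicted to arrive
    waiting: set = field(default_factory=set)   # threads suspended at the barrier

    def join(self, tid):
        """Predict that thread `tid` will eventually reach the barrier."""
        self.expected.add(tid)

    def arrive(self, tid):
        """Confirm arrival: suspend `tid` until the barrier releases."""
        self.waiting.add(tid)
        return self._release_if_ready()

    def cancel(self, tid):
        """Withdraw the prediction on a branch that skips the barrier."""
        self.expected.discard(tid)
        return self._release_if_ready()

    def _release_if_ready(self):
        # Threads still predicted to arrive but not yet suspended here.
        pending = self.expected - self.waiting
        if self.waiting and len(pending) <= self.threshold:
            released = sorted(self.waiting)
            self.expected -= self.waiting
            self.waiting.clear()
            return released  # these threads resume together
        return []


# Four threads join; thread 2 takes a branch that never reaches the barrier.
barrier = ConvergenceBarrier()
for tid in range(4):
    barrier.join(tid)
r1 = barrier.arrive(0)  # suspended, threads 1-3 still pending
r2 = barrier.arrive(1)  # suspended, threads 2-3 still pending
r3 = barrier.cancel(2)  # prediction withdrawn, thread 3 still pending
r4 = barrier.arrive(3)  # nothing pending, so the waiters are released
print(r1, r2, r3, r4)
```

Without the `cancel` call, threads 0, 1, and 3 would wait indefinitely for a thread that never comes, which is the situation the cancel step is meant to avoid. Setting `threshold` above zero models a release that tolerates a few stragglers.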
Opening claim text (preview).
What is claimed is:
1. A computer system comprising: one or more processors; and a memory comprising instructions that when executed by the one or more processors implement a code profiler configured to insert instructions into a code segment to configure a machine executing the code segment to: predict arrival of a thread executing the code segment at an execution barrier in the code segment; confirm that execution of the thread reached the execution barrier; cancel the predicted arrival of the thread at the execution barrier in a branch of the code segment that will not encounter the execution barrier.
2. The system of claim 1, wherein the code profiler is configured to insert at least one instruction predicting an eventual arrival of the thread at the execution barrier, the at least one instruction comprising an instruction to join the execution barrier.
3. The system of claim 1, the code profiler further configured to insert one or more instructions into the code segment to cause the thread to rejoin the execution barrier after execution of the thread resumes at an execution point subsequent to the execution barrier.
4. The system of claim 1, the inserted instructions further configuring the machine to: resume execution of threads suspended at the execution barrier on condition that no more threads are predicted to arrive at the execution barrier.
5. The system of claim 1, wherein the execution barrier is indicated by a code marker.
6. The system of claim 1, the code profiler further configured to set a range in the code segment for which the prediction of arrival of the thread applies.
7. The system of claim 1, the code profiler further configured to insert one or more instructions into the code segment to: configure a threshold value; and configure the machine executing the instructions to resume execution of threads suspended at the execution barrier on condition that a number of the threads that have not yet arrived at the execution barrier satisfies the threshold value.
8. The system of claim 1, the code profiler comprising: logic to detect a conflict in a live range of two or more execution barriers, wherein one of the two or more execution barriers is located at a post-dominator of two or more divergent branches of the code segment; and logic to eliminate the one of the two or more execution barriers located at the post-dominator in response to detecting the conflict.
9. The system of claim 1, the code profiler comprising: logic to detect a conflict in a live range of two or more execution barriers, wherein one of the execution barriers is located at a post-dominator of two or more divergent branches of the code segment; and logic to insert one or more instructions into the code segment to cause the thread to cancel, upon reaching one of the two or more conflicting execution barriers, predicted arrival at other ones of the two or more conflicting execution barriers.
10. A method comprising: executing one or more instructions in a code segment to predict arrival of multiple threads executing the code segment at a first execution barrier in the code segment; suspending the execution of one or more of the threads that reach the first execution barrier while one or more of the threads that have not reached the first execution barrier are predicted to arrive at the first execution barrier; and executing a set of the one or more instructions, in a branch of the code segment that will not encounter the first execution barrier, canceling the predicted arrival at the first execution barrier of one or more of the threads that have not reached the first execution barrier.
11. The method of claim 10, further comprising resuming the execution of the one or more threads suspended at the first execution barrier on condition that all of the threads predicted to arrive at the first execution barrier have arrived.
12. The method of claim 10, further comprising resuming the execution of the one or more threads suspended at the first execution barrier on condition that a number of the one or more threads that remain predicted to arrive at the first execution barrier is less than a threshold level greater than zero.
13. The method of claim 10, further comprising: marking one or more ranges in the code segment for which predictions of arrival of the threads at the first execution barrier are operable.
14. The method of claim 10, further comprising: canceling arrival at the first execution barrier of threads that arrive at a second execution barrier that conflicts with the first execution barrier.
15. An apparatus comprising: one or more processors; a memory comprising instructions that when executed by the one or more processors result in the apparatus: executing one or more of the instructions in a code segment predicting arrival of a thread executing the code segment at an execution barrier in the code segment; suspending the execution of the thread at the execution barrier in a first execution branch of the code segment while one or more other threads are predicted to arrive at the execution barrier; and executing one or more of the instructions canceling, in branches of the code segment divergent from the first execution branch comprising the execution barrier, predictions that at least one of the other threads will arrive at the execution barrier.
16. The apparatus of claim 15, the execution barrier located in a variable iteration count loop of the code segment.
17. The apparatus of claim 15, the memory further comprising instructions that when executed by the one or more processors result in the apparatus: causing at least one of the multiple threads to rejoin the execution barrier at an execution point subsequent to the execution barrier.
18. The apparatus of claim 15, further comprising logic configured to carry out deconfliction of convergence barriers for non-identical overlapping ranges of the code segment.
19. A method comprising: executing one or more instructions in a code segment: predicting that a plurality of threads will arrive at an execution convergence point; canceling a predicted arrival of a first one or more of the threads at the execution convergence point in an execution branch that will not execute to the execution convergence point; and delaying execution of at least one of the threads that reached the execution convergence point at the execution convergence point until a configured number of the threads that have not canceled the predicted arrival at the execution convergence point, arrive at the execution convergence point.
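The instruction-insertion step recited in claim 1 can be pictured as a small pass over a control-flow graph. The sketch below is one possible reading, not the patent's algorithm. It assumes an acyclic graph given as a dictionary of successor lists, and it places a join at the entry block, a wait at the barrier block, and a cancel at the first block of any branch from which the barrier can no longer be reached. The function `insert_barrier_ops` and the op names `JOIN`, `WAIT`, and `CANCEL` are hypothetical.

```python
def insert_barrier_ops(cfg, entry, barrier_block):
    """Return {block: [ops]} for an acyclic CFG given as {block: [successors]}.

    Hypothetical pass: every successor must also appear as a key in `cfg`.
    """
    # Build predecessor lists so we can walk backwards from the barrier.
    preds = {block: [] for block in cfg}
    for block, succs in cfg.items():
        for succ in succs:
            preds[succ].append(block)

    # Collect every block from which the barrier block is reachable.
    can_reach = {barrier_block}
    stack = [barrier_block]
    while stack:
        block = stack.pop()
        for pred in preds[block]:
            if pred not in can_reach:
                can_reach.add(pred)
                stack.append(pred)

    ops = {block: [] for block in cfg}
    ops[entry].append("JOIN")          # predict arrival at the barrier
    ops[barrier_block].append("WAIT")  # confirm arrival and suspend

    # A cancel goes wherever control leaves the region that leads to the
    # barrier, except on edges taken after the barrier itself.
    for block, succs in cfg.items():
        if block == barrier_block or block not in can_reach:
            continue
        for succ in succs:
            if succ not in can_reach and "CANCEL" not in ops[succ]:
                ops[succ].append("CANCEL")
    return ops


# Diamond-like example: the "else" branch exits early and skips "merge".
cfg = {
    "entry": ["then", "else"],
    "then": ["merge"],
    "else": ["early_exit"],
    "merge": ["exit"],
    "early_exit": [],
    "exit": [],
}
ops = insert_barrier_ops(cfg, "entry", "merge")
print(ops)
```

In this example only the `else` block receives a cancel, since it is the first point at which a thread is known to miss the barrier at `merge`. Loops, rejoining after the barrier, and conflicts between overlapping barriers (claims 3, 8, 9, and 16 to 18) are outside what this sketch handles.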
Iterative single instructions for multiple data lanes [SIMD] · CPC title
controlled by a single instruction for multiple threads [SIMT] in parallel · CPC title
Divergence aspects · CPC title
controlled by a single instruction for multiple data lanes [SIMD] · CPC title
Barrier synchronisation · CPC title