Convergence among concurrently executing threads

US11847508B2 · US · B2

Patent metadata
Publication number: US-11847508-B2
Application number: US-202217819243-A
Country: US
Kind code: B2
Filing date: Aug 11, 2022
Priority date: Sep 11, 2018
Publication date: Dec 19, 2023
Grant date: Dec 19, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Convergence of threads executing common code sections is facilitated using instructions inserted at strategic locations in computer code sections. The inserted instructions enable the threads in a warp or other group to cooperate with a thread scheduler to promote thread convergence.

First claim

Opening claim text (preview).

What is claimed is:

1. A computer system comprising: one or more processors; and a memory comprising instructions that when executed by the one or more processors implement a code profiler configured to insert instructions into a code segment to configure a machine executing the code segment to: predict arrival of a thread executing the code segment at an execution barrier in the code segment; confirm that execution of the thread reached the execution barrier; cancel the predicted arrival of the thread at the execution barrier in a branch of the code segment that will not encounter the execution barrier.

2. The system of claim 1, wherein the code profiler is configured to insert at least one instruction predicting an eventual arrival of the thread at the execution barrier, the at least one instruction comprising an instruction to join the execution barrier.

3. The system of claim 1, the code profiler further configured to insert one or more instructions into the code segment to cause the thread to rejoin the execution barrier after execution of the thread resumes at an execution point subsequent to the execution barrier.

4. The system of claim 1, the inserted instructions further configuring the machine to: resume execution of threads suspended at the execution barrier on condition that no more threads are predicted to arrive at the execution barrier.

5. The system of claim 1, wherein the execution barrier is indicated by a code marker.

6. The system of claim 1, the code profile further configured to set a range in the code segment for which the prediction of arrival of the thread applies.

7. The system of claim 1, the code profiler further configured to insert one or more instructions into the code segment to: configure a threshold value; and configure the machine executing the instructions to resume execution of threads suspended at the execution barrier on condition that a number of the threads that have not yet arrived at the execution barrier satisfies the threshold value.

8. The system of claim 1, the code profiler comprising: logic to detect a conflict in a live range of two or more execution barriers, wherein one of the two or more execution barriers is located at a post-dominator of two or more divergent branches of the code segment; and logic to eliminate the one of the two or more execution barriers located at the post-dominator in response to detecting the conflict.

9. The system of claim 1, the code profiler comprising: logic to detect a conflict in a live range of two or more execution barriers, wherein one of the execution barriers is located at a post-dominator of two or more divergent branches of the code segment; and logic to insert one or more instructions into the code segment to cause the thread to cancel, upon reaching one of the two or more conflicting execution barriers, predicted arrival at other ones of the two or more conflicting execution barriers.

10. A method comprising: executing one or more instructions in a code segment to predict arrival of multiple threads executing the code segment at a first execution barrier in the code segment; suspending the execution of one or more of the threads that reach the first execution barrier while one or more of the threads that have not reached the first execution barrier are predicted to arrive at the first execution barrier; and executing a set of the one or more instructions, in a branch of the code segment that will not encounter the first execution barrier, canceling the predicted arrival at the first execution barrier of one or more of the threads that have not reached the first execution barrier.

11. The method of claim 10, further comprising resuming the execution of the one or more threads suspended at the first execution barrier on condition that all of the threads predicted to arrive at the first execution barrier have arrived.

12. The method of claim 10, further comprising resuming the execution of the one or more threads suspended at the first execution barrier on condition that a number of the one or more threads that remain predicted to arrive at the first execution barrier is less than a threshold level greater than zero.

13. The method of claim 10, further comprising: marking one or more ranges in the code segment for which predictions of arrival of the threads at the first execution barrier are operable.

14. The method of claim 10, further comprising: canceling arrival at the first execution barrier of threads that arrive at a second execution barrier that conflicts with the first execution barrier.

15. An apparatus comprising: one or more processors; a memory comprising instructions that when executed by the one or more processors result in the apparatus: executing one or more of the instructions in a code segment predicting arrival of a thread executing the code segment at an execution barrier in the code segment; suspending the execution of the thread at the execution barrier in a first execution branch of the code segment while one or more other threads are predicted to arrive at the execution barrier; and executing one or more of the instructions canceling, in branches of the code segment divergent from the first execution branch comprising the execution barrier, predictions that at least one of the other threads will arrive at the execution barrier.

16. The apparatus of claim 15, the execution barrier located in a variable iteration count loop of the code segment.

17. The apparatus of claim 15, the memory further comprising instructions that when executed by the one or more processors result in the apparatus: causing at least one of the multiple threads to rejoin the execution barrier at an execution point subsequent to the execution barrier.

18. The apparatus of claim 15, further comprising logic configured to carry out deconfliction of convergence barriers for non-identical overlapping ranges of the code segment.

19. A method comprising: executing one or more instructions in a code segment: predicting that a plurality of threads will arrive at an execution convergence point; canceling a predicted arrival of a first one or more of the threads at the execution convergence point in an execution branch that will not execute to the execution convergence point; and delaying execution of at least one of the threads that reached the execution convergence point at the execution convergence point until a configured number of the threads that have not canceled the predicted arrival at the execution convergence point, arrive at the execution convergence point.
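The predict / confirm / cancel behaviour recited in the claims can be illustrated with a small single-threaded bookkeeping model. This is only a sketch under stated assumptions, not the patented implementation: the class name ConvergenceBarrier, the method names join, arrive and cancel, and the max_outstanding parameter are hypothetical, and the release rule (release once the number of still-predicted arrivals is at or below a configured count) is one possible reading of the threshold language in claims 7 and 12.

```python
from dataclasses import dataclass, field


@dataclass
class ConvergenceBarrier:
    """Hypothetical bookkeeping model of one execution barrier.

    Assumption: waiting threads are released once the number of threads
    still predicted to arrive is <= max_outstanding (0 means "wait for all").
    """
    max_outstanding: int = 0
    expected: set = field(default_factory=set)  # predicted, not yet arrived
    waiting: set = field(default_factory=set)   # suspended at the barrier

    def join(self, tid: int) -> None:
        """Predict that thread `tid` will eventually reach the barrier."""
        self.expected.add(tid)

    def arrive(self, tid: int) -> list:
        """Confirm arrival of `tid` and suspend it until release."""
        self.expected.discard(tid)
        self.waiting.add(tid)
        return self._maybe_release()

    def cancel(self, tid: int) -> list:
        """Withdraw the prediction for a thread whose branch skips the barrier."""
        self.expected.discard(tid)
        return self._maybe_release()

    def _maybe_release(self) -> list:
        """Return the threads resumed by this event (empty if none)."""
        if self.waiting and len(self.expected) <= self.max_outstanding:
            released = sorted(self.waiting)
            self.waiting.clear()
            return released
        return []


if __name__ == "__main__":
    barrier = ConvergenceBarrier()
    for tid in range(4):
        barrier.join(tid)          # all four threads are predicted to arrive
    print(barrier.arrive(0))       # thread 0 waits
    print(barrier.arrive(1))       # thread 1 waits
    print(barrier.cancel(2))       # thread 2 took a branch that skips the barrier
    print(barrier.arrive(3))       # last predicted arrival releases the waiters
```

In this model, canceling a prediction is what prevents the waiting threads from stalling on a thread that will never arrive; setting max_outstanding above zero trades full convergence for earlier resumption.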

Assignees

Nvidia Corp

Inventors

Classifications

  • Iterative single instructions for multiple data lanes [SIMD] · CPC title

  • controlled by a single instruction for multiple threads [SIMT] in parallel · CPC title

  • Divergence aspects · CPC title

  • controlled by a single instruction for multiple data lanes [SIMD] · CPC title

  • G06F9/522 · Primary

    Barrier synchronisation · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11847508B2 cover?
Convergence of threads executing common code sections is facilitated using instructions inserted at strategic locations in computer code sections. The inserted instructions enable the threads in a warp or other group to cooperate with a thread scheduler to promote thread convergence.
Who is the assignee on this patent?
Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/522. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 19, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).