System and method for hardware scheduling of conditional barriers and impatient barriers

US9448803B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9448803-B2
Application number  US-201313794578-A
Country             US
Kind code           B2
Filing date         Mar 11, 2013
Priority date       Mar 11, 2013
Publication date    Sep 20, 2016
Grant date          Sep 20, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method and a system are provided for hardware scheduling of barrier instructions. Execution of a plurality of threads to process instructions of a program that includes a barrier instruction is initiated, and when each thread reaches the barrier instruction during execution of the program, it is determined whether the thread participates in the barrier instruction. The threads that participate in the barrier instruction are then serially executed to process one or more instructions of the program that follow the barrier instruction. A method and system are also provided for impatient scheduling of barrier instructions. When a portion of the threads that is greater than a minimum number of threads and less than all of the threads in the plurality of threads reaches the barrier instruction, each of the threads in the portion is serially executed to process one or more instructions of the program that follow the barrier instruction.
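
The conditional barrier described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the class `ConditionalBarrier`, the function `simulate`, and the "even threads participate" condition are all hypothetical names and choices made for illustration, and the model assumes the barrier waits for every thread to arrive before any participant runs.

```python
from dataclasses import dataclass, field


@dataclass
class ConditionalBarrier:
    """Toy software model of a conditional barrier (hypothetical, not the hardware)."""
    expected: int                      # threads that will eventually reach the barrier
    arrived: int = 0                   # threads that have reached it so far
    participants: list = field(default_factory=list)

    def arrive(self, thread_id, participates):
        """Record an arrival and whether this thread takes part in the barrier."""
        self.arrived += 1
        if participates:
            self.participants.append(thread_id)

    def ready(self):
        # Assumption: release only after every thread has reached the barrier.
        return self.arrived == self.expected

    def run_serially(self, body):
        """Run the post-barrier code once per participant, one thread at a time."""
        if not self.ready():
            raise RuntimeError("not all threads have reached the barrier")
        for thread_id in self.participants:
            body(thread_id)


def simulate(num_threads=8):
    barrier = ConditionalBarrier(expected=num_threads)
    for tid in range(num_threads):
        # Hypothetical participation condition: only even-numbered threads opt in.
        barrier.arrive(tid, participates=(tid % 2 == 0))
    log = []
    barrier.run_serially(log.append)   # the "instructions that follow the barrier"
    return log


if __name__ == "__main__":
    print(simulate())
```

Non-participating threads still count as arrivals in this model, which is one plausible reading of "it is determined whether the thread participates" when each thread reaches the barrier; the patent description may handle them differently.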

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: initiating execution of a plurality of threads to process instructions of a program that includes a barrier instruction; for each thread in the plurality of threads, determining whether the thread participates in the barrier instruction when the thread reaches the barrier instruction during execution of the thread; prior to executing each of the threads that participate in the barrier instruction, waiting for a portion of the threads that participate in the barrier instruction to reach the barrier instruction and for conditions that allow the barrier instruction to be scheduled as a first barrier to be met; and serially executing each of the threads that participate in the barrier instruction to process one or more instructions of the program that follow the barrier instruction, wherein during execution of each of the threads that participate in the barrier instruction an additional thread is identified as a late arriving thread that participates in the barrier instruction and that is not included in the portion of the threads and the additional thread is executed to process the one or more instructions of the program that follow the barrier instruction.

2. The method of claim 1, further comprising, prior to executing each of the threads that participate in the barrier instruction, waiting for a last arriving thread that participates in the barrier instruction, wherein the last arriving thread reaches the barrier instruction last relative to other threads in the plurality of threads that participate in the barrier instruction.

3. The method of claim 1, further comprising, after executing each of the threads in the portion of the threads that participate in the barrier instruction, invalidating a barrier identifier that corresponds to the barrier instruction.

4. The method of claim 1, wherein the conditions that allow the barrier instruction to be scheduled as the first barrier comprise the portion of the threads that participate in the barrier instruction including a number of threads that is greater than a minimum number of threads.

5. The method of claim 1, wherein the conditions that allow the barrier instruction to be scheduled as the first barrier comprise a timeout expiring.

6. The method of claim 1, wherein the barrier instruction delineates a code section.

7. The method of claim 6, further comprising waiting for a first thread to execute all instructions within the code section before selecting another thread to execute the code section.

8. The method of claim 1, further comprising associating the threads with logical identifiers that are mapped to physical identifiers, wherein the physical identifiers are referenced by a multi-threaded processing core during execution of the threads.

9. The method of claim 8, wherein executing each of the threads that participate in the barrier instruction comprises selecting the threads for serial execution based on the logical identifiers.

10. The method of claim 1, further comprising appending a tag to a barrier identifier corresponding to the barrier instruction to uniquely identify each occurrence of the barrier instruction when the barrier identifier is used in multiple places in the program.

11. A processing subsystem comprising: an instruction scheduling unit, configured to: receive instructions of a program for execution by a plurality of threads wherein the program includes a barrier instruction; when each thread reaches the barrier instruction during execution of the thread, determine whether the thread participates in the barrier instruction; and wait for a portion of the threads that participate in the barrier instruction to reach the barrier instruction, wherein the portion comprises a number of threads that is greater than or equal to a minimum number of threads; after waiting for the portion of the threads, enable the threads that participate in the barrier instruction to be selected for execution; and a multi-threaded processing core that is configured to serially execute each of the threads that participate in the barrier instruction to process one or more instructions of the program that follow the barrier instruction, wherein the instruction scheduling unit is further configured to, during the execution of the enabled threads that participate in the barrier instruction, identify an additional thread as a late arriving thread that participates in the barrier instruction and enable the additional thread to be selected for execution.

12. The processing subsystem of claim 11, wherein the instruction scheduling unit is further configured to, prior to enabling each of the threads that participate in the barrier instruction, wait for a last arriving thread that participates in the barrier instruction, wherein the last arriving thread reaches the barrier instruction last relative to other threads in the plurality of threads that participate in the barrier instruction.

13. The processing subsystem of claim 11, wherein the instruction scheduling unit is further configured to, after executing each thread in the portion of the threads that participate in the barrier instruction, invalidate a barrier identifier that corresponds to the barrier instruction.

14. The processing subsystem of claim 11, wherein the barrier instruction delineates a code section.

15. The processing subsystem of claim 14, wherein the instruction scheduling unit is further configured to wait for a first thread to execute all instructions within the code section before selecting another thread to execute the code section.

16. The processing subsystem of claim 11, wherein the threads are associated with logical identifiers that are mapped to physical identifiers, and wherein the physical identifiers are referenced by the multi-threaded processing core during execution of the threads that participate in the barrier instruction.

17. The processing subsystem of claim 16, wherein the instruction scheduling unit is further configured to select the threads that participate in the barrier instruction for serial execution based on the logical identifiers.
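
The impatient scheduling in the claims (release once enough participants have arrived or a timeout expires, then pick up late arrivals while the released threads are still running) can be sketched as a small software model. Everything here is an illustrative assumption rather than the claimed hardware: the class `ImpatientBarrier`, the method names, the `timed_out` flag, and the choice to run the lowest logical identifier first are hypothetical. The release test uses the "greater than a minimum number" wording of claim 4; claim 11 instead uses "greater than or equal to".

```python
class ImpatientBarrier:
    """Toy model of an impatient barrier (hypothetical; not the claimed hardware)."""

    def __init__(self, min_threads):
        self.min_threads = min_threads
        self.waiting = []       # logical IDs of participants waiting at the barrier
        self.released = False   # becomes True once the release condition is met

    def arrive(self, logical_id):
        """A participating thread reaches the barrier (possibly after release)."""
        self.waiting.append(logical_id)

    def step(self, timed_out=False):
        """Select one thread to run the post-barrier code; return its ID or None."""
        if not self.released:
            # Claim 4 wording: more than a minimum number of participants arrived.
            enough = len(self.waiting) > self.min_threads
            # Claim 5 wording: a timeout expiring also allows scheduling.
            if not (enough or (timed_out and self.waiting)):
                return None
            self.released = True
        if not self.waiting:
            return None
        # Assumption: serial selection in ascending logical-ID order.
        self.waiting.sort()
        return self.waiting.pop(0)


def simulate():
    barrier = ImpatientBarrier(min_threads=2)
    barrier.arrive(5)
    barrier.arrive(1)
    assert barrier.step() is None      # two arrivals is not more than the minimum
    barrier.arrive(3)
    order = [barrier.step()]           # released: lowest logical ID runs first
    barrier.arrive(0)                  # late-arriving thread joins mid-execution
    while (tid := barrier.step()) is not None:
        order.append(tid)
    return order


if __name__ == "__main__":
    print(simulate())
```

In this model a late arrival is simply appended to the waiting set and selected like any other thread, without a second wait, which mirrors the "late arriving thread" language of claims 1 and 11.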

Assignees

Nvidia Corp

Inventors

Classifications

  • Barrier synchronisation

  • to perform conditional operations, e.g. using predicates or guards

  • Synchronisation or serialisation instructions

  • Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues

  • Maintaining memory consistency

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9448803B2 cover?
A method and a system are provided for hardware scheduling of barrier instructions. Execution of a plurality of threads to process instructions of a program that includes a barrier instruction is initiated, and when each thread reaches the barrier instruction during execution of the program, it is determined whether the thread participates in the barrier instruction. The threads that participate in…
Who is the assignee on this patent?
Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/30087. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 20, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).