US9448803B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9448803-B2 |
| Application number | US-201313794578-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 11, 2013 |
| Priority date | Mar 11, 2013 |
| Publication date | Sep 20, 2016 |
| Grant date | Sep 20, 2016 |
Official abstract text for this publication.
A method and a system are provided for hardware scheduling of barrier instructions. Execution of a plurality of threads to process instructions of a program that includes a barrier instruction is initiated, and when each thread reaches the barrier instruction during execution of the program, it is determined whether the thread participates in the barrier instruction. The threads that participate in the barrier instruction are then serially executed to process one or more instructions of the program that follow the barrier instruction. A method and system are also provided for impatient scheduling of barrier instructions. When a portion of the threads that is greater than a minimum number of threads and less than all of the threads in the plurality of threads reaches the barrier instruction, each of the threads in the portion is serially executed to process one or more instructions of the program that follow the barrier instruction.
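The scheduling behavior summarized in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design the patent describes: the function name `schedule_barrier`, the `min_threads` parameter, the representation of arrivals as `(thread_id, participates)` pairs, and the choice to serialize threads in ascending thread-id order are all hypothetical.

```python
def schedule_barrier(arrivals, min_threads=None):
    """Toy model of one barrier (hypothetical names throughout).

    arrivals: list of (thread_id, participates) pairs, in the order the
        threads reach the barrier instruction.
    min_threads: None models the patient case (wait until every thread
        has reached the barrier); an integer models impatient scheduling,
        where a portion larger than min_threads is released early.

    Returns (portion, late):
        portion: thread ids released together, in serial execution order.
        late: participating threads that reached the barrier after the
            portion was released.
    """
    waiting, portion, late = [], None, []
    for tid, participates in arrivals:
        if not participates:
            continue  # non-participants are not held at the barrier
        if portion is not None:
            late.append(tid)  # barrier already released: late arrival
            continue
        waiting.append(tid)
        if min_threads is not None and len(waiting) > min_threads:
            # Impatient release: enough participants have arrived.
            portion = sorted(waiting)  # assumed serial order: by thread id
    if portion is None:
        # Patient release: every thread has reached the barrier.
        portion = sorted(waiting)
    return portion, late


arrivals = [(3, True), (0, True), (1, False), (2, True)]
patient = schedule_barrier(arrivals)
impatient = schedule_barrier(arrivals, min_threads=1)
```

With these arrivals, the patient call releases threads 0, 2, and 3 together, while the impatient call releases threads 0 and 3 as soon as two participants are waiting and reports thread 2 as a late arrival.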
Opening claim text (preview).
What is claimed is:
1. A method comprising: initiating execution of a plurality of threads to process instructions of a program that includes a barrier instruction; for each thread in the plurality of threads, determining whether the thread participates in the barrier instruction when the thread reaches the barrier instruction during execution of the thread; prior to executing each of the threads that participate in the barrier instruction, waiting for a portion of the threads that participate in the barrier instruction to reach the barrier instruction and for conditions that allow the barrier instruction to be scheduled as a first barrier to be met; and serially executing each of the threads that participate in the barrier instruction to process one or more instructions of the program that follow the barrier instruction, wherein during execution of each of the threads that participate in the barrier instruction an additional thread is identified as a late arriving thread that participates in the barrier instruction and that is not included in the portion of the threads and the additional thread is executed to process the one or more instructions of the program that follow the barrier instruction.
2. The method of claim 1, further comprising, prior to executing each of the threads that participate in the barrier instruction, waiting for a last arriving thread that participates in the barrier instruction, wherein the last arriving thread reaches the barrier instruction last relative to other threads in the plurality of threads that participate in the barrier instruction.
3. The method of claim 1, further comprising, after executing each of the threads in the portion of the threads that participate in the barrier instruction, invalidating a barrier identifier that corresponds to the barrier instruction.
4. The method of claim 1, wherein the conditions that allow the barrier instruction to be scheduled as the first barrier comprise the portion of the threads that participate in the barrier instruction including a number of threads that is greater than a minimum number of threads.
5. The method of claim 1, wherein the conditions that allow the barrier instruction to be scheduled as the first barrier comprise a timeout expiring.
6. The method of claim 1, wherein the barrier instruction delineates a code section.
7. The method of claim 6, further comprising waiting for a first thread to execute all instructions within the code section before selecting another thread to execute the code section.
8. The method of claim 1, further comprising associating the threads with logical identifiers that are mapped to physical identifiers, wherein the physical identifiers are referenced by a multi-threaded processing core during execution of the threads.
9. The method of claim 8, wherein executing each of the threads that participate in the barrier instruction comprises selecting the threads for serial execution based on the logical identifiers.
10. The method of claim 1, further comprising appending a tag to a barrier identifier corresponding to the barrier instruction to uniquely identify each occurrence of the barrier instruction when the barrier identifier is used in multiple places in the program.
11. A processing subsystem comprising: an instruction scheduling unit, configured to: receive instructions of a program for execution by a plurality of threads wherein the program includes a barrier instruction; when each thread reaches the barrier instruction during execution of the thread, determine whether the thread participates in the barrier instruction; and wait for a portion of the threads that participate in the barrier instruction to reach the barrier instruction, wherein the portion comprises a number of threads that is greater than or equal to a minimum number of threads; after waiting for the portion of the threads, enable the threads that participate in the barrier instruction to be selected for execution; and a multi-threaded processing core that is configured to serially execute each of the threads that participate in the barrier instruction to process one or more instructions of the program that follow the barrier instruction, wherein the instruction scheduling unit is further configured to, during the execution of the enabled threads that participate in the barrier instruction, identify an additional thread as a late arriving thread that participates in the barrier instruction and enable the additional thread to be selected for execution.
12. The processing subsystem of claim 11, wherein the instruction scheduling unit is further configured to, prior to enabling each of the threads that participate in the barrier instruction, wait for a last arriving thread that participates in the barrier instruction, wherein the last arriving thread reaches the barrier instruction last relative to other threads in the plurality of threads that participate in the barrier instruction.
13. The processing subsystem of claim 11, wherein the instruction scheduling unit is further configured to, after executing each thread in the portion of the threads that participate in the barrier instruction, invalidate a barrier identifier that corresponds to the barrier instruction.
14. The processing subsystem of claim 11, wherein the barrier instruction delineates a code section.
15. The processing subsystem of claim 14, wherein the instruction scheduling unit is further configured to wait for a first thread to execute all instructions within the code section before selecting another thread to execute the code section.
16. The processing subsystem of claim 11, wherein the threads are associated with logical identifiers that are mapped to physical identifiers, and wherein the physical identifiers are referenced by the multi-threaded processing core during execution of the threads that participate in the barrier instruction.
17. The processing subsystem of claim 11, wherein the instruction scheduling unit is further configured to select the threads that participate in the barrier instruction for serial execution based on the logical identifiers.
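Two of the dependent claims lend themselves to a short illustration: selecting threads for serial execution by logical identifier while the core references physical identifiers (claims 8, 9, 16, and 17), and appending a tag to a barrier identifier so that each occurrence is distinguishable (claim 10). The sketch below is a software analogy only. The names `serial_order` and `BarrierTagger`, the dictionary used for the logical-to-physical mapping, the ascending-order selection policy, and the counter-based tag are all assumptions made for illustration; the claims do not specify any of them.

```python
from collections import defaultdict


def serial_order(logical_to_physical, participants):
    """Select participants by logical id, return the physical ids to run.

    logical_to_physical: dict mapping logical thread id -> physical id
        (hypothetical representation of the mapping).
    participants: iterable of logical ids that participate in the barrier.
    Ascending logical-id order is an assumed selection policy.
    """
    return [logical_to_physical[lid] for lid in sorted(participants)]


class BarrierTagger:
    """Appends an occurrence tag to a barrier identifier (assumed scheme)."""

    def __init__(self):
        # Next unused tag for each barrier identifier, starting at 0.
        self._next_tag = defaultdict(int)

    def tag(self, barrier_id):
        """Return (barrier_id, tag), unique per occurrence of barrier_id."""
        tag = self._next_tag[barrier_id]
        self._next_tag[barrier_id] += 1
        return (barrier_id, tag)


mapping = {0: 7, 1: 4, 2: 9}          # logical id -> physical id
order = serial_order(mapping, {2, 0})  # threads 0 and 2 participate

tagger = BarrierTagger()
first = tagger.tag(0)   # barrier identifier 0, first occurrence
second = tagger.tag(0)  # same identifier used at another place
other = tagger.tag(1)   # a different barrier identifier
```

Here the scheduler reasons about logical ids 0 and 2 but hands physical ids 7 and 9 to the core, and reusing barrier identifier 0 at two places yields two distinct tagged identifiers.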
Barrier synchronisation · CPC title
to perform conditional operations, e.g. using predicates or guards · CPC title
Synchronisation or serialisation instructions · CPC title
Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues · CPC title
Maintaining memory consistency · CPC title