Memory transaction having implicit ordering effects
US-2015370500-A1 · Dec 24, 2015 · US
US9652300B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9652300-B2 |
| Application number | US-201213535976-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 28, 2012 |
| Priority date | Jun 28, 2012 |
| Publication date | May 16, 2017 |
| Grant date | May 16, 2017 |
Official abstract text for this publication.
Systems and methods for the processing of EU threads (also known as warps) in a thread group. The status of each EU thread in the group may be monitored, to determine if it is executing or if it is halted and waiting at a synchronization barrier. If certain threshold conditions are met, the waiting EU threads may be preempted to allow execution of threads from another thread group. The threshold conditions may include a minimum number of EUs in use, a minimum number of EU threads in the first thread group that are waiting at the synchronization barrier and/or a maximum number of EU threads that are still executing, and a minimum wait time for one or more of the EU threads waiting at the barrier.
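The threshold check described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `Thresholds`, `GroupStatus`, and `should_preempt` are hypothetical, the abstract gives no concrete values, and requiring all four conditions at once is an assumption (the abstract joins them with "and/or").

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    # Hypothetical field names; the abstract lists the conditions but no values.
    min_eus_in_use: int          # minimum number of EUs in use
    min_waiting_threads: int     # minimum EU threads waiting at the barrier
    max_executing_threads: int   # maximum EU threads still executing
    min_wait_time: float         # minimum wait time at the barrier


@dataclass
class GroupStatus:
    # Monitored status of the first thread group (hypothetical structure).
    eus_in_use: int
    waiting_threads: int
    executing_threads: int
    longest_wait: float


def should_preempt(status: GroupStatus, t: Thresholds) -> bool:
    """Return True if the waiting EU threads may be preempted.

    Assumption: every condition must hold. A real scheduler might
    combine only some of them.
    """
    return (
        status.eus_in_use >= t.min_eus_in_use
        and status.waiting_threads >= t.min_waiting_threads
        and status.executing_threads <= t.max_executing_threads
        and status.longest_wait >= t.min_wait_time
    )
```

For example, with thresholds of 4 EUs in use, 4 waiting threads, at most 2 executing threads, and a wait of 1.0 time units, a group with 8 EUs in use, 6 waiting threads, 2 executing threads, and a longest wait of 5.0 would qualify for preemption under this sketch.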
Opening claim text (preview).
What is claimed is:
1. A method, comprising: processing a first thread group over a set of single instruction multiple data (SIMD) execution units of a SIMD processor; idling threads of the first thread group as the respective threads reach a synchronization barrier; and selectively processing a second thread group on a first subset of the set of SIMD execution units for which threads of the first group are idled while one or more non-idle threads of the first thread group continue executing on a second subset of the set of SIMD execution units, based on a threshold related to one or more of demand for SIMD execution units of the SIMD processor and availability of the SIMD execution units of the SIMD processor; wherein the selectively processing includes, setting the threshold dynamically based on a context of the first thread group, and processing the second thread group on the first subset of the set of SIMD execution units if a difference between a number of SIMD execution units of the SIMD processor and a number of threads of the first thread group exceeds a minimum threshold difference.
2. The method of claim 1, wherein the selectively processing further includes: processing the second thread group on the first subset of the set of SIMD execution units if a number of threads of the first thread group exceeds a minimum threshold.
3. The method of claim 1, wherein the selectively processing further includes: processing the second thread group on the first subset of the subset of SIMD execution units if a number of non-idle threads of the first thread group exceeds a minimum threshold.
4. The method of claim 1, wherein the selectively processing further includes: processing the second thread group on the first subset of the set of SIMD execution units if a duration for which one or more threads of the first thread group is idle exceeds a minimum threshold duration.
5. The method of claim 4, wherein the selectively processing further includes: processing the second thread group on the first subset of the set of SIMD execution units if an average duration for which multiple threads of the first thread group are idle exceeds the minimum threshold duration.
6. The method of claim 1, wherein the selectively processing further includes one or more of: setting the threshold based further on user input; and setting the threshold with scheduling logic.
7. An apparatus, comprising, a single instruction multiple data (SIMD) processor and a thread manager, wherein the thread manager is configured to: assign a first thread group over a set of SIMD execution units of the SIMD processor; idle threads of the first thread group as the respective threads reach a synchronization barrier; and selectively assign a second thread group to a first subset of the set of SIMD execution units at which threads of the first group are idled while one or more non-idle threads of the first thread group continue executing on a second subset of the set of SIMD execution units, based on a threshold related to one or more of demand for SIMD execution units of the SIMD processor and availability of the SIMD execution units of the SIMD processor, including to, set the threshold dynamically based on a context of the first thread group; and process the second thread group on the first subset of the set of SIMD execution units if a difference between a number of SIMD execution units of the SIMD processor and a number of threads of the first thread group exceeds a minimum threshold difference.
8. The apparatus of claim 7, wherein the thread manager is further configured to: assign the second thread group to the first subset of the set of SIMD execution units if a number of threads of the first thread group exceeds a minimum threshold.
9. The apparatus of claim 7, wherein the thread manager is further configured to: assign the second thread group to the first subset of the set of SIMD execution units if a number of non-idle threads of the first thread group exceeds a minimum threshold.
10. The apparatus of claim 7, wherein the thread manager is further configured to: assign the second thread group to the first subset of the set of SIMD execution units if a duration for which one or more threads of the first thread group is idle exceeds a minimum threshold duration.
11. The apparatus of claim 10, wherein the thread manager is further configured to: assign the second thread group to the first subset of the set of SIMD execution units if an average duration for which multiple threads of the first thread group are idled exceeds the minimum threshold duration.
12. The apparatus of claim 7, wherein the thread manager is further configured to perform one or more of: set the threshold based further on user input; and set the threshold with scheduling logic.
13. The apparatus of claim 7, wherein the thread manager is further configured to determine when threads of the first thread group reach the synchronization barrier based on status information provided by the respective threads.
14. A non-transitory computer readable medium encoded with a computer program that includes instructions to cause a single instruction multiple data (SIMD) processor to: process a first thread group over a set of SIMD execution units of the SIMD processor; idle threads of the first thread group as the respective reach a synchronization barrier; and selectively process a second thread group on a first subset of the set of SIMD execution units for which threads of the first group are idled while one or more non-idle threads of the first thread group continue executing on a second subset of the set of SIMD execution units, based on a threshold related to one or more of demand for SIMD execution units of the SIMD processor and availability of the SIMD execution units of the SIMD processor, including to, set the threshold dynamically based on a context of the first thread group, and process the second thread group on the first subset of the set of SIMD execution units if a difference between a number of SIMD execution units of the SIMD processor and a number of threads of the first thread group exceeds a minimum threshold difference.
15. The non-transitory computer readable medium of claim 14, further including instructions to cause the SIMD processor to: process the second thread group on the first subset of the set of SIMD execution units a number of threads of the first thread group exceeds a minimum threshold.
16. The non-transitory computer readable medium of claim 14, further including instructions to cause the SIMD processor to: process the second thread group on the first subset of the set of SIMD execution units if a number of non-idle threads of the first thread group exceeds a minimum threshold.
17. The non-transitory computer readable medium of claim 14, further including instructions to cause the SIMD processor to: process the second thread group on the first subset of the set of SIMD execution units if a duration for which one or more threads of the first thread group is idled exceeds a minimum threshold duration.
18. The non-transitory computer readable medium of claim 17, further including instructions to cause the SIMD processor to: process the second thread group on the first subset of the set of SIMD execution units if an average duration for which multiple threads of the first thread group are idle exceeds the minimum threshold duration.
19. The non-transitory computer readable medium of claim 14, further including instructi
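As an informal reading aid, the numeric tests recited in claims 1, 4, and 5 might be expressed as below. This is a sketch under stated assumptions, not a construction of the claims: the function names are hypothetical, "exceeds" is read as a strict greater-than comparison, and "multiple threads" in claim 5 is read as at least two idle threads.

```python
from statistics import mean
from typing import Sequence


def unit_thread_difference_exceeds(
    num_execution_units: int, num_group_threads: int, min_difference: int
) -> bool:
    """Claim 1 test (as read here): the difference between the number of
    SIMD execution units and the number of threads of the first thread
    group exceeds a minimum threshold difference."""
    return (num_execution_units - num_group_threads) > min_difference


def any_idle_duration_exceeds(
    idle_durations: Sequence[float], min_duration: float
) -> bool:
    """Claim 4 test (as read here): one or more threads have been idle
    for longer than the minimum threshold duration."""
    return any(d > min_duration for d in idle_durations)


def average_idle_duration_exceeds(
    idle_durations: Sequence[float], min_duration: float
) -> bool:
    """Claim 5 test (as read here): the average idle duration across
    multiple threads exceeds the minimum threshold duration.
    Assumption: 'multiple' means at least two idle threads."""
    return len(idle_durations) >= 2 and mean(idle_durations) > min_duration
```

With 16 execution units, 8 threads in the first group, and a minimum difference of 4, the claim 1 test passes in this sketch because 16 - 8 = 8 is greater than 4.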
Barrier synchronisation · CPC title
from multiple instruction streams, e.g. multistreaming · CPC title
considering the load · CPC title
controlled by a single instruction for multiple threads [SIMT] in parallel · CPC title