Memory transaction having implicit ordering effects
US-2015370500-A1 · Dec 24, 2015 · US
US9424099B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9424099-B2 |
| Application number | US-201213672291-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 8, 2012 |
| Priority date | Jun 1, 2012 |
| Publication date | Aug 23, 2016 |
| Grant date | Aug 23, 2016 |
Official abstract text for this publication.
Disclosed methods, systems, and computer program products embodiments include synchronizing a group of workitems on a processor by storing a respective program counter associated with each of the workitems, selecting at least one first workitem from the group for execution, and executing the selected at least one first workitem on the processor. The selecting is based upon the respective stored program counter associated with the at least one first workitem.
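The selection step described in the abstract can be illustrated with a minimal sketch. It assumes the stored program counters are held in a plain dictionary keyed by workitem id, and that the scheduler picks either the lowest stored value or the most common (mode) value among workitems that are not currently waiting. The function name `select_workitems`, the `waiting` set, and the tie-breaking rule for the mode policy are hypothetical and are not taken from the patent text.

```python
from collections import Counter


def select_workitems(stored_pcs, waiting=frozenset(), policy="lowest"):
    """Pick the workitems to run next from their stored program counters.

    stored_pcs: dict mapping workitem id -> stored program counter (assumed layout)
    waiting:    ids of workitems blocked at a synchronization or convergence point
    policy:     "lowest" picks the smallest stored PC, "mode" the most common one
    """
    # Only workitems that are not waiting are candidates for selection.
    runnable = {wi: pc for wi, pc in stored_pcs.items() if wi not in waiting}
    if not runnable:
        return []

    if policy == "lowest":
        target = min(runnable.values())
    elif policy == "mode":
        counts = Counter(runnable.values())
        best = max(counts.values())
        # Assumption: ties between equally common PCs go to the lower PC.
        target = min(pc for pc, n in counts.items() if n == best)
    else:
        raise ValueError(f"unknown policy: {policy}")

    # Every runnable workitem sitting at the chosen PC is selected together.
    return sorted(wi for wi, pc in runnable.items() if pc == target)


stored = {0: 0x10, 1: 0x20, 2: 0x10, 3: 0x20, 4: 0x20}
lowest_first = select_workitems(stored)                  # workitems at PC 0x10
most_common = select_workitems(stored, policy="mode")    # workitems at PC 0x20
```

Selecting every workitem that shares the chosen program counter lets them execute the same instruction stream together, which is the situation a SIMD unit handles efficiently.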
Opening claim text (preview).
What is claimed is:

1. A method of synchronizing a group of a plurality of workitems on a processor, each of the plurality of workitems being associated with a program counter, the method comprising: determining a divergent control flow point, a synchronization control point, and a convergence point associated with at least one of the plurality of workitems; writing a value of the program counter of the determined divergent control flow point, synchronization point, and convergence point to a memory location associated with the at least one of the plurality of workitems; and selecting at least one first workitem from the plurality of workitems based upon a comparison of values of the program counter that have been written to memory, wherein the selecting includes executing divergent control flow in one or more of the workitems based at least on the comparison.
2. The method of claim 1, further comprising: storing the respective program counter associated with each of the workitems.
3. The method of claim 2, wherein the storing comprises: halting execution of at least one of the workitems upon reaching a convergence point or a synchronization point; and writing a value of a program counter of the halted at least one of the workitems to a memory location.
4. The method of claim 2, wherein the storing comprises: storing the respective program counter only at one or more selected points in respective instruction streams.
5. The method of claim 4, wherein the one or more selected points include only one or more of divergent control flow points, synchronization points, and convergence points.
6. The method of claim 1, wherein the selecting at least one first workitem comprises: selecting the at least one first workitem from the group based upon the value of the stored program counter associated with the at least one first workitem; and executing the selected at least one workitem.
7. The method of claim 1, further comprising: reaching a synchronization point in executing at least one second workitem from the group of workitems; determining, based upon values stored in one or more synchronization tracking registers, that a synchronization condition corresponding to the synchronization point is not satisfied; updating the one or more synchronization tracking registers to indicate reaching the synchronization point; and causing the at least one second workitem to wait upon the synchronization point, wherein the storing a respective program counter includes storing a program counter associated with the at least one second workitem.
8. The method of claim 7, wherein the selecting further comprises: determining that the at least one first workitem is not currently waiting upon a synchronization point.
9. The method of claim 7, wherein the predetermined characteristic is a lowest value relative to the other stored program counters.
10. The method of claim 7, wherein the predetermined characteristic is a mode value.
11. The method of claim 1, further comprising: reaching a convergence point in executing at least one second workitem from the group of workitems; determining, based upon values stored in one or more convergence tracking registers, that a convergence condition corresponding to the convergence point is not satisfied; updating the one or more convergence tracking registers to indicate reaching the convergence point; and causing the at least one second workitem to wait upon the convergence point, wherein the storing a respective program counter includes storing a program counter associated with the at least one second workitem.
12. The method of claim 1, wherein the selecting comprises: comparing the stored program counter associated with the at least one first workitem to one or more other said stored program counters; and determining the stored program counter associated with the at least one first workitem as having a predetermined characteristic relative to the one or more other stored program counters.
13. The method of claim 1, wherein the group of workitems is a workgroup and the workgroup is executing in a processing element of a single instruction multiple data (SIMD) processing unit.
14. The method of claim 1, wherein the group includes workitems from two or more wavefronts of a workgroup executing in a SIMD processing unit.
15. The method of claim 1, wherein control flow of one or more of the workitems include instructions included from a library function.
16. A system, comprising: a processor; a group of a plurality of workitems executing on the processor, each of the plurality of workitems being associated with a program counter; and a divergent flow synchronization module that, in response to being executed by the processor, is configured to cause the processor to: determine divergent control flow point, a synchronization control point, and a convergence point associated with at least one of the plurality of workitems, write a value of the program counter of the determined divergent control flow point, synchronization point, and convergence point to a memory location associated with the at least one of the plurality of workitems, and select at least one first workitem from the group to execute based upon a comparison of values of the program counter that have been written to memory, wherein the selecting includes executing divergent control flow in one or more of the workitems based at least on the comparison.
17. The system of claim 16, wherein the processor is a vector processor.
18. The system of claim 16, wherein the divergent flow synchronization module is further configured to select the at least one first workitem for execution by: comparing the program counter associated with the at least one first workitem to one or more other program counters that have been written to the memory; and determining the program counter associated with the at least one first workitem as having a predetermined characteristic relative to the one or more other respective program counters.
19. The system of claim 16, wherein the divergent flow synchronization module is further configured to cause the processor to: store the respective program counter associated with each of the workitems.
20. The system of claim 16, wherein the divergent flow synchronization module is further configured to cause the processor to: halt execution of at least one of the workitems upon reaching a convergence point or a synchronization point; and write a value of a program counter of the halted at least one of the workitems to a memory location.
21. The system of claim 16, wherein the divergent flow synchronization module is further configured to cause the processor to: store the respective program counter only at one or more selected points in respective instruction streams.
22. The system of claim 16, wherein the divergent flow synchronization module is further configured to cause the processor to: reach a synchronization point in executing at least one second workitem from the group of workitems; determine, based upon values stored in one or more synchronization tracking registers, that a synchronization condition corresponding to the synchronization point is not satisfied; update the one or more synchronization tracking registers to indicate reaching the synchronization point; and cause the at least one second workitem to wait upon the synchronization point, wherein the storing a respective program counter includes st
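
The synchronization-point handling recited in the claims (update a tracking register, store the workitem's program counter, and make the workitem wait while the condition is unsatisfied) might be sketched as follows. This is an illustrative model only: the `SyncPoint` class, the use of a Python set in place of hardware tracking registers, the "all expected workitems have arrived" condition, and the release step are assumptions that do not come from the claim text.

```python
class SyncPoint:
    """Hypothetical stand-in for the synchronization tracking registers."""

    def __init__(self, expected):
        self.expected = expected   # assumed condition: this many workitems must arrive
        self.arrived = set()       # ids of workitems that have reached the point


def reach_sync_point(sync, workitem, pc, stored_pcs, waiting):
    """Record that `workitem` reached the synchronization point at `pc`.

    Returns True if the assumed condition is now satisfied, False if the
    workitem has to wait.
    """
    sync.arrived.add(workitem)     # update the tracking state
    stored_pcs[workitem] = pc      # store the workitem's program counter

    if len(sync.arrived) < sync.expected:
        waiting.add(workitem)      # condition not satisfied: workitem waits
        return False

    # Assumed release step: once everyone has arrived, nobody waits any more.
    waiting.difference_update(sync.arrived)
    sync.arrived.clear()
    return True


sync = SyncPoint(expected=3)
stored_pcs, waiting = {}, set()
first = reach_sync_point(sync, 0, 0x40, stored_pcs, waiting)    # must wait
second = reach_sync_point(sync, 1, 0x40, stored_pcs, waiting)   # must wait
third = reach_sync_point(sync, 2, 0x40, stored_pcs, waiting)    # releases all
```

Keeping the set of waiting workitems separate from the stored program counters lets a selection step skip blocked workitems without losing the point at which each one should resume.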