US9830156B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9830156-B2 |
| Application number | US-201113209189-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 12, 2011 |
| Priority date | Aug 12, 2011 |
| Publication date | Nov 28, 2017 |
| Grant date | Nov 28, 2017 |
Official abstract text for this publication.
One embodiment of the present invention sets forth a technique for optimizing parallel thread execution in a temporal single-instruction multiple thread (SIMT) architecture. When the threads in a parallel thread group execute temporally on a common processing pipeline rather than spatially on parallel processing pipelines, execution cycles may be reduced when some threads in the parallel thread group are inactive due to divergence. Similarly, an instruction can be dispatched for execution by only one thread in the parallel thread group when the threads in the parallel thread group are executing a scalar instruction. Reducing the number of threads that execute an instruction removes unnecessary or redundant operations for execution by the processing pipelines. Information about scalar operands and operations and divergence of the threads is used in the instruction dispatch logic to eliminate unnecessary or redundant activity in the processing pipelines.
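The cycle savings described in the abstract can be illustrated with a small cost model. The Python sketch below is only an illustration under stated assumptions, not the dispatch logic of the patent: the function name `issue_slots`, the boolean `active_mask` list, and the rule that each dispatched thread occupies one issue slot on the shared pipeline are all hypothetical.

```python
from typing import Sequence


def issue_slots(active_mask: Sequence[bool], is_scalar: bool) -> int:
    """Count pipeline issue slots for one instruction of a thread group.

    Hypothetical cost model: on a temporal SIMT pipeline, each dispatched
    thread occupies one issue slot on the common processing pipeline.
    """
    active = sum(1 for is_active in active_mask if is_active)
    if active == 0:
        return 0        # no active thread, nothing to dispatch
    if is_scalar:
        return 1        # one active thread executes on behalf of the group
    return active       # threads made inactive by divergence are skipped


# A thread group of 8 threads in which divergence left 3 threads active.
mask = [True, False, True, False, False, True, False, False]
baseline = len(mask)                     # every thread issued, no optimization
vector_cost = issue_slots(mask, False)   # only the active threads are issued
scalar_cost = issue_slots(mask, True)    # a single thread is issued
```

Under this model the same instruction costs 8 slots without the optimization, 3 slots when inactive threads are skipped, and 1 slot when it is scalar. The saving arises only because the threads share one pipeline over time; with spatially parallel pipelines, skipping a lane would not shorten execution.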
Opening claim text (preview).
The invention claimed is:
1. A method of executing an instruction for a thread group, the method comprising: receiving, by a single-instruction multiple-thread (SIMT) processor, the instruction for execution by the thread group comprising a plurality of threads, wherein the instruction includes one or more flags indicating that the instruction includes at least one of a scalar opcode and a scalar operand; evaluating the one or more flags included in the instruction to identify the instruction as a scalar instruction; and in response to identifying the instruction as a scalar instruction, dispatching, by the SIMT processor, the scalar instruction for execution by a portion of the threads in the thread group, wherein the portion of threads comprises at least one but not all threads in the thread group.
2. The method of claim 1, wherein the evaluating includes identification of a source operand as a scalar operand.
3. The method of claim 1, wherein the evaluating comprises identifying the instruction as a scalar instruction based on when an opcode included in the instruction is a scalar opcode.
4. The method of claim 1, wherein source operands included in the instruction are scalar operands.
5. The method of claim 1, wherein the evaluating comprises identifying the instruction as a scalar instruction based on operands included in the instruction.
6. The method of claim 1, wherein the evaluating comprises identifying a source operand included in the instruction as a scalar operand that is read from one source operand register for all of the threads in the thread group.
7. The method of claim 1, further comprising reading a source operand included in the instruction from a source operand register only for a first thread in the thread group that is active when a first flag included in the one or more flags indicates that the source operand is a scalar operand.
8. The method of claim 1, wherein the portion of the threads in the thread group includes only threads in the thread group that are active based on divergence information.
9. The method of claim 1, wherein the portion of threads comprises a single thread in the thread group.
10. The method of claim 1, wherein: due to divergence, at least one thread in the thread group is inactive and at least one thread in the thread group is active; and the portion of threads comprises a single active thread in the thread group.
11. The method of claim 1, wherein the evaluating comprises identifying the instruction as a scalar instruction based on at least one of a first determination that an operand included in the instruction is a scalar operand and a second determination that an identifier included in the instruction indicates that the instruction is scalar.
12. The method of claim 1, further comprising: storing in one or more registers a result of the execution by the portion of the threads in the thread group; and accessing the result stored in the one or more registers by a second portion of the threads in the thread group, wherein the second portion of the threads in the thread group is not included in the first portion of the threads in the thread group.
13. The method of claim 1, wherein: the divergence information for the thread group indicates at least one active thread and at least one inactive thread in the thread group; and the at least one active thread from the thread group is selected for executing the scalar instruction.
14. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to execute an instruction for a thread group, by performing the steps of: receiving the instruction for execution by the thread group comprising a plurality of threads, wherein the instruction includes one or more flags indicating that the instruction includes at least one of a scalar opcode and a scalar operand; evaluating the one or more flags included in the instruction to identify the instruction as a scalar instruction; and in response to identifying the instruction as a scalar instruction, dispatching the scalar instruction for execution by a portion of the threads in the thread group, wherein the portion of threads comprises at least one but not all threads in the thread group.
15. The non-transitory computer-readable storage medium of claim 14, wherein evaluating comprises identifying the instruction as a scalar instruction when a first flag included in the one or more flags indicates that an opcode included in the instruction is a scalar opcode.
16. The non-transitory computer-readable storage medium of claim 14, wherein evaluating comprises identifying the instruction as a scalar instruction when the one or more flags indicate that all destination operands included in the instruction are scalar operands.
17. A system for executing instructions, the system comprising: a memory that is configured to store instructions for execution by threads; and a single-instruction multiple-thread (SIMT) processor that is configured to: receive an instruction for execution by a thread group comprising a plurality of threads, wherein the instruction includes one or more flags indicating that the instruction includes at least one of a scalar opcode and a scalar operand; evaluate the one or more flags included in the instruction to identify the instruction as a scalar instruction; and in response to identifying the instruction as a scalar instruction, dispatch the scalar instruction for execution by a portion of the threads in the thread group, wherein the portion of threads comprises at least one but not all threads in the thread group.
18. The system of claim 17, wherein the SIMT processor is further configured to identify a source operand as a scalar operand.
19. The system of claim 17, wherein the SIMT processor is further configured to identify the instruction as a scalar instruction when an opcode included in the instruction is a scalar opcode.
20. The system of claim 17, wherein all source operands included in the instruction are scalar operands.
21. The system of claim 17, wherein the SIMT processor is further configured to identify the instruction as a scalar instruction based on operands included in the instruction.
22. The system of claim 17, wherein the SIMT processor is further configured to identify a source operand included in the instruction as a scalar operand that is read from one source operand register for all of the threads in the thread group.
23. The system of claim 17, wherein the SIMT processor is further configured to read a source operand included in the instruction from a source operand register only for a first thread in the thread group that is active when a first flag included in the one or more flags indicates that the source operand is a scalar operand.
24. The system of claim 17, wherein the SIMT processor is further configured to write a destination operand register only once when a first flag included in the one or more flags indicates that the destination operand is a scalar operand.
25. The system of claim 17, wherein the portion of the threads in the thread group includes only threads in the thread group that are active based on divergence information.
26. A method of executing an instruction across a thread group comprising a plurality of threads, the method comprising: receiving, by a processor, the instruction for execution across
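The flag evaluation and dispatch steps recited in the independent claims, together with the shared result of claim 12, might be sketched as follows. This is a hedged illustration, not the claimed implementation: the flag bit values, the `Instruction` class, the `select_threads` and `dispatch` helpers, and the dictionary used as a shared register are all hypothetical names chosen for the example, and treating either flag as sufficient to mark an instruction scalar is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Hypothetical flag encoding; the claims only recite "one or more flags".
FLAG_SCALAR_OPCODE = 0b01
FLAG_SCALAR_OPERAND = 0b10


@dataclass
class Instruction:
    opcode: str
    flags: int


def is_scalar(instr: Instruction) -> bool:
    """Evaluate the flags to identify a scalar instruction (assumed rule)."""
    return bool(instr.flags & (FLAG_SCALAR_OPCODE | FLAG_SCALAR_OPERAND))


def select_threads(instr: Instruction, active_mask: Sequence[bool]) -> List[int]:
    """Pick the threads to dispatch: one active thread if scalar, else all active."""
    active = [tid for tid, is_active in enumerate(active_mask) if is_active]
    if is_scalar(instr):
        return active[:1]
    return active


def dispatch(
    instr: Instruction,
    active_mask: Sequence[bool],
    compute: Callable[[int], int],
    shared_regs: Dict[str, int],
    dest: str,
) -> Dict[int, int]:
    """Execute `compute` on the selected threads and return results by thread id.

    For a scalar instruction the single result is written once to a shared
    register so that threads which did not execute can still read it.
    """
    threads = select_threads(instr, active_mask)
    results = {tid: compute(tid) for tid in threads}
    if is_scalar(instr) and threads:
        shared_regs[dest] = results[threads[0]]
    return results


# Divergence left threads 1 and 2 active in a group of four.
mask = [False, True, True, False]
regs: Dict[str, int] = {}

scalar_results = dispatch(
    Instruction("MOV_IMM", FLAG_SCALAR_OPCODE), mask, lambda tid: 42, regs, "r0"
)
vector_results = dispatch(
    Instruction("MUL_TID", 0), mask, lambda tid: tid * 10, regs, "r1"
)
```

In this sketch the scalar instruction runs only on thread 1, the first active thread, and its result is stored once in `regs["r0"]`, where thread 2 could read it without executing the instruction itself. The non-scalar instruction runs on both active threads and leaves the shared register untouched.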
Decoding the operand specifier, e.g. specifier format · CPC title
from multiple instruction streams, e.g. multistreaming · CPC title
Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution · CPC title
controlled by a single instruction for multiple data lanes [SIMD] · CPC title
controlled by a single instruction for multiple threads [SIMT] in parallel · CPC title