US9639365B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9639365-B2 |
| Application number | US-201213674890-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 12, 2012 |
| Priority date | Mar 24, 2008 |
| Publication date | May 2, 2017 |
| Grant date | May 2, 2017 |
Official abstract text for this publication.
An indirect branch instruction takes an address register as an argument in order to provide indirect function call capability for single-instruction multiple-thread (SIMT) processor architectures. The indirect branch instruction is used to implement indirect function calls, virtual function calls, and switch statements to improve processing performance compared with using sequential chains of tests and branches.
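The contrast the abstract draws, between a sequential chain of tests and branches and a single indirect branch through an address, can be illustrated in ordinary software terms. The following is a minimal sketch, not the patented hardware mechanism: the handler functions, `JUMP_TABLE`, and both dispatch functions are hypothetical names introduced only for illustration, and the integer `selector` stands in for the value held in an address register.

```python
# Hypothetical handlers standing in for functions a program might call indirectly.
def op_increment(x):
    return x + 1


def op_double(x):
    return x * 2


def op_negate(x):
    return -x


def dispatch_chain(selector, x):
    """Sequential chain of tests and branches: one comparison per candidate."""
    if selector == 0:
        return op_increment(x)
    if selector == 1:
        return op_double(x)
    if selector == 2:
        return op_negate(x)
    raise ValueError(f"unknown selector {selector}")


# Table of "function pointers"; indexing it plays the role of the indirect branch.
JUMP_TABLE = (op_increment, op_double, op_negate)


def dispatch_indirect(selector, x):
    """One table lookup followed by one indirect call, regardless of case count."""
    if not 0 <= selector < len(JUMP_TABLE):
        raise ValueError(f"unknown selector {selector}")
    return JUMP_TABLE[selector](x)
```

Both functions return the same result for every valid selector; the difference is that the chain performs up to one comparison per case, while the table version performs a single lookup however many cases exist.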
Opening claim text (preview).
The invention claimed is:
1. A system for executing indirect function calls for synchronous parallel processing threads, the system comprising: an execution stack that stores thread state information for a number of threads that are concurrently executed by the system; a controller that is coupled to the execution stack and that: receives program instructions including control instructions; executes the control instructions by pushing and popping the thread state information; maintains an active mask that indicates active threads in a thread group that should be processed in parallel; and serializes execution of a plurality of indirect function calls for each unique pointer within a set of pointers that corresponds to any of the active threads, comprising modifying the active mask based on one or more of the unique pointers within the set of pointers; and multiple processing engines that receive the program instructions and execute each program instruction in parallel for the threads in the thread group that should be processed in parallel according to the active mask.
2. The system of claim 1, wherein the controller further: receives a first control instruction that references the set of pointers to one or more functions in a program, each pointer in the set of pointers specifying an address of a corresponding function in the one or more functions; determines if two pointers in the set of pointers corresponding to active threads in the thread group are different, indicating that the active threads diverge during execution of the indirect function calls; pushes a first token onto the execution stack when the active threads diverge, the token including an address associated with the first control instruction; and updates an active program counter to specify an address of a first function of the one or more functions.
3. The system of claim 2, wherein the controller further: receives, prior to the first control instruction, a second control instruction that specifies a target address of an instruction to be executed after the indirect function calls are executed; pushes a second token onto the execution stack prior to the pushing of the first token; and updates the active program counter to specify an instruction in the program that is immediately after the second control instruction.
4. The system of claim 1, wherein the controller further includes a token type, a target address, and a mask in the thread state information that is pushed onto the execution stack when a branch instruction is executed and one or more active threads in the thread group diverge, the mask indicating any threads in the thread group that should be processed in parallel when the thread state information is popped from the execution stack.
5. The system of claim 1, wherein the controller further modifies the active mask to disable processing of any of the active threads in the thread group that have a pointer that is different than a pointer corresponding to a first indirect function call included in the plurality of indirect function calls.
6. The system of claim 5, wherein the controller further: modifies the active mask to disable processing of any of the threads in the thread group that execute a function call of the indirect function calls that is different than the first indirect function call; and executes the first indirect function call.
7. The system of claim 6, wherein the controller further: receives a second control instruction in the program; determines that the second control instruction is a return instruction; pops the first token from the execution stack; sets the active mask to the mask from the first token; and sets the active program counter to the address of the first control instruction from the first token.
8. The system of claim 7, wherein the controller further: receives the first control instruction; determines that the pointers corresponding to threads in the thread group that are active according to the active mask are not different, indicating that the threads do not diverge during execution of the indirect function calls; updates the active program counter to specify an address of a second function of the one or more functions; executes the second function.
9. The system of claim 1, wherein an operand of an indirect branch control instruction specifies a register for each thread of the thread group that stores indices corresponding to one or more entries in a table that stores the set of pointers.
10. The system of claim 1, wherein an operand of an indirect branch control instruction specifies a register in each thread of the thread group that stores the set of pointers.
11. The system of claim 1, wherein the controller serializes execution of the plurality of indirect function calls by executing an indirect branch instruction having an operand that specifies, for each thread of the thread group, a register from which a pointer within the set of pointers can be determined.
12. The system of claim 1, wherein each unique pointer within the set of pointers specifies a different subroutine, each subroutine comprising a set of instructions for execution by a corresponding thread of the thread group.
13. A method for executing indirect function calls for synchronous parallel processing threads, the method comprising: receiving program instructions including control instructions; executing the control instructions by pushing one or more tokens storing the thread state information onto an execution stack and subsequently popping the one or more tokens from the execution stack; maintaining an active mask that indicates active threads in a thread group that should be processed in parallel; and serializing execution of a plurality of indirect function calls for each unique pointer within a set of pointers that corresponds to any of the active threads, comprising modifying the active mask based on one or more of the unique pointers within the set of pointers.
14. The method of claim 13, further comprising: receiving a first control instruction that references the set of pointers to one or more functions in a program, each pointer in the set of pointers specifying an address of a corresponding function in the one or more functions; determining if two pointers in the set of pointers corresponding to active threads in the thread group are different, indicating that the active threads diverge during execution of the indirect function calls; pushing a first token onto the execution stack when the active threads diverge, the token including an address associated with the first control instruction; and updating an active program counter to specify an address of a first function of the one or more functions.
15. The method of claim 14, further comprising: receiving, prior to the first control instruction, a second control instruction that specifies a target address of an instruction to be executed after the indirect function calls are executed; pushing a second token onto the execution stack prior to the pushing of the first token; and updating the active program counter to specify an instruction in the program that is immediately after the second control instruction.
16. The method of claim 13, further comprising including a token type, a target address, and a mask in the thread state information that is pushed onto the execution stack when a branch instruction is executed and one or more active threads in the thread group diverge, the mask indicating any threads in the thread group that should be processed in parallel when the thread state informa
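The serialization described in claims 1, 2, 5, and 7 can be approximated in software. The sketch below is a simplified model under stated assumptions, not the claimed hardware: `Token`, `CALL_PC`, `simt_indirect_call`, and the three sample functions are hypothetical names, the active mask is a plain integer bitmask, and "parallel" execution is modeled as a loop over the enabled threads. Each pass runs the target of the lowest-numbered active thread for every thread that shares that target, pushes a token holding the mask of the threads that diverged, and on "return" pops the token and revisits the call with that mask.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical stack entry: deferred threads plus the call site to revisit."""
    mask: int
    pc: int


CALL_PC = 0  # hypothetical address of the indirect call instruction


def add_ten(x):
    return x + 10


def double(x):
    return x * 2


def negate(x):
    return -x


def simt_indirect_call(targets, active_mask, regs):
    """Run targets[t](regs[t]) for each active thread t, one unique target per pass.

    Returns a trace of (function name, mask of threads that ran it) per pass.
    regs is updated in place.
    """
    stack = []
    trace = []
    mask = active_mask
    while mask:
        active = [t for t in range(len(targets)) if (mask >> t) & 1]
        first = targets[active[0]]  # pointer held by the first active thread
        taken = 0
        for t in active:
            if targets[t] is first:
                taken |= 1 << t
        deferred = mask & ~taken
        if deferred:
            # Threads diverge: remember who still has to run and where to resume.
            stack.append(Token(mask=deferred, pc=CALL_PC))
        # Only threads enabled in the narrowed mask execute this function.
        for t in active:
            if (taken >> t) & 1:
                regs[t] = first(regs[t])
        trace.append((first.__name__, taken))
        # "Return": pop the token, if any, and revisit the call with its mask.
        mask = stack.pop().mask if stack else 0
    return trace
```

For example, with `targets = [add_ten, double, add_ten, negate]`, `active_mask = 0b1111`, and `regs = [1, 2, 3, 4]`, the call takes three passes, one per unique pointer, and threads 0 and 2 run `add_ten` together in the first pass. A thread group whose pointers all match completes in a single pass with nothing pushed onto the stack.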
Special purpose registers · CPC title
using a plurality of independent parallel functional units · CPC title
Concurrent instruction execution, e.g. pipeline or look ahead · CPC title
from multiple instruction streams, e.g. multistreaming · CPC title
Unconditional branch instructions · CPC title