US9798548B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9798548-B2 |
| Application number | US-201113333879-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 21, 2011 |
| Priority date | Dec 21, 2011 |
| Publication date | Oct 24, 2017 |
| Grant date | Oct 24, 2017 |
Official abstract text for this publication.
Systems and methods for scheduling instructions using pre-decode data corresponding to each instruction. In one embodiment, a multi-core processor includes a scheduling unit in each core for selecting instructions from two or more threads each scheduling cycle for execution on that particular core. As threads are scheduled for execution on the core, instructions from the threads are fetched into a buffer without being decoded. The pre-decode data is determined by a compiler and is extracted by the scheduling unit during runtime and used to control selection of threads for execution. The pre-decode data may specify a number of scheduling cycles to wait before scheduling the instruction. The pre-decode data may also specify a scheduling priority for the instruction. Once the scheduling unit selects an instruction to issue for execution, a decode unit fully decodes the instruction.
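The flow described in the abstract (buffer undecoded instructions, read only the scheduling bits, pick a thread group, then fully decode) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the 32-bit word layout, the names `pack`, `predecode`, `full_decode`, `ThreadGroup`, `arm`, and `schedule`, and the policy of issuing one instruction per cycle from the highest-priority ready group.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical layout (an assumption, not from the patent): a 32-bit word whose
# bits 31-28 hold a scheduling priority and bits 27-24 hold a wait count.
def pack(priority, wait, payload):
    return (priority & 0xF) << 28 | (wait & 0xF) << 24 | (payload & 0xFFFFFF)

def predecode(word):
    """Partial decode: read only the scheduling bits, ignore the payload."""
    return (word >> 28) & 0xF, (word >> 24) & 0xF

def full_decode(word):
    """Full decode, performed only after the instruction has been selected."""
    return word & 0xFFFFFF

@dataclass
class ThreadGroup:
    name: str
    buffer: deque      # fetched instruction words, not yet decoded
    ready_at: int = 0  # first cycle at which the head instruction may issue

def arm(group, cycle):
    """Apply the wait count of the instruction now at the head of the buffer."""
    if group.buffer:
        _, wait = predecode(group.buffer[0])
        group.ready_at = cycle + wait

def schedule(groups, max_cycles=64):
    """Issue at most one instruction per cycle; return (cycle, group, payload)."""
    for g in groups:
        arm(g, 0)
    issued = []
    for cycle in range(max_cycles):
        if not any(g.buffer for g in groups):
            break
        ready = [g for g in groups if g.buffer and g.ready_at <= cycle]
        if not ready:
            continue  # every remaining head instruction is still waiting
        # Highest priority wins; ties go to the group listed first.
        chosen = max(ready, key=lambda g: predecode(g.buffer[0])[0])
        word = chosen.buffer.popleft()
        issued.append((cycle, chosen.name, full_decode(word)))
        arm(chosen, cycle + 1)
    return issued

if __name__ == "__main__":
    a = ThreadGroup("A", deque([pack(1, 0, 0xA1), pack(1, 2, 0xA2)]))
    b = ThreadGroup("B", deque([pack(3, 1, 0xB1)]))
    for cycle, name, payload in schedule([a, b]):
        print(cycle, name, hex(payload))
```

In this sketch the scheduler never looks at the payload until an instruction has been chosen, which mirrors the abstract's point that full decoding is deferred until after selection.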
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method for scheduling instructions within a parallel computing machine, the method comprising: fetching instructions corresponding to two or more thread groups from an instruction cache unit; receiving pre-decode data encoded in each one of the instructions, wherein the pre-decode data is determined when the instructions are compiled; partially decoding a first instruction to decode only the pre-decode data in the first instruction; selecting, at runtime, the first instruction to issue for execution by a parallel processing unit based at least in part on the pre-decode data, the pre-decode data comprising information utilized for scheduling of the execution of the first instruction relative to execution of the other instructions; completing the decoding of the first instruction; and dispatching the first instruction to the parallel processing unit for execution.
2. The method of claim 1, wherein the pre-decode data encodes a wait scheduling hint comprising a number of scheduling cycles that transpire before the first instruction is issued for execution.
3. The method of claim 2, wherein the wait scheduling hint specifies a scheduling priority option that changes the scheduling priority for a first thread group of the two or more thread groups that is associated with the first instruction.
4. The method of claim 1, wherein the pre-decode data specifies that a default scheduling hint is used to schedule the first instruction.
5. The method of claim 1, wherein the pre-decode data encodes a hold scheduling hint that configures a scheduling unit to select the first instruction to issue over an earlier issued instruction that failed to execute and is a reissue instruction available to be issued.
6. The method of claim 1, wherein the pre-decode data encodes a hold scheduling hint that configures a scheduling unit to select to issue, over the first instruction, an earlier issued instruction that failed to execute and is a reissue instruction available to be issued.
7. The method of claim 1, wherein the pre-decode data encodes a pair scheduling hint that configures a scheduling unit to select to issue the first instruction and a second instruction in a single scheduling cycle, and wherein the first instruction and the second instruction are associated with a first thread group of the two or more thread groups.
8. A scheduling unit, comprising: an instruction cache fetch unit that is configured to route instructions corresponding to two or more thread groups to a first buffer and route pre-decode data associated with each one of the instructions to a second buffer; a macro-scheduler unit that is coupled to the instruction cache fetch unit and configured to receive pre-decode data, wherein the pre-decode data is determined when the instructions are compiled; a micro-scheduler arbiter that is coupled to the macro-scheduler unit and the second buffer and configured to select, at runtime, a first instruction for execution by a processing unit based at least in part on the pre-decode data, the pre-decode data comprising information utilized for scheduling the execution of the first instruction relative to execution of the other instructions; a decode unit coupled to the first buffer and configured to decode the first instruction by partially decoding the first instruction to decode only the pre-decode data in the first instruction, and subsequently completing the decoding of the first instruction; and a dispatch unit coupled to the decode unit and configured to dispatch the first instruction to a processing unit for execution.
9. The scheduling unit of claim 8, wherein the pre-decode data encodes a wait scheduling hint comprising a number of scheduling cycles that transpire before the first instruction is issued for execution.
10. The scheduling unit of claim 9, wherein the wait scheduling hint specifies a scheduling priority option that changes the scheduling priority for a first thread group of the two or more thread groups that is associated with the first instruction.
11. The scheduling unit of claim 8, wherein the pre-decode data specifies that a default scheduling hint is used to schedule the first instruction.
12. The scheduling unit of claim 8, wherein the pre-decode data encodes a hold scheduling hint that configures a scheduling unit to select the first instruction to issue over an earlier issued instruction that failed to execute and is a reissue instruction available to be issued.
13. The scheduling unit of claim 8, wherein the pre-decode data encodes a hold scheduling hint that configures a scheduling unit to select to issue, over the first instruction, an earlier issued instruction that failed to execute and is a reissue instruction available to be issued.
14. A computing device comprising: a parallel processing unit that includes a scheduling unit configured to: fetch instructions corresponding to two or more thread groups from an instruction cache unit; receive pre-decode data encoded in each one of the instructions, where the pre-decode data is determined when the instructions are compiled; partially decode a first instruction to decode only the pre-decode data in the first instruction; select, at runtime, the first instruction for execution by a processing unit based at least in part on the pre-decode data, the pre-decode data comprising information utilized for scheduling of the execution of the first instruction relative to execution of the other instructions; complete the decoding of the first instruction; and dispatch the instruction to the parallel processing unit for execution.
15. The computing device of claim 14, wherein the pre-decode data encodes a wait scheduling hint comprising a number of scheduling cycles that transpire before the first instruction is issued for execution.
16. The computing device of claim 15, wherein the wait scheduling hint specifies a scheduling priority option that changes the scheduling priority for a first thread group of the two or more thread groups that are associated with the first instruction.
17. The computing device of claim 14, wherein the pre-decode data specifies that a default scheduling hint is used to schedule the first instruction.
18. The computing device of claim 14, wherein the pre-decode data encodes a hold scheduling hint that configures a scheduling unit to select the first instruction to issue over an earlier issued instruction that failed to execute and is a reissue instruction available to be issued.
19. The computing device of claim 14, wherein the pre-decode data encodes a hold scheduling hint that configures a scheduling unit to select to issue, over the first instruction, an earlier issued instruction that failed to execute and is a reissue instruction available to be issued.
20. The computing device of claim 14, wherein the pre-decode data encodes a pair scheduling hint that configures a scheduling unit to select to issue the first instruction and a second instruction in a single scheduling cycle, and wherein the first instruction and the second instruction are associated with a first thread group of the two or more thread groups.
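The dependent claims name four kinds of scheduling hint: default, wait, hold, and pair. One way to picture how an arbiter might act on them is the sketch below. It is a simplified model under stated assumptions, not the claimed hardware: the names `Hint`, `PreDecode`, and `select`, the `prefer_reissue` flag (used to distinguish the two hold variants in the claims), and the fallback of issuing only the first instruction are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Optional

class Hint(Enum):
    DEFAULT = auto()  # no special handling
    WAIT = auto()     # delay issue for a number of scheduling cycles
    HOLD = auto()     # arbitrate between this instruction and a reissue
    PAIR = auto()     # issue two instructions from one thread group together

@dataclass
class PreDecode:
    hint: Hint
    wait_cycles: int = 0          # remaining cycles, meaningful for WAIT
    prefer_reissue: bool = False  # hypothetical flag selecting the HOLD variant

def select(pre: PreDecode, first: str, second: Optional[str],
           reissue: Optional[str]) -> List[str]:
    """Return the instructions to issue in this scheduling cycle."""
    if pre.hint is Hint.WAIT and pre.wait_cycles > 0:
        return []  # still waiting: nothing issues from this thread group
    if pre.hint is Hint.HOLD and reissue is not None:
        # One variant favors the new instruction, the other the reissue.
        return [reissue] if pre.prefer_reissue else [first]
    if pre.hint is Hint.PAIR and second is not None:
        return [first, second]  # both come from the same thread group
    # Assumed fallback: issue only the first instruction.
    return [first]

if __name__ == "__main__":
    print(select(PreDecode(Hint.PAIR), "i0", "i1", None))
    print(select(PreDecode(Hint.HOLD, prefer_reissue=True), "i0", None, "r0"))
```

A real scheduler would also decrement the wait count each cycle and apply the priority option that the wait hint can carry; both are omitted here to keep the sketch short.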
Instruction prefetching · CPC title
Pipelined decoding, e.g. using predecoding · CPC title
from multiple instruction streams, e.g. multistreaming · CPC title
controlled by a single instruction for multiple threads [SIMT] in parallel · CPC title