FIFO queue, memory resource, and task management for graphics processing
US-2019236749-A1 · Aug 1, 2019 · US
US10796472B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10796472-B2 |
| Application number | US-201816024821-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 30, 2018 |
| Priority date | Jun 30, 2018 |
| Publication date | Oct 6, 2020 |
| Grant date | Oct 6, 2020 |
Official abstract text for this publication.
Apparatus and method for simultaneous command streamers. For example, one embodiment of an apparatus comprises: a plurality of work element queues to store work elements for a plurality of thread contexts, each work element associated with a context descriptor identifying a context storage region in memory; a plurality of command streamers, each command streamer associated with one of the plurality of work element queues, the command streamers to independently submit instructions for execution as specified by the work elements; a thread dispatcher to evaluate the thread contexts including priority values, to tag each instruction with an execution identifier (ID), and to responsively dispatch each instruction including the execution ID in accordance with the thread context; and a plurality of graphics functional units to independently execute each instruction dispatched by the thread dispatcher and to associate each instruction with a thread context based on its execution ID.
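The data flow in the abstract can be pictured with a small software model. The following is a minimal sketch only, not the patented hardware design. All class and field names are hypothetical. It assumes that a larger priority value is dispatched first, and that the execution ID is a per-instruction tag resolved through a lookup table. The abstract specifies neither detail.

```python
from collections import deque
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class ContextDescriptor:
    context_id: int      # hypothetical handle for a thread context
    region_address: int  # start of the context storage region in memory
    priority: int        # assumed convention: larger value is dispatched first


@dataclass(frozen=True)
class WorkElement:
    descriptor: ContextDescriptor
    instructions: tuple


class CommandStreamer:
    """Reads work elements from its own queue and submits their instructions."""

    def __init__(self, queue):
        self.queue = queue

    def submit(self):
        while self.queue:
            element = self.queue.popleft()
            for instruction in element.instructions:
                yield element.descriptor, instruction


class ThreadDispatcher:
    """Tags each instruction with an execution ID and orders dispatch by priority."""

    def __init__(self):
        self._next_id = count(1)
        self.context_by_exec_id = {}  # assumed lookup table: execution ID -> context

    def dispatch(self, streamers):
        submitted = [pair for s in streamers for pair in s.submit()]
        # Stable sort: instructions with equal priority keep submission order.
        submitted.sort(key=lambda pair: -pair[0].priority)
        for descriptor, instruction in submitted:
            exec_id = next(self._next_id)
            self.context_by_exec_id[exec_id] = descriptor
            yield exec_id, instruction


class FunctionalUnit:
    """Executes an instruction and recovers its context from the execution ID."""

    def __init__(self, context_by_exec_id):
        self.context_by_exec_id = context_by_exec_id
        self.log = []

    def execute(self, exec_id, instruction):
        descriptor = self.context_by_exec_id[exec_id]
        self.log.append((exec_id, descriptor.context_id, instruction))


def run_demo():
    render = ContextDescriptor(context_id=0, region_address=0x1000, priority=1)
    compute = ContextDescriptor(context_id=1, region_address=0x2000, priority=5)
    streamers = [
        CommandStreamer(deque([WorkElement(render, ("draw_a", "draw_b"))])),
        CommandStreamer(deque([WorkElement(compute, ("kernel_a",))])),
    ]
    dispatcher = ThreadDispatcher()
    units = [FunctionalUnit(dispatcher.context_by_exec_id) for _ in range(2)]
    # Round-robin over functional units; the unit choice is also an assumption.
    for i, (exec_id, instruction) in enumerate(dispatcher.dispatch(streamers)):
        units[i % len(units)].execute(exec_id, instruction)
    return [entry for unit in units for entry in unit.log]
```

Calling `run_demo()` returns one `(execution ID, context ID, instruction)` tuple per executed instruction, grouped by functional unit. Each unit sees only the execution ID and looks up the owning context from it, mirroring the last element of the abstract.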
Opening claim text (preview).
What is claimed is:
1. An apparatus comprising: a plurality of work element queues to store work elements for a plurality of thread contexts, each work element associated with a context descriptor identifying a context storage region in memory; a plurality of command streamers, each command streamer associated with one of the plurality of work element queues, the command streamers to independently submit instructions for execution as specified by the work elements, wherein command streamers use different types of address space identifiers for simultaneously executing the plurality of thread contexts; a thread dispatcher to evaluate the thread contexts including priority values, to tag each instruction with an execution identifier (ID), and to responsively dispatch each instruction including the execution ID in accordance with the thread context; and a plurality of graphics functional units to independently execute each instruction dispatched by the thread dispatcher and to associate each instruction with a thread context based on its execution ID, wherein the execution IDs of instructions are propagated downstream from the thread dispatcher to individual graphics function units within the plurality of graphics functional units.
2. The apparatus of claim 1, wherein the plurality of command streamers comprises: a first set of one or more command streamers to process three-dimensional (3D) graphics processing workloads; and a second set of one or more command streamers to process compute workloads.
3. The apparatus of claim 2 wherein the first set includes command streamers which also process compute workloads in addition to the 3D graphics processing workloads.
4. The apparatus of claim 1 wherein each command streamer is associated with a different application having a different thread context.
5. The apparatus of claim 1 wherein each context descriptor comprises a logical render context address (LRCA) comprising a starting address for an associated storage region in memory.
6. The apparatus of claim 5 wherein the associated storage region comprises a hardware status subregion, a ring context subregion, and an engine context subregion.
7. The apparatus of claim 1 wherein the thread dispatcher comprises prioritization circuitry/logic to determine priority values associated with each thread and responsively dispatch instructions in accordance with relative priority values.
8. The apparatus of claim 7 wherein the thread dispatcher dispatches the instructions based on both relative priority values and instruction execution counter values associated with each thread.
9. A method comprising: queuing a plurality of work elements for a plurality of thread contexts in a plurality of work queues, each work element associated with a context descriptor identifying a context storage region in memory; independently reading the work elements from the work queues by a plurality of command streamers, each command streamer having a work queue associated therewith, wherein command streamers use different types of address space identifiers for simultaneously executing the plurality of thread contexts; submitting instructions from the command streamers for execution as specified by the work elements; evaluating the thread contexts including priority values associated with the submitted instructions; dispatching instructions indicated by the work elements from a thread dispatcher to a plurality of graphics functional units in accordance with the evaluation, tagging each instruction with a corresponding execution identifier (ID); and independently executing each instruction, associating the instruction with its thread context based on the execution ID, wherein the execution IDs of the instructions are propagated downstream from the thread dispatcher to individual graphics function units within the plurality of graphics functional units.
10. The method of claim 9 further comprising: processing three-dimensional (3D) graphics processing workloads on a first set of one or more command streamers; and processing compute workloads on a second set of one or more command streamers.
11. The method of claim 10 wherein the first set includes command streamers which also process compute workloads in addition to the 3D graphics processing workloads.
12. The method of claim 9 wherein each command streamer is associated with a different application having a different thread context.
13. The method of claim 9 wherein each context descriptor comprises a logical render context address (LRCA) comprising a starting address for an associated storage region in memory.
14. The method of claim 13 wherein the associated storage region comprises a hardware status subregion, a ring context subregion, and an engine context subregion.
15. The method of claim 9, wherein the method further comprises determining priority values associated with each thread and responsively dispatching instructions in accordance with relative priority values.
16. The method of claim 15, wherein the method further comprises dispatching the instructions based on both relative priority values and instruction execution counter values associated with each thread.
17. A non-transitory machine-readable medium having program code stored thereon which, when executed by a machine, causes the machine to perform the operations of: queuing a plurality of work elements for a plurality of thread contexts in a plurality of work queues, each work element associated with a context descriptor identifying a context storage region in memory; independently reading the work elements from the work queues by a plurality of command streamers, each command streamer having a work queue associated therewith, wherein command streamers use different types of address space identifiers for simultaneously executing the plurality of thread contexts; submitting instructions from the command streamers for execution as specified by the work elements; evaluating the thread contexts including priority values associated with the submitted instructions; dispatching instructions indicated by the work elements from a thread dispatcher to a plurality of graphics functional units in accordance with the evaluation, tagging each instruction with a corresponding execution identifier (ID); and independently executing each instruction, associating the instruction with its thread context based on the execution ID, wherein the execution IDs of the instructions are propagated downstream from the thread dispatcher to individual graphics function units within the plurality of graphics functional units.
18. The non-transitory machine-readable medium of claim 17 further comprising program code to cause the machine to perform the operations of: processing three-dimensional (3D) graphics processing workloads on a first set of one or more command streamers; and processing compute workloads on a second set of one or more command streamers.
19. The non-transitory machine-readable medium of claim 18 wherein the first set includes command streamers which also process compute workloads in addition to the 3D graphics processing workloads.
20. The non-transitory machine-readable medium of claim 17 wherein each command streamer is associated with a different application having a different thread context.
21. The non-transitory machine-readable medium of claim 17 wherein each context descriptor comprises a logical render context address (LRCA) comprising a starting address for an associated storage re
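
Claims 7, 8, 15, and 16 recite dispatch based on relative priority values and on per-thread instruction execution counter values, but the claim text shown here does not say how the two are combined. The following is one possible reading, offered as an illustrative sketch under assumptions: the highest priority wins, and among equal priorities the thread with the fewest executed instructions goes next. `ThreadState`, `pick_next`, and `dispatch_order` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ThreadState:
    name: str
    priority: int      # assumed convention: larger value is more urgent
    executed: int = 0  # instruction execution counter for this thread


def pick_next(threads):
    """Pick the highest-priority thread; break ties with the lowest counter."""
    return max(threads, key=lambda t: (t.priority, -t.executed))


def dispatch_order(threads, slots):
    """Return the thread names chosen for the next `slots` dispatch slots."""
    order = []
    for _ in range(slots):
        chosen = pick_next(threads)
        chosen.executed += 1  # count the dispatched instruction
        order.append(chosen.name)
    return order
```

For example, with threads `a` and `b` at priority 2 and thread `c` at priority 1, `dispatch_order(threads, 4)` alternates between `a` and `b` and never selects `c`. A real scheduler would likely add aging or a counter threshold to avoid that starvation. The claims shown here do not describe such a mechanism, so none is modeled.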
from multiple instruction streams, e.g. multistreaming · CPC title
controlled by a single instruction for multiple threads [SIMT] in parallel · CPC title
by program, e.g. task dispatcher, supervisor, operating system · CPC title
with variable priority · CPC title
Parallel processing · CPC title