Method and apparatus for simultaneously executing multiple contexts on a graphics engine

US10796472B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10796472-B2
Application number  US-201816024821-A
Country             US
Kind code           B2
Filing date         Jun 30, 2018
Priority date       Jun 30, 2018
Publication date    Oct 6, 2020
Grant date          Oct 6, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Apparatus and method for simultaneous command streamers. For example, one embodiment of an apparatus comprises: a plurality of work element queues to store work elements for a plurality of thread contexts, each work element associated with a context descriptor identifying a context storage region in memory; a plurality of command streamers, each command streamer associated with one of the plurality of work element queues, the command streamers to independently submit instructions for execution as specified by the work elements; a thread dispatcher to evaluate the thread contexts including priority values, to tag each instruction with an execution identifier (ID), and to responsively dispatch each instruction including the execution ID in accordance with the thread context; and a plurality of graphics functional units to independently execute each instruction dispatched by the thread dispatcher and to associate each instruction with a thread context based on its execution ID.
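The data flow in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware design: the class names, the `priority` scale (larger means more urgent), the use of one execution ID per context, and the example addresses `0x1000` and `0x2000` are all hypothetical and do not come from the patent text.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkElement:
    context_descriptor: int  # hypothetical stand-in for a context storage address
    priority: int            # hypothetical scale: larger value = more urgent
    instructions: tuple


class CommandStreamer:
    """Reads work elements from its own queue, independently of other streamers."""

    def __init__(self, queue):
        self.queue = queue

    def submit(self):
        # Pop one work element and submit its instructions with their context.
        if not self.queue:
            return []
        element = self.queue.popleft()
        return [(element.context_descriptor, element.priority, ins)
                for ins in element.instructions]


class ThreadDispatcher:
    """Tags instructions with an execution ID and orders them by priority."""

    def __init__(self):
        self.exec_id_of = {}   # context descriptor -> execution ID (assumed 1:1)
        self.context_of = {}   # execution ID -> context descriptor

    def dispatch(self, submissions):
        tagged = []
        # sorted() is stable, so equal priorities keep submission order.
        for ctx, _prio, ins in sorted(submissions, key=lambda s: -s[1]):
            if ctx not in self.exec_id_of:
                exec_id = len(self.exec_id_of) + 1
                self.exec_id_of[ctx] = exec_id
                self.context_of[exec_id] = ctx
            tagged.append((self.exec_id_of[ctx], ins))
        return tagged


def execute(dispatcher, exec_id, instruction):
    # A functional unit recovers the thread context from the execution ID alone.
    return (dispatcher.context_of[exec_id], instruction)


def run_demo():
    queue_3d = deque([WorkElement(0x1000, 1, ("draw_a", "draw_b"))])
    queue_compute = deque([WorkElement(0x2000, 5, ("kernel_x",))])
    streamers = [CommandStreamer(queue_3d), CommandStreamer(queue_compute)]
    submissions = [s for cs in streamers for s in cs.submit()]
    dispatcher = ThreadDispatcher()
    return [execute(dispatcher, exec_id, ins)
            for exec_id, ins in dispatcher.dispatch(submissions)]
```

In this model, `run_demo()` dispatches the higher-priority compute instruction first, and each result pairs an instruction with the context recovered from its execution ID. Real hardware would run the streamers and functional units concurrently; the sequential loop here only shows how the tag travels with the instruction.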

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus comprising: a plurality of work element queues to store work elements for a plurality of thread contexts, each work element associated with a context descriptor identifying a context storage region in memory; a plurality of command streamers, each command streamer associated with one of the plurality of work element queues, the command streamers to independently submit instructions for execution as specified by the work elements, wherein command streamers use different types of address space identifiers for simultaneously executing the plurality of thread contexts; a thread dispatcher to evaluate the thread contexts including priority values, to tag each instruction with an execution identifier (ID), and to responsively dispatch each instruction including the execution ID in accordance with the thread context; and a plurality of graphics functional units to independently execute each instruction dispatched by the thread dispatcher and to associate each instruction with a thread context based on its execution ID, wherein the execution IDs of instructions are propagated downstream from the thread dispatcher to individual graphics function units within the plurality of graphics functional units.

2. The apparatus of claim 1, wherein the plurality of command streamers comprises: a first set of one or more command streamers to process three-dimensional (3D) graphics processing workloads; and a second set of one or more command streamers to process compute workloads.

3. The apparatus of claim 2 wherein the first set includes command streamers which also process compute workloads in addition to the 3D graphics processing workloads.

4. The apparatus of claim 1 wherein each command streamer is associated with a different application having a different thread context.

5. The apparatus of claim 1 wherein each context descriptor comprises a logical render context address (LRCA) comprising a starting address for an associated storage region in memory.

6. The apparatus of claim 5 wherein the associated storage region comprises a hardware status subregion, a ring context subregion, and an engine context subregion.

7. The apparatus of claim 1 wherein the thread dispatcher comprises prioritization circuitry/logic to determine priority values associated with each thread and responsively dispatch instructions in accordance with relative priority values.

8. The apparatus of claim 7 wherein the thread dispatcher dispatches the instructions based on both relative priority values and instruction execution counter values associated with each thread.

9. A method comprising: queuing a plurality of work elements for a plurality of thread contexts in a plurality of work queues, each work element associated with a context descriptor identifying a context storage region in memory; independently reading the work elements from the work queues by a plurality of command streamers, each command streamer having a work queue associated therewith, wherein command streamers use different types of address space identifiers for simultaneously executing the plurality of thread contexts; submitting instructions from the command streamers for execution as specified by the work elements; evaluating the thread contexts including priority values associated with the submitted instructions; dispatching instructions indicated by the work elements from a thread dispatcher to a plurality of graphics functional units in accordance with the evaluation, tagging each instruction with a corresponding execution identifier (ID); and independently executing each instruction, associating the instruction with its thread context based on the execution ID, wherein the execution IDs of the instructions are propagated downstream from the thread dispatcher to individual graphics function units within the plurality of graphics functional units.

10. The method of claim 9 further comprising: processing three-dimensional (3D) graphics processing workloads on a first set of one or more command streamers; and processing compute workloads on a second set of one or more command streamers.

11. The method of claim 10 wherein the first set includes command streamers which also process compute workloads in addition to the 3D graphics processing workloads.

12. The method of claim 9 wherein each command streamer is associated with a different application having a different thread context.

13. The method of claim 9 wherein each context descriptor comprises a logical render context address (LRCA) comprising a starting address for an associated storage region in memory.

14. The method of claim 13 wherein the associated storage region comprises a hardware status subregion, a ring context subregion, and an engine context subregion.

15. The method of claim 9, wherein the method further comprises determining priority values associated with each thread and responsively dispatching instructions in accordance with relative priority values.

16. The method of claim 15, wherein the method further comprises dispatching the instructions based on both relative priority values and instruction execution counter values associated with each thread.

17. A non-transitory machine-readable medium having program code stored thereon which, when executed by a machine, causes the machine to perform the operations of: queuing a plurality of work elements for a plurality of thread contexts in a plurality of work queues, each work element associated with a context descriptor identifying a context storage region in memory; independently reading the work elements from the work queues by a plurality of command streamers, each command streamer having a work queue associated therewith, wherein command streamers use different types of address space identifiers for simultaneously executing the plurality of thread contexts; submitting instructions from the command streamers for execution as specified by the work elements; evaluating the thread contexts including priority values associated with the submitted instructions; dispatching instructions indicated by the work elements from a thread dispatcher to a plurality of graphics functional units in accordance with the evaluation, tagging each instruction with a corresponding execution identifier (ID); and independently executing each instruction, associating the instruction with its thread context based on the execution ID, wherein the execution IDs of the instructions are propagated downstream from the thread dispatcher to individual graphics function units within the plurality of graphics functional units.

18. The non-transitory machine-readable medium of claim 17 further comprising program code to cause the machine to perform the operations of: processing three-dimensional (3D) graphics processing workloads on a first set of one or more command streamers; and processing compute workloads on a second set of one or more command streamers.

19. The non-transitory machine-readable medium of claim 18 wherein the first set includes command streamers which also process compute workloads in addition to the 3D graphics processing workloads.

20. The non-transitory machine-readable medium of claim 17 wherein each command streamer is associated with a different application having a different thread context.

21. The non-transitory machine-readable medium of claim 17 wherein each context descriptor comprises a logical render context address (LRCA) comprising a starting address for an associated storage re
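Claims 7, 8, 15, and 16 describe dispatch based on relative priority values and per-thread instruction execution counters, but the claim text does not say how the two are combined. The sketch below assumes one simple policy for illustration only: pick the ready thread with the highest priority, and among threads of equal priority pick the one that has executed the fewest instructions. The `ThreadState` type, its field names, and the tie-break rule are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ThreadState:
    name: str
    priority: int      # assumed scale: larger value = more urgent
    pending: list      # instructions waiting to be dispatched
    executed: int = 0  # instruction execution counter for this thread


def pick_next(threads):
    """Return the thread to dispatch from next, or None if nothing is pending."""
    ready = [t for t in threads if t.pending]
    if not ready:
        return None
    # Assumed policy: highest priority wins; ties go to the lowest counter.
    return max(ready, key=lambda t: (t.priority, -t.executed))


def dispatch_all(threads):
    """Drain all pending instructions and return the dispatch order."""
    order = []
    while True:
        thread = pick_next(threads)
        if thread is None:
            return order
        instruction = thread.pending.pop(0)
        thread.executed += 1
        order.append((thread.name, instruction))
```

With two threads at priority 2 and one at priority 1, this policy alternates between the two equal-priority threads before the lower-priority thread runs. Other combinations, such as a weighted ratio of counter to priority, would fit the claim language equally well.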

Assignees

Intel Corp

Inventors

Classifications

  • G06F9/3851 (Primary)

    from multiple instruction streams, e.g. multistreaming · CPC title

  • controlled by a single instruction for multiple threads [SIMT] in parallel · CPC title

  • by program, e.g. task dispatcher, supervisor, operating system · CPC title

  • G06F9/4831 (Primary)

    with variable priority · CPC title

  • Parallel processing · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10796472B2 cover?
Apparatus and method for simultaneous command streamers. For example, one embodiment of an apparatus comprises: a plurality of work element queues to store work elements for a plurality of thread contexts, each work element associated with a context descriptor identifying a context storage region in memory; a plurality of command streamers, each command streamer associated with one of the plura…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/3851. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 6, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).