Mixed inference using low and high precision
US-2019146800-A1 · May 16, 2019 · US
US11367160B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11367160-B2 |
| Application number | US-201816053341-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 2, 2018 |
| Priority date | Aug 2, 2018 |
| Publication date | Jun 21, 2022 |
| Grant date | Jun 21, 2022 |
Abstract
A parallel processing unit (e.g., a GPU), in some examples, includes a hardware scheduler and hardware arbiter that launch graphics and compute work for simultaneous execution on a SIMD/SIMT processing unit. Each processing unit (e.g., a streaming multiprocessor) of the parallel processing unit operates in a graphics-greedy mode or a compute-greedy mode at respective times. The hardware arbiter, in response to a result of a comparison of at least one monitored performance or utilization metric to a user-configured threshold, can selectively cause the processing unit to run one or more compute work items from a compute queue when the processing unit is operating in the graphics-greedy mode, and cause the processing unit to run one or more graphics work items from a graphics queue when the processing unit is operating in the compute-greedy mode. Associated methods and systems are also described.
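The arbitration in the abstract can be modeled in a few lines. The following is a minimal Python sketch, not the patented hardware. It assumes a single processing unit, one scalar utilization metric in [0, 1], and that the user-configured threshold is crossed when the metric falls *below* it (the abstract does not state the direction of the comparison). The names `Mode`, `ProcessingUnit`, and `schedule_step` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    GRAPHICS_GREEDY = "graphics-greedy"
    COMPUTE_GREEDY = "compute-greedy"


@dataclass
class ProcessingUnit:
    """Hypothetical model of one processing unit (e.g. an SM)."""
    mode: Mode
    launched: list = field(default_factory=list)


def schedule_step(pu, graphics_q, compute_q, metric, threshold):
    """Make one launch decision for `pu` and return the item launched.

    The unit's greedy queue is served by default. When the monitored
    metric drops below the user-configured threshold (assumed here to
    mean underutilization), the arbiter lets one item from the other
    queue run instead.
    """
    if pu.mode is Mode.GRAPHICS_GREEDY:
        greedy, other = graphics_q, compute_q
    else:
        greedy, other = compute_q, graphics_q

    if metric < threshold and other:
        item = other.popleft()    # arbiter override: run the other work type
    elif greedy:
        item = greedy.popleft()   # normal greedy selection
    else:
        return None               # nothing eligible this step
    pu.launched.append(item)
    return item


# Usage: a graphics-greedy unit with a 0.6 utilization threshold.
sm = ProcessingUnit(Mode.GRAPHICS_GREEDY)
gq = deque(["g0", "g1", "g2"])
cq = deque(["c0", "c1"])
for utilization in (0.9, 0.4, 0.95):
    schedule_step(sm, gq, cq, utilization, threshold=0.6)
print(sm.launched)  # ['g0', 'c0', 'g1']
```

Only the low-utilization step (0.4) lets a compute item interleave with the graphics stream; the unit stays graphics-greedy throughout, which mirrors the abstract's point that the mode and the arbiter's override are separate mechanisms.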
Claims (opening text, preview)
What is claimed is:
1. A processing system, comprising: a Single Instruction Multiple Data (SIMD) or Single Instruction Multiple Thread (SIMT) processor that executes parallel instruction streams and is configured to operate in a graphics-greedy mode or a compute-greedy mode at respective times; a hardware scheduler connected to the processor, the hardware scheduler scheduling the processor to simultaneously execute, in parallel, at least one graphics warp and at least one compute warp by selecting between (a) scheduling at least one compute warp to the processor while operating in the graphics-greedy mode repeatedly scheduling graphics warps to the processor from a graphics pipeline, and (b) scheduling at least one graphics warp to the processor while operating in the compute-greedy mode repeatedly scheduling compute warps to the processor from a compute pipeline; and a hardware arbiter configured to, in response to a detected underutilization of a resource associated with the processor during said scheduling, determining a current operating mode of the processor and signaling the hardware scheduler to perform said scheduling at least one compute warp when the current operating mode is the graphics-greedy mode or said scheduling at least one graphics warp when the current operating mode is the compute-greedy mode.
2. A parallel processing unit, comprising: a plurality of processing units, each processing unit configured to operate in a graphics-greedy mode or a compute-greedy mode at respective times, and to simultaneously run graphics work items from a graphics queue and compute work items from a compute queue; a hardware scheduler configured to continuously select graphics work items from the graphics queue for running on a particular processing unit of the plurality of processing units when the particular processing unit is configured to operate in the graphics-greedy mode, and to continuously select compute work items from the compute queue for running on the particular processing unit when the particular processing unit is configured to operate in the compute-greedy mode; and a hardware arbiter configured to, in response to a result of a comparison of at least one monitored performance or utilization metric to a user-configured threshold, selectively cause the particular processing unit to run one or more compute work items from the compute queue when the particular processing unit is configured to operate in the graphics-greedy mode, and to cause the particular processing unit to run one or more graphics work items from the graphics queue when the particular processing unit is configured to operate in the compute-greedy mode.
3. The parallel processing unit according to claim 2, wherein each of the plurality of processing units is a single instruction multiple data (SIMD) processor or a single instruction multiple thread (SIMT) processor.
4. The parallel processing unit according to claim 2, wherein the hardware scheduler is further configured to select the graphics-greedy mode or the compute-greedy mode based at least upon software-configured priority values associated with said graphics work items and compute work items.
5. The parallel processing unit according to claim 4, wherein the hardware arbiter is further configured to select causing running of either compute work items or graphics work items based upon a software-configured scheduling policy.
6. The parallel processing unit according to claim 5, wherein the hardware arbiter is further configured to select causing running of either compute work items or graphics work items based further upon occupancy metrics corresponding to occupancy of processing and memory resources by graphics work items and compute work items.
7. The parallel processing unit according to claim 6, wherein the hardware arbiter is further configured to select causing running of either compute work items or graphics work items based upon metrics in said particular processing unit.
8. The parallel processing unit according to claim 6, wherein the hardware arbiter is further configured to select causing running of either compute work items or graphics work items based upon metrics in said particular processing unit and other processing units.
9. The parallel processing unit according to claim 6, wherein the hardware arbiter is further configured to select causing running of either compute work items or graphics work items based further upon output stalling metrics associated with graphics work items output from said processing unit and upon input starving metrics associated with input of graphics work items and compute work items to said processing unit.
10. The parallel processing unit according to claim 9, wherein the hardware arbiter is further configured to select causing running of either compute work items or graphics work items based further upon time-averaged values of said occupancy metrics, said output stalling metrics associated with graphics work items and said input starving metrics associated with input of graphics work items and compute work items.
11. The parallel processing unit according to claim 10, wherein the occupancy metrics comprises one or more of occupancy metrics for register files, occupancy metrics for warp resources, occupancy metrics for shared memory, and occupancy metrics for ISBE memory, wherein said input starving metrics associated with graphics work items comprising at least one of a vertex-associated queue and a pixel-associated queue, wherein said input starving metrics associated with compute work items comprising starving metrics associated with the compute queue, and wherein said output stalling metrics associated with graphics work items comprising output stalling metrics for at least one of a vertex-associated queue and a pixel-associated queue.
12. The parallel processing unit according to claim 9, wherein the output stalling metrics include effects of back pressure from one or more fixed-function units processing graphics work items.
13. The parallel processing unit according to claim 4, wherein the hardware scheduler or the hardware arbiter is further configured to determine a number of work items to be selected from the graphics queue or the compute queue based upon a respective trickle parameter specified in a software-specified policy.
14. The parallel processing unit according to claim 2, wherein the hardware scheduler is further configured to in response to determining to launch a group of graphics work items to the particular processing unit, launch one or more graphics work items already assigned to the hardware scheduler; and in response to determining to launch a group of compute work items: reserve resources associated with the particular processing unit for a particular number of compute work items; and request the particular number of compute work items from the compute queue.
15. The parallel processing unit according to claim 14, wherein the hardware scheduler is further configured to, in response to determining to launch the group of compute work items: launch compute work items received in response to the requesting on the particular processing unit, or not launch compute work items in response to receiving a negative acknowledgment to the requesting.
16. The parallel processing unit according to claim 15, wherein the hardware scheduler is further configured to pass priority information to the particular processing unit with the launching of the graphics work items or the launching of the compute work items.
17. The parallel processing unit according to claim 2, where
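Claims 10 and 13 to 15 add time-averaged metrics, a software-specified trickle parameter, and a reserve-then-request protocol for launching compute work. The Python sketch below models those pieces under loudly stated assumptions: an exponential moving average stands in for the unspecified time averaging, a warp-slot count stands in for the reserved resources, and a short or empty reply from the compute queue stands in for a negative acknowledgment. `TimeAveraged`, `ResourcePool`, `launch_compute_group`, and `trickle` are hypothetical names, not terms defined by the patent.

```python
from collections import deque
from dataclasses import dataclass


class TimeAveraged:
    """Exponential moving average of one sampled metric.

    Claim 10 only says "time-averaged values"; EMA is an assumed choice.
    """

    def __init__(self, alpha=0.25):
        self.alpha = alpha
        self.value = None

    def update(self, sample):
        # The first sample seeds the average; later samples pull it toward them.
        if self.value is None:
            self.value = sample
        else:
            self.value += self.alpha * (sample - self.value)
        return self.value


@dataclass
class ResourcePool:
    """Hypothetical per-unit budget of warp slots."""
    free_slots: int

    def reserve(self, n):
        granted = min(n, self.free_slots)
        self.free_slots -= granted
        return granted

    def release(self, n):
        self.free_slots += n


def launch_compute_group(pool, compute_q, trickle):
    """Reserve resources, request compute items, launch what arrives.

    `trickle` is a hypothetical policy value capping how many compute
    items are requested per launch decision (cf. claim 13).
    """
    reserved = pool.reserve(trickle)
    if reserved == 0:
        return []  # no room on this unit; nothing is requested
    # Request `reserved` items; a short or empty reply plays the role of
    # a negative acknowledgment (NACK) for the missing items.
    count = min(reserved, len(compute_q))
    received = [compute_q.popleft() for _ in range(count)]
    # Assumption: unused reservations are released (the claims are silent).
    pool.release(reserved - count)
    return received


# Usage: three launch decisions against a unit with three free slots.
pool = ResourcePool(free_slots=3)
cq = deque(["c0", "c1", "c2", "c3"])
first = launch_compute_group(pool, cq, trickle=2)   # ['c0', 'c1']
second = launch_compute_group(pool, cq, trickle=2)  # ['c2'] (one slot left)
third = launch_compute_group(pool, cq, trickle=2)   # [] (unit full)

occupancy = TimeAveraged(alpha=0.5)
for sample in (1.0, 0.0, 0.0):
    occupancy.update(sample)
print(occupancy.value)  # 0.25
```

Reserving before requesting means compute items are only pulled from the queue once there is room to run them, which is one plausible reading of why claim 14 orders the steps that way. In a fuller model the arbiter would compare `occupancy.value`, rather than a single raw sample, against its threshold.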
controlled by a single instruction for multiple data lanes [SIMD] · CPC title
controlled by a single instruction for multiple threads [SIMT] in parallel · CPC title
Divergence aspects · CPC title
from multiple instruction streams, e.g. multistreaming · CPC title
Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues · CPC title