What technology area does this patent fall under?

Primary CPC classification G06T1/20. Mapped technology areas include Physics.

When was this patent published?

Publication date: Jun 21, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Simultaneous compute and graphics scheduling

US11367160B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11367160-B2
Application number	US-201816053341-A
Country	US
Kind code	B2
Filing date	Aug 2, 2018
Priority date	Aug 2, 2018
Publication date	Jun 21, 2022
Grant date	Jun 21, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A parallel processing unit (e.g., a GPU), in some examples, includes a hardware scheduler and hardware arbiter that launch graphics and compute work for simultaneous execution on a SIMD/SIMT processing unit. Each processing unit (e.g., a streaming multiprocessor) of the parallel processing unit operates in a graphics-greedy mode or a compute-greedy mode at respective times. The hardware arbiter, in response to a result of a comparison of at least one monitored performance or utilization metric to a user-configured threshold, can selectively cause the processing unit to run one or more compute work items from a compute queue when the processing unit is operating in the graphics-greedy mode, and cause the processing unit to run one or more graphics work items from a graphics queue when the processing unit is operating in the compute-greedy mode. Associated methods and systems are also described.
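
The arbitration rule in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware logic: the function name `pick_next`, the single scalar `utilization` metric, the "below threshold means underutilized" comparison, and the fallback to the greedy queue when the other queue is empty are all hypothetical choices made for illustration.

```python
from collections import deque
from enum import Enum


class Mode(Enum):
    GRAPHICS_GREEDY = "graphics-greedy"
    COMPUTE_GREEDY = "compute-greedy"


def pick_next(mode, utilization, threshold, graphics_queue, compute_queue):
    """Return (kind, item) for the next work item, or None if nothing can run.

    Hypothetical model: `utilization` stands in for the monitored metric and
    `threshold` for the user-configured threshold named in the abstract.
    """
    if mode is Mode.GRAPHICS_GREEDY:
        primary = ("graphics", graphics_queue)
        secondary = ("compute", compute_queue)
    else:
        primary = ("compute", compute_queue)
        secondary = ("graphics", graphics_queue)

    # Assumption: a metric below the threshold signals an underused resource,
    # so the arbiter lets work of the non-greedy type in.
    underutilized = utilization < threshold
    if underutilized and secondary[1]:
        return secondary[0], secondary[1].popleft()

    # Otherwise keep feeding the unit from the queue its mode favours.
    if primary[1]:
        return primary[0], primary[1].popleft()
    return None
```

With a threshold of 0.5, a unit in graphics-greedy mode keeps taking graphics work at a utilization of 0.9 and takes a compute item at 0.2, provided the compute queue is not empty.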

First claim

Opening claim text (preview).

What is claimed is:
1. A processing system, comprising: a Single Instruction Multiple Data (SIMD) or Single Instruction Multiple Thread (SIMT) processor that executes parallel instruction streams and is configured to operate in a graphics-greedy mode or a compute-greedy mode at respective times; a hardware scheduler connected to the processor, the hardware scheduler scheduling the processor to simultaneously execute, in parallel, at least one graphics warp and at least one compute warp by selecting between (a) scheduling at least one compute warp to the processor while operating in the graphics-greedy mode repeatedly scheduling graphics warps to the processor from a graphics pipeline, and (b) scheduling at least one graphics warp to the processor while operating in the compute-greedy mode repeatedly scheduling compute warps to the processor from a compute pipeline; and a hardware arbiter configured to, in response to a detected underutilization of a resource associated with the processor during said scheduling, determining a current operating mode of the processor and signaling the hardware scheduler to perform said scheduling at least one compute warp when the current operating mode is the graphics-greedy mode or said scheduling at least one graphics warp when the current operating mode is the compute-greedy mode.
2. A parallel processing unit, comprising: a plurality of processing units, each processing unit configured to operate in a graphics-greedy mode or a compute-greedy mode at respective times, and to simultaneously run graphics work items from a graphics queue and compute work items from a compute queue; a hardware scheduler configured to continuously select graphics work items from the graphics queue for running on a particular processing unit of the plurality of processing units when the particular processing unit is configured to operate in the graphics-greedy mode, and to continuously select compute work items from the compute queue for running on the particular processing unit when the particular processing unit is configured to operate in the compute-greedy mode; and a hardware arbiter configured to, in response to a result of a comparison of at least one monitored performance or utilization metric to a user-configured threshold, selectively cause the particular processing unit to run one or more compute work items from the compute queue when the particular processing unit is configured to operate in the graphics-greedy mode, and to cause the particular processing unit to run one or more graphics work items from the graphics queue when the particular processing unit is configured to operate in the compute-greedy mode.
3. The parallel processing unit according to claim 2, wherein each of the plurality of processing units is a single instruction multiple data (SIMD) processor or a single instruction multiple thread (SIMT) processor.
4. The parallel processing unit according to claim 2, wherein the hardware scheduler is further configured to select the graphics-greedy mode or the compute-greedy mode based at least upon software-configured priority values associated with said graphics work items and compute work items.
5. The parallel processing unit according to claim 4, wherein the hardware arbiter is further configured to select causing running of either compute work items or graphics work items based upon a software-configured scheduling policy.
6. The parallel processing unit according to claim 5, wherein the hardware arbiter is further configured to select causing running of either compute work items or graphics work items based further upon occupancy metrics corresponding to occupancy of processing and memory resources by graphics work items and compute work items.
7. The parallel processing unit according to claim 6, wherein the hardware arbiter is further configured to select causing running of either compute work items or graphics work items based upon metrics in said particular processing unit.
8. The parallel processing unit according to claim 6, wherein the hardware arbiter is further configured to select causing running of either compute work items or graphics work items based upon metrics in said particular processing unit and other processing units.
9. The parallel processing unit according to claim 6, wherein the hardware arbiter is further configured to select causing running of either compute work items or graphics work items based further upon output stalling metrics associated with graphics work items output from said processing unit and upon input starving metrics associated with input of graphics work items and compute work items to said processing unit.
10. The parallel processing unit according to claim 9, wherein the hardware arbiter is further configured to select causing running of either compute work items or graphics work items based further upon time-averaged values of said occupancy metrics, said output stalling metrics associated with graphics work items and said input starving metrics associated with input of graphics work items and compute work items.
11. The parallel processing unit according to claim 10, wherein the occupancy metrics comprises one or more of occupancy metrics for register files, occupancy metrics for warp resources, occupancy metrics for shared memory, and occupancy metrics for ISBE memory, wherein said input starving metrics associated with graphics work items comprising at least one of a vertex-associated queue and a pixel-associated queue, wherein said input starving metrics associated with compute work items comprising starving metrics associated with the compute queue, and wherein said output stalling metrics associated with graphics work items comprising output stalling metrics for at least one of a vertex-associated queue and a pixel-associated queue.
12. The parallel processing unit according to claim 9, wherein the output stalling metrics include effects of back pressure from one or more fixed-function units processing graphics work items.
13. The parallel processing unit according to claim 4, wherein the hardware scheduler or the hardware arbiter is further configured to determine a number of work items to be selected from the graphics queue or the compute queue based upon a respective trickle parameter specified in a software-specified policy.
14. The parallel processing unit according to claim 2, wherein the hardware scheduler is further configured to in response to determining to launch a group of graphics work items to the particular processing unit, launch one or more graphics work items already assigned to the hardware scheduler; and in response to determining to launch a group of compute work items: reserve resources associated with the particular processing unit for a particular number of compute work items; and request the particular number of compute work items from the compute queue.
15. The parallel processing unit according to claim 14, wherein the hardware scheduler is further configured to, in response to determining to launch the group of compute work items: launch compute work items received in response to the requesting on the particular processing unit, or not launch compute work items in response to receiving a negative acknowledgment to the requesting.
16. The parallel processing unit according to claim 15, wherein the hardware scheduler is further configured to pass priority information to the particular processing unit with the launching of the graphics work items or the launching of the compute work items.
17. The parallel processing unit according to claim 2, where
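
Claims 14 and 15 describe a reserve-then-request handshake for compute work: resources are reserved for a given number of work items, that number is requested from the compute queue, and the items are launched unless a negative acknowledgment comes back. The following is a rough software analogy, assuming a hypothetical slot counter for resources and a hypothetical `request_items` helper in which an empty queue stands in for the negative acknowledgment. Releasing the reservation afterwards is also an assumption; the claim preview does not say what happens to reserved resources.

```python
from collections import deque


def request_items(compute_queue, count):
    """Hypothetical queue front end: returning None models a negative acknowledgment."""
    if not compute_queue:
        return None
    take = min(count, len(compute_queue))
    return [compute_queue.popleft() for _ in range(take)]


def launch_compute_group(free_slots, wanted, compute_queue):
    """Return (launched_items, free_slots_after) for one compute launch attempt."""
    count = min(wanted, free_slots)  # assumption: never reserve more than is free
    if count == 0:
        return [], free_slots

    free_slots -= count  # reserve resources before asking the queue
    granted = request_items(compute_queue, count)

    if granted is None:
        # Negative acknowledgment: launch nothing and release the reservation.
        return [], free_slots + count

    # Launch what arrived and release any slots that were reserved but unused.
    return granted, free_slots + (count - len(granted))
```

Reserving before requesting means that any work item handed over by the queue already has a place to run, which is one plausible reason for the ordering given in claim 14.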

Assignees

Nvidia Corp

Inventors

Classifications

G06F9/3887
controlled by a single instruction for multiple data lanes [SIMD] · CPC title
G06F9/3888
controlled by a single instruction for multiple threads [SIMT] in parallel · CPC title
G06F9/38885
Divergence aspects · CPC title
G06F9/3851
from multiple instruction streams, e.g. multistreaming · CPC title
G06F9/4881
Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues · CPC title

Patent family

Related publications grouped by family.

View patent family 69168298

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11367160B2 cover?

A parallel processing unit (e.g., a GPU), in some examples, includes a hardware scheduler and hardware arbiter that launch graphics and compute work for simultaneous execution on a SIMD/SIMT processing unit. Each processing unit (e.g., a streaming multiprocessor) of the parallel processing unit operates in a graphics-greedy mode or a compute-greedy mode at respective times. The hardware arbiter…

Who is the assignee on this patent?

Nvidia Corp

Related patents

Mixed inference using low and high precision

Graphics processor with tiled compute kernels

Thread block managing method, warp managing method and non-transitory computer readable recording medium can perform the methods

Job scheduling and monitoring

Issue control for multithreaded processing

Memory reference metadata for compiler optimization
