Task scheduling for machine-learning workloads
US-2021149729-A1 · May 20, 2021 · US
US11847489B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11847489-B2 |
| Application number | US-202117158943-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 26, 2021 |
| Priority date | Jan 26, 2021 |
| Publication date | Dec 19, 2023 |
| Grant date | Dec 19, 2023 |
Official abstract text for this publication.
Techniques are disclosed relating to a shared control bus for communicating between primary control circuitry and multiple distributed graphics processor units. In some embodiments, a set of multiple processor units includes first and second graphics processors, where the first and second graphics processors are coupled to access graphics data via respective memory interfaces. A shared workload distribution bus is used to transmit control data that specifies graphics work distribution to the multiple graphics processing units. The shared workload distribution bus may be arranged in a chain topology, e.g., to connect the workload distribution circuitry to the first graphics processor and connect the first graphics processor to the second graphics processor such that the workload distribution circuitry communicates with the second graphics processor via the shared workload distribution bus connection to the first graphics processor. Disclosed techniques may facilitate graphics work distribution for a scalable number of processors.
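The chain arrangement described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented circuitry: the names `ControlPacket`, `ChainNode`, `build_chain`, and `distribute`, and the `target_gpu` field, are hypothetical and do not appear in the patent. It shows only that control data for the second processor reaches it through the bus connection to the first.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ControlPacket:
    target_gpu: int  # hypothetical field: position of the destination GPU on the chain
    payload: str     # hypothetical stand-in for work-distribution control data


@dataclass
class ChainNode:
    gpu_id: int
    next_node: Optional["ChainNode"] = None
    received: List[str] = field(default_factory=list)

    def accept(self, packet: ControlPacket, path: List[int]) -> bool:
        """Consume the packet if it targets this GPU, otherwise forward it down the chain."""
        path.append(self.gpu_id)
        if packet.target_gpu == self.gpu_id:
            self.received.append(packet.payload)
            return True
        if self.next_node is None:
            return False  # ran off the end of the chain without finding the target
        return self.next_node.accept(packet, path)


def build_chain(num_gpus: int) -> List[ChainNode]:
    """Link GPUs serially: node 0 is adjacent to the workload distribution circuitry."""
    nodes = [ChainNode(i) for i in range(num_gpus)]
    for near, far in zip(nodes, nodes[1:]):
        near.next_node = far
    return nodes


def distribute(nodes: List[ChainNode], packet: ControlPacket):
    """Model the distribution circuitry, which only ever talks to the first node."""
    path: List[int] = []
    delivered = nodes[0].accept(packet, path)
    return delivered, path


if __name__ == "__main__":
    chain = build_chain(2)
    ok, hops = distribute(chain, ControlPacket(target_gpu=1, payload="kernel A, workgroups 4-7"))
    print(ok, hops)
```

In this model, adding a processor means appending one node to the chain; the distribution side still has a single connection, which is the scalability property the abstract points to.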
Opening claim text (preview).
What is claimed is:
1. An apparatus, comprising: a set of multiple graphics processor units including at least first and second graphics processors, wherein the first and second graphics processors are coupled to access graphics data via respective memory interfaces; a shared workload distribution bus; and workload distribution circuitry configured to transmit, via the shared workload distribution bus, control data that specifies graphics work distribution to the multiple graphics processor units, wherein the graphics processor units are also configured to transmit control information to the workload distribution circuitry via the shared workload distribution bus; wherein one or more of the graphics processor units are configured to retrieve first graphics data via one or more of the respective memory interfaces based on the control data received via the shared workload distribution bus, wherein the control data specifies respective sets of one or more workgroups of a compute kernel, wherein the one or more workgroups include instructions that the graphics processor units are configured to execute to operate on the first graphics data; and wherein the shared workload distribution bus: connects the workload distribution circuitry to the first graphics processor and connects the first graphics processor to the second graphics processor such that the workload distribution circuitry is configured to transmit the control data to the second graphics processor via the shared workload distribution bus connection to the first graphics processor, and is configured to implement flow control using a credit management system, wherein packets that communicate credit information are distinct from packets that communicate control data.
2. The apparatus of claim 1, wherein the first and the second graphics processors include respective arbitration circuitry configured to arbitrate between locally generated control data and control data generated by another processor unit connected to the shared workload distribution bus.
3. The apparatus of claim 2, wherein the arbitration circuitry is configured to prioritize the control data from processor units that are further from the workload distribution circuitry on the shared workload distribution bus over the locally generated control data.
4. The apparatus of claim 1, wherein the shared workload distribution bus provides point-to-point communications between the workload distribution circuitry and clients of the graphics processor units and is configured to aggregate multiple distinct requests from a client into a single packet of control data.
5. The apparatus of claim 4, wherein the shared workload distribution bus is configured to arbitrate both among requests to be aggregated into a packet and among clients submitting requests to communicate with the workload distribution circuitry.
6. The apparatus of claim 1, wherein circuitry of the shared workload distribution bus between the first and second graphics processors includes both source synchronous and synchronous communications circuitry and wherein the apparatus is configured to use one of the source synchronous and synchronous communications circuitry based on a strap signal.
7. The apparatus of claim 1, wherein the first and the second graphics processors are located in different semiconductor substrates.
8. The apparatus of claim 1, wherein the first and the second graphics processors are included in different power and clock domains.
9. The apparatus of claim 1, wherein: the packets that communicate credit information include: a credit count field, a client identifier, and a graphics processor unit identifier; and nodes of the shared workload distribution bus include respective credit distribution circuitry configured to route packets that communicate credit information to clients and to other graphics processor units.
10. The apparatus of claim 1, wherein the shared workload distribution bus supports both packets that target a single processor unit and packets that target multiple processor units.
11. The apparatus of claim 1, wherein the multiple graphics processor units are arranged along the shared workload distribution bus according to a serial topology such that each processor unit connected to the shared workload distribution bus is directly connected to at most two other processor units via the shared workload distribution bus.
12. The apparatus of claim 1, wherein a first portion of the shared workload distribution bus includes a first number of wires configured to transmit data in parallel and a second portion of the shared workload distribution bus includes a second number of wires configured to transmit data in parallel; wherein the apparatus further comprises downsize circuitry configured to split a packet transmitted by the first portion of the shared workload distribution bus into multiple packets for transmission by the second portion of the shared workload distribution bus.
13. The apparatus of claim 1, where a node of the shared workload distribution bus for one of the graphics processor units includes: an input switch for control data from a corresponding graphics processor unit; an output switch for control data for the corresponding graphics processor unit; packet switches configured to receive packets from other processor units on the shared workload distribution bus; and a direction register configured to store an indication of a direction to the workload distribution circuitry via the shared workload distribution bus.
14. The apparatus of claim 1, wherein the shared workload distribution bus provides ordering of packets between pairs of processor units connected to the shared workload distribution bus.
15. The apparatus of claim 1, wherein the first and second graphics processors include separate respective: fragment generator circuitry; shader core circuitry; memory system circuitry that includes a data cache and a memory management unit; geometry processing circuitry; and distributed workload distribution circuitry.
16. The apparatus of claim 1, wherein the multiple graphics processor units are arranged along the shared workload distribution bus according to at least two groups, connected by distribution center circuitry located between graphics processor units in ones of the at least two groups.
17. The apparatus of claim 16, wherein the distribution center circuitry included in at least two different groups is configured to communicate between the two different groups via a communications fabric that is shared with non-control data.
18. A method comprising: transmitting, by workload distribution circuitry via a shared workload distribution bus to a first graphics processor unit of multiple graphics processor units, control data that specifies graphics work distribution, wherein the control data specifies respective sets of one or more workgroups of a compute kernel; transmitting, from ones of the multiple graphics processor units to the workload distribution circuitry, control information via the shared workload distribution bus; retrieving, by one or more of the graphics processor units, via respective memory interfaces based on the control data, first graphics data; executing, by one or more of the graphics processor units, instructions one or more of the workgroups to operate on the first graphics data; and providing, by the shared workload distribution bus, a credit management system, wherein packets that communicate credit information are distinct from packets that communicate control data wherei
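The claims recite flow control through a credit management system in which credit packets are distinct from control-data packets, and claim 9 lists the credit packet fields (credit count, client identifier, graphics processor unit identifier). The following is a minimal software sketch of generic credit-based flow control under those constraints. The class names, the one-credit-per-packet rule, and the single credit counter per sender are illustrative assumptions, not details taken from the patent.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlPacket:
    gpu_id: int
    client_id: int
    payload: str  # hypothetical stand-in for work-distribution control data


@dataclass(frozen=True)
class CreditPacket:
    # Fields follow claim 9; the packet type is separate from ControlPacket.
    credit_count: int
    client_id: int
    gpu_id: int


class Receiver:
    """Endpoint with a bounded buffer that returns credits as it frees slots."""

    def __init__(self, gpu_id: int, client_id: int, buffer_slots: int):
        self.gpu_id = gpu_id
        self.client_id = client_id
        self.buffer_slots = buffer_slots
        self.buffer = deque()

    def receive(self, packet: ControlPacket) -> None:
        if len(self.buffer) >= self.buffer_slots:
            raise OverflowError("sender transmitted without holding a credit")
        self.buffer.append(packet)

    def drain(self, max_packets: int) -> CreditPacket:
        """Process up to max_packets entries and report the freed slots as credits."""
        freed = 0
        while self.buffer and freed < max_packets:
            self.buffer.popleft()
            freed += 1
        return CreditPacket(credit_count=freed, client_id=self.client_id, gpu_id=self.gpu_id)


class Sender:
    """Sends one control packet per credit (assumption) and stalls at zero credits."""

    def __init__(self, initial_credits: int):
        self.credits = initial_credits
        self.pending = deque()

    def submit(self, packet: ControlPacket) -> None:
        self.pending.append(packet)

    def on_credit(self, credit: CreditPacket) -> None:
        self.credits += credit.credit_count

    def pump(self, receiver: Receiver) -> int:
        """Transmit queued packets while credits remain; return how many were sent."""
        sent = 0
        while self.pending and self.credits > 0:
            receiver.receive(self.pending.popleft())
            self.credits -= 1
            sent += 1
        return sent
```

Because the sender never transmits without a credit, the receiver's buffer cannot overflow in this model. Carrying credits in their own packet type keeps credit return independent of whether any control data is flowing in the reverse direction.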
Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues · CPC title
using a secondary processor, e.g. coprocessor (peripheral processor G06F13/12) · CPC title
considering the load · CPC title
Buffers; Shared memory; Pipes · CPC title
where tasks reside in different layers, e.g. user- and kernel-space · CPC title