Work graph scheduler implementation

US12578992B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12578992-B2
Application number: US-202217936788-A
Country: US
Kind code: B2
Filing date: Sep 29, 2022
Priority date: Sep 29, 2022
Publication date: Mar 17, 2026
Grant date: Mar 17, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, apparatuses, and methods for implementing a hierarchical scheduler. In various implementations, a processor includes a global scheduler, and a plurality of independent local schedulers with each of the local schedulers coupled to a plurality of processors. In one implementation, the processor is a graphics processing unit and the processors are computation units. The processor further includes a shared cache that is shared by the plurality of local schedulers. Each of the local schedulers also includes a local cache used by the local scheduler and processors coupled to the local scheduler. To schedule work items for execution, the global scheduler is configured to store one or more work items in the shared cache and convey an indication to a first local scheduler of the plurality of local schedulers which causes the first local scheduler to retrieve the one or more work items from the shared cache. Subsequent to retrieving the work items, the local scheduler is configured to schedule the retrieved work items for execution by the coupled processors. Each of the plurality of local schedulers is configured to schedule work items for execution independent of scheduling performed by other local schedulers.
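The dispatch path in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented circuitry: the class names (`SharedCache`, `GlobalScheduler`, `LocalScheduler`), the integer cache locations, the choice to remove items from the shared cache on retrieval, and the round-robin assignment to processors are all hypothetical choices made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SharedCache:
    """Toy stand-in for the shared cache: maps a location to stored work items."""
    slots: dict = field(default_factory=dict)
    next_location: int = 0

    def store(self, items):
        location = self.next_location
        self.next_location += 1
        self.slots[location] = list(items)
        return location

    def retrieve(self, location):
        # Assumption: retrieval removes the items so they are scheduled once.
        return self.slots.pop(location)


@dataclass
class LocalScheduler:
    name: str
    shared: SharedCache
    num_processors: int
    local_cache: deque = field(default_factory=deque)

    def on_indication(self, location):
        """React to an indication by pulling work items from the shared cache."""
        self.local_cache.extend(self.shared.retrieve(location))

    def schedule(self):
        """Assign queued items to this scheduler's own processors (round-robin
        is an assumption); no other local scheduler is consulted."""
        assignments = []
        slot = 0
        while self.local_cache:
            item = self.local_cache.popleft()
            assignments.append((f"{self.name}/proc{slot % self.num_processors}", item))
            slot += 1
        return assignments


@dataclass
class GlobalScheduler:
    shared: SharedCache

    def dispatch(self, items, target):
        """Store work items in the shared cache, then convey an indication
        (here, the cache location) to the chosen local scheduler."""
        location = self.shared.store(items)
        target.on_indication(location)
        return location


shared = SharedCache()
ls0 = LocalScheduler("ls0", shared, num_processors=2)
ls1 = LocalScheduler("ls1", shared, num_processors=2)
gs = GlobalScheduler(shared)

gs.dispatch(["a", "b", "c"], ls0)
gs.dispatch(["d"], ls1)
print(ls0.schedule())  # [('ls0/proc0', 'a'), ('ls0/proc1', 'b'), ('ls0/proc0', 'c')]
print(ls1.schedule())  # [('ls1/proc0', 'd')]
```

In this model the indication carries the cache location, which mirrors the dependent claims stating that the indication identifies where the work items are stored.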

First claim

Opening claim text (preview).

What is claimed is:

1. A processor comprising: a shared cache; a global scheduler comprising circuitry configured to store one or more work items in the shared cache; a plurality of local schedulers including at least a first local scheduler and a second local scheduler, each coupled to a plurality of processors; and circuitry, configured to cause transfer of a work item from the first local scheduler to the second local scheduler, the circuitry being configured to: cause circuitry of the first local scheduler to store a selected work item in the shared cache; and convey an indication to the second local scheduler that causes circuitry of the second local scheduler to retrieve the selected work item from the shared cache.

2. The processor as recited in claim 1, wherein each of the local schedulers is coupled to a local cache, and each local scheduler is configured to store work items to be executed in the local cache.

3. The processor as recited in claim 2, wherein each of the local schedulers is coupled to a dispatch controller configured to launch work items stored in the local cache for execution.

4. The processor as recited in claim 3, wherein the dispatch controller is configured to monitor a command queue configured to store a command from a local scheduler that indicates work items ready for execution have been stored in the local cache.

5. The processor as recited in claim 1, wherein each of the plurality of local schedulers is configured to schedule work items for execution independent of other local schedulers.

6. The processor as recited in claim 1, wherein to schedule the one or more work items for execution, the global scheduler is configured to: store the one or more work items in the shared cache; and convey an indication to a first local scheduler of the plurality of local schedulers, to cause the first local scheduler to retrieve the one or more work items from the shared cache.

7. The processor as recited in claim 6, wherein the indication identifies a location in the shared cache where the one or more work items are stored.

8. The processor as recited in claim 1, wherein the processor is configured to transfer the work item from the first local scheduler to the second local scheduler without direct communication between the first local scheduler and the second local scheduler.

9. A method comprising: storing, by circuitry of a global scheduler of a processor, one or more work items in a shared cache; retrieving, by circuitry of a first local scheduler of a plurality of local schedulers of the processor, work items from the shared cache responsive to an indication from the global scheduler of the processor; transferring, by the processor, a selected work item from the first local scheduler to a second local scheduler of the plurality of local schedulers, wherein said transferring comprises: causing the first local scheduler to store the selected work items in the shared cache; and conveying an indication to the second local scheduler that causes the second local scheduler to retrieve the selected work items from the shared cache.

10. The method as recited in claim 9, further comprising storing, by the local scheduler, work items to be executed in a local cache.

11. The method as recited in claim 10, further comprising launching, by a dispatch controller, the work items from the local cache to the processors.

12. The method as recited in claim 11, further comprising monitoring, by the dispatch controller, a command queue configured to store a command from the local scheduler that indicates the work items ready for execution have been stored in the local cache.

13. The method as recited in claim 9, further comprising the local scheduler scheduling the one or more work items independent of one or more other local schedulers of the processor, wherein the global scheduler is a first level of a hierarchical scheduler, and the local scheduler and the one or more other local schedulers are a second level of the hierarchical scheduler.

14. The method as recited in claim 9, wherein to schedule the one or more work items for execution, the method comprises the global scheduler: storing the one or more work items in the shared cache; and conveying an indication to a first local scheduler of a plurality of local schedulers, to cause the first local scheduler to retrieve the one or more work items from the shared cache.

15. The method as recited in claim 14, wherein the indication identifies a location in the shared cache where the one or more work items are stored.

16. The method as recited in claim 9, wherein the processor is configured to transfer the selected work without direct communication between the first local scheduler and the second local scheduler.

17. A computing system comprising: a processor comprising: a shared cache; a global scheduler comprising circuitry configured to store one or more work items in the shared cache; a plurality of local schedulers including at least a first local scheduler and a second local scheduler, each coupled to a plurality of processors; and circuitry configured to cause selected work items of the first local scheduler of the plurality of local schedulers to be transferred to the second local scheduler of the plurality of local schedulers; wherein the processor is configured to transfer the selected work items without direct communication between the first local scheduler and the second local scheduler.

18. The computing system as recited in claim 17, wherein in response to detecting an indication from the global scheduler, a local scheduler of the plurality of local schedulers comprises circuitry configured to: retrieve one or more work items from the shared cache; and schedule the one or more work items for execution.

19. The processor as recited in claim 1, wherein the selected work item comprises a work item generated by at least one of: a workgroup processor executing a previously scheduled work item; or an application or host processor that provides commands to the global scheduler.

20. The method as recited in claim 9, wherein the selected work item comprises a work item generated by at least one of: a workgroup processor executing a previously scheduled work item; or an application or host processor that provides commands to the global scheduler.

21. The processor as recited in claim 1, wherein the transfer of the selected work item is performed responsive to a load condition of a local scheduler, the load condition comprising at least one of: an overload condition in which the first local scheduler has an excess of work items; or an underutilization condition in which the second local scheduler has insufficient work items.

22. The method as recited in claim 9, further comprising transferring the selected work item responsive to a load condition of a local scheduler, the load condition comprising at least one of: an overload condition in which the first local scheduler has an excess of work items; or an underutilization condition in which the second local scheduler has insufficient work items.
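The transfer recited in the independent claims, combined with the load conditions of claims 21 and 22, can be sketched as a software model. Everything named here is hypothetical and chosen for illustration: the `rebalance` function standing in for the transfer circuitry, the `overload` and `underuse` thresholds, the rule of selecting the most recently queued item, and the stop condition. The point it shows is that the two local schedulers never reference each other; the work item travels only through the shared cache.

```python
from collections import deque


class SharedCache:
    """Toy shared cache keyed by an integer location."""

    def __init__(self):
        self._slots = {}
        self._next = 0

    def store(self, item):
        loc = self._next
        self._next += 1
        self._slots[loc] = item
        return loc

    def retrieve(self, loc):
        return self._slots.pop(loc)


class LocalScheduler:
    def __init__(self, name, shared):
        self.name = name
        self.shared = shared
        self.queue = deque()

    def offload(self):
        """Store one selected work item in the shared cache; return its location.
        Assumption: the most recently queued item is the one selected."""
        return self.shared.store(self.queue.pop())

    def on_indication(self, loc):
        """Retrieve the work item the indication points at."""
        self.queue.append(self.shared.retrieve(loc))


def rebalance(schedulers, overload=4, underuse=1):
    """Hypothetical transfer logic: while a load condition holds, move one item
    from the fullest scheduler to the emptiest one via the shared cache."""
    moves = []
    while True:
        src = max(schedulers, key=lambda s: len(s.queue))
        dst = min(schedulers, key=lambda s: len(s.queue))
        overloaded = len(src.queue) > overload
        underused = len(dst.queue) < underuse
        # Stop when neither condition holds or a move would not reduce imbalance.
        if not (overloaded or underused) or len(src.queue) - len(dst.queue) < 2:
            return moves
        loc = src.offload()      # first local scheduler stores the item
        dst.on_indication(loc)   # second local scheduler is told where to fetch it
        moves.append((src.name, dst.name, loc))


shared = SharedCache()
ls0 = LocalScheduler("ls0", shared)
ls1 = LocalScheduler("ls1", shared)
ls0.queue.extend(f"w{i}" for i in range(6))

print(rebalance([ls0, ls1]))             # [('ls0', 'ls1', 0), ('ls0', 'ls1', 1)]
print(len(ls0.queue), list(ls1.queue))   # 4 ['w5', 'w4']
```

With six items on `ls0` and none on `ls1`, the overload condition triggers two moves and then stops, since neither threshold is exceeded afterwards.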

Assignees

Advanced Micro Devices Inc

Inventors

Classifications

  • Processor architectures; Processor configuration, e.g. pipelining · CPC title

  • Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available (error or fault processing without redundancy G06F11/0703; error detection or correction by redundancy in data representation G06F11/08; error detection or correction of the data by redundancy in operations G06F11/14; error detection or correction by redundancy in hardware G06F11/16) · CPC title

  • where the computing system component is a central processing unit [CPU] · CPC title

  • Buffers; Shared memory; Pipes · CPC title

  • G06F9/4881 · Primary

    Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12578992B2 cover?
A hierarchical scheduler for a processor such as a graphics processing unit. A global scheduler stores work items in a shared cache and signals one of several independent local schedulers to retrieve them; each local scheduler then schedules the retrieved work items on its own coupled processors. The claims also cover transferring a work item from one local scheduler to another through the shared cache.
Who is the assignee on this patent?
Advanced Micro Devices Inc
What technology area does this patent fall under?
Primary CPC classification G06F9/4881. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 17, 2026 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).