Hardware assisted fine-grained data movement

US11868809B2 · US · B2

Patent metadata
Publication number: US-11868809-B2
Application number: US-202318095704-A
Country: US
Kind code: B2
Filing date: Jan 11, 2023
Priority date: Mar 19, 2020
Publication date: Jan 9, 2024
Grant date: Jan 9, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A processor includes a task scheduling unit and a compute unit coupled to the task scheduling unit. The task scheduling unit performs a task dependency assessment of a task dependency graph and task data requirements that correspond to each task of the plurality of tasks. Based on the task dependency assessment, the task scheduling unit schedules a first task of the plurality of tasks and a second proxy object of a plurality of proxy objects specified by the task data requirements such that a memory transfer of the second proxy object of the plurality of proxy objects occurs while the first task is being executed.
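
The overlap described in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the patented implementation: it assumes a dependency-respecting task order is already available, and the names `Step`, `overlap_schedule`, `order`, and `needs` are hypothetical. It only records which proxy objects would be copied while the preceding task runs; no real GPU memory or DMA engine is involved.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Step:
    """One time slot: a task executes while data for the next task is copied."""
    execute: Optional[str]       # task running in this slot (None in the first slot)
    transfer: Tuple[str, ...]    # proxy objects copied during this slot


def overlap_schedule(order: List[str], needs: Dict[str, List[str]]) -> List[Step]:
    """Pair each task's execution with the transfer of the next task's proxy objects.

    order: task names in an execution order that respects dependencies (assumed given).
    needs: task name -> proxy objects the task requires (hypothetical input format).
    """
    resident = set()                 # proxy objects already in simulated GPU memory
    steps: List[Step] = []
    prev: Optional[str] = None       # task that executes while the current copy happens
    for task in order:
        # Copy only what is not already resident from an earlier task.
        to_copy = tuple(p for p in needs.get(task, []) if p not in resident)
        resident.update(to_copy)
        steps.append(Step(execute=prev, transfer=to_copy))
        prev = task
    # The last task runs with nothing left to prefetch.
    steps.append(Step(execute=prev, transfer=()))
    return steps


if __name__ == "__main__":
    plan = overlap_schedule(
        ["T1", "T2", "T3"],
        {"T1": ["A"], "T2": ["A", "B"], "T3": ["C"]},
    )
    for slot, step in enumerate(plan):
        print(slot, step.execute, step.transfer)
```

In this sketch the first transfer has no preceding task to hide behind, so it appears in a slot where nothing executes; every later transfer shares a slot with the preceding task.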

First claim

Claim text as shown on this page (claims 1 to 20); claim 1 is the first independent claim.

What is claimed is:

1. A method implemented in a processor, comprising: performing a task dependency assessment of a task dependency graph representative of a plurality of tasks and task data requirements that correspond to each task of the plurality of tasks; based on the task dependency assessment, generating an asynchronous schedule for a task, the schedule providing for transfer of proxy objects required for the task to a graphics processing unit memory during execution of a preceding task; and transferring data associated with the proxy objects to the graphics processing unit memory based on the schedule.

2. The method of claim 1, wherein: the task dependency assessment reveals a task dependency of each task in the plurality of tasks and a set of proxy objects that are required for each task to execute.

3. The method of claim 2, wherein: the task is a successor task and the preceding task is a predecessor task.

4. The method of claim 1, wherein: the task dependency assessment indicates a task dependency of each task in the plurality of tasks and a set of proxy objects that are required for each task to execute.

5. The method of claim 4, further comprising: using the task dependency of each task in the plurality of tasks to schedule the plurality of tasks such that a memory transfer of proxy objects representative of data-blocks from a central processing unit (CPU) memory to a graphics processing unit (GPU) memory occurs during the execution of a predecessor task.

6. The method of claim 1, further comprising: generating a first mapping of the plurality of tasks represented in the task dependency graph to a plurality of data blocks required for each task of the plurality of tasks.

7. The method of claim 6, further comprising: using the first mapping to schedule a plurality of proxy objects required for each task of the plurality of tasks according to a first order of the proxy objects in the first mapping.

8. The method of claim 7, further comprising: generating a second mapping of the plurality of tasks represented in the task dependency graph to a task dependency of each task and a total number of predecessors of each task.

9. The method of claim 8, further comprising: using the second mapping of the plurality of tasks represented in the task dependency graph to schedule the plurality of tasks such that each successor task of the plurality of tasks is scheduled after a corresponding predecessor task.

10. The method of claim 9, further comprising: generating a third mapping of a plurality of proxy objects to a plurality of tasks that access the plurality of data blocks.

11. The method of claim 10, further comprising: using the third mapping of the plurality of tasks represented in the task dependency graph to schedule the plurality of tasks and the plurality of data blocks.

12. The method of claim 1, further comprising: generating a task-dispatch list that includes a first task of the plurality of tasks to be scheduled, the plurality of tasks being placed in the task-dispatch list based on an execution order; and scheduling the plurality of tasks and a plurality of data blocks based on the execution order of the task-dispatch list.

13. A processing system including at least one processor, comprising: task scheduling circuitry; and a system direct memory access (SDMA) engine coupled to the task scheduling unit, wherein the task scheduling unit: performs a task dependency assessment of a task dependency graph and task data requirements that correspond to each task of a plurality of tasks of the task dependency graph; and based on the task dependency assessment, generates an asynchronous schedule for a task, the schedule providing for transfer of proxy objects required for the task to a graphics processing unit memory during execution of a preceding task; and the SDMA engine transfers data associated with the proxy objects to the graphics processing unit memory based on the schedule.

14. The processing system of claim 13, wherein: the task dependency assessment reveals a task dependency of each task in the plurality of tasks and which plurality of data blocks of the plurality of data blocks are required for each task to execute.

15. The processing system of claim 14, wherein: the task dependency of each task in the plurality of tasks is used to schedule the plurality of tasks such that a memory transfer of proxy objects representative of data-blocks from a central processing unit (CPU) memory to a graphics processing unit (GPU) memory occurs during the execution of a predecessor task.

16. The processing system of claim 15, further comprising: a task scheduling unit that generates a first mapping of the plurality of tasks represented in the task dependency graph to a plurality of data blocks required for each task of the plurality of tasks.

17. The processing system of claim 16, wherein: the first mapping is used to schedule the plurality of data blocks required for each task of the plurality of tasks according to a first order of the data blocks in the first mapping.

18. The processing system of claim 17, wherein: the task scheduling unit generates a second mapping of the plurality of tasks represented in the task dependency graph to the task dependency of each task and a total number of predecessors of each task.

19. The processing system of claim 18, wherein: the second mapping of the plurality of tasks represented in the task dependency graph is used to schedule the plurality of tasks such that each successor task of the plurality of tasks is scheduled after a corresponding predecessor task.

20. The processing system of claim 13, wherein a proxy object is an object that contains information of a data block of a specific size.
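
The three mappings and the task-dispatch list recited in the claims can be sketched in a few lines. This is an illustration under stated assumptions, not the claimed implementation: the input dictionaries `deps` and `blocks`, the function names, and the use of Kahn's topological-sort algorithm to derive an execution order are all choices made here for the example; the claims do not name a specific ordering algorithm. Every task, including every predecessor, is assumed to appear as a key in `deps`.

```python
from collections import deque


def build_mappings(deps, blocks):
    """Build the three lookup tables described in the claims (illustrative only).

    deps:   task -> list of predecessor tasks (hypothetical input format)
    blocks: task -> ordered list of data-block names the task needs
    """
    # First mapping: task -> data blocks it requires, in their given order.
    task_to_blocks = {t: list(blocks.get(t, [])) for t in deps}
    # Second mapping: task -> (its predecessors, total number of predecessors).
    task_to_deps = {t: (list(p), len(p)) for t, p in deps.items()}
    # Third mapping: data block -> tasks that access it.
    block_to_tasks = {}
    for t in deps:
        for b in task_to_blocks[t]:
            block_to_tasks.setdefault(b, []).append(t)
    return task_to_blocks, task_to_deps, block_to_tasks


def dispatch_list(task_to_deps):
    """Order tasks so each successor follows its predecessors (Kahn's algorithm)."""
    remaining = {t: n for t, (_, n) in task_to_deps.items()}
    successors = {t: [] for t in task_to_deps}
    for t, (preds, _) in task_to_deps.items():
        for p in preds:
            successors[p].append(t)
    ready = deque(t for t, n in remaining.items() if n == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for s in successors[t]:
            remaining[s] -= 1            # one more predecessor has been dispatched
            if remaining[s] == 0:
                ready.append(s)
    if len(order) != len(task_to_deps):
        raise ValueError("task dependency graph contains a cycle")
    return order


if __name__ == "__main__":
    deps = {"load": [], "fft": ["load"], "filter": ["load"], "merge": ["fft", "filter"]}
    blocks = {"load": ["b0"], "fft": ["b0", "b1"], "filter": ["b0", "b2"], "merge": ["b1", "b2"]}
    t2b, t2d, b2t = build_mappings(deps, blocks)
    print(dispatch_list(t2d))
    print(b2t)
```

The predecessor count in the second mapping is what makes the ordering cheap: a task becomes ready the moment its count reaches zero, and the first mapping then indicates which blocks to start moving for it.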

Assignees

Advanced Micro Devices Inc

Inventors

Classifications

  • G06F9/4881 (Primary)

    Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues · CPC title

  • Program synchronisation; Mutual exclusion, e.g. by means of semaphores · CPC title

  • Graphs; Linked lists (G06F16/9027 takes precedence) · CPC title

  • Scheduler internals · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11868809B2 cover?
A processor includes a task scheduling unit and a compute unit coupled to the task scheduling unit. The task scheduling unit performs a task dependency assessment of a task dependency graph and task data requirements that correspond to each task of the plurality of tasks. Based on the task dependency assessment, the task scheduling unit schedules a first task of the plurality of tasks and a sec…
Who is the assignee on this patent?
Advanced Micro Devices Inc
What technology area does this patent fall under?
Primary CPC classification G06F9/4881. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 9, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).