Scheduling heterogenous computation on multithreaded processors

US10061618B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-10061618-B2
Application numberUS-201213368682-A
CountryUS
Kind codeB2
Filing dateFeb 8, 2012
Priority dateJun 16, 2011
Publication dateAug 28, 2018
Grant dateAug 28, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Aspects include computation systems that can identify computation instances that are not capable of being reentrant, or are not reentrant capable on a target architecture, or are non-reentrant as a result of having a memory conflict in a particular execution situation. A system can have a plurality of computation units, each with an independently schedulable SIMD vector. Computation instances can be defined by a program module, and a data element(s) that may be stored in a local cache for a particular computation unit. Each local cache does not maintain coherency controls for such data elements. During scheduling, a scheduler can maintain a list of running (or runnable) instances, and attempt to schedule new computation instances by determining whether any new computation instance conflicts with a running instance and responsively defer scheduling. Memory conflict checks can be conditioned on a flag or other indication of the potential for non-reentrancy.

First claim

Opening claim text (preview).

We claim: 1. A system for performing computation, comprising: a plurality of clusters of computation units, each cluster of computation units comprising a plurality of ALUs and a working memory used by the ALUs during execution of tasks on the ALUs; and a distributor of computation tasks among the clusters, the distributor coupled to read tasks requiring execution, and operable to assign the tasks for execution on the plurality of clusters, the assigning comprising, for each of the clusters, determining which locations of each working memory are referenced by non-reentrant tasks currently scheduled for execution in that cluster, and dispatching non-reentrant tasks for execution by an identified cluster with a working memory that has a location referenced by the respective non-reentrant task, but which was determined as currently not being referenced by any other non-reentrant task scheduled for execution on that cluster, the identified cluster being configured to execute the dispatched non-reentrant tasks. 2. The system of claim 1 , wherein a scheduler assigns reentrant tasks for execution on a cluster based on availability of computation resources required during execution of each task. 3. The system of claim 2 , wherein a determination of availability of computation resources comprises determining whether a required portion of memory is available in a particular cluster. 4. The system of claim 1 , wherein the distributor comprises a plurality of input buffers for the clusters, the input buffers operable to store descriptions of non-reentrant tasks to be scheduled on a respective cluster awaiting completion of execution of a conflicting non-reentrant task on that cluster. 5. The system of claim 4 , wherein the distributor is implemented as logic elements associated with each respective cluster, such that each logic element is independently operable to read tasks from an associated input buffer, and to determine whether tasks read from the associated input buffer can begin execution on the cluster associated with that logic element. 6. The system of claim 1 , wherein the distributor is operable to read a task from an input buffer, and to determine whether the task is re-entrant or non-reentrant by comparing a memory location referenced by the task with a set of memory locations identified as containing data associated with non-reentrant tasks. 7. The system of claim 1 , wherein the system is operable to categorize a task as non-reentrant by comparing a memory location referenced by the task with a set of memory locations identified as containing data, stored in a shared memory, and which can be written by a set of tasks currently scheduled for execution. 8. The system for performing graphics computation of claim 1 , further comprising local schedulers for each cluster of the plurality of clusters of computation units, each local scheduler operable to fill available computation capacity of the ALUs by selecting from a list of tasks, comprising both re-entrant and non-reentrant tasks. 9. The system for performing graphics computation of claim 8 , wherein each ALU comprises a Single Instruction Multiple Data (SIMD) execution unit having a vector width, and the local scheduler is operable to switch among different streams of instructions to be scheduled for execution on the cluster, on a cycle by cycle basis. 10. The system for performing graphics computation of claim 8 , wherein the system is operable to flag tasks as re-entrant or non-reentrant, and the local scheduler for each cluster is operable to condition the detecting of conflicting tasks on a flag associated with a received task, so that only non-reentrant tasks are checked for conflict by the local scheduler. 11. A machine-implemented method of computation task scheduling, comprising: receiving specifications for computation tasks to be executed in a cluster of computation units; maintaining a list of tasks that have been scheduled to execute in the cluster, the list comprising information indicating whether any of the tasks on the list are non-reentrant; and scheduling tasks to be executed from among the tasks specified by the received specifications, the scheduling comprising deferring scheduling of any task, among the tasks specified by the received specifications, that is non-reentrant and has a capability to write to a memory location shared by any non-reentrant task on the list of tasks; and executing the scheduled tasks. 12. The machine-implemented method of claim 11 , further comprising determining non-reentrancy of each task by inspecting a flag associated with a definition of the task. 13. The machine-implemented method of claim 11 , further comprising determining non-reentrancy of each task by profiling respective program code of which the task is an instance. 14. The machine-implemented method of claim 11 , further comprising determining non-reentrancy by detecting an overlap between a memory range indicated as being non-reentrant and a respective memory address accessed by each task. 15. The machine-implemented method of claim 11 , further comprising determining non-reentrancy by detecting an overlap between a memory address capable of being written by a task on the list of tasks, and a respective memory address accessed by task to be scheduled for execution. 16. A computation system, comprising: a plurality of clusters, each comprising a plurality of ALUs and a memory used by the ALUs as working memory during execution of tasks on the ALUs; a global scheduler operable to enqueue packets indicating processing to be conducted on the clusters, each packet identifying a program module and a group of data elements to be distributed among the clusters for use during execution of the program module; and respective schedulers for the clusters, each operable to receive packets from the global scheduler, to maintain a set of threads for which resources of the cluster have been allocated, and determine when a non-reentrant program module from a received packet will be added to the set of threads by determining that one or more of the data elements provided from the global scheduler for that packet are not being accessed by any thread of the set of threads; each of the respective schedulers being configured to run the ALUs with instructions from a selected thread, using plural data elements received from one or more packets. 17. The computation system of claim 16 , wherein the ALUs of each cluster function as a Single Instruction Multiple Data (SIMD) execution unit comprising an execution vector which can be scheduled on a cycle by cycle basis, and the scheduler for that cluster is operable to fill the execution vector with computation tasks to be executed that collectively do not reference conflicting memory address ranges. 18. An apparatus for computation, comprising: a computation unit configured for supporting of interleaved execution of instructions sourced from a plurality of program sources, each with a separate thread of control; an input buffer configured for receiving program instances of computation to be executed on the computation unit; and a scheduler configured: to maintain a list of in-progress program instances for which execution is in progress on the computation unit, and in response to completion of execution of an in-progress program instance, to attempt to schedule execution of a replacement program instance on the computation unit selected from among the received program instances, the selecting comprising determining whether any received program instance that is n

Assignees

Inventors

Classifications

  • organised in groups of units sharing resources, e.g. clusters · CPC title

  • G06F9/5038Primary

    considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration (scheduling strategies G06F9/4881 and subgroups) · CPC title

  • from multiple instruction streams, e.g. multistreaming · CPC title

  • with global bypass, e.g. between pipelines, between clusters · CPC title

  • Constraint · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10061618B2 cover?
Aspects include computation systems that can identify computation instances that are not capable of being reentrant, or are not reentrant capable on a target architecture, or are non-reentrant as a result of having a memory conflict in a particular execution situation. A system can have a plurality of computation units, each with an independently schedulable SIMD vector. Computation instances c…
Who is the assignee on this patent?
Peterson Luke Tilman, Mccombe James Alexander, Imagination Tech Ltd
What technology area does this patent fall under?
Primary CPC classification G06F9/5038. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Aug 28 2018 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).