Execution state analysis for assigning tasks to streaming multiprocessors

US9715413B2 · US · B2

Patent metadata
Field | Value
Publication number | US-9715413-B2
Application number | US-201213353155-A
Country | US
Kind code | B2
Filing date | Jan 18, 2012
Priority date | Jan 18, 2012
Publication date | Jul 25, 2017
Grant date | Jul 25, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

One embodiment of the present invention sets forth a technique for selecting a first processor included in a plurality of processors to receive work related to a compute task. The technique involves analyzing state data of each processor in the plurality of processors to identify one or more processors that have already been assigned one compute task and are eligible to receive work related to the one compute task, receiving, from each of the one or more processors identified as eligible, an availability value that indicates the capacity of the processor to receive new work, selecting a first processor to receive work related to the one compute task based on the availability values received from the one or more processors, and issuing, to the first processor via a cooperative thread array (CTA), the work related to the one compute task.
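The four steps in the abstract (identify eligible processors, receive availability values, select, issue) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented hardware mechanism: the `Processor` class, its fields, and the functions `select_processor` and `issue_cta` are hypothetical names, and the availability value is modeled as a simple count of free CTA slots.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Processor:
    """Hypothetical model of one processor (for example, a streaming multiprocessor)."""
    pid: int
    assigned_task: Optional[str] = None   # task whose state data this processor holds
    state_acknowledged: bool = False      # processor has acknowledged that state data
    free_cta_slots: int = 0               # assumed stand-in for "capacity to receive new work"
    received: List[str] = field(default_factory=list)

    def availability(self) -> int:
        # The processor computes and reports its own availability value.
        return self.free_cta_slots


def select_processor(processors: List[Processor], task: str) -> Optional[Processor]:
    """Return the eligible processor reporting the highest availability, or None."""
    # Step 1: analyze state data to find processors already assigned this task.
    eligible = [p for p in processors
                if p.assigned_task == task and p.state_acknowledged]
    if not eligible:
        return None
    # Steps 2 and 3: collect the availability values and pick the largest.
    return max(eligible, key=lambda p: p.availability())


def issue_cta(processor: Processor, task: str, work: str) -> None:
    """Step 4: issue the work to the chosen processor as one CTA (modeled as a list append)."""
    processor.received.append(f"{task}:{work}")
    processor.free_cta_slots -= 1


sms = [
    Processor(0, "taskA", True, free_cta_slots=2),
    Processor(1, "taskA", True, free_cta_slots=5),
    Processor(2, "taskB", True, free_cta_slots=8),   # assigned a different task
    Processor(3, "taskA", False, free_cta_slots=9),  # state data not yet acknowledged
]
chosen = select_processor(sms, "taskA")
issue_cta(chosen, "taskA", "cta0")
print(chosen.pid)
```

In this sketch, processors 2 and 3 report more free slots but are filtered out before availability is compared, which mirrors the ordering in the abstract: eligibility is decided first, and availability only ranks the processors that remain.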

First claim

Opening claim text (preview).

We claim:

1. A computer-implemented method for selecting a first processor included in a plurality of processors to receive work related to a compute task, the method comprising: analyzing state data of each processor in the plurality of processors to identify one or more processors that have already been assigned one compute task and are eligible to receive work related to the one compute task, wherein a processor is identified as eligible when state data defining how the one compute task is to be processed has been received and acknowledged by the processor; receiving, from each of the one or more processors identified as eligible, an availability value calculated by the processor and indicating the capacity of the processor to receive new work; selecting the first processor to receive work related to the one compute task based on the availability values received from the one or more processors; and issuing, to the first processor via a cooperative thread array (CTA), the work related to the one compute task.

2. The computer-implemented method of claim 1, wherein a processor is identified as eligible when the one compute task is associated with a number of outstanding work items that is greater than or equal to a threshold number of work items per CTA indicated by the one compute task.

3. The computer-implemented method of claim 1, wherein a processor is identified as eligible when a timeout period has occurred, and a number of outstanding work items associated with the one compute task does not exceed a threshold number of work items per CTA indicated by the one compute task.

4. The computer-implemented method of claim 1, wherein a processor is identified as eligible when the one compute task indicates that a throttle mode should be activated, and the plurality of processors is operating in the throttle mode, and wherein, in the throttle mode, the first processor is included in a restricted subset of the plurality of processors and each processor within the restricted subset is allowed to access a first portion of memory that is larger than a second portion of memory normally available to each processor in the plurality of processors when processing compute tasks in a non-throttle mode.

5. The computer-implemented method of claim 1, wherein a fixed priority list is used to select the first processor when two or more eligible processors both have an availability value that is the highest availability value.

6. The computer-implemented method of claim 1, wherein, when the availability values are not provided, a round robin mode is used to select the first processor.

7. The method of claim 1, wherein, for each of the one or more processors identified as eligible, the availability value is transmitted by the processor in the form of a status message that is derived from state data associated with the processor.

8. The method of claim 1, wherein the availability value is based on a number of CTAs currently being executed by the processor.

9. The method of claim 8, wherein the availability value is further based on per-CTA resource requirements associated with a most recently assigned compute task.

10. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to select a first processor included in a plurality of processors to receive work related to a compute task, by performing the steps of: analyzing state data of each processor in the plurality of processors to identify one or more processors that have already been assigned one compute task and are eligible to receive work related to the one compute task, wherein a processor is identified as eligible when state data defining how the one compute task is to be processed has been received and acknowledged by the processor; receiving, from each of the one or more processors identified as eligible, an availability value calculated by the processor and indicating the capacity of the processor to receive new work; selecting the first processor to receive work related to the one compute task based on the availability values received from the one or more processors; and issuing, to the first processor via a cooperative thread array (CTA), the work related to the one compute task.

11. The non-transitory computer-readable storage medium of claim 10, wherein a processor is identified as eligible when the one compute task is associated with a number of outstanding work items that is greater than or equal to a threshold number of work items per CTA indicated by the one compute task.

12. The non-transitory computer-readable storage medium of claim 10, wherein a processor is identified as eligible when a timeout period has occurred, and a number of outstanding work items associated with the one compute task does not exceed a threshold number of work items per CTA indicated by the one compute task.

13. The non-transitory computer-readable storage medium of claim 10, wherein a processor is identified as eligible when the one compute task indicates that a throttle mode should be activated, and the plurality of processors is operating in the throttle mode, and wherein, in the throttle mode, the first processor is included in a restricted subset of the plurality of processors and each processor within the restricted subset is allowed to access a first portion of memory that is larger than a second portion of memory normally available to each processor in the plurality of processors when processing compute tasks in a non-throttle mode.

14. The non-transitory computer-readable storage medium of claim 10, wherein a fixed priority list is used to select the first processor when two or more eligible processors both have an availability value that is the highest availability value.

15. The non-transitory computer-readable storage medium of claim 10, wherein, when the availability values are not provided, a round robin mode is used to select the first processor.

16. A system for selecting a first processor included in a plurality of processors to receive work related to a compute task, the system comprising: a memory that is configured to store the compute task; a plurality of processors; and a work distribution unit that is configured to: analyze state data of each processor in the plurality of processors to identify one or more processors that have already been assigned one compute task and are eligible to receive work related to the one compute task, wherein a processor is identified as eligible when state data defining how the one compute task is to be processed has been received and acknowledged by the processor; receive, from each of the one or more processors identified as eligible, an availability value calculated by the processor and indicating the capacity of the processor to receive new work; select the first processor to receive work related to the one compute task based on the availability values received from the one or more processors; and issue, to the first processor via a cooperative thread array (CTA), the work related to the one compute task.

17. The system of claim 16, wherein a processor is identified as eligible when the one compute task is associated with a number of outstanding work items that is greater than or equal to a threshold number of work items per CTA indicated by the one compute task.

18. The system of claim 16, wherein a processor is identified as eligible when a timeout period has occurred, and a number of outstanding work items associated with the one compute task does not exceed a threshold number of work it
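Several dependent claims describe small decision rules that are easier to follow as code: the outstanding-work threshold and timeout (claims 2 and 3), the fixed priority list used to break ties (claim 5), and the round robin fallback when no availability values are provided (claim 6). The Python sketch below is illustrative only. `task_ready`, `Selector`, and all parameter names are hypothetical, and walking the priority list in order for round robin is an assumption, since the claims do not say how the round robin order is defined.

```python
from typing import Dict, List, Optional


def task_ready(outstanding: int, threshold_per_cta: int, timed_out: bool) -> bool:
    """Eligibility test on the task's outstanding work items.

    True when there are enough items to fill a CTA (claim 2), or when a
    timeout has occurred and the count does not exceed the threshold (claim 3).
    """
    if outstanding >= threshold_per_cta:
        return True
    return timed_out and outstanding <= threshold_per_cta


class Selector:
    """Hypothetical selector with a fixed priority list and a round robin cursor."""

    def __init__(self, priority: List[int]):
        self.priority = priority   # processor ids, highest priority first
        self._next = 0             # round robin position within `priority` (assumed ordering)

    def pick(self, eligible: List[int],
             availability: Optional[Dict[int, int]]) -> Optional[int]:
        if not eligible:
            return None
        if availability is None:
            # Claim 6: no availability values provided, so fall back to round robin.
            return self._round_robin(eligible)
        best = max(availability[p] for p in eligible)
        tied = [p for p in eligible if availability[p] == best]
        # Claim 5: the fixed priority list decides among processors tied for the highest value.
        return min(tied, key=self.priority.index)

    def _round_robin(self, eligible: List[int]) -> Optional[int]:
        n = len(self.priority)
        for step in range(n):
            idx = (self._next + step) % n
            candidate = self.priority[idx]
            if candidate in eligible:
                self._next = (idx + 1) % n   # resume after the processor just chosen
                return candidate
        return None


selector = Selector(priority=[2, 0, 1, 3])
tie_winner = selector.pick([0, 1, 2], {0: 4, 1: 7, 2: 7})

rr = Selector(priority=[2, 0, 1, 3])
rr_order = [rr.pick([0, 1, 3], None) for _ in range(4)]
print(tie_winner, rr_order)
```

Processors 1 and 2 tie on availability in the first call, so the priority list picks processor 2. In the round robin calls, processor 2 is not eligible and is skipped on every pass.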

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9715413B2 cover?
One embodiment of the present invention sets forth a technique for selecting a first processor included in a plurality of processors to receive work related to a compute task. The technique involves analyzing state data of each processor in the plurality of processors to identify one or more processors that have already been assigned one compute task and are eligible to receive work related to …
Who is the assignee on this patent?
Abdalla Karim M, Shah Lacky V, Duluk Jr Jerome F, and 4 more
What technology area does this patent fall under?
Primary CPC classification G06F9/505. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 25, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).