Who is the assignee on this patent?

Shah Lacky V, Abdalla Karim M, Treichler Sean J, and 2 more

What technology area does this patent fall under?

Primary CPC classification G06F9/4881. Mapped technology areas include Physics.

When was this patent published?

Publication date Mar 20, 2018 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Controlling work distribution for processing tasks

US9921873B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9921873-B2
Application number	US-201213363350-A
Country	US
Kind code	B2
Filing date	Jan 31, 2012
Priority date	Jan 31, 2012
Publication date	Mar 20, 2018
Grant date	Mar 20, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A technique for controlling the distribution of compute task processing in a multi-threaded system encodes each processing task as task metadata (TMD) stored in memory. The TMD includes work distribution parameters specifying how the processing task should be distributed for processing. Scheduling circuitry selects a task for execution when entries of a work queue for the task have been written. The work distribution parameters may define a number of work queue entries needed before a “cooperative thread array” (“CTA”) may be launched to process the work queue entries according to the compute task. The work distribution parameters may define a number of CTAs that are launched to process the same work queue entries. Finally, the work distribution parameters may define a step size that is used to update pointers to the work queue entries.
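As a rough illustration of the launch rule the abstract describes, the Python sketch below models the work distribution parameters as a small record and computes how many CTAs would be launched for a given queue depth. The class name, field names, and the reading that M CTAs are launched for every full group of N entries are assumptions made for illustration; the patent does not define a software API.

```python
from dataclasses import dataclass


@dataclass
class WorkDistributionParams:
    """Hypothetical stand-in for the parameters carried in the TMD."""
    entries_per_launch: int  # N: queue entries needed before a launch
    ctas_per_launch: int     # M: CTAs launched for each group of N entries
    step_size: int           # amount added to the queue pointer per launch


def ctas_to_launch(queued_entries: int, params: WorkDistributionParams) -> int:
    """Return the CTA count for the current queue depth.

    Assumption: only full groups of N entries trigger a launch, and each
    full group launches M CTAs that process the same entries.
    """
    if queued_entries < params.entries_per_launch:
        return 0  # not enough work queued yet
    full_groups = queued_entries // params.entries_per_launch
    return full_groups * params.ctas_per_launch


params = WorkDistributionParams(entries_per_launch=4, ctas_per_launch=2, step_size=4)
print(ctas_to_launch(10, params))  # two full groups of 4 entries, 2 CTAs each
print(ctas_to_launch(3, params))   # fewer than N entries, nothing launched
```

In this sketch the two leftover entries in the first call stay in the queue until more work arrives.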

First claim

Opening claim text (preview).

The invention claimed is:

1. A method of controlling the distribution of work for task processing, the method comprising: determining a number of entries stored in a first queue, wherein the first queue is stored within a task metadata structure; reading work distribution parameters encapsulated in the task metadata structure that define a first processing task, wherein a first parameter included in the work distribution parameters specifies a number of entries (N) needed to launch a set of compute thread arrays (CTAs) for execution by a streaming multiprocessor, and a second parameter included in the work distribution parameters specifies a plurality of CTAs (M) to launch for each number of entries (N); determining that the number of entries stored in the first queue is equal to or greater than the number of entries (N) specified by the first parameter included in the work distribution parameters; in response to determining that the number of entries stored in the first queue is equal to or greater than the number of entries (N) specified by the first parameter, determining a number of CTAs to launch, wherein M CTAs are launched for each number of entries (N) stored in the first queue, launching the number of CTAs for execution by the streaming multiprocessor; and in response to executing the number of CTAs, updating a pointer to a next entry in the first queue to be processed by a next set of CTAs based on the work distribution parameters; determining an updated number of entries stored in the first queue; and based on determining the updated number of entries, adjusting the number of CTAs to launch based on the work distribution parameters and the updated number of entries.

2. The method of claim 1, further comprising: determining that a second number of entries stored in the first queue is less than the number of entries (N) specified by the first parameter; determining that a coalesce wait time has elapsed; and launching the next set of CTAs for execution by the streaming multiprocessor to process the second number of entries.

3. The method of claim 2, wherein a third parameter of the work distribution parameters specifies a step size that is used to update the pointer by adding the step size to the pointer and the third parameter is modified based on a partial step size operating mode when the next set of CTAs is launched.

4. The method of claim 2, further comprising storing the number of entries stored in the first queue in a special register when the next set of CTAs is launched.

5. The method of claim 1, further comprising: determining that a second number of entries stored in the first queue is less than the number of entries (N) specified by the first parameter; determining that a coalesce wait time has not elapsed; and waiting for additional entries in the first queue to be written with an additional portion of the work.

6. The method of claim 1, wherein the determining is also based on an alignment parameter that specifies a first amount of work previously launched in a first set of CTAs, and the alignment parameter is less than the first parameter.

7. The method of claim 1, wherein a third parameter of the work distribution parameters specifies a step size that is used to update the pointer by adding the step size to the pointer.

8. The method of claim 1, further comprising writing, by the number of CTAs, data to be processed by a second compute task to entries in a second queue associated with a second processing task.

9. The method of claim 1, wherein the entries stored in the first queue are produced during execution of a second compute task.

10. The method of claim 1, further comprising: reading the entries stored in the first queue by the number of CTAs; and updating a second pointer to the first entry in the first queue that is processed by the next set of CTAs.

11. The method of claim 1, further comprising determining that the number of entries causes the pointer to the first entry in the first queue to be aligned with a memory access boundary.

12. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to control the distribution of work for task processing, by performing the steps of: determining a number of entries stored in a first queue, wherein the first queue is stored within a task metadata structure; reading work distribution parameters encapsulated in the task metadata structure that define a first processing task, wherein a first parameter included in the work distribution parameters specifies a number of entries (N) needed to launch a set of compute thread arrays (CTAs) for execution by a streaming multiprocessor, and a second parameter included in the work distribution parameters specifies a plurality of CTAs (M) to launch for each number of entries (N); determining that the number of entries stored in the first queue is equal to or greater than the number of entries (N) specified by the first parameter included in the work distribution parameters; in response to determining that the number of entries stored in the first queue is equal to or greater than the number of entries (N) specified by the first parameter, determining a number of CTAs to launch, wherein M CTAs are launched for each number of entries (N) stored in the first queue, launching the number of CTAs for execution by the streaming multiprocessor; and in response to executing the number of CTAs, updating a pointer to a next entry in the first queue to be processed by a next set of CTAs based on the work distribution parameters; determining an updated number of entries stored in the first queue; and based on determining the updated number of entries, adjusting the number of CTAs to launch based on the work distribution parameters and the updated number of entries.

13. A system for controlling the distribution of work for task processing, the system comprising: a memory that is configured to store a task metadata structure that defines a first processing task and includes a first queue; and a task/work unit that is configured to: determine a number of entries stored in the first queue; read work distribution parameters encapsulated in the task metadata structure that define a first processing task, wherein a first parameter included in the work distribution parameters specifies a number of entries (N) needed to launch a set of compute thread arrays (CTAs) for execution by a streaming multiprocessor, and a second parameter included in the work distribution parameters specifies a plurality of CTAs (M) to launch for each number of entries (N); determine that the number of entries stored in the first queue is equal to or greater than the number of entries (N) specified by the first parameter included in the work distribution parameters; in response to determining that the number of entries stored in the first queue is equal to or greater than the number of entries (N) specified by the first parameter, determine a number of CTAs to launch, wherein M CTAs are launched for each number of entries (N) stored in the first queue, launch the number of CTAs for execution by the streaming multiprocessor; and in response to executing the number of CTAs, update a pointer to a next entry in the first queue to be processed by a next set of CTAs based on the work distribution parameters; determine an updated number of entries stored in the first queue; and based on determining the updated number of entries, adjust the number of CTAs to launch based on the work distribution parameters and the updated number of entries.

14. The system of cla
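The interplay of claims 1, 2, 5, and 7 (full launches, the coalesce wait time, and the step-size pointer update) can be sketched as a single scheduling decision in Python. Everything here is a hypothetical model: the function name, the argument names, the choice to advance the pointer by one step per group of N entries, and the choice to advance it by the number of entries actually consumed on a partial launch are assumptions, not details confirmed by the claim text.

```python
def schedule_step(queued, read_ptr, n, m, step, waited, coalesce_wait):
    """Decide one scheduling action for a work queue.

    Returns (ctas_launched, new_read_ptr, entries_consumed).
    queued        -- entries currently written to the queue
    read_ptr      -- pointer to the next entry to be processed
    n, m, step    -- N, M, and step size from the work distribution parameters
    waited        -- time already spent waiting for more entries
    coalesce_wait -- coalesce wait time before a partial launch is allowed
    """
    if queued >= n:
        # Claim 1: at least N entries, launch M CTAs per group of N entries.
        groups = queued // n
        # Claim 7: the pointer is updated by adding the step size
        # (assumed here: once per launched group).
        return groups * m, read_ptr + groups * step, groups * n
    if queued > 0 and waited >= coalesce_wait:
        # Claim 2: fewer than N entries, but the coalesce wait time elapsed,
        # so launch anyway. Assumed partial step: advance by entries consumed.
        return m, read_ptr + queued, queued
    # Claim 5: keep waiting for additional entries to be written.
    return 0, read_ptr, 0


print(schedule_step(10, 0, 4, 2, 4, waited=0, coalesce_wait=100))
print(schedule_step(3, 8, 4, 2, 4, waited=50, coalesce_wait=100))
print(schedule_step(3, 8, 4, 2, 4, waited=100, coalesce_wait=100))
```

The three calls show a full launch, a wait, and a timeout-driven partial launch. A hardware task/work unit would make this decision in circuitry; the sketch only mirrors the decision logic.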

Assignees

Inventors

Classifications

G06F9/4806
Task transfer initiation or dispatching · CPC title
G06F9/5011
the resources being hardware resources other than CPUs, Servers and Terminals · CPC title
G06F9/4881 · Primary
Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues · CPC title

Patent family

Related publications grouped by family.

View patent family 48783919

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9921873B2 cover?: A technique for controlling the distribution of compute task processing in a multi-threaded system encodes each processing task as task metadata (TMD) stored in memory. The TMD includes work distribution parameters specifying how the processing task should be distributed for processing. Scheduling circuitry selects a task for execution when entries of a work queue for the task have been written…