US9921873B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9921873-B2 |
| Application number | US-201213363350-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 31, 2012 |
| Priority date | Jan 31, 2012 |
| Publication date | Mar 20, 2018 |
| Grant date | Mar 20, 2018 |
Official abstract text for this publication.
A technique for controlling the distribution of compute task processing in a multi-threaded system encodes each processing task as task metadata (TMD) stored in memory. The TMD includes work distribution parameters specifying how the processing task should be distributed for processing. Scheduling circuitry selects a task for execution when entries of a work queue for the task have been written. The work distribution parameters may define a number of work queue entries needed before a cooperative thread array (“CTA”) may be launched to process the work queue entries according to the compute task. The work distribution parameters may define a number of CTAs that are launched to process the same work queue entries. Finally, the work distribution parameters may define a step size that is used to update pointers to the work queue entries.
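The launch rule summarized in the abstract can be sketched in a few lines of Python. This is only an illustrative model, not the patented hardware logic: the names `WorkDistribution`, `entries_per_launch`, `ctas_per_launch`, `step_size`, and `plan_launch` are hypothetical, and the sketch assumes the read pointer advances by one step size for each complete group of N entries that is consumed.

```python
from dataclasses import dataclass


@dataclass
class WorkDistribution:
    """Hypothetical stand-in for the work distribution parameters in a TMD."""
    entries_per_launch: int  # N: queue entries needed before CTAs may launch
    ctas_per_launch: int     # M: CTAs launched for each group of N entries
    step_size: int           # amount added to the read pointer per group


def plan_launch(entries_available: int, read_ptr: int,
                params: WorkDistribution) -> tuple[int, int]:
    """Return (number of CTAs to launch, updated read pointer).

    Assumption: only complete groups of N entries trigger a launch, and
    the pointer advances by step_size once per group consumed.
    """
    groups = entries_available // params.entries_per_launch
    if groups == 0:
        # Fewer than N entries have been written: nothing to launch yet.
        return 0, read_ptr
    ctas = groups * params.ctas_per_launch
    new_ptr = read_ptr + groups * params.step_size
    return ctas, new_ptr
```

With N = 4, M = 2, and a step size of 4, ten available entries form two complete groups, so the sketch launches four CTAs and advances the pointer by 8; the two leftover entries wait for a later launch.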
Opening claim text (preview).
The invention claimed is:
1. A method of controlling the distribution of work for task processing, the method comprising: determining a number of entries stored in a first queue, wherein the first queue is stored within a task metadata structure; reading work distribution parameters encapsulated in the task metadata structure that define a first processing task, wherein a first parameter included in the work distribution parameters specifies a number of entries (N) needed to launch a set of compute thread arrays (CTAs) for execution by a streaming multiprocessor, and a second parameter included in the work distribution parameters specifies a plurality of CTAs (M) to launch for each number of entries (N); determining that the number of entries stored in the first queue is equal to or greater than the number of entries (N) specified by the first parameter included in the work distribution parameters; in response to determining that the number of entries stored in the first queue is equal to or greater than the number of entries (N) specified by the first parameter, determining a number of CTAs to launch, wherein M CTAs are launched for each number of entries (N) stored in the first queue, launching the number of CTAs for execution by the streaming multiprocessor; and in response to executing the number of CTAs, updating a pointer to a next entry in the first queue to be processed by a next set of CTAs based on the work distribution parameters; determining an updated number of entries stored in the first queue; and based on determining the updated number of entries, adjusting the number of CTAs to launch based on the work distribution parameters and the updated number of entries.
2. The method of claim 1, further comprising: determining that a second number of entries stored in the first queue is less than the number of entries (N) specified by the first parameter; determining that a coalesce wait time has elapsed; and launching the next set of CTAs for execution by the streaming multiprocessor to process the second number of entries.
3. The method of claim 2, wherein a third parameter of the work distribution parameters specifies a step size that is used to update the pointer by adding the step size to the pointer and the third parameter is modified based on a partial step size operating mode when the next set of CTAs is launched.
4. The method of claim 2, further comprising storing the number of entries stored in the first queue in a special register when the next set of CTAs is launched.
5. The method of claim 1, further comprising: determining that a second number of entries stored in the first queue is less than the number of entries (N) specified by the first parameter; determining that a coalesce wait time has not elapsed; and waiting for additional entries in the first queue to be written with an additional portion of the work.
6. The method of claim 1, wherein the determining is also based on an alignment parameter that specifies a first amount of work previously launched in a first set of CTAs, and the alignment parameter is less than the first parameter.
7. The method of claim 1, wherein a third parameter of the work distribution parameters specifies a step size that is used to update the pointer by adding the step size to the pointer.
8. The method of claim 1, further comprising writing, by the number of CTAs, data to be processed by a second compute task to entries in a second queue associated with a second processing task.
9. The method of claim 1, wherein the entries stored in the first queue are produced during execution of a second compute task.
10. The method of claim 1, further comprising: reading the entries stored in the first queue by the number of CTAs; and updating a second pointer to the first entry in the first queue that is processed by the next set of CTAs.
11. The method of claim 1, further comprising determining that the number of entries causes the pointer to the first entry in the first queue to be aligned with a memory access boundary.
12. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to control the distribution of work for task processing, by performing the steps of: determining a number of entries stored in a first queue, wherein the first queue is stored within a task metadata structure; reading work distribution parameters encapsulated in the task metadata structure that define a first processing task, wherein a first parameter included in the work distribution parameters specifies a number of entries (N) needed to launch a set of compute thread arrays (CTAs) for execution by a streaming multiprocessor, and a second parameter included in the work distribution parameters specifies a plurality of CTAs (M) to launch for each number of entries (N); determining that the number of entries stored in the first queue is equal to or greater than the number of entries (N) specified by the first parameter included in the work distribution parameters; in response to determining that the number of entries stored in the first queue is equal to or greater than the number of entries (N) specified by the first parameter, determining a number of CTAs to launch, wherein M CTAs are launched for each number of entries (N) stored in the first queue, launching the number of CTAs for execution by the streaming multiprocessor; and in response to executing the number of CTAs, updating a pointer to a next entry in the first queue to be processed by a next set of CTAs based on the work distribution parameters; determining an updated number of entries stored in the first queue; and based on determining the updated number of entries, adjusting the number of CTAs to launch based on the work distribution parameters and the updated number of entries.
13. A system for controlling the distribution of work for task processing, the system comprising: a memory that is configured to store a task metadata structure that defines a first processing task and includes a first queue; and a task/work unit that is configured to: determine a number of entries stored in the first queue; read work distribution parameters encapsulated in the task metadata structure that define a first processing task, wherein a first parameter included in the work distribution parameters specifies a number of entries (N) needed to launch a set of compute thread arrays (CTAs) for execution by a streaming multiprocessor, and a second parameter included in the work distribution parameters specifies a plurality of CTAs (M) to launch for each number of entries (N); determine that the number of entries stored in the first queue is equal to or greater than the number of entries (N) specified by the first parameter included in the work distribution parameters; in response to determining that the number of entries stored in the first queue is equal to or greater than the number of entries (N) specified by the first parameter, determine a number of CTAs to launch, wherein M CTAs are launched for each number of entries (N) stored in the first queue, launch the number of CTAs for execution by the streaming multiprocessor; and in response to executing the number of CTAs, update a pointer to a next entry in the first queue to be processed by a next set of CTAs based on the work distribution parameters; determine an updated number of entries stored in the first queue; and based on determining the updated number of entries, adjust the number of CTAs to launch based on the work distribution parameters and the updated number of entries.
14. The system of cla
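Claims 2 and 5 add a coalesce wait time: when fewer than N entries are queued, the scheduler either keeps waiting or, once the wait time has elapsed, launches CTAs to process the partial batch. A minimal Python sketch of that decision follows. The function name `should_launch` and its parameters are hypothetical, and the rule that an empty queue never triggers a launch is an assumption of this sketch, not something the claims state.

```python
def should_launch(entries_available: int, entries_needed: int,
                  waited: float, coalesce_wait: float) -> tuple[bool, int]:
    """Return (launch now?, number of entries the launch would cover).

    entries_needed plays the role of N; waited and coalesce_wait are in
    the same (arbitrary) time unit.
    """
    if entries_available >= entries_needed:
        # Enough entries for a full launch.
        return True, entries_needed
    if entries_available > 0 and waited >= coalesce_wait:
        # Coalesce wait time elapsed: launch on the partial batch
        # (assumption: an empty queue is never launched).
        return True, entries_available
    # Wait time not elapsed: wait for more entries to be written.
    return False, 0
```

The entry count returned for a partial launch corresponds loosely to the value that claim 4 describes storing in a special register, so that the launched CTAs can tell how many entries are actually valid.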
Task transfer initiation or dispatching · CPC title
the resources being hardware resources other than CPUs, Servers and Terminals · CPC title
Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues · CPC title