US9715413B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9715413-B2 |
| Application number | US-201213353155-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 18, 2012 |
| Priority date | Jan 18, 2012 |
| Publication date | Jul 25, 2017 |
| Grant date | Jul 25, 2017 |
Official abstract text for this publication.
One embodiment of the present invention sets forth a technique for selecting a first processor included in a plurality of processors to receive work related to a compute task. The technique involves analyzing state data of each processor in the plurality of processors to identify one or more processors that have already been assigned one compute task and are eligible to receive work related to the one compute task, receiving, from each of the one or more processors identified as eligible, an availability value that indicates the capacity of the processor to receive new work, selecting a first processor to receive work related to the one compute task based on the availability values received from the one or more processors, and issuing, to the first processor via a cooperative thread array (CTA), the work related to the one compute task.
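The selection flow in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Processor` record, the `select_processor` function, and the `priority_order` and `rr_cursor` parameters are hypothetical names introduced here. Eligibility is reduced to a single acknowledged-state flag, the tie-break by fixed priority list and the round-robin fallback follow the dependent claims, and the final lowest-id fallback is purely an assumption of this sketch. The step of issuing the work as a CTA is not modeled.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Processor:
    """Hypothetical view of one processor as seen by the work distributor."""
    proc_id: int
    task_state_acked: bool        # task state data received and acknowledged
    availability: Optional[int]   # self-reported capacity; None if not provided


def select_processor(
    processors: Sequence[Processor],
    priority_order: Sequence[int],
    rr_cursor: int = 0,
) -> Optional[int]:
    """Return the proc_id chosen to receive the next CTA, or None if none is eligible."""
    # Step 1: keep only processors that acknowledged the task's state data.
    eligible = [p for p in processors if p.task_state_acked]
    if not eligible:
        return None

    # Round-robin fallback when availability values are not provided.
    if any(p.availability is None for p in eligible):
        return eligible[rr_cursor % len(eligible)].proc_id

    # Step 2: find the highest reported availability among eligible processors.
    best = max(p.availability for p in eligible)
    tied = {p.proc_id for p in eligible if p.availability == best}

    # Step 3: a fixed priority list breaks ties among the most-available processors.
    for proc_id in priority_order:
        if proc_id in tied:
            return proc_id

    # Assumption of this sketch only: fall back to the lowest id if the
    # priority list does not mention any tied processor.
    return min(tied)
```

For example, with processors 1 and 3 both eligible and reporting the same highest availability, the order of `priority_order` alone decides which of the two is returned; a processor that has not acknowledged the task state is never chosen, however high its availability.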
Opening claim text (preview).
We claim:
1. A computer-implemented method for selecting a first processor included in a plurality of processors to receive work related to a compute task, the method comprising: analyzing state data of each processor in the plurality of processors to identify one or more processors that have already been assigned one compute task and are eligible to receive work related to the one compute task, wherein a processor is identified as eligible when state data defining how the one compute task is to be processed has been received and acknowledged by the processor; receiving, from each of the one or more processors identified as eligible, an availability value calculated by the processor and indicating the capacity of the processor to receive new work; selecting the first processor to receive work related to the one compute task based on the availability values received from the one or more processors; and issuing, to the first processor via a cooperative thread array (CTA), the work related to the one compute task.
2. The computer-implemented method of claim 1, wherein a processor is identified as eligible when the one compute task is associated with a number of outstanding work items that is greater than or equal to a threshold number of work items per CTA indicated by the one compute task.
3. The computer-implemented method of claim 1, wherein a processor is identified as eligible when a timeout period has occurred, and a number of outstanding work items associated with the one compute task does not exceed a threshold number of work items per CTA indicated by the one compute task.
4. The computer-implemented method of claim 1, wherein a processor is identified as eligible when the one compute task indicates that a throttle mode should be activated, and the plurality of processors is operating in the throttle mode, and wherein, in the throttle mode, the first processor is included in a restricted subset of the plurality of processors and each processor within the restricted subset is allowed to access a first portion of memory that is larger than a second portion of memory normally available to each processor in the plurality of processors when processing compute tasks in a non-throttle mode.
5. The computer-implemented method of claim 1, wherein a fixed priority list is used to select the first processor when two or more eligible processors both have an availability value that is the highest availability value.
6. The computer-implemented method of claim 1, wherein, when the availability values are not provided, a round robin mode is used to select the first processor.
7. The method of claim 1, wherein, for each of the one or more processors identified as eligible, the availability value is transmitted by the processor in the form of a status message that is derived from state data associated with the processor.
8. The method of claim 1, wherein the availability value is based on a number of CTAs currently being executed by the processor.
9. The method of claim 8, wherein the availability value is further based on per-CTA resource requirements associated with a most recently assigned compute task.
10. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to select a first processor included in a plurality of processors to receive work related to a compute task, by performing the steps of: analyzing state data of each processor in the plurality of processors to identify one or more processors that have already been assigned one compute task and are eligible to receive work related to the one compute task, wherein a processor is identified as eligible when state data defining how the one compute task is to be processed has been received and acknowledged by the processor; receiving, from each of the one or more processors identified as eligible, an availability value calculated by the processor and indicating the capacity of the processor to receive new work; selecting the first processor to receive work related to the one compute task based on the availability values received from the one or more processors; and issuing, to the first processor via a cooperative thread array (CTA), the work related to the one compute task.
11. The non-transitory computer-readable storage medium of claim 10, wherein a processor is identified as eligible when the one compute task is associated with a number of outstanding work items that is greater than or equal to a threshold number of work items per CTA indicated by the one compute task.
12. The non-transitory computer-readable storage medium of claim 10, wherein a processor is identified as eligible when a timeout period has occurred, and a number of outstanding work items associated with the one compute task does not exceed a threshold number of work items per CTA indicated by the one compute task.
13. The non-transitory computer-readable storage medium of claim 10, wherein a processor is identified as eligible when the one compute task indicates that a throttle mode should be activated, and the plurality of processors is operating in the throttle mode, and wherein, in the throttle mode, the first processor is included in a restricted subset of the plurality of processors and each processor within the restricted subset is allowed to access a first portion of memory that is larger than a second portion of memory normally available to each processor in the plurality of processors when processing compute tasks in a non-throttle mode.
14. The non-transitory computer-readable storage medium of claim 10, wherein a fixed priority list is used to select the first processor when two or more eligible processors both have an availability value that is the highest availability value.
15. The non-transitory computer-readable storage medium of claim 10, wherein, when the availability values are not provided, a round robin mode is used to select the first processor.
16. A system for selecting a first processor included in a plurality of processors to receive work related to a compute task, the system comprising: a memory that is configured to store the compute task; a plurality of processors; and a work distribution unit that is configured to: analyze state data of each processor in the plurality of processors to identify one or more processors that have already been assigned one compute task and are eligible to receive work related to the one compute task, wherein a processor is identified as eligible when state data defining how the one compute task is to be processed has been received and acknowledged by the processor; receive, from each of the one or more processors identified as eligible, an availability value calculated by the processor and indicating the capacity of the processor to receive new work; select the first processor to receive work related to the one compute task based on the availability values received from the one or more processors; and issue, to the first processor via a cooperative thread array (CTA), the work related to the one compute task.
17. The system of claim 16, wherein a processor is identified as eligible when the one compute task is associated with a number of outstanding work items that is greater than or equal to a threshold number of work items per CTA indicated by the one compute task.
18. The system of claim 16, wherein a processor is identified as eligible when a timeout period has occurred, and a number of outstanding work items associated with the one compute task does not exceed a threshold number of work it
Resource availability · CPC title
considering the load · CPC title