Task scheduler mechanism, operating system, and multiprocessor system
US-9513974-B2 · Dec 6, 2016 · US
US11704158B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11704158-B2 |
| Application number | US-202117162682-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 29, 2021 |
| Priority date | Nov 21, 2017 |
| Publication date | Jul 18, 2023 |
| Grant date | Jul 18, 2023 |
Official abstract text for this publication.
Methods, systems, and computer storage media storing instructions for managing processing system efficiency. One of the methods includes obtaining data splitting a plurality of general-purpose processing units in a processing system into a high-priority domain and a low-priority domain, wherein the general-purpose processing units in the high-priority domain are assigned to perform one or more tasks comprising one or more high-priority tasks, and the general-purpose processing units in the low-priority domain are assigned to perform one or more low-priority tasks; and during runtime of the processing system, obtaining memory usage measurements that characterize usage of system memory by the high-priority domain and the low-priority domain; and adjusting, based on the memory usage measurements, a configuration of (i) the high-priority domain, (ii) the low-priority domain, or (iii) both to adjust utilization of the system memory by the general-purpose processing units.
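The measure-then-adjust loop in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and not taken from the patent: the `Domain` class, the `adjust_domains` function, the choice to treat "configuration" as the set of cores in each domain, the use of a fractional memory-usage measurement per domain, and the `saturation` threshold policy.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Domain:
    """Hypothetical model of a priority domain: a named set of core ids."""
    name: str
    cores: Set[int] = field(default_factory=set)


def adjust_domains(high: Domain, low: Domain,
                   high_usage: float, low_usage: float,
                   saturation: float = 0.9, min_low_cores: int = 1) -> bool:
    """Reassign one core from the low-priority to the high-priority domain.

    high_usage and low_usage are assumed to be fractions of system memory
    capacity (or bandwidth) attributed to each domain at runtime. The policy
    is an assumption: act only when combined usage reaches `saturation` and
    the low-priority domain can still spare a core.
    Returns True if the configuration changed.
    """
    if high_usage + low_usage < saturation:
        return False  # memory is not under pressure; leave domains alone
    if len(low.cores) <= min_low_cores:
        return False  # keep a minimum number of low-priority cores
    core = max(low.cores)  # deterministic pick, purely for the example
    low.cores.remove(core)
    high.cores.add(core)
    return True
```

With `high = Domain("high", {0, 1})` and `low = Domain("low", {2, 3})`, a call such as `adjust_domains(high, low, 0.6, 0.35)` would move core 3 into the high-priority domain. A real system could equally adjust other settings; the abstract only requires that some configuration of one or both domains changes in response to the measurements.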
Opening claim text (preview).
What is claimed is:
1. A method implemented using a system comprising a hardware accelerator coupled to a plurality of processors, the method comprising: identifying a splitting of the plurality of processors among: a first domain that performs tasks that support the hardware accelerator in accelerating execution of a machine-learning (ML) workload; and a second, different domain; during runtime of the system, obtaining memory usage measurements that characterize usage of system memory by the first domain and the second domain; and adjusting, based on the memory usage measurements, a configuration of (i) the first domain, (ii) the second domain, or (iii) both; and adjusting utilization of the system memory by the plurality of processors in response to adjusting the configuration.
2. The method of claim 1, wherein the plurality of processors are included among resources of the system and the method comprises: determining resource requirements of the hardware accelerator and the ML workload; based on the determined resource requirements, assigning a variable number of processors in the first domain to perform a plurality of ML tasks that each have a first-priority level; and performing, using the processors in the first domain, the plurality of ML tasks to support the hardware accelerator in accelerating execution of the ML workload.
3. The method of claim 2, further comprising: performing, using the second, different domain, a plurality of general processing tasks that each have a second-priority level; wherein the first-priority level is a high-priority level and the second-priority level is a low-priority level.
4. The method of claim 3, further comprising: assigning, based on the memory usage measurements, a variable number of processors in the second domain to perform the plurality of general processing tasks that each have the second-priority level.
5. The method of claim 4, wherein: the ML workload is a resource-intensive workload that uses a threshold amount of processing resources of the system to accelerate execution of the ML workload; and the variable number of processors that are assigned to the first domain satisfies the threshold amount of processing resources for accelerating execution of the ML workload.
6. The method of claim 3, wherein: the hardware accelerator is included among a plurality of hardware accelerators that form an accelerator package that is coupled to the plurality of processors; and performing the plurality of ML tasks that each have the first-priority level comprises: obtaining, by a processor in the first domain, a portion of shared gradients from the accelerator package.
7. The method of claim 6, wherein the plurality of processors act as a parameter server and performing the ML tasks that each have the first-priority level comprises: aggregating, by the plurality of processors, computed gradients that are collected from the accelerator package; updating, by the plurality of processors, a set of parameter values in real-time using the computed gradients; and providing, by the plurality of processors, the updated set of parameter values to the accelerator package.
8. The method of claim 7, wherein: the hardware accelerator is configured to implement a neural network comprising a plurality of layers; and the set of parameter values that are updated in real-time using the computed gradients are for one or more layers of the neural network.
9. The method of claim 8, wherein executing the ML workload comprises: repeatedly computing, using the hardware accelerator and the plurality of processors, gradients of an objective function that is used to train the neural network.
10. The method of claim 8, wherein executing the ML workload comprises: generating a respective output of one or more layers of the neural network; and computing an inference based on the respective outputs of the one or more layers.
11. A system comprising a hardware accelerator, a plurality of processors, and a non-transitory machine-readable storage device storing instructions that are executable by a processing device of the system to cause performance of operations comprising: identifying a splitting of the plurality of processors among: a first domain that performs tasks that support the hardware accelerator in accelerating execution of a machine-learning (ML) workload; and a second, different domain; during runtime of the system, obtaining memory usage measurements that characterize usage of system memory by the first domain and the second domain; and adjusting, based on the memory usage measurements, a configuration of (i) the first domain, (ii) the second domain, or (iii) both; and adjusting utilization of the system memory by the plurality of processors in response to adjusting the configuration.
12. The system of claim 11, wherein the plurality of processors are included among resources of the system and the operations comprise: determining resource requirements of the hardware accelerator and the ML workload; based on the determined resource requirements, assigning a variable number of processors in the first domain to perform a plurality of ML tasks that each have a first-priority level; and performing, using the processors in the first domain, the plurality of ML tasks to support the hardware accelerator in accelerating execution of the ML workload.
13. The system of claim 12, wherein the operations further comprise: performing, using the second, different domain, a plurality of general processing tasks that each have a second-priority level; wherein the first-priority level is a high-priority level and the second-priority level is a low-priority level.
14. The system of claim 13, wherein the operations further comprise: assigning, based on the memory usage measurements, a variable number of processors in the second domain to perform the plurality of general processing tasks that each have the second-priority level.
15. The system of claim 14, wherein: the ML workload is a resource-intensive workload that uses a threshold amount of processing resources of the system to accelerate execution of the ML workload; and the variable number of processors that are assigned to the first domain satisfies the threshold amount of processing resources for accelerating execution of the ML workload.
16. The system of claim 13, wherein: the hardware accelerator is included among a plurality of hardware accelerators that form an accelerator package that is coupled to the plurality of processors; and performing the plurality of ML tasks that each have the first-priority level comprises: obtaining, by a processor in the first domain, a portion of shared gradients from the accelerator package.
17. The system of claim 16, wherein the plurality of processors act as a parameter server and performing the ML tasks that each have the first-priority level comprises: aggregating, by the plurality of processors, computed gradients that are collected from the accelerator package; updating, by the plurality of processors, a set of parameter values in real-time using the computed gradients; and providing, by the plurality of processors, the updated set of parameter values to the accelerator package.
18. The system of claim 17, wherein: the hardware accelerator is configured to implement a neural network comprising a plurality of layers; and the set of parameter values that are updated in real-time using the computed gradients are for one or more layers of the neural network.
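The parameter-server steps recited in claims 7 and 17 (aggregate gradients collected from the accelerator package, update parameter values, provide the updated values back) can be sketched as follows. This is an illustrative sketch only: the claims do not specify an aggregation rule or an update rule, so averaging and a plain gradient-descent step are assumptions, and the names `aggregate`, `update`, `Params`, and `lr` are hypothetical.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of values.
Params = Dict[str, List[float]]


def aggregate(gradients: List[Params]) -> Params:
    """Average the gradients reported by each accelerator (assumed rule)."""
    n = len(gradients)
    return {
        name: [sum(g[name][i] for g in gradients) / n
               for i in range(len(values))]
        for name, values in gradients[0].items()
    }


def update(params: Params, grads: Params, lr: float = 0.1) -> Params:
    """Apply one gradient-descent step (assumed rule) and return new values.

    The returned dictionary is what the host processors would send back
    to the accelerator package in this sketch.
    """
    return {
        name: [p - lr * g for p, g in zip(values, grads[name])]
        for name, values in params.items()
    }
```

For example, with `params = {"w": [1.0, 2.0]}` and per-accelerator gradients `{"w": [1.0, 0.0]}` and `{"w": [3.0, 2.0]}`, `aggregate` yields `{"w": [2.0, 1.0]}`, and `update(params, agg, lr=0.5)` yields `{"w": [0.0, 1.5]}`.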
Priority · CPC title
Machine learning · CPC title
using electronic means · CPC title
Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues · CPC title
Techniques for rebalancing the load in a distributed system · CPC title