Managing processing system efficiency

US11704158B2 · US · B2

Patent metadata
Publication number: US-11704158-B2
Application number: US-202117162682-A
Country: US
Kind code: B2
Filing date: Jan 29, 2021
Priority date: Nov 21, 2017
Publication date: Jul 18, 2023
Grant date: Jul 18, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and computer storage media storing instructions for managing processing system efficiency. One of the methods includes obtaining data splitting a plurality of general-purpose processing units in a processing system into a high-priority domain and a low-priority domain, wherein the general-purpose processing units in the high-priority domain are assigned to perform one or more tasks comprising one or more high-priority tasks, and the general-purpose processing units in the low-priority domain are assigned to perform one or more low-priority tasks; and during runtime of the processing system, obtaining memory usage measurements that characterize usage of system memory by the high-priority domain and the low-priority domain; and adjusting, based on the memory usage measurements, a configuration of (i) the high-priority domain, (ii) the low-priority domain, or (iii) both to adjust utilization of the system memory by the general-purpose processing units.
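The abstract describes a runtime control loop. CPUs are split into a high-priority and a low-priority domain, memory usage is measured per domain, and the domain configuration is adjusted in response. Below is a minimal Python sketch of such a loop. It assumes the measurement is the low-priority domain's share of system memory bandwidth (0.0 to 1.0), and that "adjusting the configuration" means moving one CPU between domains per sample. The names `DomainSplit`, `adjust_split`, and `run`, and the watermark values, are hypothetical. The abstract does not specify a policy.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class DomainSplit:
    """Hypothetical split of CPU ids into two priority domains."""
    high: FrozenSet[int]  # CPUs serving high-priority tasks
    low: FrozenSet[int]   # CPUs serving low-priority tasks


def adjust_split(split: DomainSplit, low_mem_share: float,
                 high_water: float = 0.30, low_water: float = 0.15,
                 min_low: int = 1, min_high: int = 1) -> DomainSplit:
    """Move at most one CPU between domains based on one measurement.

    low_mem_share is an assumed metric: the fraction of system memory
    bandwidth consumed by the low-priority domain. The watermarks are
    illustrative values, not taken from the patent.
    """
    if low_mem_share > high_water and len(split.low) > min_low:
        # Low-priority work is crowding memory: shrink its domain.
        cpu = max(split.low)
        return DomainSplit(split.high | {cpu}, split.low - {cpu})
    if low_mem_share < low_water and len(split.high) > min_high:
        # Memory pressure is low: lend a CPU back to low-priority work.
        cpu = max(split.high)
        return DomainSplit(split.high - {cpu}, split.low | {cpu})
    return split  # inside the hysteresis band: keep the current split


def run(split: DomainSplit, measurements: Iterable[float]) -> DomainSplit:
    """Apply adjust_split once per runtime measurement sample."""
    for share in measurements:
        split = adjust_split(split, share)
    return split


start = DomainSplit(frozenset({0, 1, 2, 3}), frozenset({4, 5, 6, 7}))
end = run(start, [0.45, 0.40, 0.20])
print(sorted(end.high), sorted(end.low))  # prints [0, 1, 2, 3, 6, 7] [4, 5]
```

The two watermarks form a hysteresis band. Without it, a measurement hovering near a single threshold would move a CPU back and forth on every sample.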

First claim

Opening claim text (preview).

What is claimed is:

1. A method implemented using a system comprising a hardware accelerator coupled to a plurality of processors, the method comprising: identifying a splitting of the plurality of processors among: a first domain that performs tasks that support the hardware accelerator in accelerating execution of a machine-learning (ML) workload; and a second, different domain; during runtime of the system, obtaining memory usage measurements that characterize usage of system memory by the first domain and the second domain; and adjusting, based on the memory usage measurements, a configuration of (i) the first domain, (ii) the second domain, or (iii) both; and adjusting utilization of the system memory by the plurality of processors in response to adjusting the configuration.

2. The method of claim 1, wherein the plurality of processors are included among resources of the system and the method comprises: determining resource requirements of the hardware accelerator and the ML workload; based on the determined resource requirements, assigning a variable number of processors in the first domain to perform a plurality of ML tasks that each have a first-priority level; and performing, using the processors in the first domain, the plurality of ML tasks to support the hardware accelerator in accelerating execution of the ML workload.

3. The method of claim 2, further comprising: performing, using the second, different domain, a plurality of general processing tasks that each have a second-priority level; wherein the first-priority level is a high-priority level and the second-priority level is a low-priority level.

4. The method of claim 3, further comprising: assigning, based on the memory usage measurements, a variable number of processors in the second domain to perform the plurality of general processing tasks that each have the second-priority level.

5. The method of claim 4, wherein: the ML workload is a resource-intensive workload that uses a threshold amount of processing resources of the system to accelerate execution of the ML workload; and the variable number of processors that are assigned to the first domain satisfies the threshold amount of processing resources for accelerating execution of the ML workload.

6. The method of claim 3, wherein: the hardware accelerator is included among a plurality of hardware accelerators that form an accelerator package that is coupled to the plurality of processors; and performing the plurality of ML tasks that each have the first-priority level comprises: obtaining, by a processor in the first domain, a portion of shared gradients from the accelerator package.

7. The method of claim 6, wherein the plurality of processors act as a parameter server and performing the ML tasks that each have the first-priority level comprises: aggregating, by the plurality of processors, computed gradients that are collected from the accelerator package; updating, by the plurality of processors, a set of parameter values in real-time using the computed gradients; and providing, by the plurality of processors, the updated set of parameter values to the accelerator package.

8. The method of claim 7, wherein: the hardware accelerator is configured to implement a neural network comprising a plurality of layers; and the set of parameter values that are updated in real-time using the computed gradients are for one or more layers of the neural network.

9. The method of claim 8, wherein executing the ML workload comprises: repeatedly computing, using the hardware accelerator and the plurality of processors, gradients of an objective function that is used to train the neural network.

10. The method of claim 8, wherein executing the ML workload comprises: generating a respective output of one or more layers of the neural network; and computing an inference based on the respective outputs of the one or more layers.

11. A system comprising a hardware accelerator, a plurality of processors, and a non-transitory machine-readable storage device storing instructions that are executable by a processing device of the system to cause performance of operations comprising: identifying a splitting of the plurality of processors among: a first domain that performs tasks that support the hardware accelerator in accelerating execution of a machine-learning (ML) workload; and a second, different domain; during runtime of the system, obtaining memory usage measurements that characterize usage of system memory by the first domain and the second domain; and adjusting, based on the memory usage measurements, a configuration of (i) the first domain, (ii) the second domain, or (iii) both; and adjusting utilization of the system memory by the plurality of processors in response to adjusting the configuration.

12. The system of claim 11, wherein the plurality of processors are included among resources of the system and the operations comprise: determining resource requirements of the hardware accelerator and the ML workload; based on the determined resource requirements, assigning a variable number of processors in the first domain to perform a plurality of ML tasks that each have a first-priority level; and performing, using the processors in the first domain, the plurality of ML tasks to support the hardware accelerator in accelerating execution of the ML workload.

13. The system of claim 12, wherein the operations further comprise: performing, using the second, different domain, a plurality of general processing tasks that each have a second-priority level; wherein the first-priority level is a high-priority level and the second-priority level is a low-priority level.

14. The system of claim 13, wherein the operations further comprise: assigning, based on the memory usage measurements, a variable number of processors in the second domain to perform the plurality of general processing tasks that each have the second-priority level.

15. The system of claim 14, wherein: the ML workload is a resource-intensive workload that uses a threshold amount of processing resources of the system to accelerate execution of the ML workload; and the variable number of processors that are assigned to the first domain satisfies the threshold amount of processing resources for accelerating execution of the ML workload.

16. The system of claim 13, wherein: the hardware accelerator is included among a plurality of hardware accelerators that form an accelerator package that is coupled to the plurality of processors; and performing the plurality of ML tasks that each have the first-priority level comprises: obtaining, by a processor in the first domain, a portion of shared gradients from the accelerator package.

17. The system of claim 16, wherein the plurality of processors act as a parameter server and performing the ML tasks that each have the first-priority level comprises: aggregating, by the plurality of processors, computed gradients that are collected from the accelerator package; updating, by the plurality of processors, a set of parameter values in real-time using the computed gradients; and providing, by the plurality of processors, the updated set of parameter values to the accelerator package.

18. The system of claim 17, wherein: the hardware accelerator is configured to implement a neural network comprising a plurality of layers; and the set of parameter values that are updated in real-time using the computed gradients are for one or more layers of the neural network.
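Claims 7 and 8 (and their system counterparts, claims 17 and 18) describe the processors acting as a parameter server. The server aggregates gradients collected from the accelerator package, updates per-layer parameter values, and sends the updated values back. The following is a minimal Python sketch of one such step. It assumes the aggregation is a plain mean over accelerators and the update is a plain SGD step with a fixed learning rate. The claims say only "aggregating" and "updating". The names `aggregate`, `update`, and `parameter_server_step` are hypothetical.

```python
from typing import Dict, List

Params = Dict[str, List[float]]  # layer name -> flat parameter vector


def aggregate(grads_per_accel: List[Params]) -> Params:
    """Average the gradients reported by each accelerator (assumed rule)."""
    if not grads_per_accel:
        raise ValueError("no gradients collected from the accelerator package")
    n = len(grads_per_accel)
    first = grads_per_accel[0]
    return {
        layer: [sum(g[layer][i] for g in grads_per_accel) / n
                for i in range(len(first[layer]))]
        for layer in first
    }


def update(params: Params, grads: Params, lr: float) -> Params:
    """Plain SGD step applied layer by layer; returns new parameter values."""
    return {layer: [w - lr * g for w, g in zip(params[layer], grads[layer])]
            for layer in params}


def parameter_server_step(params: Params, grads_per_accel: List[Params],
                          lr: float = 0.1) -> Params:
    """Aggregate, update, and return the values to send back to accelerators."""
    return update(params, aggregate(grads_per_accel), lr)


params = {"layer0": [1.0, 2.0]}
grads = [{"layer0": [0.25, 0.5]}, {"layer0": [0.75, 0.0]}]
print(parameter_server_step(params, grads, lr=0.5))  # prints {'layer0': [0.75, 1.875]}
```

The step returns a new dictionary instead of mutating `params` in place. This keeps the previous values available if an update has to be discarded. A real parameter server would more likely update large arrays in place for speed.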

Assignees

Google LLC

Inventors

Classifications (CPC titles)

  • Priority

  • Machine learning

  • using electronic means

  • Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues

  • Techniques for rebalancing the load in a distributed system

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11704158B2 cover?
Methods, systems, and computer storage media storing instructions for managing processing system efficiency. One of the methods includes obtaining data splitting a plurality of general-purpose processing units in a processing system into a high-priority domain and a low-priority domain, wherein the general-purpose processing units in the high-priority domain are assigned to perform one or more …
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G06F9/5061. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 18, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).