What technology area does this patent fall under?

Primary CPC classification G06N3/08. Mapped technology areas include Physics.

When was this patent published?

Publication date: Oct 4, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

Building neural networks for resource allocation for iterative workloads using reinforcement learning

US11461145B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11461145-B2
Application number	US-201916259244-A
Country	US
Kind code	B2
Filing date	Jan 28, 2019
Priority date	Jan 28, 2019
Publication date	Oct 4, 2022
Grant date	Oct 4, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Reinforcement learning agents for resource allocation for iterative workloads, such as training Deep Neural Networks, are configured. One method comprises obtaining a specification of an iterative workload comprising multiple states and a set of available actions for each state, and a domain model of the iterative workload relating allocated resources with service metrics; adjusting weights of a reinforcement learning agent by performing iteration steps for each simulated iteration of the iterative workload and using variables from the simulated iteration to refine the reinforcement learning agent; and determining a dynamic resource allocation policy for the iterative workload. The exemplary iteration steps comprise: (a) selecting an action for a current state, obtaining a reward for the selected action and selecting a next state based on the current state and/or the selected action; (b) updating a function that evaluates a quality of a plurality of state-action combinations; and (c) repeating steps (a) and (b) with a new allocation of resources.
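The loop in steps (a) to (c) reads like tabular Q-learning run against a simulated workload. The following is a minimal sketch under stated assumptions, not the patented implementation: the `domain_model` function, the target metric, the resource bounds, the three-action set, and the epsilon-greedy selection are all hypothetical choices made for illustration. Because this toy metric depends only on the allocation, the state is reduced to the allocation itself.

```python
import random

ACTIONS = (-1, 0, 1)        # hypothetical discretization: decrement, hold, increment
MIN_RES, MAX_RES = 1, 10    # hypothetical bounds on the allocation
TARGET = 20.0               # hypothetical desired service metric (time per iteration)


def domain_model(resources: int) -> float:
    """Toy stand-in for a domain model: iteration time falls as resources rise."""
    return 100.0 / resources


def train(episodes=500, steps=30, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning over simulated iterations of the toy workload."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(MIN_RES, MAX_RES + 1) for a in ACTIONS}
    for _ in range(episodes):
        # (c) each simulated run starts from a new allocation of resources
        state = rng.randint(MIN_RES, MAX_RES)
        for _ in range(steps):
            # (a) select an action (epsilon-greedy is an assumption here)
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state = min(MAX_RES, max(MIN_RES, state + action))
            # reward: negative gap between desired and simulated service metric
            reward = -abs(TARGET - domain_model(next_state))
            # (b) update the quality function as a weighted average
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] = ((1 - alpha) * q[(state, action)]
                                  + alpha * (reward + gamma * best_next))
            state = next_state
    return q


def greedy_policy(q):
    """Read the allocation policy off the table: best action per state."""
    states = sorted({s for s, _ in q})
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in states}
```

With these toy numbers the simulated metric meets the target at 5 resource units, so the greedy policy returned by `greedy_policy(train())` should increment at 4 units, hold at 5, and decrement at 6.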

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising:
obtaining (i) a specification of an iterative workload comprising a plurality of states of the iterative workload and a set of available actions for one or more of the plurality of states, and (ii) a domain model of the iterative workload that relates an amount of resources allocated in training data with one or more service metrics, wherein a duration of one simulated iteration of a plurality of simulated iterations of the iterative workload using said domain model of the iterative workload satisfies one or more predefined duration criteria;
adjusting weights of at least one reinforcement learning agent by performing iteration steps for each simulated iteration of the iterative workload and then using variables observed during a given simulated iteration of the iterative workload to refine the at least one reinforcement learning agent; and
determining, by the at least one reinforcement learning agent, a dynamic resource allocation policy for the iterative workload,
wherein the iteration steps for each simulated iteration of the iterative workload comprise:
(a) employing the at least one reinforcement learning agent to select an action from the set of available actions for a current state, obtain a reward for the selected action and select a next state based on one or more of the current state and the selected action for the current state;
(b) updating, by the at least one reinforcement learning agent, a value of a quality function that evaluates a quality of a plurality of state-action combinations using a weighted average of: (i) the value of the quality function for the current state and the selected action for the current state and (ii) the reward for the selected action and the value of the quality function for the next state and at least one of the set of available actions for the next state, wherein the set of available actions for the next state comprises one or more of an increment and a decrement of the amount of resources allocated to the iterative workload, wherein the current state is associated with a first time and comprises at least a first service metric associated with the first time and an amount of resources allocated to the iterative workload at the first time, and wherein the next state is associated with a second time and comprises at least a second service metric associated with the second time and an amount of resources allocated to the iterative workload at the second time; and
(c) repeating the employing and updating steps with a new allocation of resources for a respective simulated iteration of the iterative workload.

2. The method of claim 1, wherein the domain model is obtained from sample training executions used to learn the relationship between the amount of resources allocated and the one or more service metrics.

3. The method of claim 1, wherein the step of adjusting weights of the at least one reinforcement learning agent employs a reward metric based on a difference between a desired service metric and a measured service metric.

4. The method of claim 1, wherein the step of adjusting weights of the at least one reinforcement learning agent comprises a neural network selecting an action from the set of available actions based on a current state and an expected reward of the selected action and comparing the expected reward of the selected action to the actual obtained reward.

5. The method of claim 1, wherein the iterative workload comprises a training of a Deep Neural Network.

6. The method of claim 1, wherein possible actions for resource allocation are discretized using a control action parameter.

7. The method of claim 1, wherein the simulated iteration executes in a simulated environment that generates observations from the domain model.

8. The method of claim 1, wherein the quality function is approximated using a deep Q neural network (QDNN).

9. A computer program product, comprising a non-transitory machine-readable storage medium having encoded therein executable code of one or more software programs, wherein the one or more software programs when executed by at least one processing device perform the following steps:
obtaining (i) a specification of an iterative workload comprising a plurality of states of the iterative workload and a set of available actions for one or more of the plurality of states, and (ii) a domain model of the iterative workload that relates an amount of resources allocated in training data with one or more service metrics, wherein a duration of one simulated iteration of a plurality of simulated iterations of the iterative workload using said domain model of the iterative workload satisfies one or more predefined duration criteria;
adjusting weights of at least one reinforcement learning agent by performing iteration steps for each simulated iteration of the iterative workload and then using variables observed during a given simulated iteration of the iterative workload to refine the at least one reinforcement learning agent; and
determining, by the at least one reinforcement learning agent, a dynamic resource allocation policy for the iterative workload,
wherein the iteration steps for each simulated iteration of the iterative workload comprise:
(a) employing the at least one reinforcement learning agent to select an action from the set of available actions for a current state, obtain a reward for the selected action and select a next state based on one or more of the current state and the selected action for the current state;
(b) updating, by the at least one reinforcement learning agent, a value of a quality function that evaluates a quality of a plurality of state-action combinations using a weighted average of: (i) the value of the quality function for the current state and the selected action for the current state and (ii) the reward for the selected action and the value of the quality function for the next state and at least one of the set of available actions for the next state, wherein the set of available actions for the next state comprises one or more of an increment and a decrement of the amount of resources allocated to the iterative workload, wherein the current state is associated with a first time and comprises at least a first service metric associated with the first time and an amount of resources allocated to the iterative workload at the first time, and wherein the next state is associated with a second time and comprises at least a second service metric associated with the second time and an amount of resources allocated to the iterative workload at the second time; and
(c) repeating the employing and updating steps with a new allocation of resources for a respective simulated iteration of the iterative workload.

10. The computer program product of claim 9, wherein the domain model is obtained from sample training executions used to learn the relationship between the amount of resources allocated and the one or more service metrics.

11. The computer program product of claim 9, wherein the step of adjusting weights of the at least one reinforcement learning agent employs a reward metric based on a difference between a desired service metric and a measured service metric.

12. The computer program product of claim 9, wherein the step of adjusting weights of the at least one reinforcement learning agent comprises a neural network selecting an action from the set of available actions based on a current state and an expected reward of the selected action and comparing the expected reward of the selected action to the actual obtained reward.

13. The computer program product of claim 9, wherein the iterative workload compri
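The weighted-average update in step (b) of claims 1 and 9 can be written compactly. The form below is one plausible reading that assumes the standard Q-learning rule: the learning rate \alpha, the discount factor \gamma, and the maximum over next-state actions are not named in the claim preview, which says only "weighted average" and "at least one of the set of available actions for the next state". Here s_t and a_t denote the current state and selected action, r_t the reward, and s_{t+1} the next state.

```latex
Q(s_t, a_t) \leftarrow (1 - \alpha)\, Q(s_t, a_t)
  + \alpha \left[ r_t + \gamma \max_{a' \in A(s_{t+1})} Q(s_{t+1}, a') \right],
\qquad 0 < \alpha \le 1,\quad 0 \le \gamma < 1
```

Under the reward described in claims 3 and 11, r_t would be derived from the difference between a desired and a measured service metric, for example r_t = -\lvert m^{\ast} - m_{t+1} \rvert, where m^{\ast} and m_{t+1} are illustrative symbols rather than notation from the patent.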

Assignees

EMC IP Holding Co LLC

Inventors

Classifications

G06N3/047
Probabilistic or stochastic networks · CPC title
G06N3/045
Combinations of networks · CPC title
G06N3/092
Reinforcement learning · CPC title
G06N3/0499
Feedforward networks · CPC title
G06Q10/0631
Resource planning, allocation, distributing or scheduling for enterprises or organisations · CPC title

Patent family

Related publications grouped by family.

Patent family 71733763

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11461145B2 cover?: Reinforcement learning agents for resource allocation for iterative workloads, such as training Deep Neural Networks, are configured. One method comprises obtaining a specification of an iterative workload comprising multiple states and a set of available actions for each state, and a domain model of the iterative workload relating allocated resources with service metrics; adjusting weights of …
Who is the assignee on this patent?: EMC IP Holding Co LLC