Reinforcement learning for light transport
US-2018018814-A1 · Jan 18, 2018 · US
US11461145B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11461145-B2 |
| Application number | US-201916259244-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 28, 2019 |
| Priority date | Jan 28, 2019 |
| Publication date | Oct 4, 2022 |
| Grant date | Oct 4, 2022 |
Official abstract text for this publication.
Reinforcement learning agents for resource allocation for iterative workloads, such as training Deep Neural Networks, are configured. One method comprises obtaining a specification of an iterative workload comprising multiple states and a set of available actions for each state, and a domain model of the iterative workload relating allocated resources with service metrics; adjusting weights of a reinforcement learning agent by performing iteration steps for each simulated iteration of the iterative workload and using variables from the simulated iteration to refine the reinforcement learning agent; and determining a dynamic resource allocation policy for the iterative workload. The exemplary iteration steps comprise: (a) selecting an action for a current state, obtaining a reward for the selected action and selecting a next state based on the current state and/or the selected action; (b) updating a function that evaluates a quality of a plurality of state-action combinations; and (c) repeating steps (a) and (b) with a new allocation of resources.
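The loop described in the abstract can be sketched as tabular Q-learning over a simulated workload. This is a minimal illustration under stated assumptions, not the patented implementation: the `domain_model` curve, the reward shape, the target metric, the resource bounds, and the hyperparameters `ALPHA`, `GAMMA`, and `EPSILON` are all hypothetical, and the state is reduced to the current allocation alone, whereas the claims describe a state that also carries a service metric.

```python
import random

# Hypothetical domain model (an assumption, not taken from the patent):
# maps allocated resource units to a service metric (seconds per iteration).
def domain_model(resources):
    return 120.0 / resources

ACTIONS = (-1, 0, 1)                    # decrement, keep, or increment the allocation
MIN_RES, MAX_RES = 1, 10                # assumed allocation bounds
TARGET = 20.0                           # assumed desired seconds per iteration
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2   # assumed hyperparameters


def step(resources, action):
    """Simulated environment: apply an action, return (next_state, reward)."""
    nxt = min(MAX_RES, max(MIN_RES, resources + action))
    metric = domain_model(nxt)
    # Assumed reward: penalise distance from the target metric, plus a small
    # per-unit cost so that over-provisioning is not free.
    reward = -abs(metric - TARGET) - 0.1 * nxt
    return nxt, reward


def train(episodes=1000, steps_per_episode=40, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(MIN_RES, MAX_RES + 1) for a in ACTIONS}
    for _ in range(episodes):
        # (c) each simulated run starts from a new allocation of resources
        state = rng.randint(MIN_RES, MAX_RES)
        for _ in range(steps_per_episode):
            # (a) select an action (epsilon-greedy), obtain reward and next state
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward = step(state, action)
            # (b) update the quality function as a weighted average of the old
            # value and the reward plus the discounted best next-state value
            best_next = max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] = ((1 - ALPHA) * q[(state, action)]
                                  + ALPHA * (reward + GAMMA * best_next))
            state = nxt
    return q


def greedy_policy(q):
    """Resulting allocation policy: best action for each allocation level."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)])
            for s in range(MIN_RES, MAX_RES + 1)}
```

Calling `greedy_policy(train())` yields a mapping from each allocation level to an increment, decrement, or no-op. With the assumed model, the policy should steer toward 6 units, where the simulated metric meets the target.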
Opening claim text (preview).
What is claimed is:

1. A method, comprising:
obtaining (i) a specification of an iterative workload comprising a plurality of states of the iterative workload and a set of available actions for one or more of the plurality of states, and (ii) a domain model of the iterative workload that relates an amount of resources allocated in training data with one or more service metrics, wherein a duration of one simulated iteration of a plurality of simulated iterations of the iterative workload using said domain model of the iterative workload satisfies one or more predefined duration criteria;
adjusting weights of at least one reinforcement learning agent by performing iteration steps for each simulated iteration of the iterative workload and then using variables observed during a given simulated iteration of the iterative workload to refine the at least one reinforcement learning agent; and
determining, by the at least one reinforcement learning agent, a dynamic resource allocation policy for the iterative workload, wherein the iteration steps for each simulated iteration of the iterative workload comprise:
(a) employing the at least one reinforcement learning agent to select an action from the set of available actions for a current state, obtain a reward for the selected action and select a next state based on one or more of the current state and the selected action for the current state;
(b) updating, by the at least one reinforcement learning agent, a value of a quality function that evaluates a quality of a plurality of state-action combinations using a weighted average of: (i) the value of the quality function for the current state and the selected action for the current state and (ii) the reward for the selected action and the value of the quality function for the next state and at least one of the set of available actions for the next state, wherein the set of available actions for the next state comprises one or more of an increment and a decrement of the amount of resources allocated to the iterative workload, wherein the current state is associated with a first time and comprises at least a first service metric associated with the first time and an amount of resources allocated to the iterative workload at the first time, and wherein the next state is associated with a second time and comprises at least a second service metric associated with the second time and an amount of resources allocated to the iterative workload at the second time; and
(c) repeating the employing and updating steps with a new allocation of resources for a respective simulated iteration of the iterative workload.

2. The method of claim 1, wherein the domain model is obtained from sample training executions used to learn the relationship between the amount of resources allocated and the one or more service metrics.

3. The method of claim 1, wherein the step of adjusting weights of the at least one reinforcement learning agent employs a reward metric based on a difference between a desired service metric and a measured service metric.

4. The method of claim 1, wherein the step of adjusting weights of the at least one reinforcement learning agent comprises a neural network selecting an action from the set of available actions based on a current state and an expected reward of the selected action and comparing the expected reward of the selected action to the actual obtained reward.

5. The method of claim 1, wherein the iterative workload comprises a training of a Deep Neural Network.

6. The method of claim 1, wherein possible actions for resource allocation are discretized using a control action parameter.

7. The method of claim 1, wherein the simulated iteration executes in a simulated environment that generates observations from the domain model.

8. The method of claim 1, wherein the quality function is approximated using a deep Q neural network (QDNN).

9. A computer program product, comprising a non-transitory machine-readable storage medium having encoded therein executable code of one or more software programs, wherein the one or more software programs when executed by at least one processing device perform the following steps:
obtaining (i) a specification of an iterative workload comprising a plurality of states of the iterative workload and a set of available actions for one or more of the plurality of states, and (ii) a domain model of the iterative workload that relates an amount of resources allocated in training data with one or more service metrics, wherein a duration of one simulated iteration of a plurality of simulated iterations of the iterative workload using said domain model of the iterative workload satisfies one or more predefined duration criteria;
adjusting weights of at least one reinforcement learning agent by performing iteration steps for each simulated iteration of the iterative workload and then using variables observed during a given simulated iteration of the iterative workload to refine the at least one reinforcement learning agent; and
determining, by the at least one reinforcement learning agent, a dynamic resource allocation policy for the iterative workload, wherein the iteration steps for each simulated iteration of the iterative workload comprise:
(a) employing the at least one reinforcement learning agent to select an action from the set of available actions for a current state, obtain a reward for the selected action and select a next state based on one or more of the current state and the selected action for the current state;
(b) updating, by the at least one reinforcement learning agent, a value of a quality function that evaluates a quality of a plurality of state-action combinations using a weighted average of: (i) the value of the quality function for the current state and the selected action for the current state and (ii) the reward for the selected action and the value of the quality function for the next state and at least one of the set of available actions for the next state, wherein the set of available actions for the next state comprises one or more of an increment and a decrement of the amount of resources allocated to the iterative workload, wherein the current state is associated with a first time and comprises at least a first service metric associated with the first time and an amount of resources allocated to the iterative workload at the first time, and wherein the next state is associated with a second time and comprises at least a second service metric associated with the second time and an amount of resources allocated to the iterative workload at the second time; and
(c) repeating the employing and updating steps with a new allocation of resources for a respective simulated iteration of the iterative workload.

10. The computer program product of claim 9, wherein the domain model is obtained from sample training executions used to learn the relationship between the amount of resources allocated and the one or more service metrics.

11. The computer program product of claim 9, wherein the step of adjusting weights of the at least one reinforcement learning agent employs a reward metric based on a difference between a desired service metric and a measured service metric.

12. The computer program product of claim 9, wherein the step of adjusting weights of the at least one reinforcement learning agent comprises a neural network selecting an action from the set of available actions based on a current state and an expected reward of the selected action and comparing the expected reward of the selected action to the actual obtained reward.

13. The computer program product of claim 9, wherein the iterative workload compri
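To make the dependent claims on rewards and discretized actions more concrete, the following sketch shows one possible reading of a difference-based reward, a control-step action grid, and a single weighted-average update with worked numbers. The function names, the negative-absolute-difference reward, and the use of a discount factor `gamma` are assumptions for illustration; the claim text specifies only a "weighted average" and does not fix these details.

```python
def discretized_actions(control_step, levels=1):
    """Symmetric grid of allocation changes: -levels*step ... 0 ... +levels*step.

    `control_step` plays the role of a control action parameter (assumed reading).
    """
    return [k * control_step for k in range(-levels, levels + 1)]


def reward(desired_metric, measured_metric):
    """Assumed reward form: larger (closer to zero) as the measured service
    metric approaches the desired one."""
    return -abs(desired_metric - measured_metric)


def weighted_average_update(q_current, r, q_next_best, alpha, gamma=1.0):
    """Blend the old quality value with the reward plus the next-state value.

    alpha weights the new information; gamma (an assumption) discounts the
    next-state value.
    """
    return (1.0 - alpha) * q_current + alpha * (r + gamma * q_next_best)


# Worked numbers: allocation changes in steps of 0.5 units, two levels each way.
actions = discretized_actions(0.5, levels=2)   # [-1.0, -0.5, 0.0, 0.5, 1.0]
r = reward(20.0, 24.0)                         # -4.0
q_new = weighted_average_update(2.0, r, 10.0, alpha=0.5, gamma=0.5)  # 1.5
```

Here the update evaluates to 0.5 * 2.0 + 0.5 * (-4.0 + 0.5 * 10.0) = 1.5, so a poor measured metric pulls the stored quality value down even though the next state looks promising.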
Probabilistic or stochastic networks · CPC title
Combinations of networks · CPC title
Reinforcement learning · CPC title
Feedforward networks · CPC title
Resource planning, allocation, distributing or scheduling for enterprises or organisations · CPC title