Simulator-training for automated reinforcement-learning-based application-managers

US11238372B2 · US · B2

Patent metadata
Field | Value
Publication number | US-11238372-B2
Application number | US-201916518845-A
Country | US
Kind code | B2
Filing date | Jul 22, 2019
Priority date | Aug 27, 2018
Publication date | Feb 1, 2022
Grant date | Feb 1, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The current document is directed to methods and systems for simulation-based training of automated reinforcement-learning-based application managers. Simulators are generated from data collected from controlled computing environments and may employ any of a variety of different machine-learning models to learn state-transition and reward models. The currently disclosed methods and systems provide facilities for visualizing aspects of the models learned by a simulator and for initializing simulator models using domain information. In addition, the currently disclosed simulators employ weighted differences computed from simulator-generated and training-data state transitions for feedback to the machine-learning models to address various biases and deficiencies of commonly employed difference metrics in the context of training automated reinforcement-learning-based application managers.
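The weighted difference mentioned in the abstract can be illustrated with a short Python sketch. It follows the form recited in claims 6 and 7 (a sum of weighted squared element differences, with each weight taken as the absolute value of a reward-function vector element) and, following the abstract, compares a simulator-estimated next state with the next state recorded in the training data. The function name, the example vectors, and the choice of plain Python lists are illustrative assumptions, not taken from the patent.

```python
from typing import Sequence


def weighted_squared_difference(
    estimated: Sequence[float],
    observed: Sequence[float],
    reward_vector: Sequence[float],
) -> float:
    """Return the sum over i of |reward_vector[i]| * (estimated[i] - observed[i]) ** 2."""
    if not (len(estimated) == len(observed) == len(reward_vector)):
        raise ValueError("all vectors must have the same length")
    return sum(
        abs(w) * (e - o) ** 2
        for w, e, o in zip(reward_vector, estimated, observed)
    )


# Hypothetical three-element state vectors (values invented for illustration).
estimated_next = [0.5, 2.0, 4.0]
observed_next = [1.0, 2.0, 1.0]
reward_vector = [-2.0, 1.0, 0.0]  # the third element has no effect on reward

distance = weighted_squared_difference(estimated_next, observed_next, reward_vector)
```

In this example the large error in the third element contributes nothing, because its reward weight is zero. That is one plausible reading of how such weighting addresses the biases of an unweighted difference metric: errors in state elements that do not influence reward are not penalized.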

First claim

Opening claim text (preview).

The invention claimed is:

1. A simulation manager that generates and trains simulators that are used to train automated reinforcement-learning-based application managers, the simulation manager comprising: one or more computer systems, each having one or more processors, one or more memories, one or more data-storage devices, and one or more communications subsystems; and processor instructions, stored in one or more of the one or more memories and one or more data-storage devices that, when executed by one or more of the processors, control the one or more computer systems to generate simulators that train automated reinforcement-learning-based application managers; train the generated simulators to simulate a computing environment controlled by an automated reinforcement-learning-based application manager; and provide a management interface to human domain experts for providing simulator-configuration input.

2. The simulation manager of claim 1 wherein the simulator repeatedly receives a next action a and returns, in response, a next state s′ and a reward r.

3. The simulation manager of claim 2 wherein the simulator implements a first parametrized function that receives a current state s and a next action a and returns the next state s′ and a second parametrized function that receives a state s and returns a reward r.

4. The simulation manager of claim 3 wherein the simulation manager generates a simulator by: choosing one or more machine-learning models to implement the first parametrized function and the second parameterized function; and initializes the machine-learning models.

5. The simulation manager of claim 3 wherein the simulation manager trains the generated simulators to simulate a computing environment controlled by an automated reinforcement-learning-based application manager by: receiving data collected from a computing environment controlled by an automated reinforcement-learning-based application manager, the data including action/current-state/next-state triples; and iteratively selecting a next action/current-state/next-state triple, inputting the action to the simulator, receiving an estimated next state and estimated reward from the simulator, computing a difference metric from the current state and next state, feeding the difference metric, action, and current state to the simulator, which adjusts one or more parameters of the first parametrized function to improve estimation of the next state.

6. The simulation manager of claim 5 wherein the current state and the next state are vectors containing metric and configuration elements; and wherein the distance metric is computed as the sum of terms, each term i comprising the product of a weight w_i and the squared difference of the i-th elements of the current state and next state.

7. The simulation manager of claim 6 wherein the weight w_i is the absolute value of the i-th element of a reward-function vector, where the second parameterized function uses a dot product of the reward-function vector and a state vector to estimate the reward r corresponding to the state.

8. The simulation manager of claim 5 wherein the simulator learns the parameter values for the second parameterized function during training by optimizing the second parameterized function to produce rewards that would produce the action/current-state/next-state triples of the data collected from the computing environment controlled by the automated reinforcement-learning-based application manager.

9. The simulation manager of claim 5 wherein the data collected from the computing environment controlled by the automated reinforcement-learning-based application manager includes rewards corresponding to the action/current-state/next-state triples; and wherein the simulator adjusts one or more parameters of the second parameterized function in response to computed differences between the data rewards and corresponding estimated rewards.

10. The simulation manager of claim 4 wherein the management interface provides, for one or more of the machine-learning models used to implement the first and second parameterized functions, simulator-configuration-input features through which machine-learning model parameters can be specified.

11. The simulation manager of claim 10 wherein model parameters may include; the number of layers in a neural network or decision tree; functions or logic associated with decision-tree nodes; initial weights of neural-network nodes; initial values of weights that multiple terms of linear combinations of terms; the size and contents of state vectors; and the number and contents of classifications.

12. The simulation manager of claim 4 wherein the management interface provides a visualization feature that displays the reward surface for two selected elements of the state vector.

13. A method for training an automated reinforcement-learning-based application manager, the method comprising: generating a simulator; training the generated simulator to simulate a computing environment controlled by an automated reinforcement-learning-based application manager; and connecting the automated reinforcement-learning-based application manager to the simulator.

14. The method of claim 13 wherein the simulator repeatedly receives a next action a from the automated reinforcement-learning-based application manager and returns, in response, a next state s′ and a reward r to the automated reinforcement-learning-based application manager.

15. The method of claim 14 wherein the simulator implements a first parametrized function that receives a current state s and a next action a and returns the next state s′ and a second parametrized function that receives a state s and returns a reward r, both the first and second parametrized functions implemented by more machine-learning models.

16. The method of claim 14 wherein training the generated simulator to simulate a computing environment controlled by an automated reinforcement-learning-based application manager further comprises: receiving data collected from a computing environment controlled by an automated reinforcement-learning-based application manager, the data including action/current-state/next-state triples; and iteratively selecting a next action/current-state/next-state triple, inputting the action to the simulator, receiving an estimated next state and estimated reward from the simulator, computing a difference metric from the current state and next state, feeding the difference metric, action, and current state to the simulator, which adjusts one or more parameters of the first parametrized function to improve estimation of the next state.

17. The method of claim 16 wherein the current state and the next state are vectors containing metric and configuration elements; and wherein the distance metric is computed as the sum of terms, each term i comprising the product of a weight w_i and the squared difference of the i-th elements of the current state and next state.

18. The method of claim 17 wherein the weight w_i is the absolute value of the i-th element of a reward-function vector, where the second parameterized function uses a dot product of the reward-function vector and a state vector to estimate the reward r corresponding to the state.

19. The method of claim 16 wherein the simulator learns the parameter values for the second parameterized function during training by optimizing the second parameterized function to produce rewards that would produce the action/curre
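The simulator interface and training loop recited in claims 2, 3, and 5 through 7 can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the patented implementation: the class name LinearSimulator, the linear transition model s′ = s + a·θ, the scalar action, the learning rate, and the training triples are all hypothetical. The sketch also assumes, following the abstract, that the difference metric compares the simulator's estimated next state with the next state recorded in the training data. The reward-parameter update of claim 9 is not shown.

```python
from typing import List, Sequence, Tuple


class LinearSimulator:
    """Toy simulator: s' = s + a * theta and r = dot(reward_vector, s)."""

    def __init__(self, reward_vector: Sequence[float]) -> None:
        self.reward_vector = list(reward_vector)      # reward-function parameters
        self.theta = [0.0] * len(self.reward_vector)  # transition parameters
        self.state = [0.0] * len(self.reward_vector)  # current simulated state

    def transition(self, state: Sequence[float], action: float) -> List[float]:
        """First parametrized function: (s, a) -> estimated s'."""
        return [s + action * t for s, t in zip(state, self.theta)]

    def reward(self, state: Sequence[float]) -> float:
        """Second parametrized function: s -> r, computed as a dot product."""
        return sum(w * s for w, s in zip(self.reward_vector, state))

    def step(self, action: float) -> Tuple[List[float], float]:
        """Receive an action; return the next state and its reward."""
        self.state = self.transition(self.state, action)
        return self.state, self.reward(self.state)

    def train_on(
        self,
        action: float,
        state: Sequence[float],
        next_state: Sequence[float],
        lr: float = 0.1,
    ) -> float:
        """Take one gradient step on the weighted squared difference."""
        estimate = self.transition(state, action)
        loss = 0.0
        for i, (est, obs) in enumerate(zip(estimate, next_state)):
            weight = abs(self.reward_vector[i])
            error = est - obs
            loss += weight * error ** 2
            # Gradient of weight * error**2 with respect to theta[i].
            self.theta[i] -= lr * 2.0 * weight * error * action
        return loss


# Hypothetical training data: (action, current state, next state) triples
# produced by an environment whose true theta is [0.5, -1.0].
triples = [
    (1.0, [0.0, 0.0], [0.5, -1.0]),
    (2.0, [1.0, 1.0], [2.0, -1.0]),
]

sim = LinearSimulator(reward_vector=[1.0, 2.0])
for _ in range(50):
    for action, state, next_state in triples:
        sim.train_on(action, state, next_state)

# After training, the simulator can stand in for the environment.
next_state, reward = sim.step(1.0)
```

Once trained, an application manager would call step repeatedly, sending an action and receiving a next state and reward, which is the interaction pattern described in claims 2 and 14.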

Assignees

VMware Inc

Inventors

Classifications

  • G06F30/27 (Primary)

    using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model · CPC title

  • G06N20/00 (Primary)

    Machine learning · CPC title

  • Classification techniques · CPC title

  • Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound · CPC title

  • Reinforcement learning · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11238372B2 cover?
The current document is directed to methods and systems for simulation-based training of automated reinforcement-learning-based application managers. Simulators are generated from data collected from controlled computing environments and may employ any of a variety of different machine-learning models to learn state-transition and reward models. The currently disclosed methods and syst…
Who is the assignee on this patent?
VMware Inc
What technology area does this patent fall under?
Primary CPC classification G06F30/27. Mapped technology areas include Physics.
When was this patent published?
Publication date: Feb 1, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).