Transferable training for automated reinforcement-learning-based application-managers

US11037058B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11037058-B2
Application number	US-201916518831-A
Country	US
Kind code	B2
Filing date	Jul 22, 2019
Priority date	Aug 27, 2018
Publication date	Jun 15, 2021
Grant date	Jun 15, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The current document is directed to transfer of training received by a first automated reinforcement-learning-based application manager while controlling a first application is transferred to a second automated reinforcement-learning-based application manager which controls a second application different from the first application. Transferable training provides a basis for automated generation of applications from application components. Transferable training is obtained from composition of applications from application components and composition of reinforcement-learning-based-control-and-learning constructs from reinforcement-learning-based-control-and-learning constructs of application components.
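As a rough illustration of the idea in the abstract, the following Python sketch shows training data moving between two managers that share an application component. It is not taken from the patent: the class names `ComponentPolicy` and `ApplicationManager`, the tabular action-value store, and the rule of matching components by name are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentPolicy:
    """Hypothetical per-component learner with a tabular action-value function."""
    actions: tuple
    q: dict = field(default_factory=dict)  # (state, action) -> estimated value

    def export_training(self):
        # Return a copy so the source manager's table is never shared by reference.
        return dict(self.q)

    def import_training(self, data):
        self.q.update(data)

    def best_action(self, state):
        # Greedy choice; unseen (state, action) pairs default to a value of 0.0.
        return max(self.actions, key=lambda a: self.q.get((state, a), 0.0))


@dataclass
class ApplicationManager:
    """Hypothetical manager composed from per-component policies."""
    components: dict  # component name -> ComponentPolicy


def transfer_training(source, target):
    """Copy training data for every component name the two managers share."""
    shared = sorted(set(source.components) & set(target.components))
    for name in shared:
        data = source.components[name].export_training()
        target.components[name].import_training(data)
    return shared


# Usage: the first application has "cache" and "web" components,
# the second application has "cache" and "queue" components.
first = ApplicationManager({
    "cache": ComponentPolicy(("noop", "scale_up"), {("hot", "scale_up"): 1.0}),
    "web": ComponentPolicy(("noop",)),
})
second = ApplicationManager({
    "cache": ComponentPolicy(("noop", "scale_up")),
    "queue": ComponentPolicy(("noop",)),
})
print(transfer_training(first, second))               # ['cache']
print(second.components["cache"].best_action("hot"))  # scale_up
```

Only the shared "cache" component receives transferred values in this sketch; the "queue" component starts untrained and would have to learn from scratch.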

First claim

Opening claim text (preview).

The invention claimed is:

1. An automated reinforcement-learning-based application manager that manages a computing environment that includes one or more applications and one or more of a distributed computing system having multiple computer systems interconnected by one or more networks, a standalone computer system, and a processor-controlled user device, the reinforcement-learning based application manager comprising: one or more processors, one or more memories, and one or more communications subsystems; a set of actions A that can be issued to the computing environment; and an iterative control process that repeatedly when initial training is not occurring, selects and issues a next action to the computing environment according to a control policy that uses a state vector that represents a current state of the computational environment, when initial training is occurring, selects and issues a next action to the computing environment according to a training control policy that uses a state vector that represents a current state of the computational environment and training information incorporated into the automated reinforcement-learning-based application manager that was acquired by a different automated reinforcement-learning-based application manager, and receives, from the computing environment, a next state and a reward, which the control process uses to attempt to learn an optimal or near-optimal control policy.

2. The automated reinforcement-learning-based application manager of claim 1 wherein the training control policy uses a state vector that represents a current state of the computing environment and training information incorporated into the automated reinforcement-learning-based application manager to select a next action by: generating a hidden-state vector for each of multiple components of the computing environment; applying, to each hidden-state vector, a component-associated control policy for the component of the computing environment for which the hidden-state vector was generated to select an action; and combining one or more of the actions selected by the component-associated control policies to produce the next action.

3. The automated reinforcement-learning-based application manager of claim 2 wherein the component-associated control policies include: component-associated control policies associated with components for which training data for related components has been incorporated into the automated reinforcement-learning-based application manager; static deterministic or stochastic component-associated control policies associated with components for which training data has been incorporated into the automated reinforcement-learning-based application manager; and static deterministic or stochastic component-associated control policies associated with components comprising subcomponents for which training data has been incorporated into the automated reinforcement-learning-based application manager.

4. The automated reinforcement-learning-based application manager of claim 3 wherein the component-associated control policies associated with components for which training data for related components has been incorporated into the automated reinforcement-learning-based application manager employing exploratory action selection from an action set corresponding to the component.

5. The automated reinforcement-learning-based application manager of claim 2 wherein generating a hidden-state vector for each of multiple components of the computing environment further comprises: decomposing the computing environment into components; decomposing the state vector into component subvectors, each component subvector corresponding to a computing-environment component; and applying a hidden-state-vector function to each component subvector to generate the hidden-state vector.

6. The automated reinforcement-learning-based application manager of claim 1 wherein initial training is discontinued after the automated reinforcement-learning-based application manager has learned a near-optimal or optimal control policy for the computing environment.

7. A method for transferring training data from one or more trained automated reinforcement-learning-based application managers to a target automated reinforcement-learning-based application manager that manages a computing environment that includes one or more applications and one or more of a distributed computing environment having multiple computer systems interconnected by one or more networks, a standalone computer system, and a processor-controlled user device, the automated reinforcement-learning-based application manager having one or more processors, one or more memories, one or more communications subsystems, and a set of actions A that can be issued to the computing environment, the method comprising: decomposing the computing into components; identifying training data for each of the components; incorporating the identified training data into the target automated reinforcement-learning-based application manager; and iteratively, by an iterative control process, selecting and issuing a next action to the computing environment according to a control policy that uses a state vector that represents a current state of the computational environment and the training information incorporated into the automated reinforcement-learning-based application manager, and receiving, from the computing environment, a next state and a reward, which the control process uses to attempt to learn an optimal or near-optimal control policy.

8. The method of claim 7 wherein the control policy comprises multiple component-associated control policies, each component-associated control policy selecting actions from a set of actions issuable to the component associated with the component-associated control policy.

9. The method of claim 8 wherein selecting and issuing a next action further comprises: decomposing the state vector into subvectors, each subvector corresponding to one of the components; generating a hidden-state vector from each state vector; applying, to each hidden-state vector, a component-associated control policy; and combining one or more of the actions selected by the component-associated control policies to produce the next action.

10. The method of claim 9 wherein the component-associated control policies include: component-associated control policies associated with components for which training data for related components has been incorporated into the automated reinforcement-learning-based application manager; static deterministic or stochastic component-associated control policies associated with components for which training data has been incorporated into the automated reinforcement-learning-based application manager; and static deterministic or stochastic component-associated control policies associated with components comprising subcomponents for which training data has been incorporated into the automated reinforcement-learning-based application manager.

11. The method of claim 9 wherein the reward is computed by a functional composition of reward functions for each of the components.

12. The method of claim 9 wherein the training data comprises one or more of state-value functions and state/action-value functions.

13. A method that generates a new application for management by a target automated reinforcement-learning-based application manager that manages a computing environment that includes the new application and one or more of a distributed computing environment having multiple computer systems interconnected by one or more networks, a stan
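The action-selection steps recited in claims 2, 5, 9, and 11 can be pictured with a short Python sketch. Everything concrete here is an assumption for illustration and does not come from the patent: the component names, the slice layout of the state vector, the "high"/"low" hidden-state function, epsilon-greedy exploration, returning the combined action as a dict, and summing component rewards as the composition.

```python
import random

# Hypothetical layout: which slice of the state vector belongs to which component.
COMPONENT_SLICES = {"web": slice(0, 2), "db": slice(2, 4)}
# Hypothetical action sets, one per component.
COMPONENT_ACTIONS = {"web": ("noop", "add_server"), "db": ("noop", "add_replica")}


def decompose(state):
    """Split the state vector into one subvector per component."""
    return {name: state[s] for name, s in COMPONENT_SLICES.items()}


def hidden_state(subvector):
    """Placeholder hidden-state function: bucket the mean load as high or low."""
    return "high" if sum(subvector) / len(subvector) > 0.5 else "low"


def component_policy(name, hidden, q, epsilon, rng):
    """Epsilon-greedy choice over the component's own action set."""
    actions = COMPONENT_ACTIONS[name]
    if rng.random() < epsilon:
        return rng.choice(actions)  # exploratory action
    return max(actions, key=lambda a: q.get((name, hidden, a), 0.0))


def select_action(state, q, epsilon=0.1, rng=random):
    """Combine the per-component choices into one next action (a dict here)."""
    return {
        name: component_policy(name, hidden_state(sub), q, epsilon, rng)
        for name, sub in decompose(state).items()
    }


def composed_reward(component_rewards):
    """Assumed composition: the sum of the per-component rewards."""
    return sum(component_rewards.values())


# Usage with exploration switched off and one learned value for the web component.
q_values = {("web", "high", "add_server"): 1.0}
print(select_action([0.9, 0.8, 0.1, 0.2], q_values, epsilon=0.0))
print(composed_reward({"web": 1.0, "db": -0.5}))
```

In this sketch `q_values` stands in for training information that a different manager might have supplied; a full control loop would also update it from each received next state and reward.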

Assignees

Vmware Inc

Inventors

Classifications

G06N3/006 · Primary · based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
G06N3/08 · Primary · Learning methods
G06N3/047 · Probabilistic or stochastic networks
G06F18/2155 · characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
G06N7/01 · Probabilistic graphical models, e.g. probabilistic networks

Patent family

Related publications grouped by family.

Patent family ID: 69583748

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11037058B2 cover?

The current document is directed to transfer of training received by a first automated reinforcement-learning-based application manager while controlling a first application is transferred to a second automated reinforcement-learning-based application manager which controls a second application different from the first application. Transferable training provides a basis for automated generation…

Who is the assignee on this patent?

Vmware Inc

What technology area does this patent fall under?

Primary CPC classification G06N3/006. Mapped technology areas include Physics.

When was this patent published?

Publication date: Jun 15, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Modular reinforcement-learning-based application manager

Automated reinforcement-learning-based application manager that learns and improves a reward function

Computationally efficient reinforcement-learning-based application manager

Adversarial automated reinforcement-learning-based application-manager training

Automated reinforcement-learning-based application manager that uses local agents
