What technology area does this patent fall under?

Primary CPC classification G06N20/00. Mapped technology areas include Physics.

When was this patent published?

Publication date: Apr 13, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Adversarial automated reinforcement-learning-based application-manager training

US10977579B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10977579-B2
Application number	US-201916518807-A
Country	US
Kind code	B2
Filing date	Jul 22, 2019
Priority date	Aug 27, 2018
Publication date	Apr 13, 2021
Grant date	Apr 13, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The current document is directed to automated reinforcement-learning-based application managers that are trained using adversarial training. During adversarial training, potentially disadvantageous next actions are selected for issuance by an automated reinforcement-learning-based application manager at a lower frequency than selection of next actions, according to a policy that is learned to provide optimal or near-optimal control over a computing environment that includes one or more applications controlled by the automated reinforcement-learning-based application manager. By selecting disadvantageous actions, the automated reinforcement-learning-based application manager is forced to explore a much larger subset of the system-state space during training, so that, upon completion of training, the automated reinforcement-learning-based application manager has learned a more robust and complete optimal or near-optimal control policy than had the automated reinforcement-learning-based application manager been trained by simulators or using management actions and computing-environment responses recorded during previous controlled operation of a computing-environment.
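The selection rule the abstract describes can be illustrated with a minimal sketch. This is not the patent's implementation: it assumes the "lower frequency" is modeled as a per-step probability, and that the two policies are greedy over a list of per-action value estimates. The names `select_action`, `q_values`, and `negative_rate` are hypothetical.

```python
import random

def select_action(q_values, adversarial, negative_rate=0.1, rng=random):
    """Pick an action index from per-action value estimates.

    q_values      -- hypothetical value estimates for the current state
    adversarial   -- True while adversarial training is occurring
    negative_rate -- assumed probability of using the negative policy;
                     kept below 0.5 so it is the less frequent choice
    """
    if not 0.0 <= negative_rate < 0.5:
        raise ValueError("negative_rate must be in [0, 0.5)")
    indices = range(len(q_values))
    if adversarial and rng.random() < negative_rate:
        # Negative policy (assumed greedy): take the lowest-valued action.
        return min(indices, key=q_values.__getitem__), "negative"
    # Positive policy (assumed greedy): take the highest-valued action.
    return max(indices, key=q_values.__getitem__), "positive"
```

Outside adversarial training the function always follows the positive policy. During adversarial training, roughly `negative_rate` of the calls return a deliberately poor action, which is what pushes the manager into states it would otherwise rarely visit.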

First claim

Opening claim text (preview).

The invention claimed is:
1. An automated reinforcement-learning-based application manager that manages a computing environment that includes one or more applications and one or more of a distributed computing system having multiple computer systems interconnected by one or more networks, a standalone computer system, and a processor-controlled user device, the reinforcement-learning based application manager comprising: one or more processors, one or more memories, and one or more communications subsystems; a set of actions A that can be issued to the computing environment; an iterative control process that repeatedly when adversarial training is not occurring, selects and issues a next action to the computing environment according to a positive control policy that uses a state vector that represents a current state of the computational environment, when adversarial training is occurring, selects and issues, at a first frequency, a next action to the computing environment according to the positive control policy, and selects and issues at a second frequency less than the first frequency, a next action to the computing environment according to a negative control policy, and receives, from the computing environment, a next state and a reward, which the control process uses to attempt to learn an optimal or near-optimal control policy.
2. The automated reinforcement-learning-based application manager of claim 1 further including a second set of actions B from which the negative control policy selects a next action.
3. The automated reinforcement-learning-based application manager of claim 2 wherein the negative control policy selects actions from either the set of actions B or from the set of actions A.
4. The automated reinforcement-learning-based application manager of claim 1 wherein, when adversarial training is occurring and a next action a′ is selected according to the negative control policy, actions complementary to the next action a′ are temporarily removed from the set of actions A so that the automated reinforcement-learning-based application manager cannot immediately reverse the effects of action a′ in subsequent iterative-control-process cycles.
5. The automated reinforcement-learning-based application manager of claim 1 wherein the positive control policy attempts to select a next action that causes a transition to a next state with a maximum possible value.
6. The automated reinforcement-learning-based application manager of claim 1 wherein the positive control policy attempts to select a next action that causes a transition to a next state most likely to result in a maximum cumulative reward over subsequent iterative-control-process cycles.
7. The automated reinforcement-learning-based application manager of claim 1 wherein the negative control policy attempts to select a next action that causes a transition to a next state with a minimum possible value.
8. The automated reinforcement-learning-based application manager of claim 1 wherein the negative control policy attempts to select a next action that causes a transition to a next state most likely to result in a minimum cumulative reward over subsequent iterative-control-process cycles.
9. The automated reinforcement-learning-based application manager of claim 1 wherein, during adversarial training, the automated reinforcement-learning-based application manager includes two iterative control processes, one that uses the positive control policy and one that uses the negative control policy.
10. A method that trains an automated reinforcement-learning-based application manager that manages a computing environment that includes one or more applications and one or more of a distributed computing environment having multiple computer systems interconnected by one or more networks, a standalone computer system, and a processor-controlled user device, the automated reinforcement-learning-based application manager having one or more processors, one or more memories, one or more communications subsystems, and a set of actions A that can be issued to the computing environment, the method comprising: iteratively, by an iterative control process, when adversarial training is not occurring, selecting and issuing a next action to the computing environment according to a positive control policy that uses a state vector that represents a current state of the computational environment, when adversarial training is occurring, selecting and issuing, at a first frequency, a next action to the computing environment according to the positive control policy, and selecting and issuing at a second frequency less than the first frequency, a next action to the computing environment according to a negative control policy, and receiving, from the computing environment, a next state and a reward, which the automated reinforcement-learning-based application manager uses to attempt to learn an optimal or near-optimal control policy.
11. The method of claim 10 further including a second set of actions B from which the negative control policy selects a next action.
12. The method of claim 11 wherein the negative control policy selects actions from either the set of actions B or from the set of actions A.
13. The method of claim 10 wherein, when adversarial training is occurring and a next action a′ is selected according to the negative control policy, actions complementary to the next action a′ are temporarily removed from the set of actions A so that the automated reinforcement-learning-based application manager cannot immediately reverse the effects of action a′ in subsequent iterative-control-process cycles.
14. The method of claim 10 wherein the positive control policy attempts to select a next action that causes a transition to a next state with a maximum possible value.
15. The method of claim 10 wherein the positive control policy attempts to select a next action that causes a transition to a next state most likely to result in a maximum cumulative reward over subsequent iterative-control-process cycles.
16. The method of claim 10 wherein the negative control policy attempts to select a next action that causes a transition to a next state with a minimum possible value.
17. The method of claim 10 wherein the negative control policy attempts to select a next action that causes a transition to a next state most likely to result in a minimum cumulative reward over subsequent iterative-control-process cycles.
18. The method of claim 10 wherein, during adversarial training, the automated reinforcement-learning-based application manager includes two iterative control processes, one that uses the positive control policy and one that uses the negative control policy.
19. A physical data-storage device encoded with computer instructions that, when executed by one or more processors of a computer system that implements an automated reinforcement-learning-based application manager having one or more processors, one or more memories, one or more communications subsystems, a set of actions A that can be issued to a computing environment, controls the automated reinforcement-learning-based application manager to: iteratively, by an iterative control process, when adversarial training is not occurring, selecting and issuing a next action to the computing environment according to a positive control policy that uses a state vector that represents a current state of the computational environment, when adversarial training is occurring, selecting and issuing, at a first frequency, a
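Claims 4 and 13 describe temporarily removing actions that would undo a negative-policy action. The sketch below shows one way such bookkeeping might look. Everything here is an assumption for illustration: the class name `ActionMask`, the action names, the `complements` mapping, and the fixed `block_cycles` duration are hypothetical and are not specified by the claims.

```python
class ActionMask:
    """Tracks actions temporarily removed from the available action set."""

    def __init__(self, actions, complements, block_cycles=3):
        self.actions = list(actions)
        self.complements = complements    # action -> iterable of complementary actions
        self.block_cycles = block_cycles  # assumed length of the temporary removal
        self._blocked = {}                # action -> remaining blocked cycles

    def record_negative_action(self, action):
        """Block every action that could immediately reverse `action`."""
        for comp in self.complements.get(action, ()):
            self._blocked[comp] = self.block_cycles

    def available(self):
        """Return the actions the positive policy may currently choose from."""
        return [a for a in self.actions if a not in self._blocked]

    def tick(self):
        """Advance one control cycle, restoring actions whose block expired."""
        for a in list(self._blocked):
            self._blocked[a] -= 1
            if self._blocked[a] <= 0:
                del self._blocked[a]


# Hypothetical usage: after an adversarial "scale_down", forbid "scale_up" for two cycles.
mask = ActionMask(["scale_up", "scale_down", "restart"],
                  {"scale_down": ["scale_up"]}, block_cycles=2)
mask.record_negative_action("scale_down")
```

With the complementary action unavailable, the manager has to cope with the consequences of the disadvantageous action for a few cycles instead of reversing it at once.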

Assignees

VMware Inc

Inventors

Classifications

G06N7/01
Probabilistic graphical models, e.g. probabilistic networks · CPC title
G06N20/00 · Primary
Machine learning · CPC title
G06F9/542
Event management; Broadcasting; Multicasting; Notifications · CPC title
G06N7/005
Physics · mapped topic

Patent family

Related publications grouped by family.

View patent family 69586284

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10977579B2 cover?: The current document is directed to automated reinforcement-learning-based application managers that are trained using adversarial training. During adversarial training, potentially disadvantageous next actions are selected for issuance by an automated reinforcement-learning-based application manager at a lower frequency than selection of next actions, according to a policy that is learned…
Who is the assignee on this patent?: VMware Inc