What technology area does this patent fall under?

Primary CPC classification G06F21/604. Mapped technology areas include Physics.

When was this patent published?

Publication date: Jun 22, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Safe-operation-constrained reinforcement-learning-based application manager

US11042640B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11042640-B2
Application number	US-201916502587-A
Country	US
Kind code	B2
Filing date	Jul 3, 2019
Priority date	Aug 27, 2018
Publication date	Jun 22, 2021
Grant date	Jun 22, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The current document is directed to a safe-operation-constrained reinforcement-learning-based application manager that can be deployed in various different computational environments, without extensive manual modification and interface development, to manage the computational environments with respect to one or more reward-specified goals. Control actions undertaken by the safe-operation-constrained reinforcement-learning-based application manager are constrained, by stored action filters, to constrain state/action-space exploration by the safe-operation-constrained reinforcement-learning-based application manager to safe actions and thus prevent deleterious impact to the managed computational environment.
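
As a rough illustration of the idea in the abstract, the sketch below shows one way an action filter could sit between a policy and the managed environment, passing safe actions through, repairing unsafe ones where possible, and blocking the rest. Everything in it is hypothetical: the action layout (a list of numbers whose first entry is a requested replica count), the limit `MAX_REPLICAS`, and the function name `clamp_replicas` are assumptions made for illustration and are not taken from the patent.

```python
from typing import List, Optional

Action = List[float]  # hypothetical: an action is a plain vector of numbers

MAX_REPLICAS = 10.0  # hypothetical safety limit, not taken from the patent


def clamp_replicas(action: Action) -> Optional[Action]:
    """Return a safe version of `action`, or None when no safe version exists.

    Assumes, for illustration only, that action[0] is a requested replica count.
    """
    if not action:
        return None  # nothing to act on: treat as a NULL action
    requested = action[0]
    if requested < 0:
        return None  # cannot be repaired into a meaningful action
    if requested > MAX_REPLICAS:
        # Unsafe but repairable: cap the request at the limit.
        return [MAX_REPLICAS] + action[1:]
    return action  # already safe: pass through unchanged


if __name__ == "__main__":
    print(clamp_replicas([3.0, 1.0]))   # safe request
    print(clamp_replicas([50.0, 1.0]))  # excessive request, capped
    print(clamp_replicas([-1.0, 1.0]))  # unrepairable request, blocked
```

Using `None` for a blocked action is a design choice of this sketch; it lets the caller distinguish "do nothing" from "do a modified version of what the policy proposed".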

First claim

Opening claim text (preview).

The invention claimed is:

1. A safe-operation-constrained reinforcement-learning-based application manager that manages one or more applications and a computing environment, within which the applications run, comprising one or more of a distributed computing system having multiple computer systems interconnected by one or more networks, a standalone computer system, and a processor-controlled user device, the modular reinforcement-learning based application manager comprising: a safe-operation-constrained reinforcement-learning-based application manager that receives rewards and observations from the computing environment and issues actions, indicated by an internally maintained policy π, to the computing environment; and one or more filtering subsystems that apply one or more filters to actions indicated by an internally maintained policy π to prevent the safe-operation-constrained reinforcement-learning-based application manager from issuing actions that, if executed by the computing environment, would lead to harmful and undesired results.

2. The safe-operation-constrained reinforcement-learning-based application manager of claim 1 wherein each action is represented as a vector of values and specifies one or more actions to be carried out by the computing environment; and wherein the observations are represented as a vector of values that include metric values, configurations parameters, operational parameters, operation characteristics, and other values indicative of the current application and computing-environment state.

3. The safe-operation-constrained reinforcement-learning-based application manager of claim 2 wherein the safe-operation-constrained reinforcement-learning-based application manager maintains: the policy π; a current belief distribution b; an action-value-update function; a belief-distribution-update function; and termination conditions.

4. The safe-operation-constrained reinforcement-learning-based application manager of claim 2 wherein the safe-operation-constrained reinforcement-learning-based application manager: continuously receives a reward and an observation vector from the computing environment; determines a new belief distribution b′ using the belief-distribution-update function and observation vector; generates a next action a′ by applying the policy π to the new belief distribution b′; applies one or more filter subsystems to the next action a′; and delivers the next action a′ to the computing environment.

5. The safe-operation-constrained reinforcement-learning-based application manager of claim 1 wherein the one or more filtering subsystems each comprises one or more filter stacks; and wherein a filter stack comprises multiple filters.

6. The safe-operation-constrained reinforcement-learning-based application manager of claim 5 wherein a filter receives an input action vector or an input action vector and an observation prediction and returns one of the input action vector, a modified version of the input action vector, or a NULL action vector.

7. The safe-operation-constrained reinforcement-learning-based application manager of claim 6 wherein a first type of filter contains logic that analyzes an input action vector to return the input action vector when the action vector represents a safe action; and when the input action vector represents an unsafe or deleterious action, when the input action vector can be modified to represent a related, safe action, modifies the input action vector and returns the modified action vector, and otherwise returns a NULL action vector.

8. The safe-operation-constrained reinforcement-learning-based application manager of claim 6 wherein a second type of filter contains logic that analyzes an input action vector and an observation prediction to return the input action vector when the action vector represents a safe action; and when the input action vector represents an unsafe or deleterious action, when the input action vector can be modified to represent a related, safe action, modifies the input action vector and returns the modified action vector, and otherwise returns a NULL action vector.

9. The safe-operation-constrained reinforcement-learning-based application manager of claim 5 wherein a filter stack applies the first filter in the filter stack to an input action vector; successively applies each remaining filter to the vector output from the preceding stack, short-circuiting successive application of the remaining filters when the preceding filter outputs a NULL vector; and returns either a NULL action vector, the input action vector, or a modified action vector.

10. The safe-operation-constrained reinforcement-learning-based application manager of claim 5 wherein a filtering subsystem receives input comprising one of an input action vector and an observation prediction; determines a filter stack to which to direct the received input; directs the input to the determined filter stack; receives an output from the filter stack; and when the input is determined to require additional processing, repeats filter-stack determination to determine a next filter stack and directs the output to the next filter stack to generate a next output, and otherwise returns the output.

11. A method constraining a reinforcement-learning-based application manager to issue safe actions, the method comprising: including, in the reinforcement-learning-based application manager that manages one or more applications and a computing environment, within which the applications run, comprising one or more of a distributed computing system having multiple computer systems interconnected by one or more networks, a standalone computer system, and a processor-controlled user device, one or more action filtering subsystems that apply one or more filters to actions indicated by a policy π internally maintained by the reinforcement-learning-based application manager; and applying, by the reinforcement-learning-based application manager, actions, indicated by an internally maintained policy π, to one or more action filtering subsystems.

12. The method of claim 11 wherein each action is represented as a vector of values and specifies one or more actions to be carried out by the computing environment; and wherein the observations are represented as a vector of values that include metric values, configurations parameters, operational parameters, operation characteristics, and other values indicative of the current application and computing-environment state.

13. The method of claim 12 wherein the reinforcement-learning-based application manager maintains: the policy π; a current belief distribution b; an action-value-update function; a belief-distribution-update function; and termination conditions.

14. The method of claim 13 wherein the reinforcement-learning-based application manager: continuously receives a reward and an observation vector from the computing environment; determines a new belief distribution b′ using the belief-distribution-update function and observation vector; generates a next action a′ by applying the policy π to the new belief distribution b′; applies one or more filter subsystems to the next action a′; and delivers the next action a′ to the computing environment.

15. The method of claim 11 wherein the one or more filtering subsystems each comprises one or more filter stacks; and wherein a filter stack comprises multiple filters.

16. The method of claim 15 wherein a filter receives an input action vector or an input action vector and
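
The control step in claims 4 and 14 and the filter-stack behavior in claim 9 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `apply_filter_stack` and `control_step`, the use of Python's `None` for the NULL action vector, and the toy belief update, policy, and filters are all hypothetical. Reward handling and learning are left out to keep the sketch short.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Action = List[float]
# A filter returns the action, a modified action, or None (standing in for NULL).
Filter = Callable[[Action], Optional[Action]]


def apply_filter_stack(filters: Sequence[Filter], action: Action) -> Optional[Action]:
    """Apply filters in order, stopping as soon as one returns None."""
    current: Optional[Action] = action
    for action_filter in filters:
        current = action_filter(current)
        if current is None:
            return None  # short-circuit: remaining filters are not applied
    return current


def control_step(
    belief: float,
    observation: List[float],
    update_belief: Callable[[float, List[float]], float],
    policy: Callable[[float], Action],
    filters: Sequence[Filter],
) -> Tuple[float, Optional[Action]]:
    """One pass: update the belief, propose an action, filter it before delivery."""
    new_belief = update_belief(belief, observation)  # b' from b and the observation
    proposed = policy(new_belief)                    # a' = policy(b')
    return new_belief, apply_filter_stack(filters, proposed)


# Toy stand-ins, all hypothetical. The belief is a single float here,
# whereas the claims describe a belief distribution.
def toy_update_belief(belief: float, observation: List[float]) -> float:
    return 0.5 * belief + 0.5 * observation[0]


def toy_policy(belief: float) -> Action:
    return [2.0 * belief]


def reject_negative(action: Action) -> Optional[Action]:
    return None if any(x < 0 for x in action) else action


def cap_at_five(action: Action) -> Optional[Action]:
    return [min(x, 5.0) for x in action]


if __name__ == "__main__":
    stack = [reject_negative, cap_at_five]
    print(control_step(1.0, [7.0], toy_update_belief, toy_policy, stack))
    print(control_step(-3.0, [-5.0], toy_update_belief, toy_policy, stack))
```

In this sketch the caller decides what to do when the filtered action is `None`, for example skipping delivery to the environment for that step.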

Assignees

VMware Inc

Inventors

Classifications

G06N7/01
Probabilistic graphical models, e.g. probabilistic networks
G06F21/604 (Primary)
Tools and structures for managing or administering access control systems
G06F2221/034
Test or assess a computer or a system
G06N20/00
Machine learning
G06F21/57 (Primary)
Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities

Patent family

Related publications grouped by family.

View patent family 69583910

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11042640B2 cover?: The current document is directed to a safe-operation-constrained reinforcement-learning-based application manager that can be deployed in various different computational environments, without extensive manual modification and interface development, to manage the computational environments with respect to one or more reward-specified goals. Control actions undertaken by the safe-operation-constr…
Who is the assignee on this patent?: VMware Inc

Related patents

Interference mitigation in ultra-dense wireless networks

Methods and systems for reinforcement learning

Systems and methods for providing information incorporating reinforcement-based learning and feedback

Hybrid reward architecture for reinforcement learning

Wireless coded communication (WCC) devices for tracking retail interactions with goods and association to user accounts

Inverse reinforcement learning by density ratio estimation