Safe-operation-constrained reinforcement-learning-based application manager

US11042640B2 · US · B2

Patent metadata
Field · Value
Publication number · US-11042640-B2
Application number · US-201916502587-A
Country · US
Kind code · B2
Filing date · Jul 3, 2019
Priority date · Aug 27, 2018
Publication date · Jun 22, 2021
Grant date · Jun 22, 2021

Abstract

Official abstract text for this publication.

The current document is directed to a safe-operation-constrained reinforcement-learning-based application manager that can be deployed in various different computational environments, without extensive manual modification and interface development, to manage the computational environments with respect to one or more reward-specified goals. Control actions undertaken by the safe-operation-constrained reinforcement-learning-based application manager are constrained, by stored action filters, to constrain state/action-space exploration by the safe-operation-constrained reinforcement-learning-based application manager to safe actions and thus prevent deleterious impact to the managed computational environment.
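To make the idea of action filters concrete, the following is a minimal Python sketch, not the patent's implementation. It follows the behavior described in claims 6, 7, and 9: each filter returns the input action vector, a modified version of it, or a NULL action vector, and a filter stack applies its filters in order and stops as soon as one returns NULL. The encoding of an action as a list of floats, the use of `None` for the NULL action vector, the filter names `clamp_scale` and `block_shutdown`, and the limit `MAX_INSTANCES` are all assumptions made for illustration.

```python
from typing import Callable, List, Optional

Action = List[float]                           # action vector (assumed encoding)
Filter = Callable[[Action], Optional[Action]]  # None stands in for the NULL action vector

MAX_INSTANCES = 8.0  # hypothetical safety limit, not taken from the patent


def clamp_scale(action: Action) -> Optional[Action]:
    """Repair an unsafe action: cap element 0 (assumed to be an instance count)."""
    if action[0] <= MAX_INSTANCES:
        return action                          # already safe, pass through unchanged
    return [MAX_INSTANCES] + action[1:]        # related, safe action


def block_shutdown(action: Action) -> Optional[Action]:
    """Reject an action that cannot be repaired: element 1 > 0 means 'shut down' (assumed)."""
    return None if action[1] > 0 else action


def apply_stack(filters: List[Filter], action: Action) -> Optional[Action]:
    """Apply each filter to the previous filter's output, short-circuiting on NULL."""
    current = action
    for f in filters:
        result = f(current)
        if result is None:
            return None                        # skip the remaining filters
        current = result
    return current


stack = [clamp_scale, block_shutdown]
safe = apply_stack(stack, [4.0, 0.0])      # passes through unchanged
capped = apply_stack(stack, [20.0, 0.0])   # element 0 is capped at MAX_INSTANCES
rejected = apply_stack(stack, [4.0, 1.0])  # becomes the NULL action
```

Keeping each filter a pure function from action vector to action vector (or NULL) is one way to let filters be stored, reordered, and combined independently of the learning logic.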

First claim

Opening claim text (preview).

The invention claimed is:

1. A safe-operation-constrained reinforcement-learning-based application manager that manages one or more applications and a computing environment, within which the applications run, comprising one or more of a distributed computing system having multiple computer systems interconnected by one or more networks, a standalone computer system, and a processor-controlled user device, the modular reinforcement-learning based application manager comprising: a safe-operation-constrained reinforcement-learning-based application manager that receives rewards and observations from the computing environment and issues actions, indicated by an internally maintained policy π, to the computing environment; and one or more filtering subsystems that apply one or more filters to actions indicated by an internally maintained policy π to prevent the safe-operation-constrained reinforcement-learning-based application manager from issuing actions that, if executed by the computing environment, would lead to harmful and undesired results.

2. The safe-operation-constrained reinforcement-learning-based application manager of claim 1 wherein each action is represented as a vector of values and specifies one or more actions to be carried out by the computing environment; and wherein the observations are represented as a vector of values that include metric values, configurations parameters, operational parameters, operation characteristics, and other values indicative of the current application and computing-environment state.

3. The safe-operation-constrained reinforcement-learning-based application manager of claim 2 wherein the safe-operation-constrained reinforcement-learning-based application manager maintains: the policy π; a current belief distribution b; an action-value-update function; a belief-distribution-update function; and termination conditions.

4. The safe-operation-constrained reinforcement-learning-based application manager of claim 2 wherein the safe-operation-constrained reinforcement-learning-based application manager: continuously receives a reward and an observation vector from the computing environment; determines a new belief distribution b′ using the belief-distribution-update function and observation vector; generates a next action a′ by applying the policy π to the new belief distribution b′; applies one or more filter subsystems to the next action a′; and delivers the next action a′ to the computing environment.

5. The safe-operation-constrained reinforcement-learning-based application manager of claim 1 wherein the one or more filtering subsystems each comprises one or more filter stacks; and wherein a filter stack comprises multiple filters.

6. The safe-operation-constrained reinforcement-learning-based application manager of claim 5 wherein a filter receives an input action vector or an input action vector and an observation prediction and returns one of the input action vector, a modified version of the input action vector, or a NULL action vector.

7. The safe-operation-constrained reinforcement-learning-based application manager of claim 6 wherein a first type of filter contains logic that analyzes an input action vector to return the input action vector when the action vector represents a safe action; and when the input action vector represents an unsafe or deleterious action, when the input action vector can be modified to represent a related, safe action, modifies the input action vector and returns the modified action vector, and otherwise returns a NULL action vector.

8. The safe-operation-constrained reinforcement-learning-based application manager of claim 6 wherein a second type of filter contains logic that analyzes an input action vector and an observation prediction to return the input action vector when the action vector represents a safe action; and when the input action vector represents an unsafe or deleterious action, when the input action vector can be modified to represent a related, safe action, modifies the input action vector and returns the modified action vector, and otherwise returns a NULL action vector.

9. The safe-operation-constrained reinforcement-learning-based application manager of claim 5 wherein a filter stack applies the first filter in the filter stack to an input action vector; successively applies each remaining filter to the vector output from the preceding stack, short-circuiting successive application of the remaining filters when the preceding filter outputs a NULL vector; and returns either a NULL action vector, the input action vector, or a modified action vector.

10. The safe-operation-constrained reinforcement-learning-based application manager of claim 5 wherein a filtering subsystem receives input comprising one of an input action vector and an observation prediction; determines a filter stack to which to direct the received input; directs the input to the determined filter stack; receives an output from the filter stack; and when the input is determined to require additional processing, repeats filter-stack determination to determine a next filter stack and directs the output to the next filter stack to generate a next output, and otherwise returns the output.

11. A method constraining a reinforcement-learning-based application manager to issue safe actions, the method comprising: including, in the reinforcement-learning-based application manager that manages one or more applications and a computing environment, within which the applications run, comprising one or more of a distributed computing system having multiple computer systems interconnected by one or more networks, a standalone computer system, and a processor-controlled user device, one or more action filtering subsystems that apply one or more filters to actions indicated by a policy π internally maintained by the reinforcement-learning-based application manager; and applying, by the reinforcement-learning-based application manager, actions, indicated by an internally maintained policy π, to one or more action filtering subsystems.

12. The method of claim 11 wherein each action is represented as a vector of values and specifies one or more actions to be carried out by the computing environment; and wherein the observations are represented as a vector of values that include metric values, configurations parameters, operational parameters, operation characteristics, and other values indicative of the current application and computing-environment state.

13. The method of claim 12 wherein the reinforcement-learning-based application manager maintains: the policy π; a current belief distribution b; an action-value-update function; a belief-distribution-update function; and termination conditions.

14. The method of claim 13 wherein the reinforcement-learning-based application manager: continuously receives a reward and an observation vector from the computing environment; determines a new belief distribution b′ using the belief-distribution-update function and observation vector; generates a next action a′ by applying the policy π to the new belief distribution b′; applies one or more filter subsystems to the next action a′; and delivers the next action a′ to the computing environment.

15. The method of claim 11 wherein the one or more filtering subsystems each comprises one or more filter stacks; and wherein a filter stack comprises multiple filters.

16. The method of claim 15 wherein a filter receives an input action vector or an input action vector and
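The loop recited in claims 4 and 14 (receive an observation, update the belief distribution b to b′, apply the policy π to b′, filter the resulting action a′, deliver it) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the patented implementation: the two hidden states, the `LIKELIHOOD` table, the toy `policy`, and the cap in `safety_filter` are all hypothetical, the belief update is a plain Bayes-rule update, and reward handling and the action-value update are omitted for brevity.

```python
from typing import Callable, Dict, List, Optional

Belief = Dict[str, float]   # probability per hidden state (assumed discrete)
Action = List[float]        # action vector; None stands in for the NULL action

# Hypothetical likelihoods P(observation | state) for a binary "latency high?" observation.
LIKELIHOOD = {
    "normal": {True: 0.1, False: 0.9},
    "overloaded": {True: 0.8, False: 0.2},
}


def update_belief(b: Belief, latency_high: bool) -> Belief:
    """Bayes-rule update of the belief distribution b to b' from one observation."""
    unnorm = {s: p * LIKELIHOOD[s][latency_high] for s, p in b.items()}
    total = sum(unnorm.values())
    return {s: v / total for s, v in unnorm.items()}


def policy(b: Belief) -> Action:
    """Toy policy: request more instances the more likely overload is."""
    return [float(round(10 * b["overloaded"]))]


def safety_filter(a: Action) -> Optional[Action]:
    """Cap the requested instance count at a hypothetical limit of 6."""
    return [min(a[0], 6.0)]


def step(b: Belief, latency_high: bool, issue: Callable[[Action], None]) -> Belief:
    """One pass of the loop: observe, update belief, choose, filter, deliver."""
    b_next = update_belief(b, latency_high)
    a_next = safety_filter(policy(b_next))
    if a_next is not None:   # a NULL action is simply not issued
        issue(a_next)
    return b_next


issued: List[Action] = []    # stands in for the computing environment
belief = {"normal": 0.5, "overloaded": 0.5}
for obs in (True, True):
    belief = step(belief, obs, issued.append)
```

In this sketch the filter sits between the policy and the environment, so the policy's raw request for 9 or 10 instances never reaches `issued`; only the capped action does.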

Assignees

Inventors

Classifications

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • G06F21/604 · Primary

    Tools and structures for managing or administering access control systems · CPC title

  • Test or assess a computer or a system · CPC title

  • Machine learning · CPC title

  • G06F21/57 · Primary

    Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
VMware Inc
What technology area does this patent fall under?
Primary CPC classification G06F21/604. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 22, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).