Automated reinforcement-learning-based application manager that uses action tags and metric tags

US11080623B2 · US · B2

Patent metadata
Publication number: US-11080623-B2
Application number: US-201916518667-A
Country: US
Kind code: B2
Filing date: Jul 22, 2019
Priority date: Aug 27, 2018
Publication date: Aug 3, 2021
Grant date: Aug 3, 2021

Abstract

Official abstract text for this publication.

The current document is directed to an automated reinforcement-learning-based application manager that uses action tags and metric tags. In various implementations, actions and metrics are associated with tags. Different types of tags can contain different types of information that can be used to greatly improve the computational efficiency by which the reinforcement-learning-based application manager explores the action-state space in order to determine and maintain an optimal or near-optimal management policy by providing a vehicle for domain knowledge to influence control-policy decision making.
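As a rough illustration of how tags might let domain knowledge steer action selection, the following Python sketch filters out actions that carry a tag marking them as undesirable before an epsilon-greedy choice is made. It is only a sketch under stated assumptions: the `Action` class, the `forbidden` tag name, the `q_values` table, and the epsilon-greedy rule are all hypothetical and are not taken from the patent text.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Action:
    """Hypothetical action record: a name plus a dictionary of tag values."""
    name: str
    tags: dict = field(default_factory=dict)


def candidate_actions(actions):
    """Return only the actions that are not tagged as forbidden (assumed tag name)."""
    return [a for a in actions if not a.tags.get("forbidden", False)]


def select_action(actions, q_values, state, epsilon=0.1, rng=random):
    """Epsilon-greedy selection restricted to the tag-filtered candidates.

    q_values maps (state, action name) to an estimated value; missing
    entries are treated as 0.0.
    """
    candidates = candidate_actions(actions)
    if rng.random() < epsilon:
        return rng.choice(candidates)  # explore among allowed actions only
    return max(candidates, key=lambda a: q_values.get((state, a.name), 0.0))
```

Because the filter runs before both the exploratory and the greedy branch, a tagged action is never tried, which is one way the search over the action-state space could be narrowed.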

First claim

Opening claim text (preview).

The invention claimed is:

1. An automated reinforcement-learning-based application manager that manages a computing environment that includes one or more applications and one or more of a distributed computing system having multiple computer systems interconnected by one or more networks, a standalone computer system, and a processor-controlled user device, the reinforcement-learning based application manager comprising: one or more processors, one or more memories, and one or more communications subsystems; a set of actions that can be issued to the computing environment; a set of metrics; tags that can be associated with actions and metrics; and an iterative control process that repeatedly selects and issues a next action, according to a control policy that uses the current computational-environment state and that accesses values of tag associated with actions and metrics, to control the computing environment, and receives a reward and one of an observation and a state from the computing environment, in response to execution of the issued next action, which the control process uses to attempt to learn an optimal or near-optimal control policy, over time.

2. The automated reinforcement-learning-based application manager of claim 1 wherein each action is represented by a set of values; and wherein representations of actions are translated into commands that are directed to computational-entity components of the computing environment.

3. The automated reinforcement-learning-based application manager of claim 2 wherein both states and observations are represented by a set of values that include metric values.

4. The automated reinforcement-learning-based application manager of claim 3 wherein tags are represented by stored data that includes data representations of the associations of tags to actions and metrics and that includes data representations of one or more values for each tag.

5. The automated reinforcement-learning-based application manager of claim 4 wherein tags are associated with metrics in order to project the set of states into a smaller subset of lower-dimensional controllable states, transitions between which are controlled directly or indirectly by actions issued by the control process, in order to lower the computational complexity of learning an optimal or near-optimal control policy; and wherein the control process learns values of the lower-dimensional controllable states and/or values of controllable-state/action pairs from the received rewards as part of learning an optimal or near-optimal control policy.

6. The automated reinforcement-learning-based application manager of claim 4 wherein tags are associated with metrics in order to project the set of states into two smaller subsets of lower-dimension, including a first subset of lower-dimensional controllable states, transitions between which are controlled directly or indirectly by actions issued by the control process, and a second subset that is further projected into a set of values, each associated with a different model; and wherein the control process learns the models as part of learning an optimal or near-optimal control policy with decreased computational overheads as a result of the smaller number of controllable states that need to be incorporated into the models.

7. The automated reinforcement-learning-based application manager of claim 4 wherein tags are associated with actions in order to decrease the number of actions from which the control process selects a next action by representing special-cases as tag values rather than separate actions, which, in turn, lowers the computational complexity and time needed to learn an optimal or near-optimal control policy.

8. The automated reinforcement-learning-based application manager of claim 4 wherein tags are associated with actions in order to prevent the control process from selecting a next action that is known to produce a low, future cumulative reward and thus improve control of the computational environment and lower the computational complexity and time needed to learn an optimal or near-optimal control policy.

9. The automated reinforcement-learning-based application manager of claim 4 wherein tags are associated with actions in order to direct the control process to select a next action that is known to produce a large future cumulative reward and thus improve control of the computational environment and lower the computational complexity and time needed to learn an optimal or near-optimal control policy.

10. The automated reinforcement-learning-based application manager of claim 4 wherein the values that represent actions, metrics, and tags include one or more value types selected from among: characters; character strings; integers; and floating-point numbers.

11. The automated reinforcement-learning-based application manager of claim 9 where tag values may include rules, logic statements, and routines that can be applied, evaluated, and executed, respectively, by the control process.

12. A method that improves the computational efficiency of an automated reinforcement-learning-based application manager having one or more processors, one or more memories, one or more communications subsystems, a set of actions that can be issued by an iterative control process to a computing environment, controlled by the automated reinforcement-learning-based application manager, having a set of metrics, the method comprising: associating tags with actions and metrics; and enhancing the iterative control process to repeatedly select and issue a next action, according to a control policy that uses the current computational-environment state and that accesses values of tags associated with actions and metrics, to control the computing environment, and receives a reward and one of an observation and a state from the computing environment, in response to execution of the issued next action, which the control process uses to attempt to learn an optimal or near-optimal control policy, over time.

13. The method of claim 12 wherein each action is represented by a set of values; wherein representations of actions are translated into commands that are directed to computational-entity components of the computing environment; and wherein both states and observations are represented by a set of values that include metric values.

14. The method of claim 13 wherein tags are represented by stored data that includes data representations of the associations of tags to actions and metrics and that includes data representations of one or more values for each tag.

15. The method of claim 14 wherein tags are associated with metrics in order to project the set of states into a smaller subset of lower-dimensional controllable states, transitions between which are controlled directly or indirectly by actions issued by the control process, in order to lower the computational complexity of learning an optimal or near-optimal control policy; and wherein the control process learns values of the lower-dimensional controllable states and/or values of controllable-state/action pairs from the received rewards as part of learning an optimal or near-optimal control policy.

16. The method of claim 14 wherein tags are associated with metrics in order to project the set of states into two smaller subsets of lower-dimension, including a first subset of lower-dimensional controllable states, transitions between which are controlled directly or indirectly by actions issued by the control process, and a second subset that is further projected int
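The projection of states onto lower-dimensional controllable states described in the claims above can be illustrated with a small Python sketch. It assumes a hypothetical `controllable` metric tag, made-up metric names, a simple bucketing step, and a tabular Q-learning update standing in for "learning values of controllable-state/action pairs"; the claims do not prescribe any of these specifics.

```python
from collections import defaultdict

# Hypothetical metric tags: only metrics tagged "controllable" survive projection.
METRIC_TAGS = {
    "cpu_load": {"controllable": True},
    "queue_depth": {"controllable": True},
    "ambient_temp": {"controllable": False},
}


def project_state(metrics, metric_tags=METRIC_TAGS, bucket=0.25):
    """Project a full metric reading onto a coarse tuple of controllable metrics.

    Metrics without the (assumed) "controllable" tag are dropped, and the
    remaining values are bucketed so that nearby readings share a state.
    """
    kept = sorted(m for m, t in metric_tags.items() if t.get("controllable"))
    return tuple(int(metrics[m] // bucket) for m in kept)


def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step over projected states (assumed learning rule)."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


# Usage: two readings that differ only in an untagged metric map to one state.
q = defaultdict(float)
s = project_state({"cpu_load": 0.8, "queue_depth": 0.3, "ambient_temp": 21.0})
s_next = project_state({"cpu_load": 0.8, "queue_depth": 0.3, "ambient_temp": 35.0})
q_update(q, s, "scale_up", 1.0, s_next, ["scale_up", "scale_down"])
```

Dropping untagged metrics shrinks the table that has to be learned, which is the kind of reduction in computational complexity the claims attribute to metric tags.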

Assignees

Vmware Inc

Inventors

Classifications

  • Probabilistic graphical models, e.g. probabilistic networks

  • Knowledge representation; Symbolic representation

  • G06N20/00 (Primary): Machine learning

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11080623B2 cover?
The current document is directed to an automated reinforcement-learning-based application manager that uses action tags and metric tags. In various implementations, actions and metrics are associated with tags. Different types of tags can contain different types of information that can be used to greatly improve the computational efficiency by which the reinforcement-learning-based application …
Who is the assignee on this patent?
Vmware Inc
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 3, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).