Data-efficient hierarchical reinforcement learning
US-2021187733-A1 · Jun 24, 2021 · US
US11443229B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11443229-B2 |
| Application number | US-201816120111-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 31, 2018 |
| Priority date | Aug 31, 2018 |
| Publication date | Sep 13, 2022 |
| Grant date | Sep 13, 2022 |
Official abstract text for this publication.
A method and system for teaching an artificial intelligent agent includes giving the agent several examples where it can learn to identify what is important about these example states. Once the agent has the ability to recognize a goal configuration, it can use that information to then learn how to achieve the goal states on its own. An agent may be provided with positive and negative examples to demonstrate a goal configuration. Once the agent has learned certain goal configurations, the agent can learn an option to achieve the goal configuration and a distance function that predicts at least one of a distance and a duration to the goal configuration under the learned option. This distance function prediction may be incorporated as a state feature of the agent.
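The abstract's three steps (recognize a goal from positive and negative examples, predict the distance to it under an option, and expose that prediction as a state feature) can be illustrated with a small sketch. It is not the patent's implementation. The corridor world, the feature names `pos` and `carrying`, the "always step right" option, and every function name are hypothetical. The recognizer simply keeps the features that are constant across all positive examples, and the distance function is learned with a tabular TD(0) update that counts one step per move and terminates at the goal.

```python
from typing import Dict, Hashable, List

State = Dict[str, Hashable]


def is_goal(state: State, key: State) -> bool:
    """A state is a goal if it matches every key feature value."""
    return all(state.get(f) == v for f, v in key.items())


def learn_goal_recognizer(positives: List[State], negatives: List[State]) -> State:
    """Keep the features whose value is identical in every positive example.

    Assumption: this stands in for "extracting key state features";
    a real agent would likely use a learned classifier instead.
    """
    first = positives[0]
    key = {f: v for f, v in first.items()
           if all(p.get(f) == v for p in positives)}
    # The learned configuration must reject every negative example.
    for n in negatives:
        if is_goal(n, key):
            raise ValueError("examples do not separate goal from non-goal states")
    return key


def learn_distance(n_states: int, key: State,
                   episodes: int = 200, alpha: float = 0.5) -> List[float]:
    """TD(0) estimate of steps-to-goal under a fixed option (step right).

    Hypothetical 1-D corridor: states are positions 0..n_states-1.
    """
    dist = [0.0] * n_states
    for _ in range(episodes):
        pos = 0
        while pos < n_states - 1 and not is_goal({"pos": pos}, key):
            nxt = pos + 1  # the option's policy: always move right
            # Goal states terminate the option, so they contribute 0.
            tail = 0.0 if is_goal({"pos": nxt}, key) else dist[nxt]
            dist[pos] += alpha * (1.0 + tail - dist[pos])
            pos = nxt
    return dist


# Hypothetical demonstrations: the goal is "be at position 5".
positives = [{"pos": 5, "carrying": True}, {"pos": 5, "carrying": False}]
negatives = [{"pos": 0, "carrying": True}, {"pos": 3, "carrying": False}]

goal_key = learn_goal_recognizer(positives, negatives)
dist = learn_distance(n_states=6, key=goal_key)

# Incorporate the prediction as an extra state feature.
state = {"pos": 2, "carrying": False}
augmented = {**state, "steps_to_goal": dist[state["pos"]]}
print(goal_key, [round(d, 3) for d in dist], augmented)
```

Because `carrying` differs between the two positive examples, it is dropped and only `pos` survives as a key feature. The learned table can then be read like any other feature of the state, which is what allows a later skill to build on it.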
Opening claim text (preview).
What is claimed is:
1. A method for training an artificial intelligent agent, comprising: defining, within the agent, a first continual learning block to include a first skill to achieve a first goal configuration for the agent and a first knowledge feature providing a first prediction of at least one of a distance and duration to achieve the first goal configuration; using the first skill to move the agent in the first goal configuration; defining, within the agent, a second continual learning block, including a second goal configuration, distinct from the first goal configuration, and a second knowledge feature providing a second prediction of at least one of a distance and duration to achieve the second goal configuration, wherein the second continual learning block builds upon the first continual learning block, and using the first prediction by the second continual learning block to move the agent to the second goal configuration.
2. The method of claim 1, further comprising: using features of the first goal configuration for achievement of the second goal configuration.
3. The method of claim 1, wherein the first knowledge feature is a value function based on the first goal configuration as a termination condition.
4. The method of claim 1, further comprising: providing positive examples via an interface to the agent when the agent is in the first goal configuration; providing negative examples via the interface to the agent when the agent is not in the first goal configuration; and extracting key state features to determine what features are important during receipt of positive examples to the agent.
5. The method of claim 1, further comprising incorporating the first prediction as a state feature of the agent.
6. The method of claim 1, wherein the first knowledge feature is selected from the group consisting of a distance function, a time to completion, a time to initiation of something else, and a prediction of a value of a feature at the time of completion.
7. The method of claim 1, wherein the first knowledge feature is learned, either before, in conjunction with, interleaved with, or after a policy.
8. A method of learning to achieve a goal configuration of an artificial agent, comprising: defining, within the agent, the goal configuration for the agent as part of a continual learning block; determining a knowledge feature as a prediction of at least one of a distance and duration required to achieve the goal configuration; relying on a previous learned continual learning block, having a previously learned distinct goal configuration, to move the agent in the goal configuration; determining a first knowledge feature as a first prediction of a number of steps required to achieve the goal configuration; and relying on a previous knowledge feature to achieve the goal configuration, the previous knowledge feature being a previous prediction of at least one of a distance and duration required to achieve the previous learned goal configuration.
9. The method of claim 8, wherein a previous knowledge feature is used to achieve the goal configuration, the previous knowledge feature being a previous prediction of at least one of a distance and duration required to achieve the previous learned goal configuration.
10. The method of claim 8, wherein the previous learned goal configuration is an element of a previous continual learning block.
11. The method of claim 10, wherein the previous continual learning block includes a plurality of previous continual learning blocks, each having a respective previous learned goal configuration and a respective previous knowledge feature.
12. The method of claim 11, further comprising planning ahead, by the agent, to determine how to most efficiently achieve the respective previous learned goal configurations in order to achieve the goal configuration.
13. The method of claim 8, wherein the first knowledge feature is selected from the group consisting of a distance function, a time to completion, a time to initiation of something else, and a prediction of a value of a feature at the time of completion.
14. A method of learning to achieve a goal configuration of an artificial agent, comprising: defining, within the agent, the goal configuration for the agent as part of a continual learning block; determining, within the agent, a knowledge feature as a prediction of at least one of a duration and a distance required to achieve the goal configuration, the knowledge feature being a component of the continual learning block; and relying, by the agent, on a previous learned distinct goal configuration, of a previously learned continual learning block, to move the agent in the goal configuration; wherein a previous knowledge feature is used to achieve the goal configuration, wherein the previous knowledge feature is a previous prediction of at least one of a duration and a distance required to achieve the previous learned goal configuration, and wherein the previous knowledge feature, along with the previous goal configuration, are components of a previous continual learning block.
15. The method of claim 14, wherein the previous continual learning block includes a plurality of previous continual learning blocks, each having a respective previous learned goal configuration and a respective previous knowledge feature.
16. The method of claim 15, wherein each of the plurality of the previous continual learning blocks are relied upon to achieve the goal configuration.
17. The method of claim 16, further comprising planning ahead, by the agent, to determine how to most efficiently achieve the respective previous learned goal configurations in order to achieve the goal configuration.
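One possible reading of how continual learning blocks stack, and of the "planning ahead" step in claims 12 and 17, is sketched below. It is an illustration under stated assumptions rather than the claimed method. The `ContinualLearningBlock` class, the grid positions, and the block names are hypothetical. Manhattan distance stands in for each block's learned knowledge feature. "Planning" is taken to mean choosing the visiting order of previously learned goals that minimizes the summed step predictions.

```python
from dataclasses import dataclass, field
from itertools import permutations
from typing import Callable, List, Sequence, Tuple

Pos = Tuple[int, int]


@dataclass
class ContinualLearningBlock:
    """Hypothetical container: a goal plus a knowledge feature."""
    name: str
    goal: Pos
    predict_steps: Callable[[Pos], float]  # knowledge feature
    builds_on: List["ContinualLearningBlock"] = field(default_factory=list)

    def features(self, pos: Pos) -> List[float]:
        """Raw position extended with predictions from earlier blocks."""
        base = [float(pos[0]), float(pos[1])]
        return base + [b.predict_steps(pos) for b in self.builds_on]


def manhattan_to(goal: Pos) -> Callable[[Pos], float]:
    # Assumption: a stand-in for a learned distance/duration prediction.
    return lambda pos: float(abs(pos[0] - goal[0]) + abs(pos[1] - goal[1]))


def make_block(name: str, goal: Pos,
               builds_on: Sequence[ContinualLearningBlock] = ()) -> ContinualLearningBlock:
    return ContinualLearningBlock(name, goal, manhattan_to(goal), list(builds_on))


def plan_order(start: Pos, block: ContinualLearningBlock) -> Tuple[List[str], float]:
    """Try every order of earlier goals; keep the cheapest predicted route."""
    best_cost, best_order = float("inf"), []
    for order in permutations(block.builds_on):
        pos, cost = start, 0.0
        for prev in order:
            cost += prev.predict_steps(pos)  # reuse the earlier prediction
            pos = prev.goal                  # assume the earlier skill succeeds
        cost += block.predict_steps(pos)     # final leg to the new goal
        if cost < best_cost:
            best_cost, best_order = cost, [b.name for b in order]
    return best_order, best_cost


# Two previously learned blocks and a new block that builds on both.
reach_key = make_block("reach_key", (0, 3))
reach_switch = make_block("reach_switch", (2, 0))
reach_exit = make_block("reach_exit", (4, 4), builds_on=[reach_key, reach_switch])

order, cost = plan_order((0, 0), reach_exit)
print(order, cost)
print(reach_exit.features((0, 0)))
```

Exhaustive search over orderings is only practical for a handful of blocks. It is used here because it keeps the sketch short and makes the reuse of earlier predictions explicit.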
based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO] · CPC title
Non-supervised learning, e.g. competitive learning · CPC title
Ensemble learning · CPC title
Distributed expert systems; Blackboards · CPC title
Machine learning · CPC title