Method and system for continual learning in an intelligent artificial agent

US11443229B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11443229-B2
Application number  US-201816120111-A
Country             US
Kind code           B2
Filing date         Aug 31, 2018
Priority date       Aug 31, 2018
Publication date    Sep 13, 2022
Grant date          Sep 13, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method and system for teaching an artificial intelligent agent includes giving the agent several examples where it can learn to identify what is important about these example states. Once the agent has the ability to recognize a goal configuration, it can use that information to then learn how to achieve the goal states on its own. An agent may be provided with positive and negative examples to demonstrate a goal configuration. Once the agent has learned certain goal configurations, the agent can learn an option to achieve the goal configuration and a distance function that predicts at least one of a distance and a duration to the goal configuration under the learned option. This distance function prediction may be incorporated as a state feature of the agent.
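The two ideas in the abstract (recognizing a goal configuration from positive and negative examples, then exposing a distance prediction as a state feature) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the patent: the grid world, the `(x, y, carrying)` state layout, the function names, and the use of breadth-first search as a stand-in for a learned distance function.

```python
from collections import deque

def extract_key_features(positives, negatives):
    """Return {index: value} for features that are constant across all
    positive examples and differ in at least one negative example."""
    key = {}
    for i in range(len(positives[0])):
        values = {example[i] for example in positives}
        if len(values) == 1:
            value = values.pop()
            if any(neg[i] != value for neg in negatives):
                key[i] = value
    return key

def is_goal(state, key):
    """A state is in the goal configuration if it matches every key feature."""
    return all(state[i] == value for i, value in key.items())

def neighbors(state, size=4):
    """Hypothetical dynamics: move one cell in the grid; 'carrying' is unchanged."""
    x, y, carrying = state
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < size and 0 <= ny < size:
            yield (nx, ny, carrying)

def distance_to_goal(key, size=4):
    """Steps to the nearest goal state for every state. BFS is used here only
    as a stand-in for a learned distance function (moves are reversible)."""
    states = [(x, y, c) for x in range(size) for y in range(size) for c in (0, 1)]
    dist = {s: 0 for s in states if is_goal(s, key)}
    queue = deque(dist)
    while queue:
        current = queue.popleft()
        for nxt in neighbors(current, size):
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return dist

def augment(state, dist):
    """Append the distance prediction to the state as an extra feature."""
    return state + (dist[state],)

if __name__ == "__main__":
    positives = [(3, 3, 0), (3, 3, 1)]              # agent is in the goal configuration
    negatives = [(0, 0, 0), (3, 1, 1), (2, 3, 0)]   # agent is not
    key = extract_key_features(positives, negatives)
    dist = distance_to_goal(key)
    print(key, augment((0, 0, 0), dist))
```

In this sketch the `carrying` feature is discarded because it varies across the positive examples, so only the position features define the goal.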

First claim

Opening claim text (preview).

What is claimed is:

1. A method for training an artificial intelligent agent, comprising: defining, within the agent, a first continual learning block to include a first skill to achieve a first goal configuration for the agent and a first knowledge feature providing a first prediction of at least one of a distance and duration to achieve the first goal configuration; using the first skill to move the agent in the first goal configuration; defining, within the agent, a second continual learning block, including a second goal configuration, distinct from the first goal configuration, and a second knowledge feature providing a second prediction of at least one of a distance and duration to achieve the second goal configuration, wherein the second continual learning block builds upon the first continual learning block, and using the first prediction by the second continual learning block to move the agent to the second goal configuration.

2. The method of claim 1, further comprising: using features of the first goal configuration for achievement of the second goal configuration.

3. The method of claim 1, wherein the first knowledge feature is a value function based on the first goal configuration as a termination condition.

4. The method of claim 1, further comprising: providing positive examples via an interface to the agent when the agent is in the first goal configuration; providing negative examples via the interface to the agent when the agent is not in the first goal configuration; and extracting key state features to determine what features are important during receipt of positive examples to the agent.

5. The method of claim 1, further comprising incorporating the first prediction as a state feature of the agent.

6. The method of claim 1, wherein the first knowledge feature is selected from the group consisting of a distance function, a time to completion, a time to initiation of something else, and a prediction of a value of a feature at the time of completion.

7. The method of claim 1, wherein the first knowledge feature is learned, either before, in conjunction with, interleaved with, or after a policy.

8. A method of learning to achieve a goal configuration of an artificial agent, comprising: defining, within the agent, the goal configuration for the agent as part of a continual learning block; determining a knowledge feature as a prediction of at least one of a distance and duration required to achieve the goal configuration; relying on a previous learned continual learning block, having a previously learned distinct goal configuration, to move the agent in the goal configuration; determining a first knowledge feature as a first prediction of a number of steps required to achieve the goal configuration; and relying on a previous knowledge feature to achieve the goal configuration, the previous knowledge feature being a previous prediction of at least one of a distance and duration required to achieve the previous learned goal configuration.

9. The method of claim 8, wherein a previous knowledge feature is used to achieve the goal configuration, the previous knowledge feature being a previous prediction of at least one of a distance and duration required to achieve the previous learned goal configuration.

10. The method of claim 8, wherein the previous learned goal configuration is an element of a previous continual learning block.

11. The method of claim 10, wherein the previous continual learning block includes a plurality of previous continual learning blocks, each having a respective previous learned goal configuration and a respective previous knowledge feature.

12. The method of claim 11, further comprising planning ahead, by the agent, to determine how to most efficiently achieve the respective previous learned goal configurations in order to achieve the goal configuration.

13. The method of claim 8, wherein the first knowledge feature is selected from the group consisting of a distance function, a time to completion, a time to initiation of something else, and a prediction of a value of a feature at the time of completion.

14. A method of learning to achieve a goal configuration of an artificial agent, comprising: defining, within the agent, the goal configuration for the agent as part of a continual learning block; determining, within the agent, a knowledge feature as a prediction of at least one of a duration and a distance required to achieve the goal configuration, the knowledge feature being a component of the continual learning block; and relying, by the agent, on a previous learned distinct goal configuration, of a previously learned continual learning block, to move the agent in the goal configuration; wherein a previous knowledge feature is used to achieve the goal configuration, wherein the previous knowledge feature is a previous prediction of at least one of a duration and a distance required to achieve the previous learned goal configuration, and wherein the previous knowledge feature, along with the previous goal configuration, are components of a previous continual learning block.

15. The method of claim 14, wherein the previous continual learning block includes a plurality of previous continual learning blocks, each having a respective previous learned goal configuration and a respective previous knowledge feature.

16. The method of claim 15, wherein each of the plurality of the previous continual learning blocks are relied upon to achieve the goal configuration.

17. The method of claim 16, further comprising planning ahead, by the agent, to determine how to most efficiently achieve the respective previous learned goal configurations in order to achieve the goal configuration.
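The structure recited in claim 1 (a continual learning block pairing a skill with a distance prediction, and a second block that builds on the first block's prediction) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the corridor environment, the `ContinualLearningBlock` class, and all function names are hypothetical, and the skills and predictions are hand-coded here where the patent describes them as learned.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

State = Tuple[int, bool]  # (position in a corridor, whether the agent holds a key)
KEY_POS, DOOR_POS = 0, 5  # hypothetical layout: key at one end, door at the other

@dataclass
class ContinualLearningBlock:
    is_goal: Callable[[State], bool]  # goal configuration test
    predict: Callable[[State], int]   # knowledge feature: predicted steps to goal
    skill: Callable[[State], str]     # action chosen by the block's skill

def step(state: State, action: str) -> State:
    """Hypothetical deterministic corridor dynamics."""
    pos, has_key = state
    if action == "left":
        return (max(pos - 1, 0), has_key)
    if action == "right":
        return (min(pos + 1, DOOR_POS), has_key)
    if action == "pickup" and pos == KEY_POS:
        return (pos, True)
    return state

# First block: goal configuration is "holding the key".
block1 = ContinualLearningBlock(
    is_goal=lambda s: s[1],
    predict=lambda s: 0 if s[1] else abs(s[0] - KEY_POS) + 1,
    skill=lambda s: "pickup" if s[0] == KEY_POS else "left",
)

def predict2(state: State) -> int:
    """Second prediction, built from the first block's prediction."""
    d1 = block1.predict(state)
    start = state[0] if d1 == 0 else KEY_POS  # where the agent is once it has the key
    return d1 + (DOOR_POS - start)

def skill2(state: State) -> str:
    """Use the first prediction to decide whether to run the first skill."""
    return block1.skill(state) if block1.predict(state) > 0 else "right"

# Second block: goal configuration is "at the door while holding the key".
block2 = ContinualLearningBlock(
    is_goal=lambda s: s[1] and s[0] == DOOR_POS,
    predict=predict2,
    skill=skill2,
)

def run(block: ContinualLearningBlock, state: State, max_steps: int = 50):
    """Follow a block's skill until its goal configuration is reached."""
    steps = 0
    while not block.is_goal(state) and steps < max_steps:
        state = step(state, block.skill(state))
        steps += 1
    return state, steps

if __name__ == "__main__":
    start = (2, False)
    print(block2.predict(start), run(block2, start))
```

In this toy setting the second block's prediction matches the number of steps its skill actually takes, because both are derived from the first block's prediction; a learned system would only approximate this.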

Assignees

Sony Corp, Sony Corp America, Sony Group Corp

Inventors

Classifications

  • based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO] · CPC title

  • Non-supervised learning, e.g. competitive learning · CPC title

  • G06N20/20 · Primary

    Ensemble learning · CPC title

  • Distributed expert systems; Blackboards · CPC title

  • G06N20/00 · Primary

    Machine learning · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11443229B2 cover?
A method and system for teaching an artificial intelligent agent includes giving the agent several examples where it can learn to identify what is important about these example states. Once the agent has the ability to recognize a goal configuration, it can use that information to then learn how to achieve the goal states on its own. An agent may be provided with positive and negative examples …
Who is the assignee on this patent?
Sony Corp, Sony Corp America, Sony Group Corp
What technology area does this patent fall under?
Primary CPC classification G06N20/20. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 13, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).