Method and system for an intelligent artificial agent

US12354027B2 · US · B2

Patent metadata
Publication number: US-12354027-B2
Application number: US-201815943947-A
Country: US
Kind code: B2
Filing date: Apr 3, 2018
Priority date: Apr 3, 2018
Publication date: Jul 8, 2025
Grant date: Jul 8, 2025

Abstract

A method and system for teaching an artificial intelligent agent in which the agent can be placed in a state that it should learn how to achieve. Given several examples, the agent can learn to identify what is important about these example states. Once the agent can recognize a goal configuration, it can use that information to learn how to achieve the goal states on its own. An agent may be provided with positive and negative examples to demonstrate a goal configuration. Once the agent has learned certain goal configurations, it can learn policies and skills that achieve them. The agent may create a collection of these policies and skills from which to select based on a particular command or state.
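The abstract does not say how the agent decides "what is important" about the example states. A minimal sketch of one possible approach, assuming each state is a dict of categorical features named after the categories in claim 1 (the feature names, `learn_goal`, `goal_score`, `is_goal`, and the 0.9 threshold are hypothetical, not from the patent): a feature gets weight when the positive examples agree on one value and the negative examples mostly do not, and a state is recognized as the goal when its importance-weighted match score clears the threshold.

```python
from collections import Counter

# Hypothetical feature names, loosely following the categories in claim 1
# (room feature, object positioning, ambient lighting, ambient sounds).
FEATURES = ("room", "object_position", "lighting", "sound")


def learn_goal(positives, negatives):
    """Return {feature: (goal_value, weight)} learned from labeled example states.

    weight = (share of positives with the most common value)
           - (share of negatives with that same value), clipped at 0.
    """
    model = {}
    for f in FEATURES:
        value, count = Counter(s[f] for s in positives).most_common(1)[0]
        pos_share = count / len(positives)
        neg_share = (sum(s[f] == value for s in negatives) / len(negatives)
                     if negatives else 0.0)
        model[f] = (value, max(0.0, pos_share - neg_share))
    return model


def goal_score(model, state):
    """Importance-weighted fraction of features whose value matches the goal."""
    total = sum(w for _, w in model.values())
    if total == 0:
        return 0.0
    return sum(w for f, (v, w) in model.items() if state[f] == v) / total


def is_goal(model, state, threshold=0.9):
    """Recognize the goal when the weighted match score reaches the threshold."""
    return goal_score(model, state) >= threshold


positives = [
    {"room": "kitchen", "object_position": "cup_on_table", "lighting": "bright", "sound": "quiet"},
    {"room": "kitchen", "object_position": "cup_on_table", "lighting": "dim", "sound": "music"},
    {"room": "kitchen", "object_position": "cup_on_table", "lighting": "bright", "sound": "quiet"},
]
negatives = [
    {"room": "kitchen", "object_position": "cup_on_floor", "lighting": "bright", "sound": "quiet"},
    {"room": "hallway", "object_position": "cup_on_table", "lighting": "bright", "sound": "quiet"},
]

model = learn_goal(positives, negatives)
print(model)
# {'room': ('kitchen', 0.5), 'object_position': ('cup_on_table', 0.5),
#  'lighting': ('bright', 0.0), 'sound': ('quiet', 0.0)}
```

On this toy data, lighting and sound end up with weight 0 because the negative examples share the positives' usual values, so only the room and the cup position decide whether a state is recognized as the goal.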

Claims

What is claimed is:

1. A method for training an artificial intelligent agent to recognize a goal configuration, comprising: placing the agent in the goal configuration and identifying a resulting state as a positive example; providing negative examples to the agent that demonstrate the agent in a state failing to achieve the goal configuration; extracting key state features when the agent is in the goal configuration, the key state features including at least one of a room feature, object positioning, ambient lighting, and ambient sounds; determining what feature categories are important in the goal configuration during receipt of positive examples to the agent; learning and recognizing, by the agent, the goal configuration based on the extracted key state features and the determined important feature categories; creating policies, by the agent, based on the learned goal configuration; converting state features into a distance function to determine how far the agent is from the goal configuration; using goal detection as a final reward; and using a goal distance as an intermediate reward.

2. The method of claim 1, wherein an interface is used to indicate an example as being either the positive example or the negative example, the interface includes at least one of a spoken word received by the agent, an electronic signal received from a computing device, and a physical button on the agent.

3. The method of claim 1, wherein the step of extracting key state features includes looking for similarity in state features in each of the positive and negative examples.

4. The method of claim 3, further comprising increasing a confidence of the agent as the positive and negative examples are received by the agent.

5. The method of claim 4, wherein the agent takes an action upon reaching a predetermined level of confidence.

6. The method of claim 1, wherein the key state features are weighted according to a predetermined weight value.

7. The method of claim 1, further comprising asking, by the agent, for human feedback regarding whether the agent is in a goal state.

8. A system comprising a processor and a computer-usable medium embodying a computer program code, the computer program code comprising instructions executable by the processor and configured to provide a method of learning to recognize a goal configuration of an artificial agent, the method comprising: placing the agent in the goal configuration and identifying a resulting state as a positive example; providing negative examples to the agent that demonstrate the agent in a state failing to achieve the goal configuration; extracting key state features when the agent is in the goal configuration, the key state features including at least one of a room feature, object positioning, ambient lighting, and ambient sounds; determining what feature categories are important in the goal configuration during receipt of positive examples to the agent; learning and recognizing, by the agent, the goal configuration based on the extracted key state features and the determined important feature categories; creating policies, by the agent, based on the learned goal configuration; converting state features into a distance function to determine how far the agent is from the goal configuration; using the distance function as an intermediate reward for the agent; and using goal detection as a final reward.

9. The system of claim 8, wherein the method further comprises recognizing whether the agent is in an initialization state.

10. The system of claim 8, wherein the method further comprises self-practice by the agent.

11. The system of claim 10, wherein a selected goal configuration for self-practice is selected based on at least one of a random determination, which goal configuration needs the most improvement, which goal configuration is most likely to improve, which goal configuration has been used least recently, and which goal configuration is most used.

12. The system of claim 10, wherein the method further comprises biasing action choices based on which actions are important to achieving the goal configuration.

13. The system of claim 8, wherein the method further comprises updating a policy for achieving a goal configuration based on performance of the agent.

14. A non-transitory computer-readable storage medium with an executable program stored thereon, wherein the program instructs one or more processors to perform the following steps to cause an agent to recognize and learn a goal configuration: placing the agent in the goal configuration and identifying a resulting state as a positive example; providing negative examples to the agent that demonstrate the agent in a state failing to achieve the goal configuration; extracting key state features when the agent is in the goal configuration, the key state features including at least one of a room feature, object positioning, ambient lighting, and ambient sounds; determining what feature categories are important in the goal configuration during receipt of positive examples to the agent; learning and recognizing, by the agent, the goal configuration based on the extracted key state features and the determined important feature categories; creating policies, by the agent, based on the learned goal configuration; converting state features into a distance function to determine how far the agent is from the goal configuration; using the distance function as an intermediate reward for the agent; and using goal detection as a final reward.

15. The non-transitory computer-readable storage medium of claim 14, wherein the step of extracting key state features includes looking for similarity of the key state features in each of the positive and negative examples.

16. The non-transitory computer-readable storage medium of claim 14, wherein the program instructs one or more processors to perform the following steps: increasing a confidence of the agent as additional ones of the positive and negative examples are received by the agent; and taking an action by the agent upon reaching a predetermined level of confidence.
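Claims 1, 8 and 14 pair a goal distance (intermediate reward) with goal detection (final reward). A minimal sketch of how those two signals could be combined into a reward function, assuming the learned goal is a mapping from feature name to (target value, importance weight). The names `GOAL`, `goal_distance`, and `shaped_reward`, the distance as the weighted fraction of unmatched features, the progress-based intermediate reward, and the 0.1 detection threshold are all illustrative assumptions, not the patented formulation.

```python
# Hypothetical learned goal: feature name -> (target value, importance weight).
GOAL = {
    "room": ("kitchen", 0.5),
    "object_position": ("cup_on_table", 0.5),
    "lighting": ("bright", 0.0),
}


def goal_distance(goal, state):
    """Weighted fraction of important features not yet matched (0.0 = at goal)."""
    total = sum(w for _, w in goal.values())
    if total == 0:
        return 1.0  # nothing learned yet: treat every state as maximally far
    return sum(w for f, (v, w) in goal.items() if state.get(f) != v) / total


def shaped_reward(goal, prev_state, state, done_threshold=0.1, final_reward=1.0):
    """Return (reward, episode_done) for one transition prev_state -> state."""
    dist = goal_distance(goal, state)
    if dist <= done_threshold:
        # Goal detected: pay the final reward and end the episode.
        return final_reward, True
    # Otherwise pay an intermediate reward for reducing the goal distance.
    return goal_distance(goal, prev_state) - dist, False


s0 = {"room": "hallway", "object_position": "cup_on_floor", "lighting": "dim"}
s1 = {"room": "kitchen", "object_position": "cup_on_floor", "lighting": "dim"}
s2 = {"room": "kitchen", "object_position": "cup_on_table", "lighting": "dim"}

print(shaped_reward(GOAL, s0, s1))  # (0.5, False): moved halfway closer
print(shaped_reward(GOAL, s1, s2))  # (1.0, True): goal detected
```

Rewarding the change in distance, rather than the raw distance, is one common shaping choice: it pays for progress and penalizes moving away, so the agent cannot collect reward just by lingering near the goal. A more literal reading of claim 1 would use the negative distance itself as the intermediate reward.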

Assignees

Sony Corp, Sony Corp America, Sony Group Corp

Classifications

  • Machine learning

  • G06N5/043 (primary): Distributed expert systems; Blackboards

  • G06N3/006 (primary): Based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Frequently asked questions

Who is the assignee on this patent?
Sony Corp, Sony Corp America, Sony Group Corp
What technology area does this patent fall under?
Primary CPC classification G06N5/043. Mapped technology areas include Physics.
When was this patent published?
Published Jul 8, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.