Synthetic training examples from advice for training autonomous agents

US11568246B2 · US · B2

Patent metadata
Publication number: US-11568246-B2
Application number: US-202016810324-A
Country: US
Kind code: B2
Filing date: Mar 5, 2020
Priority date: May 9, 2019
Publication date: Jan 31, 2023
Grant date: Jan 31, 2023

Abstract

Techniques are disclosed for training a machine learning model to perform actions within an environment. In one example, an input device receives a declarative statement. A computation engine selects, based on the declarative statement, a template that includes a template action performable within the environment. The computation engine generates, based on the template, synthetic training episodes. The computation engine further generates experiential training episodes, each experiential training episode collected by a machine learning model from past actions performed by the machine learning model. Each synthetic training episode and experiential training episode comprises an action and a reward. A machine learning system trains, with the synthetic training episodes and the experiential training episodes, the machine learning model to perform the actions within the environment.
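The selection-and-generation step in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the template library, the regular-expression matching of declarative statements, the string-valued states, and the names `TEMPLATES`, `select_template`, and `synthesize_episode` are hypothetical and do not come from the patent. It only shows the general shape of the idea: a statement selects a template, the template's placeholder variables are resolved with the statement's key elements, and the result is a list of synthetic transitions, each carrying an action and a reward.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    state: str
    action: str
    next_state: str
    reward: float


# Hypothetical template library (an assumption, not from the patent): each entry
# pairs a pattern for a declarative statement with a sequence of template actions
# written over placeholder variables such as {obj}.
TEMPLATES = {
    "avoid": {
        "pattern": re.compile(r"^avoid (?P<obj>\w+)$"),
        # Stepping onto the named object is the discouraged interaction.
        "actions": [("near_{obj}", "step_onto_{obj}", "on_{obj}", -1.0)],
    },
    "goto": {
        "pattern": re.compile(r"^go to (?P<obj>\w+)$"),
        "actions": [("start", "move_toward_{obj}", "near_{obj}", 0.0),
                    ("near_{obj}", "reach_{obj}", "at_{obj}", 1.0)],
    },
}


def select_template(statement: str):
    """Return (template, key elements) for the first template whose pattern matches."""
    text = statement.strip().lower()
    for template in TEMPLATES.values():
        match = template["pattern"].match(text)
        if match:
            return template, match.groupdict()
    raise ValueError(f"no template matches: {statement!r}")


def synthesize_episode(statement: str) -> list[Transition]:
    """Resolve the template's variables with the statement's key elements."""
    template, elements = select_template(statement)
    return [Transition(s.format(**elements), a.format(**elements),
                       n.format(**elements), r)
            for s, a, n, r in template["actions"]]
```

For example, `synthesize_episode("Avoid lava")` returns a single transition that penalizes stepping onto lava with a reward of -1.0. A real system would need far richer statement parsing than a fixed regular expression per template.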

Claims

Opening claim text (preview; ends partway through claim 15).

What is claimed is:

1. A computing system comprising: an input device configured to receive a declarative statement; a computation engine comprising processing circuitry, wherein the computation engine is configured to select, based on the declarative statement, a template that includes at least one template action that can be performed within an environment, wherein the computation engine is configured to generate, based on the template, one or more synthetic training episodes, each synthetic training episode comprising at least one action and at least one reward; and a reinforcement learning system configured to train, with the one or more synthetic training episodes, a reinforcement learning model to perform one or more actions within the environment.

2. The computing system of claim 1, wherein the declarative statement specifies one or more key elements, wherein the template further comprises one or more variables, wherein the at least one template action comprises a sequence of interactions between the one or more variables, wherein to generate, based on the template, the one or more synthetic training episodes, the computation engine is configured to: resolve the one or more variables with the one or more key elements of the declarative statement; and define the sequence of interactions with the resolved one or more variables to generate the one or more synthetic training episodes.

3. The computing system of claim 1, wherein to generate, based on the template, the one or more synthetic training episodes, the computation engine is configured to: generate, based on the template, one or more preliminary synthetic training episodes; and apply, to the one or more preliminary synthetic training episodes, saliency masking to remove extraneous information from the one or more preliminary synthetic training episodes to generate the one or more synthetic training episodes.

4. The computing system of claim 3, wherein the machine learning system is a reinforcement learning system.

5. The computing system of claim 1, wherein the reinforcement learning model is a Deep Q-Network (DQN), wherein each of the one or more synthetic training episodes comprises a tuple, and wherein to train, with the one or more synthetic training episodes, the reinforcement learning model to perform the one or more actions within the environment, the reinforcement learning system is further configured to update one or more Q-value network parameters of the DQN with one or more tuples of the one or more synthetic training episodes.

6. The computing system of claim 1, wherein the declarative statement defines at least one or one or more constraints on desirable behavior for the reinforcement learning model.

7. The computing system of claim 1, wherein the input device is configured to receive the declarative statement from a human user.

8. A computing system comprising: an input device configured to receive a declarative statement; a machine learning system comprising a machine learning model; and a computation engine comprising processing circuitry, wherein the computation engine is configured to select, based on the declarative statement, a template that includes at least one template action that can be performed within an environment, wherein the computation engine is configured to generate, based on the template, one or more synthetic training episodes, each synthetic training episode comprising at least one action and at least one reward, wherein the computation engine is further configured to generate one or more experiential training episodes, wherein each experiential training episode comprises at least one action and at least one reward and wherein each experiential training episode is collected by the machine learning model from past actions performed by the machine learning model; and wherein the machine learning system configured to train, with the one or more synthetic training episodes and the one or more experiential training episodes, a machine learning model to perform one or more actions within the environment.

9. The computing system of claim 1, wherein the one or more actions comprise at least one of: 1) a task to navigate an autonomous vehicle through the environment; 2) a task to move an avatar within an artificial reality environment; or 3) a task to configure a computer or applications.

10. The computing system of claim 8, wherein the computing system further comprises an experiential episode replay buffer configured to store each of the experiential training episodes as an experiential tuple, and wherein each experiential tuple defining the respective experiential training episode comprises a historical initial state of the environment, a historical action performed by the machine learning model, a historical resulting state of the environment, and a historical resulting reward for the machine learning model.

11. The computing system of claim 10, wherein the computing system further comprises a synthetic episode replay buffer configured to store each of the synthetic training episodes as a synthetic tuple, and wherein each synthetic tuple defining the respective synthetic training episode comprises a synthetic initial state of the environment, a synthetic action performed by the machine learning model, a synthetic resulting state of the environment, and a synthetic resulting reward for the machine learning model.

12. The computing system of claim 8, wherein to train the machine learning model to perform the one or more actions within the environment, the machine learning system is further configured to adapt between training the machine learning model with a synthetic training episode of the one or more synthetic training episodes and training the machine learning model with an experiential episode of the one or more experiential training episodes based on one or more parameters of the environment.

13. A computing system comprising: an input device configured to receive a declarative statement; a computation engine comprising processing circuitry, wherein the computation engine is configured to select, based on the declarative statement, a template that includes at least one template action that can be performed within an environment, wherein the computation engine is configured to generate, based on the template and a domain-specific action model for the environment, one or more synthetic training episodes, each synthetic training episode comprising at least one action and at least one reward; and a machine learning system configured to train, with the one or more synthetic training episodes, a machine learning model to perform one or more actions within the environment.

14. A method for training a reinforcement learning model to perform one or more actions within an environment, the method comprising: receiving, by an input device, a declarative statement; selecting, by a computation engine comprising processing circuitry and based on the declarative statement, a template that includes at least one template action that can be performed within the environment; generating, by the computation engine and based on the template, one or more synthetic training episodes, each synthetic training episode comprising at least one action and at least one reward; and training, by a reinforcement learning system and with the one or more synthetic training episodes, the reinforcement learning model to perform the one or more actions within the environment.

15. The method of claim 14, wherein the declarative statement specifies one or more key elements, wherein the temp
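Claims 5 and 10 to 12 describe storing synthetic and experiential episodes as (initial state, action, resulting state, reward) tuples in separate replay buffers and training from both. The Python sketch below illustrates that arrangement under loudly stated assumptions: a tabular Q-learning update stands in for updating a DQN's Q-value network parameters, and a fixed `synthetic_fraction` stands in for claim 12's adaptive switching, whose environment-dependent rule is not given in this excerpt. `MixedReplayQLearner` and all of its parameters are hypothetical names, not the patent's implementation.

```python
import random
from collections import defaultdict, deque

# (initial state, action, resulting state, reward), the tuple layout in claims 10-11.
Transition = tuple[str, str, str, float]


class MixedReplayQLearner:
    """Tabular Q-learning stand-in for a DQN, trained from two replay buffers."""

    def __init__(self, actions, alpha=0.5, gamma=0.9,
                 synthetic_fraction=0.5, capacity=10_000, seed=0):
        self.actions = list(actions)
        self.alpha = alpha                            # learning rate
        self.gamma = gamma                            # discount factor
        self.synthetic_fraction = synthetic_fraction  # hypothetical fixed mixing ratio
        self.q = defaultdict(float)                   # (state, action) -> Q-value, default 0.0
        self.synthetic = deque(maxlen=capacity)       # synthetic episode replay buffer
        self.experiential = deque(maxlen=capacity)    # experiential episode replay buffer
        self.rng = random.Random(seed)

    def sample(self) -> Transition:
        """Draw from the synthetic buffer with probability synthetic_fraction."""
        if not self.synthetic and not self.experiential:
            raise ValueError("both replay buffers are empty")
        # Fall back to whichever buffer is non-empty when the other is empty.
        use_synthetic = bool(self.synthetic) and (
            not self.experiential or self.rng.random() < self.synthetic_fraction)
        buffer = self.synthetic if use_synthetic else self.experiential
        return self.rng.choice(buffer)

    def update(self, transition: Transition) -> None:
        """One Q-learning step toward r + gamma * max_b Q(s', b)."""
        state, action, next_state, reward = transition
        best_next = max(self.q[(next_state, b)] for b in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

    def train(self, steps: int) -> None:
        for _ in range(steps):
            self.update(self.sample())
```

As a usage example, appending the synthetic tuple `("near_lava", "step_onto_lava", "on_lava", -1.0)` and the experiential tuple `("near_lava", "move_away", "safe", 0.1)`, then calling `train(200)`, drives the Q-value of stepping onto lava toward -1.0 and of moving away toward 0.1. The synthetic advice thus shapes values for an action the agent may never have tried itself.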

Assignees

Stanford Res Inst Int

Classifications (CPC titles)

  • G06N3/08 (primary): Learning methods

  • Architecture, e.g. interconnection topology

  • G06F40/30 (primary): Semantic analysis

  • Reinforcement learning

  • Feedforward networks

Frequently asked questions

What does patent US11568246B2 cover?
It covers training a machine learning model, such as a reinforcement learning model, to perform actions within an environment using synthetic training episodes. A declarative statement (for example, one received from a human user) is used to select a template of actions; the template is expanded into synthetic episodes, each with at least one action and one reward, which can be combined with experiential episodes the model collected from its own past actions.
Who is the assignee on this patent?
Stanford Res Inst Int
What technology area does this patent fall under?
Primary CPC classification G06N3/08. Mapped technology areas include Physics.
When was this patent published?
Published Jan 31, 2023 (kind code B2). Legal status and post-grant events are not shown here.