Training reinforcement machine learning systems
US-2021334696-A1 · Oct 28, 2021 · US
US11568246B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11568246-B2 |
| Application number | US-202016810324-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 5, 2020 |
| Priority date | May 9, 2019 |
| Publication date | Jan 31, 2023 |
| Grant date | Jan 31, 2023 |
Official abstract text for this publication.
Techniques are disclosed for training a machine learning model to perform actions within an environment. In one example, an input device receives a declarative statement. A computation engine selects, based on the declarative statement, a template that includes a template action performable within the environment. The computation engine generates, based on the template, synthetic training episodes. The computation engine further generates experiential training episodes, each experiential training episode collected by a machine learning model from past actions performed by the machine learning model. Each synthetic training episode and experiential training episode comprises an action and a reward. A machine learning system trains, with the synthetic training episodes and the experiential training episodes, the machine learning model to perform the actions within the environment.
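The flow in the abstract (declarative statement, template selection, synthetic episode generation) can be illustrated with a toy grid world. This is a minimal sketch under stated assumptions, not the patent's implementation: the `Template` class, the regex-based matching, the `(state, action, next_state, reward)` tuple layout, the reward values, and the "avoid lava" statement are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple

Cell = Tuple[int, int]
# Assumed episode layout: (state, action, next_state, reward).
Transition = Tuple[Cell, str, Cell, float]

# Hypothetical action set for a 2-D grid; y grows downward.
MOVES: Dict[str, Cell] = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


@dataclass(frozen=True)
class Template:
    pattern: str   # regex whose named group `obj` is the template variable
    reward: float  # reward attached to stepping onto the resolved object


# Hypothetical template library.
TEMPLATES = [
    Template(pattern=r"avoid (?P<obj>\w+)", reward=-1.0),
    Template(pattern=r"collect (?P<obj>\w+)", reward=1.0),
]


def select_template(statement: str) -> Tuple[Template, str]:
    """Pick the first template matching the statement and resolve its variable."""
    for template in TEMPLATES:
        match = re.fullmatch(template.pattern, statement.strip().lower())
        if match:
            return template, match.group("obj")
    raise ValueError(f"no template matches {statement!r}")


def generate_synthetic_episodes(
    statement: str, objects: Dict[str, List[Cell]]
) -> List[Transition]:
    """Emit one transition per (object cell, move) pair that ends on the object."""
    template, obj = select_template(statement)
    episodes: List[Transition] = []
    for target in objects.get(obj, []):
        for action, (dx, dy) in MOVES.items():
            # Cell from which `action` lands on `target` (no bounds check here).
            start = (target[0] - dx, target[1] - dy)
            episodes.append((start, action, target, template.reward))
    return episodes


episodes = generate_synthetic_episodes("avoid lava", {"lava": [(2, 2)]})
```

Each generated tuple pairs an action with a reward, as the abstract requires of every training episode. A real system would presumably need a richer state representation and a domain-specific action model rather than fixed grid moves.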
Opening claim text (preview).
What is claimed is:

1. A computing system comprising: an input device configured to receive a declarative statement; a computation engine comprising processing circuitry, wherein the computation engine is configured to select, based on the declarative statement, a template that includes at least one template action that can be performed within an environment, wherein the computation engine is configured to generate, based on the template, one or more synthetic training episodes, each synthetic training episode comprising at least one action and at least one reward; and a reinforcement learning system configured to train, with the one or more synthetic training episodes, a reinforcement learning model to perform one or more actions within the environment.

2. The computing system of claim 1, wherein the declarative statement specifies one or more key elements, wherein the template further comprises one or more variables, wherein the at least one template action comprises a sequence of interactions between the one or more variables, wherein to generate, based on the template, the one or more synthetic training episodes, the computation engine is configured to: resolve the one or more variables with the one or more key elements of the declarative statement; and define the sequence of interactions with the resolved one or more variables to generate the one or more synthetic training episodes.

3. The computing system of claim 1, wherein to generate, based on the template, the one or more synthetic training episodes, the computation engine is configured to: generate, based on the template, one or more preliminary synthetic training episodes; and apply, to the one or more preliminary synthetic training episodes, saliency masking to remove extraneous information from the one or more preliminary synthetic training episodes to generate the one or more synthetic training episodes.

4. The computing system of claim 3, wherein the machine learning system is a reinforcement learning system.

5. The computing system of claim 1, wherein the reinforcement learning model is a Deep Q-Network (DQN), wherein each of the one or more synthetic training episodes comprises a tuple, and wherein to train, with the one or more synthetic training episodes, the reinforcement learning model to perform the one or more actions within the environment, the reinforcement learning system is further configured to update one or more Q-value network parameters of the DQN with one or more tuples of the one or more synthetic training episodes.

6. The computing system of claim 1, wherein the declarative statement defines at least one or one or more constraints on desirable behavior for the reinforcement learning model.

7. The computing system of claim 1, wherein the input device is configured to receive the declarative statement from a human user.

8. A computing system comprising: an input device configured to receive a declarative statement; a machine learning system comprising a machine learning model; and a computation engine comprising processing circuitry, wherein the computation engine is configured to select, based on the declarative statement, a template that includes at least one template action that can be performed within an environment, wherein the computation engine is configured to generate, based on the template, one or more synthetic training episodes, each synthetic training episode comprising at least one action and at least one reward, wherein the computation engine is further configured to generate one or more experiential training episodes, wherein each experiential training episode comprises at least one action and at least one reward and wherein each experiential training episode is collected by the machine learning model from past actions performed by the machine learning model; and wherein the machine learning system configured to train, with the one or more synthetic training episodes and the one or more experiential training episodes, a machine learning model to perform one or more actions within the environment.

9. The computing system of claim 1, wherein the one or more actions comprise at least one of: 1) a task to navigate an autonomous vehicle through the environment; 2) a task to move an avatar within an artificial reality environment; or 3) a task to configure a computer or applications.

10. The computing system of claim 8, wherein the computing system further comprises an experiential episode replay buffer configured to store each of the experiential training episodes as an experiential tuple, and wherein each experiential tuple defining the respective experiential training episode comprises a historical initial state of the environment, a historical action performed by the machine learning model, a historical resulting state of the environment, and a historical resulting reward for the machine learning model.

11. The computing system of claim 10, wherein the computing system further comprises a synthetic episode replay buffer configured to store each of the synthetic training episodes as a synthetic tuple, and wherein each synthetic tuple defining the respective synthetic training episode comprises a synthetic initial state of the environment, a synthetic action performed by the machine learning model, a synthetic resulting state of the environment, and a synthetic resulting reward for the machine learning model.

12. The computing system of claim 8, wherein to train the machine learning model to perform the one or more actions within the environment, the machine learning system is further configured to adapt between training the machine learning model with a synthetic training episode of the one or more synthetic training episodes and training the machine learning model with an experiential episode of the one or more experiential training episodes based on one or more parameters of the environment.

13. A computing system comprising: an input device configured to receive a declarative statement; a computation engine comprising processing circuitry, wherein the computation engine is configured to select, based on the declarative statement, a template that includes at least one template action that can be performed within an environment, wherein the computation engine is configured to generate, based on the template and a domain-specific action model for the environment, one or more synthetic training episodes, each synthetic training episode comprising at least one action and at least one reward; and a machine learning system configured to train, with the one or more synthetic training episodes, a machine learning model to perform one or more actions within the environment.

14. A method for training a reinforcement learning model to perform one or more actions within an environment, the method comprising: receiving, by an input device, a declarative statement; selecting, by a computation engine comprising processing circuitry and based on the declarative statement, a template that includes at least one template action that can be performed within the environment; generating, by the computation engine and based on the template, one or more synthetic training episodes, each synthetic training episode comprising at least one action and at least one reward; and training, by a reinforcement learning system and with the one or more synthetic training episodes, the reinforcement learning model to perform the one or more actions within the environment.

15. The method of claim 14, wherein the declarative statement specifies one or more key elements, wherein the temp
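The separate experiential and synthetic replay buffers of claims 10 and 11, and training on tuples drawn from both, can be sketched as follows. This is a hedged illustration only: the `ReplayBuffer` class, the `sample_mixed` helper, and the `synthetic_fraction` parameter are hypothetical names, and a tabular Q-learning update is swapped in for the DQN parameter update of claim 5 to keep the example dependency-free. The claim preview does not say how the mix between the two sources is adapted, so the fraction here is simply supplied by the caller.

```python
import random
from collections import defaultdict, deque
from typing import DefaultDict, Deque, Hashable, List, Sequence, Tuple

# Tuple layout from claims 10 and 11: initial state, action, resulting state, reward.
Transition = Tuple[Hashable, str, Hashable, float]


class ReplayBuffer:
    """Fixed-capacity store of transitions; the oldest entries are dropped first."""

    def __init__(self, capacity: int) -> None:
        self._items: Deque[Transition] = deque(maxlen=capacity)

    def add(self, transition: Transition) -> None:
        self._items.append(transition)

    def sample(self, k: int, rng: random.Random) -> List[Transition]:
        # Never ask for more items than the buffer holds.
        return rng.sample(list(self._items), min(k, len(self._items)))


def sample_mixed(
    synthetic: ReplayBuffer,
    experiential: ReplayBuffer,
    batch_size: int,
    synthetic_fraction: float,
    rng: random.Random,
) -> List[Transition]:
    """Draw a batch with roughly `synthetic_fraction` of it from the synthetic buffer."""
    n_synthetic = round(batch_size * synthetic_fraction)
    return synthetic.sample(n_synthetic, rng) + experiential.sample(
        batch_size - n_synthetic, rng
    )


def q_update(
    q: DefaultDict[Tuple[Hashable, str], float],
    batch: Sequence[Transition],
    actions: Sequence[str],
    alpha: float = 0.5,
    gamma: float = 0.9,
) -> None:
    """Tabular Q-learning step, used here as a stand-in for a DQN update."""
    for state, action, next_state, reward in batch:
        best_next = max(q[(next_state, a)] for a in actions)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


rng = random.Random(0)
synthetic, experiential = ReplayBuffer(100), ReplayBuffer(100)
synthetic.add(((2, 3), "up", (2, 2), -1.0))        # e.g. derived from advice
experiential.add(((0, 0), "right", (1, 0), 0.0))   # e.g. observed by the agent

q: DefaultDict[Tuple[Hashable, str], float] = defaultdict(float)
batch = sample_mixed(synthetic, experiential, batch_size=2, synthetic_fraction=0.5, rng=rng)
q_update(q, batch, actions=["up", "down", "left", "right"])
```

Keeping the two sources in separate buffers lets the training loop change `synthetic_fraction` over time, which is one possible reading of the adaptation described in claim 12.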