US12005580B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12005580-B2 |
| Application number | US-202017436020-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 5, 2020 |
| Priority date | Mar 18, 2019 |
| Publication date | Jun 11, 2024 |
| Grant date | Jun 11, 2024 |
Official abstract text for this publication.
A computer-implemented method for applying control to a robot, and apparatus therefor. A parametric model of an environment, in particular a deep neural network, is trained in accordance with a method for training the parametric model of the environment. The model is trained depending on a controlled system. A strategy is learned in accordance with a method for model-based learning of the strategy. Control is applied to the robot depending on the parametric model and on the strategy.
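The three parts named in the abstract (a parametric environment model, a strategy, and control applied depending on both) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `model_step`, `strategy`, and `rollout`, the scalar parameter `theta`, and the one-dimensional linear transition that stands in for the deep neural network described in the patent.

```python
import random


def model_step(state, action, theta, rng):
    """Hypothetical stand-in for the parametric environment model.

    The patent describes a deep neural network; a one-parameter linear
    transition with small Gaussian noise keeps this sketch short.
    """
    return state + theta * action + rng.gauss(0.0, 0.01)


def strategy(state, target, gain=0.5):
    """Hypothetical proportional strategy: act to move toward the target."""
    return gain * (target - state)


def rollout(initial_state, target, theta, steps, seed=0):
    """Roll the strategy out on the model.

    Returns a model trajectory as a list of
    (state, action, new_state) data points.
    """
    rng = random.Random(seed)
    state = initial_state
    trajectory = []
    for _ in range(steps):
        action = strategy(state, target)
        new_state = model_step(state, action, theta, rng)
        trajectory.append((state, action, new_state))
        state = new_state
    return trajectory


trajectory = rollout(initial_state=0.0, target=1.0, theta=1.0, steps=30)
final_state = trajectory[-1][2]
```

In this toy setting the final state ends up close to the target because the strategy halves the remaining error at every step. On a real robot the action would be sent to the controlled system rather than back into the model.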
Opening claim text (preview).
The invention claimed is:
1. A computer-implemented method for training a parametric model of an environment, the model being a deep neural network, the method comprising the following steps: providing the model, the model being configured to determine a new model state depending on a model state, on an action, and on at least one parameter of the model; determining an expert trajectory depending on a demonstration, wherein an expert action, which specifies an expert in an environment state in accordance with an expert strategy, is acquired, the environment is converted by the expert action, with a probability, into a new environment state, and the environment state, the expert action, and the new environment state are determined as a data point of the expert trajectory; wherein the parameter of the model is determined depending on a reward, the reward being determined depending on the expert trajectory and on a model trajectory determined in accordance with a strategy depending on the model state; wherein the model encompasses a controlled system, and is trained depending on the controlled system, at least one state variable or manipulated variable for applying control to the controlled system is determined depending on the model and depending on at least one acquired actual variable or observed state variable of the controlled system.
2. The method as recited in claim 1, wherein a discriminator determines the reward depending on the expert trajectory and on the model trajectory, and wherein at least one parameter of the discriminator is determined with a gradient descent method depending on the expert trajectory and on the model trajectory.
3. The method as recited in claim 1, wherein the at least one parameter of the model is learned, depending on the reward, with an episode-based policy search or with a policy gradient method.
4. The method as recited in claim 3, wherein the at least one parameter of the model is learned with REINFORCE or TRPO.
5. The method as recited in claim 1, wherein the reward is determined depending on a true expected value for a system dynamic of the environment and depending on a modeled expected value for the model.
6. The method as recited in claim 1, wherein the action specified depending on a strategy is acquired in the model state, the model is converted by the action, with a probability, into the new model state, the reward being determined depending on the model state, on the action, and on the new model state.
7. The method as recited in claim 1, wherein the action is determined using an agent, depending on the model state of the model, in accordance with the strategy, the reward being determined depending on the strategy, on the action, or on a new model state, the strategy being learned, depending on the reward, in a reinforcement learning process.
8. A computer-implemented method for applying control to a robot, the method comprising: training a parametric model of an environment, the model being a deep neural network, the training including: providing the model, the model being configured to determine a new model state depending on a model state, on an action, and on at least one parameter of the model; determining an expert trajectory depending on a demonstration, wherein an expert action, which specifies an expert in an environment state in accordance with an expert strategy, is acquired, the environment is converted by the expert action, with a probability, into a new environment state, and the environment state, the expert action, and the new environment state are determined as a data point of the expert trajectory; wherein the parameter of the model is determined depending on a reward, the reward being determined depending on the expert trajectory and on a model trajectory determined in accordance with a strategy depending on the model state; learning the strategy for applying control to the robot; and applying control to the robot depending on the parametric model and on the strategy.
9. A non-transitory computer-readable memory on which is stored a computer program, including computer-readable instructions for training a parametric model of an environment, the model being a deep neural network, the instructions, when executed by a computer, causing the computer to perform the following steps: providing the model, the model being configured to determine a new model state depending on a model state, on an action, and on at least one parameter of the model; determining an expert trajectory depending on a demonstration, wherein an expert action, which specifies an expert in an environment state in accordance with an expert strategy, is acquired, the environment is converted by the expert action, with a probability, into a new environment state, and the environment state, the expert action, and the new environment state are determined as a data point of the expert trajectory; wherein the parameter of the model is determined depending on a reward, the reward being determined depending on the expert trajectory and on a model trajectory determined in accordance with a strategy depending on the model state; wherein the model encompasses a controlled system, and is trained depending on the controlled system, at least one state variable or manipulated variable for applying control to the controlled system is determined depending on the model and depending on at least one acquired actual variable or observed state variable of the controlled system.
10. An apparatus for applying control to a robot, the apparatus configured to: train a parametric model of an environment, the model being a deep neural network, the training including: providing the model, the model being configured to determine a new model state depending on a model state, on an action, and on at least one parameter of the model; determining an expert trajectory depending on a demonstration, wherein an expert action, which specifies an expert in an environment state in accordance with an expert strategy, is acquired, the environment is converted by the expert action, with a probability, into a new environment state, and the environment state, the expert action, and the new environment state are determined as a data point of the expert trajectory; wherein the parameter of the model is determined depending on a reward, the reward being determined depending on the expert trajectory and on a model trajectory determined in accordance with a strategy depending on the model state; learn the strategy for applying control to the robot; and apply control to the robot depending on the parametric model and on the strategy.
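The claims name REINFORCE as one way to learn the model parameter from a reward that compares expert and model data. The sketch below shows the general shape of such an update on a toy problem and is not the patented procedure. Its assumptions: a scalar parameter `theta` in a Gaussian transition model stands in for the deep neural network, the expert dynamics are `s' = s + a`, and a negative squared distance to the expert's new state serves as a surrogate reward in place of the discriminator-based reward described in the claims. All function and variable names are hypothetical.

```python
import random

SIGMA = 0.1  # assumed fixed noise scale of the stochastic model


def expert_data_points(n, rng):
    """Toy expert data as (state, action, new_state) triples.

    The dynamics s' = s + 1.0 * a are an assumption for illustration.
    """
    points = []
    for _ in range(n):
        s, a = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        points.append((s, a, s + 1.0 * a))
    return points


def reinforce_step(theta, batch, rng, lr=0.1):
    """One REINFORCE update of the model parameter theta."""
    rewards, scores = [], []
    for s, a, expert_next in batch:
        mean = s + theta * a
        model_next = rng.gauss(mean, SIGMA)  # sample the stochastic model
        # Surrogate reward (assumption): closeness to the expert transition.
        rewards.append(-(model_next - expert_next) ** 2)
        # Score function: d/dtheta log N(model_next; mean, SIGMA^2).
        scores.append((model_next - mean) * a / SIGMA ** 2)
    baseline = sum(rewards) / len(rewards)  # variance-reduction baseline
    grad = sum((r - baseline) * g for r, g in zip(rewards, scores)) / len(batch)
    return theta + lr * grad  # gradient ascent on the expected reward


rng = random.Random(0)
expert = expert_data_points(256, rng)
theta = 0.0
for _ in range(300):
    theta = reinforce_step(theta, rng.sample(expert, 64), rng)
```

After training, `theta` should lie close to 1.0, the value at which the model's expected transition matches the expert data. Swapping the surrogate reward for the output of a trained discriminator would bring the sketch closer to the arrangement in claim 2, at the cost of an adversarial training loop.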
using neural networks only · CPC title
Probabilistic graphical models, e.g. probabilistic networks · CPC title
Non-supervised learning, e.g. competitive learning · CPC title
learning, adaptive, model based, rule based expert control · CPC title
based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO] · CPC title