Method and device for controlling a robot

US12005580B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12005580-B2
Application number: US-202017436020-A
Country: US
Kind code: B2
Filing date: Mar 5, 2020
Priority date: Mar 18, 2019
Publication date: Jun 11, 2024
Grant date: Jun 11, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented method for applying control to a robot, and apparatus therefor. A parametric model of an environment, in particular a deep neural network, is trained in accordance with a method for training the parametric model of the environment. The model is trained depending on a controlled system. A strategy is learned in accordance with a method for model-based learning of the strategy. Control is applied to the robot depending on the parametric model and on the strategy.
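
The three stages named in the abstract (train a model of the environment, learn a strategy with the help of that model, apply control) can be sketched on a toy one-dimensional system. This is only an illustration under stated assumptions: the linear system, the names `robot_step`, `model_step`, `fit_model`, `learn_strategy`, and `apply_control`, and all numbers are hypothetical. Stage 1 uses a plain least-squares fit as a stand-in; the patent instead trains a deep neural network with a reward derived from expert and model trajectories.

```python
# Toy 1-D illustration; every name and number here is hypothetical.
TRUE_GAIN = 0.5  # real system: s_new = s + TRUE_GAIN * a (unknown to the learner)

def robot_step(state, action):
    """Stand-in for the real robot / controlled system."""
    return state + TRUE_GAIN * action

def model_step(state, action, theta):
    """Parametric model: new model state from model state, action, and parameter."""
    return state + theta * action

def fit_model(demo_actions):
    """Stage 1 stand-in: least-squares fit of theta on demonstration data."""
    num = den = 0.0
    state = 0.0
    for action in demo_actions:
        new_state = robot_step(state, action)
        num += (new_state - state) * action
        den += action * action
        state = new_state
    return num / den

def learn_strategy(theta, target, horizon=10):
    """Stage 2: pick the feedback gain k that performs best in model rollouts."""
    def cost(k):
        state, total = 0.0, 0.0
        for _ in range(horizon):
            state = model_step(state, k * (target - state), theta)
            total += (target - state) ** 2
        return total
    return min((0.5 * i for i in range(1, 7)), key=cost)

def apply_control(k, target, steps=3):
    """Stage 3: run the learned strategy on the real system."""
    state = 0.0
    for _ in range(steps):
        state = robot_step(state, k * (target - state))
    return state

theta = fit_model([1.0, -0.5, 0.25, 2.0])
gain = learn_strategy(theta, target=1.0)
final_state = apply_control(gain, target=1.0)
print(theta, gain, final_state)
```

In this noise-free toy the strategy is selected purely from rollouts inside the model and only afterwards executed on the real system, which is the sense of "model-based" assumed here.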

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method for training a parametric model of an environment, the model being a deep neural network, the method comprising the following steps: providing the model, the model being configured to determine a new model state depending on a model state, on an action, and on at least one parameter of the model; determining an expert trajectory depending on a demonstration, wherein an expert action, which specifies an expert in an environment state in accordance with an expert strategy, is acquired, the environment is converted by the expert action, with a probability, into a new environment state, and the environment state, the expert action, and the new environment state are determined as a data point of the expert trajectory; wherein the parameter of the model is determined depending on a reward, the reward being determined depending on the expert trajectory and on a model trajectory determined in accordance with a strategy depending on the model state; wherein the model encompasses a controlled system, and is trained depending on the controlled system, at least one state variable or manipulated variable for applying control to the controlled system is determined depending on the model and depending on at least one acquired actual variable or observed state variable of the controlled system.

2. The method as recited in claim 1, wherein a discriminator determines the reward depending on the expert trajectory and on the model trajectory, and wherein at least one parameter of the discriminator is determined with a gradient descent method depending on the expert trajectory and on the model trajectory.

3. The method as recited in claim 1, wherein the at least one parameter of the model is learned, depending on the reward, with an episode-based policy search or with a policy gradient method.

4. The method as recited in claim 3, wherein the at least one parameter of the model is learned with REINFORCE or TRPO.

5. The method as recited in claim 1, wherein the reward is determined depending on a true expected value for a system dynamic of the environment and depending on a modeled expected value for the model.

6. The method as recited in claim 1, wherein the action specified depending on a strategy is acquired in the model state, the model is converted by the action, with a probability, into the new model state, the reward being determined depending on the model state, on the action, and on the new model state.

7. The method as recited in claim 1, wherein the action is determined using an agent, depending on the model state of the model, in accordance with the strategy, the reward being determined depending on the strategy, on the action, or on a new model state, the strategy being learned, depending on the reward, in a reinforcement learning process.

8. A computer-implemented method for applying control to a robot, the method comprising: training a parametric model of an environment, the model being a deep neural network, the training including: providing the model, the model being configured to determine a new model state depending on a model state, on an action, and on at least one parameter of the model; determining an expert trajectory depending on a demonstration, wherein an expert action, which specifies an expert in an environment state in accordance with an expert strategy, is acquired, the environment is converted by the expert action, with a probability, into a new environment state, and the environment state, the expert action, and the new environment state are determined as a data point of the expert trajectory; wherein the parameter of the model is determined depending on a reward, the reward being determined depending on the expert trajectory and on a model trajectory determined in accordance with a strategy depending on the model state; learning the strategy for applying control to the robot; and applying control to the robot depending on the parametric model and on the strategy.

9. A non-transitory computer-readable memory on which is stored a computer program, including computer-readable instructions for training a parametric model of an environment, the model being a deep neural network, the instructions, when executed by a computer, causing the computer to perform the following steps: providing the model, the model being configured to determine a new model state depending on a model state, on an action, and on at least one parameter of the model; determining an expert trajectory depending on a demonstration, wherein an expert action, which specifies an expert in an environment state in accordance with an expert strategy, is acquired, the environment is converted by the expert action, with a probability, into a new environment state, and the environment state, the expert action, and the new environment state are determined as a data point of the expert trajectory; wherein the parameter of the model is determined depending on a reward, the reward being determined depending on the expert trajectory and on a model trajectory determined in accordance with a strategy depending on the model state; wherein the model encompasses a controlled system, and is trained depending on the controlled system, at least one state variable or manipulated variable for applying control to the controlled system is determined depending on the model and depending on at least one acquired actual variable or observed state variable of the controlled system.

10. An apparatus for applying control to a robot, the apparatus configured to: train a parametric model of an environment, the model being a deep neural network, the training including: providing the model, the model being configured to determine a new model state depending on a model state, on an action, and on at least one parameter of the model; determining an expert trajectory depending on a demonstration, wherein an expert action, which specifies an expert in an environment state in accordance with an expert strategy, is acquired, the environment is converted by the expert action, with a probability, into a new environment state, and the environment state, the expert action, and the new environment state are determined as a data point of the expert trajectory; wherein the parameter of the model is determined depending on a reward, the reward being determined depending on the expert trajectory and on a model trajectory determined in accordance with a strategy depending on the model state; learn the strategy for applying control to the robot; and apply control to the robot depending on the parametric model and on the strategy.
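
Claims 1 to 4 describe a training loop in which a discriminator, itself trained by gradient descent, turns expert and model trajectories into a reward, and the model parameter is then learned from that reward with a policy gradient method such as REINFORCE. A minimal sketch of one way to read this follows. Everything concrete in it is an assumption made for illustration and is not taken from the patent: the one-dimensional Gaussian transition model with a single parameter `theta` (the patent claims a deep neural network), the logistic discriminator and its hand-picked features, the reward `log D`, the random stand-in strategy, and all function names and constants.

```python
import math
import random

random.seed(0)
SIGMA = 0.1       # assumed noise level shared by environment and model
TRUE_GAIN = 1.0   # hidden environment dynamics: s_new = s + TRUE_GAIN * a + noise

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def rollout(gain, n):
    """Trajectory of (state, action, new state) data points with random actions."""
    data, s = [], 0.0
    for _ in range(n):
        a = random.uniform(-1.0, 1.0)            # stand-in for the strategy
        s_new = random.gauss(s + gain * a, SIGMA)
        data.append((s, a, s_new))
        s = s_new
    return data

def features(point):
    s, a, s_new = point
    return ((s_new - s) * a, a * a, 1.0)

def discriminator(w, point):
    """Probability that a data point comes from the expert trajectory."""
    return sigmoid(sum(wi * fi for wi, fi in zip(w, features(point))))

def discriminator_step(w, expert, model, lr=0.5):
    """One gradient descent step on the cross-entropy loss (expert = 1, model = 0)."""
    grad = [0.0, 0.0, 0.0]
    for label, batch in ((1.0, expert), (0.0, model)):
        for point in batch:
            err = discriminator(w, point) - label
            for i, fi in enumerate(features(point)):
                grad[i] += err * fi
    n = len(expert) + len(model)
    return [wi - lr * g / n for wi, g in zip(w, grad)]

def reinforce_step(theta, w, model, lr=0.1):
    """REINFORCE update of the model parameter, with reward = log D(data point)."""
    rewards = [math.log(discriminator(w, p)) for p in model]
    baseline = sum(rewards) / len(rewards)       # variance-reducing baseline
    grad = 0.0
    for (s, a, s_new), r in zip(model, rewards):
        # d/dtheta of log N(s_new; s + theta * a, SIGMA^2)
        score = (s_new - s - theta * a) * a / SIGMA ** 2
        grad += (r - baseline) * score
    return theta + lr * grad / len(model)

def train(theta=0.2, iterations=200, n=200):
    expert = rollout(TRUE_GAIN, n)               # expert trajectory from the demonstration
    w = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        model = rollout(theta, n)                # model trajectory under the current parameter
        for _ in range(5):
            w = discriminator_step(w, expert, model)
        theta = reinforce_step(theta, w, model)
    return theta, w

theta, w = train()
print(f"learned model parameter: {theta:.3f}")
```

The point of the sketch is the role reversal: the transition model is treated like a stochastic policy whose "action" is the new state, so a policy gradient estimator can update it without differentiating through the environment. TRPO or an episode-based policy search, both named in the claims, could take the place of `reinforce_step`.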

Assignees

Inventors

Classifications

  • using neural networks only · CPC title

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • Non-supervised learning, e.g. competitive learning · CPC title

  • B25J9/163 · Primary

    learning, adaptive, model based, rule based expert control · CPC title

  • G06N3/006 · Primary

    based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO] · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12005580B2 cover?
A computer-implemented method for applying control to a robot, and apparatus therefor. A parametric model of an environment, in particular a deep neural network, is trained in accordance with a method for training the parametric model of the environment. The model is trained depending on a controlled system. A strategy is learned in accordance with a method for model-based learning of the strat…
Who is the assignee on this patent?
Robert Bosch GmbH
What technology area does this patent fall under?
Primary CPC classification B25J9/163. Mapped technology areas include Operations & Transport.
When was this patent published?
Publication date Jun 11, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).