System(s) and method(s) of using imitation learning in training and refining robotic control policies

US12226920B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12226920-B2
Application number: US-202318233251-A
Country: US
Kind code: B2
Filing date: Aug 11, 2023
Priority date: Mar 16, 2021
Publication date: Feb 18, 2025
Grant date: Feb 18, 2025

Abstract

Official abstract text for this publication.

Implementations described herein relate to training and refining robotic control policies using imitation learning techniques. A robotic control policy can be initially trained based on human demonstrations of various robotic tasks. Further, the robotic control policy can be refined based on human interventions while a robot is performing a robotic task. In some implementations, the robotic control policy may determine whether the robot will fail in performance of the robotic task, and prompt a human to intervene in performance of the robotic task. In additional or alternative implementations, a representation of the sequence of actions can be visually rendered for presentation to the human so that the human can proactively intervene in performance of the robotic task.
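
The refinement loop summarized in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that the policy exposes a scalar failure estimate and that a fixed threshold decides when to prompt the human; the names `PolicyOutput`, `InterventionLog`, `step_with_intervention`, and `failure_threshold` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Action = Tuple[float, ...]


@dataclass
class PolicyOutput:
    # Initial action followed by the predicted actions that come after it.
    actions: List[Action]
    # Hypothetical self-assessed chance of failing the task (an assumption).
    failure_probability: float


@dataclass
class InterventionLog:
    # (observation, proposed action, human correction) triples that could
    # later serve as training data for refining the policy.
    records: List[Tuple[object, Action, Action]] = field(default_factory=list)


def step_with_intervention(
    observation: object,
    policy: Callable[[object], PolicyOutput],
    ask_human: Callable[[object, List[Action]], Optional[Action]],
    log: InterventionLog,
    failure_threshold: float = 0.5,  # assumed value, not from the patent
) -> Action:
    """Return the action to execute, prompting the human when failure looks likely."""
    output = policy(observation)
    proposed = output.actions[0]
    if output.failure_probability < failure_threshold:
        return proposed
    # The human sees the whole predicted sequence and may decline to intervene.
    correction = ask_human(observation, output.actions)
    if correction is None:
        return proposed
    log.records.append((observation, proposed, correction))
    return correction
```

In this sketch the logged triples stand in for the "human interventions" used to refine the policy; how the update is actually computed is left out here.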

First claim

Opening claim text (preview).

What is claimed is:

1. A method implemented using one or more processors, the method comprising: receiving, from one or more vision components of a robot, an instance of vision data capturing an environment of the robot, the vision data being captured during performance of a robotic task by the robot; processing, using a robotic control policy, the instance of the vision data to generate a sequence of actions to be performed by the robot during the robotic task, wherein the sequence of actions includes an initial action to be performed by the robot in furtherance of the robotic task and a plurality of predicted actions that follow the initial action, and wherein processing the instance of the vision data to generate the sequence of actions to be performed by the robot during the robotic task and using the robotic control policy comprises: processing, using an intermediate portion of the robotic control policy, the instance of the vision data to generate an intermediate representation of the instance of the vision data; processing, using a first control head of the robotic control policy, the intermediate representation of the instance of the vision data to generate, for the initial action and for each of the plurality of predicted actions that follow the initial action, a corresponding first set of values for a first portion of control of a component of the robot; processing, using a second control head of the robotic control policy, the intermediate representation of the instance of the vision data to generate, for the initial action and for each of the plurality of predicted actions that follow the initial action, a corresponding second set of values for a second portion of control of the component of the robot; and processing, using a third control head of the robotic control policy, the intermediate representation of the instance of the vision data to generate, for the initial action and for each of the plurality of predicted actions that follow the initial action, a corresponding third set of values for a third portion of control of the component of the robot; causing, based on the sequence of actions to be performed, the robot to initiate performance of the robotic task; during performance of the robotic task: causing a representation of the sequence of actions to be visually rendered via a graphical user interface of a computing device; and receiving, from a user of the computing device, and based on the representation of the sequence of actions, user input that intervenes with ongoing performance of the robotic task, the user input being received via the computing device or an additional computing device; and causing the robotic control policy to be updated based on the user input.

2. The method of claim 1, wherein the representation of the sequence of actions visually rendered via the graphical user interface of the computing device comprises a sequence of corresponding waypoints overlaying the environment of the robot captured in the instance of the vision data, each of the corresponding waypoints being associated with one or more components of the robot in response to a given action, included in the sequence of actions, being performed by the robot.

3. The method of claim 1, wherein the representation of the sequence of actions visually rendered via the graphical user interface of the computing device comprises a sequence of corresponding states of the robot overlaying the environment of the robot captured in the instance of the vision data, each of the corresponding states of the robot corresponding to a given state of the robot in response to a given action, included in the sequence of actions, being performed by the robot.

4. The method of claim 1, wherein the representation of the sequence of actions visually rendered via the graphical user interface of the computing device comprises a corresponding representation of the initial action and each of the plurality of predicted actions that follow the initial action.

5. The method of claim 4, wherein the corresponding representation of each action included in the sequence of actions is selectable and, when selected, causes the one or more of the corresponding first set of values for the first portion of control of the component of the robot, the corresponding second set of values for the second portion of control of the component of the robot, and/or the corresponding third set of values for the third portion of control of the component of the robot to be visually rendered via the graphical user interface of the computing device.

6. The method of claim 1, further comprising: in response to receiving the user input that intervenes with ongoing performance of the robotic task: generating, based on the user input, and for one or more actions included in the sequence of actions, a corresponding alternative first set of values, for the first portion of control of the component of the robot, a corresponding alternative second set of values, for the second portion of control of the component of the robot, and a corresponding alternative third set of values, for the third portion of control of the component of the robot, that the robot should utilize in performance of the robotic task; generating, based on comparing the corresponding first set of values to the corresponding alternative first set of values, a first loss; generating, based on comparing the corresponding second set of values to the corresponding alternative second set of values, a second loss; generating, based on comparing the corresponding third set of values to the corresponding alternative third set of values, a third loss and wherein causing the robotic control policy to be updated is based on the first loss, the second loss, and the third loss.

7. The method of claim 1, further comprising: receiving, from the user of the computing device, and subsequent to performance of the robotic task, additional user input associated with data generated during performance of the robotic task; and wherein causing the robotic control policy to be updated is further based on the additional user input.

8. The method of claim 7, wherein the additional user input relabels data generated during performance of the robotic task, and wherein the data generated during performance of the robotic task is generated using the robotic control policy or generated based on the user input.

9. A method implemented using one or more processors, the method comprising: receiving, from one or more vision components of a robot, an instance of vision data capturing an environment of the robot, the instance of the vision data being captured during performance of a robotic task by the robot; processing, using an intermediate portion of a robotic control policy, the instance of the vision data to generate an intermediate representation of the instance of the vision data; processing, using a first control head of the robotic control policy, the intermediate representation of the instance of the vision data to generate, for the action to be performed by the robot in furtherance of the robotic task, a corresponding first set of values for a first portion of control of a component of the robot; processing, using a second control head of the robotic control policy, the intermediate representation of the instance of the vision data to generate, for the action, a corresponding second set of values for a second portion of control of the component of the robot; processing, using a third control head of the robotic control policy, the intermediate representation of the instance of the vision data to generate, for the action, a corresponding third set of values for a third portion of control of the component of the robot; and causing the ro
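
The structure recited in claims 1 and 6 (a shared intermediate portion feeding three control heads, each emitting values for every action in the sequence, with one loss per head after an intervention) might be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the head names `translation`, `rotation`, and `gripper`, the random linear layers, the ReLU, and the use of mean squared error as the comparison are all hypothetical choices, since the claims only say the losses are generated "based on comparing" the two sets of values.

```python
import random
from typing import Dict, List

Vector = List[float]
ActionValues = Dict[str, Vector]  # head name -> values for one action


def linear(weights: List[Vector], x: Vector) -> Vector:
    """Matrix-vector product with weights stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def random_matrix(rows: int, cols: int, rng: random.Random) -> List[Vector]:
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class MultiHeadPolicy:
    """Shared trunk plus one linear head per portion of control (toy model)."""

    def __init__(self, obs_dim: int, hidden_dim: int, horizon: int,
                 head_dims: Dict[str, int], seed: int = 0) -> None:
        rng = random.Random(seed)
        self.horizon = horizon  # initial action plus predicted actions
        self.head_dims = dict(head_dims)
        self.trunk = random_matrix(hidden_dim, obs_dim, rng)
        self.heads = {
            name: random_matrix(horizon * dim, hidden_dim, rng)
            for name, dim in head_dims.items()
        }

    def __call__(self, vision_features: Vector) -> List[ActionValues]:
        # Intermediate representation shared by all heads (ReLU is assumed).
        hidden = [max(0.0, h) for h in linear(self.trunk, vision_features)]
        sequence: List[ActionValues] = [dict() for _ in range(self.horizon)]
        for name, weights in self.heads.items():
            flat = linear(weights, hidden)
            dim = self.head_dims[name]
            # Slice the head output into one set of values per action.
            for t in range(self.horizon):
                sequence[t][name] = flat[t * dim:(t + 1) * dim]
        return sequence


def head_losses(predicted: List[ActionValues],
                corrected: List[ActionValues]) -> Dict[str, float]:
    """Mean squared error per head between predicted and alternative values."""
    losses: Dict[str, float] = {}
    for name in predicted[0]:
        errors = [
            (p - c) ** 2
            for pred_action, corr_action in zip(predicted, corrected)
            for p, c in zip(pred_action[name], corr_action[name])
        ]
        losses[name] = sum(errors) / len(errors)
    return losses
```

Calling `MultiHeadPolicy(obs_dim=4, hidden_dim=8, horizon=3, head_dims={"translation": 3, "rotation": 3, "gripper": 1})` on a four-element feature vector yields three actions, each with one set of values per head. Keeping the losses separate per head, as in `head_losses`, mirrors the first, second, and third losses of claim 6; how they are combined into a policy update is not shown here.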

Classifications

  • Control stands, e.g. consoles, switchboards · CPC title

  • B25J9/1661 (Primary) · characterised by task planning, object-oriented languages · CPC title

  • learning, adaptive, model based, rule based expert control · CPC title

  • Hardware, e.g. neural networks, fuzzy logic, interfaces, processor · CPC title

  • Artificial intelligence AI, expert, knowledge, rule based system KBS · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12226920B2 cover?
Implementations described herein relate to training and refining robotic control policies using imitation learning techniques. A robotic control policy can be initially trained based on human demonstrations of various robotic tasks. Further, the robotic control policy can be refined based on human interventions while a robot is performing a robotic task. In some implementations, the robotic con…
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification B25J9/1661. Mapped technology areas include Operations & Transport.
When was this patent published?
Publication date Feb 18, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).