System(s) and method(s) of using imitation learning in training and refining robotic control policies

US11772272B2 · US · B2

Patent metadata
Publication number: US-11772272-B2
Application number: US-202117203296-A
Country: US
Kind code: B2
Filing date: Mar 16, 2021
Priority date: Mar 16, 2021
Publication date: Oct 3, 2023
Grant date: Oct 3, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Implementations described herein relate to training and refining robotic control policies using imitation learning techniques. A robotic control policy can be initially trained based on human demonstrations of various robotic tasks. Further, the robotic control policy can be refined based on human interventions while a robot is performing a robotic task. In some implementations, the robotic control policy may determine whether the robot will fail in performance of the robotic task, and prompt a human to intervene in performance of the robotic task. In additional or alternative implementations, a representation of a sequence of actions to be performed by the robot can be visually rendered for presentation to the human, so that the human can proactively intervene in performance of the robotic task.
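
The two phases in the abstract (initial training on demonstrations, then refinement from interventions) can be illustrated with a toy Python sketch. This is not the patented method. It assumes observations are hypothetical discrete labels, actions are single floats, and "training" is tabular behavior cloning, meaning the mean demonstrated action per observation. Refinement appends human corrections to the dataset and refits, in the spirit of intervention-based data aggregation such as HG-DAgger. `TabularPolicy` and `refine_with_interventions` are illustrative names only.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import fmean


class TabularPolicy:
    """Toy policy: predicts the mean demonstrated action for each observation.

    Assumption for illustration: observations are discrete labels (str) and
    actions are single floats; a real policy would map images to actions.
    """

    def __init__(self, default_action: float = 0.0) -> None:
        self.default_action = default_action
        self.table: dict[str, float] = {}

    def fit(self, dataset: list[tuple[str, float]]) -> None:
        # Behavior cloning in its simplest tabular form: the mean action per
        # observation minimizes squared error against the demonstrations.
        grouped: dict[str, list[float]] = defaultdict(list)
        for obs, action in dataset:
            grouped[obs].append(action)
        self.table = {obs: fmean(actions) for obs, actions in grouped.items()}

    def act(self, obs: str) -> float:
        # Observations never seen in training fall back to a default action.
        return self.table.get(obs, self.default_action)


def refine_with_interventions(
    policy: TabularPolicy,
    dataset: list[tuple[str, float]],
    interventions: list[tuple[str, float]],
) -> None:
    """Append human corrections to the training data and refit the policy."""
    dataset.extend(interventions)  # note: mutates the caller's dataset
    policy.fit(dataset)
```

For example, fitting on `[("cup_left", -1.0), ("cup_left", -0.8), ("cup_right", 1.0)]` makes `act("cup_left")` return about -0.9; adding the correction `("cup_right", 0.6)` then moves `act("cup_right")` to about 0.8. The point of the sketch is the data flow (demonstrations first, corrections appended later), not the model.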

First claim

Opening claim text (preview).

What is claimed is:

1. A method implemented using one or more processors, the method comprising: receiving, from one or more vision components of a robot, an instance of vision data capturing an environment of the robot, the instance of the vision data being captured during performance of a robotic task by the robot; processing, using a robotic control policy, the instance of the vision data to generate a sequence of actions to be performed by the robot during the robotic task, the sequence of actions including an initial action to be performed by the robot in furtherance of the robotic task and a plurality of predicted actions that are predicted to follow the initial action; determining, based on processing the instance of the vision data using the robotic control policy, whether the robot will fail in performance of the robotic task; in response to determining that the robot will fail in performance of the robotic task: causing a prompt to be rendered via an interface of a computing device or the robot, the prompt requesting a user of the computing device intervene in performance of the robotic task; receiving, from a user of the computing device, and based on the prompt, user input that intervenes with performance of the robotic task, the user input being received via an input device of the computing device or an additional computing device; and causing the robotic control policy to be updated based on the user input; in response to determining that the robot will not fail in performance of the robotic action, causing the robot to perform the initial action; and until the robot completes performance of the robotic task: receiving, from one or more of the vision components of the robot, an additional instance of vision data capturing the environment of the robot, the additional instance of the vision data being captured during performance of the robotic task by the robot; processing, using the robotic control policy, the additional instance of the vision data to generate an additional sequence of actions to be performed by the robot during the robotic task, the additional sequence of actions including a next action to be performed by the robot in furtherance of the robotic task and an additional plurality of predicted actions that are predicted to follow the next action; and determining, based on processing the additional instance of the vision data using the robotic control policy, whether the robot will fail in performance of the robotic task.

2. The method of claim 1, wherein each action included in the sequence of actions comprises a corresponding first set of values for a first component of the robot, and wherein each action included in the sequence of actions also comprises a corresponding second set of values for a second component of the robot.

3. The method of claim 2, wherein causing the robot to perform the initial action comprises: causing the robot to utilize the corresponding first set of values to actuate the first component of the robot; and causing the robot to utilize the corresponding second set of values to actuate the second component of the robot.

4. The method of claim 3, wherein the first component of the robot is one of: a robot arm, a robot end effector, a robot base, or a robot head.

5. The method of claim 4, wherein the second component of the robot is another one of: a robot arm, a robot end effector, a robot base, or a robot head.

6. The method of claim 1, wherein causing the robotic control policy to be updated based on the user input is subsequent to determining that the robot has completed performance of the robotic task.

7. The method of claim 1, wherein processing the instance of the vision data to generate the sequence of actions using the robotic control policy comprises: processing, using an intermediate portion of the robotic control policy, the instance of the vision data to generate an intermediate representation of the instance of the vision data; processing, using a first control head of the robotic control policy, the intermediate representation of the instance of the vision data to generate, for each action included the sequence of actions, a corresponding first set of values for a first component of the robot; and processing, using a second control head of the robotic control policy, the intermediate representation of the instance of the vision data to generate, for each action included the sequence of actions, a corresponding second set of values for a second component of the robot.

8. The method of claim 7, further comprising: in response to receiving the user input that intervenes with performance of the robotic task: generating, based on the user input, and for one or more actions included in the sequence of actions, a corresponding alternative first set of values, for the first component of the robot, and a corresponding alternative second set of values, for the second component of the robot, that the robot should utilize in performance of the robotic task; generating, based on comparing the corresponding first set of values to the corresponding alternative first set of values, a first loss; generating, based on comparing the corresponding second set of values to the corresponding alternative second set of values, a second loss; and wherein causing the robotic control policy to be updated is based on the first loss and the second loss.

9. The method of claim 8, wherein the first loss is generated using a first loss function, and wherein the second loss is generated using a distinct second loss function.

10. The method of claim 1, wherein processing the instance of the vision data to generate the sequence of actions to be performed by the robot during the robotic task comprises: processing, using an intermediate portion of the robotic control policy, the instance of the vision data to generate an intermediate representation of the instance of the vision data, wherein the sequence of actions is generated based on the intermediate representation of the instance of the vision data.

11. The method of claim 10, wherein determining whether the robot will fail in performance of the robotic task comprises: processing, using a control head of the robotic control policy, the intermediate representation of the instance of the vision data to generate, for one or more actions included in the sequence of actions, one or more corresponding sets of values associated with performance of the robotic task; and determining that the robot will fail in performance of the robotic task based on the corresponding set of values.

12. The method of claim 11, wherein determining that the robot will fail in performance of the robotic task is based on one or more of the corresponding set of values associated with the initial action.

13. The method of claim 11, wherein determining that the robot will fail in performance of the robotic task is based on one or more of the corresponding set of values associated with one or more of the plurality of predicted actions that follow the initial action.

14. The method of claim 11, wherein the corresponding set of values associated with performance of the robotic task includes a corresponding value associated with one or more of: whether the robot will fail in performance of the robotic task, whether the robot will continue in performance of the robotic task, or whether the robot has completed performance of the robotic task.

15. A method implemented using one or more processors, the method comprising: receiving, from one or more vision components of a robot, an instance of vision data capturing an environment of …
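
Claim 1 describes a receding-horizon loop: at each step the policy predicts an initial action plus several follow-up actions from fresh vision data, a failure check gates execution, and a human is prompted when failure is predicted. Claims 6 and 11 to 14 add that the policy update can wait until the task is complete, and that failure is read from per-action status values (fail, continue, complete) attached to the initial action or to the predicted ones. The Python sketch below is one hedged reading of that loop. `Prediction`, `will_fail`, `is_complete`, `run_episode`, the 0.5 threshold, and the choice to execute the human's correction in place of the policy's action are all assumptions for illustration; the claims do not specify them.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Prediction:
    """Hypothetical policy output: an initial action followed by predicted
    actions, each paired with scores for "fail", "continue" and "complete"."""
    actions: list[str]
    status: list[dict[str, float]]


def will_fail(pred: Prediction, threshold: float = 0.5, lookahead: int = 1) -> bool:
    """lookahead=1 inspects only the initial action; a larger value also
    inspects the predicted actions that follow it."""
    return any(s["fail"] >= threshold for s in pred.status[:lookahead])


def is_complete(pred: Prediction, threshold: float = 0.5) -> bool:
    return pred.status[0]["complete"] >= threshold


def run_episode(
    observe: Callable[[], object],
    policy: Callable[[object], Prediction],
    execute: Callable[[str], None],
    request_intervention: Callable[[Prediction], str],
    update_policy: Callable[[list[tuple[Prediction, str]]], None],
    max_steps: int = 100,
) -> list[tuple[Prediction, str]]:
    corrections: list[tuple[Prediction, str]] = []
    for _ in range(max_steps):
        pred = policy(observe())  # re-plan from a new vision instance each step
        if is_complete(pred):
            break
        if will_fail(pred):
            # Prompt the human; in this sketch the correction is executed in
            # place of the policy's initial action and kept for training.
            correction = request_intervention(pred)
            corrections.append((pred, correction))
            execute(correction)
        else:
            execute(pred.actions[0])  # only the initial action is executed
    if corrections:
        # Deferred update, applied once the episode has ended.
        update_policy(corrections)
    return corrections
```

Re-planning every step and executing only the first action lets later predictions react to what the robot actually sees, while the predicted tail of each sequence is still available for the failure check (set `lookahead` above 1) or for rendering to a human observer.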

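Claims 2 to 5 and 7 to 9 describe a policy whose shared intermediate portion feeds separate control heads, one per robot component, with each head's output compared against the human's corrected values using its own loss function. The Python sketch below assumes an arm head trained with mean squared error on joint values and a gripper head (an end effector) trained with binary cross-entropy on an open probability. The toy `trunk`, `arm_head`, and `gripper_head` functions are hypothetical stand-ins for learned networks, and summing the two losses with a weight is one plausible way to combine them, not something the claims specify.

```python
from __future__ import annotations

import math


def trunk(obs: list[float]) -> list[float]:
    """Toy stand-in for the shared intermediate portion of the policy."""
    return [sum(obs) / len(obs), max(obs)]


def arm_head(z: list[float], horizon: int) -> list[list[float]]:
    """Toy first control head: two arm joint values per action."""
    return [[z[0] + 0.1 * k, z[1]] for k in range(horizon)]


def gripper_head(z: list[float], horizon: int) -> list[list[float]]:
    """Toy second control head: one gripper-open probability per action."""
    p = 1.0 / (1.0 + math.exp(-z[1]))
    return [[p] for _ in range(horizon)]


def mse(pred: list[float], target: list[float]) -> float:
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def bce(pred: list[float], target: list[float], eps: float = 1e-7) -> float:
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(pred)


def intervention_losses(
    obs: list[float],
    corrected_arm: list[list[float]],
    corrected_gripper: list[list[float]],
    gripper_weight: float = 1.0,
) -> tuple[float, float, float]:
    horizon = len(corrected_arm)
    z = trunk(obs)  # one shared representation feeds both heads
    arm = arm_head(z, horizon)
    grip = gripper_head(z, horizon)
    # A distinct loss function per head, averaged over the action sequence.
    arm_loss = sum(mse(a, c) for a, c in zip(arm, corrected_arm)) / horizon
    grip_loss = sum(bce(g, c) for g, c in zip(grip, corrected_gripper)) / horizon
    return arm_loss, grip_loss, arm_loss + gripper_weight * grip_loss
```

With `obs=[0.0, 1.0]` the shared representation is `[0.5, 1.0]`; if the human's arm correction matches the arm head's output, the arm loss is zero and only the gripper term contributes. Separate losses let each head be trained on the kind of value it produces: continuous joint targets versus a binary open/close decision.
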
Assignees

Google LLC


Classifications

  • Artificial intelligence AI, expert, knowledge, rule based system KBS · CPC title

  • Tasks are classified in types of unit motions · CPC title

  • Hierarchical, learning, recognition and skill level and adaptation servo level · CPC title

  • Generic motion control operations, primitive skills each for special task · CPC title

  • Teleassistance, operator assists, controls autonomous robot · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
The primary CPC classification is B25J9/1656. Mapped technology areas include Operations & Transport.
When was this patent published?
Published Oct 3, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.