System(s) and method(s) of using imitation learning in training and refining robotic control policies
US-11772272-B2 · Oct 3, 2023 · US
US12226920B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12226920-B2 |
| Application number | US-202318233251-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 11, 2023 |
| Priority date | Mar 16, 2021 |
| Publication date | Feb 18, 2025 |
| Grant date | Feb 18, 2025 |
Official abstract text for this publication.
Implementations described herein relate to training and refining robotic control policies using imitation learning techniques. A robotic control policy can be initially trained based on human demonstrations of various robotic tasks. Further, the robotic control policy can be refined based on human interventions while a robot is performing a robotic task. In some implementations, the robotic control policy may determine whether the robot will fail in performance of the robotic task, and prompt a human to intervene in performance of the robotic task. In additional or alternative implementations, a representation of the sequence of actions can be visually rendered for presentation to the human so that the human can proactively intervene in performance of the robotic task.
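The intervention flow summarized in the abstract can be illustrated with a minimal sketch. The abstract does not say how the policy decides that the robot will fail, so the `failure_probability` field, the `threshold` parameter, and the names `PolicyOutput`, `InterventionLog`, and `run_step` are all assumptions made for illustration, not terms from the patent.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Action = Tuple[float, ...]


@dataclass
class PolicyOutput:
    # Initial action followed by the predicted actions.
    actions: List[Action]
    # Assumption: the policy exposes a scalar estimate of task failure.
    failure_probability: float


@dataclass
class InterventionLog:
    # (proposed action, human-corrected action) pairs kept for later refinement.
    corrections: List[Tuple[Action, Action]] = field(default_factory=list)


def run_step(
    output: PolicyOutput,
    ask_human: Callable[[List[Action]], Optional[Action]],
    log: InterventionLog,
    threshold: float = 0.5,  # assumed cut-off, not specified in the source
) -> Action:
    """Return the action to execute, prompting the human only if failure looks likely."""
    proposed = output.actions[0]
    if output.failure_probability < threshold:
        return proposed
    # The human sees the whole predicted sequence and may decline to intervene.
    corrected = ask_human(output.actions)
    if corrected is None:
        return proposed
    log.corrections.append((proposed, corrected))
    return corrected
```

The logged pairs stand in for the data that would later be used to refine the policy; how that refinement is carried out is left open here.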
Opening claim text (preview).
What is claimed is:
1. A method implemented using one or more processors, the method comprising: receiving, from one or more vision components of a robot, an instance of vision data capturing an environment of the robot, the vision data being captured during performance of a robotic task by the robot; processing, using a robotic control policy, the instance of the vision data to generate a sequence of actions to be performed by the robot during the robotic task, wherein the sequence of actions includes an initial action to be performed by the robot in furtherance of the robotic task and a plurality of predicted actions that follow the initial action, and wherein processing the instance of the vision data to generate the sequence of actions to be performed by the robot during the robotic task and using the robotic control policy comprises: processing, using an intermediate portion of the robotic control policy, the instance of the vision data to generate an intermediate representation of the instance of the vision data; processing, using a first control head of the robotic control policy, the intermediate representation of the instance of the vision data to generate, for the initial action and for each of the plurality of predicted actions that follow the initial action, a corresponding first set of values for a first portion of control of a component of the robot; processing, using a second control head of the robotic control policy, the intermediate representation of the instance of the vision data to generate, for the initial action and for each of the plurality of predicted actions that follow the initial action, a corresponding second set of values for a second portion of control of the component of the robot; and processing, using a third control head of the robotic control policy, the intermediate representation of the instance of the vision data to generate, for the initial action and for each of the plurality of predicted actions that follow the initial action, a corresponding third set of values for a third portion of control of the component of the robot; causing, based on the sequence of actions to be performed, the robot to initiate performance of the robotic task; during performance of the robotic task: causing a representation of the sequence of actions to be visually rendered via a graphical user interface of a computing device; and receiving, from a user of the computing device, and based on the representation of the sequence of actions, user input that intervenes with ongoing performance of the robotic task, the user input being received via the computing device or an additional computing device; and causing the robotic control policy to be updated based on the user input.
2. The method of claim 1, wherein the representation of the sequence of actions visually rendered via the graphical user interface of the computing device comprises a sequence of corresponding waypoints overlaying the environment of the robot captured in the instance of the vision data, each of the corresponding waypoints being associated with one or more components of the robot in response to a given action, included in the sequence of actions, being performed by the robot.
3. The method of claim 1, wherein the representation of the sequence of actions visually rendered via the graphical user interface of the computing device comprises a sequence of corresponding states of the robot overlaying the environment of the robot captured in the instance of the vision data, each of the corresponding states of the robot corresponding to a given state of the robot in response to a given action, included in the sequence of actions, being performed by the robot.
4. The method of claim 1, wherein the representation of the sequence of actions visually rendered via the graphical user interface of the computing device comprises a corresponding representation of the initial action and each of the plurality of predicted actions that follow the initial action.
5. The method of claim 4, wherein the corresponding representation of each action included in the sequence of actions is selectable and, when selected, causes the one or more of the corresponding first set of values for the first portion of control of the component of the robot, the corresponding second set of values for the second portion of control of the component of the robot, and/or the corresponding third set of values for the third portion of control of the component of the robot to be visually rendered via the graphical user interface of the computing device.
6. The method of claim 1, further comprising: in response to receiving the user input that intervenes with ongoing performance of the robotic task: generating, based on the user input, and for one or more actions included in the sequence of actions, a corresponding alternative first set of values, for the first portion of control of the component of the robot, a corresponding alternative second set of values, for the second portion of control of the component of the robot, and a corresponding alternative third set of values, for the third portion of control of the component of the robot, that the robot should utilize in performance of the robotic task; generating, based on comparing the corresponding first set of values to the corresponding alternative first set of values, a first loss; generating, based on comparing the corresponding second set of values to the corresponding alternative second set of values, a second loss; generating, based on comparing the corresponding third set of values to the corresponding alternative third set of values, a third loss and wherein causing the robotic control policy to be updated is based on the first loss, the second loss, and the third loss.
7. The method of claim 1, further comprising: receiving, from the user of the computing device, and subsequent to performance of the robotic task, additional user input associated with data generated during performance of the robotic task; and wherein causing the robotic control policy to be updated is further based on the additional user input.
8. The method of claim 7, wherein the additional user input relabels data generated during performance of the robotic task, and wherein the data generated during performance of the robotic task is generated using the robotic control policy or generated based on the user input.
9. A method implemented using one or more processors, the method comprising: receiving, from one or more vision components of a robot, an instance of vision data capturing an environment of the robot, the instance of the vision data being captured during performance of a robotic task by the robot; processing, using an intermediate portion of a robotic control policy, the instance of the vision data to generate an intermediate representation of the instance of the vision data; processing, using a first control head of the robotic control policy, the intermediate representation of the instance of the vision data to generate, for the action to be performed by the robot in furtherance of the robotic task, a corresponding first set of values for a first portion of control of a component of the robot; processing, using a second control head of the robotic control policy, the intermediate representation of the instance of the vision data to generate, for the action, a corresponding second set of values for a second portion of control of the component of the robot; processing, using a third control head of the robotic control policy, the intermediate representation of the instance of the vision data to generate, for the action, a corresponding third set of values for a third portion of control of the component of the robot; and causing the ro
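Claims 1, 6, and 9 describe a shared intermediate portion feeding three control heads, with one loss per head computed from the human's alternative values. The following is a minimal sketch of that structure in plain Python. The claim text shown here does not specify layer types, the loss function, or what each portion of control is, so the linear layers, the ReLU, the mean squared error, and every name in the block (`trunk`, `head`, `per_head_losses`, the head keys) are assumptions made for illustration.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[Vector]
HeadOutput = List[Vector]  # one set of values per action in the sequence


def linear(weights: Matrix, x: Vector) -> Vector:
    """Matrix-vector product; each row of weights yields one output value."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def trunk(vision: Vector, weights: Matrix) -> Vector:
    """Assumed intermediate portion: a linear layer followed by ReLU."""
    return [max(0.0, h) for h in linear(weights, vision)]


def head(representation: Vector, weights: Matrix, horizon: int) -> HeadOutput:
    """Assumed control head: emits values for the initial action and each predicted action."""
    flat = linear(weights, representation)
    size = len(flat) // horizon
    return [flat[i * size:(i + 1) * size] for i in range(horizon)]


def mse(predicted: HeadOutput, target: HeadOutput) -> float:
    """Mean squared error, chosen here only as a stand-in comparison."""
    diffs = [(p - t) ** 2 for ps, ts in zip(predicted, target) for p, t in zip(ps, ts)]
    return sum(diffs) / len(diffs)


def per_head_losses(
    predicted: Dict[str, HeadOutput], alternative: Dict[str, HeadOutput]
) -> Dict[str, float]:
    """One loss per head: policy values versus human-supplied alternative values."""
    return {name: mse(predicted[name], alternative[name]) for name in predicted}


def forward(vision: Vector, trunk_w: Matrix, head_ws: Dict[str, Matrix], horizon: int):
    """Run the shared trunk once, then every head on the same representation."""
    representation = trunk(vision, trunk_w)
    return {name: head(representation, w, horizon) for name, w in head_ws.items()}
```

A policy update would then combine the three losses (for example, by summing them) and take a gradient step, which this sketch omits. The point is only that all three heads read the same intermediate representation and are each scored separately against the intervention.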
Control stands, e.g. consoles, switchboards · CPC title
characterised by task planning, object-oriented languages · CPC title
learning, adaptive, model based, rule based expert control · CPC title
Hardware, e.g. neural networks, fuzzy logic, interfaces, processor · CPC title
Artificial intelligence AI, expert, knowledge, rule based system KBS · CPC title