Deep reinforcement learning for robotic manipulation
US-2021237266-A1 · Aug 5, 2021 · US
US12498677B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12498677-B2 |
| Application number | US-202017767675-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 15, 2020 |
| Priority date | Nov 15, 2019 |
| Publication date | Dec 16, 2025 |
| Grant date | Dec 16, 2025 |
Abstract
Implementations disclosed herein relate to mitigating the reality gap through training a simulation-to-real machine learning model (“Sim2Real” model) using a vision-based robot task machine learning model. The vision-based robot task machine learning model can be, for example, a reinforcement learning (“RL”) neural network model (RL-network), such as an RL-network that represents a Q-function.
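The sketch below illustrates the interface of "an RL-network that represents a Q-function". It assumes a linear model over a flattened image and an action vector, and a standard one-step Q-learning target. The names `q_value`, `td_loss`, and `GAMMA` are hypothetical. The abstract does not specify the network architecture, the RL algorithm, or how the Q-function is trained.

```python
from typing import List

GAMMA = 0.9  # hypothetical discount factor


def q_value(image: List[float], action: List[float],
            w_img: List[float], w_act: List[float], bias: float) -> float:
    """Toy linear Q(s, a): weighted sum of image features and action components plus a bias."""
    img_term = sum(w * x for w, x in zip(w_img, image))
    act_term = sum(w * a for w, a in zip(w_act, action))
    return img_term + act_term + bias


def td_loss(q_sa: float, reward: float, next_q_max: float, done: bool) -> float:
    """Squared one-step TD error against the target r + gamma * max_a' Q(s', a')."""
    target = reward if done else reward + GAMMA * next_q_max
    return (q_sa - target) ** 2


if __name__ == "__main__":
    # Two-pixel "image" and a one-dimensional action; all numbers are illustrative.
    q = q_value([1.0, 0.0], [0.2], w_img=[0.5, -0.5], w_act=[1.0], bias=0.1)
    print(q, td_loss(q, reward=1.0, next_q_max=0.0, done=True))
```

A practical vision-based RL-network would replace the linear scoring with a deep convolutional model. The image-plus-action in, scalar Q-value out interface is the part the claims rely on.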
Claims (opening text, preview)
What is claimed is:
1. A method implemented by one or more processors, the method comprising: processing a simulated image, using a simulation-to-real generator model, to generate a simulated episode predicted real image, wherein the simulated image is generated by a robotic simulator during a simulated episode of a simulated robot attempting performance of a robotic task; processing the simulated episode predicted real image, using a real-to-simulation generator model, to generate a simulated episode predicted simulation image; processing the simulated image along with a simulated robot action, using a task machine learning model being trained for use in the robotic task, to generate a first predicted value; processing the simulated episode predicted real image along with the simulated robot action, using the task machine learning model, to generate a second predicted value; processing the simulated episode predicted simulated image along with the simulated robot action, using the task machine learning model, to generate a third predicted value; generating a loss as a function of comparisons of the first predicted value, the second predicted value, and the third predicted value; and updating the simulation-to-real generator model based on the generated loss.
2. The method of claim 1, further comprising: processing a real image, using the real-to-simulation generator model, to generate a real episode predicted simulation image, wherein the real image is captured by a real camera, associated with a real robot, during a real episode of the real robot attempting performance of the robotic task; processing the real episode predicted simulation image, using the simulation-to-real generator model, to generate a real episode predicted real image; processing the real image along with a real robot action, using the task machine learning model or an additional task machine learning model being trained for use in the robotic task, to generate a fourth predicted value; processing the real episode predicted simulated image along with the real robot action, using the task machine learning model or the additional task machine learning model, to generate a fifth predicted value; and processing the real episode predicted real image along with the real robot action, using the task machine learning model or the additional task machine learning model, to generate a sixth predicted value, wherein generating the loss is further a function of additional comparisons of the fourth predicted value, the fifth predicted value, and the sixth predicted value.
3. The method of claim 2, wherein the comparisons of the first predicted value, the second predicted value, and the third predicted value comprise three comparisons, each of the three comparisons being between a unique pair of the first predicted value, the second predicted value, and the third predicted value.
4. The method of claim 3, wherein the additional comparisons of the fourth predicted value, the fifth predicted value, and the sixth predicted value comprise three additional comparisons, each of the three additional comparisons being between a unique pair of the fourth predicted value, the fifth predicted value, and the sixth predicted value.
5. The method of claim 2, wherein generating the loss is further a function of an adversarial loss and/or a cycle consistency loss, wherein the adversarial loss and the cycle consistency loss are both generated independent of any outputs generated using the task machine learning model or the additional task machine learning model.
6. The method of claim 5, wherein the adversarial loss is generated based on whether a simulation-to-real discriminator model predicts the predicted real image is an actual real image or the predicted real image generated by the simulation-to-real generator, and wherein the cycle consistency loss is generated based on comparison of the simulated image and the simulated episode predicted simulation image.
7. The method of claim 6, wherein generating the loss is further a function of both the adversarial loss and the cycle consistency loss.
8. The method of claim 1, wherein the task machine learning model represents a Q-function, wherein the task machine learning model is being trained during reinforcement learning based on the simulated episode and additional simulated episodes, and wherein the first predicted value is a first Q-value, the second predicted value is a second Q-value, and the third predicted value is a third Q-value.
9. The method of claim 1, further comprising: generating a task machine learning model loss based on the second predicted value; and updating the task machine learning model based on the task machine learning model loss.
10. The method of claim 9, wherein generating the task machine learning model loss is independent of at least the first predicted value and the third predicted value.
11. The method of claim 1, wherein the simulated episode is an offline episode.
12. The method of claim 1, wherein the simulated episode is an online episode.
13. The method of claim 1, wherein the robotic task is an object manipulation task or a navigation task.
14. The method of claim 13, wherein the robotic task is the object manipulation task, and wherein the object manipulation task is a grasping task.
15. A system comprising: at least one processor; and memory storing instructions that, when executed by the at least one processor, cause the at least one processor to be operable to: process a simulated image, using a simulation-to-real generator model, to generate a simulated episode predicted real image, wherein the simulated image is generated by a robotic simulator during a simulated episode of a simulated robot attempting performance of a robotic task; process the simulated episode predicted real image, using a real-to-simulation generator model, to generate a simulated episode predicted simulation image; process the simulated image along with a simulated robot action, using a task machine learning model being trained for use in the robotic task, to generate a first predicted value; process the simulated episode predicted real image along with the simulated robot action, using the task machine learning model, to generate a second predicted value; process the simulated episode predicted simulated image along with the simulated robot action, using the task machine learning model, to generate a third predicted value; generate a loss as a function of comparisons of the first predicted value, the second predicted value, and the third predicted value; and update the simulation-to-real generator model based on the generated loss.
16. The system of claim 15, wherein the at least one processor is further operable to: process a real image, using the real-to-simulation generator model, to generate a real episode predicted simulation image, wherein the real image is captured by a real camera, associated with a real robot, during a real episode of the real robot attempting performance of the robotic task; process the real episode predicted simulation image, using the simulation-to-real generator model, to generate a real episode predicted real image; process the real image along with a real robot action, using the task machine learning model or an additional task machine learning model being trained for use in the robotic task, to generate a fourth predicted value; process the real episode predicted simulated image along with the real robot action, using the task machine learning model or t
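The sketch below shows the simulated-episode branch of the loss recited in claims 1, 3, 6 and 7. It covers only computing the loss for one sample, not the gradient update of the generator. Every concrete choice is an assumption that the claims leave open:
- images and actions are flat lists of floats;
- both generators are toy per-pixel affine maps;
- the task model is a stand-in Q-function;
- each "comparison" is a squared difference;
- the cycle-consistency term is a mean absolute error;
- the adversarial term is passed in precomputed;
- the weights `LAMBDA_RL` and `LAMBDA_CYCLE` are hypothetical.

```python
from itertools import combinations
from typing import Callable, List

Image = List[float]   # hypothetical: a flattened grayscale image
Action = List[float]  # hypothetical: a flattened robot action vector
Generator = Callable[[Image], Image]
TaskModel = Callable[[Image, Action], float]

LAMBDA_RL = 1.0       # hypothetical weight of the task-model consistency term
LAMBDA_CYCLE = 10.0   # hypothetical weight of the cycle-consistency term


def make_generator(scale: float, shift: float) -> Generator:
    """Toy per-pixel affine 'generator'; a real one would be an image-to-image network."""
    return lambda img: [scale * p + shift for p in img]


def toy_q(img: Image, action: Action) -> float:
    """Stand-in task model: mean pixel intensity plus the sum of the action components."""
    return sum(img) / len(img) + sum(action)


def pairwise_consistency(values: List[float]) -> float:
    """Sum of squared differences over every unique pair (three pairs for three values)."""
    return sum((a - b) ** 2 for a, b in combinations(values, 2))


def cycle_consistency(original: Image, reconstructed: Image) -> float:
    """Mean absolute pixel error between an image and its sim->real->sim reconstruction."""
    return sum(abs(o - r) for o, r in zip(original, reconstructed)) / len(original)


def sim_branch_loss(sim_image: Image, sim_action: Action,
                    sim2real: Generator, real2sim: Generator,
                    task_model: TaskModel, adversarial_loss: float = 0.0) -> float:
    """Generator loss for one simulated (image, action) sample, following claim 1's structure."""
    predicted_real = sim2real(sim_image)          # simulated episode predicted real image
    predicted_sim = real2sim(predicted_real)      # simulated episode predicted simulation image
    q1 = task_model(sim_image, sim_action)        # first predicted value
    q2 = task_model(predicted_real, sim_action)   # second predicted value
    q3 = task_model(predicted_sim, sim_action)    # third predicted value
    rl_term = pairwise_consistency([q1, q2, q3])
    cycle_term = cycle_consistency(sim_image, predicted_sim)
    # The adversarial term (claim 6) would come from a sim-to-real discriminator; here it is passed in.
    return adversarial_loss + LAMBDA_CYCLE * cycle_term + LAMBDA_RL * rl_term


if __name__ == "__main__":
    identity = make_generator(1.0, 0.0)
    brighten, darken = make_generator(1.0, 0.2), make_generator(1.0, -0.2)
    # Identity generators leave every Q-value unchanged, so the loss is zero.
    print(sim_branch_loss([0.0, 1.0], [0.5], identity, identity, toy_q))
    # A brightening sim-to-real map shifts q2 by about 0.2, so the consistency term is about 0.08.
    print(sim_branch_loss([0.0, 1.0], [0.5], brighten, darken, toy_q))
```

The consistency term penalizes any sim-to-real translation that changes the task model's Q-value, so the generator is pushed to preserve task-relevant content. The real-episode branch of claim 2 would mirror this with the generators applied in the opposite order. Per claims 9 and 10, the task model's own loss would depend on `q2` alone.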
including video camera means · CPC title
Vision controlled systems · CPC title
learning, adaptive, model based, rule based expert control · CPC title
Simulation of manipulator lay-out, design, modelling of manipulator · CPC title
Generative networks · CPC title