Mitigating reality gap through feature-level domain adaptation in training of vision-based robot action model

US12333787B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12333787-B2
Application number: US-202217986428-A
Country: US
Kind code: B2
Filing date: Nov 14, 2022
Priority date: Nov 16, 2021
Publication date: Jun 17, 2025
Grant date: Jun 17, 2025

Abstract

Official abstract text for this publication.

Implementations disclosed herein relate to mitigating the reality gap through feature-level domain adaptation in training of a vision-based robotic action machine learning (ML) model. Implementations mitigate the reality gap through utilization of embedding consistency losses and/or action consistency losses during training of the action ML model.
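
As a rough illustration of the two loss types named in the abstract, the sketch below compares one pair of image embeddings and one set of paired action outputs. The mean squared difference, the summation over action outputs, and every function name are assumptions made for illustration; the abstract does not say how the comparisons are computed.

```python
from typing import Sequence


def mean_squared_difference(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared difference between two equal-length vectors (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def embedding_consistency_loss(
    sim_embedding: Sequence[float],
    pred_real_embedding: Sequence[float],
) -> float:
    """Penalize differences between the embeddings of a paired
    simulated image and predicted real image."""
    return mean_squared_difference(sim_embedding, pred_real_embedding)


def action_consistency_loss(
    sim_actions: Sequence[Sequence[float]],
    pred_real_actions: Sequence[Sequence[float]],
) -> float:
    """Penalize differences between corresponding action outputs.

    Each entry is one predicted action output (for example, the output
    of one control head). Summing the per-output terms is an assumption.
    """
    if len(sim_actions) != len(pred_real_actions):
        raise ValueError("both inputs need the same number of action outputs")
    return sum(
        mean_squared_difference(s, r)
        for s, r in zip(sim_actions, pred_real_actions)
    )


if __name__ == "__main__":
    print(embedding_consistency_loss([1.0, 2.0], [1.0, 4.0]))
    print(action_consistency_loss([[0.0, 1.0], [2.0]], [[0.0, 3.0], [2.0]]))
```

Both losses are zero when the model treats the paired images identically, which is the behavior the training is meant to encourage.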

First claim

Opening claim text (preview).

What is claimed is:

1. A method implemented by one or more processors, the method comprising: generating a predicted real image based on processing a simulated image using a simulation-to-real generator model, wherein the simulated image is generated by a robotic simulator during performance of a robotic task by a simulated robot of the robotic simulator; in response to the predicted real image being generated based on processing the simulated image using the simulation-to-real generator model: pairing the simulated image with the predicted real image; processing the simulated image, using an action machine learning model being trained for use in controlling a robot to perform the robotic task, to generate one or more simulated image predicted action outputs, wherein processing the simulated image comprises: generating a simulated image embedding by processing the simulated image using vision feature layers of the action machine learning model; and processing the simulated image embedding using additional layers of the action machine learning model to generate the simulated image predicted action outputs; processing the predicted real image, using the action machine learning model, to generate one or more predicted real image predicted action outputs, wherein processing the predicted real image comprises: generating a predicted real image embedding by processing the predicted real image using the vision feature layers; and processing the predicted real image embedding using the additional layers to generate the real image predicted action outputs; in response to the pairing of the simulated image with the predicted real image: generating an embedding consistency loss as a function of comparison of the simulated image embedding and the predicted real image embedding; and updating the vision feature layers based on the generated embedding consistency loss.

2. The method of claim 1, wherein updating the vision feature layers based on the generated embedding consistency loss comprises: backpropagating the loss across the vision feature layers without backpropagating the loss across the additional layers.

3. The method of claim 1, further comprising, in response to the pairing of the simulated image with the predicted real image: generating one or more action consistency losses as a function of one or more action output comparisons, each of the action output comparisons being between a corresponding one of the simulated image predicted action outputs and a corresponding one of the predicted real image predicted action outputs; and updating the vision feature layers further based on the one or more action consistency losses.

4. The method of claim 3, wherein the additional layers comprise a first control head and a second control head, wherein the simulated image predicted action outputs comprise a first simulated image predicted action output generated using the first control head and a second simulated image predicted action output generated using the second control head, and wherein the predicted real image predicted action outputs comprise a first predicted real image predicted action output generated using the first control head and a second predicted real image predicted action output generated using the second control head.

5. The method of claim 4, wherein generating the action consistency losses comprises: generating a first action consistency loss based on comparison of the first simulated image predicted action output and the first predicted real image predicted action output; generating a second action consistency loss based on comparison of the second simulated image predicted action output and the second predicted real image predicted action output; and generating the action consistency loss as a function of the first action consistency loss and the second action consistency loss.

6. The method of claim 5, further comprising, in response to the pairing of the simulated image with the predicted real image: backpropagating the first action consistency loss across the first control head; and backpropagating the second action consistency loss across the second control head; wherein updating the vision feature layers further based on the one or more action consistency losses comprises: backpropagating residuals, of the first action consistency loss and the second action consistency loss, across the vision feature layers.

7. The method of claim 5, wherein the first simulated image predicted action output and the first predicted real image predicted action output each define a corresponding first set of values for controlling a first robotic component; and wherein the second simulated image predicted action output and the second predicted real image predicted action output each define a corresponding second set of values for controlling a second robotic component.

8. The method of claim 7, wherein the first robotic component is one of a robot arm, a robot end effector, a robot base, or a robot head; and wherein the second robotic component is another one of the robot arm, the robot end effector, the robot base, or the robot head.

9. The method of claim 1, further comprising: distorting the simulated image, using one or more distortion techniques, to generate a distorted simulated image; pairing the distorted simulated image with the predicted real image; processing the distorted simulated image, using the action machine learning model, to generate one or more distorted simulated image predicted action outputs, wherein processing the distorted simulated image comprises: generating a distorted simulated image embedding by processing the distorted simulated image using the vision feature layers; and processing the distorted simulated image embedding using the additional layers to generate the distorted simulated image predicted action outputs; in response to the pairing of the distorted simulated image with the predicted real image: generating an additional embedding consistency loss as a function of comparison of the distorted simulated image embedding and the predicted real embedding; and updating the vision feature layers based on the generated additional embedding consistency loss.

10. The method of claim 1, further comprising: distorting the simulated image, using one or more distortion techniques, to generate a distorted simulated image; pairing the distorted simulated image with the simulated image; processing the distorted simulated image, using the action machine learning model, to generate one or more distorted simulated image predicted action outputs, wherein processing the distorted simulated image comprises: generating a distorted simulated image embedding by processing the distorted simulated image using the vision feature layers; and processing the distorted simulated image embedding using the additional layers to generate the distorted simulated image predicted action outputs; in response to the pairing of the distorted simulated image with the simulated image: generating an additional embedding consistency loss as a function of comparison of the distorted simulated image embedding and the simulated image embedding; and updating the vision feature layers based on the generated additional embedding consistency loss.

11. The method of claim 1, wherein generating the predicted real image comprises: processing the simulated image using the simulation-to-real generator model to generate, as direct output from the simulation-to-real generator model, an original predicted real image; and distorting the original predicted real image, using one or more distortion techniques, to generate the predicted
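
The update described in claims 1 and 2 can be sketched with a toy model. Here the vision feature layers are stood in for by a single weight matrix and the additional layers by one weight vector; the linear layers, the mean squared comparison, the learning rate, and all names are hypothetical choices for illustration, not details taken from the patent. Because the embedding consistency loss is computed on the embeddings alone, its gradient reaches only the vision weights, and the additional layers are left unchanged.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(weights: Matrix, x: Vector) -> Vector:
    """Apply a linear layer without bias: returns weights @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def embedding_loss(vision: Matrix, x_sim: Vector, x_pred_real: Vector) -> float:
    """Mean squared difference between the two image embeddings (assumed metric)."""
    e_sim = matvec(vision, x_sim)
    e_real = matvec(vision, x_pred_real)
    return sum((a - b) ** 2 for a, b in zip(e_sim, e_real)) / len(e_sim)


def vision_only_step(
    vision: Matrix,
    head: Vector,
    x_sim: Vector,
    x_pred_real: Vector,
    lr: float = 0.1,
) -> Tuple[Matrix, Vector]:
    """One gradient step on the embedding consistency loss.

    For a linear vision layer the loss is mean((W @ (x_sim - x_real))**2),
    so dL/dW[i][j] = (2 / n) * (W @ diff)[i] * diff[j].
    The head (standing in for the additional layers) is returned untouched.
    """
    diff_in = [a - b for a, b in zip(x_sim, x_pred_real)]
    diff_emb = matvec(vision, diff_in)
    n = len(diff_emb)
    new_vision = [
        [w - lr * (2.0 / n) * d * dx for w, dx in zip(row, diff_in)]
        for row, d in zip(vision, diff_emb)
    ]
    return new_vision, head


if __name__ == "__main__":
    vision = [[1.0, 0.0], [0.0, 1.0]]   # toy vision feature layer
    head = [0.5, -0.5]                  # toy additional layer
    x_sim, x_pred_real = [1.0, 0.0], [0.0, 1.0]  # toy paired "images"

    before = embedding_loss(vision, x_sim, x_pred_real)
    vision, head = vision_only_step(vision, head, x_sim, x_pred_real)
    after = embedding_loss(vision, x_sim, x_pred_real)
    print(before, after, head)
```

In a real training setup an automatic differentiation framework would compute these gradients. The sketch only shows which parameters the embedding consistency loss is allowed to move.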

Assignees

Inventors

Classifications

  • using two or more images, e.g. averaging or subtraction

  • Training; Learning

  • using machine learning, e.g. neural networks

  • exterior to a vehicle by using sensors mounted on the vehicle

  • G06V10/774 (Primary): Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12333787B2 cover?
It relates to training a vision-based robotic action machine learning (ML) model with feature-level domain adaptation. Embedding consistency losses and/or action consistency losses are used during training to mitigate the reality gap.
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G06V10/774. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 17, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).