Mitigating reality gap through feature-level domain adaptation in training of vision-based robot action model

US12333787B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12333787-B2
Application number	US-202217986428-A
Country	US
Kind code	B2
Filing date	Nov 14, 2022
Priority date	Nov 16, 2021
Publication date	Jun 17, 2025
Grant date	Jun 17, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Implementations disclosed herein relate to mitigating the reality gap through feature-level domain adaptation in training of a vision-based robotic action machine learning (ML) model. Implementations mitigate the reality gap through utilization of embedding consistency losses and/or action consistency losses during training of the action ML model.
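
To make the two loss terms named in the abstract concrete, here is a minimal sketch in plain Python. It assumes a mean squared difference as the comparison function and a simple sum over control heads; the abstract specifies neither choice. All function names and the head names `arm` and `base` are hypothetical.

```python
from typing import Dict, Sequence


def mean_squared_difference(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean of squared element-wise differences (an assumed comparison function)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def embedding_consistency_loss(
    sim_embedding: Sequence[float], pred_real_embedding: Sequence[float]
) -> float:
    """Penalize disagreement between the embeddings of a paired image."""
    return mean_squared_difference(sim_embedding, pred_real_embedding)


def action_consistency_loss(
    sim_actions: Dict[str, Sequence[float]],
    pred_real_actions: Dict[str, Sequence[float]],
) -> float:
    """Compare action outputs head by head, then sum the per-head losses."""
    return sum(
        mean_squared_difference(sim_actions[head], pred_real_actions[head])
        for head in sim_actions
    )


# Hypothetical values for one (simulated image, predicted real image) pair.
embedding_loss = embedding_consistency_loss([0.0, 2.0], [2.0, 2.0])
action_loss = action_consistency_loss(
    {"arm": [1.0, 0.0], "base": [0.5]},
    {"arm": [0.0, 0.0], "base": [0.5]},
)
print(embedding_loss, action_loss)
```

Both losses are zero when the model responds identically to the simulated image and its predicted real counterpart, which is the behavior the training is meant to encourage.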

First claim

Opening claim text (preview).

What is claimed is:

1. A method implemented by one or more processors, the method comprising: generating a predicted real image based on processing a simulated image using a simulation-to-real generator model, wherein the simulated image is generated by a robotic simulator during performance of a robotic task by a simulated robot of the robotic simulator; in response to the predicted real image being generated based on processing the simulated image using the simulation-to-real generator model: pairing the simulated image with the predicted real image; processing the simulated image, using an action machine learning model being trained for use in controlling a robot to perform the robotic task, to generate one or more simulated image predicted action outputs, wherein processing the simulated image comprises: generating a simulated image embedding by processing the simulated image using vision feature layers of the action machine learning model; and processing the simulated image embedding using additional layers of the action machine learning model to generate the simulated image predicted action outputs; processing the predicted real image, using the action machine learning model, to generate one or more predicted real image predicted action outputs, wherein processing the predicted real image comprises: generating a predicted real image embedding by processing the predicted real image using the vision feature layers; and processing the predicted real image embedding using the additional layers to generate the real image predicted action outputs; in response to the pairing of the simulated image with the predicted real image: generating an embedding consistency loss as a function of comparison of the simulated image embedding and the predicted real image embedding; and updating the vision feature layers based on the generated embedding consistency loss.

2. The method of claim 1, wherein updating the vision feature layers based on the generated embedding consistency loss comprises: backpropagating the loss across the vision feature layers without backpropagating the loss across the additional layers.

3. The method of claim 1, further comprising, in response to the pairing of the simulated image with the predicted real image: generating one or more action consistency losses as a function of one or more action output comparisons, each of the action output comparisons being between a corresponding one of the simulated image predicted action outputs and a corresponding one of the predicted real image predicted action outputs; and updating the vision feature layers further based on the one or more action consistency losses.

4. The method of claim 3, wherein the additional layers comprise a first control head and a second control head, wherein the simulated image predicted action outputs comprise a first simulated image predicted action output generated using the first control head and a second simulated image predicted action output generated using the second control head, and wherein the predicted real image predicted action outputs comprise a first predicted real image predicted action output generated using the first control head and a second predicted real image predicted action output generated using the second control head.

5. The method of claim 4, wherein generating the action consistency losses comprises: generating a first action consistency loss based on comparison of the first simulated image predicted action output and the first predicted real image predicted action output; generating a second action consistency loss based on comparison of the second simulated image predicted action output and the second predicted real image predicted action output; and generating the action consistency loss as a function of the first action consistency loss and the second action consistency loss.

6. The method of claim 5, further comprising, in response to the pairing of the simulated image with the predicted real image: backpropagating the first action consistency loss across the first control head; and backpropagating the second action consistency loss across the second control head; wherein updating the vision feature layers further based on the one or more action consistency losses comprises: backpropagating residuals, of the first action consistency loss and the second action consistency loss, across the vision feature layers.

7. The method of claim 5, wherein the first simulated image predicted action output and the first predicted real image predicted action output each define a corresponding first set of values for controlling a first robotic component; and wherein the second simulated image predicted action output and the second predicted real image predicted action output each define a corresponding second set of values for controlling a second robotic component.

8. The method of claim 7, wherein the first robotic component is one of a robot arm, a robot end effector, a robot base, or a robot head; and wherein the second robotic component is another one of the robot arm, the robot end effector, the robot base, or the robot head.

9. The method of claim 1, further comprising: distorting the simulated image, using one or more distortion techniques, to generate a distorted simulated image; pairing the distorted simulated image with the predicted real image; processing the distorted simulated image, using the action machine learning model, to generate one or more distorted simulated image predicted action outputs, wherein processing the distorted simulated image comprises: generating a distorted simulated image embedding by processing the distorted simulated image using the vision feature layers; and processing the distorted simulated image embedding using the additional layers to generate the distorted simulated image predicted action outputs; in response to the pairing of the distorted simulated image with the predicted real image: generating an additional embedding consistency loss as a function of comparison of the distorted simulated image embedding and the predicted real embedding; and updating the vision feature layers based on the generated additional embedding consistency loss.

10. The method of claim 1, further comprising: distorting the simulated image, using one or more distortion techniques, to generate a distorted simulated image; pairing the distorted simulated image with the simulated image; processing the distorted simulated image, using the action machine learning model, to generate one or more distorted simulated image predicted action outputs, wherein processing the distorted simulated image comprises: generating a distorted simulated image embedding by processing the distorted simulated image using the vision feature layers; and processing the distorted simulated image embedding using the additional layers to generate the distorted simulated image predicted action outputs; in response to the pairing of the distorted simulated image with the simulated image: generating an additional embedding consistency loss as a function of comparison of the distorted simulated image embedding and the simulated image embedding; and updating the vision feature layers based on the generated additional embedding consistency loss.

11. The method of claim 1, wherein generating the predicted real image comprises: processing the simulated image using the simulation-to-real generator model to generate, as direct output from the simulation-to-real generator model, an original predicted real image; and distorting the original predicted real image, using one or more distortion techniques, to generate the predicted
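
The update described in claims 1 and 2 can be illustrated with a toy training step. The sketch below assumes a single linear layer stands in for the vision feature layers, a mean squared difference as the embedding comparison, and plain gradient descent with a hand-derived gradient. None of these specifics come from the claims, and every name here is hypothetical. A real system would more likely rely on an automatic differentiation framework.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w by vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def embedding_loss(vision_w: Matrix, sim_image: Vector, pred_real_image: Vector) -> float:
    """Mean squared difference between the two embeddings of a paired image."""
    e_sim = matvec(vision_w, sim_image)
    e_real = matvec(vision_w, pred_real_image)
    return sum((a - b) ** 2 for a, b in zip(e_sim, e_real)) / len(e_sim)


def update_vision_layers(
    vision_w: Matrix, sim_image: Vector, pred_real_image: Vector, lr: float = 0.1
) -> Matrix:
    """One gradient step on the vision weights only; no head weights are involved."""
    e_sim = matvec(vision_w, sim_image)
    e_real = matvec(vision_w, pred_real_image)
    n = len(e_sim)
    image_diff = [s - r for s, r in zip(sim_image, pred_real_image)]
    # d(loss)/d(W[i][j]) = (2 / n) * (e_sim[i] - e_real[i]) * image_diff[j]
    return [
        [
            w_ij - lr * (2.0 / n) * (e_sim[i] - e_real[i]) * image_diff[j]
            for j, w_ij in enumerate(row)
        ]
        for i, row in enumerate(vision_w)
    ]


# Toy stand-ins: flattened "images" and small weight matrices.
vision_w = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for the vision feature layers
head_w = [[1.0, -1.0]]               # stand-in for the additional layers
head_w_before = [row[:] for row in head_w]
sim_image = [1.0, 0.0]
pred_real_image = [0.5, 0.0]         # stand-in for the generator's output

loss_before = embedding_loss(vision_w, sim_image, pred_real_image)
vision_w = update_vision_layers(vision_w, sim_image, pred_real_image)
loss_after = embedding_loss(vision_w, sim_image, pred_real_image)
print(loss_before, loss_after)
```

Because the embedding consistency loss depends only on the vision weights, the step leaves `head_w` untouched, which mirrors the restriction in claim 2 that the loss is not backpropagated across the additional layers.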

Assignees

Google LLC

Inventors

Classifications

G06T5/50
using two or more images, e.g. averaging or subtraction · CPC title
G06T2207/20081
Training; Learning · CPC title
G06T5/60
using machine learning, e.g. neural networks · CPC title
G06V20/56
exterior to a vehicle by using sensors mounted on the vehicle · CPC title
G06V10/774 (Primary)
Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

Patent family

Related publications grouped by family.

Patent family 86323896

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12333787B2 cover? Implementations disclosed herein relate to mitigating the reality gap through feature-level domain adaptation in training of a vision-based robotic action machine learning (ML) model. Implementations mitigate the reality gap through utilization of embedding consistency losses and/or action consistency losses during training of the action ML model.
Who is the assignee on this patent? Google LLC
What technology area does this patent fall under? Primary CPC classification G06V10/774. Mapped technology areas include Physics.
When was this patent published? Publication date Jun 17, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb? We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Mitigating reality gap through training a simulation-to-real model using a vision-based robot task model

Generating simulated training examples for training of machine learning model used for robot control

Mitigating reality gap through modification of simulated state data of robotic simulator

Mitigating reality gap through optimization of simulated hardware parameter(s) of simulated robot

Mitigating reality gap through simulating compliant control and/or compliant contact in robotic simulator

Using simulation and domain adaptation for robotic control
