Who is the assignee on this patent?

Toyota Res Inst Inc, Toyota Motor Co Ltd

What technology area does this patent fall under?

Primary CPC classification B25J9/1697. Mapped technology areas include Operations & Transport.

When was this patent published?

Publication date Feb 17, 2026 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

System and method for unknown object manipulation from pure synthetic stereo data

US12552040B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12552040-B2
Application number	US-202217839193-A
Country	US
Kind code	B2
Filing date	Jun 13, 2022
Priority date	Jun 13, 2022
Publication date	Feb 17, 2026
Grant date	Feb 17, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for training a neural network to perform 3D object manipulation is described. The method includes extracting features from each image of a synthetic stereo pair of images. The method also includes generating a low-resolution disparity image based on the features extracted from each image of the synthetic stereo pair of images. The method further includes generating, by the neural network, a feature map based on the low-resolution disparity image and one of the synthetic stereo pair of images. The method also includes manipulating an unknown object perceived from the feature map according to a perception prediction from a prediction head.
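The first two steps in the abstract (per-image feature extraction, then a low-resolution disparity image) can be illustrated with a toy sketch. This is not the patented network: the "features" below are plain average-pooled intensities standing in for learned features, and the matching cost, the `pool` and `low_res_disparity` names, and the `factor` and `max_disp` parameters are all assumptions made for illustration.

```python
# Toy stand-in for "extract features from each image" and "generate a
# low-resolution disparity image". All names and parameters are hypothetical.

def pool(image, factor):
    """Average-pool a 2D list of floats by `factor` (stand-in feature extractor)."""
    rows, cols = len(image) // factor, len(image[0]) // factor
    return [
        [
            sum(image[r * factor + i][c * factor + j]
                for i in range(factor) for j in range(factor)) / factor ** 2
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def low_res_disparity(left, right, factor=2, max_disp=4):
    """For each pooled left pixel, pick the shift with the lowest matching cost."""
    feat_left, feat_right = pool(left, factor), pool(right, factor)
    disparity = []
    for r in range(len(feat_left)):
        row = []
        for c in range(len(feat_left[0])):
            best_d, best_cost = 0, float("inf")
            # A left pixel at column c can only match right columns c, c-1, ..., 0.
            for d in range(min(max_disp, c) + 1):
                cost = abs(feat_left[r][c] - feat_right[r][c - d])
                if cost < best_cost:
                    best_d, best_cost = d, cost
            row.append(best_d)
        disparity.append(row)
    return disparity
```

Calling `low_res_disparity(left, right)` on two equally sized 2D lists returns a disparity grid at `1/factor` of the input resolution, with values expressed in pooled pixels. Flat, textureless regions are ambiguous under this simple cost, which is one reason a real system would rely on learned features instead.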

First claim

Opening claim text (preview).

What is claimed is:
1. A method for training a neural network to perform 3D object manipulation, the method comprising: extracting features from each image of a synthetic stereo pair of images; generating a low-resolution disparity image based on the features extracted from each image of the synthetic stereo pair of images; generating, by the neural network, a feature map based on the low-resolution disparity image and features extracted from a single one of the synthetic stereo pair of images; and manipulating an unknown object perceived from the feature map according to a perception prediction from a prediction head.
2. The method of claim 1, in which manipulating comprises: generating, by the prediction head, oriented bounding box (OBB) predictions based on the feature map; producing grasp positions according to the OBB predictions; and grasping the unknown object based on the grasp positions.
3. The method of claim 1, further comprising: generating non-photorealistic simulation graphics; and generating the synthetic stereo pair of images from the non-photorealistic simulation graphics to provide a left image and a right image as the synthetic stereo pair of images.
4. The method of claim 1, further comprising generating a segmentation image based on the feature map.
5. The method of claim 1, further comprising detecting keypoints of objects in the synthetic stereo pair of images detected from the feature map.
6. The method of claim 1, further comprising generating a full resolution disparity image from the synthetic stereo pair of images based on the feature map.
7. The method of claim 1, further comprising planning the manipulating of the unknown object by a robot according to the perception prediction from the prediction head.
8. The method of claim 1, further comprising planning an object grasp by a robot according to object grasp predictions from video captured by the robot.
9. A non-transitory computer-readable medium having program code recorded thereon for training a neural network to perform 3D object manipulation, the program code being executed by a processor and comprising: program code to extract features from each image of a synthetic stereo pair of images; program code to generate a low-resolution disparity image based on the features extracted from each image of the synthetic stereo pair of images; program code to generate a feature map based on the low-resolution disparity image and features extracted from a single one of the synthetic stereo pair of images using the neural network; and program code to manipulate an unknown object perceived from the feature map according to a perception prediction from a prediction head.
10. The non-transitory computer-readable medium of claim 9, in which the program code to manipulate comprises: program code to generate, by the prediction head, oriented bounding box (OBB) predictions based on the feature map; program code to produce grasp positions according to the OBB predictions; and program code to grasp the unknown object based on the grasp positions.
11. The non-transitory computer-readable medium of claim 9, further comprising: program code to generate non-photorealistic simulation graphics; and program code to generate the synthetic stereo pair of images from the non-photorealistic simulation graphics to provide a left image and a right image as the synthetic stereo pair of images.
12. The non-transitory computer-readable medium of claim 9, further comprising program code to generate a segmentation image based on the feature map.
13. The non-transitory computer-readable medium of claim 9, further comprising program code to detect keypoints of objects in the synthetic stereo pair of images detected from the feature map.
14. The non-transitory computer-readable medium of claim 9, further comprising program code to generate a full resolution disparity image from the synthetic stereo pair of images based on the feature map.
15. The non-transitory computer-readable medium of claim 9, further comprising program code to plan the manipulating of the unknown object by a robot according to the perception prediction from the prediction head.
16. The non-transitory computer-readable medium of claim 9, further comprising program code to plan an object grasp by a robot according to object grasp predictions from video captured by the robot.
17. A system for training a neural network to perform 3D object manipulation, the system comprising: a stereo feature extraction module to extract features from each image of a synthetic stereo pair of images; a disparity image generation module to generate a low-resolution disparity image based on the features extracted from each image of the synthetic stereo pair of images; a feature map generation module to generate a feature map based on the low-resolution disparity image and features extracted from a single one of the synthetic stereo pair of images using the neural network; and a 3D object manipulation module to manipulate an unknown object perceived from the feature map according to a perception prediction from a prediction head.
18. The system of claim 17, in which the 3D object manipulation module is further to generate, by the prediction head, oriented bounding box (OBB) predictions based on the feature map, to produce grasp positions according to the OBB predictions, and to grasp the unknown object based on the grasp positions.
19. The system of claim 17, further to a planner module to plan the manipulating of the unknown object by a robot according to the perception prediction from the prediction head.
20. The system of claim 17, further to a planner module to plan an object grasp by a robot according to object grasp predictions from video captured by the robot.
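
Claims 2, 10, and 18 recite producing grasp positions from oriented bounding box (OBB) predictions. The claims do not say how that mapping is computed, so the following is only a minimal sketch of one plausible heuristic: close a parallel-jaw gripper across the narrowest side of the box. The `OBB` and `Grasp` types, the `grasp_from_obb` function, and the `clearance` and `max_opening` defaults are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class OBB:
    center: tuple   # (x, y, z) box center, in metres
    axes: tuple     # three unit vectors, one per box axis
    extents: tuple  # full side length along each axis, in metres


@dataclass
class Grasp:
    position: tuple       # where the gripper is centered
    closing_axis: tuple   # direction along which the jaws close
    opening_width: float  # jaw opening needed before closing, in metres


def grasp_from_obb(box, clearance=0.01, max_opening=0.08):
    """Return a Grasp across the narrowest box side, or None if it does not fit."""
    narrowest = min(range(3), key=lambda k: box.extents[k])
    width = box.extents[narrowest] + 2 * clearance
    if width > max_opening:
        return None  # assumed gripper cannot open wide enough
    return Grasp(box.center, box.axes[narrowest], width)
```

A planner could call `grasp_from_obb` once per predicted box and skip boxes for which it returns `None`. A real system would also need collision checks and an approach direction, which this sketch leaves out.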

Assignees

Inventors

Classifications

G06V10/82 · using neural networks
G06T2207/20084 · Artificial neural networks [ANN]
G06T2207/10021 · Stereoscopic video; Stereoscopic image sequence
H04N2013/0092 · Image segmentation from stereoscopic image signals
G06T2207/20228 · Disparity calculation for image-based rendering

Patent family

Related publications grouped by family.

View patent family 89077871

External sources

Related patents

Stereo image processing

Adaptive face depth image generation

Embeddings + svm for teaching traversability

Domain Restriction of Neural Networks Through Synthetic Data Pre-Training

Data synthesis for autonomous control systems

Automatic device and vehicle pairing via detected emitted signals