Stereo image processing
US-12026905-B2 · Jul 2, 2024 · US
US12552040B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12552040-B2 |
| Application number | US-202217839193-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 13, 2022 |
| Priority date | Jun 13, 2022 |
| Publication date | Feb 17, 2026 |
| Grant date | Feb 17, 2026 |
Official abstract text for this publication.
A method for training a neural network to perform 3D object manipulation is described. The method includes extracting features from each image of a synthetic stereo pair of images. The method also includes generating a low-resolution disparity image based on the features extracted from each image of the synthetic stereo pair of images. The method further includes generating, by the neural network, a feature map based on the low-resolution disparity image and one of the synthetic stereo pair of images. The method also includes manipulating an unknown object perceived from the feature map according to a perception prediction from a prediction head.
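The first two steps of the abstract (per-image feature extraction, then a low-resolution disparity image computed from those features) can be illustrated with a small sketch. This is not the patented network: the learned feature extractor is replaced by plain average pooling and the learned matching by a winner-take-all absolute-difference cost, purely to show the data flow. The names `downsample` and `low_res_disparity`, and the parameters `factor` and `max_disp`, are hypothetical.

```python
from typing import List

Image = List[List[float]]  # row-major grayscale image


def downsample(img: Image, factor: int = 2) -> Image:
    """Stand-in feature extractor (assumption): average-pool by `factor`."""
    h, w = len(img) // factor, len(img[0]) // factor
    return [
        [
            sum(img[y * factor + dy][x * factor + dx]
                for dy in range(factor) for dx in range(factor)) / factor ** 2
            for x in range(w)
        ]
        for y in range(h)
    ]


def low_res_disparity(left: Image, right: Image,
                      max_disp: int, factor: int = 2) -> List[List[int]]:
    """Winner-take-all disparity computed on the downsampled features.

    A point at column x in the left features is searched for at column
    x - d in the right features, for d in 0..max_disp (low-res pixels).
    """
    fl, fr = downsample(left, factor), downsample(right, factor)
    h, w = len(fl), len(fl[0])
    disp = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best_cost, best_d = float("inf"), 0
            # Candidates are limited so that x - d stays inside the image.
            for d in range(min(max_disp, x) + 1):
                cost = abs(fl[y][x] - fr[y][x - d])
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y][x] = best_d
    return disp
```

In the method described above, the resulting low-resolution disparity would then be combined with features from one image of the pair to form the feature map consumed by the prediction head; that learned stage is not reproduced here.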
Opening claim text (preview).
What is claimed is:
1. A method for training a neural network to perform 3D object manipulation, the method comprising: extracting features from each image of a synthetic stereo pair of images; generating a low-resolution disparity image based on the features extracted from each image of the synthetic stereo pair of images; generating, by the neural network, a feature map based on the low-resolution disparity image and features extracted from a single one of the synthetic stereo pair of images; and manipulating an unknown object perceived from the feature map according to a perception prediction from a prediction head.
2. The method of claim 1, in which manipulating comprises: generating, by the prediction head, oriented bounding box (OBB) predictions based on the feature map; producing grasp positions according to the OBB predictions; and grasping the unknown object based on the grasp positions.
3. The method of claim 1, further comprising: generating non-photorealistic simulation graphics; and generating the synthetic stereo pair of images from the non-photorealistic simulation graphics to provide a left image and a right image as the synthetic stereo pair of images.
4. The method of claim 1, further comprising generating a segmentation image based on the feature map.
5. The method of claim 1, further comprising detecting keypoints of objects in the synthetic stereo pair of images detected from the feature map.
6. The method of claim 1, further comprising generating a full resolution disparity image from the synthetic stereo pair of images based on the feature map.
7. The method of claim 1, further comprising planning the manipulating of the unknown object by a robot according to the perception prediction from the prediction head.
8. The method of claim 1, further comprising planning an object grasp by a robot according to object grasp predictions from video captured by the robot.
9. A non-transitory computer-readable medium having program code recorded thereon for training a neural network to perform 3D object manipulation, the program code being executed by a processor and comprising: program code to extract features from each image of a synthetic stereo pair of images; program code to generate a low-resolution disparity image based on the features extracted from each image of the synthetic stereo pair of images; program code to generate a feature map based on the low-resolution disparity image and features extracted from a single one of the synthetic stereo pair of images using the neural network; and program code to manipulate an unknown object perceived from the feature map according to a perception prediction from a prediction head.
10. The non-transitory computer-readable medium of claim 9, in which the program code to manipulate comprises: program code to generate, by the prediction head, oriented bounding box (OBB) predictions based on the feature map; program code to produce grasp positions according to the OBB predictions; and program code to grasp the unknown object based on the grasp positions.
11. The non-transitory computer-readable medium of claim 9, further comprising: program code to generate non-photorealistic simulation graphics; and program code to generate the synthetic stereo pair of images from the non-photorealistic simulation graphics to provide a left image and a right image as the synthetic stereo pair of images.
12. The non-transitory computer-readable medium of claim 9, further comprising program code to generate a segmentation image based on the feature map.
13. The non-transitory computer-readable medium of claim 9, further comprising program code to detect keypoints of objects in the synthetic stereo pair of images detected from the feature map.
14. The non-transitory computer-readable medium of claim 9, further comprising program code to generate a full resolution disparity image from the synthetic stereo pair of images based on the feature map.
15. The non-transitory computer-readable medium of claim 9, further comprising program code to plan the manipulating of the unknown object by a robot according to the perception prediction from the prediction head.
16. The non-transitory computer-readable medium of claim 9, further comprising program code to plan an object grasp by a robot according to object grasp predictions from video captured by the robot.
17. A system for training a neural network to perform 3D object manipulation, the system comprising: a stereo feature extraction module to extract features from each image of a synthetic stereo pair of images; a disparity image generation module to generate a low-resolution disparity image based on the features extracted from each image of the synthetic stereo pair of images; a feature map generation module to generate a feature map based on the low-resolution disparity image and features extracted from a single one of the synthetic stereo pair of images using the neural network; and a 3D object manipulation module to manipulate an unknown object perceived from the feature map according to a perception prediction from a prediction head.
18. The system of claim 17, in which the 3D object manipulation module is further to generate, by the prediction head, oriented bounding box (OBB) predictions based on the feature map, to produce grasp positions according to the OBB predictions, and to grasp the unknown object based on the grasp positions.
19. The system of claim 17, further to a planner module to plan the manipulating of the unknown object by a robot according to the perception prediction from the prediction head.
20. The system of claim 17, further to a planner module to plan an object grasp by a robot according to object grasp predictions from video captured by the robot.
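Claims 2, 10, and 18 describe producing grasp positions from oriented bounding box (OBB) predictions but do not say how. One simple possibility, shown only as an assumption, is a top-down parallel-jaw grasp that closes across the narrower horizontal side of the box. The `OBB` and `Grasp` types, the yaw-only rotation, and the `clearance` parameter are all hypothetical and do not come from the patent text.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class OBB:
    """Oriented bounding box rotated about the vertical (z) axis only (assumption)."""
    center: Tuple[float, float, float]
    size: Tuple[float, float, float]  # full extents along the box's local x, y, z
    yaw: float                        # rotation of local x about z, in radians


@dataclass
class Grasp:
    position: Tuple[float, float, float]  # target point on the top face of the box
    yaw: float                            # direction along which the jaws close
    width: float                          # required jaw opening


def grasp_from_obb(box: OBB, clearance: float = 0.01) -> Grasp:
    """Top-down grasp that closes across the narrower horizontal side."""
    sx, sy, sz = box.size
    if sx <= sy:
        close_yaw, width = box.yaw, sx
    else:
        close_yaw, width = box.yaw + math.pi / 2, sy
    # A parallel-jaw gripper is symmetric under a 180 degree turn,
    # so wrap the closing direction into [-pi/2, pi/2).
    close_yaw = (close_yaw + math.pi / 2) % math.pi - math.pi / 2
    cx, cy, cz = box.center
    return Grasp(position=(cx, cy, cz + sz / 2),
                 yaw=close_yaw,
                 width=width + 2 * clearance)
```

For example, `grasp_from_obb(OBB((0.5, 0.0, 0.1), (0.04, 0.3, 0.1), 0.3))` returns a grasp centred on the top face of the box, closing along the box's local x axis with a small clearance added to the jaw opening. A planner such as the one in claims 7 and 19 would take poses like this as input.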
using neural networks · CPC title
Artificial neural networks [ANN] · CPC title
Stereoscopic video; Stereoscopic image sequence · CPC title
Image segmentation from stereoscopic image signals · CPC title
Disparity calculation for image-based rendering · CPC title