System and method for unknown object manipulation from pure synthetic stereo data

US2023398692A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2023398692-A1
Application number: US-202217839193-A
Country: US
Kind code: A1
Filing date: Jun 13, 2022
Priority date: Jun 13, 2022
Publication date: Dec 14, 2023
Grant date: not listed

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for training a neural network to perform 3D object manipulation is described. The method includes extracting features from each image of a synthetic stereo pair of images. The method also includes generating a low-resolution disparity image based on the features extracted from each image of the synthetic stereo pair of images. The method further includes generating, by the neural network, a feature map based on the low-resolution disparity image and one of the synthetic stereo pair of images. The method also includes manipulating an unknown object perceived from the feature map according to a perception prediction from a prediction head.
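The first two steps of the abstract (per-image feature extraction, then a low-resolution disparity image) can be illustrated with a toy example. The sketch below is not the patented network: `extract_features` is a hypothetical stand-in that average-pools one scanline, and `low_res_disparity` uses plain winner-take-all matching where the patent describes learned components. It only shows how matching features between the left and right images yields a coarse disparity value per position.

```python
from typing import List


def extract_features(row: List[float], factor: int) -> List[float]:
    """Hypothetical stand-in for a learned extractor.

    Average-pools one image scanline by `factor`, so the output is a
    lower-resolution feature row.
    """
    usable = len(row) - len(row) % factor
    return [sum(row[i:i + factor]) / factor for i in range(0, usable, factor)]


def low_res_disparity(left: List[float], right: List[float], max_disp: int) -> List[int]:
    """Winner-take-all matching (an assumption, not the patent's method).

    For each left feature at position x, pick the shift d in [0, max_disp]
    whose right feature at x - d is closest in absolute difference.
    """
    disparities = []
    for x, left_value in enumerate(left):
        best_d, best_cost = 0, float("inf")
        for d in range(0, min(max_disp, x) + 1):
            cost = abs(left_value - right[x - d])
            if cost < best_cost:
                best_d, best_cost = d, cost
        disparities.append(best_d)
    return disparities


if __name__ == "__main__":
    # A bright object appears 4 pixels further right in the left image.
    right_row = [0, 0, 0, 0, 9, 9, 0, 0, 0, 0, 0, 0]
    left_row = [0, 0, 0, 0, 0, 0, 0, 0, 9, 9, 0, 0]
    left_feats = extract_features(left_row, factor=2)
    right_feats = extract_features(right_row, factor=2)
    print(low_res_disparity(left_feats, right_feats, max_disp=3))
```

At the object's position the low-resolution disparity is 2, which corresponds to the 4-pixel shift at full resolution after pooling by a factor of 2. Textureless background positions are ambiguous under this simple matcher, which is one reason a learned approach is attractive.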

First claim

Opening claim text (preview).

What is claimed is:

1. A method for training a neural network to perform 3D object manipulation, the method comprising: extracting features from each image of a synthetic stereo pair of images; generating a low-resolution disparity image based on the features extracted from each image of the synthetic stereo pair of images: generating, by the neural network, a feature map based on the low-resolution disparity image and one of the synthetic stereo pair of images; and manipulating an unknown object perceived from the feature map according to a perception prediction from a prediction head.

2. The method of claim 1, in which manipulating comprises: generating, by the prediction head, oriented bounding box (OBB) predictions based on the feature map; producing grasp positions according to the OBB predictions; and grasping the unknown object based on the grasp positions.

3. The method of claim 1, further comprising: generating non-photorealistic simulation graphics; and generating the synthetic stereo pair of images from the non-photorealistic simulation graphics to provide a left image and a right image as the synthetic stereo pair of images.

4. The method of claim 1, further comprising generating a segmentation image based on the feature map.

5. The method of claim 1, further comprising detecting keypoints of objects in the synthetic stereo pair of images detected from the feature map.

6. The method of claim 1, further comprising generating a full resolution disparity image from the synthetic stereo pair of images based on the feature map.

7. The method of claim 1, further comprising planning the manipulating of the unknown object by a robot according to the perception prediction from the prediction head.

8. The method of claim 1, further comprising planning an object grasp by a robot according to object grasp predictions from video captured by the robot.

9. A non-transitory computer-readable medium having program code recorded thereon for training a neural network to perform 3D object manipulation, the program code being executed by a processor and comprising: program code to extract features from each image of a synthetic stereo pair of images; program code to generate a low-resolution disparity image based on the features extracted from each image of the synthetic stereo pair of images: program code to generate a feature map based on the low-resolution disparity image and one of the synthetic stereo pair of images using the neural network; and program code to manipulate an unknown object perceived from the feature map according to a perception prediction from a prediction head.

10. The non-transitory computer-readable medium of claim 9, in which the program code to manipulate comprises: program code to generate, by the prediction head, oriented bounding box (OBB) predictions based on the feature map; program code to produce grasp positions according to the OBB predictions; and program code to grasp the unknown object based on the grasp positions.

11. The non-transitory computer-readable medium of claim 9, further comprising: program code to generate non-photorealistic simulation graphics; and program code to generate the synthetic stereo pair of images from the non-photorealistic simulation graphics to provide a left image and a right image as the synthetic stereo pair of images.

12. The non-transitory computer-readable medium of claim 9, further comprising program code to generate a segmentation image based on the feature map.

13. The non-transitory computer-readable medium of claim 9, further comprising program code to detect keypoints of objects in the synthetic stereo pair of images detected from the feature map.

14. The non-transitory computer-readable medium of claim 9, further comprising program code to generate a full resolution disparity image from the synthetic stereo pair of images based on the feature map.

15. The non-transitory computer-readable medium of claim 9, further comprising program code to plan the manipulating of the unknown object by a robot according to the perception prediction from the prediction head.

16. The non-transitory computer-readable medium of claim 9, further comprising program code to plan an object grasp by a robot according to object grasp predictions from video captured by the robot.

17. A system for training a neural network to perform 3D object manipulation, the system comprising: a stereo feature extraction module to extract features from each image of a synthetic stereo pair of images; a disparity image generation module to generate a low-resolution disparity image based on the features extracted from each image of the synthetic stereo pair of images: a feature map generation module to generate a feature map based on the low-resolution disparity image and one of the synthetic stereo pair of images using the neural network; and a 3D object manipulation module to manipulate an unknown object perceived from the feature map according to a perception prediction from a prediction head.

18. The system of claim 17, in which the 3D object manipulation module is further to generate, by the prediction head, oriented bounding box (OBB) predictions based on the feature map, to produce grasp positions according to the OBB predictions, and to grasp the unknown object based on the grasp positions.

19. The system of claim 17, further a planner module to plan the manipulating of the unknown object by a robot according to the perception prediction from the prediction head.

20. The system of claim 17, further a planner module to plan an object grasp by a robot according to object grasp predictions from video captured by the robot.
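The claims that derive grasp positions from oriented bounding box (OBB) predictions do not state, on this page, how the grasp positions are computed. As a rough illustration only, the sketch below assumes one simple heuristic: pinch the box across its narrowest axis. The `OrientedBoundingBox` type, its field names, and `grasp_from_obb` are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class OrientedBoundingBox:
    """Hypothetical OBB representation (not from the patent)."""
    center: Vec3                    # box center in the world frame
    axes: Tuple[Vec3, Vec3, Vec3]   # unit vectors of the box's local axes
    half_extents: Vec3              # half-size along each local axis


def grasp_from_obb(obb: OrientedBoundingBox) -> Tuple[Vec3, Vec3, float]:
    """Return two contact points and the gripper opening they require.

    Assumed heuristic: place the two fingertips on opposite faces of the
    box along its narrowest axis, so the gripper opens as little as possible.
    """
    narrowest = min(range(3), key=lambda k: obb.half_extents[k])
    axis = obb.axes[narrowest]
    half = obb.half_extents[narrowest]
    contact_a = tuple(c - half * u for c, u in zip(obb.center, axis))
    contact_b = tuple(c + half * u for c, u in zip(obb.center, axis))
    return contact_a, contact_b, 2.0 * half


if __name__ == "__main__":
    box = OrientedBoundingBox(
        center=(1.0, 2.0, 0.5),
        axes=((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)),
        half_extents=(0.10, 0.03, 0.05),
    )
    print(grasp_from_obb(box))
```

A real planner would also have to check gripper width limits, approach direction, and collisions. None of those checks are modeled here.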

Assignees

Toyota Res Inst Inc, Toyota Motor Co Ltd

Inventors

Classifications

  • B25J9/1697 · Primary

    Vision controlled systems · CPC title

  • Adjusting depth or disparity · CPC title

  • from three-dimensional [3D] object models, e.g. computer-generated stereoscopic image signals · CPC title

  • Segmentation; Edge detection (motion-based segmentation G06T7/215) · CPC title

  • using feature-based methods · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2023398692A1 cover?
A method for training a neural network to perform 3D object manipulation is described. The method includes extracting features from each image of a synthetic stereo pair of images. The method also includes generating a low-resolution disparity image based on the features extracted from each image of the synthetic stereo pair of images. The method further includes generating, by the neural netwo…
Who is the assignee on this patent?
Toyota Res Inst Inc, Toyota Motor Co Ltd
What technology area does this patent fall under?
Primary CPC classification B25J9/1697. Mapped technology areas include Operations & Transport.
When was this patent published?
Publication date: Dec 14, 2023 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).