Learning rigidity of dynamic scenes for three-dimensional scene flow estimation

US10929987B2 · US · B2

Patent metadata
Publication number: US-10929987-B2
Application number: US-201816052528-A
Country: US
Kind code: B2
Filing date: Aug 1, 2018
Priority date: Aug 16, 2017
Publication date: Feb 23, 2021
Grant date: Feb 23, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A neural network model receives color data for a sequence of images corresponding to a dynamic scene in three-dimensional (3D) space. Motion of objects in the image sequence results from a combination of a dynamic camera orientation and motion or a change in the shape of an object in the 3D space. The neural network model generates two components that are used to produce a 3D motion field representing the dynamic (non-rigid) part of the scene. The two components are information identifying dynamic and static portions of each image and the camera orientation. The dynamic portions of each image contain motion in the 3D space that is independent of the camera orientation. In other words, the motion in the 3D space (estimated 3D scene flow data) is separated from the motion of the camera.
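The separation described in the abstract (and recited in claim 1) can be illustrated with a small NumPy sketch. This is not the patented implementation. It assumes that a per-pixel depth map for the first image, a pinhole intrinsics matrix `K`, and a relative camera pose `(R, t)` are already available; in the patent the pose is produced by the neural network. The function names `viewpoint_flow` and `projected_scene_flow` are hypothetical. The sketch computes the 2D flow that camera motion alone would induce on a static scene, then subtracts it from a given optical flow field.

```python
import numpy as np

def viewpoint_flow(depth, K, R, t):
    """2D flow caused only by camera motion, assuming a static scene.

    depth: (H, W) depth of the first image (assumed input).
    K:     (3, 3) pinhole intrinsics (assumed input).
    R, t:  rotation (3, 3) and translation (3,) that map first-camera
           coordinates into second-camera coordinates.
    Returns an (H, W, 2) array of per-pixel (du, dv) displacements.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W, dtype=np.float64),
                       np.arange(H, dtype=np.float64))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)   # homogeneous pixels
    rays = pix @ np.linalg.inv(K).T                    # back-project to rays
    pts1 = rays * depth[..., None]                     # 3D points, camera 1
    pts2 = pts1 @ R.T + t                              # same points, camera 2
    proj = pts2 @ K.T                                  # project into image 2
    uv2 = proj[..., :2] / proj[..., 2:3]
    return uv2 - pix[..., :2]

def projected_scene_flow(optical_flow, depth, K, R, t):
    """Optical flow minus camera-induced flow (hypothetical helper)."""
    return optical_flow - viewpoint_flow(depth, K, R, t)
```

With an identity rotation and zero translation the camera-induced flow is zero everywhere, so the result equals the input optical flow. For a fronto-parallel scene at constant depth and a purely sideways camera translation, the camera-induced flow is a uniform horizontal shift.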

First claim

Opening claim text (preview).

What is claimed is:
1. A computer-implemented method, comprising: receiving color data for a sequence of images corresponding to a dynamic scene in three-dimensional (3D) space including a first image and a second image, wherein the first image is captured from a first viewpoint and the second image is captured from a second viewpoint; processing the color data by layers of a neural network model to generate segmentation data indicating a portion of the second image where a first object changes position or shape relative to a position or shape of the first object in the first image; processing the color data by the layers of the neural network model to produce a pose of the second viewpoint, the pose including a position and orientation in the 3D space; warping the pose to generate 2D viewpoint motion flow data for the second image; and subtracting the 2D viewpoint motion flow data from two-dimensional optical flow data for the sequence of images to produce estimated projected 3D scene flow data for the second image.
2. The computer-implemented method of claim 1, further comprising refining the pose based on the two-dimensional optical flow data.
3. The computer-implemented method of claim 1, further comprising refining the segmentation data based on the two-dimensional optical flow data.
4. The computer-implemented method of claim 1, further comprising: receiving depth data for the sequence of images; and processing the depth data with the color data to generate the segmentation data.
5. The computer-implemented method of claim 1, further comprising: processing the sequence of images to extract depth data; and processing the depth data with the color data to generate the segmentation data.
6. The computer-implemented method of claim 1, wherein the segmentation data is a mask comprising a single bit for each pixel in the second image.
7. The computer-implemented method of claim 1, wherein the layers of the neural network model comprise one or more convolutional layers followed by one or more deconvolutional layers.
8. The computer-implemented method of claim 1, wherein the estimated projected 3D scene flow data is used for one or more of robot manipulation, dynamic scene reconstruction, autonomous driving, action recognition, and video analysis.
9. A computer-implemented method, comprising: training a neural network model using a dataset including a first image sequence for viewpoint motion and a static scene, a second image sequence for scene motion and a static viewpoint, and a third image sequence for simultaneous viewpoint motion and scene motion; receiving color data for a sequence of images corresponding to a dynamic scene in three-dimensional (3D) space including a first image and a second image, wherein the first image is captured from a first viewpoint and the second image is captured from a second viewpoint; and processing the color data by layers of the neural network model to generate segmentation data indicating a portion of the second image where a first object changes position or shape relative to a position or shape of the first object in the first image.
10. The computer-implemented method of claim 9, wherein a portion of the dataset includes a real background scene and synthetic foreground objects.
11. A system, comprising: a processor configured to: receive color data for a sequence of images corresponding to a dynamic scene in three-dimensional (3D) space including a first image and a second image, wherein the first image is captured from a first viewpoint and the second image is captured from a second viewpoint; process the color data by layers of a neural network model to generate segmentation data indicating a portion of the second image where a first object changes position or shape relative to a position or shape of the first object in the first image; process the color data by the layers of the neural network model to produce a pose of the second viewpoint, the pose including a position and orientation in the 3D space; warp the pose to generate 2D viewpoint motion flow data for the second image; and subtract the 2D viewpoint motion flow data from two-dimensional optical flow data for the sequence of images to produce estimated projected 3D scene flow data for the second image.
12. The system of claim 11, wherein the processor unit is further configured to refine the pose based on the two-dimensional optical flow data.
13. The system of claim 11, wherein the processor is further configured to refine the segmentation data based on the two-dimensional optical flow data.
14. The system of claim 11, wherein the processor is further configured to: receive depth data for the sequence of images; and process the depth data with the color data to generate the segmentation data.
15. The system of claim 11, wherein the segmentation data is a mask comprising a single bit for each pixel in the second image.
16. A system comprising: a processor configured to: train a neural network model using a dataset including a first image sequence for viewpoint motion and a static scene, a second image sequence for scene motion and a static viewpoint, and a third image sequence for simultaneous viewpoint motion and scene motion; receive color data for a sequence of images corresponding to a dynamic scene in three-dimensional (3D) space including a first image and a second image, wherein the first image is captured from a first viewpoint and the second image is captured from a second viewpoint; and process the color data by layers of the neural network model to generate segmentation data indicating a portion of the second image where a first object changes position or shape relative to a position or shape of the first object in the first image.
17. A non-transitory, computer-readable storage medium storing instructions that, when executed by a processor, cause the processor unit to: receive color data for a sequence of images corresponding to a dynamic scene in three-dimensional (3D) space including a first image and a second image, wherein the first image is captured from a first viewpoint and the second image is captured from a second viewpoint; process the color data by layers of a neural network model to generate segmentation data indicating a portion of the second image where a first object changes position or shape relative to a position or shape of the first object in the first image; process the color data by the layers of the neural network model to produce a pose of the second viewpoint, the pose including a position and orientation in the 3D space; warp the pose to generate 2D viewpoint motion flow data for the second image; and subtract the 2D viewpoint motion flow data from two-dimensional optical flow data for the sequence of images to produce estimated projected 3D scene flow data for the second image.
18. The non-transitory, computer-readable storage medium of claim 17, further comprising refining the pose based on the two-dimensional optical flow data.
19. The non-transitory, computer-readable storage medium of claim 17, further comprising refining the segmentation data based on the two-dimensional optical flow data.
20. A non-transitory, computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to: train a neural network model using a dataset including a first image sequence for viewpoint motion and a static scene, a second image sequence for scene motion and a static viewpoint, and a third image sequenc
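
Claim 6 describes the segmentation data as a mask with a single bit for each pixel. As a rough illustration only, the NumPy sketch below thresholds a per-pixel "dynamic" probability map and packs the result to one bit per pixel. The probability map, the 0.5 threshold, and the names `to_bit_mask` and `unpack_mask` are assumptions made for this example; the claims do not say how the network output is converted to the mask.

```python
import numpy as np

def to_bit_mask(prob, threshold=0.5):
    """Threshold an (H, W) probability map and pack it to 1 bit per pixel.

    Returns the boolean mask and a flat uint8 array holding 8 pixels per byte.
    The threshold value is an illustrative assumption.
    """
    mask = prob > threshold          # True marks a pixel treated as dynamic
    packed = np.packbits(mask)       # flattens, pads the last byte with zeros
    return mask, packed

def unpack_mask(packed, shape):
    """Recover the (H, W) boolean mask from its packed form."""
    n = shape[0] * shape[1]
    return np.unpackbits(packed, count=n).reshape(shape).astype(bool)
```

A 3 x 5 mask (15 pixels) packs into 2 bytes, and `unpack_mask(packed, (3, 5))` restores the original boolean array.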

Assignees

Nvidia Corp

Inventors

Classifications

  • Terrestrial scenes (scenes under surveillance with static cameras G06V20/52; scenes perceived from the exterior of a vehicle G06V20/56; scenes perceived from the interior of a vehicle G06V20/59)

  • Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting

  • Three-dimensional [3D] objects

  • Determining representative reference patterns, e.g. averaging or distorting patterns; Generating dictionaries

  • Using specific electronic processors

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10929987B2 cover?
A neural network model receives color data for a sequence of images corresponding to a dynamic scene in three-dimensional (3D) space. Motion of objects in the image sequence results from a combination of a dynamic camera orientation and motion or a change in the shape of an object in the 3D space. The neural network model generates two components that are used to produce a 3D motion field repre…
Who is the assignee on this patent?
Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification G06T7/254. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 23, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).