What technology area does this patent fall under?

Primary CPC classification G06T7/254. Mapped technology areas include Physics.

When was this patent published?

Publication date: Feb 23, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Learning rigidity of dynamic scenes for three-dimensional scene flow estimation

US10929987B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10929987-B2
Application number	US-201816052528-A
Country	US
Kind code	B2
Filing date	Aug 1, 2018
Priority date	Aug 16, 2017
Publication date	Feb 23, 2021
Grant date	Feb 23, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A neural network model receives color data for a sequence of images corresponding to a dynamic scene in three-dimensional (3D) space. Motion of objects in the image sequence results from a combination of a dynamic camera orientation and motion or a change in the shape of an object in the 3D space. The neural network model generates two components that are used to produce a 3D motion field representing the dynamic (non-rigid) part of the scene. The two components are information identifying dynamic and static portions of each image and the camera orientation. The dynamic portions of each image contain motion in the 3D space that is independent of the camera orientation. In other words, the motion in the 3D space (estimated 3D scene flow data) is separated from the motion of the camera.
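The separation described in the abstract can be illustrated with a minimal sketch. It assumes the per-pixel observed flow, the camera-induced flow, and the dynamic/static mask are already available as nested lists; the function name `dynamic_motion_field` and the choice to zero out static pixels are illustrative assumptions, not taken from the patent. The abstract speaks of a 3D motion field, whereas this sketch works on 2D image-plane flow only.

```python
from typing import List, Tuple

Flow = List[List[Tuple[float, float]]]  # per-pixel (du, dv) displacement
Mask = List[List[int]]                  # 1 = dynamic pixel, 0 = static pixel


def dynamic_motion_field(optical_flow: Flow, camera_flow: Flow, dynamic_mask: Mask) -> Flow:
    """Remove camera-induced motion and keep the residual only in dynamic regions.

    Hypothetical helper: static pixels are set to zero motion, which is an
    assumption made for this illustration.
    """
    result: Flow = []
    for flow_row, cam_row, mask_row in zip(optical_flow, camera_flow, dynamic_mask):
        row = []
        for (fu, fv), (cu, cv), is_dynamic in zip(flow_row, cam_row, mask_row):
            if is_dynamic:
                # Observed motion minus the part explained by the camera.
                row.append((fu - cu, fv - cv))
            else:
                row.append((0.0, 0.0))
        result.append(row)
    return result


# One-row image: the first pixel belongs to a moving object, the second is static.
observed = [[(3.0, 1.0), (2.0, 0.0)]]
from_camera = [[(1.0, 1.0), (2.0, 0.0)]]
mask = [[1, 0]]
print(dynamic_motion_field(observed, from_camera, mask))
# prints [[(2.0, 0.0), (0.0, 0.0)]]
```

In the example, the static pixel's observed flow is fully explained by the camera, so its residual would be zero even without the mask; the mask matters when flow or pose estimates are noisy.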

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method, comprising: receiving color data for a sequence of images corresponding to a dynamic scene in three-dimensional (3D) space including a first image and a second image, wherein the first image is captured from a first viewpoint and the second image is captured from a second viewpoint; processing the color data by layers of a neural network model to generate segmentation data indicating a portion of the second image where a first object changes position or shape relative to a position or shape of the first object in the first image; processing the color data by the layers of the neural network model to produce a pose of the second viewpoint, the pose including a position and orientation in the 3D space; warping the pose to generate 2D viewpoint motion flow data for the second image; and subtracting the 2D viewpoint motion flow data from two-dimensional optical flow data for the sequence of images to produce estimated projected 3D scene flow data for the second image.

2. The computer-implemented method of claim 1, further comprising refining the pose based on the two-dimensional optical flow data.

3. The computer-implemented method of claim 1, further comprising refining the segmentation data based on the two-dimensional optical flow data.

4. The computer-implemented method of claim 1, further comprising: receiving depth data for the sequence of images; and processing the depth data with the color data to generate the segmentation data.

5. The computer-implemented method of claim 1, further comprising: processing the sequence of images to extract depth data; and processing the depth data with the color data to generate the segmentation data.

6. The computer-implemented method of claim 1, wherein the segmentation data is a mask comprising a single bit for each pixel in the second image.

7. The computer-implemented method of claim 1, wherein the layers of the neural network model comprise one or more convolutional layers followed by one or more deconvolutional layers.

8. The computer-implemented method of claim 1, wherein the estimated projected 3D scene flow data is used for one or more of robot manipulation, dynamic scene reconstruction, autonomous driving, action recognition, and video analysis.

9. A computer-implemented method, comprising: training a neural network model using a dataset including a first image sequence for viewpoint motion and a static scene, a second image sequence for scene motion and a static viewpoint, and a third image sequence for simultaneous viewpoint motion and scene motion; receiving color data for a sequence of images corresponding to a dynamic scene in three-dimensional (3D) space including a first image and a second image, wherein the first image is captured from a first viewpoint and the second image is captured from a second viewpoint; and processing the color data by layers of the neural network model to generate segmentation data indicating a portion of the second image where a first object changes position or shape relative to a position or shape of the first object in the first image.

10. The computer-implemented method of claim 9, wherein a portion of the dataset includes a real background scene and synthetic foreground objects.

11. A system, comprising: a processor configured to: receive color data for a sequence of images corresponding to a dynamic scene in three-dimensional (3D) space including a first image and a second image, wherein the first image is captured from a first viewpoint and the second image is captured from a second viewpoint; process the color data by layers of a neural network model to generate segmentation data indicating a portion of the second image where a first object changes position or shape relative to a position or shape of the first object in the first image; process the color data by the layers of the neural network model to produce a pose of the second viewpoint, the pose including a position and orientation in the 3D space; warp the pose to generate 2D viewpoint motion flow data for the second image; and subtract the 2D viewpoint motion flow data from two-dimensional optical flow data for the sequence of images to produce estimated projected 3D scene flow data for the second image.

12. The system of claim 11, wherein the processor unit is further configured to refine the pose based on the two-dimensional optical flow data.

13. The system of claim 11, wherein the processor is further configured to refine the segmentation data based on the two-dimensional optical flow data.

14. The system of claim 11, wherein the processor is further configured to: receive depth data for the sequence of images; and process the depth data with the color data to generate the segmentation data.

15. The system of claim 11, wherein the segmentation data is a mask comprising a single bit for each pixel in the second image.

16. A system comprising: a processor configured to: train a neural network model using a dataset including a first image sequence for viewpoint motion and a static scene, a second image sequence for scene motion and a static viewpoint, and a third image sequence for simultaneous viewpoint motion and scene motion; receive color data for a sequence of images corresponding to a dynamic scene in three-dimensional (3D) space including a first image and a second image, wherein the first image is captured from a first viewpoint and the second image is captured from a second viewpoint; and process the color data by layers of the neural network model to generate segmentation data indicating a portion of the second image where a first object changes position or shape relative to a position or shape of the first object in the first image.

17. A non-transitory, computer-readable storage medium storing instructions that, when executed by a processor, cause the processor unit to: receive color data for a sequence of images corresponding to a dynamic scene in three-dimensional (3D) space including a first image and a second image, wherein the first image is captured from a first viewpoint and the second image is captured from a second viewpoint; process the color data by layers of a neural network model to generate segmentation data indicating a portion of the second image where a first object changes position or shape relative to a position or shape of the first object in the first image; process the color data by the layers of the neural network model to produce a pose of the second viewpoint, the pose including a position and orientation in the 3D space; warp the pose to generate 2D viewpoint motion flow data for the second image; and subtract the 2D viewpoint motion flow data from two-dimensional optical flow data for the sequence of images to produce estimated projected 3D scene flow data for the second image.

18. The non-transitory, computer-readable storage medium of claim 17, further comprising refining the pose based on the two-dimensional optical flow data.

19. The non-transitory, computer-readable storage medium of claim 17, further comprising refining the segmentation data based on the two-dimensional optical flow data.

20. A non-transitory, computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to: train a neural network model using a dataset including a first image sequence for viewpoint motion and a static scene, a second image sequence for scene motion and a static viewpoint, and a third image sequenc
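The last two steps of claim 1 (deriving 2D viewpoint motion flow from the pose, then subtracting it from optical flow) can be sketched for a single pixel as follows. This is one possible reading, not the patented implementation: it assumes a pinhole camera with known intrinsics `K`, a known depth at the pixel, and a pose `(R, t)` that maps points from the first camera frame into the second. Claim 1 itself mentions neither intrinsics nor depth, and every function name here is hypothetical.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = List[List[float]]


def backproject(K: Mat3, u: float, v: float, depth: float) -> Vec3:
    """Lift pixel (u, v) with known depth to a 3D point in the camera frame."""
    fx, fy, cx, cy = K[0][0], K[1][1], K[0][2], K[1][2]
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def project(K: Mat3, p: Vec3) -> Tuple[float, float]:
    """Pinhole projection of a camera-frame 3D point to pixel coordinates."""
    fx, fy, cx, cy = K[0][0], K[1][1], K[0][2], K[1][2]
    x, y, z = p
    return (fx * x / z + cx, fy * y / z + cy)


def transform(R: Mat3, t: Vec3, p: Vec3) -> Vec3:
    """Apply the rigid motion p' = R p + t (assumed pose convention)."""
    x, y, z = (sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
    return (x, y, z)


def viewpoint_flow(K: Mat3, R: Mat3, t: Vec3, u: float, v: float, depth: float) -> Tuple[float, float]:
    """2D displacement at pixel (u, v) caused by camera motion alone."""
    u2, v2 = project(K, transform(R, t, backproject(K, u, v, depth)))
    return (u2 - u, v2 - v)


def projected_scene_flow(optical: Tuple[float, float], camera: Tuple[float, float]) -> Tuple[float, float]:
    """Subtract camera-induced flow from the observed optical flow."""
    return (optical[0] - camera[0], optical[1] - camera[1])


# Assumed values for illustration only.
K = [[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # no rotation
t = (0.5, 0.0, 0.0)                                       # sideways translation

cam = viewpoint_flow(K, R, t, u=50.0, v=50.0, depth=4.0)
print(cam)                                     # prints (12.5, 0.0)
print(projected_scene_flow((15.0, 2.0), cam))  # prints (2.5, 2.0)
```

With these numbers, a pixel at the image center with depth 4 shifts 12.5 pixels horizontally from camera motion alone, so an observed flow of (15, 2) leaves a residual of (2.5, 2) attributable to object motion.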

Assignees

Nvidia Corp

Inventors

Classifications

G06V20/10
Terrestrial scenes (scenes under surveillance with static cameras G06V20/52; scenes perceived from the exterior of a vehicle G06V20/56; scenes perceived from the interior of a vehicle G06V20/59) · CPC title
G06V10/774
Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
G06V20/64
Three-dimensional [3D] objects · CPC title
G06V10/772
Determining representative reference patterns, e.g. averaging or distorting patterns; Generating dictionaries · CPC title
G06V10/955
using specific electronic processors · CPC title

Patent family

Related publications grouped by family.

View patent family 65361107

Related patents

Method and device for recognizing motion

Depth sensor noise

Method of encoding a video data signal for use with a multi-view rendering device

Interpolated minimum-maximum compression/decompression for efficient processing of graphics data at computing devices

Method and system of 3D image capture with dynamic cameras

Information processing method and apparatus for calculating information regarding measurement target on the basis of captured images

Depth image compression and decompression utilizing depth and amplitude data
