View synthesis for dynamic scenes

US11546568B1 · US · B1

Patent metadata
Field	Value
Publication number	US-11546568-B1
Application number	US-202016811356-A
Country	US
Kind code	B1
Filing date	Mar 6, 2020
Priority date	Mar 6, 2020
Publication date	Jan 3, 2023
Grant date	Jan 3, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Apparatuses, systems, and techniques are presented to perform monocular view synthesis of a dynamic scene. Single and multi-view depth information can be determined for a collection of images of a dynamic scene, and a blender network can be used to combine image features for foreground, background, and missing image regions using fused depth maps inferred from the single and multi-view depth information.
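
Illustrative sketch (not part of the patent text). The abstract refers to fused depth maps built from single-view and multi-view depth. One simple way to picture that idea is shown below, under loud assumptions: the patent uses neural networks for this step, whereas this sketch swaps in a closed-form least-squares scale fit. The names `fit_scale`, `fuse_depth`, and `static_mask` are hypothetical, and depth maps are flattened to plain lists.

```python
def fit_scale(single_view, multi_view, static_mask):
    """Least-squares scale s minimizing sum((s * sv - mv)^2) over static pixels.

    Assumption: single-view depth is only known up to scale, while
    multi-view depth is trusted wherever the scene is static.
    """
    num = 0.0
    den = 0.0
    for sv, mv, is_static in zip(single_view, multi_view, static_mask):
        if is_static:
            num += sv * mv
            den += sv * sv
    if den == 0.0:
        raise ValueError("no static pixels with non-zero single-view depth")
    return num / den


def fuse_depth(single_view, multi_view, static_mask):
    """Keep multi-view depth on static pixels, rescaled single-view depth elsewhere."""
    s = fit_scale(single_view, multi_view, static_mask)
    return [
        mv if is_static else s * sv
        for sv, mv, is_static in zip(single_view, multi_view, static_mask)
    ]


# Toy example: the last pixel is dynamic, so its multi-view depth is unusable.
single_view = [1.0, 2.0, 3.0, 4.0]
multi_view = [2.0, 4.0, 6.0, 0.0]
static_mask = [True, True, True, False]
print(fuse_depth(single_view, multi_view, static_mask))  # [2.0, 4.0, 6.0, 8.0]
```

The hard switch between the two depth sources is a simplification chosen for readability; a learned fusion would typically blend them smoothly.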

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method, comprising: receiving a collection of images of a scene; determining single-view and multi-view depth data for the images of the collection; inferring fused depth maps for the images of the scene using one or more neural networks that model a scale correction function by accepting the single view and multi-view depth data as input, wherein the one or more neural networks apply at least one weight parameter based at least in part on depth data for one or more static regions of the images of the collection, the depth data for the one or more static regions of the images of the collection determined based at least in part on a loss function; and synthesizing an image of the scene for a determined view that is different from views of the images of the collection, the image synthesized using the fused depth maps and image features extracted from the images, based at least in part on a decoder applying at least one skip connection, in the one or more neural networks, between the image features and the depth data.

2. The computer-implemented method of claim 1, further comprising: synthesizing the image of the scene using a neural network trained to blend foreground features from the single-view depth data with background features from the multi-view depth data.

3. The computer-implemented method of claim 2, further comprising: causing the neural network to synthesize new image data for one or more regions for which data is not obtainable from the collection of images of the scene.

4. The computer-implemented method of claim 1, wherein the scene is a dynamic scene including a static background portion and at least one dynamic foreground portion in which at least one object was moving during capture of the collection of images.

5. The computer-implemented method of claim 1, wherein a point of view of at least one monocular camera capturing the collection of images changed during capture of the collection of images.

6. The computer-implemented method of claim 1, further comprising: determining optical flow data for the collection of images; and using the optical flow data as a constraint for the synthesizing.

7. The computer-implemented method of claim 1, further comprising: synthesizing a plurality of images at determined times and for determined views of the scene; and generating a video including video frames corresponding to the plurality of images.

8. The computer-implemented method of claim 1, further comprising: inferring the fused depth maps using the one or more neural networks further completes the fused depths maps with locally consistent motions.

9. A system comprising: one or more processors; and memory including instructions that, when executed by the one or more processors, cause the system to: receive a collection of images of a scene; determine single-view and multi-view depth data for the images of the collection; infer fused depth maps for the images of the scene using one or more neural networks that model a scale correction function by accepting the single view and multi-view depth data as input, wherein the one or more neural networks apply at least one weight parameter based at least in part on depth data for one or more static regions of the images of the collection the depth data for the one or more static regions of the images of the collection determined based at least in part on a loss function; and synthesize an image of the scene for a determined view that is different from views of the images of the collection, the image synthesized using the fused depth maps and image features extracted from the images, based at least in part on a decoder applying at least one skip connection, in the one or more neural networks, between the image features and the depth data.

10. The system of claim 9, wherein the instructions when executed further cause the system to: synthesize the image of the scene using a neural network trained to blend foreground features from the single-view depth data with background features from the multi-view depth data.

11. The system of claim 10, wherein the instructions when executed further cause the system to: cause the neural network to synthesize new image data for one or more regions for which data is not obtainable from the collection of images of the scene.

12. The system of claim 9, wherein the scene is a dynamic scene including a static background portion and at least one dynamic foreground portion in which at least one object was moving during capture of the collection of images.

13. The system of claim 9, wherein a point of view of at least one monocular camera capturing the collection of images changed during capture of the collection of images.

14. The system of claim 9, wherein the instructions when executed further cause the system to: determine optical flow data for the collection of images; and use the optical flow data as a constraint for the synthesizing.

15. A non-transitory machine-readable medium having stored thereon a set of instructions, which if performed by one or more processors, cause the one or more processors to at least: receive a collection of images of a scene; determine single-view and multi-view depth data for the images of the collection; infer fused depth maps for the images of the scene using one or more neural networks that model a scale correction function by accepting the single view and multi-view depth data as input, wherein the one or more neural networks apply at least one weight parameter based at least in part on depth data for one or more static regions of the images of the collection, the depth data for the one or more static regions of the images of the collection determined based at least in part on a loss function; and synthesize an image of the scene for a determined view that is different from views of the images of the collection, the image synthesized using the fused depth maps and image features extracted from the images, based at least in part on a decoder applying at least one skip connection, in the one or more neural networks, between the image features and the depth data.

16. The non-transitory machine-readable medium of claim 15, wherein the instructions if executed further cause the one or more processors to: synthesize the image of the scene using a neural network trained to blend foreground features from the single-view depth data with background features from the multi-view depth data.

17. The non-transitory machine-readable medium of claim 16, wherein the instructions if executed further cause the one or more processors to: cause the neural network to synthesize new image data for one or more regions for which data is not obtainable from the collection of images of the scene.

18. The non-transitory machine-readable medium of claim 15, wherein the scene is a dynamic scene including a static background portion and at least one dynamic foreground portion in which at least one object was moving during capture of the collection of images.

19. The non-transitory machine-readable medium of claim 15, wherein a point of view of at least one monocular camera capturing the collection of images changed during capture of the collection of images.

20. The non-transitory machine-readable medium of claim 15, wherein the instructions if executed further cause the one or more processors to: determine optical flow data for the collection of images; and use the optical flow data as a constraint for the synthesizing.
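
Illustrative sketch (not part of the claim text). The independent claims describe synthesizing an image for a view that differs from the captured views, and the dependent claims mention regions for which no data is obtainable. A classical, non-neural way to see why such holes appear is depth-based forward warping, sketched here on a single scanline. Everything below is an assumption made for illustration: a pinhole camera translated sideways by `baseline`, disparity computed as `focal * baseline / depth`, nearest-pixel rounding, and the hypothetical function name `warp_scanline`. The claimed method relies on neural networks instead of this procedure.

```python
def warp_scanline(colors, depths, focal, baseline):
    """Forward-warp one row of pixels into a sideways-translated view.

    Each source pixel moves left by its disparity (focal * baseline / depth).
    A z-buffer keeps the nearest surface when two pixels land on the same
    target, and targets that receive nothing stay None (holes).
    """
    width = len(colors)
    out = [None] * width
    zbuf = [float("inf")] * width
    for x, (color, z) in enumerate(zip(colors, depths)):
        if z <= 0.0:
            raise ValueError("depth must be positive")
        target = x - int(round(focal * baseline / z))
        if 0 <= target < width and z < zbuf[target]:
            out[target] = color
            zbuf[target] = z
    return out


# Pixel "d" is closer (depth 2) than the rest (depth 4), so it shifts further
# and covers "c"; the two rightmost targets receive no data at all.
row = warp_scanline(["a", "b", "c", "d"], [4.0, 4.0, 4.0, 2.0], focal=4.0, baseline=1.0)
print(row)  # ['b', 'd', None, None]
```

The `None` entries correspond to the kind of missing regions that, per claims 3, 11, and 17, the neural network is caused to fill with newly synthesized image data.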

Assignees

Nvidia Corp

Inventors

Classifications

H04N13/128 · Primary
Adjusting depth or disparity · CPC title
H04N2013/0088
Synthesising a monoscopic image signal from stereoscopic images, e.g. synthesising a panoramic or high resolution monoscopic image · CPC title
G06T17/00
Three-dimensional [3D] modelling for computer graphics · CPC title
G06T15/20
Perspective computation · CPC title
H04N13/111 · Primary
Transformation of image signals corresponding to virtual viewpoints, e.g. spatial image interpolation · CPC title

Patent family

Related publications grouped by family.

View patent family 84693207

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11546568B1 cover?: Apparatuses, systems, and techniques are presented to perform monocular view synthesis of a dynamic scene. Single and multi-view depth information can be determined for a collection of images of a dynamic scene, and a blender network can be used to combine image features for foreground, background, and missing image regions using fused depth maps inferred from the single and multi-view depth in…
Who is the assignee on this patent?: Nvidia Corp
What technology area does this patent fall under?: Primary CPC classification H04N13/128. Mapped technology areas include Electricity.
When was this patent published?: Publication date Jan 3, 2023 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Big aperture blurring method based on dual cameras and tof

Scale-aware monocular localization and mapping

Artificial Intelligence-Based Sequencing

Associating lidar data and image data

System and method for achieving fast and reliable time-to-contact estimation using vision and range sensor data for autonomous navigation
