Big aperture blurring method based on dual cameras and tof
US-2022086360-A1 · Mar 17, 2022 · US
US11546568B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-11546568-B1 |
| Application number | US-202016811356-A |
| Country | US |
| Kind code | B1 |
| Filing date | Mar 6, 2020 |
| Priority date | Mar 6, 2020 |
| Publication date | Jan 3, 2023 |
| Grant date | Jan 3, 2023 |
Official abstract text for this publication.
Apparatuses, systems, and techniques are presented to perform monocular view synthesis of a dynamic scene. Single and multi-view depth information can be determined for a collection of images of a dynamic scene, and a blender network can be used to combine image features for foreground, background, and missing image regions using fused depth maps inferred from the single and multi-view depth information.
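To make the fused-depth idea in the abstract concrete, here is a minimal Python sketch under stated assumptions. The publication describes neural networks that model a scale correction function; the code below swaps that for a closed-form least-squares scale fitted over static pixels, so it is a simplification and not the patented method. The names `fit_scale`, `fuse_depth`, and `static_mask` are hypothetical and do not appear in the source.

```python
def fit_scale(single_view, multi_view, static_mask):
    """Least-squares scale s minimising sum((s * sv - mv)^2) over static pixels.

    Assumption: single-view depth is only known up to scale, while
    multi-view depth is trusted in static regions.
    """
    num = 0.0
    den = 0.0
    for sv_row, mv_row, mask_row in zip(single_view, multi_view, static_mask):
        for sv, mv, is_static in zip(sv_row, mv_row, mask_row):
            if is_static:
                num += sv * mv
                den += sv * sv
    if den == 0.0:
        raise ValueError("no static pixels with nonzero single-view depth")
    return num / den


def fuse_depth(single_view, multi_view, static_mask):
    """Return (scale, fused depth map).

    Static pixels keep the multi-view depth; dynamic pixels use the
    rescaled single-view depth. This hand-written rule stands in for
    the learned fusion described in the publication.
    """
    scale = fit_scale(single_view, multi_view, static_mask)
    fused = [
        [mv if is_static else scale * sv
         for sv, mv, is_static in zip(sv_row, mv_row, mask_row)]
        for sv_row, mv_row, mask_row in zip(single_view, multi_view, static_mask)
    ]
    return scale, fused


if __name__ == "__main__":
    single_view = [[1.0, 2.0], [3.0, 4.0]]   # scale-ambiguous depth
    multi_view = [[2.0, 4.0], [6.0, 0.0]]    # last pixel is dynamic, so unreliable
    static_mask = [[True, True], [True, False]]
    print(fuse_depth(single_view, multi_view, static_mask))
```

The design choice here is to fit the scale only where the scene is static, which mirrors the claim language about weighting based on depth data for static regions, and then to fall back on single-view depth wherever motion makes multi-view depth unreliable.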
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method, comprising: receiving a collection of images of a scene; determining single-view and multi-view depth data for the images of the collection; inferring fused depth maps for the images of the scene using one or more neural networks that model a scale correction function by accepting the single view and multi-view depth data as input, wherein the one or more neural networks apply at least one weight parameter based at least in part on depth data for one or more static regions of the images of the collection, the depth data for the one or more static regions of the images of the collection determined based at least in part on a loss function; and synthesizing an image of the scene for a determined view that is different from views of the images of the collection, the image synthesized using the fused depth maps and image features extracted from the images, based at least in part on a decoder applying at least one skip connection, in the one or more neural networks, between the image features and the depth data.
2. The computer-implemented method of claim 1, further comprising: synthesizing the image of the scene using a neural network trained to blend foreground features from the single-view depth data with background features from the multi-view depth data.
3. The computer-implemented method of claim 2, further comprising: causing the neural network to synthesize new image data for one or more regions for which data is not obtainable from the collection of images of the scene.
4. The computer-implemented method of claim 1, wherein the scene is a dynamic scene including a static background portion and at least one dynamic foreground portion in which at least one object was moving during capture of the collection of images.
5. The computer-implemented method of claim 1, wherein a point of view of at least one monocular camera capturing the collection of images changed during capture of the collection of images.
6. The computer-implemented method of claim 1, further comprising: determining optical flow data for the collection of images; and using the optical flow data as a constraint for the synthesizing.
7. The computer-implemented method of claim 1, further comprising: synthesizing a plurality of images at determined times and for determined views of the scene; and generating a video including video frames corresponding to the plurality of images.
8. The computer-implemented method of claim 1, further comprising: inferring the fused depth maps using the one or more neural networks further completes the fused depths maps with locally consistent motions.
9. A system comprising: one or more processors; and memory including instructions that, when executed by the one or more processors, cause the system to: receive a collection of images of a scene; determine single-view and multi-view depth data for the images of the collection; infer fused depth maps for the images of the scene using one or more neural networks that model a scale correction function by accepting the single view and multi-view depth data as input, wherein the one or more neural networks apply at least one weight parameter based at least in part on depth data for one or more static regions of the images of the collection the depth data for the one or more static regions of the images of the collection determined based at least in part on a loss function; and synthesize an image of the scene for a determined view that is different from views of the images of the collection, the image synthesized using the fused depth maps and image features extracted from the images, based at least in part on a decoder applying at least one skip connection, in the one or more neural networks, between the image features and the depth data.
10. The system of claim 9, wherein the instructions when executed further cause the system to: synthesize the image of the scene using a neural network trained to blend foreground features from the single-view depth data with background features from the multi-view depth data.
11. The system of claim 10, wherein the instructions when executed further cause the system to: cause the neural network to synthesize new image data for one or more regions for which data is not obtainable from the collection of images of the scene.
12. The system of claim 9, wherein the scene is a dynamic scene including a static background portion and at least one dynamic foreground portion in which at least one object was moving during capture of the collection of images.
13. The system of claim 9, wherein a point of view of at least one monocular camera capturing the collection of images changed during capture of the collection of images.
14. The system of claim 9, wherein the instructions when executed further cause the system to: determine optical flow data for the collection of images; and use the optical flow data as a constraint for the synthesizing.
15. A non-transitory machine-readable medium having stored thereon a set of instructions, which if performed by one or more processors, cause the one or more processors to at least: receive a collection of images of a scene; determine single-view and multi-view depth data for the images of the collection; infer fused depth maps for the images of the scene using one or more neural networks that model a scale correction function by accepting the single view and multi-view depth data as input, wherein the one or more neural networks apply at least one weight parameter based at least in part on depth data for one or more static regions of the images of the collection, the depth data for the one or more static regions of the images of the collection determined based at least in part on a loss function; and synthesize an image of the scene for a determined view that is different from views of the images of the collection, the image synthesized using the fused depth maps and image features extracted from the images, based at least in part on a decoder applying at least one skip connection, in the one or more neural networks, between the image features and the depth data.
16. The non-transitory machine-readable medium of claim 15, wherein the instructions if executed further cause the one or more processors to: synthesize the image of the scene using a neural network trained to blend foreground features from the single-view depth data with background features from the multi-view depth data.
17. The non-transitory machine-readable medium of claim 16, wherein the instructions if executed further cause the one or more processors to: cause the neural network to synthesize new image data for one or more regions for which data is not obtainable from the collection of images of the scene.
18. The non-transitory machine-readable medium of claim 15, wherein the scene is a dynamic scene including a static background portion and at least one dynamic foreground portion in which at least one object was moving during capture of the collection of images.
19. The non-transitory machine-readable medium of claim 15, wherein a point of view of at least one monocular camera capturing the collection of images changed during capture of the collection of images.
20. The non-transitory machine-readable medium of claim 15, wherein the instructions if executed further cause the one or more processors to: determine optical flow data for the collection of images; and use the optical flow data as a constraint for the synthesizing.
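The independent claims synthesize an image for a view that differs from the captured views, using the fused depth maps. As a rough illustration of why depth matters for that step, the Python sketch below reprojects one source pixel into a target camera. It assumes an ideal pinhole camera and a known rigid transform between the two views; neither assumption comes from the claims, which describe a neural decoder with skip connections rather than an explicit geometric warp. The function `reproject` and its parameters (`intrinsics`, `rotation`, `translation`) are hypothetical names.

```python
def reproject(u, v, depth, intrinsics, rotation, translation):
    """Map pixel (u, v) with known depth from a source view to a target view.

    Assumptions (not from the patent text): pinhole intrinsics
    (fx, fy, cx, cy) shared by both views, `rotation` as a 3x3 row-major
    list, and `translation` as a 3-vector taking source-camera
    coordinates to target-camera coordinates.
    Returns (u', v', z') or None if the point lies behind the target camera.
    """
    fx, fy, cx, cy = intrinsics

    # Back-project the pixel to a 3D point in the source camera frame.
    point = (
        (u - cx) / fx * depth,
        (v - cy) / fy * depth,
        depth,
    )

    # Apply the rigid transform: p_target = R * p_source + t.
    xt, yt, zt = (
        sum(r * p for r, p in zip(row, point)) + t
        for row, t in zip(rotation, translation)
    )

    if zt <= 0.0:
        return None  # not visible from the target view

    # Project back to pixel coordinates in the target view.
    return (fx * xt / zt + cx, fy * yt / zt + cy, zt)


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    intrinsics = (100.0, 100.0, 50.0, 50.0)
    # Shifting the point half a unit along x moves the principal-point pixel sideways.
    print(reproject(50.0, 50.0, 2.0, intrinsics, identity, (0.5, 0.0, 0.0)))
```

Pixels that map to `None`, or target pixels that no source pixel reaches, correspond loosely to the regions "for which data is not obtainable from the collection of images" in claims 3, 11, and 17, where the claimed network synthesizes new image data.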
Adjusting depth or disparity · CPC title
Synthesising a monoscopic image signal from stereoscopic images, e.g. synthesising a panoramic or high resolution monoscopic image · CPC title
Three-dimensional [3D] modelling for computer graphics · CPC title
Perspective computation · CPC title
Transformation of image signals corresponding to virtual viewpoints, e.g. spatial image interpolation · CPC title