Aperture supervision for single-view depth prediction

US11113832B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11113832-B2
Application number: US-201716759808-A
Country: US
Kind code: B2
Filing date: Nov 3, 2017
Priority date: Nov 3, 2017
Publication date: Sep 7, 2021
Grant date: Sep 7, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Example embodiments allow for training of artificial neural networks (ANNs) to generate depth maps based on images. The ANNs are trained based on a plurality of sets of images, where each set of images represents a single scene and the images in such a set of images differ with respect to image aperture and/or focal distance. An untrained ANN generates a depth map based on one or more images in a set of images. This depth map is used to generate, using the image(s) in the set, a predicted image that corresponds, with respect to image aperture and/or focal distance, to one of the images in the set. Differences between the predicted image and the corresponding image are used to update the ANN. ANNs trained in this manner are especially suited for generating depth maps used to perform simulated image blur on small-aperture images.
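The training loop summarized in the abstract can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration and is not taken from the patent: `toy_depth_model` is a one-weight stand-in for the ANN, `render_shallow_dof` is a simple Gaussian blur whose width grows with distance from the focal plane (a stand-in for the aperture rendering step), images are 1D lists, and a finite-difference gradient replaces backpropagation.

```python
import math

MAX_RADIUS = 4  # hypothetical cap on the blur kernel half-width, in pixels


def toy_depth_model(source, weight):
    """Hypothetical stand-in for the ANN: depth = weight * pixel intensity."""
    return [weight * s for s in source]


def render_shallow_dof(source, depth, focal_depth=0.0, aperture=1.0):
    """Assumed renderer: Gaussian blur whose width grows with defocus."""
    n = len(source)
    predicted = []
    for i in range(n):
        # Pixels far from the focal plane get a wider kernel.
        sigma = aperture * abs(depth[i] - focal_depth) + 1e-3
        acc = total = 0.0
        for j in range(max(0, i - MAX_RADIUS), min(n, i + MAX_RADIUS + 1)):
            w = math.exp(-((j - i) ** 2) / (2.0 * sigma ** 2))
            acc += w * source[j]
            total += w
        predicted.append(acc / total)
    return predicted


def loss_for(weight, source, target):
    """Mean squared difference between the predicted and target images."""
    predicted = render_shallow_dof(source, toy_depth_model(source, weight))
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def train(source, target, weight=0.5, lr=5.0, steps=100, eps=1e-4):
    """Update the single weight by finite-difference gradient descent."""
    for _ in range(steps):
        grad = (loss_for(weight + eps, source, target)
                - loss_for(weight - eps, source, target)) / (2.0 * eps)
        weight -= lr * grad
    return weight


# Synthetic data: a sharp source and a target rendered with the "true" weight.
source = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
TRUE_WEIGHT = 2.0
target = render_shallow_dof(source, toy_depth_model(source, TRUE_WEIGHT))

initial_loss = loss_for(0.5, source, target)
trained_weight = train(source, target)
final_loss = loss_for(trained_weight, source, target)
```

The point of the sketch is that no ground-truth depth appears anywhere: the only supervision is the difference between the rendered prediction and the shallow depth-of-field target, which is the idea the abstract describes.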

First claim

Opening claim text (preview).

We claim:

1. A method comprising: obtaining a plurality of images, wherein a set of at least two images of the plurality of images describe a common scene, wherein the set of at least two images comprises a source image having a first depth-of-field and a target image having a second depth-of-field, wherein the second depth-of-field is less than the first depth-of-field; determining, using an artificial neural network, a depth map for the common scene based on the source image; determining, based on the determined depth map for the common scene, a predicted image based on the source image such that the predicted image has a depth-of-field corresponding to the second depth-of-field; determining a difference between the predicted image and the target image; updating the artificial neural network based on the determined difference; obtaining an image of a scene of interest; and using the updated artificial neural network to generate a depth map for the scene of interest based on the image of the scene of interest.

2. The method of claim 1, wherein determining, based on the first depth map, a predicted image based on the source image comprises using a differentiable aperture rendering function.

3. The method of claim 2, wherein the using the differentiable aperture rendering function to determine the predicted image comprises: determining an estimated light field based on the source image; and based on the first depth map, shearing and projecting the estimated light field to determine the predicted image.

4. The method of claim 1, wherein determining, using the artificial neural network, the depth map for the common scene based on the source image comprises: determining, using the artificial neural network, a set of depth values based on the source image, wherein each depth value of the set of depth values corresponds to a respective location within the source image; and upsampling the set of depth values to generate the depth map for the common scene.

5. The method of claim 4, wherein upsampling the set of depth values to generate the depth map for the common scene comprises using a bilateral method to upsample the set of depth values based on the source image.

6. The method of claim 1, wherein the target image is a first target image, wherein the predicted image is a first predicted image, wherein the determined difference is a first determined difference, wherein the set of at least two images further comprises a second target image having a third depth-of-field, wherein the third depth-of-field differs from the second depth-of-field and is less than the first depth-of-field, the method further comprising: determining, based on the determined depth map for the common scene, a second predicted image based on the source image such that the second predicted image has a depth-of-field corresponding to the third depth-of-field; and determining a second difference between the second predicted image and the second target image, wherein updating the artificial neural network comprises updating the neural network based on the second difference.

7. The method of claim 1, wherein obtaining the set of at least two images that describe the common scene comprises: capturing, using a light field camera, a light field from the common scene; generating the source image based on the captured light field such that the source image has the first depth-of-field; and generating the target image based on the captured light field such that the target image has the second depth-of-field.

8. The method of claim 1, wherein obtaining the set of at least two images that describe the common scene comprises: capturing, using a camera set to a first aperture setting, the source image; and capturing, using the camera set to a second aperture setting, the target image, wherein the second aperture setting is wider than the first aperture setting.

9. The method of claim 1, wherein obtaining the image of the scene of interest comprises operating a cell phone to capture the image of the scene of interest, the method further comprising: transmitting, from a server to the cell phone, an indication of the updated artificial neural network, wherein using the updated artificial neural network to generate the depth map for the scene of interest based on the image of the scene of interest comprises a processor of the cell phone using the updated artificial neural network to generate the depth map for the scene of interest.

10. The method of claim 1, further comprising: performing image processing on the image of the scene of interest based on the determined depth map for the scene of interest.

11. The method of claim 1, wherein the artificial neural network is a convolutional neural network.

12. A method comprising: obtaining, by a system, a plurality of images, wherein a set of at least two images of the plurality of images describe a common scene, wherein the set of at least two images comprises a source image having a first depth-of-field and a target image having a second depth-of-field, wherein the second depth-of-field is less than the first depth-of-field; determining, by the system using an artificial neural network, a depth map for the common scene based on the source image; determining, by the system based on the determined depth map for the common scene, a predicted image based on the source image such that the predicted image has a depth-of-field corresponding to the second depth-of-field; determining, by the system, a difference between the predicted image and the target image; updating, by the system, the artificial neural network based on the determined difference; and transmitting, from the system to a remote device, an indication of the updated artificial neural network.

13. The method of claim 12, wherein determining, based on the first depth map, a predicted image based on the source image comprises using a differentiable aperture rendering function.

14. The method of claim 13, wherein the using the differentiable aperture rendering function to determine the predicted image comprises: determining an estimated light field based on the source image; and based on the first depth map, shearing and projecting the estimated light field to determine the predicted image.

15. The method of claim 12, wherein the target image is a first target image, wherein the predicted image is a first predicted image, wherein the determined difference is a first determined difference, wherein the set of at least two images further comprises a second target image having a third depth-of-field, wherein the third depth-of-field differs from the second depth-of-field and is less than the first depth-of-field, the method further comprising: determining, based on the determined depth map for the common scene, a second predicted image based on the source image such that the second predicted image has a depth-of-field corresponding to the third depth-of-field; and determining a second difference between the second predicted image and the second target image, wherein updating the artificial neural network comprises updating the neural network based on the second difference.

16. The method of claim 12, wherein the artificial neural network is a convolutional neural network.

17. A method comprising: obtaining a plurality of images of a scene, wherein the images each have a shallow depth-of-field and differ with respect to focal distance; determining, using an artificial neural network, a depth map for the scene based on the plurality of images; determining, based on the plurality
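
The "shearing and projecting" step named in claims 3 and 14 can be illustrated in one dimension. The sketch below is not the patented renderer; it assumes a simple occlusion-free model in which the view at aperture position `u` is the source shifted by `u * disparity`, uses per-pixel disparity in place of depth, and picks one sign convention for the shear. The helper names `sample`, `estimate_light_field`, and `shear_and_project` are hypothetical.

```python
import math


def sample(signal, x):
    """Linearly interpolate signal at position x, clamping to the borders."""
    x = min(max(x, 0.0), len(signal) - 1.0)
    i0 = int(math.floor(x))
    i1 = min(i0 + 1, len(signal) - 1)
    frac = x - i0
    return (1.0 - frac) * signal[i0] + frac * signal[i1]


def estimate_light_field(source, disparity, radius):
    """Assumed model: view u sees the source shifted by u * disparity."""
    n = len(source)
    return {
        u: [sample(source, x + u * disparity[x]) for x in range(n)]
        for u in range(-radius, radius + 1)
    }


def shear_and_project(light_field, focus_disparity):
    """Shear each view by -u * focus_disparity, then average over u."""
    n = len(next(iter(light_field.values())))
    predicted = [0.0] * n
    for u, view in light_field.items():
        for x in range(n):
            predicted[x] += sample(view, x - u * focus_disparity)
    return [value / len(light_field) for value in predicted]


source = [0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0]

# Scene disparity equals the focus disparity: the shear undoes the view shift.
in_focus = shear_and_project(
    estimate_light_field(source, [1.0] * len(source), radius=2), 1.0)

# Scene disparity differs from the focus disparity: the views no longer align.
defocused = shear_and_project(
    estimate_light_field(source, [0.0] * len(source), radius=2), 1.0)
```

In this toy setting, content at the focus disparity is reproduced unchanged, while content at any other disparity is averaged across misaligned views and therefore blurred. Because every operation is an interpolation or an average, the rendered image varies smoothly with the disparity values, which is what makes such a renderer usable inside gradient-based training.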

Assignees

Google LLC

Inventors

Classifications

  • G06T7/557 (Primary)

    from light fields, e.g. from plenoptic cameras · CPC title

  • G01S11/12 (Primary)

    using electromagnetic waves other than radio waves · CPC title

  • Validation; Performance evaluation; Active pattern learning techniques · CPC title

  • Terrestrial scenes (scenes under surveillance with static cameras G06V20/52; scenes perceived from the exterior of a vehicle G06V20/56; scenes perceived from the interior of a vehicle G06V20/59) · CPC title

  • Training; Learning · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11113832B2 cover?
Example embodiments allow for training of artificial neural networks (ANNs) to generate depth maps based on images. The ANNs are trained based on a plurality of sets of images, where each set of images represents a single scene and the images in such a set of images differ with respect to image aperture and/or focal distance. An untrained ANN generates a depth map based on one or more images in…
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G06T7/557. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 7, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).