Methods, devices and computer program products for gradient based depth reconstructions with robust statistics
US-2021012568-A1 · Jan 14, 2021 · US
US11663733B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11663733-B2 |
| Application number | US-202217656165-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 23, 2022 |
| Priority date | Sep 20, 2019 |
| Publication date | May 30, 2023 |
| Grant date | May 30, 2023 |
Abstract
A method includes obtaining a reference image and a target image each representing an environment containing moving features and static features. The method also includes determining an object mask configured to mask out the moving features and preserve the static features in the target image. The method additionally includes determining, based on motion parallax between the reference image and the target image, a static depth image representing depth values of the static features in the target image. The method further includes generating, by way of a machine learning model, a dynamic depth image representing depth values of both the static features and the moving features in the target image. The model is trained to generate the dynamic depth image by determining depth values of at least the moving features based on the target image, the object mask, and the static depth image.
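The input side of this pipeline (parallax depth, then masking out moving features, then assembling what the model sees) can be sketched as below. This is a minimal illustration under a simplifying assumption the patent does not make: the camera translates purely sideways by a known baseline B between the reference and target frames, so a static point's horizontal flow u gives depth Z = f·B/|u| (claim 9 covers the general case with a full camera pose). The function names, the 1 = moving / 0 = static mask convention, and the four-value feature layout are hypothetical; the ML model itself is not shown.

```python
from typing import List, Optional, Tuple

Grid = List[List[float]]
Mask = List[List[int]]                # hypothetical convention: 1 = moving, 0 = static
Depth = List[List[Optional[float]]]   # None marks "no depth available"


def parallax_depth(flow_x: Grid, focal_px: float, baseline: float,
                   min_flow: float = 1e-3) -> Depth:
    """Depth from horizontal motion parallax, Z = f * B / |u|.

    Assumes a purely sideways camera translation of `baseline` between the
    two frames (the rectified-stereo special case). Pixels whose flow is too
    small to triangulate are left as None.
    """
    return [[focal_px * baseline / abs(u) if abs(u) > min_flow else None
             for u in row]
            for row in flow_x]


def mask_moving(depth: Depth, mask: Mask) -> Depth:
    """Discard parallax depth on moving features, where the static-scene
    assumption behind triangulation does not hold."""
    return [[None if m == 1 else d for d, m in zip(drow, mrow)]
            for drow, mrow in zip(depth, mask)]


def model_inputs(gray: Grid, mask: Mask,
                 static_depth: Depth) -> List[List[Tuple[float, float, float, float]]]:
    """Per-pixel feature vector (intensity, mask, depth or 0.0, depth-valid flag)
    for a depth network that fills in the masked regions (network not shown)."""
    return [[(g, float(m), 0.0 if d is None else d, 0.0 if d is None else 1.0)
             for g, m, d in zip(grow, mrow, drow)]
            for grow, mrow, drow in zip(gray, mask, static_depth)]


# Usage: a 2x2 target image whose top-right pixel belongs to a moving person.
flow_x = [[2.0, 4.0], [0.0, 1.0]]
mask = [[0, 1], [0, 0]]
static = mask_moving(parallax_depth(flow_x, focal_px=100.0, baseline=0.5), mask)
print(static)  # [[25.0, None], [None, 50.0]]
```

Masking before the model follows the abstract's logic: triangulated depth on a feature that moved between frames is wrong, so the network receives only static-scene depth plus the mask and must infer depth inside the masked region from the image itself.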
Claims (preview)
What is claimed is:

1. A computer-implemented method comprising: obtaining a reference image and a target image each representing an environment containing a moving feature and a static feature, wherein the reference image has been captured by a camera at a first time and the target image has been captured by the camera at a second time different from the first time; determining an object mask configured to (i) mask out the moving feature in the target image and (ii) preserve the static feature in the target image; determining, based on one or more of the reference image or the target image, a static depth image that represents depth values of the static features in the target image; and generating, using a machine learning (ML) model and based on (i) the static depth image, (ii) the object mask, and (iii) one or more of the target image or the reference image, a dynamic depth image that represents depth values of both the static features and the moving features in the target image.

2. The computer-implemented method of claim 1, wherein determining the static depth image comprises: processing the one or more of the reference image or the target image by at least one of: (i) a multi view stereo (MVS) algorithm, (ii) a structure from motion (SfM) algorithm, or (iii) a motion parallax algorithm.

3. The computer-implemented method of claim 1, wherein the object mask comprises a binary image that assigns a first value to a region of the target image that contains the moving feature and a second value to a region of the target image that contains the static feature.

4. The computer-implemented method of claim 1, wherein the ML model has been trained using a training process comprising: obtaining a video captured by a camera moving through a training environment that contains (i) a static training feature and (ii) a movable training feature that is fixed in a respective pose while being filmed by the camera; determining a supervised depth image of a scene represented by the video, wherein the supervised depth image is determined based on (i) a training reference image from the video that represent the scene from a first point of view and (ii) a training target image from the video that represent the scene from a second point of view different from the first point of view; and determining one or more parameters of the ML model based on the supervised depth image.

5. The computer-implemented method of claim 4, wherein determining the one or more parameters of the ML model comprises: determining a training object mask configured to (i) mask out the movable training feature in the training target image and (ii) preserve the static training feature in the training target image; determining, based on at least one of the training reference image and the training target image, a training static depth image that represents depth values of the static training feature in the training target image; and generating, using the ML model and based on (i) the training static depth image, (ii) the training object mask, and (iii) one or more of the training target image or the training reference image, a training dynamic depth image that represents depth values of both the static training feature and the movable training feature in the training target image; determining a difference between the training dynamic depth image and the supervised depth image; and adjusting the one or more parameters of the ML model based on the difference.

6. The computer-implemented method of claim 4, wherein the movable training feature comprise a first human, wherein the moving feature comprises a second human, and wherein the object mask comprises a human-shaped region.

7. The computer-implemented method of claim 1, wherein the camera is moving through the environment while capturing the reference image and the target image, wherein the static feature maintains a fixed pose within the environment between the first time and the second time, and wherein a pose of the moving feature within the environment changes between the first time and the second time.

8. The computer-implemented method of claim 1, wherein determining the object mask comprises processing the target image by way of an object instance segmentation algorithm configured to identify the moving feature within the target image and generate a mask region representing the moving feature.

9. The computer-implemented method of claim 1, wherein determining the static depth image comprises: determining an optical flow image based on the reference image and the target image; determining a camera pose associated with the target image; and determining a motion parallax depth image that represents depth values of both the static feature and the moving feature in the target image based on the optical flow image and the camera pose.

10. The computer-implemented method of claim 1, further comprising: determining a confidence map that corresponds to the static depth image and indicates, for each respective pixel within the static depth image, a confidence value associated with the depth value of the respective pixel, wherein the ML model is configured to generate the dynamic depth image further based on the confidence map.

11. The computer-implemented method of claim 10, further comprising: based on the confidence map and prior to providing the static depth image as input to the ML model, removing, from the static depth image, pixels associated with corresponding confidence values that are below a threshold confidence value.

12. The computer-implemented method of claim 10, wherein determining the confidence map comprises: determining a left-right consistency between (i) a forward optical flow field and (ii) a backward optical flow field, each determined based on the target image and the reference image; determining an extent to which the forward optical flow field complies with an epipolar constraint of the reference image and the target image; determining an extent of parallax between respective portions of the target image and the reference image; and determining the confidence map based on (i) the left-right consistency, (ii) the extent to which the forward optical flow field complies with the epipolar constraint, and (iii) the extent of parallax.

13. The computer-implemented method of claim 1, further comprising: applying a focus effect to a selected feature of the target image based on the dynamic depth image.

14. The computer-implemented method of claim 1, further comprising: inserting into the target image a visual representation of an object at a selected position within the environment; determining, based on the dynamic depth image and the selected position, an occlusion between the visual representation of the object and at least one feature of the target image; and rendering the target image to indicate the object, the at least one feature, and the occlusion therebetween.

15. The computer-implemented method of claim 1, wherein the reference image and the target image form part of a video, and wherein the method further comprises: removing from the target image a visual representation of the moving feature; and inpainting, based on other image frames within the video and the dynamic depth image, portions of the environment within the target image that, prior to removal of the moving feature, were occluded by the moving feature and have been exposed by removal of the moving feature.

16. The computer-implemented method of claim 1, wherein the reference image and the target image form part of a video, and wherein the method further com
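Claims 10 to 12 describe a per-pixel confidence map built from three cues and used to prune the static depth image before it reaches the model. The claim text gives no formulas, so the sketch below is a hypothetical instance rather than the patented computation: it scores forward-backward (left-right) flow consistency, the distance of each matched point from its epipolar line under a fundamental matrix `fund`, and flow magnitude as a stand-in for parallax, multiplies the three scores, and drops depth pixels below a threshold. Nearest-neighbour flow lookup, the linear fall-offs, and the parameters `r_max`, `d_max` and `parallax_full` are all assumptions.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec2 = Tuple[float, float]
Flow = List[List[Vec2]]               # flow[y][x] = (dx, dy) in pixels
Depth = List[List[Optional[float]]]   # None marks a removed pixel


def lr_residual(fwd: Flow, bwd: Flow, x: int, y: int) -> float:
    """Forward-backward residual |F(p) + B(p + F(p))|; 0 means consistent flow."""
    dx, dy = fwd[y][x]
    h, w = len(bwd), len(bwd[0])
    # Nearest-neighbour lookup of the backward flow at the forward-warped
    # position, clamped to the image border (a simplification).
    xq = min(max(int(round(x + dx)), 0), w - 1)
    yq = min(max(int(round(y + dy)), 0), h - 1)
    bx, by = bwd[yq][xq]
    return math.hypot(dx + bx, dy + by)


def epipolar_distance(fund: Sequence[Sequence[float]], x: float, y: float,
                      dx: float, dy: float) -> float:
    """Pixel distance of the matched point p' = p + F(p) from the epipolar
    line l' = fund @ p, using the convention p'^T fund p = 0."""
    p = (x, y, 1.0)
    a, b, c = [sum(fund[i][j] * p[j] for j in range(3)) for i in range(3)]
    norm = math.hypot(a, b)
    if norm == 0.0:
        return math.inf  # degenerate line: treat as no epipolar support
    return abs(a * (x + dx) + b * (y + dy) + c) / norm


def confidence_map(fwd: Flow, bwd: Flow, fund: Sequence[Sequence[float]],
                   r_max: float = 1.0, d_max: float = 1.0,
                   parallax_full: float = 2.0) -> List[List[float]]:
    """Per-pixel confidence in [0, 1] as the product of three hypothetical scores."""
    conf = []
    for y, row in enumerate(fwd):
        out = []
        for x, (dx, dy) in enumerate(row):
            c_lr = max(0.0, 1.0 - lr_residual(fwd, bwd, x, y) / r_max)
            c_ep = max(0.0, 1.0 - epipolar_distance(fund, x, y, dx, dy) / d_max)
            # Flow magnitude as a parallax proxy: assumes camera rotation has
            # already been compensated, so the remaining flow comes from translation.
            c_pa = min(1.0, math.hypot(dx, dy) / parallax_full)
            out.append(c_lr * c_ep * c_pa)
        conf.append(out)
    return conf


def drop_low_confidence(depth: Depth, conf: List[List[float]],
                        threshold: float) -> Depth:
    """Remove (set to None) depth pixels whose confidence is below `threshold`."""
    return [[d if c >= threshold else None for d, c in zip(drow, crow)]
            for drow, crow in zip(depth, conf)]
```

For a camera that only translates horizontally (with identity intrinsics), `fund = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]` makes every epipolar line horizontal, so any vertical flow component lowers the epipolar score, while pixels with zero flow get zero parallax confidence and are removed by the threshold.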
CPC classification titles: using feature-based methods; using feature-based methods, e.g. the tracking of corners or segments; motion-based segmentation; camera pose; from motion.