US11810313B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11810313-B2 |
| Application number | US-202117249095-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 19, 2021 |
| Priority date | Feb 21, 2020 |
| Publication date | Nov 7, 2023 |
| Grant date | Nov 7, 2023 |
Official abstract text for this publication.
According to an aspect, a real-time active stereo system includes a capture system configured to capture stereo data, where the stereo data includes a first input image and a second input image, and a depth sensing computing system configured to predict a depth map. The depth sensing computing system includes a feature extractor configured to extract features from the first and second images at a plurality of resolutions, an initialization engine configured to generate a plurality of depth estimations, where each of the plurality of depth estimations corresponds to a different resolution, and a propagation engine configured to iteratively refine the plurality of depth estimations based on image warping and spatial propagation.
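The multi-resolution initialization described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the learned feature extractor is replaced here by plain average pooling, matching uses an L1 cost with a winner-take-all choice, and the names `downsample`, `init_disparity`, `multires_init`, `factors`, and `max_disp` are hypothetical. It assumes the convention that a left pixel at column x matches the right pixel at column x - d.

```python
import numpy as np

def downsample(feat, factor):
    """Average-pool an (H, W, C) feature map by an integer factor.
    Stand-in (assumption) for a learned multi-resolution feature extractor."""
    h, w, c = feat.shape
    h2, w2 = h // factor, w // factor
    feat = feat[:h2 * factor, :w2 * factor]
    return feat.reshape(h2, factor, w2, factor, c).mean(axis=(1, 3))

def init_disparity(left, right, max_disp):
    """Per pixel, pick the integer disparity with the lowest L1 matching cost."""
    h, w, _ = left.shape
    costs = np.full((max_disp + 1, h, w), np.inf)
    for d in range(max_disp + 1):
        # Assumed convention: left column x corresponds to right column x - d.
        costs[d, :, d:] = np.abs(left[:, d:] - right[:, :w - d]).sum(axis=-1)
    return costs.argmin(axis=0)

def multires_init(left, right, factors=(4, 2, 1), max_disp=8):
    """One disparity estimate per resolution, coarsest first."""
    estimates = {}
    for f in factors:
        l, r = downsample(left, f), downsample(right, f)
        estimates[f] = init_disparity(l, r, max(1, max_disp // f))
    return estimates

# Usage: a synthetic pair in which the left view is the right view shifted by 4 columns.
rng = np.random.default_rng(0)
right = rng.random((8, 32, 3))
left = np.roll(right, 4, axis=1)
estimates = multires_init(left, right)
```

Each entry of `estimates` is a disparity map at one resolution; in the disclosed system these per-resolution estimates are then handed to the propagation engine for iterative refinement, which this sketch does not model.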
Opening claim text (preview).
What is claimed is:

1. A real-time active stereo system comprising: a capture system configured to capture stereo data, the stereo data including a first input image and a second input image; and a depth sensing computing system configured to predict a depth map, the depth sensing computing system including: a feature extractor configured to extract features from the first and second input images at a plurality of resolutions; an initialization engine configured to generate a plurality of depth estimations, each of the plurality of depth estimations corresponding to a different resolution and including a three-dimensional (3D) slanted plane hypothesis for a region of a respective depth estimation, the 3D slanted plane hypothesis including a disparity value and a location of a slanted plane; and a propagation engine configured to iteratively refine the plurality of depth estimations based on image warping and spatial propagation.

2. The real-time active stereo system of claim 1, wherein the initialization engine is configured to predict a first depth estimation based on a matching of the features from the first and second input images at a first resolution, the initialization engine configured to predict a second depth estimation based on a matching of the features from the first and second input images at a second resolution.

3. The real-time active stereo system of claim 2, wherein the propagation engine is configured to predict, via a first iteration, a refined first depth estimation using the first depth estimation from the initialization engine and the features at the first resolution from the feature extractor, the propagation engine configured to predict, via a second iteration, a refined second depth estimation based on the refined first depth estimation from the first iteration, and the second depth estimation from the initialization engine, the refined second depth estimation being used in a subsequent iteration or as a basis for the depth map.

4. The real-time active stereo system of claim 1, wherein the initialization engine includes a region feature extractor configured to extract first per-region features using features from the first input image and extract second per-region features using features from the second input image, the initialization engine including a matching engine configured to generate a depth estimation based on a matching of the first per-region features with the second per-region features.

5. The real-time active stereo system of claim 1, wherein the 3D slanted plane hypothesis includes a feature descriptor that represents information about the slanted plane.

6. The real-time active stereo system of claim 5, further comprising: a neural network configured to generate the feature descriptor based on costs per region.

7. The real-time active stereo system of claim 1, wherein the propagation engine includes a warping module configured to generate warped features by warping features of the first input image using a depth estimation received from the initialization engine, a matching engine configured to compute a local cost volume based on a matching of the warped features with features from the second input image, and a convolutional neural network (CNN) module configured to generate a refined depth estimation based on plane hypotheses of the depth estimation and the local cost volume.

8. The real-time active stereo system of claim 7, wherein the CNN module includes one or more residual blocks configured to apply one or more dilation convolutions.

9. A method for real-time stereo matching comprising: extracting, by a feature extractor, features from a first input image and a second input image at a plurality of resolutions including a first resolution and a second resolution; and generating, by an initialization engine, a plurality of depth estimations at the plurality of resolutions, including: predicting a first depth estimation based on a matching of the features from the first and second input images at the first resolution, the first depth estimation including a three-dimensional (3D) slanted plane hypothesis for each region of a respective depth estimation, the 3D slanted plane hypothesis including a disparity value and a location of a slanted plane; and predicting a second depth estimation based on a matching of the features from the first and second input images at the second resolution; and iteratively refining, by a propagation engine, the plurality of depth estimations based on image warping and spatial propagation, including: predicting, via a first iteration, a refined first depth estimation using the first depth estimation and the features at the first resolution; and predicting, via a second iteration, a refined second depth estimation based on the refined first depth estimation from the first iteration and the second depth estimation, the refined second depth estimation being used in a subsequent iteration or as a basis for a depth map.

10. The method of claim 9, wherein the 3D slanted plane hypothesis includes a feature descriptor that represents information about the slanted plane.

11. The method of claim 9, wherein the predicting the first depth estimation includes: extracting, by at least one first convolutional block, first per-region features for each image region using features of the first input image at the first resolution; extracting, by at least one second convolutional block, second per-region features for each image region using features of the second input image at the first resolution; and selecting, by a matching engine, the 3D slanted plane hypothesis for each region having a disparity value with a lowest cost.

12. The method of claim 11, further comprising: constructing a 3D cost volume based on costs per region, wherein the 3D slanted plane hypothesis is selected based on the costs per region, wherein the 3D cost volume is not stored or used by the propagation engine.

13. The method of claim 12, wherein the 3D slanted plane hypothesis includes a feature descriptor that describes information about a slanted plane, further comprising: generating, by a neural network, the feature descriptor based on the costs per region and at least one of the first per-region features or the second per-region features.

14. The method of claim 11, wherein the at least one first convolutional block includes a convolutional block having a stride value that is different from a convolutional block of the at least one second convolutional block.

15. The method of claim 9, wherein the predicting the refined first depth estimation includes: generating warped features by warping features from the first input image at the first resolution using the first depth estimation; computing a local cost volume based on a matching of the warped features with features of the second input image at the first resolution; obtaining an augmented depth estimation based on the local cost volume and the first depth estimation; and predicting, by a convolution neural network (CNN) module, the refined first depth estimation using the augmented depth estimation.

16. The method of claim 15, wherein computing the local cost volume includes: displacing disparities in a respective region by an offset value; and computing costs for the respective region.

17. The method of claim 15, wherein the CNN module includes a plurality of residual blocks including a first residual block and a second residual block, at least one of the first residual block or the second residual block defining one or more dilated convolutions.
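The warping and local cost volume steps recited in claims 15 and 16 can be sketched as follows. This is a simplified illustration under stated assumptions, not the claimed implementation: it uses only the disparity component of each hypothesis (no plane slant or feature descriptor), a horizontal linear-interpolation warp, an L1 cost, and offsets of -1, 0, and +1. The function names `warp_rows`, `local_cost_volume`, and `augment`, and the convention that a reference pixel at column x corresponds to the warped image's column x - d, are hypothetical choices made for the example.

```python
import numpy as np

def warp_rows(feat, disp):
    """Sample an (H, W, C) feature map at column x - disp with linear interpolation."""
    h, w, _ = feat.shape
    xs = np.clip(np.arange(w)[None, :] - disp, 0, w - 1)   # (H, W) source columns
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    t = (xs - x0)[..., None]                               # interpolation weight
    rows = np.arange(h)[:, None]
    return (1 - t) * feat[rows, x0] + t * feat[rows, x1]

def local_cost_volume(ref_feat, src_feat, disp, offsets=(-1.0, 0.0, 1.0)):
    """L1 matching cost after displacing the disparity by each offset.
    The offset values are an assumption made for illustration."""
    costs = []
    for off in offsets:
        warped = warp_rows(src_feat, disp + off)
        costs.append(np.abs(ref_feat - warped).sum(axis=-1))
    return np.stack(costs, axis=0)                         # (len(offsets), H, W)

def augment(disp, cost_vol):
    """Concatenate the current estimate with its local costs (input to a refiner)."""
    return np.concatenate([disp[None], cost_vol], axis=0)

# Usage: the reference features equal the source features shifted by 3 columns.
rng = np.random.default_rng(1)
src = rng.random((4, 16, 2))
ref = np.roll(src, 3, axis=1)
disp = np.full((4, 16), 3.0)
vol = local_cost_volume(ref, src, disp)
aug = augment(disp, vol)
```

In the claimed method a CNN module consumes the augmented estimate to predict the refined depth estimation; here `aug` simply stacks the disparity with its three local costs, and a correct disparity shows up as the zero-offset slice having the lowest cost.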
from stereo images · CPC title
Physics · mapped topic
Scaling of whole images or parts thereof, e.g. expanding or contracting · CPC title
Erosion or dilatation, e.g. thinning · CPC title
Image signal generators · CPC title