Real-time stereo matching using a hierarchical iterative refinement network

US11810313B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11810313-B2
Application number: US-202117249095-A
Country: US
Kind code: B2
Filing date: Feb 19, 2021
Priority date: Feb 21, 2020
Publication date: Nov 7, 2023
Grant date: Nov 7, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

According to an aspect, a real-time active stereo system includes a capture system configured to capture stereo data, where the stereo data includes a first input image and a second input image, and a depth sensing computing system configured to predict a depth map. The depth sensing computing system includes a feature extractor configured to extract features from the first and second images at a plurality of resolutions, an initialization engine configured to generate a plurality of depth estimations, where each of the plurality of depth estimations corresponds to a different resolution, and a propagation engine configured to iteratively refine the plurality of depth estimations based on image warping and spatial propagation.
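The initialization stage described in the abstract can be illustrated with a toy example. The following is a minimal sketch, not the patented implementation: it assumes 1-D feature rows, uses pair averaging as a stand-in for the learned feature extractor, and uses a winner-take-all search over integer disparities as a stand-in for the initialization engine. The names `downsample`, `build_pyramid`, `init_disparity`, and `multi_resolution_init` are hypothetical.

```python
def downsample(row):
    """Halve a 1-D feature row by averaging adjacent pairs.

    Assumption: this replaces the learned multi-resolution feature extractor.
    """
    return [(row[i] + row[i + 1]) / 2.0 for i in range(0, len(row) - 1, 2)]


def build_pyramid(row, levels):
    """Return the row at `levels` resolutions; index 0 is full resolution."""
    pyramid = [row]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def init_disparity(left, right, max_disp):
    """Winner-take-all initialization for one resolution.

    For each position x in `left`, pick the disparity d in [0, max_disp]
    that minimizes |left[x] - right[x - d]| (ties keep the smaller d).
    """
    estimate = []
    for x, value in enumerate(left):
        best_d, best_cost = 0, float("inf")
        for d in range(0, min(max_disp, x) + 1):
            cost = abs(value - right[x - d])
            if cost < best_cost:
                best_d, best_cost = d, cost
        estimate.append(best_d)
    return estimate


def multi_resolution_init(left, right, levels, max_disp):
    """Produce one disparity estimate per resolution (index 0 is finest)."""
    lefts = build_pyramid(left, levels)
    rights = build_pyramid(right, levels)
    estimates = []
    for level in range(levels):
        # The disparity search range shrinks with the resolution.
        level_max = max(1, max_disp // (2 ** level))
        estimates.append(init_disparity(lefts[level], rights[level], level_max))
    return estimates


if __name__ == "__main__":
    right = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0]
    # Left row is the right row shifted by 2 samples (first two are padding).
    left = [100.0, 100.0] + right[:-2]
    for level, est in enumerate(multi_resolution_init(left, right, 2, 3)):
        print(level, est)
```

Each level yields its own estimate, and the coarser level sees roughly half the disparity of the finer one. In the system the abstract describes, these per-resolution estimates are then handed to the propagation engine for refinement.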

First claim

Opening claim text (preview).

What is claimed is:

1. A real-time active stereo system comprising: a capture system configured to capture stereo data, the stereo data including a first input image and a second input image; and a depth sensing computing system configured to predict a depth map, the depth sensing computing system including: a feature extractor configured to extract features from the first and second input images at a plurality of resolutions; an initialization engine configured to generate a plurality of depth estimations, each of the plurality of depth estimations corresponding to a different resolution and including a three-dimensional (3D) slanted plane hypothesis for a region of a respective depth estimation, the 3D slanted plane hypothesis including a disparity value and a location of a slanted plane; and a propagation engine configured to iteratively refine the plurality of depth estimations based on image warping and spatial propagation.

2. The real-time active stereo system of claim 1, wherein the initialization engine is configured to predict a first depth estimation based on a matching of the features from the first and second input images at a first resolution, the initialization engine configured to predict a second depth estimation based on a matching of the features from the first and second input images at a second resolution.

3. The real-time active stereo system of claim 2, wherein the propagation engine is configured to predict, via a first iteration, a refined first depth estimation using the first depth estimation from the initialization engine and the features at the first resolution from the feature extractor, the propagation engine configured to predict, via a second iteration, a refined second depth estimation based on the refined first depth estimation from the first iteration, and the second depth estimation from the initialization engine, the refined second depth estimation being used in a subsequent iteration or as a basis for the depth map.

4. The real-time active stereo system of claim 1, wherein the initialization engine includes a region feature extractor configured to extract first per-region features using features from the first input image and extract second per-region features using features from the second input image, the initialization engine including a matching engine configured to generate a depth estimation based on a matching of the first per-region features with the second per-region features.

5. The real-time active stereo system of claim 1, wherein the 3D slanted plane hypothesis includes a feature descriptor that represents information about the slanted plane.

6. The real-time active stereo system of claim 5, further comprising: a neural network configured to generate the feature descriptor based on costs per region.

7. The real-time active stereo system of claim 1, wherein the propagation engine includes a warping module configured to generate warped features by warping features of the first input image using a depth estimation received from the initialization engine, a matching engine configured to compute a local cost volume based on a matching of the warped features with features from the second input image, and a convolutional neural network (CNN) module configured to generate a refined depth estimation based on plane hypotheses of the depth estimation and the local cost volume.

8. The real-time active stereo system of claim 7, wherein the CNN module includes one or more residual blocks configured to apply one or more dilation convolutions.

9. A method for real-time stereo matching comprising: extracting, by a feature extractor, features from a first input image and a second input image at a plurality of resolutions including a first resolution and a second resolution; and generating, by an initialization engine, a plurality of depth estimations at the plurality of resolutions, including: predicting a first depth estimation based on a matching of the features from the first and second input images at the first resolution, the first depth estimation including a three-dimensional (3D) slanted plane hypothesis for each region of a respective depth estimation, the 3D slanted plane hypothesis including a disparity value and a location of a slanted plane; and predicting a second depth estimation based on a matching of the features from the first and second input images at the second resolution; and iteratively refining, by a propagation engine, the plurality of depth estimations based on image warping and spatial propagation, including: predicting, via a first iteration, a refined first depth estimation using the first depth estimation and the features at the first resolution; and predicting, via a second iteration, a refined second depth estimation based on the refined first depth estimation from the first iteration and the second depth estimation, the refined second depth estimation being used in a subsequent iteration or as a basis for a depth map.

10. The method of claim 9, wherein the 3D slanted plane hypothesis includes a feature descriptor that represents information about the slanted plane.

11. The method of claim 9, wherein the predicting the first depth estimation includes: extracting, by at least one first convolutional block, first per-region features for each image region using features of the first input image at the first resolution; extracting, by at least one second convolutional block, second per-region features for each image region using features of the second input image at the first resolution; and selecting, by a matching engine, the 3D slanted plane hypothesis for each region having a disparity value with a lowest cost.

12. The method of claim 11, further comprising: constructing a 3D cost volume based on costs per region, wherein the 3D slanted plane hypothesis is selected based on the costs per region, wherein the 3D cost volume is not stored or used by the propagation engine.

13. The method of claim 12, wherein the 3D slanted plane hypothesis includes a feature descriptor that describes information about a slanted plane, further comprising: generating, by a neural network, the feature descriptor based on the costs per region and at least one of the first per-region features or the second per-region features.

14. The method of claim 11, wherein the at least one first convolutional block includes a convolutional block having a stride value that is different from a convolutional block of the at least one second convolutional block.

15. The method of claim 9, wherein the predicting the refined first depth estimation includes: generating warped features by warping features from the first input image at the first resolution using the first depth estimation; computing a local cost volume based on a matching of the warped features with features of the second input image at the first resolution; obtaining an augmented depth estimation based on the local cost volume and the first depth estimation; and predicting, by a convolution neural network (CNN) module, the refined first depth estimation using the augmented depth estimation.

16. The method of claim 15, wherein computing the local cost volume includes: displacing disparities in a respective region by an offset value; and computing costs for the respective region.

17. The method of claim 15, wherein the CNN module includes a plurality of residual blocks including a first residual block and a second residual block, at least one of the first residual block or the second residual block defining one or more dilated convolutions.
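The warping and local cost volume steps recited in the claims can be illustrated on a single image row. This is a minimal sketch under stated assumptions, not the claimed implementation:

- The slanted plane is assumed to be parameterized as a center disparity `d` plus a horizontal gradient `dx`.
- Features are plain 1-D lists, and the cost is a sum of absolute differences.
- The CNN refinement is replaced by simply picking the lowest-cost offset.
- Which image is warped and the sign of the disparity are chosen for illustration only.

All function names are hypothetical.

```python
def sample_linear(row, x):
    """Linearly interpolate `row` at fractional position x, clamped to the borders."""
    x = min(max(x, 0.0), len(row) - 1.0)
    x0 = int(x)
    x1 = min(x0 + 1, len(row) - 1)
    t = x - x0
    return (1.0 - t) * row[x0] + t * row[x1]


def tile_disparities(d, dx, tile_size):
    """Expand a plane hypothesis (d at the tile center, slope dx) to per-pixel disparities."""
    center = (tile_size - 1) / 2.0
    return [d + dx * (i - center) for i in range(tile_size)]


def local_cost_volume(ref, src, d, dx, tile_start, tile_size, offsets=(-1, 0, 1)):
    """Compute one matching cost per disparity offset for a single tile.

    For each offset, the plane is displaced by that offset, `src` is warped
    toward `ref` with the displaced disparities, and the absolute feature
    differences over the tile are summed.
    """
    costs = []
    for offset in offsets:
        disparities = tile_disparities(d + offset, dx, tile_size)
        cost = 0.0
        for i, disp in enumerate(disparities):
            x = tile_start + i
            warped = sample_linear(src, x - disp)  # warp src feature to position x
            cost += abs(ref[x] - warped)
        costs.append(cost)
    return costs


def refine_by_argmin(d, costs, offsets=(-1, 0, 1)):
    """Stand-in for the CNN module: move d by the offset with the lowest cost."""
    best = min(range(len(costs)), key=lambda k: costs[k])
    return d + offsets[best]


if __name__ == "__main__":
    src = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0]
    ref = [0.0, 0.0] + src[:-2]  # ref is src shifted by 2 samples
    costs = local_cost_volume(ref, src, 1.0, 0.0, tile_start=4, tile_size=4)
    print(costs, refine_by_argmin(1.0, costs))
```

Because only a few offsets around the current hypothesis are evaluated, the cost volume stays small and local. Repeating the step moves the estimate by at most one offset per iteration in this toy version, whereas the claims have a CNN module predict the refined estimate from the plane hypotheses and the local cost volume.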

Assignees

Google LLC

Inventors

Classifications

  • G06T7/593 · Primary

    from stereo images · CPC title

  • Physics · mapped topic

  • Scaling of whole images or parts thereof, e.g. expanding or contracting · CPC title

  • Erosion or dilatation, e.g. thinning · CPC title

  • Image signal generators · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11810313B2 cover?
According to an aspect, a real-time active stereo system includes a capture system configured to capture stereo data, where the stereo data includes a first input image and a second input image, and a depth sensing computing system configured to predict a depth map. The depth sensing computing system includes a feature extractor configured to extract features from the first and second images at…
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G06T7/593. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 7, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).