Systems and methods for depth estimation via affinity learned with convolutional spatial propagation networks
US-11361456-B2 · Jun 14, 2022 · US
US2021237774A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2021237774-A1 |
| Application number | US-202017093393-A |
| Country | US |
| Kind code | A1 |
| Filing date | Nov 9, 2020 |
| Priority date | Jan 31, 2020 |
| Publication date | Aug 5, 2021 |
| Grant date | — |
Official abstract text for this publication.
A method for learning depth-aware keypoints and associated descriptors from monocular video for monocular visual odometry is described. The method includes training a keypoint network and a depth network to learn depth-aware keypoints and the associated descriptors. The training is based on a target image and a context image from successive images of the monocular video. The method also includes lifting 2D keypoints from the target image to learn 3D keypoints based on a learned depth map from the depth network. The method further includes estimating a trajectory of an ego-vehicle based on the learned 3D keypoints.
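The lifting step in the abstract can be illustrated with a small sketch. It assumes a pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy`) and a nearest-pixel depth lookup. Neither is stated in the abstract, and the function name `lift_keypoints` is hypothetical. The depth map stands in for the output of the depth network.

```python
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


def lift_keypoints(
    keypoints: Sequence[Point2D],
    depth_map: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3D]:
    """Back-project (u, v) pixel keypoints to camera-frame 3D points.

    Assumes a pinhole camera model (not specified in the source).
    """
    height, width = len(depth_map), len(depth_map[0])
    points: List[Point3D] = []
    for u, v in keypoints:
        # Nearest-pixel depth lookup, clamped to the image bounds.
        # This sampling scheme is an assumption made for illustration.
        col = min(max(int(round(u)), 0), width - 1)
        row = min(max(int(round(v)), 0), height - 1)
        z = depth_map[row][col]
        # Pinhole back-projection: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
        points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


# Toy 3x3 depth map standing in for a predicted depth map.
depth = [
    [1.0, 1.0, 1.0],
    [1.0, 2.0, 4.0],
    [1.0, 1.0, 1.0],
]
points_3d = lift_keypoints(
    [(1.0, 1.0), (2.0, 1.0)], depth, fx=2.0, fy=2.0, cx=1.0, cy=1.0
)
```

A keypoint at the principal point lifts to a point on the optical axis, and the depth value alone sets its distance from the camera.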
Opening claim text (preview).
What is claimed is:
1. A method for learning depth-aware keypoints and associated descriptors from monocular video for monocular visual odometry, comprising: training a keypoint network and a depth network to learn depth-aware keypoints and the associated descriptors based on a target image and a context image from successive images of the monocular video; lifting 2D keypoints from the target image to learn 3D keypoints based on a learned depth map from the depth network; and estimating a trajectory of an ego-vehicle based on the learned 3D keypoints.
2. The method of claim 1, in which training comprises self-supervised learning of the depth-aware keypoints based on the target image and the context image from successive images of the monocular video without any additional source of information.
3. The method of claim 1, in which training comprises training a differentiable pose estimation module based on sparse keypoint data to enable simultaneous training of the depth network and the keypoint network.
4. The method of claim 1, in which lifting comprises sampling from a predicted depth map of the depth network to lift sparse 2D keypoints to 3D keypoints.
5. The method of claim 1, in which estimating comprises estimating a pose transformation from the target image to the context image based on the learned 3D keypoints to geometrically match depth-aware keypoints in the target image to the depth-aware keypoints in the context image.
6. The method of claim 1, in which relative poses of successive images of the monocular video and the depth-aware keypoints are matched based on nearest neighbor matching using the associated descriptors with a reciprocal check.
7. The method of claim 1, further comprising: determining the keypoints from the target image; and computing warped keypoints in the context image corresponding to the determined keypoints from the target image according to a nearest keypoint in the target image.
8. The method of claim 1, further comprising estimating the monocular visual odometry of the ego-vehicle according to the estimated trajectory of the ego-vehicle.
9. The method of claim 1, further comprising using the learned 3D keypoints and descriptors to perform monocular, scale-aware, long-range visual odometry.
10. A non-transitory computer-readable medium having program code recorded thereon for learning depth-aware keypoints and associated descriptors from monocular video for monocular visual odometry, the program code being executed by a processor and comprising: program code to train a keypoint network and a depth network to learn the depth-aware keypoints and the associated descriptors based on a target image and a context image from successive images of the monocular video; program code to lift 2D keypoints from the target image to learn 3D keypoints based on a learned depth map from the depth network; and program code to estimate a trajectory of an ego-vehicle based on the learned 3D keypoints.
11. The non-transitory computer-readable medium of claim 10, in which the program code to train comprises program code to self-supervised learning of the depth-aware keypoints based on the target image and the context image from successive images of the monocular video without any additional source of information.
12. The non-transitory computer-readable medium of claim 10, in which the program code to train comprises program code to train a differentiable pose estimation module based on sparse keypoint data to enable simultaneous training of the depth network and the keypoint network.
13. The non-transitory computer-readable medium of claim 10, in which the program code to lift comprises program code to sample from a predicted depth map of the depth network to lift sparse 2D keypoints to 3D keypoints.
14. The non-transitory computer-readable medium of claim 10, in which the program code to estimate comprises program code to estimate a pose transformation from the target image to the context image based on the learned 3D keypoints to geometrically match depth-aware keypoints in the target image to the depth-aware keypoints in the context image.
15. The non-transitory computer-readable medium of claim 10, in which relative poses of successive images of the monocular video and the depth-aware keypoints are matched based on nearest neighbor matching using the associated descriptors with a reciprocal check.
16. The non-transitory computer-readable medium of claim 10, further comprising: program code to determine the keypoints from the target image; and program code to compute warped keypoints in the context image corresponding to the determined keypoints from the target image according to a nearest keypoint in the target image.
17. The non-transitory computer-readable medium of claim 10, further comprising program code to use the learned 3D keypoints and descriptors to perform monocular, scale-aware, long-range visual odometry.
18. A system for learning depth-aware keypoints and associated descriptors from monocular video for ego-motion estimation, the system comprising: a depth-aware keypoint model trained to learn a keypoint network and a depth network to learn the depth-aware keypoints and the associated descriptors based on a target image and a context image from successive images of the monocular video; a keypoint lifting module to lift 2D keypoints from the target image to learn 3D keypoints based on a learned depth map from the depth network; and a monocular visual odometry module to estimate a trajectory of an ego-vehicle based on the learned 3D keypoints.
19. The system of claim 18, further comprising a pose estimation module to provide differentiable pose estimation based on sparse keypoint data to enable simultaneous training of the depth network and the keypoint network.
20. The system of claim 18, in which the monocular visual odometry module is trained to estimate ego-motion from the target image to the context image based on the learned 3D keypoints.
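Claims 6 and 15 recite nearest neighbor matching on descriptors with a reciprocal check. A minimal sketch of that idea follows, assuming descriptors are plain numeric vectors compared by squared Euclidean distance. The distance metric and the function names are hypothetical and are not taken from the claims.

```python
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]


def squared_distance(a: Descriptor, b: Descriptor) -> float:
    """Squared Euclidean distance between two descriptors (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(query: Descriptor, candidates: Sequence[Descriptor]) -> int:
    """Index of the candidate descriptor closest to the query."""
    return min(
        range(len(candidates)),
        key=lambda j: squared_distance(query, candidates[j]),
    )


def mutual_matches(
    target_desc: Sequence[Descriptor],
    context_desc: Sequence[Descriptor],
) -> List[Tuple[int, int]]:
    """Keep pair (i, j) only if j is i's nearest neighbor and i is j's."""
    if not target_desc or not context_desc:
        return []
    matches: List[Tuple[int, int]] = []
    for i, descriptor in enumerate(target_desc):
        j = nearest(descriptor, context_desc)
        # Reciprocal check: the context descriptor must point back to i.
        if nearest(context_desc[j], target_desc) == i:
            matches.append((i, j))
    return matches


# Toy 2D descriptors; learned descriptors would be much higher-dimensional.
target = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
context = [(0.9, 0.0), (0.0, 0.1), (0.8, 0.1)]
matches = mutual_matches(target, context)
```

In this toy data the third target descriptor has a nearest neighbor in the context set, but that neighbor prefers a different target descriptor, so the reciprocal check discards the pair.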
from motion · CPC title
Three-dimensional [3D] objects · CPC title
Salient features, e.g. scale invariant feature transforms [SIFT] · CPC title
exterior to a vehicle by using sensors mounted on the vehicle · CPC title
using neural networks · CPC title