Self-supervised 3D keypoint learning for monocular visual odometry

US2021237774A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2021237774-A1
Application number: US-202017093393-A
Country: US
Kind code: A1
Filing date: Nov 9, 2020
Priority date: Jan 31, 2020
Publication date: Aug 5, 2021
Grant date:

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for learning depth-aware keypoints and associated descriptors from monocular video for monocular visual odometry is described. The method includes training a keypoint network and a depth network to learn depth-aware keypoints and the associated descriptors. The training is based on a target image and a context image from successive images of the monocular video. The method also includes lifting 2D keypoints from the target image to learn 3D keypoints based on a learned depth map from the depth network. The method further includes estimating a trajectory of an ego-vehicle based on the learned 3D keypoints.
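
The lifting step in the abstract can be illustrated with a minimal sketch. It assumes a pinhole camera model with known intrinsics (`fx`, `fy`, `cx`, `cy`) and a nearest-pixel depth lookup; neither detail is stated in the abstract, and the function name `lift_keypoints` is hypothetical.

```python
from typing import List, Sequence, Tuple


def lift_keypoints(
    keypoints_uv: Sequence[Tuple[float, float]],
    depth_map: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Tuple[float, float, float]]:
    """Back-project 2D pixel keypoints (u, v) to 3D camera-frame points.

    Assumptions (not from the patent text): a pinhole camera and a
    nearest-pixel lookup into the predicted depth map.
    """
    height, width = len(depth_map), len(depth_map[0])
    points_3d = []
    for u, v in keypoints_uv:
        # Sample the depth at the pixel nearest to the keypoint,
        # clamped so that keypoints on the border stay inside the map.
        col = min(max(int(round(u)), 0), width - 1)
        row = min(max(int(round(v)), 0), height - 1)
        z = depth_map[row][col]
        # Invert the pinhole projection u = fx * x / z + cx (same for v).
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        points_3d.append((x, y, z))
    return points_3d


if __name__ == "__main__":
    flat_depth = [[2.0] * 3 for _ in range(3)]  # toy 3x3 depth map
    print(lift_keypoints([(1.0, 1.0), (2.0, 1.0)], flat_depth, 1.0, 1.0, 1.0, 1.0))
```

A trained system would more likely interpolate the depth map at sub-pixel keypoint locations; nearest-pixel sampling is used here only to keep the sketch short.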

First claim

Opening claim text (preview).

What is claimed is:
1. A method for learning depth-aware keypoints and associated descriptors from monocular video for monocular visual odometry, comprising: training a keypoint network and a depth network to learn depth-aware keypoints and the associated descriptors based on a target image and a context image from successive images of the monocular video; lifting 2D keypoints from the target image to learn 3D keypoints based on a learned depth map from the depth network; and estimating a trajectory of an ego-vehicle based on the learned 3D keypoints.
2. The method of claim 1, in which training comprises self-supervised learning of the depth-aware keypoints based on the target image and the context image from successive images of the monocular video without any additional source of information.
3. The method of claim 1, in which training comprises training a differentiable pose estimation module based on sparse keypoint data to enable simultaneous training of the depth network and the keypoint network.
4. The method of claim 1, in which lifting comprises sampling from a predicted depth map of the depth network to lift sparse 2D keypoints to 3D keypoints.
5. The method of claim 1, in which estimating comprises estimating a pose transformation from the target image to the context image based on the learned 3D keypoints to geometrically match depth-aware keypoints in the target image to the depth-aware keypoints in the context image.
6. The method of claim 1, in which relative poses of successive images of the monocular video and the depth-aware keypoints are matched based on nearest neighbor matching using the associated descriptors with a reciprocal check.
7. The method of claim 1, further comprising: determining the keypoints from the target image; and computing warped keypoints in the context image corresponding to the determined keypoints from the target image according to a nearest keypoint in the target image.
8. The method of claim 1, further comprising estimating the monocular visual odometry of the ego-vehicle according to the estimated trajectory of the ego-vehicle.
9. The method of claim 1, further comprising using the learned 3D keypoints and descriptors to perform monocular, scale-aware, long-range visual odometry.
10. A non-transitory computer-readable medium having program code recorded thereon for learning depth-aware keypoints and associated descriptors from monocular video for monocular visual odometry, the program code being executed by a processor and comprising: program code to train a keypoint network and a depth network to learn the depth-aware keypoints and the associated descriptors based on a target image and a context image from successive images of the monocular video; program code to lift 2D keypoints from the target image to learn 3D keypoints based on a learned depth map from the depth network; and program code to estimate a trajectory of an ego-vehicle based on the learned 3D keypoints.
11. The non-transitory computer-readable medium of claim 10, in which the program code to train comprises program code to self-supervised learning of the depth-aware keypoints based on the target image and the context image from successive images of the monocular video without any additional source of information.
12. The non-transitory computer-readable medium of claim 10, in which the program code to train comprises program code to train a differentiable pose estimation module based on sparse keypoint data to enable simultaneous training of the depth network and the keypoint network.
13. The non-transitory computer-readable medium of claim 10, in which the program code to lift comprises program code to sample from a predicted depth map of the depth network to lift sparse 2D keypoints to 3D keypoints.
14. The non-transitory computer-readable medium of claim 10, in which the program code to estimate comprises program code to estimate a pose transformation from the target image to the context image based on the learned 3D keypoints to geometrically match depth-aware keypoints in the target image to the depth-aware keypoints in the context image.
15. The non-transitory computer-readable medium of claim 10, in which relative poses of successive images of the monocular video and the depth-aware keypoints are matched based on nearest neighbor matching using the associated descriptors with a reciprocal check.
16. The non-transitory computer-readable medium of claim 10, further comprising: program code to determine the keypoints from the target image; and program code to compute warped keypoints in the context image corresponding to the determined keypoints from the target image according to a nearest keypoint in the target image.
17. The non-transitory computer-readable medium of claim 10, further comprising program code to use the learned 3D keypoints and descriptors to perform monocular, scale-aware, long-range visual odometry.
18. A system for learning depth-aware keypoints and associated descriptors from monocular video for ego-motion estimation, the system comprising: a depth-aware keypoint model trained to learn a keypoint network and a depth network to learn the depth-aware keypoints and the associated descriptors based on a target image and a context image from successive images of the monocular video; a keypoint lifting module to lift 2D keypoints from the target image to learn 3D keypoints based on a learned depth map from the depth network; and a monocular visual odometry module to estimate a trajectory of an ego-vehicle based on the learned 3D keypoints.
19. The system of claim 18, further comprising a pose estimation module to provide differentiable pose estimation based on sparse keypoint data to enable simultaneous training of the depth network and the keypoint network.
20. The system of claim 18, in which the monocular visual odometry module is trained to estimate ego-motion from the target image to the context image based on the learned 3D keypoints.
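
The nearest neighbor matching with a reciprocal check recited in claims 6 and 15 can be sketched as follows. This is an illustrative example only: the squared Euclidean distance, the plain-tuple descriptor representation, and the function names are assumptions, not details taken from the claims.

```python
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]


def squared_distance(a: Descriptor, b: Descriptor) -> float:
    """Squared Euclidean distance between two descriptors (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_index(query: Descriptor, candidates: Sequence[Descriptor]) -> int:
    """Index of the candidate descriptor closest to the query."""
    return min(
        range(len(candidates)),
        key=lambda i: squared_distance(query, candidates[i]),
    )


def mutual_nearest_matches(
    target_desc: Sequence[Descriptor],
    context_desc: Sequence[Descriptor],
) -> List[Tuple[int, int]]:
    """Return (target_index, context_index) pairs that pass the reciprocal check.

    A pair is kept only when each descriptor is the other's nearest neighbor.
    """
    if not target_desc or not context_desc:
        return []
    target_to_context = [nearest_index(d, context_desc) for d in target_desc]
    context_to_target = [nearest_index(d, target_desc) for d in context_desc]
    return [
        (i, j)
        for i, j in enumerate(target_to_context)
        if context_to_target[j] == i
    ]


if __name__ == "__main__":
    target = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
    context = [(0.9, 0.0), (0.1, 0.0)]
    print(mutual_nearest_matches(target, context))
```

In the toy data, the third target descriptor has a nearest neighbor in the context set, but that context descriptor prefers a different target descriptor, so the pair is rejected by the reciprocal check.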

Assignees

Toyota Res Inst Inc

Inventors

Classifications

  • G06T7/579 (Primary)

    from motion · CPC title

  • Three-dimensional [3D] objects · CPC title

  • Salient features, e.g. scale invariant feature transforms [SIFT] · CPC title

  • exterior to a vehicle by using sensors mounted on the vehicle · CPC title

  • using neural networks · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2021237774A1 cover?
A method for learning depth-aware keypoints and associated descriptors from monocular video for monocular visual odometry is described. The method includes training a keypoint network and a depth network to learn depth-aware keypoints and the associated descriptors. The training is based on a target image and a context image from successive images of the monocular video. The method also include…
Who is the assignee on this patent?
Toyota Res Inst Inc
What technology area does this patent fall under?
Primary CPC classification G06T7/579. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 5, 2021 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).