Self-Supervised Attention Learning For Depth And Motion Estimation

US2022011778A1 · US · A1

Patent metadata
Publication number: US-2022011778-A1
Application number: US-202016927270-A
Country: US
Kind code: A1
Filing date: Jul 13, 2020
Priority date: Jul 13, 2020
Publication date: Jan 13, 2022
Grant date: (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection; read this to see what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system includes: a depth module including an encoder and a decoder and configured to: receive a first image from a first time from a camera; and based on the first image, generate a depth map including depths between the camera and objects in the first image; a pose module configured to: generate a first pose of the camera based on the first image; generate a second pose of the camera for a second time based on a second image; and generate a third pose of the camera for a third time based on a third image; and a motion module configured to: determine a first motion of the camera between the second and first times based on the first and second poses; and determine a second motion of the camera between the second and third times based on the second and third poses.
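The abstract derives camera motion from pairs of per-frame poses but gives no formula for it. The following is an illustrative sketch under stated assumptions, not the applicant's implementation. It treats each 6-degree-of-freedom pose as a camera-to-world rigid transform built from Euler angles (rotation order Rz · Ry · Rx is an assumption) and computes the motion between two times as the relative transform inv(T_a) · T_b. The helper names `pose_to_matrix` and `relative_motion` are hypothetical.

```python
import numpy as np


def pose_to_matrix(rx, ry, rz, tx, ty, tz):
    """Hypothetical helper: 6-DoF pose -> 4x4 camera-to-world transform.

    Rotations are Euler angles in radians, composed as R = Rz @ Ry @ Rx
    (an assumed convention); (tx, ty, tz) is the camera position.
    """
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    T = np.eye(4)
    T[:3, :3] = Rz @ Ry @ Rx
    T[:3, 3] = [tx, ty, tz]
    return T


def relative_motion(T_from, T_to):
    """Rigid motion taking the camera from pose T_from to pose T_to.

    Both inputs are camera-to-world transforms; the result maps points in
    the T_to camera frame into the T_from camera frame.
    """
    return np.linalg.inv(T_from) @ T_to


# Poses for the three frames named in the claims (values are made up).
T_second = pose_to_matrix(0.0, 0.0, 0.0, 0.0, 0.0, 0.0)   # earlier image
T_first = pose_to_matrix(0.0, 0.02, 0.0, 0.0, 0.0, 0.5)   # current image
T_third = pose_to_matrix(0.0, 0.04, 0.0, 0.0, 0.0, 1.0)   # later image

first_motion = relative_motion(T_second, T_first)   # second time -> first time
second_motion = relative_motion(T_second, T_third)  # second time -> third time
```

The two calls follow the claim wording literally: both motions are measured from the second (earlier) time. Many PoseNet-style networks output relative motion directly rather than absolute poses; deriving motion from absolute poses, as here, is only one possible reading of the claims.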

Claims

Full claim text as published (claims 1 to 22; claim 1 is the first independent claim).

What is claimed is:

1. A system, comprising: a depth module including an encoder and a decoder and configured to: receive a first image from a first time from a camera; and based on the first image and using the encoder and the decoder, generate a depth map including depths between the camera and objects in the first image; a pose module configured to: generate a first pose of the camera based on the first image; generate a second pose of the camera for a second time based on a second image received from the camera before the first image; and generate a third pose of the camera for a third time based on a third image received from the camera after the first image; and a motion module configured to: determine a first motion of the camera between the second time and the first time based on the first pose and the second pose; and determine a second motion of the camera between the second time and the third time based on the second pose and the third pose.

2. A vehicle, comprising: the system of claim 1; a propulsion device configured to propel the vehicle; and a control module configured to actuate the propulsion device based on the depth map.

3. The vehicle of claim 2, wherein the vehicle includes the camera and does not include any other cameras.

4. The vehicle of claim 2, wherein the vehicle does not include any radars, any sonar sensors, any laser sensors, or any light detection and ranging (LIDAR) sensors.

5. A vehicle, comprising: the system of claim 1; a propulsion device configured to propel the vehicle; and a control module configured to actuate the propulsion device based on at least one of: the first motion; and the second motion.

6. The system of claim 1 wherein the first, second, and third poses are 6 degree of freedom poses.

7. The system of claim 1 wherein the depth module includes attention mechanisms configured to, based on the first image, generate an attention map including attention coefficients indicative of amounts of attention to attribute to the objects in the first image.

8. The system of claim 7 wherein the attention mechanisms include attention gates.

9. The system of claim 7 wherein the decoder includes the attention mechanisms.

10. The system of claim 9 wherein the encoder does not include any attention mechanisms.

11. The system of claim 9 wherein the decoder includes decoder layers and the attention mechanisms are interleaved with the decoder layers.

12. The system of claim 7 further comprising: a first reconstruction module configured to reconstruct the second image using the attention map to produce a reconstructed second image; a second reconstruction module configured to reconstruct the third image using the attention map to produce a reconstructed third image; and a training module configured to, based on at least one of the reconstructed second image and the reconstructed third image, selectively adjust at least one parameter of at least one of depth module, the pose module, and the motion module.

13. The system of claim 12 wherein the training module is configured to selectively adjust the at least one parameter based on the reconstructed second image, the reconstructed third image, the second image, and the third image.

14. The system of claim 13 wherein the training module is configured to selectively adjust the at least one parameter based on: a first difference between the reconstructed second image and the second image; and a second difference between the reconstructed third image and the third image.

15. The system of claim 12 wherein the training module is configured to jointly train the depth module, the pose module, and the motion module.

16. The system of claim 12 wherein: the first reconstruction module is configured to reconstruct the second image using an image warping algorithm and the attention map; and the second reconstruction module is configured to reconstruct the third image using the image warping algorithm and the attention map.

17. The system of claim 16 wherein the image warping algorithm includes an inverse image warping algorithm.

18. The system of claim 1 wherein the pose module is configured to generate the first, second, and third poses using a PoseNet algorithm.

19. The system of claim 1 wherein the depth module includes a DispNet encoder-decoder network.

20. A method, comprising: receiving a first image from a first time from a camera; based on the first image, generating a depth map including depths between the camera and objects in the first image; generating a first pose of the camera based on the first image; generating a second pose of the camera for a second time based on a second image received from the camera before the first image; and generating a third pose of the camera for a third time based on a third image received from the camera after the first image; determining a first motion of the camera between the second time and the first time based on the first pose and the second pose; and determining a second motion of the camera between the second time and the third time based on the second pose and the third pose.

21. A system, comprising: one or more processors; and memory including code that, when executed by the one or more processors, perform functions including: receiving a first image from a first time from a camera; based on the first image, generating a depth map including depths between the camera and objects in the first image; generating a first pose of the camera based on the first image; generating a second pose of the camera for a second time based on a second image received from the camera before the first image; and generating a third pose of the camera for a third time based on a third image received from the camera after the first image; determining a first motion of the camera between the second time and the first time based on the first pose and the second pose; and determining a second motion of the camera between the second time and the third time based on the second pose and the third pose.

22. A system, comprising: a first means for: receiving a first image from a first time from a camera; and based on the first image, generating a depth map including depths between the camera and objects in the first image; a second means for: generating a first pose of the camera based on the first image; generating a second pose of the camera for a second time based on a second image received from the camera before the first image; and generating a third pose of the camera for a third time based on a third image received from the camera after the first image; and a third means for: determining a first motion of the camera between the second time and the first time based on the first pose and the second pose; and determining a second motion of the camera between the second time and the third time based on the second pose and the third pose.
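Claims 7 to 17 add an attention map, image reconstruction by (inverse) image warping, and training on the differences between reconstructed and actual neighboring images, but they give no formulas. The NumPy sketch below fills those gaps with assumptions that are not taken from the patent. The attention mechanism is modeled as an additive attention gate in the style of Attention U-Net. The warp is standard depth-and-pose inverse warping with bilinear sampling. The attention map enters training by weighting a per-pixel L1 photometric error. The claims also do not state the warp direction, so the sketch follows the common view-synthesis convention of resampling a neighboring frame into the first image's pixel grid. All function names are hypothetical, and images must be at least 2 x 2 pixels.

```python
import numpy as np


def attention_gate(x, g, W_x, W_g, psi):
    """Additive attention gate in the style of Attention U-Net (an assumed design).

    x: (C_x, H, W) decoder features; g: (C_g, H, W) gating features at the same
    resolution. W_x (F, C_x), W_g (F, C_g) and psi (F,) act as 1x1 convolutions.
    Returns an (H, W) map of attention coefficients in (0, 1).
    """
    _, H, W = x.shape
    q = W_x @ x.reshape(x.shape[0], -1) + W_g @ g.reshape(g.shape[0], -1)
    q = np.maximum(q, 0.0)                      # ReLU
    alpha = 1.0 / (1.0 + np.exp(-(psi @ q)))    # sigmoid -> coefficients in (0, 1)
    return alpha.reshape(H, W)


def inverse_warp(source, depth, K, T_tgt_to_src, eps=1e-6):
    """Resample a neighboring frame into the target frame's pixel grid.

    source: (H, W) grayscale neighboring image; depth: (H, W) depth of the target
    image; K: (3, 3) intrinsics; T_tgt_to_src: (4, 4) rigid transform from
    target-camera to source-camera coordinates. Returns (warped, valid_mask).
    """
    H, W = depth.shape
    v, u = np.mgrid[0:H, 0:W]
    pix = np.stack([u.ravel(), v.ravel(), np.ones(H * W)]).astype(float)
    # Back-project each target pixel to a 3-D point using its predicted depth.
    pts = (np.linalg.inv(K) @ pix) * depth.ravel()
    pts_h = np.vstack([pts, np.ones(H * W)])
    # Move the points into the source camera and project them with K.
    proj = K @ (T_tgt_to_src @ pts_h)[:3]
    z = proj[2]
    safe_z = np.where(z > eps, z, eps)
    us, vs = proj[0] / safe_z, proj[1] / safe_z
    valid = (z > eps) & (us >= -eps) & (us <= W - 1 + eps) & (vs >= -eps) & (vs <= H - 1 + eps)
    # Bilinear sampling; coordinates are clipped so out-of-view pixels stay indexable.
    us, vs = np.clip(us, 0, W - 1), np.clip(vs, 0, H - 1)
    u0 = np.clip(np.floor(us).astype(int), 0, W - 2)
    v0 = np.clip(np.floor(vs).astype(int), 0, H - 2)
    wu, wv = us - u0, vs - v0
    top = (1 - wu) * source[v0, u0] + wu * source[v0, u0 + 1]
    bottom = (1 - wu) * source[v0 + 1, u0] + wu * source[v0 + 1, u0 + 1]
    warped = np.where(valid, (1 - wv) * top + wv * bottom, 0.0)
    return warped.reshape(H, W), valid.reshape(H, W)


def attention_weighted_l1(target, warped, valid, attention):
    """Mean absolute photometric error, weighted by attention and the validity mask."""
    w = attention * valid
    return float(np.sum(w * np.abs(target - warped)) / max(np.sum(w), 1e-8))


# Toy usage: an identity motion reproduces the source, so the loss is 0.
H, W = 4, 5
rng = np.random.default_rng(0)
frame = rng.random((H, W))
warped, valid = inverse_warp(frame, np.ones((H, W)), np.eye(3), np.eye(4))
feats, gate = rng.random((8, H, W)), rng.random((4, H, W))
alpha = attention_gate(feats, gate, rng.normal(size=(6, 8)),
                       rng.normal(size=(6, 4)), rng.normal(size=6))
loss = attention_weighted_l1(frame, warped, valid, alpha)
```

In a training step, one such loss would be computed for the earlier frame and one for the later frame and then summed, mirroring the "first difference" and "second difference" of claim 14. A real system would compute these terms in an automatic-differentiation framework so that the depth, pose, and motion networks can be trained jointly, as claim 15 describes.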

Assignees

Naver Corp, Naver Labs Corp

Classifications (CPC titles)

  • Combinations of networks

  • Auto-encoder networks; Encoder-decoder networks

  • Weakly supervised learning, e.g. semi-supervised or self-supervised learning

  • Convolutional networks [CNN, ConvNet]

  • Depth or shape recovery

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2022011778A1 cover?
A system includes: a depth module including an encoder and a decoder and configured to: receive a first image from a first time from a camera; and based on the first image, generate a depth map including depths between the camera and objects in the first image; a pose module configured to: generate a first pose of the camera based on the first image; generate a second pose of the camera for a s…
Who is the assignee on this patent?
Naver Corp, Naver Labs Corp
What technology area does this patent fall under?
Primary CPC classification G06T7/70. Mapped technology areas include Physics.
When was this patent published?
Published Jan 13, 2022 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (either citations found in our corpus or other publications that share the same primary CPC classification).