US2022011778A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2022011778-A1 |
| Application number | US-202016927270-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jul 13, 2020 |
| Priority date | Jul 13, 2020 |
| Publication date | Jan 13, 2022 |
| Grant date | — |
Official abstract text for this publication.
A system includes: a depth module including an encoder and a decoder and configured to: receive a first image from a first time from a camera; and based on the first image, generate a depth map including depths between the camera and objects in the first image; a pose module configured to: generate a first pose of the camera based on the first image; generate a second pose of the camera for a second time based on a second image; and generate a third pose of the camera for a third time based on a third image; and a motion module configured to: determine a first motion of the camera between the second and first times based on the first and second poses; and determine a second motion of the camera between the second and third times based on the second and third poses.
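The motion module described above derives camera motion from pairs of poses. A minimal sketch of that step follows, assuming each 6 degree of freedom pose is given as a translation plus an axis-angle rotation vector and converted to a 4x4 camera-to-world matrix. The function names `pose_from_6dof` and `relative_motion`, the pose parameterization, and the numeric values are illustrative assumptions, not taken from the publication.

```python
import numpy as np


def pose_from_6dof(tx, ty, tz, rx, ry, rz):
    """Build a 4x4 camera-to-world matrix from a translation and an
    axis-angle rotation vector (assumed parameterization)."""
    rvec = np.array([rx, ry, rz], dtype=float)
    theta = np.linalg.norm(rvec)
    skew = np.zeros((3, 3))
    if theta > 1e-12:
        k = rvec / theta
        skew = np.array([[0.0, -k[2], k[1]],
                         [k[2], 0.0, -k[0]],
                         [-k[1], k[0], 0.0]])
    # Rodrigues' rotation formula
    rot = np.eye(3) + np.sin(theta) * skew + (1.0 - np.cos(theta)) * (skew @ skew)
    pose = np.eye(4)
    pose[:3, :3] = rot
    pose[:3, 3] = [tx, ty, tz]
    return pose


def relative_motion(pose_from, pose_to):
    """Pose of the camera at `pose_to`, expressed in the camera frame
    at `pose_from` (both inputs are camera-to-world matrices)."""
    return np.linalg.inv(pose_from) @ pose_to


# Hypothetical poses: second image (earlier), first image, third image (later).
pose_second = pose_from_6dof(0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
pose_first = pose_from_6dof(0.0, 0.0, 1.0, 0.0, 0.0, 0.0)
pose_third = pose_from_6dof(0.0, 0.0, 2.1, 0.0, 0.05, 0.0)

first_motion = relative_motion(pose_second, pose_first)   # second time -> first time
second_motion = relative_motion(pose_second, pose_third)  # second time -> third time
```

Composing matrices keeps rotation and translation consistent; subtracting the 6-vector poses directly would only approximate the motion for very small rotations.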
Opening claim text (preview).
What is claimed is:
1. A system, comprising: a depth module including an encoder and a decoder and configured to: receive a first image from a first time from a camera; and based on the first image and using the encoder and the decoder, generate a depth map including depths between the camera and objects in the first image; a pose module configured to: generate a first pose of the camera based on the first image; generate a second pose of the camera for a second time based on a second image received from the camera before the first image; and generate a third pose of the camera for a third time based on a third image received from the camera after the first image; and a motion module configured to: determine a first motion of the camera between the second time and the first time based on the first pose and the second pose; and determine a second motion of the camera between the second time and the third time based on the second pose and the third pose.
2. A vehicle, comprising: the system of claim 1; a propulsion device configured to propel the vehicle; and a control module configured to actuate the propulsion device based on the depth map.
3. The vehicle of claim 2, wherein the vehicle includes the camera and does not include any other cameras.
4. The vehicle of claim 2, wherein the vehicle does not include any radars, any sonar sensors, any laser sensors, or any light detection and ranging (LIDAR) sensors.
5. A vehicle, comprising: the system of claim 1; a propulsion device configured to propel the vehicle; and a control module configured to actuate the propulsion device based on at least one of: the first motion; and the second motion.
6. The system of claim 1 wherein the first, second, and third poses are 6 degree of freedom poses.
7. The system of claim 1 wherein the depth module includes attention mechanisms configured to, based on the first image, generate an attention map including attention coefficients indicative of amounts of attention to attribute to the objects in the first image.
8. The system of claim 7 wherein the attention mechanisms include attention gates.
9. The system of claim 7 wherein the decoder includes the attention mechanisms.
10. The system of claim 9 wherein the encoder does not include any attention mechanisms.
11. The system of claim 9 wherein the decoder includes decoder layers and the attention mechanisms are interleaved with the decoder layers.
12. The system of claim 7 further comprising: a first reconstruction module configured to reconstruct the second image using the attention map to produce a reconstructed second image; a second reconstruction module configured to reconstruct the third image using the attention map to produce a reconstructed third image; and a training module configured to, based on at least one of the reconstructed second image and the reconstructed third image, selectively adjust at least one parameter of at least one of depth module, the pose module, and the motion module.
13. The system of claim 12 wherein the training module is configured to selectively adjust the at least one parameter based on the reconstructed second image, the reconstructed third image, the second image, and the third image.
14. The system of claim 13 wherein the training module is configured to selectively adjust the at least one parameter based on: a first difference between the reconstructed second image and the second image; and a second difference between the reconstructed third image and the third image.
15. The system of claim 12 wherein the training module is configured to jointly train the depth module, the pose module, and the motion module.
16. The system of claim 12 wherein: the first reconstruction module is configured to reconstruct the second image using an image warping algorithm and the attention map; and the second reconstruction module is configured to reconstruct the third image using the image warping algorithm and the attention map.
17. The system of claim 16 wherein the image warping algorithm includes an inverse image warping algorithm.
18. The system of claim 1 wherein the pose module is configured to generate the first, second, and third poses using a PoseNet algorithm.
19. The system of claim 1 wherein the depth module includes a DispNet encoder-decoder network.
20. A method, comprising: receiving a first image from a first time from a camera; based on the first image, generating a depth map including depths between the camera and objects in the first image; generating a first pose of the camera based on the first image; generating a second pose of the camera for a second time based on a second image received from the camera before the first image; and generating a third pose of the camera for a third time based on a third image received from the camera after the first image; determining a first motion of the camera between the second time and the first time based on the first pose and the second pose; and determining a second motion of the camera between the second time and the third time based on the second pose and the third pose.
21. A system, comprising: one or more processors; and memory including code that, when executed by the one or more processors, perform functions including: receiving a first image from a first time from a camera; based on the first image, generating a depth map including depths between the camera and objects in the first image; generating a first pose of the camera based on the first image; generating a second pose of the camera for a second time based on a second image received from the camera before the first image; and generating a third pose of the camera for a third time based on a third image received from the camera after the first image; determining a first motion of the camera between the second time and the first time based on the first pose and the second pose; and determining a second motion of the camera between the second time and the third time based on the second pose and the third pose.
22. A system, comprising: a first means for: receiving a first image from a first time from a camera; and based on the first image, generating a depth map including depths between the camera and objects in the first image; a second means for: generating a first pose of the camera based on the first image; generating a second pose of the camera for a second time based on a second image received from the camera before the first image; and generating a third pose of the camera for a third time based on a third image received from the camera after the first image; and a third means for: determining a first motion of the camera between the second time and the first time based on the first pose and the second pose; and determining a second motion of the camera between the second time and the third time based on the second pose and the third pose.
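Claims 12 to 17 recite reconstructing images with an inverse image warping algorithm and an attention map, then training on the difference between reconstructed and actual images. The claims do not specify the warping direction, the sampling scheme, or how the attention map enters the computation. The sketch below therefore assumes a common self-supervised arrangement: one view is reconstructed by sampling a neighboring frame using a depth map, camera intrinsics `K`, and a relative camera transform, and the attention map is assumed to act as per-pixel weights on an absolute difference. The function names, nearest-neighbor sampling, and weighting scheme are all illustrative assumptions.

```python
import numpy as np


def inverse_warp(sampled_image, depth, K, T_view_to_sampled):
    """Reconstruct a view by sampling `sampled_image` (H x W, grayscale).

    depth: H x W depths for the view being reconstructed.
    K: 3x3 camera intrinsics (assumed known).
    T_view_to_sampled: 4x4 transform taking 3D points from the
        reconstructed view's camera frame to the sampled camera's frame.
    Returns (reconstruction, valid_mask).
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1).astype(float)
    # Back-project every pixel to a 3D point using its depth.
    points = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)
    points_h = np.vstack([points, np.ones((1, points.shape[1]))])
    # Move the points into the sampled camera's frame and project them.
    proj = K @ (T_view_to_sampled @ points_h)[:3]
    z = proj[2]
    in_front = z > 1e-6
    safe_z = np.where(in_front, z, 1.0)
    us = np.rint(proj[0] / safe_z).astype(int)  # nearest-neighbor sampling
    vs = np.rint(proj[1] / safe_z).astype(int)
    valid = in_front & (us >= 0) & (us < w) & (vs >= 0) & (vs < h)
    recon = np.zeros(h * w, dtype=float)
    recon[valid] = sampled_image[vs[valid], us[valid]]
    return recon.reshape(h, w), valid.reshape(h, w)


def weighted_difference(recon, reference, valid, attention=None):
    """Mean absolute difference over valid pixels, optionally weighted
    by an attention map (assumed weighting scheme)."""
    weights = valid.astype(float)
    if attention is not None:
        weights = weights * attention
    total = weights.sum()
    if total == 0:
        return 0.0
    return float((weights * np.abs(recon - reference)).sum() / total)


# Toy example: identity intrinsics, unit depth, camera shifted one unit along x.
image = np.arange(12, dtype=float).reshape(3, 4)
depth = np.ones((3, 4))
K = np.eye(3)
shift = np.eye(4)
shift[0, 3] = 1.0
recon, valid = inverse_warp(image, depth, K, shift)
loss = weighted_difference(recon, image, valid)
```

A training setup would typically use differentiable bilinear sampling instead of rounding to the nearest pixel so that gradients reach the depth and pose networks; nearest-neighbor sampling is used here only to keep the sketch short.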
Combinations of networks · CPC title
Auto-encoder networks; Encoder-decoder networks · CPC title
Weakly supervised learning, e.g. semi-supervised or self-supervised learning · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Depth or shape recovery · CPC title