Stereo depth estimation
US-12169943-B2 · Dec 17, 2024 · US
US2021352262A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2021352262-A1 |
| Application number | US-202016872181-A |
| Country | US |
| Kind code | A1 |
| Filing date | May 11, 2020 |
| Priority date | May 11, 2020 |
| Publication date | Nov 11, 2021 |
| Grant date | — |
Official abstract text for this publication.
AR elements are occluded in video image frames. A depth map is determined for an image frame of a video received from a video capture device. An AR graphical element for overlaying over the image frame is received. An element distance for AR graphical elements relative to a position of a user of the video capture device (e.g., the geographic position of the video capture device) is also received. Based on the depth map for the image frame, a pixel distance is determined for each pixel in the image frame. The pixel distances of the pixels in the image frame are compared to the element distance, and in response to a pixel distance for a given pixel being less than the element distance, the pixel of the image frame is displayed rather than a corresponding pixel of the AR graphical element.
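The per-pixel comparison described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patent's implementation: the function name `composite_with_occlusion`, the nested-list frame layout, and the use of `None` to mark positions the AR element does not cover are all assumptions made for this example.

```python
from typing import Any, List, Optional

Frame = List[List[Any]]                 # rows of pixels (any pixel representation)
Overlay = List[List[Optional[Any]]]     # None where the AR element does not cover the frame
DistanceMap = List[List[float]]         # per-pixel distance, same shape as the frame


def composite_with_occlusion(
    frame: Frame,
    overlay: Overlay,
    pixel_distances: DistanceMap,
    element_distance: float,
) -> Frame:
    """Overlay an AR element on a frame, letting nearer scene pixels occlude it.

    Hypothetical helper: a scene pixel is shown instead of the AR pixel
    whenever its distance is less than the element distance.
    """
    output: Frame = []
    for frame_row, overlay_row, distance_row in zip(frame, overlay, pixel_distances):
        output_row = []
        for frame_px, overlay_px, distance in zip(frame_row, overlay_row, distance_row):
            if overlay_px is None or distance < element_distance:
                # Either no AR content here, or the scene is closer than the element.
                output_row.append(frame_px)
            else:
                output_row.append(overlay_px)
        output.append(output_row)
    return output


if __name__ == "__main__":
    frame = [["f0", "f1", "f2"]]
    overlay = [[None, "ar", "ar"]]
    distances = [[5.0, 5.0, 20.0]]
    print(composite_with_occlusion(frame, overlay, distances, element_distance=10.0))
```

In the usage at the bottom, the middle pixel is 5 units away while the element sits at 10, so the scene pixel is kept there and the AR element appears only at the last position.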
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method comprising: receiving a video comprising image frames; computing a depth map for an image frame of the video, the depth map including a depth value for each pixel in the image frame; identifying an augmented reality (AR) graphical element for overlaying on the video; determining an element distance of the AR graphical element relative to a user's position; determining a pixel distance for each pixel in the image frame based on the depth map corresponding to the image frame; comparing a pixel distance of a pixel of the image frame of the video to the element distance; and responsive to the pixel distance of the pixel of the image frame being less than the element distance, displaying the pixel of the image frame rather than a corresponding pixel of the AR graphical element, such that the AR graphical element is at least partially occluded by one or more pixels of the image frame.
2. The computer-implemented method of claim 1, wherein computing the depth map for the image frame comprises inputting the image frame to a trained depth estimation model to generate the depth map, and wherein the depth value for each pixel in the image frame is a relative depth value.
3. The computer-implemented method of claim 2, wherein training the trained depth estimation model comprises: receiving a set of training image frames of one or more training videos; and determining a set of model parameters mapping the set of training image frames to corresponding depth maps.
4. The computer-implemented method of claim 2, wherein the trained depth estimation model is a self-supervised monocular depth estimation model.
5. The computer-implemented method of claim 1, wherein determining a pixel distance of each pixel in each image frame of the video comprises: for each image frame: determining a reference distance from a video capture device that captured the video to a portion of a scene captured by a reference pixel in the image frame; calculating a conversion factor for the depth map based on the determined reference distance and a reference depth value of the depth map corresponding to the reference pixel; and calculating the distance of each pixel in the image frame based on a depth value of the pixel in the image frame and the conversion factor.
6. The computer-implemented method of claim 5, wherein the determining the reference distance for each image frame is based on a calibration for the image frame, the calibration based on a geographic location of the video capture device and an angular orientation of the video capture device relative to a ground plane corresponding to the geographic location during capture of the image frame.
7. The computer-implemented method of claim 5, wherein the determining the reference distance for each image frame comprises: detecting an object or landmark captured in the image frame represented by the reference pixel; and calculating the reference distance based on a geographic location of the object or landmark and a geographic location of the video capture device during capture of the image frame.
8. The computer-implemented method of claim 7, wherein detecting the object or landmark comprises dynamic object detection of a moving object.
9. The computer-implemented method of claim 1, wherein computing the depth map for the image frame is based on a depth map generated by one of a stereo camera system, a system using more than one camera, a light detection and ranging (LIDAR) system, and a monocular depth mapping system using a single camera.
10. The computer-implemented method of claim 1, wherein determining the pixel distance further comprises: segmenting portions of the image frame; smoothing the depth map corresponding to the image frame based on the segmented portions of the image frame; and determining the pixel distance based on the smoothed depth map.
11. The computer-implemented method of claim 10, wherein the pixel distance is further determined based on the segmented portions of the image frame.
12. The computer-implemented method of claim 10, wherein segmenting portions of the image frame comprises inputting the image frame to an image segmentation model to generate one or more output groups of pixels and at least one output label for each of the one or more groups of pixels.
13. The computer-implemented method of claim 1, wherein the video is displayed with the AR graphical element overlaid on portions of the video on a display of a navigation system.
14. The computer-implemented method of claim 13, wherein the display of the navigation system comprises a display of a mobile computing device.
15. A non-transitory computer readable storage medium storing executable instructions that, when executed by one or more processors, cause the one or more processors to perform steps comprising: receiving a video comprising image frames; computing a depth map for an image frame of the video, the depth map including a depth value for each pixel in the image frame; identifying an augmented reality (AR) graphical element for overlaying on the video; determining an element distance of the AR graphical element relative to a user's position; determining a pixel distance for each pixel in the image frame based on the depth map corresponding to the image frame; comparing a pixel distance of a pixel of the image frame of the video to the element distance; and responsive to the pixel distance of the pixel of the image frame being less than the element distance, displaying the pixel of the image frame rather than a corresponding pixel of the AR graphical element, such that the AR graphical element is at least partially occluded by one or more pixels of the image frame.
16. The computer-readable storage medium of claim 15, wherein computing the depth map for the image frame comprises inputting the image frame to a trained depth estimation model to generate the depth map, and the depth value for each pixel in the image frame is a relative depth value.
17. The computer-readable storage medium of claim 15, wherein the trained depth estimation model is a self-supervised monocular depth estimation model.
18. The computer-readable storage medium of claim 15, wherein determining a pixel distance of each pixel in each image frame of the video comprises: for each image frame: determining a reference distance from a video capture device that captured the video to a portion of a scene captured by a reference pixel in the image frame; calculating a conversion factor for the depth map based on the determined reference distance and a reference depth value of the depth map corresponding to the reference pixel; and calculating the distance of each pixel in the image frame based on a depth value of the pixel in the image frame and the conversion factor.
19. The computer-readable storage medium of claim 18, wherein the determining the reference distance for each image frame is based on a calibration for the image frame, the calibration based on a geographic location of the video capture device and an angular orientation of the video capture device relative to a ground plane corresponding to the geographic location during capture of the image frame.
20. The computer-readable storage medium of claim 18, wherein the determining the reference distance for each image frame comprises: detecting an object or landmark captured in the image frame represented by the reference pi
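Claims 5 and 7 describe turning relative depth values into distances through a reference pixel whose real distance is known. A minimal sketch of one way this could work is shown below. The claims only say the conversion factor is "based on" the reference distance and reference depth value, so the simple ratio used here is an assumption, as are the haversine great-circle formula for the landmark distance, the mean Earth radius constant, and every function name.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters (assumption of this sketch)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two latitude/longitude points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def conversion_factor(reference_distance: float, reference_depth: float) -> float:
    """Assumed linear model: distance = relative depth * factor."""
    if reference_depth <= 0:
        raise ValueError("reference depth value must be positive")
    return reference_distance / reference_depth


def pixel_distances_from_depth(
    depth_map: List[List[float]],
    reference_pixel: Tuple[int, int],
    reference_distance: float,
) -> List[List[float]]:
    """Scale every relative depth value by the factor derived from the reference pixel."""
    row, col = reference_pixel
    factor = conversion_factor(reference_distance, depth_map[row][col])
    return [[depth * factor for depth in depth_row] for depth_row in depth_map]


if __name__ == "__main__":
    relative_depth = [[0.5, 1.0], [2.0, 4.0]]
    # Hypothetical coordinates: capture device at the origin, landmark slightly north.
    ref_distance = haversine_m(0.0, 0.0, 0.0001, 0.0)
    print(pixel_distances_from_depth(relative_depth, (0, 1), ref_distance))
```

The resulting distance map is what would then be compared against the element distance of the AR graphical element.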
Combinations of networks · CPC title
Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound · CPC title
Supervised learning · CPC title
Weakly supervised learning, e.g. semi-supervised or self-supervised learning · CPC title
Convolutional networks [CNN, ConvNet] · CPC title