Rendering augmented reality with occlusion

US2021352262A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2021352262-A1
Application number  US-202016872181-A
Country             US
Kind code           A1
Filing date         May 11, 2020
Priority date       May 11, 2020
Publication date    Nov 11, 2021
Grant date          (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

AR elements are occluded in video image frames. A depth map is determined for an image frame of a video received from a video capture device. An AR graphical element for overlaying over the image frame is received. An element distance for AR graphical elements relative to a position of a user of the video capture device (e.g., the geographic position of the video capture device) is also received. Based on the depth map for the image frame, a pixel distance is determined for each pixel in the image frame. The pixel distances of the pixels in the image frame are compared to the element distance, and in response to a pixel distance for a given pixel being less than the element distance, the pixel of the image frame is displayed rather than a corresponding pixel of the AR graphical element.
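
The per-pixel comparison described in the abstract can be sketched as below. This is a minimal illustration, not the patent's implementation: the function name `composite_with_occlusion`, the nested-list image representation, and the use of `None` to mark positions the AR element does not cover are all assumptions made for the example.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]  # assumed RGB representation, for illustration only


def composite_with_occlusion(
    frame: List[List[Pixel]],
    overlay: List[List[Optional[Pixel]]],
    pixel_distances: List[List[float]],
    element_distance: float,
) -> List[List[Pixel]]:
    """Return the frame with the AR overlay drawn only where it is not occluded.

    A frame pixel wins over the AR pixel when its distance is less than the
    element distance, which is the comparison stated in the abstract.
    """
    output = []
    for frame_row, overlay_row, dist_row in zip(frame, overlay, pixel_distances):
        out_row = []
        for frame_px, overlay_px, dist in zip(frame_row, overlay_row, dist_row):
            # None means the AR element has no pixel at this position (assumption).
            if overlay_px is None or dist < element_distance:
                out_row.append(frame_px)
            else:
                out_row.append(overlay_px)
        output.append(out_row)
    return output


# Hypothetical 1x3 frame: the middle pixel is a nearby object at 8 m,
# the right pixel is background at 40 m, and the AR element sits at 25 m.
frame = [[(10, 10, 10), (20, 20, 20), (30, 30, 30)]]
overlay = [[None, (255, 0, 0), (255, 0, 0)]]
distances = [[5.0, 8.0, 40.0]]
result = composite_with_occlusion(frame, overlay, distances, element_distance=25.0)
# result == [[(10, 10, 10), (20, 20, 20), (255, 0, 0)]]
```

In this sketch the nearby object at 8 m hides the AR element, while the element remains visible over the background at 40 m.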

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: receiving a video comprising image frames; computing a depth map for an image frame of the video, the depth map including a depth value for each pixel in the image frame; identifying an augmented reality (AR) graphical element for overlaying on the video; determining an element distance of the AR graphical element relative to a user's position; determining a pixel distance for each pixel in the image frame based on the depth map corresponding to the image frame; comparing a pixel distance of a pixel of the image frame of the video to the element distance; and responsive to the pixel distance of the pixel of the image frame being less than the element distance, displaying the pixel of the image frame rather than a corresponding pixel of the AR graphical element, such that the AR graphical element is at least partially occluded by one or more pixels of the image frame.

2. The computer-implemented method of claim 1, wherein computing the depth map for the image frame comprises inputting the image frame to a trained depth estimation model to generate the depth map, and wherein the depth value for each pixel in the image frame is a relative depth value.

3. The computer-implemented method of claim 2, wherein training the trained depth estimation model comprises: receiving a set of training image frames of one or more training videos; and determining a set of model parameters mapping the set of training image frames to corresponding depth maps.

4. The computer-implemented method of claim 2, wherein the trained depth estimation model is a self-supervised monocular depth estimation model.

5. The computer-implemented method of claim 1, wherein determining a pixel distance of each pixel in each image frame of the video comprises: for each image frame: determining a reference distance from a video capture device that captured the video to a portion of a scene captured by a reference pixel in the image frame; calculating a conversion factor for the depth map based on the determined reference distance and a reference depth value of the depth map corresponding to the reference pixel; and calculating the distance of each pixel in the image frame based on a depth value of the pixel in the image frame and the conversion factor.

6. The computer-implemented method of claim 5, wherein the determining the reference distance for each image frame is based on a calibration for the image frame, the calibration based on a geographic location of the video capture device and an angular orientation of the video capture device relative to a ground plane corresponding to the geographic location during capture of the image frame.

7. The computer-implemented method of claim 5, wherein the determining the reference distance for each image frame comprises: detecting an object or landmark captured in the image frame represented by the reference pixel; and calculating the reference distance based on a geographic location of the object or landmark and a geographic location of the video capture device during capture of the image frame.

8. The computer-implemented method of claim 7, wherein detecting the object or landmark comprises dynamic object detection of a moving object.

9. The computer-implemented method of claim 1, wherein computing the depth map for the image frame is based on a depth map generated by one of a stereo camera system, a system using more than one camera, a light detection and ranging (LIDAR) system, and a monocular depth mapping system using a single camera.

10. The computer-implemented method of claim 1, wherein determining the pixel distance further comprises: segmenting portions of the image frame; smoothing the depth map corresponding to the image frame based on the segmented portions of the image frame; and determining the pixel distance based on the smoothed depth map.

11. The computer-implemented method of claim 10, wherein the pixel distance is further determined based on the segmented portions of the image frame.

12. The computer-implemented method of claim 10, wherein segmenting portions of the image frame comprises inputting the image frame to an image segmentation model to generate one or more output groups of pixels and at least one output label for each of the one or more groups of pixels.

13. The computer-implemented method of claim 1, wherein the video is displayed with the AR graphical element overlaid on portions of the video on a display of a navigation system.

14. The computer-implemented method of claim 13, wherein the display of the navigation system comprises a display of a mobile computing device.

15. A non-transitory computer readable storage medium storing executable instructions that, when executed by one or more processors, cause the one or more processors to perform steps comprising: receiving a video comprising image frames; computing a depth map for an image frame of the video, the depth map including a depth value for each pixel in the image frame; identifying an augmented reality (AR) graphical element for overlaying on the video; determining an element distance of the AR graphical element relative to a user's position; determining a pixel distance for each pixel in the image frame based on the depth map corresponding to the image frame; comparing a pixel distance of a pixel of the image frame of the video to the element distance; and responsive to the pixel distance of the pixel of the image frame being less than the element distance, displaying the pixel of the image frame rather than a corresponding pixel of the AR graphical element, such that the AR graphical element is at least partially occluded by one or more pixels of the image frame.

16. The computer-readable storage medium of claim 15, wherein computing the depth map for the image frame comprises inputting the image frame to a trained depth estimation model to generate the depth map, and the depth value for each pixel in the image frame is a relative depth value.

17. The computer-readable storage medium of claim 15, wherein the trained depth estimation model is a self-supervised monocular depth estimation model.

18. The computer-readable storage medium of claim 15, wherein determining a pixel distance of each pixel in each image frame of the video comprises: for each image frame: determining a reference distance from a video capture device that captured the video to a portion of a scene captured by a reference pixel in the image frame; calculating a conversion factor for the depth map based on the determined reference distance and a reference depth value of the depth map corresponding to the reference pixel; and calculating the distance of each pixel in the image frame based on a depth value of the pixel in the image frame and the conversion factor.

19. The computer-readable storage medium of claim 18, wherein the determining the reference distance for each image frame is based on a calibration for the image frame, the calibration based on a geographic location of the video capture device and an angular orientation of the video capture device relative to a ground plane corresponding to the geographic location during capture of the image frame.

20. The computer-readable storage medium of claim 18, wherein the determining the reference distance for each image frame comprises: detecting an object or landmark captured in the image frame represented by the reference pi
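
The conversion from relative depth to per-pixel distance recited in claims 5 and 7 (and mirrored in claims 18 and 20) might look roughly like the sketch below. The claims do not state how the conversion factor is computed; this example assumes relative depth is linearly proportional to metric distance, so the factor is a simple ratio, and it uses the haversine great-circle formula as one plausible way to get a reference distance from two geographic locations. All function names and the mean Earth radius constant are illustrative assumptions. A depth model that outputs disparity (inverse depth) would need a different conversion.

```python
import math
from typing import List

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters (approximation)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def conversion_factor(reference_distance: float, reference_depth: float) -> float:
    """Scale that maps relative depth to distance (assumes a linear relation)."""
    if reference_depth <= 0:
        raise ValueError("reference depth must be positive")
    return reference_distance / reference_depth


def pixel_distances(depth_map: List[List[float]], factor: float) -> List[List[float]]:
    """Apply the conversion factor to every relative depth value in the map."""
    return [[depth * factor for depth in row] for row in depth_map]


# Hypothetical frame: the reference pixel has relative depth 0.5 and shows a
# landmark whose known location puts it 10 m from the capture device.
factor = conversion_factor(reference_distance=10.0, reference_depth=0.5)
distances = pixel_distances([[0.25, 0.5], [1.0, 2.0]], factor)
# distances == [[5.0, 10.0], [20.0, 40.0]]
```

In practice `reference_distance` would come from something like `haversine_m(device_lat, device_lon, landmark_lat, landmark_lon)` for the landmark-based variant, or from the ground-plane calibration described in claims 6 and 19.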

Assignees

Mapbox Inc

Inventors

Classifications

  • Combinations of networks

  • Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound

  • Supervised learning

  • Weakly supervised learning, e.g. semi-supervised or self-supervised learning

  • Convolutional networks [CNN, ConvNet]

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2021352262A1 cover?
It covers occluding AR graphical elements in video image frames. A depth map is determined for an image frame, a distance is derived from it for each pixel, and each pixel distance is compared with the AR element's distance from the user's position. Where a pixel is closer than the element, the pixel of the image frame is displayed rather than the corresponding pixel of the AR graphical element.
Who is the assignee on this patent?
Mapbox Inc
What technology area does this patent fall under?
Primary CPC classification H04N13/128. Mapped technology areas include Electricity.
When was this patent published?
Publication date: Nov 11, 2021 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).