What technology area does this patent fall under?

Primary CPC classification G06T7/579. Mapped technology areas include Physics.

When was this patent published?

Publication date: February 1, 2022 (kind code B1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Densifying sparse depth maps

US11238604B1 · US · B1

Patent metadata
Field	Value
Publication number	US-11238604-B1
Application number	US-202016789788-A
Country	US
Kind code	B1
Filing date	Feb 13, 2020
Priority date	Mar 5, 2019
Publication date	Feb 1, 2022
Grant date	Feb 1, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system and techniques that use one or more machine learning models to predict a dense depth map (e.g., of depth values for all pixels or at least more pixels than a sparse estimation source (e.g., SLAM)). In some implementations, the machine learning model includes two sub models (e.g., neural networks). The first machine learning model predicts computer vision data such as semantic labels and surface normal directions from an input image. This computer vision data will be used to add to or otherwise improve sparse depth data. Specifically, a second machine learning model takes the semantic labels and surface normal directions and sparse depth data (e.g., 3D points) from a sparse point estimation source (e.g., SLAM) as inputs and outputs a depth map. The output depth map effectively densifies the initial depth data (e.g., from SLAM) by providing depth data for additional pixels of the image.
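To make the data flow in the abstract concrete, the sketch below shows one way the three inputs of the second model could be packed into a single per-pixel array. The abstract does not specify any encoding, so everything here is an assumption for illustration: the function name `build_second_model_input`, the one-hot encoding of labels, the convention that a depth of 0.0 means "no measurement", and the extra validity-mask channel.

```python
import numpy as np


def build_second_model_input(semantic_labels, surface_normals, sparse_depth, num_classes):
    """Stack per-pixel cues into one (H, W, C) array (hypothetical layout).

    semantic_labels: (H, W) integer class ids in [0, num_classes)
    surface_normals: (H, W, 3) surface normal vectors
    sparse_depth:    (H, W) depths, 0.0 where no depth is known (assumed convention)
    """
    # One-hot encode the labels: indexing an identity matrix gives (H, W, num_classes).
    one_hot = np.eye(num_classes, dtype=np.float32)[semantic_labels]
    # The validity mask marks which depth pixels hold real measurements.
    valid = (sparse_depth > 0).astype(np.float32)
    return np.concatenate(
        [
            one_hot,
            surface_normals.astype(np.float32),
            sparse_depth[..., None].astype(np.float32),
            valid[..., None],
        ],
        axis=-1,
    )
```

With `num_classes` label channels, three normal channels, one depth channel, and one mask channel, the result has `num_classes + 5` channels per pixel. A real second model could consume the inputs quite differently; this only illustrates that all three sources are aligned to the same pixel grid.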

First claim

Opening claim text (preview).

What is claimed is:
1. A method, comprising: at an electronic device having a processor: obtaining an image of a physical setting from an image capture device; producing a semantic prediction and a surface normal prediction using a first machine learning model with input comprising the image; producing depth data using a depth estimator on the image; and outputting a depth map using a second machine learning model with input comprising the semantic prediction, the surface normal prediction, and the depth data.
2. The method of claim 1, wherein the depth map is denser than the depth data, the depth map providing depth estimates for more portions of the image than the depth data.
3. The method of claim 1, wherein the depth map is denser than the depth data, the depth map providing depth estimates for more pixels of the image than the depth data.
4. The method of claim 1, wherein the depth map provides a depth estimate for each pixel of the image.
5. The method of claim 1, wherein the depth estimator is a simultaneous localization and mapping (SLAM) technique.
6. The method of claim 5, wherein the depth data is determined by projecting three dimensional (3D) points determined by the SLAM technique based on: a pose of the image capture device during capture of the image; and focal length or distortion parameters of the image capture device.
7. The method of claim 1, wherein the first machine learning model comprises multiple sub-models, the sub-models comprising a semantic prediction model and a surface normal prediction model.
8. The method of claim 1, wherein outputting the depth map comprises displaying the depth map.
9. The method of claim 1, wherein the second model is trained by: obtaining sample images from the image capture device; producing training depth maps using a structure-from-motion (SFM) technique on the sample images; producing semantic predictions and surface normal predictions using the first machine learning model with input comprising the sample images; producing depth data sets using the depth estimator on the sample images; outputting depth maps using the second machine learning model with input comprising the semantic predictions, the surface normal predictions, and the depth data sets; and adjusting the second machine learning model based on comparing the training depth maps and the output depth maps.
10. The method of claim 9, wherein the comparing excludes pixels that are not predicted by the SFM technique.
11. A system comprising: a non-transitory computer-readable storage medium; and a processor coupled to the non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium comprises program instructions that, when executed on the processor, cause the system to perform operations comprising: obtaining an image of a physical setting from an image capture device; producing a semantic prediction and a surface normal prediction using a first machine learning model with input comprising the image; producing depth data using a depth estimator on the image; and outputting a depth map using a second machine learning model with input comprising the semantic prediction, the surface normal prediction, and the depth data.
12. The system of claim 11, wherein the depth map provides depth estimates for more pixels of the image than the depth data.
13. The system of claim 11, wherein the depth map provides a depth estimate for each pixel of the image.
14. The system of claim 11, wherein the depth estimator is a simultaneous localization and mapping (SLAM) technique.
15. The system of claim 14, wherein the depth data is determined by projecting three dimensional (3D) points determined by the SLAM technique based on: a pose of the image capture device during capture of the image; and focal length or distortion parameters of the image capture device.
16. The system of claim 11, wherein the second model is trained by: obtaining sample images from the image capture device; producing training depth maps using a structure-from-motion (SFM) technique on the sample images; producing semantic predictions and surface normal predictions using the first machine learning model with input comprising the sample images; producing depth data sets using the depth estimator on the sample images; outputting depth maps using the second machine learning model with input comprising the semantic predictions, the surface normal predictions, and the depth data sets; and adjusting the second machine learning model based on comparing the training depth maps and the output depth maps, wherein the comparing excludes pixels that are not predicted by the SFM technique.
17. A non-transitory computer-readable storage medium, storing program instructions computer-executable on a computer to perform operations comprising: obtaining an image of a physical setting from an image capture device; producing a semantic prediction and a surface normal prediction using a first machine learning model with input comprising the image; producing depth data using a depth estimator on the image; and outputting a depth map using a second machine learning model with input comprising the semantic prediction, the surface normal prediction, and the depth data.
18. The non-transitory computer-readable storage medium of claim 17, wherein the depth map provides depth estimates for more pixels of the image than the depth data.
19. The non-transitory computer-readable storage medium of claim 17, wherein the depth data is determined by projecting three dimensional (3D) points determined by a SLAM technique based on: a pose of the image capture device during capture of the image; and focal length or distortion parameters of the image capture device.
20. The non-transitory computer-readable storage medium of claim 17, wherein the second model is trained by: obtaining sample images from the image capture device; producing training depth maps using a structure-from-motion (SFM) technique on the sample images; producing semantic predictions and surface normal predictions using the first machine learning model with input comprising the sample images; producing depth data sets using the depth estimator on the sample images; outputting depth maps using the second machine learning model with input comprising the semantic predictions, the surface normal predictions, and the depth data sets; and adjusting the second machine learning model based on comparing the training depth maps and the output depth maps, wherein the comparing excludes pixels that are not predicted by the SFM technique.
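Two steps in the claims lend themselves to a short illustration: projecting 3D points into the image using the camera pose and focal length (claims 6, 15, 19), and comparing output and training depth maps while excluding pixels the SFM technique did not predict (claims 10, 16, 20). The sketch below is not taken from the patent. It assumes a plain pinhole camera without lens distortion, a world-to-camera pose given as `R` and `t`, a depth of 0.0 as the "no value" marker, and a mean absolute error as the comparison; the function names are hypothetical.

```python
import numpy as np


def project_to_sparse_depth(points_world, R, t, fx, fy, cx, cy, height, width):
    """Rasterize 3D points into an (H, W) depth map; 0.0 marks 'no depth'.

    R (3x3) and t (3,) are an assumed world-to-camera pose; fx, fy, cx, cy
    are pinhole intrinsics. Lens distortion is ignored in this sketch.
    """
    pts = np.asarray(points_world, dtype=float).reshape(-1, 3)
    cam = pts @ np.asarray(R, dtype=float).T + np.asarray(t, dtype=float)
    cam = cam[cam[:, 2] > 0]  # keep only points in front of the camera
    u = np.round(fx * cam[:, 0] / cam[:, 2] + cx).astype(int)
    v = np.round(fy * cam[:, 1] / cam[:, 2] + cy).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    depth = np.full((height, width), np.inf)
    # When several points land on one pixel, keep the nearest one.
    np.minimum.at(depth, (v[inside], u[inside]), cam[inside, 2])
    depth[np.isinf(depth)] = 0.0
    return depth


def masked_l1_loss(predicted, target):
    """Mean absolute depth error over pixels where the target has a value."""
    mask = target > 0
    if not mask.any():
        return 0.0
    return float(np.abs(predicted[mask] - target[mask]).mean())
```

Because the loss only looks at pixels where the target is nonzero, gaps in the training depth map contribute nothing to the comparison, which is the effect the "excludes pixels" limitation describes. The choice of L1 over another error measure is arbitrary here.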

Assignees

Apple Inc

Inventors

Classifications

G06N3/09
Supervised learning · CPC title
G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title
G06V20/70
Labelling scene content, e.g. deriving syntactic or semantic representations · CPC title
G06V20/647
by matching two-dimensional images to three-dimensional objects · CPC title
G06V10/82
using neural networks · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 80034695

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11238604B1 cover?: A system and techniques that use one or more machine learning models to predict a dense depth map (e.g., of depth values for all pixels or at least more pixels than a sparse estimation source (e.g., SLAM)). In some implementations, the machine learning model includes two sub models (e.g., neural networks). The first machine learning model predicts computer vision data such as semantic labels an…
Who is the assignee on this patent?: Apple Inc

Related patents

Image signal processor for generating depth map from phase detection pixels and device having the same

Two-dimensional infrared depth sensing

Roadside object detection apparatus

Method and apparatus for real time motion capture

System and Method for Determining a Depth Map Sequence for a Two-Dimensional Video Sequence
