Systems and methods for semi-supervised depth estimation according to an arbitrary camera

US11436743B2 · US · B2

Patent metadata
Publication number: US-11436743-B2
Application number: US-202016906801-A
Country: US
Kind code: B2
Filing date: Jun 19, 2020
Priority date: Jul 6, 2019
Publication date: Sep 6, 2022
Grant date: Sep 6, 2022

Abstract

Official abstract text for this publication.

System, methods, and other embodiments described herein relate to semi-supervised training of a depth model using a neural camera model that is independent of a camera type. In one embodiment, a method includes acquiring training data including at least a pair of training images and depth data associated with the training images. The method includes training the depth model using the training data to generate a self-supervised loss from the pair of training images and a supervised loss from the depth data. Training the depth model includes learning the camera type by generating, using a ray surface model, a ray surface that approximates an image character of the training images as produced by a camera having the camera type. The method includes providing the depth model to infer depths from monocular images in a device.
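As a rough illustration of how the two loss terms named in the abstract might be combined, the sketch below adds a photometric term (target image versus synthesized image) to a supervised term computed only where depth measurements exist. It is a minimal sketch under stated assumptions, not the patented implementation: the L1 form of both terms, the function names, the `supervised_weight` hyperparameter, and the use of `None` to mark pixels without a depth reading are all hypothetical choices made for this example.

```python
from typing import Optional, Sequence


def photometric_l1(target: Sequence[float], synthesized: Sequence[float]) -> float:
    """Mean absolute intensity difference between target and synthesized pixels."""
    return sum(abs(t - s) for t, s in zip(target, synthesized)) / len(target)


def supervised_l1(pred_depth: Sequence[float],
                  sparse_depth: Sequence[Optional[float]]) -> float:
    """Mean absolute depth error over the pixels that have a measurement.

    None marks a pixel with no depth reading (assumed encoding of sparse data).
    """
    errors = [abs(p - d) for p, d in zip(pred_depth, sparse_depth) if d is not None]
    return sum(errors) / len(errors) if errors else 0.0


def semi_supervised_loss(target: Sequence[float],
                         synthesized: Sequence[float],
                         pred_depth: Sequence[float],
                         sparse_depth: Sequence[Optional[float]],
                         supervised_weight: float = 0.5) -> float:
    """Weighted sum of both terms; the weight is a hypothetical hyperparameter."""
    return (photometric_l1(target, synthesized)
            + supervised_weight * supervised_l1(pred_depth, sparse_depth))


# Toy flattened 4-pixel image and depth map.
target = [0.5, 0.25, 0.75, 1.0]
synthesized = [0.5, 0.5, 0.75, 0.5]
pred_depth = [10.0, 20.0, 30.0, 40.0]
sparse_depth = [12.0, None, None, 40.0]  # only two pixels carry a measurement

loss = semi_supervised_loss(target, synthesized, pred_depth, sparse_depth)
print(loss)
```

Because the supervised term is averaged only over measured pixels, a sparse depth source contributes a gradient signal without penalizing pixels it never observed. A practical system would compute these terms on image tensors with an automatic differentiation library.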

First claim

Opening claim text (preview).

What is claimed is:

1. A depth system for semi-supervised training of a depth model using a neural camera model that is independent of a camera type, comprising: one or more processors; a memory communicably coupled to the one or more processors and storing: a network module including instructions that, when executed by the one or more processors, cause the one or more processors to acquire training data including at least a pair of training images derived from a monocular video and depth data associated with at least one of the training images; and a training module including instructions that, when executed by the one or more processors, cause the one or more processors to train the depth model using the training data to generate a self-supervised loss from the pair of training images and a supervised loss from the depth data, wherein the training module includes instructions to train the depth model including instructions to learn the camera type by generating, using a ray surface model that is a neural network, a ray surface that approximates an image character of the training images as produced by a camera having the camera type and to create a synthesized image according to at least the ray surface and a depth map by i) lifting pixels to produce three-dimensional points using the ray surface, the depth map, and a camera center point and ii) projecting the three-dimensional points onto a context image to create the synthesized image, and wherein the training module includes instructions to provide the depth model to infer depths from monocular images in a device.

2. The depth system of claim 1, wherein the training module includes instructions to train the depth model including instructions to generate the supervised loss using a supervised loss function that compares values between the depth map and corresponding information from the depth data that is sparse Light Detection and Ranging (LiDAR) data, and wherein the training module includes instructions to generate the supervised loss using the sparse LiDAR data to train the depth model on metric scale by accounting for scale aware differences between the depth map and the sparse LiDAR data and learns a camera center point associated with the ray surface.

3. The depth system of claim 1, wherein the training module includes instructions to lift the pixels including instructions to scale predicted ray vectors from the ray surface using the depth map and adjust the predicted ray vectors according to the camera center point, and wherein the training module includes instructions to project including instructions to apply a softmax approximation to derive each pixel in the synthesized image by identifying a predicted ray vector from the ray surface that corresponds with a direction associated with the three-dimensional points as defined relative to the camera center point.

4. The depth system of claim 1, wherein the image character associated with the camera type includes at least a format of the monocular images and lens distortion, and wherein the ray surface associates pixels within the monocular images with directions in an environment from which light that generates the pixels in the camera originates.

5. The depth system of claim 1, wherein the training module includes instructions to train the depth model to produce depth estimates according to a semi-supervised training process that integrates a self-supervised structure from motion (SfM) process, and wherein the training module includes instructions to train the depth model including instructions to use the self-supervised loss and the supervised loss to update at least the depth model, a pose model, and the ray surface model.

6. The depth system of claim 1, wherein the self-supervised loss includes a photometric loss and a depth smoothness loss that separately account for pixel-level similarities, wherein the training module includes instructions to train the depth model including instructions to pre-train the depth model, the ray surface model, and a pose model according to a self-supervised training process that does not use the depth data as a ground truth comparison.

7. The depth system of claim 1, wherein the depth model is a machine learning algorithm, and wherein generating the ray surface includes learning the camera type to provide the ray surface as part of the neural camera model that approximates the camera type for a set of training data including the pair of training images.

8. A non-transitory computer-readable medium for semi-supervised training of a depth model using a neural camera model that is independent of a camera type and including instructions that when executed by one or more processors cause the one or more processors to: acquire training data including at least a pair of training images derived from a monocular video and depth data associated with at least one of the training images; train the depth model using the training data to generate a self-supervised loss from the pair of training images and a supervised loss from the depth data, wherein training the depth model includes learning the camera type by generating, using a ray surface model that is a neural network, a ray surface that approximates an image character of the training images as produced by a camera having the camera type and to create a synthesized image according to at least the ray surface and a depth map by i) lifting pixels to produce three-dimensional points using the ray surface, the depth map, and a camera center point and ii) projecting the three-dimensional points onto a context image to create the synthesized image; and provide the depth model to infer depths from monocular images in a device.

9. The non-transitory computer-readable medium of claim 8, wherein the instructions to train the depth model include instructions to generate the supervised loss using a supervised loss function that compares values between the depth map and corresponding information from the depth data that is sparse Light Detection and Ranging (LiDAR) data, wherein the instructions to generate the supervised loss use the sparse LiDAR data to train the depth model on metric scale by accounting for scale aware differences between the depth map and the sparse LiDAR data and learns a camera center point associated with the ray surface.

10. The non-transitory computer-readable medium of claim 8, wherein the instructions to train the depth model produce depth estimates according to a self-supervised structure from motion (SfM) process, and wherein the instructions to train the depth model include instructions to use the self-supervised loss and the supervised loss to update at least the depth model, a pose model, and the ray surface model.

11. A method of semi-supervised training of a depth model using a neural camera model that is independent of a camera type, comprising: acquiring training data including at least a pair of training images derived from a monocular video and depth data associated with at least one of the training images; training the depth model using the training data to generate a self-supervised loss from the pair of training images and a supervised loss from the depth data, wherein training the depth model includes learning the camera type by generating, using a ray surface model that is a neural network, a ray surface that approximates an image character of the training images as produced by a camera having the camera type and to creating a synthesized image according to at least the ray surface and depth map by i) lifting pixels to produce three-dimensional points using the ray surface, the depth map, and a camera center point and ii) projecting the three-dimensional points
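The lifting and projecting steps recited in the claims can be sketched in a few lines. The example below is an illustrative reading of that language, not the patented implementation: lifting scales each per-pixel unit ray by its predicted depth and offsets it by the camera center, and projection replaces a hard "best matching ray" lookup with softmax weights over the dot products between a point's direction and every ray. The function names, the `temperature` parameter, the toy three-pixel ray surface, and the omission of the relative pose between frames (treated as identity here) are all assumptions made for this sketch.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def normalize(v: Vec3) -> Vec3:
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return (v[0] / n, v[1] / n, v[2] / n)


def lift(rays: Sequence[Vec3], depths: Sequence[float], center: Vec3) -> List[Vec3]:
    """Scale each unit ray by its depth and offset it by the camera center."""
    return [tuple(c + d * r for c, r in zip(center, ray))
            for ray, d in zip(rays, depths)]


def soft_project(point: Vec3, rays: Sequence[Vec3], center: Vec3,
                 temperature: float = 0.01) -> List[float]:
    """Softmax weights over pixels, peaked at the ray best aligned with the point."""
    direction = normalize(tuple(p - c for p, c in zip(point, center)))
    scores = [sum(a * b for a, b in zip(direction, ray)) / temperature
              for ray in rays]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def synthesize(points: Sequence[Vec3], context_rays: Sequence[Vec3],
               context_pixels: Sequence[float], center: Vec3,
               temperature: float = 0.01) -> List[float]:
    """Blend context intensities using the softmax weights of each 3D point."""
    out = []
    for point in points:
        weights = soft_project(point, context_rays, center, temperature)
        out.append(sum(w * px for w, px in zip(weights, context_pixels)))
    return out


# Toy 3-pixel "ray surface": one unit direction per pixel (hypothetical values).
rays = [normalize((-1.0, 0.0, 1.0)), (0.0, 0.0, 1.0), normalize((1.0, 0.0, 1.0))]
depths = [2.0, 3.0, 4.0]
center = (0.0, 0.0, 0.0)
context_pixels = [0.1, 0.5, 0.9]

points = lift(rays, depths, center)
# With an identity pose, each point should land back on its own pixel.
synthesized = synthesize(points, rays, context_pixels, center)
print(points[1], synthesized)
```

A small temperature makes the softmax behave almost like an argmax while keeping the operation differentiable, which is what lets gradients reach the network that predicts the rays. In an actual training setup the lifted points would first be transformed by the estimated camera motion between the target and context frames.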

Assignees

Toyota Res Inst Inc

Inventors

Classifications

  • Incorporation of unlabelled data, e.g. multiple instance learning [MIL]

  • G06N3/084 (Primary): Backpropagation, e.g. using gradient descent

  • Terrestrial scenes (scenes under surveillance with static cameras G06V20/52; scenes perceived from the exterior of a vehicle G06V20/56; scenes perceived from the interior of a vehicle G06V20/59)

  • using neural networks

  • G06T7/521 (Primary): from laser ranging, e.g. using interferometry; from the projection of structured light

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11436743B2 cover?
System, methods, and other embodiments described herein relate to semi-supervised training of a depth model using a neural camera model that is independent of a camera type. In one embodiment, a method includes acquiring training data including at least a pair of training images and depth data associated with the training images. The method includes training the depth model using the training d…
Who is the assignee on this patent?
Toyota Res Inst Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/084. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 6, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).