Scale-aware depth estimation using multi-camera projection loss

US12524894B2 · US · B2

Patent metadata
Publication number: US-12524894-B2
Application number: US-202418734906-A
Country: US
Kind code: B2
Filing date: Jun 5, 2024
Priority date: Mar 16, 2021
Publication date: Jan 13, 2026
Grant date: Jan 13, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for scale-aware depth estimation using multi-camera projection loss is described. The method includes determining a multi-camera photometric loss associated with a multi-camera rig of an ego vehicle. The method also includes training a scale-aware depth estimation model and an ego-motion estimation model according to the multi-camera photometric loss. The method further includes predicting a 360° point cloud of a scene surrounding the ego vehicle according to the scale-aware depth estimation model and the ego-motion estimation model. The method also includes planning a vehicle control action of the ego vehicle according to the 360° point cloud of the scene surrounding the ego vehicle.
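
The multi-camera photometric loss named in the abstract can be illustrated with a minimal sketch. It assumes the context images have already been warped into each target camera's view and that a validity mask marks the pixels where the warp landed inside the image. The function names, the dictionary layout, and the choice of a plain L1 error are illustrative assumptions; the abstract does not specify the error function.

```python
import numpy as np

def photometric_l1(target, warped, valid):
    """Mean absolute intensity difference over pixels where the warp is valid.

    Assumption: plain L1 error; the source does not name the error function.
    """
    diff = np.abs(target.astype(np.float64) - warped.astype(np.float64))
    if diff.ndim == 3:
        diff = diff.mean(axis=-1)  # average over color channels
    n = valid.sum()
    return float((diff * valid).sum() / n) if n > 0 else 0.0

def multi_camera_photometric_loss(targets, contexts):
    """Average the per-context error over all cameras of the rig.

    targets:  {camera_name: target_image}
    contexts: {camera_name: [(warped_context_image, valid_mask), ...]}
    Both layouts are hypothetical, chosen only for this sketch.
    """
    errors = []
    for cam, target in targets.items():
        for warped, valid in contexts.get(cam, []):
            errors.append(photometric_l1(target, warped, valid))
    return float(np.mean(errors)) if errors else 0.0

# Usage: one camera, one perfectly warped context and one that is off by 0.2
# on its single valid pixel.
target = np.full((2, 2), 0.5)
perfect = (target.copy(), np.ones((2, 2), dtype=bool))
shifted = (target + 0.2, np.array([[True, False], [False, False]]))
loss = multi_camera_photometric_loss({"front": target}, {"front": [perfect, shifted]})
```

Training would then adjust the depth and ego-motion models to reduce this value, since better depth and pose estimates produce warped context images that match the targets more closely.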

First claim

Opening claim text (preview).

What is claimed is:

1. A method for scale-aware depth estimation, comprising: leveraging cross-camera temporal contexts via spatio-temporal photometric constraints to increase an amount of overlap between images captured by cameras of a 360° multi-camera rig using an ego-motion estimation of an ego vehicle; training a scale-aware depth estimation model and an ego-motion estimation model according to the leveraged cross-camera temporal contexts via spatio-temporal photometric constraints; enforcing, using a pose network, pose consistency constraints as a spatial photometric constraint to ensure cameras of the 360° multi-camera rig follow a same rigid body motion during the training of the scale-aware depth estimation model and the ego motion model; generating increased overlap images from the images captured by each camera of the 360° multi-camera rig of the ego vehicle using a trained scale-aware depth estimation model and a trained ego-motion estimation model; and generating a full surround mono-depth (FSM) 360° point cloud from the increased overlap images to illustrate a scene surrounding the ego vehicle.

2. The method of claim 1, further comprising: capturing images of the scene surrounding the ego vehicle using the 360° multi-camera rig of the ego vehicle, in which cameras of the 360° multi-camera rig have a predetermined minimum overlap; selecting target images and context images captured by the cameras of the 360° multi-camera rig of the ego vehicle at a same time-step and at different time-steps; and performing spatial-temporal transformations of the selected target images and the context images to determine a multi-camera photometric loss to train the scale-aware depth estimation model and the ego-motion estimation model.

3. The method of claim 2, in which performing the spatial-temporal transformations comprises warping the selected target images and the context images captured by different cameras of the 360° multi-camera rig at different time-steps according to the ego-motion estimation of the ego vehicle and known extrinsics of the different cameras.

4. The method of claim 2, in which performing the spatial-temporal transformations comprises: warping target images and source images captured by a same camera of the 360° multi-camera rig at different time-steps; and warping the context images and target images between different cameras and captured at the different time-steps according to the ego-motion estimation and known extrinsics of the different cameras.

5. The method of claim 2, further comprising reducing the multi-camera photometric loss during the training of both the scale-aware depth estimation model and the ego-motion estimation model.

6. The method of claim 1, further comprising planning a vehicle control action of the ego vehicle according to the FSM 360° point cloud of the scene surrounding the ego vehicle.

7. The method of claim 6, in which planning the vehicle control action comprises planning a trajectory of the ego vehicle according to the 360° point cloud of the scene surrounding the ego vehicle.

8. A non-transitory computer-readable medium having program code recorded thereon for scale-aware depth estimation, the program code being executed by a processor and comprising: program code to leverage cross-camera temporal contexts via spatio-temporal photometric constraints to increase an amount of overlap between images captured by cameras of a 360° multi-camera rig using an ego-motion estimation of an ego vehicle; program code to train a scale-aware depth estimation model and an ego-motion estimation model according to the leveraged cross-camera temporal contexts via spatio-temporal photometric constraints; program code to enforce, using a pose network, pose consistency constraints as a spatial photometric constraint to ensure cameras of the 360° multi-camera rig follow a same rigid body motion during the training of the scale-aware depth estimation model and the ego motion model; program code to generate increased overlap images from the images captured by each camera of the 360° multi-camera rig of the ego vehicle using a trained scale-aware depth estimation model and a trained ego-motion estimation model; and program code to generate a full surround mono-depth (FSM) 360° point cloud from the increased overlap images to illustrate a scene surrounding the ego vehicle.

9. The non-transitory computer-readable medium of claim 8, further comprising: program code to capture images of the scene surrounding the ego vehicle using the 360° multi-camera rig of the ego vehicle, in which cameras of the 360° multi-camera rig have a predetermined minimum overlap; program code to select target images and context images captured by the cameras of the 360° multi-camera rig of the ego vehicle at a same time-step and at different time-steps; and program code to perform spatial-temporal transformations of the selected target images and the context images to determine a multi-camera photometric loss to train the scale-aware depth estimation model and the ego-motion estimation model.

10. The non-transitory computer-readable medium of claim 9, in which the program code to perform the spatial-temporal transformations comprises program code to warp the selected target images and the context images captured by different cameras of the 360° multi-camera rig at different time-steps according to the ego-motion estimation of the ego vehicle and known extrinsics of the different cameras.

11. The non-transitory computer-readable medium of claim 9, in which the program code to perform the spatial-temporal transformations comprises: program code to warp target images and source images captured by a same camera of the 360° multi-camera rig at different time-steps; and program code to warp the context images and target images between different cameras and captured at the different time-steps according to the ego-motion estimation and known extrinsics of the different cameras.

12. The non-transitory computer-readable medium of claim 9, further comprising program code to reduce the multi-camera photometric loss during the training of both the scale-aware depth estimation model and the ego-motion estimation model.

13. The non-transitory computer-readable medium of claim 8, further comprising program code to plan a vehicle control action of the ego vehicle according to the FSM 360° point cloud of the scene surrounding the ego vehicle.

14. The non-transitory computer-readable medium of claim 13, in which the program code to plan the vehicle control action comprises program code to plan a trajectory of the ego vehicle according to the 360° point cloud of the scene surrounding the ego vehicle.
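
The geometric steps recited in the claims (warping between different cameras at different time-steps using ego-motion and known extrinsics, checking that every camera follows the same rigid body motion, and merging per-camera depth into a surround point cloud) can be sketched as below. This is a rough illustration under stated assumptions, not the claimed implementation: all function names are hypothetical, extrinsics are assumed to be 4x4 camera-to-vehicle matrices, ego-motion is assumed to map vehicle-frame points at time t to the vehicle frame at time t+1, and a pinhole intrinsics matrix K is assumed.

```python
import numpy as np

def cross_camera_transform(X_i, X_j, ego_motion):
    """4x4 transform from camera i at time t to camera j at time t+1.

    Assumed conventions: X_i, X_j are camera-to-vehicle extrinsics and
    ego_motion maps vehicle-frame points at t to the vehicle frame at t+1.
    """
    return np.linalg.inv(X_j) @ ego_motion @ X_i

def pose_in_vehicle_frame(T_cam, X):
    """Express a per-camera motion estimate in the shared vehicle frame.

    If all cameras follow the same rigid body motion, this result should
    agree across cameras; the disagreement can serve as a consistency error.
    """
    return X @ T_cam @ np.linalg.inv(X)

def unproject(depth, K, X):
    """Lift an HxW depth map to an (H*W, 3) point cloud in the vehicle frame."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3xN
    rays = np.linalg.inv(K) @ pixels               # back-projected pixel rays
    points_cam = rays * depth.reshape(1, -1)       # scale each ray by its depth
    points_h = np.vstack([points_cam, np.ones((1, points_cam.shape[1]))])
    return (X @ points_h)[:3].T

def surround_point_cloud(depths, intrinsics, extrinsics):
    """Concatenate per-camera clouds; all three dicts are keyed by camera name."""
    return np.concatenate(
        [unproject(depths[c], intrinsics[c], extrinsics[c]) for c in depths],
        axis=0,
    )
```

Because the extrinsics carry metric distances between cameras, composing them into the warp is one way a depth model trained with such a loss could pick up metric scale, which is what "scale-aware" refers to in the claims.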

Assignees

Toyota Res Inst Inc, Toyota Tech Institute At Chicago

Inventors

Classifications

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Weakly supervised learning, e.g. semi-supervised or self-supervised learning · CPC title

  • Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads · CPC title

  • from positioning sensors located off-board the vehicle, e.g. from cameras · CPC title

  • providing all-round vision, e.g. using omnidirectional cameras · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12524894B2 cover?
A method for scale-aware depth estimation using multi-camera projection loss is described. The method includes determining a multi-camera photometric loss associated with a multi-camera rig of an ego vehicle. The method also includes training a scale-aware depth estimation model and an ego-motion estimation model according to the multi-camera photometric loss. The method further includes predic…
Who is the assignee on this patent?
Toyota Res Inst Inc, Toyota Tech Institute At Chicago
What technology area does this patent fall under?
Primary CPC classification G06T7/579. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 13, 2026 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).