Systems and methods for estimating scaled maps by sampling representations from a learning model

US12293548B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12293548-B2
Application number  US-202318486619-A
Country             US
Kind code           B2
Filing date         Oct 13, 2023
Priority date       Apr 21, 2023
Publication date    May 6, 2025
Grant date          May 6, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, methods, and other embodiments described herein relate to estimating scaled depth maps by sampling variational representations of an image using a learning model. In one embodiment, a method includes encoding data embeddings by a learning model to form conditioned latent representations using attention networks, the data embeddings including features about an image from a camera and calibration information about the camera. The method also includes computing a probability distribution of the conditioned latent representations by factoring scale priors. The method also includes sampling the probability distribution to generate variations for the data embeddings. The method also includes estimating scaled depth maps of a scene from the variations at different coordinates using the attention networks.
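The sample-then-decode flow in the abstract can be illustrated with a toy sketch. This is not the patented implementation: the latent means and standard deviations, the coordinate query, the linear `readout`, and the softplus output are all hypothetical stand-ins chosen for illustration, and a single dot-product attention step replaces the attention networks. It only shows how drawing several latent samples from mean and standard deviation pairs yields several depth estimates for the same query coordinate.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sample_latents(means, stds, rng):
    """Reparameterized draw: z = mean + std * eps, with eps ~ N(0, 1)."""
    return [[m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sd)]
            for mu, sd in zip(means, stds)]

def decode_depth(query, latents, readout):
    """Cross-attend one coordinate query over the latents; return a positive depth."""
    scale = math.sqrt(len(query))
    scores = [sum(q * z for q, z in zip(query, lat)) / scale for lat in latents]
    weights = softmax(scores)
    # Hypothetical linear readout of each latent, blended by attention weight.
    raw = sum(w * sum(r * z for r, z in zip(readout, lat))
              for w, lat in zip(weights, latents))
    return math.log1p(math.exp(raw))  # softplus keeps the depth above zero

rng = random.Random(0)
means = [[0.5, -0.2, 1.0], [1.5, 0.3, -0.4]]  # toy encoder output: latent means
stds = [[0.1, 0.1, 0.1], [0.2, 0.2, 0.2]]     # toy encoder output: latent std devs
query = [0.3, 0.7, 1.0]    # toy embedding of a pixel coordinate plus calibration
readout = [1.0, 0.5, -0.5]  # assumed readout weights, not from the patent

# Eight sampled variations give eight depth estimates for the same coordinate.
samples = [decode_depth(query, sample_latents(means, stds, rng), readout)
           for _ in range(8)]
```

Setting every standard deviation to zero makes `sample_latents` return the means unchanged, so the decoder becomes deterministic; the spread across `samples` comes entirely from the sampling step.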

First claim

Opening claim text (preview).

What is claimed is:

1. An estimation system comprising: a memory storing instructions that, when executed by a processor, cause the processor to: encode data embeddings by a learning model to form conditioned latent representations using attention networks, the data embeddings including features about an image from a camera and calibration information about the camera; compute a probability distribution of the conditioned latent representations by factoring scale priors; sample the probability distribution to generate variations for the data embeddings; and estimate scaled depth maps of a scene from the variations at different coordinates using the attention networks.

2. The estimation system of claim 1, wherein the instructions to encode the data embeddings further include instructions to: maintain by the learning model latent vectors associated with the conditioned latent representations; project the data embeddings onto the latent vectors by a cross-attention network of the attention networks using a normal distribution of the latent vectors; and output mean and standard deviation pairs of the conditioned latent representations using the data embeddings and the features by a self-attention network of the attention networks, and the self-attention network operates in a dimensional space that is reduced.

3. The estimation system of claim 1, wherein the instructions to estimate the scaled depth maps further include instructions to: decode the probability distribution to generate latent vectors; and predict depth values for the scaled depth maps using the latent vectors and the calibration information by a cross-attention network of the attention networks, and the calibration information includes geometric properties about the camera.

4. The estimation system of claim 1 further including instructions to: infer by the learning model the scale priors using known priors acquired during training, and the scale priors were unknown during the training and the known priors represent appearance characteristics about objects within the scene that lack depth information.

5. The estimation system of claim 4 further including instructions to: transfer the scale priors to a vehicle having a sensor that acquires an image dataset, wherein the sensor has geometric properties that differ from the calibration information.

6. The estimation system of claim 1, wherein the scaled depth maps include pixel locations that are uncertain for the data embeddings.

7. The estimation system of claim 6 further including instructions to: remove the pixel locations having increased uncertainty from the scaled depth maps according to the scene being indoors.

8. The estimation system of claim 1, wherein the camera is a monocular camera that generates a single dataset for the image and the single dataset was unknown to the learning model during training.

9. The estimation system of claim 1, wherein the conditioned latent representations include local information that factors global latent representations about the scene.

10. A non-transitory computer-readable medium comprising: instructions that when executed by a processor cause the processor to: encode data embeddings by a learning model to form conditioned latent representations using attention networks, the data embeddings including features about an image from a camera and calibration information about the camera; compute a probability distribution of the conditioned latent representations by factoring scale priors; sample the probability distribution to generate variations for the data embeddings; and estimate scaled depth maps of a scene from the variations at different coordinates using the attention networks.

11. The non-transitory computer-readable medium of claim 10, wherein the instructions to encode the data embeddings further include instructions to: maintain by the learning model latent vectors associated with the conditioned latent representations; project the data embeddings onto the latent vectors by a cross-attention network of the attention networks using a normal distribution of the latent vectors; and output mean and standard deviation pairs of the conditioned latent representations using the data embeddings and the features by a self-attention network of the attention networks, and the self-attention network operates in a dimensional space that is reduced.

12. A method comprising: encoding data embeddings by a learning model to form conditioned latent representations using attention networks, the data embeddings including features about an image from a camera and calibration information about the camera; computing a probability distribution of the conditioned latent representations by factoring scale priors; sampling the probability distribution to generate variations for the data embeddings; and estimating scaled depth maps of a scene from the variations at different coordinates using the attention networks.

13. The method of claim 12, wherein encoding the data embeddings further includes: maintaining by the learning model latent vectors associated with the conditioned latent representations; projecting the data embeddings onto the latent vectors by a cross-attention network of the attention networks using a normal distribution of the latent vectors; and outputting mean and standard deviation pairs of the conditioned latent representations using the data embeddings and the features by a self-attention network of the attention networks, and the self-attention network operates in a dimensional space that is reduced.

14. The method of claim 12, wherein estimating the scaled depth maps further includes: decoding the probability distribution to generate latent vectors; and predicting depth values for the scaled depth maps using the latent vectors and the calibration information by a cross-attention network of the attention networks, and the calibration information includes geometric properties about the camera.

15. The method of claim 12 further comprising: inferring by the learning model the scale priors using known priors acquired during training, and the scale priors were unknown during the training and the known priors represent appearance characteristics about objects within the scene that lack depth information.

16. The method of claim 15 further comprising: transferring the scale priors to a vehicle having a sensor that acquires an image dataset, wherein the sensor has geometric properties that differ from the calibration information.

17. The method of claim 12, wherein the scaled depth maps include pixel locations that are uncertain for the data embeddings.

18. The method of claim 17 further comprising: removing the pixel locations having increased uncertainty from the scaled depth maps according to the scene being indoors.

19. The method of claim 12, wherein the camera is a monocular camera that generates a single dataset for the image and the single dataset was unknown to the learning model during training.

20. The method of claim 12, wherein the conditioned latent representations include local information that factors global latent representations about the scene.
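Claims 6, 7, 17, and 18 refer to pixel locations that are uncertain and to removing those with increased uncertainty. The claims do not say how uncertainty is measured, so the following is only one plausible reading, sketched under stated assumptions: uncertainty is taken to be the per-pixel standard deviation across several sampled depth maps, and `max_std` is a hypothetical threshold. Removed pixels are marked with `None`.

```python
import statistics

def mask_uncertain(depth_samples, max_std):
    """Per pixel: mean depth across samples, or None where the spread exceeds max_std.

    depth_samples is a list of depth maps (lists of rows) with identical shape.
    """
    height, width = len(depth_samples[0]), len(depth_samples[0][0])
    result = []
    for r in range(height):
        row = []
        for c in range(width):
            values = [sample[r][c] for sample in depth_samples]
            spread = statistics.pstdev(values)  # assumed uncertainty measure
            row.append(statistics.fmean(values) if spread <= max_std else None)
        result.append(row)
    return result

# Three toy 2x2 depth maps, as if decoded from three latent samples.
depth_samples = [
    [[2.0, 5.0], [3.0, 10.0]],
    [[2.2, 5.1], [3.0, 14.0]],
    [[1.8, 4.9], [3.0, 6.0]],
]
filtered = mask_uncertain(depth_samples, max_std=0.5)
```

In this toy input the bottom-right pixel varies between 6.0 and 14.0 across samples, so it is dropped, while the other three pixels keep their mean depth. A scene-dependent threshold (for example, a different `max_std` for indoor scenes) would be one way to make the removal depend on the scene type, though that choice is also an assumption.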

Assignees

Toyota Res Inst Inc, Toyota Motor Co Ltd

Inventors

Classifications

  • G06T7/50 · Primary

    Depth or shape recovery · CPC title

  • Range image; Depth image; 3D point clouds · CPC title

  • Training; Learning · CPC title

  • G06T7/80 · Primary

    Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12293548B2 cover?
Systems, methods, and other embodiments described herein relate to estimating scaled depth maps by sampling variational representations of an image using a learning model. In one embodiment, a method includes encoding data embeddings by a learning model to form conditioned latent representations using attention networks, the data embeddings including features about an image from a camera and ca…
Who is the assignee on this patent?
Toyota Res Inst Inc, Toyota Motor Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06T7/50. Mapped technology areas include Physics.
When was this patent published?
Publication date: May 6, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).