Who is the assignee on this patent?

Toyota Res Inst Inc, Toyota Motor Co Ltd

What technology area does this patent fall under?

Primary CPC classification G06T7/50. Mapped technology areas include Physics.

When was this patent published?

Publication date May 6, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Systems and methods for estimating scaled maps by sampling representations from a learning model

US12293548B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12293548-B2
Application number	US-202318486619-A
Country	US
Kind code	B2
Filing date	Oct 13, 2023
Priority date	Apr 21, 2023
Publication date	May 6, 2025
Grant date	May 6, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection. Read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, methods, and other embodiments described herein relate to estimating scaled depth maps by sampling variational representations of an image using a learning model. In one embodiment, a method includes encoding data embeddings by a learning model to form conditioned latent representations using attention networks, the data embeddings including features about an image from a camera and calibration information about the camera. The method also includes computing a probability distribution of the conditioned latent representations by factoring scale priors. The method also includes sampling the probability distribution to generate variations for the data embeddings. The method also includes estimating scaled depth maps of a scene from the variations at different coordinates using the attention networks.
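The sampling step in the abstract can be illustrated with a minimal sketch. It assumes the probability distribution is a diagonal Gaussian described by per-dimension mean and standard deviation values, and it draws variations with the common reparameterization z = mean + std * eps. The function name `sample_latents`, the seed parameter, and the example numbers are hypothetical and are not taken from the patent.

```python
import random


def sample_latents(means, stds, num_samples, seed=0):
    """Draw variations z = mean + std * eps, with eps ~ N(0, 1).

    means, stds: per-dimension parameters of an assumed diagonal Gaussian.
    Returns a list of num_samples latent vectors (lists of floats).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [
        [m + s * rng.gauss(0.0, 1.0) for m, s in zip(means, stds)]
        for _ in range(num_samples)
    ]


# Hypothetical 3-dimensional latent; the second dimension has zero spread.
means = [0.5, -1.0, 2.0]
stds = [0.1, 0.0, 0.3]
variations = sample_latents(means, stds, num_samples=4)
```

Each entry of `variations` is one sampled latent vector. In the described method, such samples would then be decoded into scaled depth maps by the attention networks, a step this sketch does not attempt to reproduce.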

First claim

Opening claim text (preview).

What is claimed is:
1. An estimation system comprising: a memory storing instructions that, when executed by a processor, cause the processor to: encode data embeddings by a learning model to form conditioned latent representations using attention networks, the data embeddings including features about an image from a camera and calibration information about the camera; compute a probability distribution of the conditioned latent representations by factoring scale priors; sample the probability distribution to generate variations for the data embeddings; and estimate scaled depth maps of a scene from the variations at different coordinates using the attention networks.
2. The estimation system of claim 1, wherein the instructions to encode the data embeddings further include instructions to: maintain by the learning model latent vectors associated with the conditioned latent representations; project the data embeddings onto the latent vectors by a cross-attention network of the attention networks using a normal distribution of the latent vectors; and output mean and standard deviation pairs of the conditioned latent representations using the data embeddings and the features by a self-attention network of the attention networks, and the self-attention network operates in a dimensional space that is reduced.
3. The estimation system of claim 1, wherein the instructions to estimate the scaled depth maps further include instructions to: decode the probability distribution to generate latent vectors; and predict depth values for the scaled depth maps using the latent vectors and the calibration information by a cross-attention network of the attention networks, and the calibration information includes geometric properties about the camera.
4. The estimation system of claim 1 further including instructions to: infer by the learning model the scale priors using known priors acquired during training, and the scale priors were unknown during the training and the known priors represent appearance characteristics about objects within the scene that lack depth information.
5. The estimation system of claim 4 further including instructions to: transfer the scale priors to a vehicle having a sensor that acquires an image dataset, wherein the sensor has geometric properties that differ from the calibration information.
6. The estimation system of claim 1, wherein the scaled depth maps include pixel locations that are uncertain for the data embeddings.
7. The estimation system of claim 6 further including instructions to: remove the pixel locations having increased uncertainty from the scaled depth maps according to the scene being indoors.
8. The estimation system of claim 1, wherein the camera is a monocular camera that generates a single dataset for the image and the single dataset was unknown to the learning model during training.
9. The estimation system of claim 1, wherein the conditioned latent representations include local information that factors global latent representations about the scene.
10. A non-transitory computer-readable medium comprising: instructions that when executed by a processor cause the processor to: encode data embeddings by a learning model to form conditioned latent representations using attention networks, the data embeddings including features about an image from a camera and calibration information about the camera; compute a probability distribution of the conditioned latent representations by factoring scale priors; sample the probability distribution to generate variations for the data embeddings; and estimate scaled depth maps of a scene from the variations at different coordinates using the attention networks.
11. The non-transitory computer-readable medium of claim 10, wherein the instructions to encode the data embeddings further include instructions to: maintain by the learning model latent vectors associated with the conditioned latent representations; project the data embeddings onto the latent vectors by a cross-attention network of the attention networks using a normal distribution of the latent vectors; and output mean and standard deviation pairs of the conditioned latent representations using the data embeddings and the features by a self-attention network of the attention networks, and the self-attention network operates in a dimensional space that is reduced.
12. A method comprising: encoding data embeddings by a learning model to form conditioned latent representations using attention networks, the data embeddings including features about an image from a camera and calibration information about the camera; computing a probability distribution of the conditioned latent representations by factoring scale priors; sampling the probability distribution to generate variations for the data embeddings; and estimating scaled depth maps of a scene from the variations at different coordinates using the attention networks.
13. The method of claim 12, wherein encoding the data embeddings further includes: maintaining by the learning model latent vectors associated with the conditioned latent representations; projecting the data embeddings onto the latent vectors by a cross-attention network of the attention networks using a normal distribution of the latent vectors; and outputting mean and standard deviation pairs of the conditioned latent representations using the data embeddings and the features by a self-attention network of the attention networks, and the self-attention network operates in a dimensional space that is reduced.
14. The method of claim 12, wherein estimating the scaled depth maps further includes: decoding the probability distribution to generate latent vectors; and predicting depth values for the scaled depth maps using the latent vectors and the calibration information by a cross-attention network of the attention networks, and the calibration information includes geometric properties about the camera.
15. The method of claim 12 further comprising: inferring by the learning model the scale priors using known priors acquired during training, and the scale priors were unknown during the training and the known priors represent appearance characteristics about objects within the scene that lack depth information.
16. The method of claim 15 further comprising: transferring the scale priors to a vehicle having a sensor that acquires an image dataset, wherein the sensor has geometric properties that differ from the calibration information.
17. The method of claim 12, wherein the scaled depth maps include pixel locations that are uncertain for the data embeddings.
18. The method of claim 17 further comprising: removing the pixel locations having increased uncertainty from the scaled depth maps according to the scene being indoors.
19. The method of claim 12, wherein the camera is a monocular camera that generates a single dataset for the image and the single dataset was unknown to the learning model during training.
20. The method of claim 12, wherein the conditioned latent representations include local information that factors global latent representations about the scene.
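The uncertainty handling recited in claims 6, 7, 17, and 18 can be illustrated with a small sketch. It assumes, as one possible reading, that uncertainty is measured as the per-pixel standard deviation across several depth maps estimated from different sampled variations, and that pixels above a threshold are removed. The function name `mask_uncertain`, the `max_std` threshold, and the sample values are hypothetical; the claims do not specify how uncertainty is computed.

```python
from statistics import mean, pstdev


def mask_uncertain(depth_samples, max_std):
    """Average several depth maps and drop high-variance pixels.

    depth_samples: list of depth maps of equal shape, each a list of rows.
    max_std: hypothetical threshold on the per-pixel standard deviation.
    Returns a depth map where removed pixels are None.
    """
    rows = len(depth_samples[0])
    cols = len(depth_samples[0][0])
    result = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            values = [depth_map[r][c] for depth_map in depth_samples]
            # Keep the mean depth only when the samples agree closely.
            keep = pstdev(values) <= max_std
            out_row.append(mean(values) if keep else None)
        result.append(out_row)
    return result


# Two hypothetical 1x2 depth maps (metres): the samples agree on the
# first pixel and disagree strongly on the second.
samples = [[[2.0, 5.0]], [[2.0, 9.0]]]
filtered = mask_uncertain(samples, max_std=0.5)
# filtered == [[2.0, None]]
```

A scene-dependent threshold (for example, a stricter `max_std` for indoor scenes) would be one way to approximate the "according to the scene being indoors" condition, but that choice is an assumption of this sketch.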

Assignees

Toyota Res Inst Inc, Toyota Motor Co Ltd

Inventors

Classifications

G06T7/50 (Primary) · Depth or shape recovery
G06T2207/10028 · Range image; Depth image; 3D point clouds
G06T2207/20081 · Training; Learning
G06T7/80 (Primary) · Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration

Patent family

Related publications grouped by family.

View patent family 93121737
