US12293548B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12293548-B2 |
| Application number | US-202318486619-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 13, 2023 |
| Priority date | Apr 21, 2023 |
| Publication date | May 6, 2025 |
| Grant date | May 6, 2025 |
Official abstract text for this publication.
Systems, methods, and other embodiments described herein relate to estimating scaled depth maps by sampling variational representations of an image using a learning model. In one embodiment, a method includes encoding data embeddings by a learning model to form conditioned latent representations using attention networks, the data embeddings including features about an image from a camera and calibration information about the camera. The method also includes computing a probability distribution of the conditioned latent representations by factoring scale priors. The method also includes sampling the probability distribution to generate variations for the data embeddings. The method also includes estimating scaled depth maps of a scene from the variations at different coordinates using the attention networks.
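The sampling step in the abstract can be illustrated with a minimal sketch. It assumes the probability distribution is a diagonal Gaussian described by per-dimension mean and standard deviation pairs, and draws each variation as `mean + std * noise`. The function name `sample_latents`, the seed, and the example values are hypothetical and are not taken from the patent. The learning model and attention networks are not modeled here.

```python
import random


def sample_latents(means, stds, num_samples, seed=0):
    """Draw latent vectors z = mean + std * eps with eps ~ N(0, 1).

    Assumption: a diagonal Gaussian over the latent dimensions. This is an
    illustration only, not the patent's stated implementation.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    samples = []
    for _ in range(num_samples):
        z = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(means, stds)]
        samples.append(z)
    return samples


# Hypothetical mean/std pairs for a 3-dimensional latent representation.
means = [0.5, -1.0, 2.0]
stds = [0.1, 0.0, 0.3]
variations = sample_latents(means, stds, num_samples=4)
```

Each entry of `variations` is one sampled latent vector. A dimension with zero standard deviation always returns its mean, so only the uncertain dimensions differ between samples.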
Opening claim text (preview).
What is claimed is:

1. An estimation system comprising: a memory storing instructions that, when executed by a processor, cause the processor to: encode data embeddings by a learning model to form conditioned latent representations using attention networks, the data embeddings including features about an image from a camera and calibration information about the camera; compute a probability distribution of the conditioned latent representations by factoring scale priors; sample the probability distribution to generate variations for the data embeddings; and estimate scaled depth maps of a scene from the variations at different coordinates using the attention networks.

2. The estimation system of claim 1, wherein the instructions to encode the data embeddings further include instructions to: maintain by the learning model latent vectors associated with the conditioned latent representations; project the data embeddings onto the latent vectors by a cross-attention network of the attention networks using a normal distribution of the latent vectors; and output mean and standard deviation pairs of the conditioned latent representations using the data embeddings and the features by a self-attention network of the attention networks, and the self-attention network operates in a dimensional space that is reduced.

3. The estimation system of claim 1, wherein the instructions to estimate the scaled depth maps further include instructions to: decode the probability distribution to generate latent vectors; and predict depth values for the scaled depth maps using the latent vectors and the calibration information by a cross-attention network of the attention networks, and the calibration information includes geometric properties about the camera.

4. The estimation system of claim 1 further including instructions to: infer by the learning model the scale priors using known priors acquired during training, and the scale priors were unknown during the training and the known priors represent appearance characteristics about objects within the scene that lack depth information.

5. The estimation system of claim 4 further including instructions to: transfer the scale priors to a vehicle having a sensor that acquires an image dataset, wherein the sensor has geometric properties that differ from the calibration information.

6. The estimation system of claim 1, wherein the scaled depth maps include pixel locations that are uncertain for the data embeddings.

7. The estimation system of claim 6 further including instructions to: remove the pixel locations having increased uncertainty from the scaled depth maps according to the scene being indoors.

8. The estimation system of claim 1, wherein the camera is a monocular camera that generates a single dataset for the image and the single dataset was unknown to the learning model during training.

9. The estimation system of claim 1, wherein the conditioned latent representations include local information that factors global latent representations about the scene.

10. A non-transitory computer-readable medium comprising: instructions that when executed by a processor cause the processor to: encode data embeddings by a learning model to form conditioned latent representations using attention networks, the data embeddings including features about an image from a camera and calibration information about the camera; compute a probability distribution of the conditioned latent representations by factoring scale priors; sample the probability distribution to generate variations for the data embeddings; and estimate scaled depth maps of a scene from the variations at different coordinates using the attention networks.

11. The non-transitory computer-readable medium of claim 10, wherein the instructions to encode the data embeddings further include instructions to: maintain by the learning model latent vectors associated with the conditioned latent representations; project the data embeddings onto the latent vectors by a cross-attention network of the attention networks using a normal distribution of the latent vectors; and output mean and standard deviation pairs of the conditioned latent representations using the data embeddings and the features by a self-attention network of the attention networks, and the self-attention network operates in a dimensional space that is reduced.

12. A method comprising: encoding data embeddings by a learning model to form conditioned latent representations using attention networks, the data embeddings including features about an image from a camera and calibration information about the camera; computing a probability distribution of the conditioned latent representations by factoring scale priors; sampling the probability distribution to generate variations for the data embeddings; and estimating scaled depth maps of a scene from the variations at different coordinates using the attention networks.

13. The method of claim 12, wherein encoding the data embeddings further includes: maintaining by the learning model latent vectors associated with the conditioned latent representations; projecting the data embeddings onto the latent vectors by a cross-attention network of the attention networks using a normal distribution of the latent vectors; and outputting mean and standard deviation pairs of the conditioned latent representations using the data embeddings and the features by a self-attention network of the attention networks, and the self-attention network operates in a dimensional space that is reduced.

14. The method of claim 12, wherein estimating the scaled depth maps further includes: decoding the probability distribution to generate latent vectors; and predicting depth values for the scaled depth maps using the latent vectors and the calibration information by a cross-attention network of the attention networks, and the calibration information includes geometric properties about the camera.

15. The method of claim 12 further comprising: inferring by the learning model the scale priors using known priors acquired during training, and the scale priors were unknown during the training and the known priors represent appearance characteristics about objects within the scene that lack depth information.

16. The method of claim 15 further comprising: transferring the scale priors to a vehicle having a sensor that acquires an image dataset, wherein the sensor has geometric properties that differ from the calibration information.

17. The method of claim 12, wherein the scaled depth maps include pixel locations that are uncertain for the data embeddings.

18. The method of claim 17 further comprising: removing the pixel locations having increased uncertainty from the scaled depth maps according to the scene being indoors.

19. The method of claim 12, wherein the camera is a monocular camera that generates a single dataset for the image and the single dataset was unknown to the learning model during training.

20. The method of claim 12, wherein the conditioned latent representations include local information that factors global latent representations about the scene.
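Claims 6, 7, 17, and 18 refer to pixel locations with increased uncertainty being removed from the scaled depth maps. The claims do not say how uncertainty is measured, so the following is only a sketch under stated assumptions: several depth maps are estimated from different sampled variations, per-pixel uncertainty is taken as the population standard deviation across those maps, and pixels above a chosen threshold are dropped. The function name `mask_uncertain_pixels`, the `max_std` threshold, and the sample values are hypothetical. The indoor-scene condition from the claims is not modeled.

```python
from statistics import mean, pstdev


def mask_uncertain_pixels(depth_samples, max_std):
    """Return (mean_depth, keep_mask) from several sampled depth maps.

    depth_samples: list of 2-D lists with identical shape, one per sample.
    Assumption: per-pixel spread across samples stands in for uncertainty.
    Removed pixels are marked with None in mean_depth.
    """
    rows = len(depth_samples[0])
    cols = len(depth_samples[0][0])
    mean_depth, keep_mask = [], []
    for r in range(rows):
        mean_row, keep_row = [], []
        for c in range(cols):
            values = [d[r][c] for d in depth_samples]
            keep = pstdev(values) <= max_std  # low spread: pixel is kept
            mean_row.append(mean(values) if keep else None)
            keep_row.append(keep)
        mean_depth.append(mean_row)
        keep_mask.append(keep_row)
    return mean_depth, keep_mask


# Three hypothetical 2x2 depth maps (metres) from three sampled variations.
samples = [
    [[2.0, 5.0], [3.0, 10.0]],
    [[2.0, 5.5], [3.0, 4.0]],
    [[2.0, 4.5], [3.0, 7.0]],
]
depth, keep_mask = mask_uncertain_pixels(samples, max_std=0.5)
```

In this example the bottom-right pixel varies widely across the three samples, so it is the only one removed. The other three pixels keep their averaged depth.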