Cross-attention decoding for volumetric rendering

US12524952B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12524952-B2
Application number: US-202318364783-A
Country: US
Kind code: B2
Filing date: Aug 3, 2023
Priority date: Nov 8, 2022
Publication date: Jan 13, 2026
Grant date: Jan 13, 2026

Abstract

Official abstract text for this publication.

Systems and methods described herein support enhanced computer vision capabilities which may be applicable to, for example, autonomous vehicle operation. An example method includes generating a latent space and a decoder based on image data that includes multiple images, where each image has a different viewing frame of a scene. The method also includes generating a volumetric embedding that is representative of a novel viewing frame of the scene. The method includes decoding, with the decoder, the latent space using cross-attention with the volumetric embedding, and generating a novel viewing frame of the scene based on an output of the decoder.
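The decoding step in the abstract can be illustrated with a minimal single-head cross-attention sketch, in which query vectors (standing in for volumetric embeddings) attend over a set of latent vectors. This is an illustration under stated assumptions, not the patented implementation: the function name `cross_attention_decode`, the projection matrices, all dimensions, and the random values are hypothetical, and a trained system would learn the latents and weights rather than sample them.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_decode(queries, latents, w_q, w_k, w_v, w_out):
    """Single-head cross-attention (hypothetical helper).

    queries: (Q, Dq) one embedding per output element, e.g. per viewing ray
    latents: (N, Dl) latent vectors summarising the input images
    returns: (Q, out_dim) decoded values, e.g. RGB or depth per query
    """
    q = queries @ w_q                          # (Q, D)
    k = latents @ w_k                          # (N, D)
    v = latents @ w_v                          # (N, D)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (Q, N) scaled dot products
    attn = softmax(scores, axis=-1)            # each query's weights sum to 1
    return (attn @ v) @ w_out                  # (Q, out_dim)

# Assumed toy sizes; random values stand in for trained parameters.
rng = np.random.default_rng(0)
num_latents, latent_dim = 16, 32
num_queries, query_dim = 5, 24
attn_dim, out_dim = 8, 3

latents = rng.normal(size=(num_latents, latent_dim))
queries = rng.normal(size=(num_queries, query_dim))
w_q = rng.normal(size=(query_dim, attn_dim))
w_k = rng.normal(size=(latent_dim, attn_dim))
w_v = rng.normal(size=(latent_dim, attn_dim))
w_out = rng.normal(size=(attn_dim, out_dim))

decoded = cross_attention_decode(queries, latents, w_q, w_k, w_v, w_out)
print(decoded.shape)
```

Because the queries only read from the latents, the number of outputs is set by the number of queries, so the same latent space can in principle be decoded for any requested viewpoint.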

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: generating, through training, a latent space and a decoder based on image data that includes multiple images, where each image has a different viewing frame of a scene; generating a volumetric embedding that is representative of a novel viewing frame of the scene by sampling values along a viewing ray to generate 3D points and Fourier encoding the sampled values; decoding, with the decoder, the latent space using cross-attention with the volumetric embedding; and generating the novel viewing frame of the scene based on an output of the decoder.

2. The method of claim 1, wherein the volumetric embedding is a concatenation of an origin embedding and a depth embedding.

3. The method of claim 1, wherein the novel viewing frame includes a predicted depth map of the scene from a perspective of the novel viewing frame.

4. The method of claim 3, wherein the predicted depth map is used to control at least one function of a vehicle.

5. The method of claim 1, wherein the novel viewing frame includes a bitmap of a novel image from a perspective of the novel viewing frame.

6. The method of claim 1, wherein generating the latent space further includes using a multi-view photometric loss to evaluate the latent space.

7. The method of claim 6, wherein the multi-view photometric loss includes a photometric objective that estimates contribution of synthesized novel views by performing a warping function on the one or more of the multiple images in the image data.

8. A system comprising: A preprocessing platform, comprising at least one processor and memory, configured to generate, through training, a latent space and a decoder based on image data that includes multiple images, where each image has a different viewing frame of a scene; a computer vision platform configured to: generate a volumetric embedding that is representative of a novel viewing frame of the scene by sampling values along a viewing ray to generate 3D points, and Fourier encoding the sampled values; decode, with the decoder, the latent space using cross-attention with the volumetric embedding; and generate the novel viewing frame of the scene based on an output of the decoder.

9. The system of claim 8, wherein the volumetric embedding is a concatenation of an origin embedding and a depth embedding.

10. The system of claim 8, wherein the novel viewing frame includes a predicted depth map of the scene from a perspective of the novel viewing frame.

11. The system of claim 10, wherein the predicted depth map is used to control at least one function of a vehicle.

12. The system of claim 8, wherein the novel viewing frame includes a bitmap of a novel image from a perspective of the novel viewing frame.

13. The system of claim 8, wherein to generate the latent space, the preprocessing platform is further configured to use a multi-view photometric loss to evaluate the latent space.

14. The system of claim 13, wherein the multi-view photometric loss includes a photometric objective that estimates contribution of synthesized novel views by performing a warping function on the one or more of the multiple images in the image data.

15. A tangible computer readable medium comprising instructions that, when executed cause a system to: generate, through training, a latent space and a decoder based on image data that includes multiple images, where each image has a different viewing frame of a scene; generate a volumetric embedding that is representative of a novel viewing frame of the scene by sampling values along a viewing ray to generate 3D points, and Fourier encoding the sampled values; decode, with the decoder, the latent space using cross-attention with the volumetric embedding; and generate the novel viewing frame of the scene based on an output of the decoder.

16. The computer readable medium of claim 15, wherein the volumetric embedding is a concatenation of an origin embedding and a depth embedding.

17. The computer readable medium of claim 15, wherein the novel viewing frame includes a predicted depth map of the scene from a perspective of the novel viewing frame.

18. The computer readable medium of claim 17, wherein the predicted depth map is used to control at least one function of a vehicle.

19. The computer readable medium of claim 15, wherein the novel viewing frame includes a bitmap of a novel image from a perspective of the novel viewing frame.

20. The computer readable medium of claim 15, wherein to generate the latent space, the instructions further cause the system to use a multi-view photometric loss to evaluate the latent space, wherein the multi-view photometric loss includes a photometric objective that estimates contribution of synthesized novel views by performing a warping function on the one or more of the multiple images in the image data.
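The volumetric embedding recited in the independent claims (sampling along a viewing ray to get 3D points, then Fourier encoding) and in the dependent claims (a concatenation of an origin embedding and a depth embedding) might be sketched as follows. Everything specific here is an assumption made for illustration: the function names, uniform depth sampling between `near` and `far`, the power-of-two frequency bands, and the choice to encode the camera origin and the sampled 3D points. The claims do not fix these details.

```python
import numpy as np

def fourier_encode(x, num_bands):
    """Map each coordinate to sin/cos features at several frequencies.

    x: (..., C) array; returns (..., C * 2 * num_bands).
    The power-of-two frequencies are an assumed, commonly used choice.
    """
    freqs = 2.0 ** np.arange(num_bands)                  # (B,)
    angles = x[..., None] * freqs * np.pi                # (..., C, B)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(*x.shape[:-1], -1)

def volumetric_embedding(origin, direction, near, far, num_samples, num_bands):
    """Hypothetical embedding for one viewing ray.

    Samples depths uniformly in [near, far], lifts them to 3D points on the
    ray, Fourier encodes the origin and the points, and concatenates both.
    """
    direction = direction / np.linalg.norm(direction)    # unit ray direction
    depths = np.linspace(near, far, num_samples)         # (S,)
    points = origin + depths[:, None] * direction        # (S, 3) 3D points
    origin_emb = fourier_encode(origin, num_bands)       # (3 * 2B,)
    depth_emb = fourier_encode(points, num_bands).reshape(-1)  # (S * 3 * 2B,)
    return np.concatenate([origin_emb, depth_emb])

# Assumed example values: a camera at the origin looking roughly along +z.
embedding = volumetric_embedding(
    origin=np.zeros(3),
    direction=np.array([0.1, 0.0, 1.0]),
    near=0.5,
    far=10.0,
    num_samples=8,
    num_bands=4,
)
print(embedding.shape)
```

One such vector per pixel of the requested view could then serve as a query to the decoder, so the query itself carries the geometry of the novel viewpoint.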

Assignees

Toyota Res Inst Inc, Massachusetts Inst Technology, Toyota Motor Co Ltd

Inventors

Classifications

  • Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items (segmenting video sequences G06V20/49)

  • Three-dimensional [3D] objects

  • Organisation of the process, e.g. bagging or boosting

  • exterior to a vehicle by using sensors mounted on the vehicle

  • Determination of region of interest [ROI] or a volume of interest [VOI]

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12524952B2 cover?
Systems and methods described herein support enhanced computer vision capabilities which may be applicable to, for example, autonomous vehicle operation. An example method includes generating a latent space and a decoder based on image data that includes multiple images, where each image has a different viewing frame of a scene. The method also includes generating a volumetric embedding that is…
Who is the assignee on this patent?
Toyota Res Inst Inc, Massachusetts Inst Technology, Toyota Motor Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06T15/20. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 13, 2026 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).