Geometric 3D augmentations for transformer architectures

US12488483B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12488483-B2
Application number  US-202318110421-A
Country             US
Kind code           B2
Filing date         Feb 16, 2023
Priority date       Jul 25, 2022
Publication date    Dec 2, 2025
Grant date          Dec 2, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method of generating additional supervision data to improve learning of a geometrically-consistent latent scene representation with a geometric scene representation architecture is provided. The method includes receiving, with a computing device, a latent scene representation encoding a pointcloud from images of a scene captured by a plurality of cameras each with known intrinsics and poses, generating a virtual camera having a viewpoint different from viewpoints of the plurality of cameras, projecting information from the pointcloud onto the viewpoint of the virtual camera, and decoding the latent scene representation based on the virtual camera thereby generating an RGB image and depth map corresponding to the viewpoint of the virtual camera for implementation as additional supervision data.
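The projection step named in the abstract can be illustrated with a pinhole camera model. What follows is a minimal sketch under stated assumptions, not the patented implementation: it assumes the virtual camera is given as a camera-to-world rotation `R_c2w` and a position `t`, that the intrinsics are the pinhole parameters `fx, fy, cx, cy`, and that a z-buffer keeps the nearest point per pixel. The function name `project_pointcloud` and all parameter names are hypothetical. Decoding the latent scene representation is done by the learned architecture and is not shown here.

```python
def project_pointcloud(points, colors, R_c2w, t, fx, fy, cx, cy, width, height):
    """Project colored 3D points into a virtual pinhole camera.

    Returns (rgb, depth) as row-major nested lists. Pixels that no point
    lands on keep depth 0.0 and color (0, 0, 0).
    """
    depth = [[0.0] * width for _ in range(height)]
    rgb = [[(0, 0, 0)] * width for _ in range(height)]
    for p, color in zip(points, colors):
        # World -> camera frame: X_cam = R_c2w^T (X_world - t).
        d = [p[i] - t[i] for i in range(3)]
        x, y, z = (sum(R_c2w[i][j] * d[i] for i in range(3)) for j in range(3))
        if z <= 0.0:
            continue  # point lies behind the virtual camera
        # Pinhole projection to integer pixel coordinates.
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            # Assumed z-buffer rule: the nearest point wins the pixel.
            if depth[v][u] == 0.0 or z < depth[v][u]:
                depth[v][u] = z
                rgb[v][u] = color
    return rgb, depth


# Toy usage: identity pose, four points, a 5x5 image.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
points = [[0.0, 0.0, 2.0], [0.0, 0.0, 1.0], [0.2, 0.0, 2.0], [0.0, 0.0, -1.0]]
colors = [(0, 255, 0), (255, 0, 0), (0, 0, 255), (9, 9, 9)]
rgb, depth = project_pointcloud(
    points, colors, identity, [0.0, 0.0, 0.0], 10.0, 10.0, 2.0, 2.0, 5, 5
)
```

The resulting sparse RGB image and depth map are the kind of per-viewpoint targets the abstract describes as additional supervision data; how they enter the training loss is not specified on this page.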

First claim

Opening claim text (preview).

What is claimed is:

1. A method of generating additional supervision data to improve learning of a geometrically-consistent latent scene representation with a geometric scene representation architecture, the method comprising: receiving, with a computing device, a latent scene representation encoding a pointcloud from images of a scene captured by a plurality of cameras each with known intrinsics and poses; selecting one of the plurality of cameras; translating a pose of the selected camera; adjusting a viewing angle of the translated selected camera toward a center of the pointcloud; generating a virtual camera having a viewpoint different from viewpoints of the plurality of cameras; projecting information from the pointcloud onto the viewpoint of the virtual camera; and decoding the latent scene representation based on the virtual camera thereby generating an RGB image and depth map corresponding to the viewpoint of the virtual camera for implementation as additional supervision data.

2. The method of claim 1, wherein translating the pose of the selected camera comprises adding translation noise to the pose of the selected camera.

3. The method of claim 1, wherein the virtual camera is generated by: selecting one of the plurality of cameras as a canonical camera, applying a rotation matrix to the canonical camera, and propagating a rotation and a translation offset of the canonical camera resulting from the rotation matrix to other ones of the plurality of cameras.

4. The method of claim 1, further comprising: implementing the geometric scene representation architecture with the computing device; inputting the images of the scene captured by the plurality of cameras into the geometric scene representation architecture, wherein each camera of the plurality of cameras includes known embeddings; and encoding the images of the scene captured by the plurality of cameras, with the geometric scene representation architecture, into the latent scene representation.

5. The method of claim 1, further comprising training the geometric scene representation architecture by inputting the RGB image and the depth map corresponding to the viewpoint of the virtual camera.

6. The method of claim 1, further comprising: querying the latent scene representation with a camera embedding; and decoding the latent scene representation based on the camera embedding, thereby generating an estimated depth map and an estimated RGB image based on the camera embedding.

7. A system for generating additional supervision data to improve learning of a geometrically-consistent latent scene representation with a geometric scene representation architecture, the system comprising: one or more processors; and a non-transitory, computer-readable medium storing instructions that, when executed by the one or more processors, cause the one or more processors to: receive a latent scene representation encoding a pointcloud from images of a scene captured by a plurality of cameras each with known intrinsics and poses; select one of the plurality of cameras; translate a pose of the selected camera; adjust a viewing angle of the translated selected camera toward a center of the pointcloud; generate a virtual camera having a viewpoint different from viewpoints of the plurality of cameras; project information from the pointcloud onto the viewpoint of the virtual camera; and decode the latent scene representation based on the virtual camera thereby generating an RGB image and depth map corresponding to the viewpoint of the virtual camera for implementation as additional supervision data.

8. The system of claim 7, wherein translating the pose of the selected camera comprises adding translation noise to the pose of the selected camera.

9. The system of claim 7, wherein the virtual camera is generated by: selecting one of the plurality of cameras as a canonical camera, applying a rotation matrix to the canonical camera, and propagating a rotation and a translation offset of the canonical camera resulting from the rotation matrix to other ones of the plurality of cameras.

10. The system of claim 7, wherein the instructions further cause the one or more processors to: implement the geometric scene representation architecture with the system; input the images of the scene captured by the plurality of cameras into the geometric scene representation architecture, wherein each camera of the plurality of cameras includes known embeddings; and encode the images of the scene captured by the plurality of cameras, with the geometric scene representation architecture, into the latent scene representation.

11. The system of claim 7, wherein the instructions further cause the one or more processors to train the geometric scene representation architecture by inputting the RGB image and the depth map corresponding to the viewpoint of the virtual camera.

12. The system of claim 7, wherein the instructions further cause the one or more processors to: query the latent scene representation with a camera embedding; and decode the latent scene representation based on the camera embedding, thereby generating an estimated depth map and an estimated RGB image based on the camera embedding.

13. A computing program product for generating additional supervision data to improve learning of a geometrically-consistent latent scene representation with a geometric scene representation architecture, the computing program product comprising machine-readable instructions stored on a non-transitory computer readable memory, which when executed by a computing device, causes the computing device to carry out steps comprising: receiving, with the computing device, a latent scene representation encoding a pointcloud from images of a scene captured by a plurality of cameras each with known intrinsics and poses; selecting one of the plurality of cameras; translating a pose of the selected camera; adjusting a viewing angle of the translated selected camera toward a center of the pointcloud; generating a virtual camera having a viewpoint different from viewpoints of the plurality of cameras; projecting information from the pointcloud onto the viewpoint of the virtual camera; and decoding the latent scene representation based on the virtual camera thereby generating an RGB image and depth map corresponding to the viewpoint of the virtual camera for implementation as additional supervision data.

14. The computing program product of claim 13, wherein translating the pose of the selected camera comprises adding translation noise to the pose of the selected camera.

15. The computing program product of claim 13, wherein the virtual camera is generated by: selecting one of the plurality of cameras as a canonical camera, applying a rotation matrix to the canonical camera, and propagating a rotation and a translation offset of the canonical camera resulting from the rotation matrix to other ones of the plurality of cameras.

16. The computing program product of claim 13, the steps caused to be carried out by the computing device further comprising: implementing the geometric scene representation architecture with the computing device; inputting the images of the scene captured by the plurality of cameras into the geometric scene representation architecture, wherein each camera of the plurality of cameras includes known embeddings; and encoding the images of the scene captured by the plurality of cameras, with the geometric scene representation architecture, into the latent sce
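The virtual-camera steps recited in claims 1 and 2 (translate the pose of a selected camera with noise, then turn its viewing direction toward the center of the pointcloud) can be sketched as follows. This is one possible reading, not the claimed implementation. It assumes Gaussian translation noise with standard deviation `sigma`, takes the "center" to be the centroid of the points, and uses a camera frame with x right, y down, z forward. All function names, and the `world_down` hint, are hypothetical.

```python
import math
import random


def sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def pointcloud_center(points):
    # Assumption: "center of the pointcloud" is read as the centroid.
    return [sum(p[i] for p in points) / len(points) for i in range(3)]


def look_at_rotation(position, target, world_down=(0.0, 1.0, 0.0)):
    """Camera-to-world rotation whose z axis points from position to target.

    Columns are the camera's right, down, and forward axes in world
    coordinates. Degenerate if the viewing direction is parallel to
    world_down.
    """
    forward = normalize(sub(target, position))
    right = normalize(cross(world_down, forward))
    down = cross(forward, right)
    return [[right[i], down[i], forward[i]] for i in range(3)]


def make_virtual_camera(position, points, sigma, rng):
    # Translate the selected camera with (assumed Gaussian) noise.
    new_position = [position[i] + rng.gauss(0.0, sigma) for i in range(3)]
    # Re-aim the translated camera at the pointcloud center.
    rotation = look_at_rotation(new_position, pointcloud_center(points))
    return rotation, new_position


# Toy usage: jitter a camera at the origin that observes four points.
rng = random.Random(0)
points = [[-1.0, 0.0, 4.0], [1.0, 0.0, 6.0], [0.0, 1.0, 5.0], [0.0, -1.0, 5.0]]
R, t = make_virtual_camera([0.0, 0.0, 0.0], points, 0.5, rng)
```

Re-aiming after the translation keeps the scene in view of the perturbed camera, which is presumably why the claims pair the two steps. The intrinsics are left unchanged in this sketch.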

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12488483B2 cover?
A method of generating additional supervision data to improve learning of a geometrically-consistent latent scene representation with a geometric scene representation architecture is provided. The method includes receiving, with a computing device, a latent scene representation encoding a pointcloud from images of a scene captured by a plurality of cameras each with known intrinsics and poses, …
Who is the assignee on this patent?
Toyota Res Inst Inc, Toyota Motor Co Ltd, Toyota Tech Institute At Chicago
What technology area does this patent fall under?
Primary CPC classification G06T7/593. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 2, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).