Neural rerendering from 3D models
US-11288857-B2 · Mar 29, 2022 · US
US12488483B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12488483-B2 |
| Application number | US-202318110421-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 16, 2023 |
| Priority date | Jul 25, 2022 |
| Publication date | Dec 2, 2025 |
| Grant date | Dec 2, 2025 |
Official abstract text for this publication.
A method of generating additional supervision data to improve learning of a geometrically-consistent latent scene representation with a geometric scene representation architecture is provided. The method includes receiving, with a computing device, a latent scene representation encoding a pointcloud from images of a scene captured by a plurality of cameras each with known intrinsics and poses, generating a virtual camera having a viewpoint different from viewpoints of the plurality of cameras, projecting information from the pointcloud onto the viewpoint of the virtual camera, and decoding the latent scene representation based on the virtual camera thereby generating an RGB image and depth map corresponding to the viewpoint of the virtual camera for implementation as additional supervision data.
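The projection step named in the abstract can be illustrated with a minimal sketch. It assumes a plain pinhole camera model and a nearest-point z-buffer, neither of which the publication specifies; the function name `project_points` and every parameter name are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def project_points(
    points_world: Sequence[Vec3],
    R: Mat3,   # assumed world-to-camera rotation (3x3) of the virtual camera
    t: Vec3,   # assumed world-to-camera translation of the virtual camera
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    width: int,
    height: int,
) -> List[List[Optional[float]]]:
    """Return a sparse depth map for the view; None marks pixels no point hits."""
    depth: List[List[Optional[float]]] = [[None] * width for _ in range(height)]
    for p in points_world:
        # Move the point into the camera frame: x_cam = R @ p + t.
        x, y, z = (
            sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)
        )
        if z <= 0.0:
            continue  # the point lies behind the camera
        # Pinhole projection to integer pixel coordinates.
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            # Keep the nearest point per pixel (simple z-buffer).
            if depth[v][u] is None or z < depth[v][u]:
                depth[v][u] = z
    return depth
```

The result is only a sparse depth map. In the method described above, the RGB image and depth map for the virtual viewpoint come from decoding the latent scene representation, which this sketch does not attempt to model.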
Opening claim text (preview).
What is claimed is:
1. A method of generating additional supervision data to improve learning of a geometrically-consistent latent scene representation with a geometric scene representation architecture, the method comprising: receiving, with a computing device, a latent scene representation encoding a pointcloud from images of a scene captured by a plurality of cameras each with known intrinsics and poses; selecting one of the plurality of cameras; translating a pose of the selected camera; adjusting a viewing angle of the translated selected camera toward a center of the pointcloud; generating a virtual camera having a viewpoint different from viewpoints of the plurality of cameras; projecting information from the pointcloud onto the viewpoint of the virtual camera; and decoding the latent scene representation based on the virtual camera thereby generating an RGB image and depth map corresponding to the viewpoint of the virtual camera for implementation as additional supervision data.
2. The method of claim 1, wherein translating the pose of the selected camera comprises adding translation noise to the pose of the selected camera.
3. The method of claim 1, wherein the virtual camera is generated by: selecting one of the plurality of cameras as a canonical camera, applying a rotation matrix to the canonical camera, and propagating a rotation and a translation offset of the canonical camera resulting from the rotation matrix to other ones of the plurality of cameras.
4. The method of claim 1, further comprising: implementing the geometric scene representation architecture with the computing device; inputting the images of the scene captured by the plurality of cameras into the geometric scene representation architecture, wherein each camera of the plurality of cameras includes known embeddings; and encoding the images of the scene captured by the plurality of cameras, with the geometric scene representation architecture, into the latent scene representation.
5. The method of claim 1, further comprising training the geometric scene representation architecture by inputting the RGB image and the depth map corresponding to the viewpoint of the virtual camera.
6. The method of claim 1, further comprising: querying the latent scene representation with a camera embedding; and decoding the latent scene representation based on the camera embedding, thereby generating an estimated depth map and an estimated RGB image based on the camera embedding.
7. A system for generating additional supervision data to improve learning of a geometrically-consistent latent scene representation with a geometric scene representation architecture, the system comprising: one or more processors; and a non-transitory, computer-readable medium storing instructions that, when executed by the one or more processors, cause the one or more processors to: receive a latent scene representation encoding a pointcloud from images of a scene captured by a plurality of cameras each with known intrinsics and poses; select one of the plurality of cameras; translate a pose of the selected camera; adjust a viewing angle of the translated selected camera toward a center of the pointcloud; generate a virtual camera having a viewpoint different from viewpoints of the plurality of cameras; project information from the pointcloud onto the viewpoint of the virtual camera; and decode the latent scene representation based on the virtual camera thereby generating an RGB image and depth map corresponding to the viewpoint of the virtual camera for implementation as additional supervision data.
8. The system of claim 7, wherein translating the pose of the selected camera comprises adding translation noise to the pose of the selected camera.
9. The system of claim 7, wherein the virtual camera is generated by: selecting one of the plurality of cameras as a canonical camera, applying a rotation matrix to the canonical camera, and propagating a rotation and a translation offset of the canonical camera resulting from the rotation matrix to other ones of the plurality of cameras.
10. The system of claim 7, wherein the instructions further cause the one or more processors to: implement the geometric scene representation architecture with the system; input the images of the scene captured by the plurality of cameras into the geometric scene representation architecture, wherein each camera of the plurality of cameras includes known embeddings; and encode the images of the scene captured by the plurality of cameras, with the geometric scene representation architecture, into the latent scene representation.
11. The system of claim 7, wherein the instructions further cause the one or more processors to train the geometric scene representation architecture by inputting the RGB image and the depth map corresponding to the viewpoint of the virtual camera.
12. The system of claim 7, wherein the instructions further cause the one or more processors to: query the latent scene representation with a camera embedding; and decode the latent scene representation based on the camera embedding, thereby generating an estimated depth map and an estimated RGB image based on the camera embedding.
13. A computing program product for generating additional supervision data to improve learning of a geometrically-consistent latent scene representation with a geometric scene representation architecture, the computing program product comprising machine-readable instructions stored on a non-transitory computer readable memory, which when executed by a computing device, causes the computing device to carry out steps comprising: receiving, with the computing device, a latent scene representation encoding a pointcloud from images of a scene captured by a plurality of cameras each with known intrinsics and poses; selecting one of the plurality of cameras; translating a pose of the selected camera; adjusting a viewing angle of the translated selected camera toward a center of the pointcloud; generating a virtual camera having a viewpoint different from viewpoints of the plurality of cameras; projecting information from the pointcloud onto the viewpoint of the virtual camera; and decoding the latent scene representation based on the virtual camera thereby generating an RGB image and depth map corresponding to the viewpoint of the virtual camera for implementation as additional supervision data.
14. The computing program product of claim 13, wherein translating the pose of the selected camera comprises adding translation noise to the pose of the selected camera.
15. The computing program product of claim 13, wherein the virtual camera is generated by: selecting one of the plurality of cameras as a canonical camera, applying a rotation matrix to the canonical camera, and propagating a rotation and a translation offset of the canonical camera resulting from the rotation matrix to other ones of the plurality of cameras.
16. The computing program product of claim 13, the steps caused to be carried out by the computing device further comprising: implementing the geometric scene representation architecture with the computing device; inputting the images of the scene captured by the plurality of cameras into the geometric scene representation architecture, wherein each camera of the plurality of cameras includes known embeddings; and encoding the images of the scene captured by the plurality of cameras, with the geometric scene representation architecture, into the latent sce
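The camera-selection, pose-translation, and re-aiming steps recited in the independent claims, together with the translation noise of the dependent claims, might be sketched as follows. Several choices here are assumptions rather than anything the claims state: the noise is Gaussian, the "center of the pointcloud" is taken to be its centroid, and the camera frame follows an x-right, y-down, z-forward convention. All function and parameter names are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _normalize(a: Vec3) -> Vec3:
    n = math.sqrt(sum(c * c for c in a))
    if n == 0.0:
        raise ValueError("degenerate direction (zero length)")
    return (a[0] / n, a[1] / n, a[2] / n)


def centroid(points: Sequence[Vec3]) -> Vec3:
    """Assumed meaning of 'center of the pointcloud': the mean of all points."""
    n = len(points)
    return (sum(p[0] for p in points) / n,
            sum(p[1] for p in points) / n,
            sum(p[2] for p in points) / n)


def look_at_rotation(position: Vec3, target: Vec3,
                     down_hint: Vec3 = (0.0, 1.0, 0.0)) -> List[List[float]]:
    """World-to-camera rotation whose optical axis (+z) points at the target.

    Raises ValueError if the viewing direction is parallel to down_hint.
    """
    forward = _normalize((target[0] - position[0],
                          target[1] - position[1],
                          target[2] - position[2]))
    right = _normalize(_cross(down_hint, forward))
    down = _cross(forward, right)
    # Rows are the camera axes expressed in world coordinates.
    return [list(right), list(down), list(forward)]


def make_virtual_camera(
    camera_positions: Sequence[Vec3],
    points: Sequence[Vec3],
    sigma: float = 0.1,
    rng: Optional[random.Random] = None,
) -> Tuple[List[List[float]], Vec3]:
    """Select a camera, jitter its position, and aim it at the cloud centroid."""
    rng = rng or random.Random()
    base = rng.choice(list(camera_positions))
    # Translation noise (Gaussian is an assumption made for this sketch).
    position = tuple(c + rng.gauss(0.0, sigma) for c in base)
    R = look_at_rotation(position, centroid(points))
    # World-to-camera translation so that x_cam = R @ x_world + t.
    t = tuple(-sum(R[i][j] * position[j] for j in range(3)) for i in range(3))
    return R, t
```

The returned pair `(R, t)` is a world-to-camera pose for the virtual view. By construction the centroid of the pointcloud lands on the optical axis of that view, which is one way to read "adjusting a viewing angle ... toward a center of the pointcloud".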
Range image; Depth image; 3D point clouds · CPC title
Color image · CPC title
Training; Learning · CPC title
Stereo camera calibration · CPC title
Artificial neural networks [ANN] · CPC title