Neural rendering
US-11967015-B2 · Apr 23, 2024 · US
US12530835B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12530835-B2 |
| Application number | US-202318364853-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 3, 2023 |
| Priority date | Nov 8, 2022 |
| Publication date | Jan 20, 2026 |
| Grant date | Jan 20, 2026 |
Official abstract text for this publication.
An example method includes: generating embeddings of image data that includes multiple images, where each image has a different viewpoint of a scene; generating a latent space and a decoder, wherein the decoder receives embeddings as input to generate an output viewpoint; for each viewpoint in the image data, determining a volumetric rendering view synthesis loss and a multi-view photometric loss; and applying an optimization algorithm to the latent space and the decoder over a number of epochs until the volumetric rendering view synthesis loss is within a volumetric threshold and the multi-view photometric loss is within a multi-view threshold.
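The stopping rule in the abstract (optimize over epochs until both losses fall within their thresholds) can be sketched as a small loop. This is a minimal illustration, not the patented implementation: `train_until_thresholds`, `run_epoch`, `make_toy_epoch`, and the `max_epochs` safety cap are hypothetical names introduced here, and the toy step optimizes a single scalar with two quadratic losses rather than a real latent space and decoder.

```python
from typing import Callable, Tuple


def train_until_thresholds(
    run_epoch: Callable[[], Tuple[float, float]],
    volumetric_threshold: float,
    multi_view_threshold: float,
    max_epochs: int = 1000,
) -> Tuple[int, float, float]:
    """Call run_epoch() until both returned losses are within their thresholds.

    run_epoch is a hypothetical callable that performs one optimization epoch
    and returns (volumetric_loss, multi_view_loss) measured after the update.
    max_epochs is a safety cap added for this sketch; it is not in the source.
    """
    if max_epochs < 1:
        raise ValueError("max_epochs must be at least 1")
    for epoch in range(1, max_epochs + 1):
        volumetric_loss, multi_view_loss = run_epoch()
        if (volumetric_loss <= volumetric_threshold
                and multi_view_loss <= multi_view_threshold):
            break
    return epoch, volumetric_loss, multi_view_loss


def make_toy_epoch(target: float = 2.0, learning_rate: float = 0.1):
    """Toy stand-in for real training: one scalar parameter, two quadratic losses."""
    state = {"w": 0.0}

    def run_epoch() -> Tuple[float, float]:
        error = state["w"] - target
        # Gradient of the summed loss (error**2 + 0.5 * error**2) w.r.t. w.
        gradient = 3.0 * error
        state["w"] -= learning_rate * gradient  # plain gradient descent step
        error = state["w"] - target
        return error ** 2, 0.5 * error ** 2

    return run_epoch


epochs, vol_loss, mv_loss = train_until_thresholds(
    make_toy_epoch(), volumetric_threshold=1e-4, multi_view_threshold=1e-4
)
print(f"stopped after {epochs} epochs: "
      f"volumetric={vol_loss:.2e}, multi-view={mv_loss:.2e}")
```

Both conditions are checked together, so training continues while either loss is still above its threshold, which matches the "and" in the abstract's stopping condition.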
Claim text.
What is claimed is:
1. A method of training a latent space trainer for volumetric rendering, the method comprising: generating embeddings of image data that includes multiple images by sampling values along a viewing ray to generate 3D points and Fourier encoding the sampled points, where each image has a different viewpoint of a scene; generating a latent space and a decoder, wherein the decoder receives embeddings as input to generate an output viewpoint; for each viewpoint in the image data, determining a volumetric rendering view synthesis loss and a multi-view photometric loss; and applying an optimization algorithm to the latent space and the decoder over a number of epochs until the volumetric rendering view synthesis loss is within a volumetric threshold and the multi-view photometric loss is within a multi-view threshold.
2. The method of claim 1, wherein the optimization algorithm uses a Mean Square Error objective for the volumetric rendering view synthesis loss.
3. The method of claim 1, wherein the optimization algorithm is a gradient descent algorithm.
4. The method of claim 1, wherein the multi-view photometric loss is determined using a photometric objective.
5. The method of claim 4, wherein the photometric objective is determined by: for each pixel of a target image of the image data, with a predicted depth ($\hat{d}$), generating, by a warping operation, projected coordinates with a predicted depth ($\hat{d}'$) in a context image; generating a synthesized target image from the context image; and determining a difference between the target image and the synthesized target image.
6. The method of claim 5, wherein the context image is generated by a transformation matrix.
7. The method of claim 5, wherein the difference between the target image and the synthesized target image is determined by a weighted structural similarity index.
8. The method of claim 5, wherein the synthesized target image is generated by applying grid sampling with bilinear interpolation to place information from the context image onto each target pixel of the synthesized target image based on the projected coordinates.
9. The method of claim 5, wherein pixels of the target image for determining the photometric objective are determined using strided ray sampling.
10. A system for training a latent space trainer for volumetric rendering, the system comprising: one or more processors; a non-transitory computer-readable medium storing instructions that, when executed by the one or more processors, cause the one or more processors to: generate embeddings of image data that includes multiple images by sampling values along a viewing ray to generate 3D points and Fourier encoding the sampled points, where each image has a different viewpoint of a scene; generate a latent space and a decoder, wherein the decoder receives embeddings as input to generate an output viewpoint; for each viewpoint in the image data, determine a volumetric rendering view synthesis loss and a multi-view photometric loss; and apply an optimization algorithm to the latent space and the decoder over a number of epochs until the volumetric rendering view synthesis loss is within a volumetric threshold and the multi-view photometric loss is within a multi-view threshold.
11. The system of claim 10, wherein the optimization algorithm uses a Mean Square Error objective for the volumetric rendering view synthesis loss.
12. The system of claim 10, wherein the optimization algorithm is a gradient descent algorithm.
13. The system of claim 10, wherein the multi-view photometric loss is determined using a photometric objective.
14. The system of claim 13, wherein the photometric objective is determined by: for each pixel of a target image of the image data, with a predicted depth ($\hat{d}$), generating, by a warping operation, projected coordinates with a predicted depth ($\hat{d}'$) in a context image; generating a synthesized target image from the context image; and determining a difference between the target image and the synthesized target image.
15. The system of claim 14, wherein the context image is generated by a transformation matrix.
16. The system of claim 14, wherein the difference between the target image and the synthesized target image is determined by a weighted structural similarity index.
17. The system of claim 14, wherein the synthesized target image is generated by applying grid sampling with bilinear interpolation to place information from the context image onto each target pixel of the synthesized target image based on the projected coordinates.
18. The system of claim 14, wherein pixels of the target image for determining the photometric objective are determined using strided ray sampling.
19. A tangible computer-readable medium comprising instructions that, when executed, cause a system to: generate embeddings of image data that includes multiple images by sampling values along a viewing ray to generate 3D points and Fourier encoding the sampled points, where each image has a different viewpoint of a scene; generate a latent space and a decoder, wherein the decoder receives embeddings as input to generate an output viewpoint; for each viewpoint in the image data, determine a volumetric rendering view synthesis loss and a multi-view photometric loss; and apply an optimization algorithm to the latent space and the decoder over a number of epochs until the volumetric rendering view synthesis loss is within a volumetric threshold and the multi-view photometric loss is within a multi-view threshold.
20. The system of claim 19, wherein the multi-view photometric loss is determined using a photometric objective.
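Claims 1, 10, and 19 recite generating embeddings by sampling values along a viewing ray to obtain 3D points and Fourier encoding those points. The sketch below shows one common way to do this, under assumptions that the claims do not state: depths are spaced uniformly between hypothetical `near` and `far` bounds, and the encoding uses sine and cosine at frequencies of pi times powers of two. The function names are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def sample_points_along_ray(
    origin: Sequence[float],
    direction: Sequence[float],
    near: float,
    far: float,
    num_samples: int,
) -> List[Vec3]:
    """Return 3D points origin + t * direction at evenly spaced depths t.

    Uniform spacing in [near, far] is an assumption of this sketch.
    """
    if num_samples < 2:
        raise ValueError("num_samples must be at least 2")
    points: List[Vec3] = []
    for i in range(num_samples):
        t = near + (far - near) * i / (num_samples - 1)
        x, y, z = (o + t * d for o, d in zip(origin, direction))
        points.append((x, y, z))
    return points


def fourier_encode(point: Sequence[float], num_frequencies: int) -> List[float]:
    """Encode each coordinate c as sin and cos of (2**k * pi * c), k = 0..K-1.

    The power-of-two frequency schedule is an assumed, commonly used choice.
    """
    features: List[float] = []
    for coordinate in point:
        for k in range(num_frequencies):
            angle = (2.0 ** k) * math.pi * coordinate
            features.append(math.sin(angle))
            features.append(math.cos(angle))
    return features


# Usage: a ray from the origin looking along +z, three samples, four frequencies.
ray_points = sample_points_along_ray(
    origin=(0.0, 0.0, 0.0), direction=(0.0, 0.0, 1.0),
    near=1.0, far=3.0, num_samples=3,
)
embeddings = [fourier_encode(p, num_frequencies=4) for p in ray_points]
print(len(embeddings), "points, each encoded into", len(embeddings[0]), "features")
```

Each 3D point yields 3 x 2 x `num_frequencies` features, so the embedding width is set by the number of frequency bands rather than by the scene.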
Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items (segmenting video sequences G06V20/49) · CPC title
Three-dimensional [3D] objects · CPC title
Organisation of the process, e.g. bagging or boosting · CPC title
exterior to a vehicle by using sensors mounted on the vehicle · CPC title
Determination of region of interest [ROI] or a volume of interest [VOI] · CPC title