Image synthesis using diffusion models created from single or multiple view images

US12567197B2 · US · B2

Patent metadata
Publication number: US-12567197-B2
Application number: US-202318485225-A
Country: US
Kind code: B2
Filing date: Oct 11, 2023
Priority date: Oct 11, 2022
Publication date: Mar 3, 2026
Grant date: Mar 3, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method and system for performing novel image synthesis using generative networks are provided. The encoder-based model is trained to infer a 3D representation of an input image. A feature image is then generated using volume rendering techniques in accordance with the 3D representation. The feature image is then concatenated with a noisy image and processed by a denoiser network to predict an output image from a novel viewpoint that is consistent with the input image. The denoiser network can be a modified Noise Conditional Score Network (NCSN). In some embodiments, multiple input images or keyframes can be provided as input, and a different 3D representation is generated for each input image. The feature image is then generated, during volume rendering, by sampling each of the 3D representations and applying a mean-pooling operation to generate an aggregate feature image.
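The abstract states two things about the denoising step. The feature image is concatenated with a noisy image, and the denoiser can be a modified NCSN. It does not say how sampling is run.

The sketch below assumes the annealed Langevin dynamics sampler that is commonly paired with NCSNs. It shows the data flow only:

- Images are nested `[row][col][channel]` lists.
- `concat_channels` stacks the feature channels in front of the noisy channels.
- `toy_score` is a hypothetical oracle that stands in for the trained network. It treats the feature channels as the clean image.

None of these names, nor the noise schedule or step size, come from the patent.

```python
import math
import random

# Images are nested lists indexed as image[row][col][channel].

def concat_channels(feature_image, noisy_image):
    """Stack the channels of two equally sized images (feature channels first)."""
    return [
        [f_px + n_px for f_px, n_px in zip(f_row, n_row)]
        for f_row, n_row in zip(feature_image, noisy_image)
    ]

def toy_score(net_input, sigma):
    """HYPOTHETICAL stand-in for a trained NCSN.

    Treats the first half of each pixel's channels (the feature image) as the
    clean target and returns the exact score of N(target, sigma^2 I) evaluated
    at the second half (the noisy image): (target - x) / sigma^2.
    """
    score = []
    for row in net_input:
        score_row = []
        for px in row:
            c = len(px) // 2
            target, noisy = px[:c], px[c:]
            score_row.append([(t - v) / sigma ** 2 for t, v in zip(target, noisy)])
        score.append(score_row)
    return score

def annealed_langevin(feature_image, score_fn, sigmas, steps=100, eps=2e-5, seed=0):
    """Annealed Langevin dynamics conditioned on a feature image (assumed sampler)."""
    rng = random.Random(seed)
    # Start from uniform noise with the same shape as the feature image.
    x = [[[rng.random() for _ in px] for px in row] for row in feature_image]
    for sigma in sigmas:  # noise levels from largest to smallest
        alpha = eps * sigma ** 2 / sigmas[-1] ** 2  # per-level step size
        for _ in range(steps):
            # Condition the denoiser by concatenating feature and noisy channels.
            score = score_fn(concat_channels(feature_image, x), sigma)
            # x <- x + (alpha / 2) * score + sqrt(alpha) * z, with z ~ N(0, I)
            x = [
                [
                    [v + 0.5 * alpha * s + math.sqrt(alpha) * rng.gauss(0.0, 1.0)
                     for v, s in zip(px, s_px)]
                    for px, s_px in zip(row, s_row)
                ]
                for row, s_row in zip(x, score)
            ]
    return x

# Ten geometrically spaced noise levels from 1.0 down to 0.01 (illustrative).
SIGMAS = [0.01 ** (i / 9) for i in range(10)]

feature = [[[0.2, 0.5, 0.8], [0.1, 0.4, 0.7]],
           [[0.9, 0.3, 0.6], [0.0, 1.0, 0.5]]]
sample = annealed_langevin(feature, toy_score, SIGMAS)
```

With a real model, `toy_score` would be replaced by the trained denoiser. That network learns the score from data instead of reading the answer out of the feature channels. Here the sample simply converges to within roughly the final noise level of the feature image.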

Claims

Full claim text as published.

What is claimed is:

1. A system comprising: a memory storing data for an encoder-based model, a renderer, and a denoiser; and one or more processors in communication with the memory, the one or more processors executing instructions to: receive one or more input images; generate, using the encoder-based model, one or more three-dimensional (3D) representations of the one or more input images, each 3D representation in the one or more representations corresponding to a particular input image of the one or more input images; generate a feature image, using the renderer, based on the one or more 3D representations; and generate an output image, using the denoiser, based at least on the feature image and a noisy image.

2. The system of claim 1, wherein the feature image comprises a plane-sweep volume (PSV) representation.

3. The system of claim 2, wherein generating the output image comprises generating the output image based on the feature image, the noisy image, and a relative pose vector.

4. The system of claim 1, wherein the encoder-based model comprises a deep convolution neural network (DCNN) configured to generate a set of low-resolution feature maps and a set of high-resolution feature maps using at least one atrous convolution layer.

5. The system of claim 1, wherein each of the one or more 3D representations comprises a five-dimensional (5D) frustum of shape features.

6. The system of claim 1, wherein the renderer is a volume renderer configured to trace rays through the 3D representations to generate the feature image.

7. The system of claim 1, wherein the one or more input images comprises a plurality of input images, and wherein generating the feature image comprises sampling, by the renderer, a sample from each of the 3D representations and applying a mean-pooling operator to the plurality of samples.

8. The system of claim 1, wherein the denoiser comprises a Noise Conditional Score Network (NCSN).

9. The system of claim 1, wherein the noisy image is generated by combining a plurality of noisy images corresponding to a plurality of frames of the video sequence.

10. A non-transitory computer-readable medium storing instructions that, upon execution by one or more processors, cause a computing device to: receive one or more input images; generate, using an encoder-based model, one or more three-dimensional (3D) representations of the one or more input images, each 3D representation in the one or more representations corresponding to a particular input image of the one or more input images; generate, using a renderer, a feature image based on the one or more 3D representations; and generate an output image, using a denoiser, based at least on the feature image and a noisy image.

11. A method, comprising: receiving one or more input images; generating, using an encoder-based model, one or more three-dimensional (3D) representations of the one or more input images, each 3D representation in the one or more representations corresponding to a particular input image of the one or more input images; generating a feature image, using a renderer, based on the one or more 3D representations; and generating an output image, using a denoiser, based at least on the feature image and a noisy image.

12. The method of claim 11, wherein the feature image comprises a plane-sweep volume (PSV) representation.

13. The method of claim 12, wherein generating the output image comprises generating the output image based on the feature image, the noisy image, and a relative pose vector.

14. The method of claim 11, wherein the encoder-based model comprises a deep convolution neural network (DCNN) configured to generate a set of low-resolution feature maps and a set of high-resolution feature maps using at least one atrous convolution layer.

15. The method of claim 11, wherein each of the one or more 3D representations comprises a five-dimensional (5D) frustum of shape features.
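Claims 4 and 14 recite a DCNN that produces low- and high-resolution feature maps using at least one atrous convolution layer. The claims do not describe that layer further.

As a sketch of the general technique only, and not of the patented encoder, the function below implements a 1D atrous (dilated) convolution in plain Python. `atrous_conv1d` is a hypothetical name. Real encoders apply the 2D version to multi-channel feature maps.

```python
def atrous_conv1d(signal, kernel, rate):
    """1D atrous (dilated) convolution with zero 'same' padding.

    Like most deep-learning libraries, this computes a cross-correlation:
    out[i] = sum_j kernel[j] * signal[i - pad + j * rate].
    A kernel of size k at dilation `rate` spans (k - 1) * rate + 1 input samples
    while the output keeps the input's length (no downsampling).
    """
    pad = (len(kernel) - 1) * rate // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i - pad + j * rate
            if 0 <= idx < len(signal):  # treat positions outside the signal as zero
                acc += w * signal[idx]
        out.append(acc)
    return out

impulse = [0, 0, 0, 1, 0, 0, 0]
print(atrous_conv1d(impulse, [1, 2, 3], rate=1))  # [0.0, 0.0, 3.0, 2.0, 1.0, 0.0, 0.0]
print(atrous_conv1d(impulse, [1, 2, 3], rate=2))  # [0.0, 3.0, 0.0, 2.0, 0.0, 1.0, 0.0]
```

The impulse responses show the point. Raising `rate` from 1 to 2 spreads the same three weights over five input positions without reducing resolution. This is why atrous layers are a common way to enlarge the receptive field while keeping a high-resolution feature map.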

16. The method of claim 11, wherein the renderer comprises a volume renderer configured to trace rays through the one or more 3D representations to generate the feature image.

17. The method of claim 11, wherein the one or more input images comprises a plurality of input images, and wherein generating the feature image comprises sampling, by the renderer, a sample from each of the 3D representations and applying a mean-pooling operator to the plurality of samples.

18. The method of claim 11, wherein the denoiser comprises a Noise Conditional Score Network (NCSN).

19. The method of claim 11, wherein each 3D representation comprises a Neural Radiance Field (NeRF).

20. The method of claim 11, wherein the noisy image is generated by combining a plurality of noisy images corresponding to a plurality of frames of the video sequence.
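Claims 16 and 17 describe a volume renderer that traces rays through the 3D representations. For multiple input images, it mean-pools one sample taken from each representation.

The sketch below rests on three assumptions:

- Each representation can be queried at a depth along a ray for a (density, feature vector) pair.
- Both the density and the feature are mean-pooled across views.
- The pooled samples are composited with the standard NeRF-style quadrature, alpha = 1 - exp(-density * delta).

The function names and the two toy views are hypothetical.

```python
import math

def mean_pool(samples):
    """Average (density, feature) samples taken from each view's 3D representation."""
    n = len(samples)
    density = sum(d for d, _ in samples) / n
    feature = [sum(vals) / n for vals in zip(*(f for _, f in samples))]
    return density, feature

def render_ray(views, depths):
    """Composite mean-pooled samples along one ray into a single feature vector.

    views:  one callable per input image, mapping a depth t to (density, feature).
    depths: increasing sample depths along the ray.
    """
    out = None
    transmittance = 1.0  # fraction of the ray not yet absorbed
    for i, t in enumerate(depths):
        # Distance to the next sample; the last interval is treated as unbounded.
        delta = depths[i + 1] - t if i + 1 < len(depths) else 1e10
        density, feature = mean_pool([view(t) for view in views])
        alpha = 1.0 - math.exp(-density * delta)
        weight = transmittance * alpha
        if out is None:
            out = [0.0] * len(feature)
        out = [o + weight * f for o, f in zip(out, feature)]
        transmittance *= 1.0 - alpha
    return out

def view_a(t):
    # Hypothetical view: empty space before depth 2, then a surface with feature [1, 0].
    return (10.0, [1.0, 0.0]) if t >= 2.0 else (0.0, [0.0, 0.0])

def view_b(t):
    # Hypothetical view: the same surface, but encoding feature [0, 1].
    return (10.0, [0.0, 1.0]) if t >= 2.0 else (0.0, [0.0, 0.0])

pixel = render_ray([view_a, view_b], [0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
print([round(v, 6) for v in pixel])  # [0.5, 0.5]
```

Under these assumptions, a feature image is produced by calling `render_ray` once per pixel ray. Because pooling happens per sample, before compositing, the two views' features are blended at every depth. That is why the example pixel comes out as [0.5, 0.5].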

Assignees

Nvidia Corp

Inventors

Classifications

  • Feature selection, e.g. selecting representative features from a multi-dimensional feature space · CPC title

  • Image fusion; Image merging · CPC title

  • Artificial neural networks [ANN] · CPC title

  • G06T5/70 (primary)

    Denoising; Smoothing · CPC title

  • Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components · CPC title


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12567197B2 cover?
A method and system for performing novel image synthesis using generative networks are provided. The encoder-based model is trained to infer a 3D representation of an input image. A feature image is then generated using volume rendering techniques in accordance with the 3D representation. The feature image is then concatenated with a noisy image and processed by a denoiser network to predict an…
Who is the assignee on this patent?
Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification G06T5/70. Mapped technology areas include Physics.
When was this patent published?
Published on Mar 3, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).