Image synthesis using diffusion models created from single or multiple view images

US12567197B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12567197-B2
Application number	US-202318485225-A
Country	US
Kind code	B2
Filing date	Oct 11, 2023
Priority date	Oct 11, 2022
Publication date	Mar 3, 2026
Grant date	Mar 3, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method and system for performing novel image synthesis using generative networks are provided. The encoder-based model is trained to infer a 3D representation of an input image. A feature image is then generated using volume rendering techniques in accordance with the 3D representation. The feature image is then concatenated with a noisy image and processed by a denoiser network to predict an output image from a novel viewpoint that is consistent with the input image. The denoiser network can be a modified Noise Conditional Score Network (NCSN). In some embodiments, multiple input images or keyframes can be provided as input, and a different 3D representation is generated for each input image. The feature image is then generated, during volume rendering, by sampling each of the 3D representations and applying a mean-pooling operation to generate an aggregate feature image.
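The concatenation step in the abstract can be illustrated with a minimal sketch. It assumes images are stored as nested lists indexed [row][column][channel], and that "concatenated" means stacking channels pixel by pixel; the abstract does not state the axis. The names concat_channels and make_noisy_image are hypothetical, the feature values are placeholders, and the denoiser itself is not modeled here.

```python
import random


def concat_channels(feature_image, noisy_image):
    """Concatenate two H x W images along the channel axis, pixel by pixel."""
    if len(feature_image) != len(noisy_image):
        raise ValueError("images must have the same height")
    out = []
    for f_row, n_row in zip(feature_image, noisy_image):
        if len(f_row) != len(n_row):
            raise ValueError("images must have the same width")
        # Each pixel is a list of channel values; list addition appends channels.
        out.append([f_px + n_px for f_px, n_px in zip(f_row, n_row)])
    return out


def make_noisy_image(height, width, channels=3, sigma=1.0, seed=0):
    """Hypothetical helper: an image of pure Gaussian noise."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, sigma) for _ in range(channels)] for _ in range(width)]
        for _ in range(height)
    ]


# Placeholder 2 x 2 feature image with 4 feature channels per pixel
# (in the patent this would come from the renderer).
feature_image = [[[0.1 * c for c in range(4)] for _ in range(2)] for _ in range(2)]
noisy_image = make_noisy_image(2, 2)

# This tensor is what a denoiser network would receive as input.
denoiser_input = concat_channels(feature_image, noisy_image)
print(len(denoiser_input), len(denoiser_input[0]), len(denoiser_input[0][0]))  # 2 2 7
```

Each pixel of the result carries the 4 feature channels followed by the 3 noise channels, so the spatial layout is preserved and the denoiser sees the conditioning features aligned with the noisy pixels.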

First claim

Opening claim text (preview).

What is claimed is:
1. A system comprising: a memory storing data for an encoder-based model, a renderer, and a denoiser; and one or more processors in communication with the memory, the one or more processors executing instructions to: receive one or more input images: generate, using the encoder-based model, one or more three-dimensional (3D) representations of the one or more input images, each 3D representation in the one or more representations corresponding to a particular input image of the one or more input images; generate a feature image, using the renderer, based on the one or more 3D representations; and generate an output image, using the denoiser, based at least on the feature image and a noisy image.
2. The system of claim 1, wherein the feature image comprises a plane-sweep volume (PSV) representation.
3. The system of claim 2, wherein generating the output image comprises generating the output image based on the feature image, the noisy image, and a relative pose vector.
4. The system of claim 1, wherein the encoder-based model comprises a deep convolution neural network (DCNN) configured to generate a set of low-resolution feature maps and a set of high-resolution feature maps using at least one atrous convolution layer.
5. The system of claim 1, wherein each of the one or more 3D representations comprises a five-dimensional (5D) frustum of shape features.
6. The system of claim 1, wherein the renderer is a volume renderer configured to trace rays through the 3D representations to generate the feature image.
7. The system of claim 1, wherein the one or more input images comprises a plurality of input images, and wherein generating the feature image comprises sampling, by the renderer, a sample from each of the 3D representations and applying a mean-pooling operator to the plurality of samples.
8. The system of claim 1, wherein the denoiser comprises a Noise Conditional Score Network (NCSN).
9. The system of claim 1, wherein the noisy image is generated by combining a plurality of noisy images corresponding to a plurality of frames of the video sequence.
10. A non-transitory computer-readable medium storing instructions that, upon execution by one or more processors, cause a computing device to: receive one or more input images: generate, using an encoder-based model, one or more three-dimensional (3D) representations of the one or more input images, each 3D representation in the one or more representations corresponding to a particular input image of the one or more input images: generate, using a renderer, a feature image based on the one or more 3D representations; and generate an output image, using a denoiser, based at least on the feature image and a noisy image.
11. A method, comprising: receiving one or more input images: generating, using an encoder-based model, one or more three-dimensional (3D) representations of the one or more input images, each 3D representation in the one or more representations corresponding to a particular input image of the one or more input images: generating a feature image, using a renderer, based on the one or more 3D representations; and generating an output image, using a denoiser, based at least on the feature image and a noisy image.
12. The method of claim 11, wherein the feature image comprises a plane-sweep volume (PSV) representation.
13. The method of claim 12, wherein generating the output image comprises generating the output image based on the feature image, the noisy image, and a relative pose vector.
14. The method of claim 11, wherein the encoder-based model comprises a deep convolution neural network (DCNN) configured to generate a set of low-resolution feature maps and a set of high-resolution feature maps using at least one atrous convolution layer.
15. The method of claim 11, wherein each of the one or more 3D representations comprises a five-dimensional (5D) frustum of shape features.
16. The method of claim 11, wherein the renderer comprises a volume renderer configured to trace rays through the one or more 3D representations to generate the feature image.
17. The method of claim 11, wherein the one or more input images comprises a plurality of input images, and wherein generating the feature image comprises sampling, by the renderer, a sample from each of the 3D representations and applying a mean-pooling operator to the plurality of samples.
18. The method of claim 11, wherein the denoiser comprises a Noise Conditional Score Network (NCSN).
19. The method of claim 11, wherein each 3D representation comprises a Neural Radiance Field (NeRF).
20. The method of claim 11, wherein the noisy image is generated by combining a plurality of noisy images corresponding to a plurality of frames of the video sequence.
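The renderer and mean-pooling steps recited in claims 6, 7, 16, and 17 can be sketched for a single ray as follows. This is an illustration under stated assumptions, not the patented implementation: it assumes each sample along the ray is a density followed by a feature vector, that density and features are both averaged across views, and that compositing uses the standard NeRF-style quadrature (alpha = 1 - exp(-density * delta)). The claims do not specify any of these details, and the names mean_pool, composite_ray, and per_view are hypothetical.

```python
import math


def mean_pool(samples):
    """Element-wise mean of equally sized vectors, one vector per input view."""
    n = len(samples)
    return [sum(vals) / n for vals in zip(*samples)]


def composite_ray(points, delta):
    """Alpha-composite [density, feature...] samples along one ray, front to back."""
    transmittance = 1.0  # fraction of the ray not yet absorbed
    pixel = [0.0] * (len(points[0]) - 1)
    for density, *feature in points:
        alpha = 1.0 - math.exp(-density * delta)  # opacity of this segment
        weight = transmittance * alpha
        pixel = [p + weight * f for p, f in zip(pixel, feature)]
        transmittance *= 1.0 - alpha
    return pixel


# per_view[v][i] = [density, f0, f1]: sample i along the ray, taken from the
# 3D representation built from input view v (values are made up).
per_view = [
    [[0.0, 1.0, 0.0], [2.0, 0.0, 1.0]],
    [[0.0, 3.0, 0.0], [4.0, 0.0, 3.0]],
]

# Mean-pool across views at each sample position, then composite along the ray.
pooled = [mean_pool(views) for views in zip(*per_view)]
pixel = composite_ray(pooled, delta=0.5)
print(pooled)
print(pixel)
```

Pooling before compositing keeps the cost of the ray march independent of how the views are later combined, and the mean makes the result invariant to the order of the input images. Repeating this for every pixel's ray would yield the aggregate feature image.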

Assignees

Nvidia Corp

Inventors

Classifications

G06V10/771
Feature selection, e.g. selecting representative features from a multi-dimensional feature space · CPC title
G06T2207/20221
Image fusion; Image merging · CPC title
G06T2207/20084
Artificial neural networks [ANN] · CPC title
G06T5/70 · Primary
Denoising; Smoothing · CPC title
G06V10/44
Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components · CPC title

Patent family

Related publications grouped by family.

View patent family 91282211

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12567197B2 cover?: A method and system for performing novel image synthesis using generative networks are provided. The encoder-based model is trained to infer a 3D representation of an input image. A feature image is then generated using volume rendering techniques in accordance with the 3D representation. The feature image is then concatenated with a noisy image and processed by a denoiser network to predict an…
Who is the assignee on this patent?: Nvidia Corp
What technology area does this patent fall under?: Primary CPC classification G06T5/70. Mapped technology areas include Physics.
When was this patent published?: Publication date Mar 3, 2026 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

3d-cnn processing for ct image noise removal

Neural network system with temporal feedback for denoising of rendered sequences

Systems and methods for image denoising using deep convolutional networks

Neural network system with temporal feedback for adaptive sampling and denoising of rendered sequences

System and method for processing data acquired utilizing multi-energy computed tomography imaging

Neural network system with temporal feedback for denoising of rendered sequences

Denoising monte carlo renderings using neural networks with asymmetric loss
