Deep novel view and lighting synthesis from sparse images
US-2021012561-A1 · Jan 14, 2021 · US
US12475638B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12475638-B2 |
| Application number | US-202018251743-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 5, 2020 |
| Priority date | Nov 5, 2020 |
| Publication date | Nov 18, 2025 |
| Grant date | Nov 18, 2025 |
Official abstract text for this publication.
Example embodiments relate to techniques for volumetric performance capture with neural rendering. A technique may involve initially obtaining images that depict a subject from multiple viewpoints and under various lighting conditions using a light stage and depth data corresponding to the subject using infrared cameras. A neural network may extract features of the subject from the images based on the depth data and map the features into a texture space (e.g., the UV texture space). A neural renderer can be used to generate an output image depicting the subject from a target view such that illumination of the subject in the output image aligns with the target view. The neural renderer may resample the features of the subject from the texture space to an image space to generate the output image.
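The resampling step mentioned in the abstract, from texture space to image space, can be illustrated with a minimal sketch. It assumes that a per-pixel UV coordinate map for the target view is already available (for example, rasterized from the coarse geometry) and uses plain bilinear interpolation. The names `sample_bilinear`, `resample_to_image`, and `uv_map` are hypothetical and do not come from the patent, which does not specify an interpolation scheme in the text shown here.

```python
from typing import List, Optional, Tuple

Texture = List[List[List[float]]]   # features indexed as [row][column][channel]
UV = Optional[Tuple[float, float]]  # None marks a pixel that misses the subject


def sample_bilinear(texture: Texture, u: float, v: float) -> List[float]:
    """Bilinearly interpolate a feature vector at normalized UV coordinates."""
    height, width = len(texture), len(texture[0])
    # Clamp UV to [0, 1] and convert to continuous texel coordinates.
    x = min(max(u, 0.0), 1.0) * (width - 1)
    y = min(max(v, 0.0), 1.0) * (height - 1)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    out = []
    for c in range(len(texture[0][0])):
        top = texture[y0][x0][c] * (1 - fx) + texture[y0][x1][c] * fx
        bottom = texture[y1][x0][c] * (1 - fx) + texture[y1][x1][c] * fx
        out.append(top * (1 - fy) + bottom * fy)
    return out


def resample_to_image(texture: Texture, uv_map: List[List[UV]]) -> Texture:
    """Build an image-space feature map by looking up each pixel's UV."""
    channels = len(texture[0][0])
    return [
        [
            sample_bilinear(texture, uv[0], uv[1]) if uv is not None
            else [0.0] * channels  # background pixels get zero features
            for uv in row
        ]
        for row in uv_map
    ]


if __name__ == "__main__":
    tex = [[[0.0], [1.0]], [[2.0], [3.0]]]  # 2x2 texture, one channel
    print(resample_to_image(tex, [[(0.5, 0.5), None]]))  # [[[1.5], [0.0]]]
```

In this sketch the target view enters only through `uv_map`, so rendering a different view means supplying a different map over the same texture-space features.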
Opening claim text (preview).
What is claimed is:
1. A method comprising: obtaining, using a camera system and a light stage having a plurality of lights, a plurality of images that depict a subject from a plurality of viewpoints and under a plurality of lighting conditions; obtaining, using a plurality of infrared cameras, depth data corresponding to the subject; based on the depth data corresponding to the subject, extracting, using a neural network, a plurality of features of the subject from the plurality of images; pooling, using the neural network, the plurality of features of the subject into a texture space; reprojecting the pooled features into an image space; providing the pooled features reprojected into the image space with one or more graphical buffers as inputs to a neural renderer; and generating, using the neural renderer, an output image depicting the subject from a target view such that illumination of the subject in the output image aligns with the target view.
2. The method of claim 1, wherein obtaining the plurality of images that depict the subject comprises: capturing, using the camera system and the light stage, a plurality of image pairs depicting the subject under spherical gradient illumination conditions such that each image pair includes a gradient image and an inverse gradient image.
3. The method of claim 2, wherein obtaining the plurality of images that depict the subject further comprises: capturing, using the camera system and the light stage, a series of images that depict the subject under one-light-at-a-time conditions such that each image from the series of images depicts the subject under illumination from a single light from the plurality of lights.
4. The method of claim 1, further comprising: estimating a coarse geometry for the subject based on the depth data; and wherein extracting the plurality of features of the subject from the plurality of images comprises: extracting a feature from each image based on the coarse geometry estimated for the subject.
5. The method of claim 4, wherein extracting the feature from each image based on the coarse geometry estimated for the subject comprises: using a convolution neural network to extract the feature from each image.
6. The method of claim 1, further comprising: transforming, using a convolution neural network, the pooled features to extract implicit reflectance and local geometry information.
7. The method of claim 1, wherein the one or more graphical buffers includes at least one of a light map and a reflection map determined based on the implicit reflectance and local geometry information.
8. The method of claim 7, wherein generating, using the neural renderer, the output image depicting the subject from the target view such that illumination of the subject in the output image aligns with the target view comprises: causing the neural renderer to use the pooled features reprojected into the image space with the one or more graphical buffers to generate the output image depicting the subject from the target view.
9. The method of claim 8, wherein generating, using the neural renderer, the output image depicting the subject from the target view such that illumination of the subject in the output image aligns with the target view comprises: generating the output image depicting the subject in an arbitrary environment.
10. The method of claim 1, wherein generating, using the neural renderer, the output image depicting the subject from the target view such that illumination of the subject in the output image aligns with the target view further comprises: generating a series of images depicting the subject from a plurality of views such that illumination of the subject in each image aligns with a particular view associated with the image.
11. The method of claim 1, further comprising: determining a plurality warp fields configured to map pixels from an image to the texture space, wherein each warp field is determined using the depth data corresponding to the subject.
12. The method of claim 1, wherein the pooled features encode both local and global geometric properties and four dimensional (4D) reflectance.
13. A system comprising: a camera system having a plurality of infrared cameras; a light stage having a plurality of lights; and a computing device configured to: obtain, using the camera system and the light stage having the plurality of lights, a plurality of images that depict a subject from a plurality of viewpoints and under a plurality of lighting conditions; obtain, using the plurality of infrared cameras, depth data corresponding to the subject; based on the depth data corresponding to the subject, extract, using a neural network, a plurality of features of the subject from the plurality of images; pool, using the neural network, the plurality of features of the subject into a texture space; reproject the pooled features into an image space; provide the pooled features reprojected into the image space with one or more graphical buffers as inputs to a neural renderer; and generate, using the neural renderer, an output image depicting the subject from a target view such that illumination of the subject in the output image aligns with the target view.
14. The system of claim 13, wherein the computing device is further configured to: transform, using a convolution neural network, the pooled features to extract implicit reflectance and local geometry information.
15. The system of claim 13, wherein the one or more graphical buffers includes at least one of a light map and a reflection map determined based on the depth data corresponding to the subject.
16. The system of claim 15, wherein the computing device is further configured to: cause the neural renderer to use the pooled features reprojected into the image space with the one or more graphical buffers to generate the output image depicting the subject from the target view.
17. The system of claim 13, wherein the neural network is a convolution neural network.
18. The system of claim 13, wherein the computing device is further configured to: display the output image on a display interface.
19. The system of claim 18, wherein the computing device is further configured to: receive an input specifying a second target view; and responsive to the input, generate a second output image depicting the subject from the second target view such that illumination of the subject in the second output image aligns with the second target view.
20. A non-transitory computer-readable medium configured to store instructions, that when executed by a computing system comprising one or more processors, causes the computing system to perform operations comprising: obtaining, using a camera system and a light stage having a plurality of lights, a plurality of images that depict a subject from a plurality of viewpoints and under a plurality of lighting conditions; obtaining, using a plurality of infrared cameras, depth data corresponding to the subject; based on the depth data corresponding to the subject, extracting, using a neural network, a plurality of features of the subject from the plurality of images; pooling, using the neural network, the plurality of features of the subject into a texture space; reprojecting the pooled features into an image space; providing the pooled features reprojected into the image space with one or more graphical buffers as inputs to a neural rende
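The pooling step recited in the independent claims combines features from several viewpoints into one texture-space representation. The sketch below is only an illustration under stated assumptions: it presumes each view's features have already been warped into a shared texture grid (as the warp fields of claim 11 would allow) and substitutes a visibility-masked mean for the neural-network pooling the claims describe. The names `pool_views`, `FeatureMap`, and `Mask` are hypothetical.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # features indexed as [row][column][channel]
Mask = List[List[bool]]               # True where a view actually sees the texel


def pool_views(features: List[FeatureMap], visible: List[Mask]) -> FeatureMap:
    """Average per-view texture-space features over the views that see each texel.

    Assumption: a masked mean stands in for the learned pooling in the claims.
    """
    height, width = len(features[0]), len(features[0][0])
    channels = len(features[0][0][0])
    pooled: FeatureMap = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect this texel's feature vector from every view that sees it.
            seen = [f[y][x] for f, m in zip(features, visible) if m[y][x]]
            if seen:
                row.append([sum(vec[c] for vec in seen) / len(seen)
                            for c in range(channels)])
            else:
                row.append([0.0] * channels)  # texel occluded in every view
        pooled.append(row)
    return pooled


if __name__ == "__main__":
    view_a = [[[1.0, 2.0], [5.0, 5.0]]]  # 1x2 grid, two channels
    view_b = [[[3.0, 4.0], [9.0, 9.0]]]
    masks = [[[True, False]], [[True, True]]]
    print(pool_views([view_a, view_b], masks))  # [[[2.0, 3.0], [9.0, 9.0]]]
```

The result is independent of the order in which views are supplied, which is one reason a symmetric reduction is a natural placeholder when the number of input views is small and may vary.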
Human being; Person · CPC title
Artificial neural networks [ANN] · CPC title
Varying illumination · CPC title
Infrared image · CPC title
Perspective computation · CPC title