What technology area does this patent fall under?

Primary CPC classification: H04N13/111. Mapped technology areas include Electricity.

When was this patent published?

Publication date: Jan 13, 2022 (A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Enhancing performance capture with real-time neural rendering

US2022014723A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2022014723-A1
Application number	US-201917309440-A
Country	US
Kind code	A1
Filing date	Dec 2, 2019
Priority date	Dec 3, 2018
Publication date	Jan 13, 2022
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Three-dimensional (3D) performance capture and machine learning can be used to re-render high quality novel viewpoints of a captured scene. A textured 3D reconstruction is first rendered to a novel viewpoint. Due to imperfections in geometry and low-resolution texture, the 2D rendered image contains artifacts and is low quality. Accordingly, a deep learning technique is disclosed that takes these images as input and generates more visually enhanced re-rendering. The system is specifically designed for VR and AR headsets, and accounts for consistency between two stereo views.
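The abstract mentions consistency between the two stereo views, and claim 10 refers to a stereo loss on the predicted stereo pair, but this page does not say how that difference is computed. As a rough illustration only, the sketch below assumes rectified views and a known integer horizontal disparity per pixel, warps the right view into the left view, and averages the absolute difference. The names `warp_right_to_left` and `stereo_loss`, the disparity input, and the plain nested lists standing in for images are all assumptions made for this example, not the patent's method.

```python
from typing import List, Optional

Image = List[List[float]]  # rows of pixel intensities in [0, 1]


def warp_right_to_left(right: Image,
                       disparity: List[List[int]]) -> List[List[Optional[float]]]:
    """Resample the right view on the left view's pixel grid.

    Assumes rectified stereo: a point at column x in the left view
    appears at column x - d in the right view (d is the disparity).
    Pixels whose source column falls outside the image become None.
    """
    warped = []
    for y, disp_row in enumerate(disparity):
        width = len(right[y])
        out_row: List[Optional[float]] = []
        for x, d in enumerate(disp_row):
            src = x - d
            out_row.append(right[y][src] if 0 <= src < width else None)
        warped.append(out_row)
    return warped


def stereo_loss(left: Image, right: Image, disparity: List[List[int]]) -> float:
    """Mean absolute difference between the left view and the warped right view.

    Only pixels with a valid warped sample contribute (hypothetical choice).
    """
    warped = warp_right_to_left(right, disparity)
    diffs = [
        abs(l - w)
        for left_row, warped_row in zip(left, warped)
        for l, w in zip(left_row, warped_row)
        if w is not None
    ]
    return sum(diffs) / len(diffs) if diffs else 0.0


if __name__ == "__main__":
    left = [[0.0, 0.5, 1.0]]
    right = [[0.5, 0.5, 0.25]]
    disparity = [[1, 1, 1]]
    print(stereo_loss(left, right, disparity))
```

A lower value means the two predicted views agree more closely once the viewpoint shift is accounted for; a trained network would see this term alongside the other losses listed in the claims.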

First claim

Opening claim text (preview).

1. A method for re-rendering an image rendered using a volumetric reconstruction to improve its quality, comprising: receiving the image rendered using the volumetric reconstruction, the image having imperfections; defining a synthesizing function and a segmentation mask to generate an enhanced image from the image, the enhanced image having fewer imperfections than the image; and computing the synthesizing function and the segmentation mask using a neural network trained based on minimizing a loss function between a predicted image generated by the neural network and a ground truth image captured by a ground truth camera during training.
2. The method according to claim 1, wherein the method further includes prior to receiving the image rendered using the volumetric reconstruction: capturing a 3D model using a volumetric capture system; and rendering the image using the volumetric reconstruction.
3. The method according to claim 2, wherein the ground truth camera and the volumetric capture system are both directed to a view during training, the ground truth camera producing higher quality images than the volumetric capture system.
4. The method according to claim 1, wherein the loss function includes a reconstruction loss based on a reconstruction difference between a segmented ground truth image mapped to activations of layers in a neural network and a segmented predicted image mapped to activations of layers in a neural network, the segmented ground truth image segmented by a ground truth segmentation mask to remove background pixels and the segmented predicted image segmented by a predicted segmentation mask to remove background pixels.
5. The method according to claim 1, wherein the loss function includes a head reconstruction loss based on a reconstruction difference between a cropped ground truth image mapped to activations of layers in a neural network and a cropped predicted image mapped to activations of layers in a neural network, the cropped ground truth image cropped to a head of a person identified in a ground truth segmentation mask and the cropped predicted image cropped to the head of the person identified in a predicted segmentation mask.
6. The method according to claim 4, wherein the reconstruction difference is saliency re-weighted to down-weight reconstruction differences for pixels above a maximum error or below a minimum error.
7. The method according to claim 1, wherein the loss function includes a mask loss based on a mask difference between a ground truth segmentation mask and a predicted segmentation mask.
8. The method according to claim 7, wherein the mask difference is saliency re-weighted to down-weight reconstruction differences for pixels above a maximum error or below a minimum error.
9. The method according to claim 1, wherein: the predicted image is one of a series of consecutive frames of a predicted sequence and the ground truth image is one of a series of consecutive frames of a ground truth sequence; and wherein: the loss function includes a temporal loss based on a gradient difference between a temporal gradient of the predicted sequence and a temporal gradient of the ground truth sequence.
10. The method according to claim 1, wherein the predicted image is one of a predicted stereo pair of images and the loss function includes a stereo loss based on a stereo difference between the predicted stereo pair of images.
11. The method according to claim 1, wherein the neural network is based on a fully convolutional model.
12. The method according to claim 1, wherein the computing the synthesizing function and segmentation mask using a neural network comprises: computing the synthesizing function and segmentation mask for a left eye viewpoint; and computing the synthesizing function and segmentation mask for a right eye viewpoint.
13. The method according to claim 1, wherein the computing the synthesizing function and segmentation mask using a neural network is performed in real time.
14. A performance capture system comprising: a volumetric capture system configured to render at least one image reconstructed from at least one viewpoint of a captured 3D model, the at least one image including imperfections; a rendering system configured to receive the at least one image from the volumetric capture system and to generate, in real time, at least one enhanced image in which the imperfections of the at least one image are reduced, the rendering system including a neural network configured to generate the at least one enhanced image by training prior to use, the training including minimizing a loss function between predicted images generated by the neural network during training and corresponding ground truth images captured by at least one ground truth camera coordinated with the volumetric capture system during training.
15. The performance capture system according to claim 14, wherein the at least one ground truth camera is included in the performance capture system during training and otherwise not included in the performance capture system.
16. The performance capture system according to claim 14, wherein the volumetric capture system includes a single active stereo camera directed to a single view and, during training, includes a single ground truth camera directed to the single view.
17. The performance capture system according to claim 14, wherein the volumetric capture system includes a plurality of active stereo cameras directed to multiple views and, during training, includes a plurality of ground truth cameras directed to the multiple views.
18. The performance capture system according to claim 14, wherein the performance capture system includes a stereo display configured to display one of the at least one enhanced image as a left eye view and one of the at least one enhanced image as a right eye view.
19. The performance capture system according to claim 18, wherein the performance capture system is a virtual reality (VR) headset.
20. The performance capture system according to claim 18, wherein the stereo display is included in an augmented reality (AR) headset.
21. The performance capture system according to claim 18, wherein the stereo display is a head-tracked auto-stereo display.
22. A non-transitory computer readable storage medium containing program code that when executed by a processor of a computing device causes the computing device to perform a method for re-rendering an image rendered using a volumetric reconstruction to improve its quality, the method including: receiving the image rendered using the volumetric reconstruction, the image having imperfections; defining a synthesizing function and a segmentation mask to generate an enhanced image from the image, the enhanced image having fewer imperfections than the image; and computing the synthesizing function and the segmentation mask using a neural network trained based on minimizing a loss function between a predicted image generated by the neural network and a ground truth image captured by a ground truth camera during training.
23. The non-transitory computer readable storage medium containing program code that when executed by a processor of a computing device causes the computing device to perform a method for re-rendering an image rendered using a volumetric reconstruction to improve its quality according to claim 22, wherein the loss function includes a reconstructi…
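Claims 4, 6, 7, and 9 describe several loss terms in words only. The sketch below is one hedged reading of them, not the patent's implementation: it compares pixel intensities directly (the claims compare activations of neural network layers), it treats saliency re-weighting as setting out-of-range residuals to zero (the claims only say such pixels are down-weighted), and it uses the difference between consecutive frames as the temporal gradient. All function names, the default thresholds, and the nested-list image representation are assumptions introduced for illustration.

```python
from typing import List, Sequence

Image = List[List[float]]  # rows of pixel intensities in [0, 1]


def diff(a: Image, b: Image) -> Image:
    """Signed per-pixel difference a - b."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def abs_diff(a: Image, b: Image) -> Image:
    """Absolute per-pixel difference |a - b|."""
    return [[abs(v) for v in row] for row in diff(a, b)]


def apply_mask(image: Image, mask: Image) -> Image:
    """Multiply by a segmentation mask so background pixels become 0."""
    return [[p * m for p, m in zip(ri, rm)] for ri, rm in zip(image, mask)]


def mean(image: Image) -> float:
    values = [v for row in image for v in row]
    return sum(values) / len(values)


def saliency_reweight(residual: Image, min_err: float, max_err: float) -> Image:
    """Assumed form of re-weighting: drop residuals outside [min_err, max_err]."""
    return [[r if min_err <= r <= max_err else 0.0 for r in row]
            for row in residual]


def reconstruction_loss(pred: Image, pred_mask: Image,
                        truth: Image, truth_mask: Image,
                        min_err: float = 0.0, max_err: float = 1.0) -> float:
    """Masked, re-weighted reconstruction difference (pixel-space stand-in)."""
    residual = abs_diff(apply_mask(pred, pred_mask),
                        apply_mask(truth, truth_mask))
    return mean(saliency_reweight(residual, min_err, max_err))


def mask_loss(pred_mask: Image, truth_mask: Image) -> float:
    """Mean absolute difference between predicted and ground truth masks."""
    return mean(abs_diff(pred_mask, truth_mask))


def temporal_loss(pred_seq: Sequence[Image], truth_seq: Sequence[Image]) -> float:
    """Compare frame-to-frame changes of the predicted and ground truth sequences."""
    if len(pred_seq) < 2 or len(pred_seq) != len(truth_seq):
        raise ValueError("need two or more frames and equal-length sequences")
    total = 0.0
    for t in range(1, len(pred_seq)):
        pred_grad = diff(pred_seq[t], pred_seq[t - 1])
        truth_grad = diff(truth_seq[t], truth_seq[t - 1])
        total += mean(abs_diff(pred_grad, truth_grad))
    return total / (len(pred_seq) - 1)


if __name__ == "__main__":
    pred = [[0.5, 1.0], [0.0, 0.25]]
    truth = [[0.5, 0.5], [0.0, 0.25]]
    pred_mask = [[1.0, 1.0], [0.0, 1.0]]
    truth_mask = [[1.0, 1.0], [1.0, 1.0]]
    print(reconstruction_loss(pred, pred_mask, truth, truth_mask))
    print(mask_loss(pred_mask, truth_mask))
```

In training, terms like these would typically be summed with per-term weights to form the single loss function that claim 1 says is minimized; the weights are not given on this page.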

Assignees

Google LLC

Inventors

Classifications

H04N13/332
Displays for viewing with the aid of special glasses or head-mounted displays [HMD] · CPC title
G06T2207/10028
Range image; Depth image; 3D point clouds · CPC title
G06T2207/30196
Human being; Person · CPC title
H04N13/243
using three or more two-dimensional [2D] image sensors · CPC title
G06V10/462
Salient features, e.g. scale invariant feature transforms [SIFT] · CPC title

Patent family

Related publications grouped by family.

View patent family 68966095

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2022014723A1 cover?: Three-dimensional (3D) performance capture and machine learning can be used to re-render high quality novel viewpoints of a captured scene. A textured 3D reconstruction is first rendered to a novel viewpoint. Due to imperfections in geometry and low-resolution texture, the 2D rendered image contains artifacts and is low quality. Accordingly, a deep learning technique is disclosed that takes the…
Who is the assignee on this patent?: Google LLC

Related patents

Method and system for using machine-learning for object instance segmentation

Compression of multi-dimensional object representations

Fusing, texturing, and rendering views of dynamic three-dimensional models

View interpolation of multi-camera array images with flow estimation and image super resolution using deep learning
