Generative adversarial neural network assisted video reconstruction

US11580395B2 · US · B2

Patent metadata
Publication number: US-11580395-B2
Application number: US-202017069449-A
Country: US
Kind code: B2
Filing date: Oct 13, 2020
Priority date: Nov 14, 2018
Publication date: Feb 14, 2023
Grant date: Feb 14, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A latent code defined in an input space is processed by the mapping neural network to produce an intermediate latent code defined in an intermediate latent space. The intermediate latent code may be used as appearance vector that is processed by the synthesis neural network to generate an image. The appearance vector is a compressed encoding of data, such as video frames including a person's face, audio, and other data. Captured images may be converted into appearance vectors at a local device and transmitted to a remote device using much less bandwidth compared with transmitting the captured images. A synthesis neural network at the remote device reconstructs the images for display.
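The bandwidth claim in the abstract can be made concrete with a rough back-of-the-envelope calculation. The sketch below is illustrative only: the 1280x720 RGB frame size, the 512-value vector length, and the 4-byte float encoding are assumptions chosen for the example and do not come from the patent, and the comparison is against an uncompressed frame rather than a conventionally encoded video stream.

```python
# Rough payload comparison: one raw captured frame vs. one appearance vector.
# All sizes below are hypothetical values chosen for illustration.

def raw_frame_bytes(width, height, channels=3, bytes_per_channel=1):
    """Size in bytes of one uncompressed frame."""
    return width * height * channels * bytes_per_channel


def vector_bytes(dims, bytes_per_value=4):
    """Size in bytes of one appearance vector stored as fixed-width values."""
    return dims * bytes_per_value


frame_size = raw_frame_bytes(1280, 720)   # assumed 720p RGB, 8 bits per channel
vector_size = vector_bytes(512)           # assumed 512 float32 values
ratio = frame_size / vector_size

print(f"raw frame: {frame_size} bytes")
print(f"appearance vector: {vector_size} bytes")
print(f"ratio: {ratio:.0f}x")
```

Under these assumptions the vector is roughly three orders of magnitude smaller than the raw frame. A real system would be compared against a compressed video stream, so the practical saving would be smaller than this figure.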

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method, comprising: obtaining data for replicating a style specific to a subject, wherein the data is determined by training a generator neural network to produce images of the subject that are compared with captured images of the subject; configuring a neural network to apply the data to a sequence of vectors generated for a human face captured in a first sequence of images to modify at least one attribute according to the style specific to the subject; receiving, through a communication network, at least a first vector in the sequence of vectors, wherein the first vector encodes attributes of the human face captured in a first image in the first sequence of images; and processing, by the neural network, the sequence of vectors to reconstruct a second sequence of images of the human face including the at least one attribute that is modified based on the style.
2. The computer-implemented method of claim 1, wherein the first vector is a compressed encoding of the human face.
3. The computer-implemented method of claim 1, wherein the first image in the first sequence of images is a frame of video and further comprising receiving vector adjustment values for each additional frame of the video.
4. The computer-implemented method of claim 3, further comprising successively applying each vector adjustment value to the first vector to reconstruct additional images of the human face including the at least one attribute that is modified based on the style.
5. The computer-implemented method of claim 1, wherein the attributes comprise head pose and facial expression.
6. The computer-implemented method of claim 1, wherein the first vector encodes at least one additional attribute associated with clothing, hairstyle, or lighting.
7. The computer-implemented method of claim 1, further comprising displaying the second sequence of images of the human face in a viewing environment, wherein the neural network reconstructs the second sequence of images according to lighting in the viewing environment instead of different lighting associated with the first sequence of images and that is encoded in the first vector.
8. The computer-implemented method of claim 1, further comprising receiving encoded background image data that is combined with the second sequence of images of the human face.
9. The computer-implemented method of claim 1, wherein the first vector comprises an abstract latent code.
10. The computer-implemented method of claim 9, wherein the abstract latent code is computed by a remote mapping neural network and transmitted to the neural network through the communication network.
11. The computer-implemented method of claim 9, wherein the abstract latent code is computed by transforming facial landmark points that delineate positions of key points on the human face according to a learned or optimized matrix.
12. The computer-implemented method of claim 1, wherein the first vector is transmitted to the neural network during a videoconferencing session.
13. The computer-implemented method of claim 1, wherein the subject is a real or synthetic character and the human face is a different human compared with the subject.
14. The computer-implemented method of claim 1, wherein the subject is a real or synthetic character and the human face corresponds to the subject.
15. The computer-implemented method of claim 1, further comprising interpolating a third vector and a second vector corresponding to two frames in a video to produce the first vector in the sequence of vectors, wherein the first image is between the two frames.
16. The computer-implemented method of claim 1, further comprising receiving audio data, wherein the audio data is used to reconstruct the second sequence of images of the human face.
17. The computer-implemented method of claim 1, wherein the first vector comprises a first portion corresponding to a first frame in a video and a second portion corresponding to a second frame in the video, wherein the human face is more blurry in the first frame compared to the second frame.
18. The computer-implemented method of claim 17, wherein the processing combines the first portion and the second portion to reconstruct an image in the second sequence of images with the human face by using the first portion to control coarse scale styles and the second portion to control fine scale styles.
19. The computer-implemented method of claim 1, wherein the human face captured in the first image is blurry and the processing reconstructs an image in the second sequence of images with the human face by using the first vector to control coarse scale styles and the data to control fine scale styles.
20. The computer-implemented method of claim 1, further comprising displaying the second sequence of images of the human face in a viewing environment, wherein the neural network reconstructs the second sequence of images according to a gaze location within each image in the second sequence of images that is intersected by a gaze direction of a viewer observing the image as sensed in the viewing environment.
21. The computer-implemented method of claim 20, wherein a gaze direction of the human face in each image in the second sequence of images is modified to appear directed towards the gaze location.
22. The computer-implemented method of claim 1, wherein the first vector includes a gaze location corresponding to an image viewed by the human face and a gaze direction of a reconstructed image of the human face in a viewing environment is towards the image that is also reconstructed and displayed in the viewing environment.
23. The computer-implemented method of claim 1, wherein the steps of obtaining, receiving, and processing are performed on a virtual machine comprising a portion of a graphics processing unit.
24. The computer-implemented method of claim 1, wherein the first sequence of images or the second sequence of images is used for training, testing, or certifying a neural network employed in a machine, robot, or autonomous vehicle.
25. A system, comprising a processor configured to: obtain data for replicating a style specific to a subject, wherein the data is determined by training a generator neural network to produce images of the subject that are compared with captured images of the subject; and implement a neural network that is configured to: apply the data to a sequence of vectors generated for a human face captured in a first sequence of images to modify at least one attribute according to the style specific to the subject; receive, through a communication network, at least a first vector in the sequence of vectors, wherein the first vector encodes attributes of the human face captured in a first image in the first sequence of images; and process the sequence of vectors to reconstruct a second sequence of images of the human face including the at least one attribute that is modified based on the style.
26. A non-transitory, computer-readable storage medium storing instructions that, when executed by a processing unit, cause the processing unit to: obtain data for replicating a style specific to a subject, wherein the data is determined by training a generator neural network to produce images of the subject that are compared with captured images of the subject; and implement a neural network that is configured to: apply the data to multiple vectors to mo
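
Several dependent claims describe simple operations on the transmitted vectors: successively applying per-frame adjustment values (claims 3 and 4), interpolating between the vectors of two frames (claim 15), and combining a coarse-scale portion with a fine-scale portion (claims 17 and 18). A minimal sketch of those operations follows. It is not the patented implementation: the function names are hypothetical, the adjustments are assumed to be additive deltas, the interpolation is assumed to be linear, and the vector is assumed to be ordered from coarse to fine entries. The claims do not specify any of these details.

```python
# Illustrative vector operations loosely following claims 3-4, 15, and 17-18.
# Function names, additive deltas, linear interpolation, and the coarse-to-fine
# ordering are all assumptions made for this sketch.

def apply_adjustments(first_vector, adjustments):
    """Rebuild one vector per frame by accumulating deltas onto the first vector."""
    vectors = [list(first_vector)]
    for delta in adjustments:
        previous = vectors[-1]
        vectors.append([p + d for p, d in zip(previous, delta)])
    return vectors


def interpolate(second_vector, third_vector, t=0.5):
    """Linearly blend two frame vectors to estimate a frame between them."""
    return [(1 - t) * a + t * b for a, b in zip(second_vector, third_vector)]


def mix_styles(coarse_source, fine_source, split):
    """Take entries before `split` from one vector and the rest from another."""
    return list(coarse_source[:split]) + list(fine_source[split:])


# Hypothetical two-dimensional vectors, kept tiny for readability.
frames = apply_adjustments([1.0, 2.0], [[0.5, 0.0], [0.5, -1.0]])
middle = interpolate([0.0, 4.0], [2.0, 0.0])
mixed = mix_styles([1, 1, 1, 1], [9, 9, 9, 9], split=2)

print(frames)  # one vector per reconstructed frame
print(middle)  # estimated in-between vector
print(mixed)   # coarse entries from the first source, fine from the second
```

In a real system each resulting vector would be passed to the synthesis neural network to reconstruct an image; that step is omitted here because the page does not describe the network's interface.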

Assignees

Nvidia Corp

Inventors

Classifications

  • Auto-encoder networks; Encoder-decoder networks

  • Generative networks

  • Convolutional networks [CNN, ConvNet]

  • Adversarial learning

  • Supervised learning

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11580395B2 cover?
A latent code defined in an input space is processed by the mapping neural network to produce an intermediate latent code defined in an intermediate latent space. The intermediate latent code may be used as appearance vector that is processed by the synthesis neural network to generate an image. The appearance vector is a compressed encoding of data, such as video frames including a person's fa…
Who is the assignee on this patent?
Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification G06N3/088. Mapped technology areas include Physics.
When was this patent published?
Publication date: Feb 14, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).