Scene understanding and generation using neural networks

US11587344B2 · US · B2

Patent metadata
Publication number: US-11587344-B2
Application number: US-201916403278-A
Country: US
Kind code: B2
Filing date: May 3, 2019
Priority date: Nov 4, 2016
Publication date: Feb 21, 2023
Grant date: Feb 21, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for image rendering. In one aspect, a method comprises receiving a plurality of observations characterizing a particular scene, each observation comprising an image of the particular scene and data identifying a location of a camera that captured the image. In another aspect, the method comprises receiving a plurality of observations characterizing a particular video, each observation comprising a video frame from the particular video and data identifying a time stamp of the video frame in the particular video. In yet another aspect, the method comprises receiving a plurality of observations characterizing a particular image, each observation comprising a crop of the particular image and data characterizing the crop of the particular image. The method processes each of the plurality of observations using an observation neural network to determine a numeric representation as output.
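All three aspects share one pipeline: encode each observation, then combine the encodings into a single numeric representation. It can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patented implementation. `ObservationNet` is a hypothetical one-layer linear-plus-tanh encoder that stands in for the observation neural network. Images are flattened lists of pixel values. The conditioning data (camera location, time stamp, or crop location and size) is simply concatenated to the pixels. Combining by element-wise summation follows dependent claims 4, 11, and 18.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
# Hypothetical observation format: (flattened pixels, conditioning data).
# The conditioning data is a camera location (scene aspect), a time stamp
# (video aspect), or a crop location and size (image aspect).
Observation = Tuple[Sequence[float], Sequence[float]]


class ObservationNet:
    """Hypothetical stand-in for the observation neural network: one linear
    layer plus tanh that maps pixels + conditioning data to a
    lower-dimensional vector (a real encoder would be a deep network)."""

    def __init__(self, in_dim: int, out_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(in_dim)
        self.in_dim = in_dim
        self.weights = [[rng.gauss(0.0, scale) for _ in range(in_dim)]
                        for _ in range(out_dim)]

    def __call__(self, obs: Observation) -> Vector:
        pixels, condition = obs
        x = list(pixels) + list(condition)  # concatenate image and conditioning data
        if len(x) != self.in_dim:
            raise ValueError(f"expected {self.in_dim} inputs, got {len(x)}")
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)))
                for row in self.weights]


def scene_representation(net: ObservationNet,
                         observations: Sequence[Observation]) -> Vector:
    """Encode every observation, then combine by element-wise summation."""
    if not observations:
        raise ValueError("at least one observation is required")
    encoded = [net(obs) for obs in observations]
    return [sum(column) for column in zip(*encoded)]


# Usage: three observations of 16 "pixels" each, paired with a
# hypothetical 5-value camera location.
rng = random.Random(1)
observations = [([rng.random() for _ in range(16)],
                 [rng.uniform(-1.0, 1.0) for _ in range(5)])
                for _ in range(3)]
net = ObservationNet(in_dim=16 + 5, out_dim=8)
r = scene_representation(net, observations)
```

Summation is commutative, so the representation does not depend on the order in which observations are supplied (up to floating-point rounding). The same code also accepts any number of observations.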

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer implemented method comprising: receiving a plurality of observations characterizing a particular scene, each observation comprising: (i) an image of the particular scene, and (ii) data identifying a location of a camera that captured the image; processing each of the plurality of observations using an observation neural network, wherein the observation neural network is configured to, for each of the observations: process the observation to generate as output a lower-dimensional representation of the observation; determining a numeric representation of the particular scene by combining the lower-dimension representations of the observations; providing the numeric representation of the particular scene for use in characterizing contents of the particular scene; receiving data identifying a new camera location; and processing: (i) the data identifying the new camera location, and (ii) the numeric representation of the particular scene, using a generator neural network to generate a new image of the particular scene taken from a camera at the new camera location.

2. The method of claim 1, wherein the numeric representation is a collection of numeric values that represents underlying contents of the particular scene.

3. The method of claim 1, wherein the numeric representation is a semantic description of the particular scene.

4. The method of claim 1, wherein combining the lower-dimension representations of the observations comprises: summing the lower-dimension representations to generate the numeric representation.

5. The method of claim 1, wherein the generator neural network is configured to: at each of a plurality of time steps: sample one or more latent variables for the time step, and update a hidden state as of the time step by processing the hidden state, the sampled latent variables, the numeric representation, and the data identifying the new camera location using a deep convolutional neural network to generate an updated hidden state; and after a last time step in the plurality of time steps: generate the new image of the particular scene from the updated hidden state after the last time step.

6. The method of claim 5, wherein generating the new image of the particular scene from the updated hidden state after the last time step comprises: generating pixel sufficient statistics from the updated hidden state after the last time step; and sampling color values of pixels in the new image using the pixel sufficient statistics.

7. The method of claim 1, wherein the observation neural network has been trained to generate numeric representations that, in combination with a particular camera location, is usable by a generator neural network to generate a reconstruction of a particular image of the particular scene taken from the particular camera location.

8. A computer implemented method comprising: receiving a plurality of observations characterizing a particular video, each observation comprising: (i) a video frame from the particular video and, (ii) data identifying a time stamp of the video frame in the particular video; processing each of the plurality of observations using an observation neural network, wherein the observation neural network is configured to, for each of the observations: process the observation to generate as output a lower-dimensional representation of the observation; determining a numeric representation of the particular video by combining the lower-dimension representations of the observations; providing the numeric representation of the particular video for use in characterizing contents of the particular video; receiving data identifying a new time stamp; and processing: (i) the data identifying the new time stamp, and (ii) the numeric representation of the particular video, using a generator neural network to generate a new video frame at the new time stamp in the particular video.

9. The method of claim 8, wherein the numeric representation is a collection of numeric values that represents underlying contents of the particular video.

10. The method of claim 8, wherein the numeric representation is a semantic description of the particular video.

11. The method of claim 8, wherein combining the lower-dimension representations of the observations comprises: summing the lower-dimension representations to generate the numeric representation.

12. The method of claim 8, wherein the generator neural network is configured to: at each of a plurality of time steps: sample one or more latent variables for the time step, and update a hidden state as of the time step by processing the hidden state, the sampled latent variables, the numeric representation, and the data identifying the new time stamp using a deep convolutional neural network to generate an updated hidden state; and after a last time step in the plurality of time steps: generate the new video frame from the updated hidden state after the last time step.

13. The method of claim 12, wherein generating the new video frame comprises: generating pixel sufficient statistics from the updated hidden state after the last time step; and sampling color values of pixels in the new video frame using the pixel sufficient statistics.

14. The method of claim 8, wherein the observation neural network has been trained to generate numeric representations that, in combination with a particular time stamp, is usable by a generator neural network to generate a reconstruction of a particular video frame from the particular video at the particular time stamp.

15. A computer implemented method comprising: receiving a plurality of observations characterizing a particular image, each observation comprising: (i) a crop of the particular image, and (ii) data identifying a location and size of the crop in the particular image; processing each of the plurality of observations using an observation neural network, wherein the observation neural network is configured to, for each of the observations: process the observation to generate as output a lower-dimensional representation of the observation; determining a numeric representation of the particular image by combining the lower-dimension representations of the observations; providing the numeric representation of the particular image for use in characterizing contents of the particular image; receiving data identifying a new crop location and a new crop size; and processing: (i) the data identifying the new crop location and the new crop size, and (ii) the numeric representation of the particular image, using a generator neural network to generate a new crop of the particular image at the new crop location and having the new crop size.

16. The method of claim 15, wherein the numeric representation is a collection of numeric values that represents underlying contents of the particular image.

17. The method of claim 15, wherein the numeric representation is a semantic description of the particular image.

18. The method of claim 15, wherein combining the lower-dimension representations of the observations comprises: summing the lower-dimension representations to generate the numeric representation.

19. The method of claim 15, wherein the generator neural network is configured to: at each of a plurality of time steps: sample one or more latent variables for the time step, and update a hidden state as of the time step by processing the hidden state, the sampled latent variables, the numeric representation, and the data ide…

Assignees

Deepmind Tech Ltd

Inventors

Classifications

  • Recurrent networks, e.g. Hopfield networks · CPC title

  • Weakly supervised learning, e.g. semi-supervised or self-supervised learning · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Auto-encoder networks; Encoder-decoder networks · CPC title

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11587344B2 cover?
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for image rendering. In one aspect, a method comprises receiving a plurality of observations characterizing a particular scene, each observation comprising an image of the particular scene and data identifying a location of a camera that captured the image. In another aspect, the method comprises …
Who is the assignee on this patent?
Deepmind Tech Ltd
What technology area does this patent fall under?
Primary CPC classification G06V20/41. Mapped technology areas include Physics.
When was this patent published?
It was published on Feb 21, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).