What technology area does this patent fall under?

Primary CPC classification G06V20/41. Mapped technology areas include Physics.

When was this patent published?

Published Nov 25, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Scene understanding and generation using neural networks

US12481893B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12481893-B2
Application number	US-202318164021-A
Country	US
Kind code	B2
Filing date	Feb 3, 2023
Priority date	Nov 4, 2016
Publication date	Nov 25, 2025
Grant date	Nov 25, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for image rendering. In one aspect, a method comprises receiving a plurality of observations characterizing a particular scene, each observation comprising an image of the particular scene and data identifying a location of a camera that captured the image. In another aspect, the method comprises receiving a plurality of observations characterizing a particular video, each observation comprising a video frame from the particular video and data identifying a time stamp of the video frame in the particular video. In yet another aspect, the method comprises receiving a plurality of observations characterizing a particular image, each observation comprising a crop of the particular image and data characterizing the crop of the particular image. The method processes each of the plurality of observations using an observation neural network to determine a numeric representation as output.
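As an illustration of the encode-then-combine step described in the abstract, here is a minimal Python sketch. It is not the patented implementation: `encode_observation` is a hypothetical stand-in (a fixed random linear map followed by tanh) for the trained observation neural network, and the image size, the five-number camera pose, and the representation size are assumptions chosen for the example.

```python
import math
import random

REPR_DIM = 4  # assumed size of each lower dimensional representation


def make_weights(input_dim, output_dim, seed=0):
    """Hypothetical stand-in for trained parameters: fixed random weights."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(input_dim)]
            for _ in range(output_dim)]


def encode_observation(image, camera_pose, weights):
    """Map one (image, camera pose) observation to a short vector."""
    features = list(image) + list(camera_pose)
    return [math.tanh(sum(w * x for w, x in zip(row, features)))
            for row in weights]


def scene_representation(observations, weights):
    """Combine per-observation vectors by element-wise summation."""
    total = [0.0] * len(weights)
    for image, pose in observations:
        encoded = encode_observation(image, pose, weights)
        total = [t + e for t, e in zip(total, encoded)]
    return total


# Two toy observations: a 4-pixel grayscale "image" plus an assumed
# camera pose of (x, y, z, yaw, pitch).
observations = [
    ([0.1, 0.5, 0.9, 0.3], [0.0, 1.0, 2.0, 0.5, -0.1]),
    ([0.7, 0.2, 0.4, 0.8], [1.0, 0.0, 2.0, -0.5, 0.2]),
]
weights = make_weights(input_dim=9, output_dim=REPR_DIM)
representation = scene_representation(observations, weights)
print(representation)
```

Summation is one of the combination options the claims mention. It keeps the output the same size however many observations are supplied, and the result does not depend on the order in which they arrive (up to floating-point rounding).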

First claim

Opening claim text (preview).

The invention claimed is:

1. A method performed by one or more computers for generating a numerical representation of a scene, the method comprising: receiving a plurality of observations characterizing the scene, each observation comprising: (i) an image of the scene, and (ii) data identifying a location of a camera that captured the image; processing each of the plurality of observations using an observation neural network to generate a respective lower dimensional representation of each of the plurality of observations; generating the numerical representation of the scene by combining the respective lower dimensional representations of each of the plurality of observations; and providing the numerical representation of the scene for use in characterizing content of the scene, wherein the observation neural network has been jointly trained with a generator neural network that is configured to: receive data identifying a new camera location; process a network input to the generator neural network, the network input comprising: (i) the data identifying the new camera location, and (ii) the numerical representation of the scene; and generate, by the generator neural network, a network output that comprises a new image of the scene that represents the view of the scene from the camera at the new camera location.

2. The method of claim 1, wherein generating the numerical representation of the scene by combining the lower dimensional representations of the plurality of observations comprises: summing the lower dimensional representations of the plurality of observations.

3. The method of claim 1, wherein generating the numerical representation of the scene by combining the lower dimensional representations of the plurality of observations comprises: processing the lower dimensional representations of the plurality of observations using one or more neural network layers to generate the numerical representation of the scene.

4. The method of claim 3, wherein processing the lower dimensional representations of the plurality of observations using one or more neural network layers to generate the numerical representation of the scene comprises: processing the respective lower dimensional representation corresponding to each of the plurality of observations using a recurrent neural network; wherein numerical representation of the scene is based on a hidden state of the recurrent neural network after the recurrent neural network has processed the respective lower dimensional representation corresponding to each of the plurality of observations.

5. The method of claim 1, wherein for one or more of the plurality of observations, the data identifying the location of the camera that captured the observation defines a three-dimensional position, yaw, and pitch of the camera.

6. The method of claim 1, wherein the observation neural network comprises one or more convolutional neural network layers.

7. The method of claim 1, wherein providing the numerical representation of the scene for use in characterizing the content of the scene comprises: computationally rendering the new image of the scene that represents the view of the scene from the camera at the new camera location using the numerical representation of the scene.

8. The method of claim 7, wherein computationally rendering the new image of the scene that represents a view of the scene from the camera at the new camera location using the numerical representation of the scene comprises: receiving data identifying the new camera location; processing, using the generator neural network, a network input to the generator neural network, the network input comprising: (i) the data identifying the new camera location, and (ii) the numerical representation of the scene; and generating, by the generator neural network and in response to processing the network input, a network output that comprises the new image of the scene that represents the view of the scene from the camera at the new camera location.

9. The method of claim 8, wherein processing the network input using the generator neural network to generate the network output comprises: at each of a plurality of time steps: sampling one or more latent variables for the time step; and updating a hidden state of the generator neural network as of the time step by processing the hidden state, the sampled latent variables, the numerical representation of the scene, and the data identifying the new camera location; and after a last time step in the plurality of time steps: generating the new image of the scene from the updated hidden state of the generator neural network.

10. The method of claim 8, wherein processing the network input using the generator neural network to generate the network output comprises: processing the network input, using the generator neural network, to generate respective pixel sufficient statistics for each pixel in the new image of the scene; and sampling a respective color value for each pixel in the new image of the scene using the pixel sufficient statistics for the pixel.

11. The method of claim 8, wherein the generator neural network and the observation neural network have been trained jointly with a posterior neural network that is configured to, during the training, receive a plurality of training observations and a target observation and generate a posterior output that defines a distribution over one or more latent variables.

12. A method performed by one or more computers for generating a numerical representation of a video, the method comprising: receiving a plurality of observations characterizing the video, each observation comprising: (i) a video frame of the video, and (ii) data identifying a time stamp of the video frame of the video; processing each of the plurality of observations using an observation neural network to generate a respective lower dimensional representation of each of the plurality of observations; generating the numerical representation of the video by combining the respective lower dimensional representations of each of the plurality of observations; and providing the numerical representation of the video for use in characterizing the video, wherein the observation neural network has been jointly trained with a generator neural network that is configured to: receive data identifying a new time stamp; process a network input to the generator neural network, the network input comprising: (i) the data identifying the new time stamp, and (ii) the numerical representation of the video; and generate, by the generator neural network, a network output that comprises a new video frame at the new time stamp.

13. The method of claim 12, wherein generating the numerical representation of the video by combining the lower dimensional representations of the plurality of observations comprises: summing the lower dimensional representations of the plurality of observations.

14. The method of claim 12, wherein generating the numerical representation of the video by combining the lower dimensional representations of the plurality of observations comprises: processing the lower dimensional representations of the plurality of observations using one or more neural network layers to generate the numerical representation of the video.

15. The method of claim 14, wherein processing the lower dimensional representations of the plurality of observations using one or more neural network layers to generate the numerical representation of the video comprises: processing the respective lower dimension
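Claim 10 describes generating per-pixel sufficient statistics and then sampling a color value for each pixel. The sampling step might be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: it assumes the statistics are the mean and standard deviation of a Gaussian per pixel (the claim text does not name a distribution), uses grayscale values in [0, 1], and hard-codes hypothetical statistics where a generator neural network would supply them.

```python
import random


def sample_image(pixel_stats, rng):
    """Draw one value per pixel from an assumed Gaussian, clamped to [0, 1]."""
    image = []
    for row in pixel_stats:
        sampled_row = []
        for mean, std in row:
            value = rng.gauss(mean, std)
            sampled_row.append(min(1.0, max(0.0, value)))
        image.append(sampled_row)
    return image


# Hypothetical 2x2 grayscale statistics as (mean, std) pairs; in the claimed
# method these would come from the generator neural network.
pixel_stats = [
    [(0.2, 0.05), (0.8, 0.05)],
    [(0.5, 0.10), (0.9, 0.02)],
]
rng = random.Random(0)
new_image = sample_image(pixel_stats, rng)
print(new_image)
```

Sampling from per-pixel statistics, rather than emitting pixel values directly, lets the same generator output yield different plausible images on repeated draws.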

Assignees

GDM Holding LLC

Inventors

Classifications

G06N3/044
Recurrent networks, e.g. Hopfield networks · CPC title
G06V10/82
using neural networks · CPC title
G06F18/214
Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
G06V20/41 · Primary
Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items (segmenting video sequences G06V20/49) · CPC title
G06V20/00
Scenes; Scene-specific elements (control of digital cameras H04N23/60) · CPC title

Patent family

Related publications grouped by family.

View patent family 60543605

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12481893B2 cover?: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for image rendering. In one aspect, a method comprises receiving a plurality of observations characterizing a particular scene, each observation comprising an image of the particular scene and data identifying a location of a camera that captured the image. In another aspect, the method comprises …
Who is the assignee on this patent?: GDM Holding LLC