US12481893B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12481893-B2 |
| Application number | US-202318164021-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 3, 2023 |
| Priority date | Nov 4, 2016 |
| Publication date | Nov 25, 2025 |
| Grant date | Nov 25, 2025 |
Official abstract text for this publication.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for image rendering. In one aspect, a method comprises receiving a plurality of observations characterizing a particular scene, each observation comprising an image of the particular scene and data identifying a location of a camera that captured the image. In another aspect, the method comprises receiving a plurality of observations characterizing a particular video, each observation comprising a video frame from the particular video and data identifying a time stamp of the video frame in the particular video. In yet another aspect, the method comprises receiving a plurality of observations characterizing a particular image, each observation comprising a crop of the particular image and data characterizing the crop of the particular image. The method processes each of the plurality of observations using an observation neural network to determine a numeric representation as output.
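The encode-then-combine step described in the abstract can be illustrated with a minimal sketch. It assumes element-wise summation as the combining step (the option recited in claim 2) and a camera location given as position, yaw, and pitch (as in claim 5). The names `encode_observation` and `scene_representation`, the fixed sine-based projection, and the representation size `dim` are hypothetical placeholders for illustration only; they are not the trained observation neural network of the publication.

```python
import math
from typing import List, Sequence, Tuple

# One observation: (flattened image pixels, camera location as x, y, z, yaw, pitch).
Observation = Tuple[Sequence[float], Sequence[float]]


def encode_observation(image: Sequence[float], camera: Sequence[float], dim: int = 4) -> List[float]:
    """Hypothetical stand-in for the observation neural network.

    Uses a fixed, untrained projection followed by tanh. A real system would
    use trained layers instead of these placeholder weights.
    """
    features = list(image) + list(camera)
    code = []
    for k in range(dim):
        # Placeholder weights sin((k + 1) * (i + 1)); an assumption, not from the patent.
        total = sum(math.sin((k + 1) * (i + 1)) * x for i, x in enumerate(features))
        code.append(math.tanh(total / len(features)))
    return code


def scene_representation(observations: Sequence[Observation], dim: int = 4) -> List[float]:
    """Combine the per-observation codes by element-wise summation."""
    representation = [0.0] * dim
    for image, camera in observations:
        code = encode_observation(image, camera, dim)
        representation = [r + c for r, c in zip(representation, code)]
    return representation


observations = [
    ([0.1, 0.5, 0.9], [0.0, 1.0, 2.0, 0.3, -0.2]),
    ([0.4, 0.4, 0.2], [1.0, 0.0, 2.0, 1.2, 0.1]),
    ([0.7, 0.3, 0.6], [-1.0, 0.5, 2.0, 2.8, 0.0]),
]
print(scene_representation(observations))
```

With summation, the result does not depend on the order in which observations arrive (up to floating-point rounding), and its size stays fixed however many observations are supplied.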
Opening claim text (preview).
The invention claimed is:
1. A method performed by one or more computers for generating a numerical representation of a scene, the method comprising: receiving a plurality of observations characterizing the scene, each observation comprising: (i) an image of the scene, and (ii) data identifying a location of a camera that captured the image; processing each of the plurality of observations using an observation neural network to generate a respective lower dimensional representation of each of the plurality of observations; generating the numerical representation of the scene by combining the respective lower dimensional representations of each of the plurality of observations; and providing the numerical representation of the scene for use in characterizing content of the scene, wherein the observation neural network has been jointly trained with a generator neural network that is configured to: receive data identifying a new camera location; process a network input to the generator neural network, the network input comprising: (i) the data identifying the new camera location, and (ii) the numerical representation of the scene; and generate, by the generator neural network, a network output that comprises a new image of the scene that represents the view of the scene from the camera at the new camera location.
2. The method of claim 1, wherein generating the numerical representation of the scene by combining the lower dimensional representations of the plurality of observations comprises: summing the lower dimensional representations of the plurality of observations.
3. The method of claim 1, wherein generating the numerical representation of the scene by combining the lower dimensional representations of the plurality of observations comprises: processing the lower dimensional representations of the plurality of observations using one or more neural network layers to generate the numerical representation of the scene.
4. The method of claim 3, wherein processing the lower dimensional representations of the plurality of observations using one or more neural network layers to generate the numerical representation of the scene comprises: processing the respective lower dimensional representation corresponding to each of the plurality of observations using a recurrent neural network; wherein numerical representation of the scene is based on a hidden state of the recurrent neural network after the recurrent neural network has processed the respective lower dimensional representation corresponding to each of the plurality of observations.
5. The method of claim 1, wherein for one or more of the plurality of observations, the data identifying the location of the camera that captured the observation defines a three-dimensional position, yaw, and pitch of the camera.
6. The method of claim 1, wherein the observation neural network comprises one or more convolutional neural network layers.
7. The method of claim 1, wherein providing the numerical representation of the scene for use in characterizing the content of the scene comprises: computationally rendering the new image of the scene that represents the view of the scene from the camera at the new camera location using the numerical representation of the scene.
8. The method of claim 7, wherein computationally rendering the new image of the scene that represents a view of the scene from the camera at the new camera location using the numerical representation of the scene comprises: receiving data identifying the new camera location; processing, using the generator neural network, a network input to the generator neural network, the network input comprising: (i) the data identifying the new camera location, and (ii) the numerical representation of the scene; and generating, by the generator neural network and in response to processing the network input, a network output that comprises the new image of the scene that represents the view of the scene from the camera at the new camera location.
9. The method of claim 8, wherein processing the network input using the generator neural network to generate the network output comprises: at each of a plurality of time steps: sampling one or more latent variables for the time step; and updating a hidden state of the generator neural network as of the time step by processing the hidden state, the sampled latent variables, the numerical representation of the scene, and the data identifying the new camera location; and after a last time step in the plurality of time steps: generating the new image of the scene from the updated hidden state of the generator neural network.
10. The method of claim 8, wherein processing the network input using the generator neural network to generate the network output comprises: processing the network input, using the generator neural network, to generate respective pixel sufficient statistics for each pixel in the new image of the scene; and sampling a respective color value for each pixel in the new image of the scene using the pixel sufficient statistics for the pixel.
11. The method of claim 8, wherein the generator neural network and the observation neural network have been trained jointly with a posterior neural network that is configured to, during the training, receive a plurality of training observations and a target observation and generate a posterior output that defines a distribution over one or more latent variables.
12. A method performed by one or more computers for generating a numerical representation of a video, the method comprising: receiving a plurality of observations characterizing the video, each observation comprising: (i) a video frame of the video, and (ii) data identifying a time stamp of the video frame of the video; processing each of the plurality of observations using an observation neural network to generate a respective lower dimensional representation of each of the plurality of observations; generating the numerical representation of the video by combining the respective lower dimensional representations of each of the plurality of observations; and providing the numerical representation of the video for use in characterizing the video, wherein the observation neural network has been jointly trained with a generator neural network that is configured to: receive data identifying a new time stamp; process a network input to the generator neural network, the network input comprising: (i) the data identifying the new time stamp, and (ii) the numerical representation of the video; and generate, by the generator neural network, a network output that comprises a new video frame at the new time stamp.
13. The method of claim 12, wherein generating the numerical representation of the video by combining the lower dimensional representations of the plurality of observations comprises: summing the lower dimensional representations of the plurality of observations.
14. The method of claim 12, wherein generating the numerical representation of the video by combining the lower dimensional representations of the plurality of observations comprises: processing the lower dimensional representations of the plurality of observations using one or more neural network layers to generate the numerical representation of the video.
15. The method of claim 14, wherein processing the lower dimensional representations of the plurality of observations using one or more neural network layers to generate the numerical representation of the video comprises: processing the respective lower dimension
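The sampling step recited in claim 10 (per-pixel sufficient statistics, then a sampled color value per pixel) can be sketched as follows. The claims do not name a distribution, so this sketch assumes, purely for illustration, that the sufficient statistics are a mean and a standard deviation of a Gaussian per value, and that values are clamped to the range 0 to 1. The name `sample_pixel_values` and the example statistics are hypothetical; in the claimed method the statistics would come from the generator neural network.

```python
import random
from typing import List, Sequence, Tuple

# Assumed sufficient statistics for one pixel value: (mean, standard deviation).
PixelStats = Tuple[float, float]


def sample_pixel_values(pixel_stats: Sequence[PixelStats], rng: random.Random) -> List[float]:
    """Draw one value per pixel from a Gaussian (an assumption) and clamp to [0, 1]."""
    values = []
    for mean, std in pixel_stats:
        value = rng.gauss(mean, std)
        values.append(min(1.0, max(0.0, value)))
    return values


# Hand-written statistics standing in for a generator network's output.
stats = [(0.2, 0.05), (0.8, 0.05), (0.5, 0.0)]
image = sample_pixel_values(stats, random.Random(0))
print(image)
```

A zero standard deviation makes the corresponding pixel deterministic, and seeding the random generator makes the whole draw reproducible.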
Recurrent networks, e.g. Hopfield networks · CPC title
using neural networks · CPC title
Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items (segmenting video sequences G06V20/49) · CPC title
Scenes; Scene-specific elements (control of digital cameras H04N23/60) · CPC title