Automatic detection of objects in video images
US-2017185872-A1 · Jun 29, 2017 · US
US11587344B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11587344-B2 |
| Application number | US-201916403278-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 3, 2019 |
| Priority date | Nov 4, 2016 |
| Publication date | Feb 21, 2023 |
| Grant date | Feb 21, 2023 |
Official abstract text for this publication.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for image rendering. In one aspect, a method comprises receiving a plurality of observations characterizing a particular scene, each observation comprising an image of the particular scene and data identifying a location of a camera that captured the image. In another aspect, the method comprises receiving a plurality of observations characterizing a particular video, each observation comprising a video frame from the particular video and data identifying a time stamp of the video frame in the particular video. In yet another aspect, the method comprises receiving a plurality of observations characterizing a particular image, each observation comprising a crop of the particular image and data characterizing the crop of the particular image. The method processes each of the plurality of observations using an observation neural network to determine a numeric representation as output.
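The encode-then-combine step described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: `make_observation_network`, `aggregate`, and `scene_representation` are hypothetical names, the encoder is a single untrained random linear layer with a tanh standing in for the observation neural network, and summation is used as the combining operation (one option the claims name explicitly).

```python
import math
import random


def make_observation_network(input_dim, repr_dim, seed=0):
    """Return a toy encoder (hypothetical stand-in for the observation network).

    One fixed, untrained random linear layer followed by tanh. It maps an
    observation of size input_dim to a lower-dimensional vector of size repr_dim.
    """
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 0.1) for _ in range(input_dim)]
               for _ in range(repr_dim)]

    def encode(image, context):
        # An observation is the flattened image plus its context data,
        # e.g. a camera location, a time stamp, or a crop location and size.
        features = list(image) + list(context)
        if len(features) != input_dim:
            raise ValueError("unexpected observation size")
        return [math.tanh(sum(w * x for w, x in zip(row, features)))
                for row in weights]

    return encode


def aggregate(representations):
    """Combine per-observation representations by element-wise summation."""
    return [sum(values) for values in zip(*representations)]


def scene_representation(observations, encode):
    """Encode every (image, context) pair, then sum the results."""
    return aggregate([encode(image, context) for image, context in observations])


# Two toy observations: 4 "pixels" each, plus a 3-value camera location.
encode = make_observation_network(input_dim=4 + 3, repr_dim=2)
observations = [
    ([0.1, 0.2, 0.3, 0.4], [0.0, 0.0, 1.0]),
    ([0.4, 0.3, 0.2, 0.1], [1.0, 0.0, 0.0]),
]
representation = scene_representation(observations, encode)
```

Because the per-observation vectors are summed, the result does not depend on the order in which observations arrive, and its size stays fixed however many observations are supplied.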
Opening claim text (preview).
The invention claimed is:

1. A computer implemented method comprising: receiving a plurality of observations characterizing a particular scene, each observation comprising: (i) an image of the particular scene, and (ii) data identifying a location of a camera that captured the image; processing each of the plurality of observations using an observation neural network, wherein the observation neural network is configured to, for each of the observations: process the observation to generate as output a lower-dimensional representation of the observation; determining a numeric representation of the particular scene by combining the lower-dimension representations of the observations; providing the numeric representation of the particular scene for use in characterizing contents of the particular scene; receiving data identifying a new camera location; and processing: (i) the data identifying the new camera location, and (ii) the numeric representation of the particular scene, using a generator neural network to generate a new image of the particular scene taken from a camera at the new camera location.
2. The method of claim 1, wherein the numeric representation is a collection of numeric values that represents underlying contents of the particular scene.
3. The method of claim 1, wherein the numeric representation is a semantic description of the particular scene.
4. The method of claim 1, wherein combining the lower-dimension representations of the observations comprises: summing the lower-dimension representations to generate the numeric representation.
5. The method of claim 1, wherein the generator neural network is configured to: at each of a plurality of time steps: sample one or more latent variables for the time step, and update a hidden state as of the time step by processing the hidden state, the sampled latent variables, the numeric representation, and the data identifying the new camera location using a deep convolutional neural network to generate an updated hidden state; and after a last time step in the plurality of time steps: generate the new image of the particular scene from the updated hidden state after the last time step.
6. The method of claim 5, wherein generating the new image of the particular scene from the updated hidden state after the last time step comprises: generating pixel sufficient statistics from the updated hidden state after the last time step; and sampling color values of pixels in the new image using the pixel sufficient statistics.
7. The method of claim 1, wherein the observation neural network has been trained to generate numeric representations that, in combination with a particular camera location, is usable by a generator neural network to generate a reconstruction of a particular image of the particular scene taken from the particular camera location.
8. A computer implemented method comprising: receiving a plurality of observations characterizing a particular video, each observation comprising: (i) a video frame from the particular video, and (ii) data identifying a time stamp of the video frame in the particular video; processing each of the plurality of observations using an observation neural network, wherein the observation neural network is configured to, for each of the observations: process the observation to generate as output a lower-dimensional representation of the observation; determining a numeric representation of the particular video by combining the lower-dimension representations of the observations; providing the numeric representation of the particular video for use in characterizing contents of the particular video; receiving data identifying a new time stamp; and processing: (i) the data identifying the new time stamp, and (ii) the numeric representation of the particular video, using a generator neural network to generate a new video frame at the new time stamp in the particular video.
9. The method of claim 8, wherein the numeric representation is a collection of numeric values that represents underlying contents of the particular video.
10. The method of claim 8, wherein the numeric representation is a semantic description of the particular video.
11. The method of claim 8, wherein combining the lower-dimension representations of the observations comprises: summing the lower-dimension representations to generate the numeric representation.
12. The method of claim 8, wherein the generator neural network is configured to: at each of a plurality of time steps: sample one or more latent variables for the time step, and update a hidden state as of the time step by processing the hidden state, the sampled latent variables, the numeric representation, and the data identifying the new time stamp using a deep convolutional neural network to generate an updated hidden state; and after a last time step in the plurality of time steps: generate the new video frame from the updated hidden state after the last time step.
13. The method of claim 12, wherein generating the new video frame comprises: generating pixel sufficient statistics from the updated hidden state after the last time step; and sampling color values of pixels in the new video frame using the pixel sufficient statistics.
14. The method of claim 8, wherein the observation neural network has been trained to generate numeric representations that, in combination with a particular time stamp, is usable by a generator neural network to generate a reconstruction of a particular video frame from the particular video at the particular time stamp.
15. A computer implemented method comprising: receiving a plurality of observations characterizing a particular image, each observation comprising: (i) a crop of the particular image, and (ii) data identifying a location and size of the crop in the particular image; processing each of the plurality of observations using an observation neural network, wherein the observation neural network is configured to, for each of the observations: process the observation to generate as output a lower-dimensional representation of the observation; determining a numeric representation of the particular image by combining the lower-dimension representations of the observations; providing the numeric representation of the particular image for use in characterizing contents of the particular image; receiving data identifying a new crop location and a new crop size; and processing: (i) the data identifying the new crop location and the new crop size, and (ii) the numeric representation of the particular image, using a generator neural network to generate a new crop of the particular image at the new crop location and having the new crop size.
16. The method of claim 15, wherein the numeric representation is a collection of numeric values that represents underlying contents of the particular image.
17. The method of claim 15, wherein the numeric representation is a semantic description of the particular image.
18. The method of claim 15, wherein combining the lower-dimension representations of the observations comprises: summing the lower-dimension representations to generate the numeric representation.
19. The method of claim 15, wherein the generator neural network is configured to: at each of a plurality of time steps: sample one or more latent variables for the time step, and update a hidden state as of the time step by processing the hidden state, the sampled latent variables, the numeric representation, and the data ide
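
The generator recurrence recited in claims 5, 6, 12, and 13 (sample latent variables, update a hidden state, then derive pixel sufficient statistics and sample pixel values) can be sketched roughly as below. Everything here is an assumption made for illustration: `generate` and its parameters are hypothetical, the weights are random and untrained, a small tanh layer stands in for the deep convolutional neural network, latents are drawn from a unit Gaussian, and the pixel sufficient statistics are taken to be a per-pixel mean with a fixed standard deviation.

```python
import math
import random


def generate(representation, query, num_steps=4, hidden_dim=8, latent_dim=2,
             num_pixels=6, pixel_std=0.05, seed=0):
    """Toy generator loop: returns num_pixels sampled values in [0, 1]."""
    rng = random.Random(seed)
    input_dim = hidden_dim + latent_dim + len(representation) + len(query)

    # Untrained random weights; a stand-in for the convolutional update network.
    w_update = [[rng.gauss(0.0, 0.1) for _ in range(input_dim)]
                for _ in range(hidden_dim)]
    w_pixels = [[rng.gauss(0.0, 0.1) for _ in range(hidden_dim)]
                for _ in range(num_pixels)]

    hidden = [0.0] * hidden_dim
    for _ in range(num_steps):
        # Sample the latent variables for this time step (unit Gaussian assumed).
        latents = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        # Update the hidden state from: previous hidden state, sampled latents,
        # the numeric representation, and the query (e.g. new camera location).
        inputs = hidden + latents + list(representation) + list(query)
        hidden = [math.tanh(sum(w * x for w, x in zip(row, inputs)))
                  for row in w_update]

    # After the last step: pixel sufficient statistics (here, a mean per pixel).
    means = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w_pixels]
    # Sample each pixel value around its mean and clip to the valid range.
    return [min(1.0, max(0.0, rng.gauss(m, pixel_std))) for m in means]


# Example: a 2-value representation and a 3-value query (new camera location).
new_image = generate(representation=[0.3, -0.1], query=[0.0, 1.0, 0.0])
```

With untrained weights the output carries no meaning; the sketch only shows the order of operations. The same loop would apply to the video and crop variants by passing a time stamp or a crop location and size as `query`.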
Recurrent networks, e.g. Hopfield networks · CPC title
Weakly supervised learning, e.g. semi-supervised or self-supervised learning · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Auto-encoder networks; Encoder-decoder networks · CPC title
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title