Video prediction using one or more neural networks

US11902705B2 · US · B2

Patent metadata
Publication number: US-11902705-B2
Application number: US-201916558620-A
Country: US
Kind code: B2
Filing date: Sep 3, 2019
Priority date: Sep 3, 2019
Publication date: Feb 13, 2024
Grant date: Feb 13, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Apparatuses, systems, and techniques to enhance video are disclosed. In at least one embodiment, one or more neural networks are used to create, from a first video, a second video having one or more additional video frames.
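As a rough illustration of what "a second video having one or more additional video frames" means structurally, the sketch below inserts generated frames between each pair of original frames. The function names `blend` and `add_frames`, the flat-list frame format, and the linear blend itself are assumptions made for this example. The blend is only a stand-in for the neural-network predictor the abstract refers to and is not the patented method.

```python
def blend(frame_a, frame_b, t):
    """Placeholder predictor (an assumption): linear blend of two frames.

    A real system would call a trained neural network here instead.
    """
    return [(1.0 - t) * a + t * b for a, b in zip(frame_a, frame_b)]


def add_frames(video, extra_per_gap, predict=blend):
    """Return a new video with `extra_per_gap` generated frames in each gap."""
    if not video:
        return []
    out = []
    for prev, nxt in zip(video, video[1:]):
        out.append(prev)
        for k in range(1, extra_per_gap + 1):
            # Evenly spaced time positions strictly between the two frames.
            out.append(predict(prev, nxt, k / (extra_per_gap + 1)))
    out.append(video[-1])
    return out


# Toy "video": three frames of two pixel values each.
first_video = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
second_video = add_frames(first_video, extra_per_gap=1)
print(len(first_video), len(second_video))  # 3 5
```

Passing a different `predict` callable is how a learned model would be swapped in; the surrounding bookkeeping stays the same.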

First claim

Opening claim text (preview).

What is claimed is:

1. A processor comprising: one or more circuits to use one or more neural networks to generate one or more objects within one or more images having a pose that is based, at least in part, on a pose of one or more objects previously generated using the one or more neural networks.
2. The processor of claim 1, wherein video frames are used to train one or more neural networks and to generate one or more temporal pose representations and a time-invariant appearance representation.
3. The processor of claim 2, wherein color jittering and thin-plate-spline (TPS) warping enforce appearance invariance and localization properties for the one or more temporal pose representations.
4. The processor of claim 2, wherein the one or more temporal pose representations comprise features represented by Gaussian heat maps and parameterized as a mean and a covariance.
5. The processor of claim 2, wherein the one or more temporal pose representations and the time-invariant appearance representation are used to reconstruct an input video frame, and wherein a loss value resulting from comparing the input video frame with the reconstructed video frame is minimized by adjusting one or more network parameters of the one or more neural networks.
6. The processor of claim 1, wherein motion of one or more temporal pose representations is modeled over time using a temporal encoder including one or more long short-term memory (LSTM) networks.
7. The processor of claim 1, wherein the generated objects are used to generate a video with a higher frame rate, fewer dropped frames, or additional content.
8. A system comprising: one or more processors to use one or more neural networks to generate one or more objects within one or more images having a pose that is based, at least in part, on a pose of one or more objects previously generated using the one or more neural networks; and one or more memories.
9. The system of claim 8, wherein video frames are used to train one or more neural networks and to generate a temporal pose representation and a time-invariant appearance representation.
10. The system of claim 9, wherein color jittering and thin-plate-spline (TPS) warping enforce appearance invariance and localization properties for the temporal pose representations.
11. The system of claim 9, wherein the one or more temporal pose representations comprise features represented by Gaussian heat maps parameterized as a mean and covariance.
12. The system of claim 9, wherein the temporal pose representation and the time-invariant appearance representation are used to reconstruct an input video frame, and wherein a loss value resulting from comparing the input video frame with the reconstructed video frame is minimized by adjusting one or more network parameters of the one or more neural networks.
13. The system of claim 8, wherein motion of one or more temporal pose representations is modeled over time using a temporal encoder including one or more long short-term memory (LSTM) networks.
14. A non-transitory machine-readable medium having stored thereon a set of instructions, which if performed by one or more processors, cause the one or more processors to at least: use one or more neural networks to generate one or more objects within one or more images having a pose that is based, at least in part, on a pose of one or more objects previously generated using the one or more neural networks.
15. The non-transitory machine-readable medium of claim 14, wherein video frames are used to train one or more neural networks and to generate a pose representation and a time-invariant appearance representation.
16. The non-transitory machine-readable medium of claim 15, wherein color jittering and thin-plate-spline (TPS) warping enforce appearance invariance and localization properties for temporal pose representations.
17. The non-transitory machine-readable medium of claim 15, wherein one or more temporal pose representations comprise features represented by Gaussian heat maps parameterized as a mean and covariance.
18. The non-transitory machine-readable medium of claim 15, wherein the pose representation and the time-invariant appearance representation are used to reconstruct an input video frame, and wherein a loss value resulting from comparing the input video frame with the reconstructed video frame is minimized by adjusting one or more network parameters of the one or more neural networks.
19. The non-transitory machine-readable medium of claim 14, wherein motion of one or more temporal pose representations is modeled over time using a temporal encoder including one or more long short-term memory (LSTM) networks.
20. A processor comprising: one or more circuits to train one or more neural networks to generate one or more objects within one or more images having a pose that is based, at least in part, on a pose of one or more objects previously generated using the one or more neural networks.
21. The processor of claim 20, wherein video frames are used to train the one or more neural networks and to generate a temporal pose representation and a time invariant appearance representation.
22. The processor of claim 21, wherein color jittering and thin-plate-spline (TPS) warping enforce appearance invariance and localization properties for the temporal pose representations.
23. The processor of claim 21, wherein the one or more temporal pose representations comprise features represented by Gaussian heat maps parameterized as a mean and covariance.
24. The processor of claim 21, wherein the temporal pose representation and a time-invariant appearance representation are used to reconstruct an input video frame, and wherein a loss value resulting from comparing the input video frame with the reconstructed video frame is minimized by adjusting one or more network parameters of the one or more neural networks.
25. The processor of claim 20, wherein motion of one or more temporal pose representations is modeled over time using a temporal encoder including one or more long short-term memory (LSTM) networks.
26. A system comprising: one or more processors to calculate parameters corresponding to one or more neural networks to generate one or more objects within one or more images having a pose that is based, at least in part, on a pose of one or more objects previously generated using the one or more neural networks; and one or more memories to store the parameters.
27. The system of claim 26, wherein video frames are used to train the one or more neural networks and to generate a temporal pose representation and a time-invariant appearance representation.
28. The system of claim 27, wherein color jittering and thin-plate-spline (TPS) warping enforce appearance invariance and localization properties for the temporal pose representations.
29. The system of claim 27, wherein the one or more temporal pose representations comprise features represented by Gaussian heat maps parameterized as a mean and covariance.
30. The system of claim 27, wherein the pose representation and the time-invariant appearance representation are used to reconstruct an input video frame, and wherein a loss value resulting from comparing the input video frame with the reconstructed video frame is minimized by adjusting one or more network parameters of the one
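Several dependent claims describe pose features as Gaussian heat maps parameterized by a mean and a covariance. A minimal sketch of how one such heat map could be rendered from those two parameters is shown below. The function name `gaussian_heatmap`, the pixel-grid layout, and the peak-normalised form are assumptions chosen for illustration; the claims do not specify them.

```python
import math


def gaussian_heatmap(mean, cov, height, width):
    """Render a 2-D Gaussian with peak value 1 at `mean` (illustrative only).

    mean: (x, y) centre in pixel coordinates.
    cov:  ((sxx, sxy), (sxy, syy)), a symmetric positive-definite covariance.
    """
    (sxx, sxy), (syx, syy) = cov
    det = sxx * syy - sxy * syx
    if sxx <= 0 or det <= 0:
        raise ValueError("covariance must be positive definite")
    # Inverse of the 2x2 covariance, written out explicitly.
    ixx, ixy, iyy = syy / det, -sxy / det, sxx / det
    mx, my = mean
    heatmap = []
    for y in range(height):
        row = []
        for x in range(width):
            dx, dy = x - mx, y - my
            # Squared Mahalanobis distance from the keypoint centre.
            d2 = ixx * dx * dx + 2.0 * ixy * dx * dy + iyy * dy * dy
            row.append(math.exp(-0.5 * d2))
        heatmap.append(row)
    return heatmap


heatmap = gaussian_heatmap(mean=(3.0, 2.0), cov=((2.0, 0.5), (0.5, 1.0)),
                           height=5, width=7)
# Locate the brightest pixel as (value, x, y).
peak = max((v, x, y) for y, row in enumerate(heatmap) for x, v in enumerate(row))
print(peak)  # (1.0, 3, 2)
```

The mean sets where the feature sits in the image, while the covariance sets its spread and orientation, so a sequence of (mean, covariance) pairs is a compact way to describe how a keypoint moves over time.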

Assignees

Nvidia Corp

Inventors

Classifications

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

  • Supervised learning · CPC title

  • Weakly supervised learning, e.g. semi-supervised or self-supervised learning · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Auto-encoder networks; Encoder-decoder networks · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11902705B2 cover?
Apparatuses, systems, and techniques to enhance video are disclosed. In at least one embodiment, one or more neural networks are used to create, from a first video, a second video having one or more additional video frames.
Who is the assignee on this patent?
Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification H04N7/0135. Mapped technology areas include Electricity.
When was this patent published?
Publication date Feb 13, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).