Video prediction using one or more neural networks

US11902705B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11902705-B2
Application number	US-201916558620-A
Country	US
Kind code	B2
Filing date	Sep 3, 2019
Priority date	Sep 3, 2019
Publication date	Feb 13, 2024
Grant date	Feb 13, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Apparatuses, systems, and techniques to enhance video are disclosed. In at least one embodiment, one or more neural networks are used to create, from a first video, a second video having one or more additional video frames.
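To make the abstract's "second video having one or more additional video frames" concrete, the sketch below inserts one synthetic frame between each pair of consecutive frames. It is only an illustration: the patent generates the new frames with neural networks, whereas this sketch substitutes a plain linear blend. The names `blend` and `add_frames` and the tiny grayscale frames are hypothetical.

```python
from typing import List

Frame = List[List[float]]  # hypothetical grayscale frame: rows of pixel values


def blend(a: Frame, b: Frame, t: float = 0.5) -> Frame:
    """Linear blend of two same-sized frames; a stand-in for a learned generator."""
    return [[(1 - t) * pa + t * pb for pa, pb in zip(ra, rb)]
            for ra, rb in zip(a, b)]


def add_frames(video: List[Frame]) -> List[Frame]:
    """Return a second video with one extra frame between each consecutive pair."""
    if not video:
        return []
    out: List[Frame] = [video[0]]
    for prev, nxt in zip(video, video[1:]):
        out.append(blend(prev, nxt))  # synthetic in-between frame
        out.append(nxt)               # original frame, kept unchanged
    return out


# Three 1x2 frames in, five frames out (two synthetic frames added).
first_video = [[[0.0, 0.0]], [[1.0, 2.0]], [[3.0, 4.0]]]
second_video = add_frames(first_video)
print(len(first_video), len(second_video))
```

A video of N frames becomes one of 2N - 1 frames, which is how a higher frame rate could be obtained at the same playback duration.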

First claim

Opening claim text (preview).

What is claimed is:
1. A processor comprising: one or more circuits to use one or more neural networks to generate one or more objects within one or more images having a pose that is based, at least in part, on a pose of one or more objects previously generated using the one or more neural networks.
2. The processor of claim 1, wherein video frames are used to train one or more neural networks and to generate one or more temporal pose representations and a time-invariant appearance representation.
3. The processor of claim 2, wherein color jittering and thin-plate-spline (TPS) warping enforce appearance invariance and localization properties for the one or more temporal pose representations.
4. The processor of claim 2, wherein the one or more temporal pose representations comprise features represented by Gaussian heat maps and parameterized as a mean and a covariance.
5. The processor of claim 2, wherein the one or more temporal pose representations and the time-invariant appearance representation are used to reconstruct an input video frame, and wherein a loss value resulting from comparing the input video frame with the reconstructed video frame is minimized by adjusting one or more network parameters of the one or more neural networks.
6. The processor of claim 1, wherein motion of one or more temporal pose representations is modeled over time using a temporal encoder including one or more long short-term memory (LSTM) networks.
7. The processor of claim 1, wherein the generated objects are used to generate a video with a higher frame rate, fewer dropped frames, or additional content.
8. A system comprising: one or more processors to use one or more neural networks to generate one or more objects within one or more images having a pose that is based, at least in part, on a pose of one or more objects previously generated using the one or more neural networks; and one or more memories.
9. The system of claim 8, wherein video frames are used to train one or more neural networks and to generate a temporal pose representation and a time-invariant appearance representation.
10. The system of claim 9, wherein color jittering and thin-plate-spline (TPS) warping enforce appearance invariance and localization properties for the temporal pose representations.
11. The system of claim 9, wherein the one or more temporal pose representations comprise features represented by Gaussian heat maps parameterized as a mean and covariance.
12. The system of claim 9, wherein the temporal pose representation and the time-invariant appearance representation are used to reconstruct an input video frame, and wherein a loss value resulting from comparing the input video frame with the reconstructed video frame is minimized by adjusting one or more network parameters of the one or more neural networks.
13. The system of claim 8, wherein motion of one or more temporal pose representations is modeled over time using a temporal encoder including one or more long short-term memory (LSTM) networks.
14. A non-transitory machine-readable medium having stored thereon a set of instructions, which if performed by one or more processors, cause the one or more processors to at least: use one or more neural networks to generate one or more objects within one or more images having a pose that is based, at least in part, on a pose of one or more objects previously generated using the one or more neural networks.
15. The non-transitory machine-readable medium of claim 14, wherein video frames are used to train one or more neural networks and to generate a pose representation and a time-invariant appearance representation.
16. The non-transitory machine-readable medium of claim 15, wherein color jittering and thin-plate-spline (TPS) warping enforce appearance invariance and localization properties for temporal pose representations.
17. The non-transitory machine-readable medium of claim 15, wherein one or more temporal pose representations comprise features represented by Gaussian heat maps parameterized as a mean and covariance.
18. The non-transitory machine-readable medium of claim 15, wherein the pose representation and the time-invariant appearance representation are used to reconstruct an input video frame, and wherein a loss value resulting from comparing the input video frame with the reconstructed video frame is minimized by adjusting one or more network parameters of the one or more neural networks.
19. The non-transitory machine-readable medium of claim 14, wherein motion of one or more temporal pose representations is modeled over time using a temporal encoder including one or more long short-term memory (LSTM) networks.
20. A processor comprising: one or more circuits to train one or more neural networks to generate one or more objects within one or more images having a pose that is based, at least in part, on a pose of one or more objects previously generated using the one or more neural networks.
21. The processor of claim 20, wherein video frames are used to train the one or more neural networks and to generate a temporal pose representation and a time invariant appearance representation.
22. The processor of claim 21, wherein color jittering and thin-plate-spline (TPS) warping enforce appearance invariance and localization properties for the temporal pose representations.
23. The processor of claim 21, wherein the one or more temporal pose representations comprise features represented by Gaussian heat maps parameterized as a mean and covariance.
24. The processor of claim 21, wherein the temporal pose representation and a time-invariant appearance representation are used to reconstruct an input video frame, and wherein a loss value resulting from comparing the input video frame with the reconstructed video frame is minimized by adjusting one or more network parameters of the one or more neural networks.
25. The processor of claim 20, wherein motion of one or more temporal pose representations is modeled over time using a temporal encoder including one or more long short-term memory (LSTM) networks.
26. A system comprising: one or more processors to calculate parameters corresponding to one or more neural networks to generate one or more objects within one or more images having a pose that is based, at least in part, on a pose of one or more objects previously generated using the one or more neural networks; and one or more memories to store the parameters.
27. The system of claim 26, wherein video frames are used to train the one or more neural networks and to generate a temporal pose representation and a time-invariant appearance representation.
28. The system of claim 27, wherein color jittering and thin-plate-spline (TPS) warping enforce appearance invariance and localization properties for the temporal pose representations.
29. The system of claim 27, wherein the one or more temporal pose representations comprise features represented by Gaussian heat maps parameterized as a mean and covariance.
30. The system of claim 27, wherein the pose representation and the time-invariant appearance representation are used to reconstruct an input video frame, and wherein a loss value resulting from comparing the input video frame with the reconstructed video frame is minimized by adjusting one or more network parameters of the one
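Several dependent claims (claims 4, 11, 17, 23 and 29) describe pose features as Gaussian heat maps parameterized by a mean and a covariance. The following is a minimal sketch of what such a parameterization could look like for a single 2D feature. The function name `gaussian_heatmap`, the grid size, the (x, y) ordering of the mean, and the choice of an unnormalized density with peak value 1 are assumptions for illustration, not details taken from the patent.

```python
import math
from typing import List, Tuple

Cov2 = Tuple[Tuple[float, float], Tuple[float, float]]


def gaussian_heatmap(mean: Tuple[float, float], cov: Cov2,
                     height: int, width: int) -> List[List[float]]:
    """Render exp(-0.5 * d^T cov^-1 d) on a height x width grid.

    mean is (x, y) in pixel coordinates; cov is a 2x2 covariance matrix.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if a <= 0 or det <= 0:
        raise ValueError("covariance must be positive definite")
    # Closed-form inverse of the 2x2 covariance matrix.
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    mx, my = mean
    heat = []
    for y in range(height):
        row = []
        for x in range(width):
            dx, dy = x - mx, y - my
            # Squared Mahalanobis distance of pixel (x, y) from the mean.
            m = dx * (ia * dx + ib * dy) + dy * (ic * dx + id_ * dy)
            row.append(math.exp(-0.5 * m))
        heat.append(row)
    return heat


# One feature centred at column 2, row 1, with unit isotropic covariance.
heat = gaussian_heatmap(mean=(2.0, 1.0), cov=((1.0, 0.0), (0.0, 1.0)),
                        height=3, width=5)
print(heat[1][2])
```

The mean places the feature in the image and the covariance sets its spread and orientation, so a handful of numbers stands in for a full heat map.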

Assignees

Nvidia Corp

Inventors

Classifications

G06N3/0442
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
G06N3/09
Supervised learning · CPC title
G06N3/0895
Weakly supervised learning, e.g. semi-supervised or self-supervised learning · CPC title
G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title
G06N3/0455
Auto-encoder networks; Encoder-decoder networks · CPC title
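The classification symbols above follow the CPC layout of section letter, two-digit class, subclass letter, main group, and subgroup. As a rough illustration, the sketch below splits a symbol into those parts and looks up the section title. The function name `parse_cpc` and the regular expression are assumptions for illustration; the pattern only covers the compact form used on this page (for example `G06N3/0442`), and the tagging section Y is left without a title.

```python
import re
from typing import Dict

# Titles of CPC sections A to H.
CPC_SECTIONS = {
    "A": "Human Necessities",
    "B": "Performing Operations; Transporting",
    "C": "Chemistry; Metallurgy",
    "D": "Textiles; Paper",
    "E": "Fixed Constructions",
    "F": "Mechanical Engineering; Lighting; Heating; Weapons; Blasting",
    "G": "Physics",
    "H": "Electricity",
}

# Assumed compact form: section, 2-digit class, subclass letter,
# main group, "/", subgroup.
_CPC_RE = re.compile(r"^([A-HY])(\d{2})([A-Z])(\d{1,4})/(\d{2,6})$")


def parse_cpc(symbol: str) -> Dict[str, str]:
    """Split a compact CPC symbol into its hierarchical parts."""
    match = _CPC_RE.match(symbol.replace(" ", ""))
    if match is None:
        raise ValueError(f"not a recognised CPC symbol: {symbol!r}")
    section, cls, subclass, main_group, subgroup = match.groups()
    return {
        "section": section,
        "section_title": CPC_SECTIONS.get(section, "unknown"),
        "class": section + cls,
        "subclass": section + cls + subclass,
        "main_group": main_group,
        "subgroup": subgroup,
    }


for code in ["G06N3/0442", "G06N3/09", "H04N7/0135"]:
    info = parse_cpc(code)
    print(code, "->", info["subclass"], "/", info["section_title"])
```

This shows why the primary classification H04N7/0135 maps to the technology area Electricity, while the additional G06N symbols listed here sit under Physics.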

Patent family

Related publications grouped by family.

View patent family 72291146

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11902705B2 cover?
Apparatuses, systems, and techniques to enhance video are disclosed. In at least one embodiment, one or more neural networks are used to create, from a first video, a second video having one or more additional video frames.
Who is the assignee on this patent?
Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification H04N7/0135. Mapped technology areas include Electricity.
When was this patent published?
Publication date Feb 13, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Efficient human pose tracking in videos

Deep learning processing of video

Systems and methods for dynamic facial analysis using a recurrent neural network
