Efficient human pose tracking in videos
US-10861170-B1 · Dec 8, 2020 · US
US11902705B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11902705-B2 |
| Application number | US-201916558620-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 3, 2019 |
| Priority date | Sep 3, 2019 |
| Publication date | Feb 13, 2024 |
| Grant date | Feb 13, 2024 |
Abstract
Apparatuses, systems, and techniques to enhance video are disclosed. In at least one embodiment, one or more neural networks are used to create, from a first video, a second video having one or more additional video frames.
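The abstract describes creating, from a first video, a second video that contains additional frames. The following is a minimal sketch of that input/output relationship only. It assumes each frame is a flat list of pixel intensities and uses simple linear blending in place of the neural networks. `blend` and `insert_frames` are hypothetical names, and blending is a placeholder, not the pose-based generator the claims describe.

```python
from typing import Callable, List

Frame = List[float]  # hypothetical: one frame as a flat list of pixel intensities


def blend(a: Frame, b: Frame) -> Frame:
    """Placeholder generator: average two neighbouring frames pixel by pixel.

    Stand-in only; the patent generates frames with neural networks.
    """
    return [(x + y) / 2.0 for x, y in zip(a, b)]


def insert_frames(video: List[Frame],
                  generate: Callable[[Frame, Frame], Frame] = blend) -> List[Frame]:
    """Return a second video with one generated frame between each original pair."""
    if len(video) < 2:
        return list(video)  # nothing to interpolate between
    out: List[Frame] = []
    for prev, nxt in zip(video, video[1:]):
        out.append(prev)
        out.append(generate(prev, nxt))  # the additional frame
    out.append(video[-1])
    return out


first_video = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]]
second_video = insert_frames(first_video)
print(len(second_video))  # 5
print(second_video[1])    # [0.5, 1.0]
```

Because `generate` is a parameter, a learned frame generator could replace `blend` without changing how frames are interleaved. Doubling the number of frames over the same duration corresponds to the "higher frame rate" use mentioned in claim 7.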
Claims (excerpt)
What is claimed is:
1. A processor comprising: one or more circuits to use one or more neural networks to generate one or more objects within one or more images having a pose that is based, at least in part, on a pose of one or more objects previously generated using the one or more neural networks.
2. The processor of claim 1, wherein video frames are used to train one or more neural networks and to generate one or more temporal pose representations and a time-invariant appearance representation.
3. The processor of claim 2, wherein color jittering and thin-plate-spline (TPS) warping enforce appearance invariance and localization properties for the one or more temporal pose representations.
4. The processor of claim 2, wherein the one or more temporal pose representations comprise features represented by Gaussian heat maps and parameterized as a mean and a covariance.
5. The processor of claim 2, wherein the one or more temporal pose representations and the time-invariant appearance representation are used to reconstruct an input video frame, and wherein a loss value resulting from comparing the input video frame with the reconstructed video frame is minimized by adjusting one or more network parameters of the one or more neural networks.
6. The processor of claim 1, wherein motion of one or more temporal pose representations is modeled over time using a temporal encoder including one or more long short-term memory (LSTM) networks.
7. The processor of claim 1, wherein the generated objects are used to generate a video with a higher frame rate, fewer dropped frames, or additional content.
8. A system comprising: one or more processors to use one or more neural networks to generate one or more objects within one or more images having a pose that is based, at least in part, on a pose of one or more objects previously generated using the one or more neural networks; and one or more memories.
9. The system of claim 8, wherein video frames are used to train one or more neural networks and to generate a temporal pose representation and a time-invariant appearance representation.
10. The system of claim 9, wherein color jittering and thin-plate-spline (TPS) warping enforce appearance invariance and localization properties for the temporal pose representations.
11. The system of claim 9, wherein the one or more temporal pose representations comprise features represented by Gaussian heat maps parameterized as a mean and covariance.
12. The system of claim 9, wherein the temporal pose representation and the time-invariant appearance representation are used to reconstruct an input video frame, and wherein a loss value resulting from comparing the input video frame with the reconstructed video frame is minimized by adjusting one or more network parameters of the one or more neural networks.
13. The system of claim 8, wherein motion of one or more temporal pose representations is modeled over time using a temporal encoder including one or more long short-term memory (LSTM) networks.
14. A non-transitory machine-readable medium having stored thereon a set of instructions, which if performed by one or more processors, cause the one or more processors to at least: use one or more neural networks to generate one or more objects within one or more images having a pose that is based, at least in part, on a pose of one or more objects previously generated using the one or more neural networks.
15. The non-transitory machine-readable medium of claim 14, wherein video frames are used to train one or more neural networks and to generate a pose representation and a time-invariant appearance representation.
16. The non-transitory machine-readable medium of claim 15, wherein color jittering and thin-plate-spline (TPS) warping enforce appearance invariance and localization properties for temporal pose representations.
17. The non-transitory machine-readable medium of claim 15, wherein one or more temporal pose representations comprise features represented by Gaussian heat maps parameterized as a mean and covariance.
18. The non-transitory machine-readable medium of claim 15, wherein the pose representation and the time-invariant appearance representation are used to reconstruct an input video frame, and wherein a loss value resulting from comparing the input video frame with the reconstructed video frame is minimized by adjusting one or more network parameters of the one or more neural networks.
19. The non-transitory machine-readable medium of claim 14, wherein motion of one or more temporal pose representations is modeled over time using a temporal encoder including one or more long short-term memory (LSTM) networks.
20. A processor comprising: one or more circuits to train one or more neural networks to generate one or more objects within one or more images having a pose that is based, at least in part, on a pose of one or more objects previously generated using the one or more neural networks.
21. The processor of claim 20, wherein video frames are used to train the one or more neural networks and to generate a temporal pose representation and a time invariant appearance representation.
22. The processor of claim 21, wherein color jittering and thin-plate-spline (TPS) warping enforce appearance invariance and localization properties for the temporal pose representations.
23. The processor of claim 21, wherein the one or more temporal pose representations comprise features represented by Gaussian heat maps parameterized as a mean and covariance.
24. The processor of claim 21, wherein the temporal pose representation and a time-invariant appearance representation are used to reconstruct an input video frame, and wherein a loss value resulting from comparing the input video frame with the reconstructed video frame is minimized by adjusting one or more network parameters of the one or more neural networks.
25. The processor of claim 20, wherein motion of one or more temporal pose representations is modeled over time using a temporal encoder including one or more long short-term memory (LSTM) networks.
26. A system comprising: one or more processors to calculate parameters corresponding to one or more neural networks to generate one or more objects within one or more images having a pose that is based, at least in part, on a pose of one or more objects previously generated using the one or more neural networks; and one or more memories to store the parameters.
27. The system of claim 26, wherein video frames are used to train the one or more neural networks and to generate a temporal pose representation and a time-invariant appearance representation.
28. The system of claim 27, wherein color jittering and thin-plate-spline (TPS) warping enforce appearance invariance and localization properties for the temporal pose representations.
29. The system of claim 27, wherein the one or more temporal pose representations comprise features represented by Gaussian heat maps parameterized as a mean and covariance.
30. The system of claim 27, wherein the pose representation and the time-invariant appearance representation are used to reconstruct an input video frame, and wherein a loss value resulting from comparing the input video frame with the reconstructed video frame is minimized by adjusting one or more network parameters of the one
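Claims 4, 11, 17, 23 and 29 describe pose features represented by Gaussian heat maps parameterized by a mean and a covariance. The following is a minimal sketch, assuming a single 2D keypoint on a small pixel grid. `render_heatmap` draws an unnormalized Gaussian from a mean and a 2x2 covariance, and `heatmap_moments` recovers both parameters as weighted moments of the map. Both names are hypothetical, and reading the parameters back via moments is an illustrative choice, not necessarily the procedure the patent uses.

```python
import math
from typing import List, Tuple

Vec2 = Tuple[float, float]
Mat2 = Tuple[Tuple[float, float], Tuple[float, float]]
Grid = List[List[float]]


def render_heatmap(mean: Vec2, cov: Mat2, height: int, width: int) -> Grid:
    """Evaluate exp(-0.5 * d^T cov^-1 d) at every pixel, with d = (x, y) - mean."""
    (sxx, sxy), (syx, syy) = cov
    det = sxx * syy - sxy * syx
    if sxx <= 0 or det <= 0:
        raise ValueError("covariance must be positive definite")
    # Closed-form inverse of a 2x2 matrix.
    ixx, ixy, iyx, iyy = syy / det, -sxy / det, -syx / det, sxx / det
    grid: Grid = []
    for y in range(height):  # rows are the y axis
        row = []
        for x in range(width):  # columns are the x axis
            dx, dy = x - mean[0], y - mean[1]
            q = dx * (ixx * dx + ixy * dy) + dy * (iyx * dx + iyy * dy)
            row.append(math.exp(-0.5 * q))
        grid.append(row)
    return grid


def heatmap_moments(grid: Grid) -> Tuple[Vec2, Mat2]:
    """Recover (mean, covariance) as weighted first and second moments of the map."""
    total = sx = sy = 0.0
    for y, row in enumerate(grid):
        for x, v in enumerate(row):
            total += v
            sx += v * x
            sy += v * y
    mx, my = sx / total, sy / total
    cxx = cxy = cyy = 0.0
    for y, row in enumerate(grid):
        for x, v in enumerate(row):
            dx, dy = x - mx, y - my
            cxx += v * dx * dx
            cxy += v * dx * dy
            cyy += v * dy * dy
    return (mx, my), ((cxx / total, cxy / total), (cxy / total, cyy / total))


heatmap = render_heatmap((16.0, 15.0), ((4.0, 1.0), (1.0, 3.0)), 32, 32)
mean, cov = heatmap_moments(heatmap)
print([round(m, 3) for m in mean])              # [16.0, 15.0]
print([[round(v, 3) for v in r] for r in cov])  # [[4.0, 1.0], [1.0, 3.0]]
```

The round trip works here because the Gaussian lies well inside the 32x32 grid. A heat map clipped at the image border would bias both the recovered mean and the covariance.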
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Supervised learning · CPC title
Weakly supervised learning, e.g. semi-supervised or self-supervised learning · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Auto-encoder networks; Encoder-decoder networks · CPC title
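Claims 6, 13, 19 and 25, together with the first CPC title above, refer to a temporal encoder built from LSTM networks that models how pose representations move over time. The following is a minimal sketch of a standard LSTM cell stepped over a sequence of per-frame pose vectors. It assumes each frame's pose is a flat list of normalized keypoint coordinates, such as heat-map means, and uses random, untrained weights. `LSTMCell` and `encode_pose_sequence` are hypothetical names. A real system would learn the weights and would likely use a deep learning framework.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(W: Matrix, v: Vector, b: Vector) -> Vector:
    """Compute W @ v + b with plain lists."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


class LSTMCell:
    """Standard LSTM cell: input (i), forget (f), output (o) gates and candidate (c)."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0):
        rng = random.Random(seed)  # untrained, deterministic placeholder weights
        n = input_size + hidden_size  # every gate sees [x_t, h_{t-1}]

        def mat() -> Matrix:
            return [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(hidden_size)]

        self.W = {g: mat() for g in "ifoc"}
        self.b = {g: [0.0] * hidden_size for g in "ifoc"}
        self.b["f"] = [1.0] * hidden_size  # common forget-gate bias initialisation
        self.hidden_size = hidden_size

    def step(self, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        z = x + h  # list concatenation: [x_t, h_{t-1}]
        i = [sigmoid(v) for v in affine(self.W["i"], z, self.b["i"])]
        f = [sigmoid(v) for v in affine(self.W["f"], z, self.b["f"])]
        o = [sigmoid(v) for v in affine(self.W["o"], z, self.b["o"])]
        g = [math.tanh(v) for v in affine(self.W["c"], z, self.b["c"])]
        c_new = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
        h_new = [oi * math.tanh(ci) for oi, ci in zip(o, c_new)]
        return h_new, c_new


def encode_pose_sequence(cell: LSTMCell, poses: List[Vector]) -> List[Vector]:
    """Run the cell over per-frame pose vectors and return the hidden state per frame."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    states = []
    for x in poses:
        h, c = cell.step(x, h, c)
        states.append(h)
    return states


poses = [[0.50, 0.47], [0.53, 0.47], [0.56, 0.50]]  # one keypoint drifting right
cell = LSTMCell(input_size=2, hidden_size=4, seed=0)
states = encode_pose_sequence(cell, poses)
print(len(states), len(states[0]))  # 3 4
```

Each hidden state summarizes the pose history up to its frame. This matches the claims' idea of conditioning a newly generated pose on previously generated ones.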