US12488524B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12488524-B2 |
| Application number | US-202117526608-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 15, 2021 |
| Priority date | Nov 15, 2021 |
| Publication date | Dec 2, 2025 |
| Grant date | Dec 2, 2025 |
Official abstract text for this publication.
A technique for generating a sequence of geometries includes converting, via an encoder neural network, one or more input geometries corresponding to one or more frames within an animation into one or more latent vectors. The technique also includes generating the sequence of geometries corresponding to a sequence of frames within the animation based on the one or more latent vectors. The technique further includes causing output related to the animation to be generated based on the sequence of geometries.
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method for generating a sequence of three-dimensional (3D) geometries, the computer-implemented method comprising: converting, via an encoder neural network, one or more input 3D geometries corresponding to one or more frames within an animation into one or more latent vectors; combining (i) a capture code that represents one or more attributes of the sequence of 3D geometries with (ii) a plurality of position encodings that represent a plurality of time steps within the animation to produce a plurality of position-encoded representations of the capture code; generating, via a decoder neural network, the sequence of 3D geometries based on input that includes (i) the one or more latent vectors and (ii) the plurality of position-encoded representations of the capture code, wherein each 3D geometry included in the sequence of 3D geometries corresponds to (i) a different time step included in the plurality of time steps and (ii) a different frame included in a sequence of frames within the animation; and causing output related to the animation to be generated based on the sequence of 3D geometries.

2. The computer-implemented method of claim 1, further comprising training the encoder neural network and the decoder neural network based on a training dataset that includes a plurality of sequences of sampled 3D geometries, wherein each sequence of sampled 3D geometries included in the plurality of sequences of sampled 3D geometries comprises a sampled subset of a plurality of 3D geometries corresponding to a plurality of time steps within a geometric representation of one or more movements.

3. The computer-implemented method of claim 2, further comprising determining the capture code based on one or more capture codes included in a plurality of capture codes associated with the plurality of sequences of sampled 3D geometries.

4. The computer-implemented method of claim 3, wherein determining the capture code comprises at least one of: selecting the capture code from the plurality of capture codes associated with the plurality of sequences of sampled 3D geometries in the training dataset; or interpolating between two or more capture codes included in the plurality of capture codes.

5. The computer-implemented method of claim 1, further comprising receiving the one or more input 3D geometries as one or more sets of blendshape weights.

6. The computer-implemented method of claim 1, wherein converting the one or more input 3D geometries into the one or more latent vectors comprises: generating one or more input representations based on the one or more input 3D geometries and one or more position encodings that are included in the plurality of position encodings and represent one or more time steps corresponding to the one or more frames within the animation; and applying a series of one or more encoder blocks included in the encoder neural network to the one or more input representations to generate the one or more latent vectors.

7. The computer-implemented method of claim 6, wherein the one or more encoder blocks comprise a self-attention layer, an addition and normalization layer, and a feed-forward layer.

8. The computer-implemented method of claim 1, wherein generating the sequence of 3D geometries comprises: generating, via a self-attention layer included in the decoder neural network, a first plurality of outputs based on relative distances between pairs of position-encoded representations of the capture code included in the plurality of position-encoded representations of the capture code; and applying an encoder-decoder attention layer included in the decoder neural network to the first plurality of outputs and the one or more latent vectors to generate a second plurality of outputs; and generating the sequence of 3D geometries based on the second plurality of outputs.

9. The computer-implemented method of claim 8, wherein the decoder neural network further comprises an addition and normalization layer and a feed-forward layer.

10. The computer-implemented method of claim 1, wherein the animation comprises at least one of a facial performance or a full-body performance.

11. One or more non-transitory computer readable media storing instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of: converting, via an encoder neural network, one or more input three-dimensional (3D) geometries corresponding to one or more frames within an animation into one or more latent vectors; combining (i) a capture code that represents one or more attributes of a sequence of 3D geometries with (ii) a plurality of position encodings that represent a plurality of time steps within the animation to produce a plurality of position-encoded representations of the capture code; generating, via a decoder neural network, the sequence of 3D geometries based on input that includes (i) the one or more latent vectors and (ii) the plurality of position-encoded representations of the capture code, wherein each 3D geometry included in the sequence of 3D geometries corresponds to (i) a different time step included in the plurality of time steps and (ii) a different frame included in a sequence of frames within the animation; and causing output related to the animation to be generated based on the sequence of 3D geometries.

12. The one or more non-transitory computer readable media of claim 11, wherein the instructions further cause the one or more processors to perform the step of training the encoder neural network and the decoder neural network based on a training dataset that includes a plurality of sequences of sampled 3D geometries and a discriminator neural network, wherein each sequence of sampled 3D geometries included in the plurality of sequences of sampled 3D geometries comprises a plurality of sampled 3D geometries corresponding to a plurality of time steps within a geometric representation of one or more movements.

13. The one or more non-transitory computer readable media of claim 12, wherein the instructions further cause the one or more processors to perform the step of determining the capture code based on one or more capture codes included in a plurality of capture codes associated with the plurality of sequences of sampled 3D geometries.

14. The one or more non-transitory computer readable media of claim 13, wherein determining the capture code comprises at least one of: selecting the capture code from the plurality of capture codes; or interpolating between two or more capture codes included in the plurality of capture codes.

15. The one or more non-transitory computer readable media of claim 12, wherein the encoder neural network and the decoder neural network are included in a transformer neural network.

16. The one or more non-transitory computer readable media of claim 11, wherein converting the one or more input 3D geometries into the one or more latent vectors comprises: generating one or more input representations based on a combination of the one or more input 3D geometries with one or more position encodings that are included in the plurality of position encodings and represent one or more time steps corresponding to the one or more frames within the animation; and applying a series of one or more encoder blocks to the one or more input representations to generate the one or more latent vectors.

17. The one or more non-transitory computer readable media of claim 11, wherein generating the sequence of 3D
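As an informal illustration of two steps recited in the claims, the sketch below shows one way a capture code might be obtained by interpolating between two existing capture codes, and then combined with per-time-step position encodings to form the decoder inputs. It is not taken from the patent. The sinusoidal encoding, the element-wise addition used as the "combining" step, the linear interpolation, and all function names (`position_encoding`, `interpolate_capture_codes`, `position_encode_capture_code`) are assumptions chosen for illustration only.

```python
import math


def position_encoding(t, dim):
    """Sinusoidal encoding of time step t (an assumed, commonly used scheme)."""
    enc = []
    for i in range(dim):
        # Each pair of dimensions shares one frequency.
        freq = 1.0 / (10000 ** ((2 * (i // 2)) / dim))
        angle = t * freq
        enc.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return enc


def interpolate_capture_codes(code_a, code_b, alpha):
    """Linear blend of two capture codes (assumption: linear interpolation).

    alpha = 0 returns code_a; alpha = 1 returns code_b.
    """
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(code_a, code_b)]


def position_encode_capture_code(capture_code, num_steps):
    """Return one position-encoded copy of the capture code per time step.

    Combining by element-wise addition is an assumption; the claims only
    say the capture code and position encodings are combined.
    """
    dim = len(capture_code)
    return [
        [c + p for c, p in zip(capture_code, position_encoding(t, dim))]
        for t in range(num_steps)
    ]


# Hypothetical capture codes, standing in for learned per-sequence vectors.
code_a = [0.0, 1.0, 0.0, 1.0]
code_b = [1.0, 0.0, 1.0, 0.0]

capture_code = interpolate_capture_codes(code_a, code_b, 0.5)
decoder_inputs = position_encode_capture_code(capture_code, num_steps=3)
```

In this sketch, `decoder_inputs` holds one vector per time step, each to be paired with a different output frame. A real decoder would attend over these vectors together with the encoder's latent vectors, which is outside the scope of this example.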
Combinations of networks · CPC title
Geometric image transformations in the plane of the image · CPC title
Non-supervised learning, e.g. competitive learning · CPC title
using neural networks · CPC title
Artificial neural networks [ANN] · CPC title