Systems and methods to optimize video streaming using digital avatars
US-12167169-B1 · Dec 10, 2024 · US
US12568288B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12568288-B2 |
| Application number | US-202418584210-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 22, 2024 |
| Priority date | Oct 30, 2023 |
| Publication date | Mar 3, 2026 |
| Grant date | Mar 3, 2026 |
Official abstract text for this publication.
Systems and methods include generating synthetic videos based on a custom motion. A video generation system obtains a text prompt including an object and a custom motion token. The custom motion token represents a custom motion. The system encodes the text prompt to obtain a text embedding. Subsequently, the system generates, using a video generation model, a synthetic video depicting the object performing the custom motion based on the text embedding.
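The data flow in the abstract (prompt with a custom motion token, text encoding, generation conditioned on the embedding) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and none of it comes from the patent: the `<motion>` placeholder string, the averaging "encoder", the 4-dimensional features, and the hand-written `toy_denoise_step`, which merely stands in for a learned denoising network.

```python
import random

EMBED_DIM = 4
CUSTOM_MOTION_TOKEN = "<motion>"  # hypothetical placeholder token, not from the patent


def build_vocab(words, seed=0):
    """Assign each token a fixed pseudo-random embedding vector."""
    rng = random.Random(seed)
    return {w: [rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for w in words}


def encode_prompt(prompt, vocab):
    """Toy text encoder: average the embeddings of the prompt's tokens."""
    vectors = [vocab[token] for token in prompt.split()]
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(EMBED_DIM)]


def toy_denoise_step(frames, text_embedding, step, total_steps):
    """Stand-in for a learned denoiser: move each frame's features toward a
    frame-dependent target derived from the text embedding."""
    blend = 1.0 / (total_steps - step)  # the final step lands on the target
    updated = []
    for t, frame in enumerate(frames):
        # The target differs per frame, so frames end up in different "poses".
        target = [e * (t + 1) for e in text_embedding]
        updated.append([f + blend * (g - f) for f, g in zip(frame, target)])
    return updated


def generate_video(prompt, vocab, num_frames=3, steps=10, seed=1):
    """Start from Gaussian noise and denoise it, conditioned on the prompt."""
    rng = random.Random(seed)
    embedding = encode_prompt(prompt, vocab)
    frames = [[rng.gauss(0, 1) for _ in range(EMBED_DIM)] for _ in range(num_frames)]
    for step in range(steps):
        frames = toy_denoise_step(frames, embedding, step, steps)
    return frames


vocab = build_vocab(["a", "dog", "doing", CUSTOM_MOTION_TOKEN])
frames = generate_video("a dog doing " + CUSTOM_MOTION_TOKEN, vocab)
```

The only point of the sketch is the ordering of the steps: the custom motion token is looked up like any other token, its vector contributes to the text embedding, and that embedding steers every denoising step across all frames.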
Opening claim text (preview).
What is claimed is:
1. A method comprising: obtaining a text prompt including an object and a custom motion token, wherein the custom motion token represents a custom motion; encoding the text prompt to obtain a text embedding, wherein the text embedding represents the custom motion in an embedding space; and generating, using a video generation model, a synthetic video by denoising image features based on the text embedding, wherein the synthetic video depicts the object performing the custom motion across a plurality of frames, and wherein the plurality of frames includes a first frame depicting the object with a first pose of the custom motion and a second frame depicting the object with a second pose of the custom motion different from the first pose.
2. The method of claim 1, wherein: the text prompt includes a custom object token representing a custom object; and the synthetic video depicts the custom object performing the custom motion.
3. The method of claim 1, wherein: the text prompt includes a nonce character representing the custom motion.
4. The method of claim 1, further comprising: optimizing the custom motion token using the video generation model.
5. The method of claim 1, further comprising: modifying the custom motion token based on the text prompt to obtain a contextualized custom motion token, wherein the synthetic video is based on the contextualized custom motion token.
6. The method of claim 1, wherein: the video generation model is trained using a training set including a plurality of videos depicting a plurality of entities performing the custom motion, respectively.
7. The method of claim 6, wherein: the training set includes a plurality of videos depicting plurality of entities having a custom appearance, respectively, and wherein the custom appearance corresponds to a custom object token.
8. A method of training a machine learning model, the method comprising: obtaining a training set including a plurality of video clips depicting a plurality of different entities performing a custom motion, respectively; and training, using the training set, a video generation model to generate synthetic videos by denoising image features based on a text embedding that represents the custom motion in an embedding space, wherein the synthetic videos depict an arbitrary entity performing the custom motion based on a custom motion token in a text prompt, wherein the custom motion is performed across a plurality of frames, and wherein the plurality of frames includes a first frame depicting the object with a first pose of the custom motion and a second frame depicting the object with a second pose of the custom motion different from the first pose.
9. The method of claim 8, wherein training the video generation model comprises: training the video generation model to generate synthetic videos depicting an entity having a custom appearance based on a custom object token in the text prompt.
10. The method of claim 8, wherein obtaining the training set comprises: augmenting the training set by generating one or more additional video clips based on the plurality of video clips.
11. The method of claim 8, wherein obtaining the training set comprises: augmenting the training set by generating one or more captions for each of the plurality of video clips, respectively, wherein each of the one or more captions includes the custom motion token.
12. The method of claim 8, wherein training the video generation model comprises: tuning the video generation model using a reconstruction loss and a regularization loss, wherein the regularization loss reduces an association between the custom motion and one or more objects performing the custom motion the plurality of video clips.
13. The method of claim 8, wherein training the video generation model comprises: fixing one or more appearance layers and updating one or more temporal layers of the video generation model.
14. An apparatus comprising: at least one processor; at least one memory storing instructions executable by the at least one processor; and a video generation model comprising parameters stored in the at least one memory and trained to generate a synthetic video by denoising image features based on a text embedding that represents a custom motion in an embedding space, wherein the synthetic video depicts an arbitrary entity performing the custom motion, where the custom motion is identified by a custom motion token within a text prompt, wherein the custom motion is performed across a plurality of frames, and wherein the plurality of frames includes a first frame depicting the object with a first pose of the custom motion and a second frame depicting the object with a second pose of the custom motion different from the first pose.
15. The apparatus of claim 14, further comprising: a user interface configured to obtain the text prompt.
16. The apparatus of claim 14, wherein: the video generation model comprises a diffusion model.
17. The apparatus of claim 14, wherein: the video generation model comprises a transformer model.
18. The apparatus of claim 14, further comprising: a text encoder configured to encode the text prompt to obtain a text embedding.
19. The apparatus of claim 14, further comprising: an embedding optimizer configured to contextualize the custom motion token.
20. The apparatus of claim 14, further comprising: a training component configured to train the video generation model based on training data.
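Claim 13 recites fixing appearance layers while updating temporal layers. A minimal sketch of that idea, under loud assumptions: a two-parameter scalar model stands in for the real network, the names `appearance` and `temporal` are hypothetical labels for the two parameter groups, and a plain mean-squared reconstruction loss is used. The regularization loss of claim 12 is not modeled, because its form is not given in this record.

```python
def predict(params, x, t):
    """Toy model: one 'appearance' weight on x and one 'temporal' weight on t."""
    return params["appearance"] * x + params["temporal"] * t


def mse(params, data):
    """Mean-squared reconstruction error over (x, t, y) samples."""
    return sum((predict(params, x, t) - y) ** 2 for x, t, y in data) / len(data)


def train_temporal_only(params, data, lr=0.05, epochs=200):
    """Gradient descent that updates only the temporal weight.
    The appearance weight is never touched, i.e. it stays fixed."""
    params = dict(params)  # work on a copy so the caller's dict is not mutated
    for _ in range(epochs):
        # d(mse)/d(temporal) for the linear toy model above
        grad = sum(2 * (predict(params, x, t) - y) * t for x, t, y in data) / len(data)
        params["temporal"] -= lr * grad
    return params


# Synthetic samples generated from y = 1.0 * x + 2.0 * t (illustrative values only).
data = [(1.0, 0.0, 1.0), (1.0, 1.0, 3.0), (2.0, 2.0, 6.0), (0.0, 3.0, 6.0)]
initial = {"appearance": 1.0, "temporal": 0.0}
trained = train_temporal_only(initial, data)
```

In a real video model the same pattern would typically be expressed by excluding the appearance parameters from the optimizer or disabling their gradients; the scalar version above only shows that the loss can fall while one parameter group remains unchanged.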
for displaying subtitles · CPC title
of characters, e.g. humans, animals or virtual beings · CPC title
Animation · CPC title
Lexical analysis, e.g. tokenisation or collocates · CPC title
involving special video data, e.g 3D video · CPC title