What technology area does this patent fall under?

Primary CPC classification H04N21/816. Mapped technology areas include Electricity.

When was this patent published?

Publication date: Mar 3, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Customizing motion and appearance in video generation

US12568288B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12568288-B2
Application number	US-202418584210-A
Country	US
Kind code	B2
Filing date	Feb 22, 2024
Priority date	Oct 30, 2023
Publication date	Mar 3, 2026
Grant date	Mar 3, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods include generating synthetic videos based on a custom motion. A video generation system obtains a text prompt including an object and a custom motion token. The custom motion token represents a custom motion. The system encodes the text prompt to obtain a text embedding. Subsequently, a video generation model generates a synthetic video depicting the object performing the custom motion based on the text embedding using a video generation model.
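The prompt-encoding step in the abstract can be illustrated with a toy sketch. This is not the patented implementation: the nonce token `<motion*>`, the whitespace tokenizer, the four-dimensional vectors, and the function names are all hypothetical stand-ins for a real text encoder. It only shows how a custom motion token might occupy its own vocabulary entry, so the same motion vector is reused whichever object appears in the prompt.

```python
import random

# Hypothetical nonce token standing in for the custom motion (not from the patent).
MOTION_TOKEN = "<motion*>"
EMBED_DIM = 4  # toy size; real text encoders use far larger embeddings


def build_embedding_table(vocab, dim=EMBED_DIM, seed=0):
    """Assign each vocabulary entry a fixed pseudo-random vector."""
    rng = random.Random(seed)
    return {word: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for word in vocab}


def encode_prompt(prompt, table):
    """Whitespace-tokenize the prompt and look up one vector per token."""
    tokens = prompt.lower().split()
    unknown = [t for t in tokens if t not in table]
    if unknown:
        raise KeyError(f"tokens missing from vocabulary: {unknown}")
    return tokens, [table[t] for t in tokens]


vocab = ["a", "dog", "robot", "doing", "the", "motion", MOTION_TOKEN]
table = build_embedding_table(vocab)

# Two prompts with different objects share the same custom motion token.
tokens, embedding = encode_prompt(f"A dog doing the {MOTION_TOKEN} motion", table)
tokens2, embedding2 = encode_prompt(f"A robot doing the {MOTION_TOKEN} motion", table)
```

In this sketch the vector at the motion token's position is identical in both prompts, which is the property that would let a downstream generator apply one learned motion to different objects.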

First claim

Opening claim text (preview).

What is claimed is:
1. A method comprising: obtaining a text prompt including an object and a custom motion token, wherein the custom motion token represents a custom motion; encoding the text prompt to obtain a text embedding, wherein the text embedding represents the custom motion in an embedding space; and generating, using a video generation model, a synthetic video by denoising image features based on the text embedding, wherein the synthetic video depicts the object performing the custom motion across a plurality of frames, and wherein the plurality of frames includes a first frame depicting the object with a first pose of the custom motion and a second frame depicting the object with a second pose of the custom motion different from the first pose.
2. The method of claim 1, wherein: the text prompt includes a custom object token representing a custom object; and the synthetic video depicts the custom object performing the custom motion.
3. The method of claim 1, wherein: the text prompt includes a nonce character representing the custom motion.
4. The method of claim 1, further comprising: optimizing the custom motion token using the video generation model.
5. The method of claim 1, further comprising: modifying the custom motion token based on the text prompt to obtain a contextualized custom motion token, wherein the synthetic video is based on the contextualized custom motion token.
6. The method of claim 1, wherein: the video generation model is trained using a training set including a plurality of videos depicting a plurality of entities performing the custom motion, respectively.
7. The method of claim 6, wherein: the training set includes a plurality of videos depicting plurality of entities having a custom appearance, respectively, and wherein the custom appearance corresponds to a custom object token.
8. A method of training a machine learning model, the method comprising: obtaining a training set including a plurality of video clips depicting a plurality of different entities performing a custom motion, respectively; and training, using the training set, a video generation model to generate synthetic videos by denoising image features based on a text embedding that represents the custom motion in an embedding space, wherein the synthetic videos depict an arbitrary entity performing the custom motion based on a custom motion token in a text prompt, wherein the custom motion is performed across a plurality of frames, and wherein the plurality of frames includes a first frame depicting the object with a first pose of the custom motion and a second frame depicting the object with a second pose of the custom motion different from the first pose.
9. The method of claim 8, wherein training the video generation model comprises: training the video generation model to generate synthetic videos depicting an entity having a custom appearance based on a custom object token in the text prompt.
10. The method of claim 8, wherein obtaining the training set comprises: augmenting the training set by generating one or more additional video clips based on the plurality of video clips.
11. The method of claim 8, wherein obtaining the training set comprises: augmenting the training set by generating one or more captions for each of the plurality of video clips, respectively, wherein each of the one or more captions includes the custom motion token.
12. The method of claim 8, wherein training the video generation model comprises: tuning the video generation model using a reconstruction loss and a regularization loss, wherein the regularization loss reduces an association between the custom motion and one or more objects performing the custom motion the plurality of video clips.
13. The method of claim 8, wherein training the video generation model comprises: fixing one or more appearance layers and updating one or more temporal layers of the video generation model.
14. An apparatus comprising: at least one processor; at least one memory storing instructions executable by the at least one processor; and a video generation model comprising parameters stored in the at least one memory and trained to generate a synthetic video by denoising image features based on a text embedding that represents a custom motion in an embedding space, wherein the synthetic video depicts an arbitrary entity performing the custom motion, where the custom motion is identified by a custom motion token within a text prompt, wherein the custom motion is performed across a plurality of frames, and wherein the plurality of frames includes a first frame depicting the object with a first pose of the custom motion and a second frame depicting the object with a second pose of the custom motion different from the first pose.
15. The apparatus of claim 14, further comprising: a user interface configured to obtain the text prompt.
16. The apparatus of claim 14, wherein: the video generation model comprises a diffusion model.
17. The apparatus of claim 14, wherein: the video generation model comprises a transformer model.
18. The apparatus of claim 14, further comprising: a text encoder configured to encode the text prompt to obtain a text embedding.
19. The apparatus of claim 14, further comprising: an embedding optimizer configured to contextualize the custom motion token.
20. The apparatus of claim 14, further comprising: a training component configured to train the video generation model based on training data.
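Claims 12 and 13 describe a tuning setup: a reconstruction loss combined with a regularization loss, and updates limited to temporal layers while appearance layers stay fixed. The toy sketch below is one reading of that, under several assumptions not found in the claims: layer names with `spatial.` and `temporal.` prefixes (treating spatial layers as the appearance layers), scalar parameters in place of tensors, a weighted sum as the way the two losses combine, and plain gradient descent as the optimizer.

```python
# Hypothetical parameter names; a real model would hold tensors, not floats.
params = {
    "spatial.conv1": 0.50,
    "spatial.attn1": -0.20,
    "temporal.attn1": 0.10,
    "temporal.conv1": 0.30,
}


def is_trainable(name):
    """Only temporal layers are updated; appearance (spatial) layers stay fixed."""
    return name.startswith("temporal.")


def total_loss(reconstruction, regularization, reg_weight=0.5):
    """Combine the two loss terms; the weighted sum and reg_weight are assumptions."""
    return reconstruction + reg_weight * regularization


def sgd_step(params, grads, lr=0.1):
    """Apply one gradient-descent step to trainable parameters only."""
    return {
        name: (value - lr * grads[name]) if is_trainable(name) else value
        for name, value in params.items()
    }


grads = {name: 1.0 for name in params}  # placeholder gradients, not real values
updated = sgd_step(params, grads)
loss = total_loss(reconstruction=0.8, regularization=0.4)
```

After the step, the `spatial.*` entries are unchanged and only the `temporal.*` entries have moved, which mirrors the fix-and-update split in claim 13.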

Assignees

Adobe Inc

Inventors

Classifications

H04N21/4884
for displaying subtitles · CPC title
G06T13/40
of characters, e.g. humans, animals or virtual beings · CPC title
G06T13/00
Animation · CPC title
G06F40/284
Lexical analysis, e.g. tokenisation or collocates · CPC title
H04N21/816 · Primary
involving special video data, e.g. 3D video · CPC title

Patent family

Related publications grouped by family.

View patent family 93014485

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12568288B2 cover?: Systems and methods include generating synthetic videos based on a custom motion. A video generation system obtains a text prompt including an object and a custom motion token. The custom motion token represents a custom motion. The system encodes the text prompt to obtain a text embedding. Subsequently, a video generation model generates a synthetic video depicting the object performing the cu…
Who is the assignee on this patent?: Adobe Inc