Systems and methods for automated movie generation and editing

US2026099978A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2026099978-A1
Application number: US-202519348764-A
Country: US
Kind code: A1
Filing date: Oct 2, 2025
Priority date: Oct 3, 2024
Publication date: Apr 9, 2026
Grant date: (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method to generate a video includes receiving an input describing a scene. The method also includes receiving a reference image depicting a character. The method further includes generating, via an encoder, embeddings of identity features of the reference image. The method also includes generating, via a video generation model, the video in which the character appears with consistent likeness across multiple frames in accordance with the embeddings and the text prompt.
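
The data flow described in the abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the disclosed implementation: `encode_identity`, `encode_prompt`, and `condition_frames` are hypothetical names, a fixed random projection and a character-bucket embedding stand in for trained encoders, and the embedding width is arbitrary. What it shows is that one identity embedding, computed once from the reference image, is reused unchanged for every frame, which is one plausible way a video generation model could keep a character's likeness consistent.

```python
import numpy as np

EMBED_DIM = 16  # hypothetical embedding width


def encode_identity(image: np.ndarray, seed: int = 0) -> np.ndarray:
    """Stand-in encoder: fixed random projection of the flattened image, L2-normalized."""
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((image.size, EMBED_DIM))
    emb = image.reshape(-1) @ proj
    return emb / max(np.linalg.norm(emb), 1e-12)


def encode_prompt(prompt: str) -> np.ndarray:
    """Stand-in text encoder: counts character codes bucketed into EMBED_DIM bins."""
    emb = np.zeros(EMBED_DIM)
    for ch in prompt:
        emb[ord(ch) % EMBED_DIM] += 1.0
    return emb / max(np.linalg.norm(emb), 1e-12)


def condition_frames(identity: np.ndarray, text: np.ndarray, num_frames: int = 4) -> np.ndarray:
    """Build one conditioning vector per frame.

    Every frame receives the same identity and text embeddings; only the
    trailing time component changes. A real video generation model would
    consume these vectors; this sketch stops at the conditioning step.
    """
    rows = []
    for t in range(num_frames):
        time_signal = np.array([t / max(num_frames - 1, 1)])
        rows.append(np.concatenate([identity, text, time_signal]))
    return np.stack(rows)


# Hypothetical inputs: a random 8x8 RGB "reference image" and a scene description.
reference_image = np.random.default_rng(1).random((8, 8, 3))
identity = encode_identity(reference_image)
text = encode_prompt("a knight walks through a forest")
conditioning = condition_frames(identity, text)
print(conditioning.shape)  # (4, 33)
```

The first `EMBED_DIM` entries of every row are identical, so any downstream model sees the same identity signal in each frame while the scene description and time index vary independently.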

First claim

Opening claim text (preview).

What is claimed is:

1. A method to generate a video, the method comprising: receiving an input describing a scene; receiving a reference image depicting a character; generating, via an encoder, embeddings of identity features of the reference image; and generating, via a video generation model, the video in which the character appears with consistent likeness across multiple frames in accordance with the embeddings and the text prompt.
2. The method of claim 1, further comprising: generating, via a transformer, a joint multimodal embedding sequence based on concatenating the embeddings with text prompt embeddings associated with the text prompt.
3. The method of claim 2, further comprising: projecting the embeddings into a common latent space dimension of the video generation model prior to the concatenation with the text prompt embeddings.
4. The method of claim 2, wherein the embeddings are concatenated with the text prompt embeddings via a learned gating mechanism that dynamically weights identity features relative to textual features.
5. The method of claim 1, wherein the embeddings are injected into a cross-attention layer of the video generation model to condition hidden representations derived from the text prompt.
6. The method of claim 1, further comprising: generating multiple scenes with different inputs while maintaining the consistent likeness of the character across all scenes.
7. The method of claim 1, wherein maintaining the consistent likeness comprises preserving one or more of facial expressions, hairstyle, clothing, or other distinguishing features of the reference image.
8. An apparatus to generate a video, the apparatus comprising: one or more processors; and one or more memories coupled with the one or more processors and storing processor-executable code that, when executed by the one or more processors, is configured to cause the apparatus to: receive an input describing a scene; receive a reference image depicting a character; generate, via an encoder, embeddings of identity features of the reference image; and generate, via a video generation model, the video in which the character appears with consistent likeness across multiple frames in accordance with the embeddings and the text prompt.
9. The apparatus of claim 8, wherein execution of the processor-executable code further causes the apparatus to generate, via a transformer, a joint multimodal embedding sequence based on concatenating the embeddings with text prompt embeddings associated with the text prompt.
10. The apparatus of claim 9, wherein execution of the processor-executable code further causes the apparatus to project the embeddings into a common latent space dimension of the video generation model prior to the concatenation with the text prompt embeddings.
11. The apparatus of claim 9, wherein the embeddings are concatenated with the text prompt embeddings via a learned gating mechanism that dynamically weights identity features relative to textual features.
12. The apparatus of claim 8, wherein the embeddings are injected into a cross-attention layer of the video generation model to condition hidden representations derived from the text prompt.
13. The apparatus of claim 8, wherein execution of the processor-executable code further causes the apparatus to generate multiple scenes with different inputs while maintaining the consistent likeness of the character across all scenes.
14. The apparatus of claim 8, wherein execution of the processor-executable code that causes the apparatus to maintain the consistent likeness further causes the apparatus to preserve one or more of facial expressions, hairstyle, clothing, or other distinguishing features of the reference image.
15. A non-transitory computer-readable medium having program code recorded thereon to generate a video, the program code executed by one or more processors and comprising: program code to receive an input describing a scene; program code to receive a reference image depicting a character; program code to generate, via an encoder, embeddings of identity features of the reference image; and program code to generate, via a video generation model, the video in which the character appears with consistent likeness across multiple frames in accordance with the embeddings and the text prompt.
16. The non-transitory computer-readable medium of claim 15, wherein the program code further comprises program code to generate, via a transformer, a joint multimodal embedding sequence based on concatenating the embeddings with text prompt embeddings associated with the text prompt.
17. The non-transitory computer-readable medium of claim 16, wherein the program code further comprises program code to project the embeddings into a common latent space dimension of the video generation model prior to the concatenation with the text prompt embeddings.
18. The non-transitory computer-readable medium of claim 16, wherein the embeddings are concatenated with the text prompt embeddings via a learned gating mechanism that dynamically weights identity features relative to textual features.
19. The non-transitory computer-readable medium of claim 15, wherein the embeddings are injected into a cross-attention layer of the video generation model to condition hidden representations derived from the text prompt.
20. The non-transitory computer-readable medium of claim 15, wherein the program code further comprises program code to generate multiple scenes with different inputs while maintaining the consistent likeness of the character across all scenes.
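
The conditioning mechanisms named in the dependent claims (projection into a common latent dimension, gated concatenation with text prompt embeddings, and cross-attention injection) can be illustrated with a small NumPy sketch. Everything here is an assumption made for illustration: the function names, the tensor sizes, the scalar sigmoid form of the gate, and the residual form of the cross-attention are not taken from the claims, and random matrices stand in for learned weights. The text tokens also stand in for the hidden representations derived from the text prompt.

```python
import numpy as np

rng = np.random.default_rng(0)
ID_DIM, MODEL_DIM = 8, 16  # hypothetical encoder width and model latent width


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def project(identity_tokens, w_proj):
    """Projection step (sketch): map identity embeddings to the model's latent width."""
    return identity_tokens @ w_proj


def gated_concat(id_tokens, text_tokens, w_gate, b_gate):
    """Gated concatenation (sketch): a scalar gate in (0, 1), computed from both
    modalities, weights identity tokens against text tokens before they are
    joined into one sequence."""
    summary = np.concatenate([id_tokens.mean(axis=0), text_tokens.mean(axis=0)])
    gate = sigmoid(summary @ w_gate + b_gate)
    joint = np.concatenate([gate * id_tokens, (1.0 - gate) * text_tokens], axis=0)
    return joint, gate


def cross_attention(hidden, id_tokens, w_q, w_k, w_v):
    """Cross-attention injection (sketch): hidden states act as queries over the
    identity tokens, and the attended values are added back as a residual."""
    q, k, v = hidden @ w_q, id_tokens @ w_k, id_tokens @ w_v
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return hidden + weights @ v, weights


# Random stand-ins for encoder outputs and learned parameters.
identity_tokens = rng.standard_normal((4, ID_DIM))   # 4 identity tokens
text_tokens = rng.standard_normal((6, MODEL_DIM))    # 6 text prompt tokens
w_proj = rng.standard_normal((ID_DIM, MODEL_DIM))
w_gate = rng.standard_normal(2 * MODEL_DIM)
w_q, w_k, w_v = (rng.standard_normal((MODEL_DIM, MODEL_DIM)) for _ in range(3))

id_proj = project(identity_tokens, w_proj)
joint, gate = gated_concat(id_proj, text_tokens, w_gate, 0.0)
conditioned, attn = cross_attention(text_tokens, id_proj, w_q, w_k, w_v)
print(joint.shape, conditioned.shape, attn.shape)  # (10, 16) (6, 16) (6, 4)
```

Projecting first is what makes the concatenation well defined: identity and text tokens must share a width before they can form a single sequence. The gated sequence and the cross-attention path are shown side by side only for comparison; the claims present them as separate dependent claims rather than as one combined pipeline.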

Assignees

Meta Platforms Inc

Inventors

Classifications

  • G06T13/40 (Primary)

    of characters, e.g. humans, animals or virtual beings · CPC title

  • Creating or editing images; Combining images with text · CPC title

  • using neural networks · CPC title

  • Labelling scene content, e.g. deriving syntactic or semantic representations · CPC title

  • involving special video data, e.g. 3D video · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2026099978A1 cover?
A method to generate a video includes receiving an input describing a scene. The method also includes receiving a reference image depicting a character. The method further includes generating, via an encoder, embeddings of identity features of the reference image. The method also includes generating, via a video generation model, the video in which the character appears with consistent likeness…
Who is the assignee on this patent?
Meta Platforms Inc
What technology area does this patent fall under?
Primary CPC classification G06T13/40. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 9, 2026 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).