Personalized output generation in generative artificial intelligence models

US2025356171A1 · US · A1

Patent metadata
Publication number: US-2025356171-A1
Application number: US-202418959042-A
Country: US
Kind code: A1
Filing date: Nov 25, 2024
Priority date: May 14, 2024
Publication date: Nov 20, 2025
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques and apparatus for generating visual content according to a textual prompt input into a generative artificial intelligence model. An example method generally includes receiving an input prompt specifying a video output to be generated by a generative artificial intelligence model. Based on a spatial portion of the generative artificial intelligence model and a cross-attention map generated based on the input prompt, a spatial attention map representing a subject of the video output to be generated by the generative artificial intelligence model is generated. Based on a temporal portion of the generative artificial intelligence model and the cross-attention map, a temporal attention map representing motion to be depicted by the subject of the video output to be generated by the generative artificial intelligence model is generated. The video output is generated based on the spatial attention map and the temporal attention map, and the generated video output is output.
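The abstract does not say how the cross-attention map is computed. As a rough illustration only, the sketch below assumes standard scaled dot-product cross-attention, in which each spatial location of a frame acts as a query and each prompt token acts as a key. It then reads the subject token's column off the resulting map as a soft spatial mask of where the subject appears. The function names (`cross_attention_map`, `subject_map`), the toy feature vectors, and the peak-normalization step are hypothetical and are not taken from the patent.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_map(location_feats, token_embs):
    """Scaled dot-product cross-attention weights (assumed formulation).

    location_feats: one query vector per spatial location (hypothetical features).
    token_embs: one key vector per prompt token (hypothetical embeddings).
    Returns weights[location][token]; each row sums to 1.
    """
    scale = 1.0 / math.sqrt(len(token_embs[0]))
    return [
        softmax([scale * sum(q * k for q, k in zip(query, key)) for key in token_embs])
        for query in location_feats
    ]


def subject_map(weights, subject_token):
    # Take the subject token's column and rescale so its peak is 1.0,
    # giving a soft per-location mask of where the subject appears.
    column = [row[subject_token] for row in weights]
    peak = max(column)
    return [v / peak for v in column]


if __name__ == "__main__":
    # Toy example: 4 spatial locations, 2 prompt tokens ("dog", "background").
    tokens = [[1.0, 0.0], [0.0, 1.0]]
    locations = [[2.0, 0.0], [1.5, 0.5], [0.0, 2.0], [0.2, 1.8]]
    w = cross_attention_map(locations, tokens)
    print([round(v, 2) for v in subject_map(w, subject_token=0)])
```

Locations whose features align with the subject token's embedding receive values near 1.0, and background-aligned locations receive small values. A mask of this kind is the sort of signal the abstract describes feeding into both the spatial and temporal portions of the model.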

First claim

Opening claim text (preview).

What is claimed is:

1. A processing system in a device, comprising: a memory configured to store parameters for a generative artificial intelligence model; and one or more processors, coupled to the memory, configured to: receive an input prompt specifying a video output to be generated by the generative artificial intelligence model; generate, based on a spatial portion of the generative artificial intelligence model and an output of a spatial cross-attention block generated based on the input prompt, a spatial attention map representing a subject of the video output to be generated by the generative artificial intelligence model; generate, based on a temporal portion of the generative artificial intelligence model and the output of the spatial cross-attention block, a temporal attention map representing motion to be depicted by the subject of the video output to be generated by the generative artificial intelligence model, the output of the spatial cross-attention block being applied as a mask to intermediate outputs generated within the temporal portion of the generative artificial intelligence model and used to generate the temporal attention map; generate the video output based on the spatial attention map and the temporal attention map; and output the generated video output.

2. The processing system of claim 1, wherein: the spatial cross-attention block is configured to generate the output of the spatial cross-attention block based on the input prompt and the first spatial map; and the spatial portion of the generative artificial intelligence model comprises: a spatial self-attention block configured to generate a first spatial map based on the input prompt, a spatial-domain adaptation block, and a prior output of the spatial cross-attention block generated by the spatial cross-attention block during a previous inferencing round performed using the generative artificial intelligence model; and a spatial feedforward network configured to generate a second spatial map based on the output of the spatial cross-attention block, the prior output of the spatial cross-attention block, and the spatial-domain adaptation block.

3. The processing system of claim 2, wherein the spatial-domain adaptation block comprises a first spatial adapter for an appearance of the subject and a second spatial adapter for motion of the subject.

4. The processing system of claim 1, wherein: the spatial cross-attention block is configured to generate the output of the spatial cross-attention block based on the input prompt and the first spatial map; and the spatial portion of the generative artificial intelligence model comprises: a spatial self-attention block configured to generate a first spatial map based on the input prompt, a spatial-domain adaptation block, and a prior output of the spatial cross-attention block generated by the spatial cross-attention block during a previous inferencing round performed using the generative artificial intelligence model applied as a mask to an input into the spatial-domain adaptation block; and a spatial feedforward network configured to generate a second spatial map based on the output of the spatial cross-attention block generated by the spatial cross-attention block, the prior output of the spatial cross-attention block, and the spatial-domain adaptation block, wherein the input into the spatial-domain adaptation block comprises the output of the spatial cross-attention block masked based on the prior output of the spatial cross-attention block.

5. The processing system of claim 1, wherein the temporal portion of the generative artificial intelligence model comprises: a temporal self-attention block configured to generate a first temporal map based on a time-domain adaptation block, a prior output of the spatial cross-attention block, and the input prompt; and a temporal feedforward network configured to generate a second temporal map based on the first temporal map, the prior output of the spatial cross-attention block, and the time-domain adaptation block.

6. The processing system of claim 1, wherein the spatial portion of the generative artificial intelligence model is configured to customize an appearance of the subject of the video output independently of motion performed by the subject of the video output.

7. The processing system of claim 1, wherein the temporal portion of the generative artificial intelligence model comprises a time-domain adaptation block for motion of the subject.

8. The processing system of claim 1, wherein background content in the generated video output is different from background content in images in a training data set used to train the generative artificial intelligence model depicting one of the subject of the video output or the motion of the subject.

9. The processing system of claim 1, further comprising a display configured to display the generated video output.

10. A processor-implemented method for machine learning, comprising: receiving an input prompt specifying a video output to be generated by a generative artificial intelligence model; generating, based on a spatial portion of the generative artificial intelligence model and an output of a spatial cross-attention block generated based on the input prompt, a spatial attention map representing a subject of the video output to be generated by the generative artificial intelligence model; generating, based on a temporal portion of the generative artificial intelligence model and the output of the spatial cross-attention block, a temporal attention map representing motion to be depicted by the subject of the video output to be generated by the generative artificial intelligence model, the output of the spatial cross-attention block being applied as a mask to intermediate outputs generated within the temporal portion of the generative artificial intelligence model and used to generate the temporal attention map; generating the video output based on the spatial attention map and the temporal attention map; and outputting the generated video output.

11. The method of claim 10, wherein: the spatial cross-attention block is configured to generate the output of the spatial cross-attention block based on the input prompt and the first spatial map; and the spatial portion of the generative artificial intelligence model comprises: a spatial self-attention block configured to generate a first spatial map based on the input prompt, a spatial-domain adaptation block, and a prior output of the spatial cross-attention block generated by the spatial cross-attention block during a previous inferencing round performed using the generative artificial intelligence model; and a spatial feedforward network configured to generate a second spatial map based on the output of the spatial cross-attention block generated by the spatial cross-attention block, the prior output of the spatial cross-attention block, and the spatial-domain adaptation block.

12. The method of claim 11, wherein the spatial-domain adaptation block comprises a first spatial adapter for an appearance of the subject and a second spatial adapter for motion of the subject.

13. The method of claim 10, wherein: the spatial cross-attention block is configured to generate the output of the spatial cross-attention block based on the input prompt and the first spatial map; and the spatial portion of the generative artificial intelligence model comprises: a spatial self-attention block configured to generate a first spatial map based on the input prompt, a spatial-domain adaptation block, and a prior output of the spatial cross
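Claims 1 and 10 recite that the output of the spatial cross-attention block is applied as a mask to intermediate outputs inside the temporal portion of the model. The sketch below shows one way that could look, under several assumptions. It uses per-location self-attention across frames, a common temporal-attention layout in video generation models that this page does not confirm. It omits learned query/key/value projections for brevity. It also treats the mask as one scalar in [0, 1] per spatial location that scales the attended features. The names `temporal_self_attention` and `masked_temporal_map` and the toy data are hypothetical.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_self_attention(track):
    """Self-attention across frames for a single spatial location.

    track: list of per-frame feature vectors at that location. Queries, keys,
    and values are the raw features (no learned projections, an assumption).
    Returns one attended vector per frame: the intermediate output.
    """
    dim = len(track[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for query in track:
        weights = softmax([scale * sum(q * k for q, k in zip(query, key)) for key in track])
        out.append([sum(w * v[d] for w, v in zip(weights, track)) for d in range(dim)])
    return out


def masked_temporal_map(video, subject_mask):
    """video[f][loc] is a feature vector; subject_mask[loc] is in [0, 1].

    The intermediate temporal-attention outputs at each location are scaled
    by that location's mask value, so motion is modelled mainly where the
    subject is and suppressed on the background.
    """
    frames, locations = len(video), len(video[0])
    result = [[None] * locations for _ in range(frames)]
    for loc in range(locations):
        track = [video[f][loc] for f in range(frames)]
        attended = temporal_self_attention(track)
        for f in range(frames):
            result[f][loc] = [subject_mask[loc] * x for x in attended[f]]
    return result


if __name__ == "__main__":
    # 3 frames x 2 locations with 2-dim features; location 0 is the "subject".
    video = [
        [[1.0, 0.0], [0.5, 0.5]],
        [[0.8, 0.2], [0.5, 0.5]],
        [[0.6, 0.4], [0.5, 0.5]],
    ]
    out = masked_temporal_map(video, subject_mask=[1.0, 0.0])
    print(out[1][0], out[1][1])
```

Scaling the intermediate outputs, rather than the input frames, leaves the temporal attention weights unchanged. It only gates how much each location's attended result contributes. The claim text shown here does not specify which intermediate outputs of the temporal portion are masked.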

Assignees

Qualcomm Inc

Inventors

Classifications (CPC)

  • Convolutional networks [CNN, ConvNet]

  • Auto-encoder networks; Encoder-decoder networks

  • G06N3/0475 (primary): Generative networks

  • Combinations of networks

  • Content authoring

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2025356171A1 cover?
Techniques and apparatus for generating visual content according to a textual prompt input into a generative artificial intelligence model. An example method generally includes receiving an input prompt specifying a video output to be generated by a generative artificial intelligence model. Based on a spatial portion of the generative artificial intelligence model and a cross-attention map gene…
Who is the assignee on this patent?
Qualcomm Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/0475. Mapped technology areas include Physics.
When was this patent published?
Publication date: November 20, 2025 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).