US2025356171A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2025356171-A1 |
| Application number | US-202418959042-A |
| Country | US |
| Kind code | A1 |
| Filing date | Nov 25, 2024 |
| Priority date | May 14, 2024 |
| Publication date | Nov 20, 2025 |
| Grant date | — |
Official abstract text for this publication.
Techniques and apparatus for generating visual content according to a textual prompt input into a generative artificial intelligence model. An example method generally includes receiving an input prompt specifying a video output to be generated by a generative artificial intelligence model. Based on a spatial portion of the generative artificial intelligence model and a cross-attention map generated based on the input prompt, a spatial attention map representing a subject of the video output to be generated by the generative artificial intelligence model is generated. Based on a temporal portion of the generative artificial intelligence model and the cross-attention map, a temporal attention map representing motion to be depicted by the subject of the video output to be generated by the generative artificial intelligence model is generated. The video output is generated based on the spatial attention map and the temporal attention map, and the generated video output is output.
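The flow described in the abstract can be illustrated with a toy example. The following is a minimal sketch and not the implementation disclosed in the publication: it assumes plain dot-product attention over small Python lists, a hypothetical min-max threshold that turns the subject token's cross-attention weights into a binary mask, and hypothetical function names (`cross_attention`, `subject_mask`, `masked_temporal_attention`). It shows a spatial cross-attention map being reused as a mask on an intermediate output of attention across frames.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attention(frame, tokens):
    """frame: P position vectors; tokens: T prompt-token vectors.
    Returns P rows of T weights; each row sums to 1."""
    scale = math.sqrt(len(tokens[0]))
    return [softmax([dot(p, t) / scale for t in tokens]) for p in frame]

def subject_mask(attn, subject_token, threshold=0.5):
    """Assumption: min-max normalise the subject token's column and threshold it."""
    col = [row[subject_token] for row in attn]
    lo, hi = min(col), max(col)
    span = (hi - lo) or 1.0
    return [1.0 if (v - lo) / span >= threshold else 0.0 for v in col]

def masked_temporal_attention(video, masks):
    """video: F frames x P positions x D dims; masks: F x P.
    Attends across frames at each position, then zeroes the
    intermediate output wherever the spatial mask is 0."""
    n_frames, n_pos, dim = len(video), len(video[0]), len(video[0][0])
    scale = math.sqrt(dim)
    out = [[None] * n_pos for _ in range(n_frames)]
    for p in range(n_pos):
        track = [video[f][p] for f in range(n_frames)]
        for f in range(n_frames):
            w = softmax([dot(track[f], other) / scale for other in track])
            mixed = [sum(w[g] * track[g][d] for g in range(n_frames))
                     for d in range(dim)]
            out[f][p] = [masks[f][p] * v for v in mixed]
    return out

# Toy input: 2 frames, 3 positions, 2 feature dims; token 0 stands in for the subject.
video = [
    [[1.0, 0.0], [0.0, 1.0], [0.2, 0.8]],
    [[0.9, 0.1], [0.1, 0.9], [0.3, 0.7]],
]
tokens = [[1.0, 0.0], [0.0, 1.0]]
masks = [subject_mask(cross_attention(frame, tokens), subject_token=0)
         for frame in video]
result = masked_temporal_attention(video, masks)
```

In this toy input only the first position in each frame aligns with the subject token, so the temporal output is kept there and zeroed at the other two positions. A real model would use learned projections, multiple heads, and soft rather than binary masks; those details are not specified here.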
Opening claim text (preview).
What is claimed is:
1. A processing system in a device, comprising: a memory configured to store parameters for a generative artificial intelligence model; and one or more processors, coupled to the memory, configured to: receive an input prompt specifying a video output to be generated by the generative artificial intelligence model; generate, based on a spatial portion of the generative artificial intelligence model and an output of a spatial cross-attention block generated based on the input prompt, a spatial attention map representing a subject of the video output to be generated by the generative artificial intelligence model; generate, based on a temporal portion of the generative artificial intelligence model and the output of the spatial cross-attention block, a temporal attention map representing motion to be depicted by the subject of the video output to be generated by the generative artificial intelligence model, the output of the spatial cross-attention block being applied as a mask to intermediate outputs generated within the temporal portion of the generative artificial intelligence model and used to generate the temporal attention map; generate the video output based on the spatial attention map and the temporal attention map; and output the generated video output.
2. The processing system of claim 1, wherein: the spatial cross-attention block is configured to generate the output of the spatial cross-attention block based on the input prompt and the first spatial map; and the spatial portion of the generative artificial intelligence model comprises: a spatial self-attention block configured to generate a first spatial map based on the input prompt, a spatial-domain adaptation block, and a prior output of the spatial cross-attention block generated by the spatial cross-attention block during a previous inferencing round performed using the generative artificial intelligence model; and a spatial feedforward network configured to generate a second spatial map based on the output of the spatial cross-attention block, the prior output of the spatial cross-attention block, and the spatial-domain adaptation block.
3. The processing system of claim 2, wherein the spatial-domain adaptation block comprises a first spatial adapter for an appearance of the subject and a second spatial adapter for motion of the subject.
4. The processing system of claim 1, wherein: the spatial cross-attention block is configured to generate the output of the spatial cross-attention block based on the input prompt and the first spatial map; and the spatial portion of the generative artificial intelligence model comprises: a spatial self-attention block configured to generate a first spatial map based on the input prompt, a spatial-domain adaptation block, and a prior output of the spatial cross-attention block generated by the spatial cross-attention block during a previous inferencing round performed using the generative artificial intelligence model applied as a mask to an input into the spatial-domain adaptation block; and a spatial feedforward network configured to generate a second spatial map based on the output of the spatial cross-attention block generated by the spatial cross-attention block, the prior output of the spatial cross-attention block, and the spatial-domain adaptation block, wherein the input into the spatial-domain adaptation block comprises the output of the spatial cross-attention block masked based on the prior output of the spatial cross-attention block.
5. The processing system of claim 1, wherein the temporal portion of the generative artificial intelligence model comprises: a temporal self-attention block configured to generate a first temporal map based on a time-domain adaptation block, a prior output of the spatial cross-attention block, and the input prompt; and a temporal feedforward network configured to generate a second temporal map based on the first temporal map, the prior output of the spatial cross-attention block, and the time-domain adaptation block.
6. The processing system of claim 1, wherein the spatial portion of the generative artificial intelligence model is configured to customize an appearance of the subject of the video output independently of motion performed by the subject of the video output.
7. The processing system of claim 1, wherein the temporal portion of the generative artificial intelligence model comprises a time-domain adaptation block for motion of the subject.
8. The processing system of claim 1, wherein background content in the generated video output is different from background content in images in a training data set used to train the generative artificial intelligence model depicting one of the subject of the video output or the motion of the subject.
9. The processing system of claim 1, further comprising a display configured to display the generated video output.
10. A processor-implemented method for machine learning, comprising: receiving an input prompt specifying a video output to be generated by a generative artificial intelligence model; generating, based on a spatial portion of the generative artificial intelligence model and an output of a spatial cross-attention block generated based on the input prompt, a spatial attention map representing a subject of the video output to be generated by the generative artificial intelligence model; generating, based on a temporal portion of the generative artificial intelligence model and the output of the spatial cross-attention block, a temporal attention map representing motion to be depicted by the subject of the video output to be generated by the generative artificial intelligence model, the output of the spatial cross-attention block being applied as a mask to intermediate outputs generated within the temporal portion of the generative artificial intelligence model and used to generate the temporal attention map; generating the video output based on the spatial attention map and the temporal attention map; and outputting the generated video output.
11. The method of claim 10, wherein: the spatial cross-attention block is configured to generate the output of the spatial cross-attention block based on the input prompt and the first spatial map; and the spatial portion of the generative artificial intelligence model comprises: a spatial self-attention block configured to generate a first spatial map based on the input prompt, a spatial-domain adaptation block, and a prior output of the spatial cross-attention block generated by the spatial cross-attention block during a previous inferencing round performed using the generative artificial intelligence model; and a spatial feedforward network configured to generate a second spatial map based on the output of the spatial cross-attention block generated by the spatial cross-attention block, the prior output of the spatial cross-attention block, and the spatial-domain adaptation block.
12. The method of claim 11, wherein the spatial-domain adaptation block comprises a first spatial adapter for an appearance of the subject and a second spatial adapter for motion of the subject.
13. The method of claim 10, wherein: the spatial cross-attention block is configured to generate the output of the spatial cross-attention block based on the input prompt and the first spatial map; and the spatial portion of the generative artificial intelligence model comprises: a spatial self-attention block configured to generate a first spatial map based on the input prompt, a spatial-domain adaptation block, and a prior output of the spatial cross
Convolutional networks [CNN, ConvNet] · CPC title
Auto-encoder networks; Encoder-decoder networks · CPC title
Generative networks · CPC title
Combinations of networks · CPC title
Content authoring · CPC title