Method, electronic device, and computer program product for video processing

US12494010B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12494010-B2
Application number	US-202318339341-A
Country	US
Kind code	B2
Filing date	Jun 22, 2023
Priority date	Aug 24, 2022
Publication date	Dec 9, 2025
Grant date	Dec 9, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method, an electronic device, and a computer program product for video processing are provided in embodiments of the present disclosure. The method generates an avatar image using a reference image and image data for a first frame in a video stream, and generates an avatar video using the avatar image and image data, audio data, and text data in the video stream. Through this solution, a user-defined avatar video adapted to a user of a real video and actions thereof can be generated more accurately and with high quality.
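
Claims 1 to 3 describe how these inputs are combined: each feature is mapped to a vector in a shared feature space, the vectors form a feature integration vector, an attention mechanism yields a residual vector, and the two are combined into the video integration feature. The following is a minimal Python sketch of that data flow, not the patented implementation. The random linear projections, the scaled dot-product self-attention, the use of addition to combine the residual, and all function names and dimensions are illustrative assumptions; the claims do not specify them.

```python
import math
import random


def project(feature, weights):
    """Map one modality feature into the shared feature space (linear map, assumed)."""
    return [sum(w * x for w, x in zip(row, feature)) for row in weights]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_residual(tokens):
    """Self-attention over the modality vectors; returns one residual per vector."""
    dim = len(tokens[0])
    residual = []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        residual.append(
            [sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(dim)]
        )
    return residual


def integrate(features, projections):
    """Fuse avatar-image, image-difference, audio, and text features."""
    # First to fourth vectors: one per modality, all in the same space.
    tokens = [project(f, w) for f, w in zip(features, projections)]
    # Residual vector derived from the stacked vectors by attention.
    residual = attention_residual(tokens)
    # Assumption: combine by element-wise addition, then flatten.
    return [t + r for tok, res in zip(tokens, residual) for t, r in zip(tok, res)]


# Hypothetical feature sizes for the four modalities and the shared space.
rng = random.Random(0)
feature_dims = [6, 6, 4, 3]  # avatar image, image difference, audio, text
shared_dim = 5
features = [[rng.uniform(-1, 1) for _ in range(d)] for d in feature_dims]
projections = [
    [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(shared_dim)]
    for d in feature_dims
]
video_integration_feature = integrate(features, projections)
print(len(video_integration_feature))
```

In this sketch the four modality vectors attend to one another, so the residual for each vector is a weighted mix of all four. That is one common way to let audio and text cues influence the image-derived features; a trained model would learn the projections rather than sample them at random.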

First claim

Opening claim text (preview).

What is claimed is:

1. A method for video processing, comprising: acquiring a video stream, the video stream comprising image data, audio data, and text data corresponding to video frames, and the video frames comprising a first frame; generating a first avatar image using a reference image and image data for the first frame; obtaining a video integration feature based on the first avatar image, the image data, the audio data, and the text data; and generating an avatar video corresponding to the video stream based on the first avatar image and the video integration feature; wherein obtaining the video integration feature based on the first avatar image, the image data, the audio data, and the text data comprises: converting respective features corresponding to the first avatar image, the image data, the audio data, and the text data to respective vectors in a feature space; and obtaining the video integration feature based on the respective vectors in the feature space converted from the respective features corresponding to the first avatar image, the image data, the audio data, and the text data.

2. The method according to claim 1, wherein obtaining the video integration feature based on the first avatar image, the image data, the audio data, and the text data comprises: obtaining a first avatar image feature, an image difference feature, an audio feature, and a text feature, wherein the first avatar image feature corresponds to the first avatar image, the image difference feature corresponds to image data of adjacent frames in the video frames, the audio feature corresponds to the audio data, and the text feature corresponds to the text data; and performing integration processing on the first avatar image feature, the image difference feature, the audio feature, and the text feature to obtain the video integration feature.

3. The method according to claim 2, wherein performing integration processing on the first avatar image feature, the image difference feature, the audio feature, and the text feature to obtain the video integration feature comprises: converting the first avatar image feature, the image difference feature, the audio feature, and the text feature into a first vector, a second vector, a third vector, and a fourth vector in the feature space, respectively; generating a feature integration vector based on the first vector, the second vector, the third vector, and the fourth vector; generating a residual vector corresponding to the feature integration vector by using an attention mechanism; and obtaining the video integration feature based on the feature integration vector and the residual vector.

4. The method according to claim 1, wherein the method is implemented by an avatar video generation model.

5. The method according to claim 4, further comprising: obtaining a first loss function based on the avatar video, the audio data, and the text data; and training the avatar video generation model by using the first loss function.

6. The method according to claim 5, wherein obtaining a first loss function based on the avatar video, the audio data, and the text data comprises: obtaining a video-audio loss function based on the avatar video and the audio data; obtaining a video-text loss function based on the avatar video and the text data; obtaining an audio-text loss function based on the audio data and the text data; and obtaining the first loss function based on the video-audio loss function, the video-text loss function, and the audio-text loss function.

7. The method according to claim 6, wherein the video frames further comprise a second frame, and the method further comprises: obtaining a second loss function based on the first avatar image and a second avatar image for the second frame; and training the avatar video generation model by using the second loss function.

8. The method according to claim 7, further comprising: obtaining a third loss function based on the image data and the avatar video; and training the avatar video generation model by using the third loss function.

9. The method according to claim 8, wherein the method further comprises: obtaining a fourth loss function based on the first loss function, the second loss function, and the third loss function; and training the avatar video generation model by using the fourth loss function.

10. An electronic device, comprising: at least one processor; and at least one memory storing computer-executable instructions, wherein the computer-executable instructions, when executed by the at least one processor, cause the electronic device to perform operations comprising: acquiring a video stream, the video stream comprising image data, audio data, and text data corresponding to video frames, and the video frames comprising a first frame; generating a first avatar image using a reference image and image data for the first frame; obtaining a video integration feature based on the first avatar image, the image data, the audio data, and the text data; and generating an avatar video corresponding to the video stream based on the first avatar image and the video integration feature; wherein obtaining the video integration feature based on the first avatar image, the image data, the audio data, and the text data comprises: converting respective features corresponding to the first avatar image, the image data, the audio data, and the text data to respective vectors in a feature space; and obtaining the video integration feature based on the respective vectors in the feature space converted from the respective features corresponding to the first avatar image, the image data, the audio data, and the text data.

11. The electronic device according to claim 10, wherein obtaining the video integration feature based on the first avatar image, the image data, the audio data, and the text data comprises: obtaining a first avatar image feature, an image difference feature, an audio feature, and a text feature, wherein the first avatar image feature corresponds to the first avatar image, the image difference feature corresponds to image data of adjacent frames in the video frames, the audio feature corresponds to the audio data, and the text feature corresponds to the text data; and performing integration processing on the first avatar image feature, the image difference feature, the audio feature, and the text feature to obtain the video integration feature.

12. The electronic device according to claim 11, wherein performing integration processing on the first avatar image feature, the image difference feature, the audio feature, and the text feature to obtain the video integration feature comprises: converting the first avatar image feature, the image difference feature, the audio feature, and the text feature into a first vector, a second vector, a third vector, and a fourth vector in the feature space, respectively; generating a feature integration vector based on the first vector, the second vector, the third vector, and the fourth vector; generating a residual vector corresponding to the feature integration vector by using an attention mechanism; and obtaining the video integration feature based on the feature integration vector and the residual vector.

13. The electronic device according to claim 10, wherein the operations are implemented by an avatar video generation model.

14. The electronic device according to claim 13, wherein the operations further comprise: obtaining a first loss function based on the avatar video, the audio data, and the text data; and
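
Claims 5 to 9 build the training objective in layers: three pairwise terms (video-audio, video-text, audio-text) form the first loss function, and a fourth loss function is obtained from the first, second, and third. A minimal Python sketch of that composition follows. The claims do not say how each term is computed or combined, so the cosine distance between embeddings, the plain and weighted sums, and every name below are assumptions made for illustration only. The second and third losses are passed in as precomputed numbers because the preview gives no formula for them.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; assumes both embeddings are nonzero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def first_loss(video_emb, audio_emb, text_emb):
    """Sum of the three pairwise terms named in claim 6 (summation is assumed)."""
    video_audio = cosine_distance(video_emb, audio_emb)
    video_text = cosine_distance(video_emb, text_emb)
    audio_text = cosine_distance(audio_emb, text_emb)
    return video_audio + video_text + audio_text


def fourth_loss(first, second, third, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the first, second, and third losses (weights are hypothetical)."""
    return weights[0] * first + weights[1] * second + weights[2] * third


# Toy embeddings: video and audio agree, text is orthogonal to both.
video_emb = [1.0, 0.0]
audio_emb = [1.0, 0.0]
text_emb = [0.0, 1.0]

loss_1 = first_loss(video_emb, audio_emb, text_emb)
loss_4 = fourth_loss(loss_1, second=0.5, third=0.25)
print(loss_1, loss_4)
```

With these toy values the video-audio term vanishes and the two text terms each contribute their maximum for orthogonal vectors, which shows how a misaligned modality would dominate the combined objective.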

Assignees

Dell Products L.P.

Inventors

Classifications

G06T13/40 · of characters, e.g. humans, animals or virtual beings
G06T11/60 (Primary) · Creating or editing images; Combining images with text
G06N3/08 · Learning methods
G06N3/045 · Combinations of networks
G06N20/00 · Machine learning

Patent family

Related publications grouped by family.

View patent family 89996860

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12494010B2 cover?: A method, an electronic device, and a computer program product for video processing are provided in embodiments of the present disclosure. The method generates an avatar image using a reference image and image data for a first frame in a video stream, and generates an avatar video using the avatar image and image data, audio data, and text data in the video stream. Through this solution, a user…
Who is the assignee on this patent?: Dell Products L.P.
What technology area does this patent fall under?: Primary CPC classification G06T11/60. Mapped technology areas include Physics.
When was this patent published?: Published Dec 9, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).