What technology area does this patent fall under?

Primary CPC classification G06V10/764. Mapped technology areas include Physics.

When was this patent published?

Publication date: Apr 22, 2025 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Model training method, media information synthesis method, and related apparatuses

US12283087B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12283087-B2
Application number	US-202017109072-A
Country	US
Kind code	B2
Filing date	Dec 1, 2020
Priority date	Nov 19, 2019
Publication date	Apr 22, 2025
Grant date	Apr 22, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A model training method includes obtaining an image sample set and brief-prompt information; generating a content mask set according to the image sample set and the brief-prompt information; generating a to-be-trained image set according to the content mask set; obtaining, based on the image sample set and the to-be-trained image set, a predicted image set through a to-be-trained information synthesis model, the predicted image set comprising at least one predicted image, the predicted image being in correspondence to the image sample; and training, based on the predicted image set and the image sample set, the to-be-trained information synthesis model by using a target loss function, to obtain an information synthesis model.
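The mask and to-be-trained image steps summarized in the abstract are spelled out in claims 4 and 5: a convex hull over key points, an expanded and a contracted mask, and a blank-then-fill composition. The following is a minimal pure-Python sketch of those steps, not the patented implementation. It assumes that expansion and contraction can be approximated by scaling the hull about its vertex centroid (the claims only name a "proportion"), and it uses a nested list as a stand-in for a grayscale image. The names `scale_polygon`, `polygon_mask`, and `make_training_image` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in pixel coordinates
Image = List[List[int]]      # grayscale image as rows of pixel values


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def scale_polygon(poly: List[Point], factor: float) -> List[Point]:
    """ASSUMPTION: expand (factor > 1) or contract (factor < 1) about the vertex centroid."""
    cx = sum(p[0] for p in poly) / len(poly)
    cy = sum(p[1] for p in poly) / len(poly)
    return [(cx + factor * (x - cx), cy + factor * (y - cy)) for x, y in poly]


def polygon_mask(poly: List[Point], height: int, width: int) -> Image:
    """Rasterize a convex counter-clockwise polygon: 1 inside or on the edge, else 0."""
    n = len(poly)
    return [
        [
            int(all(cross(poly[i], poly[(i + 1) % n], (x, y)) >= 0 for i in range(n)))
            for x in range(width)
        ]
        for y in range(height)
    ]


def make_training_image(image: Image, expanded: Image, contracted: Image,
                        blank: int = 0) -> Image:
    """Blank the expanded region, then fill back the content under the contracted mask."""
    return [
        [
            image[y][x] if (not expanded[y][x] or contracted[y][x]) else blank
            for x in range(len(image[0]))
        ]
        for y in range(len(image))
    ]


# Toy example: five key points on a 9x9 image whose pixels are all 9.
keypoints = [(2, 2), (6, 2), (6, 6), (2, 6), (4, 4)]
hull = convex_hull(keypoints)  # the interior point (4, 4) is dropped
expanded = polygon_mask(scale_polygon(hull, 1.5), 9, 9)
contracted = polygon_mask(scale_polygon(hull, 0.5), 9, 9)
image = [[9] * 9 for _ in range(9)]
train = make_training_image(image, expanded, contracted)
```

In this toy case the result keeps the original pixels outside the expanded mask and inside the contracted mask, and leaves a blank ring between the two. A real pipeline would more likely use an image library for the hull and rasterization; the pure-Python version is only meant to make the geometry explicit.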

First claim

Opening claim text (preview).

What is claimed is:

1. A model training method, the method comprising: obtaining an image sample set and brief-prompt information, the image sample set comprising at least one image sample, the brief-prompt information representing key-point information of a to-be-trained object in the at least one image sample, wherein the at least one image sample includes a plurality of consecutive image samples, and the plurality of consecutive image samples are used for forming a video sample; generating a content mask set according to the image sample set and the brief-prompt information, the content mask set comprising at least one content mask, the at least one content mask being obtained by extending outward a region identified according to the brief-prompt information in the at least one image sample; generating a to-be-trained image set according to the content mask set, the to-be-trained image set comprising at least one to-be-trained image, the at least one to-be-trained image being in correspondence to the at least one image sample; obtaining, based on the image sample set and the to-be-trained image set, a predicted image set through a to-be-trained information synthesis model, the predicted image set comprising at least one predicted image, the at least one predicted image being in correspondence to the at least one image sample; and training, based on the predicted image set and the image sample set, the to-be-trained information synthesis model by using a target loss function, to obtain an information synthesis model, comprising: determining a first loss function according to N frames of predicted images in the predicted image set, N frames of to-be-trained images in the to-be-trained image set, and N frames of image samples in the image sample set, N being an integer greater than 1, wherein the first loss function is determined based on an output of a generator of the to-be-trained information synthesis model when inputting a superposition of (N−1) frames of to-be-trained images and an Nth frame of to-be-trained image to the generator; determining a second loss function according to N frames of predicted images in the predicted image set and N frames of image samples in the image sample set; determining the target loss function according to the first loss function and the second loss function; iteratively updating a model parameter of the to-be-trained information synthesis model according to the target loss function; and generating, in a case that an iteration end condition is satisfied, the information synthesis model according to the model parameter of the to-be-trained information synthesis model.

2. The method according to claim 1, wherein: the to-be-trained object is a human body object; the obtaining an image sample set and brief-prompt information comprises: obtaining the image sample set; and obtaining the brief-prompt information corresponding to the at least one image sample in the image sample set by using a human body pose estimator method; and the generating a content mask set according to the image sample set and the brief-prompt information comprises: generating, based on the at least one image sample in the image sample set and according to the brief-prompt information corresponding to the to-be-trained object, a human body key-point image; generating, based on the human body key-point image corresponding to the at least one image sample in the image sample set, a human body skeleton connection image; and generating, based on the human body skeleton connection image corresponding to the at least one image sample in the image sample set, a human body content mask by using a convex hull algorithm, the human body content mask belonging to the at least one content mask.

3. The method according to claim 2, wherein the generating a to-be-trained image set according to the content mask set comprises: covering, based on the human body content mask in the content mask set, the human body content mask on the at least one image sample, and filling the to-be-trained object back to the at least one image sample, to obtain the at least one to-be-trained image in the to-be-trained image set.

4. The method according to claim 1, wherein the generating a content mask set according to the image sample set and the brief-prompt information comprises: generating, based on the at least one image sample in the image sample set and according to the brief-prompt information corresponding to the to-be-trained object, K target human face key-points, each of the K target human face key-points being in correspondence to a human face key-point in the brief-prompt information, K being an integer greater than 1; generating, based on the K target human face key-points of the at least one image sample in the image sample set, an original human face content mask by using a convex hull algorithm; generating, based on the original human face content mask of the at least one image sample in the image sample set, an expanded human face content mask according to a mask expansion proportion, the expanded human face content mask belonging to the at least one content mask; and generating, based on the original human face content mask of the at least one image sample in the image sample set, a contracted human face content mask according to a mask contraction proportion, the contracted human face content mask belonging to the at least one content mask.

5. The method according to claim 4, wherein the generating a to-be-trained image set according to the content mask set comprises: covering the expanded human face content mask on a target image sample of the at least one image sample, to obtain a first mask image, wherein a region corresponding to the expanded human face content mask in the target image sample is set to a blank region; extracting image content of a region corresponding to the contracted human face content mask in the target image sample, to obtain a second mask image; and generating, by filling the second mask image into the blank region in the first mask image, one of the at least one to-be-trained image corresponding to the target image sample.

6. The method according to claim 1, wherein the determining the target loss function according to the first loss function and the second loss function comprises: calculating the target loss function in the following manner:

$L(G,D) = E_{f,r}\left[L_r(G) + \lambda_s L_s(G,D)\right]$;
$L_r(G) = \lVert m \otimes (f - G(r)) \rVert_1$;
$L_s(G,D) = \log(D(r,f)) + \log(1 - D(r,G(r)))$;

wherein $L(G,D)$ represents the target loss function, $E$ represents an expected value calculation, $L_r(G)$ represents the first loss function, $L_s(G,D)$ represents the second loss function, $G(\cdot)$ represents the generator in the to-be-trained information synthesis model, $D(\cdot)$ represents a discriminator in the to-be-trained information synthesis model, $\lambda_s$ represents a first preset coefficient, O represents the (N−1) frames of the to-be-trained images, $f$ represents an Nth frame of image sample, $r$ represents an Nth frame of to-be-trained image, $m$ represents a content mask of the Nth frame of to-be-trained image, $\otimes$ represents a per-pixel multiplication, and $\oplus$ represents the superposition of image frames.

7. The method according to claim 1, wherein: the training, based on the predicted image set and the image sample set, the to-be-trained information synthesis model by using a target loss function, to obtain an information synthesis model further comprises: determining a third loss function according to M frames of predicted images in the predicted image set and M frames of image samples in the image sample set, M being an integer greater than or equal to 1 and less than or equal to N; and determining the target loss func
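To make the arithmetic of the claim 6 loss concrete, here is a minimal sketch that evaluates the target loss as the formulas are printed in this preview, with the generator taking only r as input. It is an illustration under assumptions, not the patent's implementation: frames are flattened to lists of floats, the expectation is estimated as a plain average over a batch, and `generator` and `discriminator` are hypothetical callables standing in for real networks (the discriminator is assumed to return a probability strictly between 0 and 1).

```python
import math
from typing import Callable, List, Sequence, Tuple

Frame = List[float]                  # a frame flattened to pixel intensities
Sample = Tuple[Frame, Frame, Frame]  # (f, r, m): image sample, to-be-trained image, mask


def first_loss(mask: Frame, target: Frame, predicted: Frame) -> float:
    """L_r = || m (x) (f - G(r)) ||_1: masked per-pixel L1 distance."""
    return sum(abs(m * (f - g)) for m, f, g in zip(mask, target, predicted))


def second_loss(d_real: float, d_fake: float) -> float:
    """L_s = log D(r, f) + log(1 - D(r, G(r))); inputs must lie in (0, 1)."""
    return math.log(d_real) + math.log(1.0 - d_fake)


def target_loss(
    samples: Sequence[Sample],
    generator: Callable[[Frame], Frame],             # hypothetical stand-in for G
    discriminator: Callable[[Frame, Frame], float],  # hypothetical stand-in for D
    lambda_s: float,
) -> float:
    """Batch average of L_r + lambda_s * L_s, as an estimate of E_{f,r}[...]."""
    total = 0.0
    for f, r, m in samples:
        predicted = generator(r)
        l_r = first_loss(m, f, predicted)
        l_s = second_loss(discriminator(r, f), discriminator(r, predicted))
        total += l_r + lambda_s * l_s
    return total / len(samples)


# Toy stand-ins: an identity "generator" and a "discriminator" that always says 0.5.
samples = [([1.0, 2.0, 3.0], [1.0, 0.0, 3.0], [0.0, 1.0, 0.0])]
loss = target_loss(
    samples,
    generator=lambda r: list(r),
    discriminator=lambda r, x: 0.5,
    lambda_s=0.1,
)
```

With these stand-ins only the masked middle pixel contributes to the first loss (|2.0 − 0.0| = 2), and the second loss is 2·log(0.5), so the toy value is 2 + 0.1·2·log(0.5). Claim 1 describes the generator input as a superposition of (N−1) earlier to-be-trained frames and the Nth frame; a callable that accepts that stacked input could be passed in the same way.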

Assignees

Tencent Tech Shenzhen Co Ltd

Inventors

Classifications

G06N20/00
Machine learning · CPC title
G06V10/7747
Organisation of the process, e.g. bagging or boosting · CPC title
G06F18/2148
characterised by the process organisation or structure, e.g. boosting cascade · CPC title
G06F18/217
Validation; Performance evaluation; Active pattern learning techniques · CPC title
G06V40/171
Local features and components; Facial parts (eye characteristics G06V40/18); Occluding parts, e.g. glasses; Geometrical relationships · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 75909757

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12283087B2 cover?: A model training method includes obtaining an image sample set and brief-prompt information; generating a content mask set according to the image sample set and the brief-prompt information; generating a to-be-trained image set according to the content mask set; obtaining, based on the image sample set and the to-be-trained image set, a predicted image set through a to-be-trained information sy…
Who is the assignee on this patent?: Tencent Tech Shenzhen Co Ltd

Related patents

Music driven human dancing video synthesis

Human action data set generation in a machine learning system

Image composites using a generative adversarial neural network

Image compression/decompression method and device, and image processing system

Training method and detection method for object recognition

Method and apparatus for processing video image and electronic device

Method for image segmentation
