Model training method and apparatus, computer device, and storage medium

US2024256601A1 · US · A1

Patent metadata
Publication number: US-2024256601-A1
Application number: US-202418603068-A
Country: US
Kind code: A1
Filing date: Mar 12, 2024
Priority date: Sep 1, 2022
Publication date: Aug 1, 2024
Grant date: not listed

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of this application disclose a video content retrieval method performed by a computer device. The method includes: obtaining a query text; performing feature extraction processing on the query text through a video content retrieval model, to obtain a plurality of text content features at different feature granularities; calculating, based on the text content feature of each feature granularity, a similarity corresponding to the query and a candidate video content retrieval result at the corresponding feature granularity; and determining, based on the similarities at different feature granularities, a video content retrieval result corresponding to the query text. The solution may improve model training for the content retrieval model and improve content retrieval precision of the content retrieval model.
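
The retrieval flow in the abstract (and in claims 1 and 2) can be illustrated with a minimal Python sketch. It is not the assignee's implementation. It assumes that per-granularity query features and candidate video features have already been produced by some encoder. It also models the "quantified retrieval result feature" as the nearest codeword in a per-granularity codebook (plain vector quantization), uses cosine similarity, and fuses granularities with a weighted sum. The names `retrieve` and `quantize`, the granularity labels, the codebooks, and the weights are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def quantize(vec: Vector, codebook: List[Vector]) -> Vector:
    """Assumption: 'feature quantification' = snap to the nearest codeword (squared L2)."""
    return min(codebook, key=lambda c: sum((x - y) ** 2 for x, y in zip(vec, c)))


def retrieve(
    query_feats: Dict[str, Vector],            # granularity -> query text content feature
    candidates: Dict[str, Dict[str, Vector]],  # video id -> granularity -> content feature
    codebooks: Dict[str, List[Vector]],        # hypothetical per-granularity codebooks
    weights: Dict[str, float],                 # hypothetical fusion weights
) -> List[Tuple[str, float]]:
    """Rank candidate videos by a weighted sum of per-granularity similarities."""
    ranked = []
    for video_id, feats in candidates.items():
        score = 0.0
        for gran, q in query_feats.items():
            # Similarity at this granularity, computed against the quantified candidate feature.
            v_hat = quantize(feats[gran], codebooks[gran])
            score += weights[gran] * cosine(q, v_hat)
        ranked.append((video_id, score))
    # Highest fused similarity first.
    return sorted(ranked, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    codebooks = {"coarse": [[1.0, 0.0], [0.0, 1.0]], "fine": [[1.0, 0.0], [0.0, 1.0]]}
    query = {"coarse": [1.0, 0.0], "fine": [0.0, 1.0]}
    candidates = {
        "video_a": {"coarse": [0.9, 0.2], "fine": [0.1, 0.8]},
        "video_b": {"coarse": [0.2, 0.9], "fine": [0.7, 0.3]},
    }
    print(retrieve(query, candidates, codebooks, {"coarse": 0.5, "fine": 0.5}))
    # -> [('video_a', 1.0), ('video_b', 0.0)]
```

A deployed system would more likely quantize and index candidate features once, offline, rather than re-quantizing them for every query. They are recomputed here only to keep the sketch short.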

First claim

Opening claim text (preview).

What is claimed is:

1. A video content retrieval method performed by a computer device, the method comprising: obtaining a query text; performing feature extraction processing on the query text through a video content retrieval model, to obtain a plurality of text content features at different feature granularities; calculating, based on the text content feature of each feature granularity, a similarity corresponding to the query and a candidate video content retrieval result at the corresponding feature granularity; and determining, based on the similarities at different feature granularities, a video content retrieval result corresponding to the query text.

2. The content retrieval method according to claim 1, wherein the calculating, based on the text content feature of each feature granularity, a similarity corresponding to the query and a candidate video content retrieval result at the corresponding feature granularity comprises: determining a quantified retrieval result feature corresponding to each feature granularity of the candidate video content retrieval result; and calculating a similarity between the text content feature and the quantified retrieval result feature according to the feature granularity, to obtain the similarity at the corresponding feature granularity.

3. The method according to claim 1, wherein the video content retrieval model is trained by: obtaining sample query text and sample video content retrieval result that matches the sample query text; performing, through a video content retrieval model, feature extraction processing on the sample query text and the sample video content retrieval result, to obtain feature information of a plurality of feature granularities, the feature information comprising a query text content feature and a video content retrieval result content feature corresponding to a respective feature granularity; performing, through the content retrieval model, feature quantification processing on the query text content feature and the video content retrieval result content feature, to obtain quantified feature information of each feature granularity; calculating, based on the feature information and the quantified feature information, a retrieval semantic loss corresponding to each feature granularity; and performing model training on the video content retrieval model based on the retrieval semantic loss corresponding to each feature granularity.

4. The method according to claim 3, wherein the calculating, based on the feature information and the quantified feature information, a retrieval semantic loss corresponding to each feature granularity comprises: calculating, based on the feature information and the quantified feature information, a first retrieval semantic loss of each feature granularity in a first semantic retrieval direction, and a second retrieval semantic loss of each feature granularity in a second semantic retrieval direction; performing loss aggregation processing on the first retrieval semantic loss and the second retrieval semantic loss; and determining, according to a processing result of the loss aggregation processing, the retrieval semantic loss corresponding to each feature granularity.

5. The method according to claim 3, wherein the content retrieval model comprises a feature quantification module corresponding to each feature granularity, and each feature quantification module is configured to perform feature quantification processing on feature information of the feature granularity corresponding to each feature quantification module.

6. The method according to claim 3, wherein the feature granularity comprises a coarse granularity, and the content retrieval model comprises a first coarse-grained feature extraction module for the sample video content retrieval result; and the performing, through a video content retrieval model, feature extraction processing on the sample video content retrieval result comprises: obtaining a modal content feature corresponding to at least one content mode of the sample video content retrieval result; separately performing, through the first coarse-grained feature extraction module, feature encoding processing on the modal content feature based on a self-attention mechanism, to obtain an encoded feature corresponding to each content mode; and performing feature aggregation processing on the encoded feature corresponding to each content mode, to obtain a coarse-grained video content retrieval result content feature of the sample video content retrieval result.

7. The method according to claim 3, wherein the feature granularity comprises a fine granularity, and the content retrieval model comprises a fine-grained feature extraction module shared by the sample query text and the sample video content retrieval result; and the performing, through a video content retrieval model, feature extraction processing on the sample query text and the sample video content retrieval result, to obtain feature information of a plurality of feature granularities comprises: obtaining a plurality of content features of the sample query text and the sample video content retrieval result; performing feature clustering processing on the content features through the fine-grained feature extraction module; and determining feature information of a plurality of fine granularities based on a clustering result of the feature clustering processing, wherein the feature information comprises the query text content feature and the video content retrieval result content feature.

8. The method according to claim 3, wherein the performing model training on the video content retrieval model based on the retrieval semantic loss corresponding to each feature granularity comprises: performing, according to a granularity type of the feature granularity, loss aggregation processing on the retrieval semantic loss corresponding to each feature granularity; and performing model training on the content retrieval model based on an aggregated retrieval semantic loss.

9. A computer device, comprising a memory and a processor, the memory storing a plurality of instructions, and the processor being configured to execute the plurality of instructions in the memory and cause the computer device to perform a video content retrieval method including: obtaining a query text; performing feature extraction processing on the query text through a video content retrieval model, to obtain a plurality of text content features at different feature granularities; calculating, based on the text content feature of each feature granularity, a similarity corresponding to the query and a candidate video content retrieval result at the corresponding feature granularity; and determining, based on the similarities at different feature granularities, a video content retrieval result corresponding to the query text.

10. The computer device according to claim 9, wherein the calculating, based on the text content feature of each feature granularity, a similarity corresponding to the query and a candidate video content retrieval result at the corresponding feature granularity comprises: determining a quantified retrieval result feature corresponding to each feature granularity of the candidate video content retrieval result; and calculating a similarity between the text content feature and the quantified retrieval result feature according to the feature granularity, to obtain the similarity at the corresponding feature granularity.

11. The computer device according to claim 9, wherein the video content retrieval model is trained by: obtaining sample query text and sample video content retrieva…
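
The training procedure in claims 3 to 5 and 8 can be sketched as follows. This is a hedged illustration, not the patented implementation, and it rests on several assumptions:

- Each granularity has its own nearest-codeword quantification module, shared by the query and video features of that granularity (claim 5).
- The two "semantic retrieval directions" of claim 4 are read as text-to-video and video-to-text.
- Each direction is scored with an InfoNCE-style contrastive loss between continuous features and the other side's quantified features, and the two directions are averaged.
- Per-granularity losses are aggregated by granularity type (claim 8): they are averaged within a type and then combined as a weighted sum across types.

`FeatureQuantifier`, `info_nce`, `granularity_loss`, `total_loss`, the temperature, and the type weights are hypothetical. The sketch only computes loss values in plain Python. Real training would run inside an autodiff framework and would need some way to pass gradients through the non-differentiable codeword lookup, for example a straight-through estimator.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class FeatureQuantifier:
    """Hypothetical quantification module for one granularity: nearest codeword."""

    def __init__(self, codebook: List[Vector]):
        self.codebook = [list(c) for c in codebook]

    def __call__(self, vec: Vector) -> List[float]:
        return min(self.codebook, key=lambda c: sum((x - y) ** 2 for x, y in zip(vec, c)))


def info_nce(anchors: List[Vector], pool: List[Vector], temperature: float) -> float:
    """Mean contrastive loss; anchors[i] is assumed to match pool[i] (in-batch negatives)."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in pool]
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the matching pair
    return total / len(anchors)


def granularity_loss(q_feats, v_feats, quantifier, temperature=0.1) -> float:
    """Retrieval semantic loss for one granularity, under the assumptions above."""
    q_hat = [quantifier(q) for q in q_feats]
    v_hat = [quantifier(v) for v in v_feats]
    first = info_nce(q_feats, v_hat, temperature)   # text -> video direction
    second = info_nce(v_feats, q_hat, temperature)  # video -> text direction
    return 0.5 * (first + second)                   # aggregate the two directions


def total_loss(
    batch: Dict[str, Tuple[List[Vector], List[Vector]]],  # granularity -> (query, video) feats
    quantifiers: Dict[str, FeatureQuantifier],
    type_of: Dict[str, str],        # granularity -> granularity type, e.g. "coarse" or "fine"
    type_weights: Dict[str, float],
    temperature: float = 0.1,
) -> float:
    """Aggregate per-granularity losses by granularity type."""
    by_type: Dict[str, List[float]] = {}
    for gran, (q_feats, v_feats) in batch.items():
        loss = granularity_loss(q_feats, v_feats, quantifiers[gran], temperature)
        by_type.setdefault(type_of[gran], []).append(loss)
    return sum(type_weights[t] * sum(ls) / len(ls) for t, ls in by_type.items())
```

With correctly paired features the loss is close to zero. Swapping the pairs in a batch drives it up sharply, which is the signal a contrastive objective of this kind is meant to provide.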
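
Claims 6 and 7 describe how the coarse- and fine-grained features are extracted. The sketch below is one hedged reading in plain Python, not the assignee's code, and it makes these assumptions:

- The content modes (for example, per-frame visual features and audio features) are hypothetical inputs.
- The self-attention is a single head with identity query/key/value projections; a real encoder would learn these projections.
- Each mode's encoded sequence is mean-pooled, and the "feature aggregation" is an average across modes.
- The fine-grained "feature clustering" assigns each feature to the nearest of a set of centroids shared by the query text and the video. The mean of each cluster is taken as the feature at that fine granularity.

All function names and centroid values are illustrative.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention with identity Q/K/V projections."""
    d = len(tokens[0])
    out = []
    for query in tokens:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(d) for key in tokens]
        weights = softmax(scores)
        out.append([sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(d)])
    return out


def mean_pool(vectors: List[Vector]) -> Vector:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def coarse_video_feature(modal_feats: Dict[str, List[Vector]]) -> Vector:
    """Assumed reading of claim 6: encode each content mode separately, then aggregate."""
    encoded = [mean_pool(self_attention(seq)) for seq in modal_feats.values()]
    return mean_pool(encoded)  # feature aggregation across content modes


def fine_grained_features(feats: List[Vector], centroids: List[Vector]) -> List[Vector]:
    """Assumed reading of claim 7: cluster features against shared centroids.

    Returns one vector per centroid: the mean of the features assigned to it,
    or a zero vector if no feature was assigned.
    """
    d = len(centroids[0])
    buckets: List[List[Vector]] = [[] for _ in centroids]
    for f in feats:
        nearest = min(
            range(len(centroids)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(f, centroids[j])),
        )
        buckets[nearest].append(f)
    return [mean_pool(b) if b else [0.0] * d for b in buckets]
```

`fine_grained_features` applies the same `centroids` to query-token features and to video features. As a result, the k-th output for the query and the k-th output for the video fall in the same cluster, which is one way the two could be compared granularity by granularity.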

Assignees

Tencent Tech Shenzhen Co Ltd

Inventors

Classifications

  • Graphical querying, e.g. query-by-region, query-by-sketch, query-by-trajectory, GUIs for designating a person/face/object as a query predicate (end-user interface involving hot spots associated with the video H04N21/4725; end-user interface for selecting a Region of Interest H04N21/4728) · CPC title

  • Querying (for retrieval from the web G06F16/953) · CPC title

  • Matching criteria, e.g. proximity measures · CPC title

  • Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

  • Query formulation · CPC title

Patent family

Related publications grouped by family.


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2024256601A1 cover?
Embodiments of this application disclose a video content retrieval method performed by a computer device. The method includes: obtaining a query text; performing feature extraction processing on the query text through a video content retrieval model, to obtain a plurality of text content features at different feature granularities; calculating, based on the text content feature of each feature …
Who is the assignee on this patent?
Tencent Tech Shenzhen Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06F16/7335. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 1, 2024 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).