US2024256601A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2024256601-A1 |
| Application number | US-202418603068-A |
| Country | US |
| Kind code | A1 |
| Filing date | Mar 12, 2024 |
| Priority date | Sep 1, 2022 |
| Publication date | Aug 1, 2024 |
| Grant date | — |
Official abstract text for this publication.
Embodiments of this application disclose a video content retrieval method performed by a computer device. The method includes: obtaining a query text; performing feature extraction processing on the query text through a video content retrieval model, to obtain a plurality of text content features at different feature granularities; calculating, based on the text content feature of each feature granularity, a similarity corresponding to the query and a candidate video content retrieval result at the corresponding feature granularity; and determining, based on the similarities at different feature granularities, a video content retrieval result corresponding to the query text. The solution may improve model training for the content retrieval model and improve content retrieval precision of the content retrieval model.
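The retrieval step in the abstract can be illustrated with a minimal sketch. The abstract does not say how the per-granularity similarities are computed or combined, so cosine similarity, the weighted sum, the granularity names `coarse` and `fine`, and all function names below are assumptions made for illustration only, not the method of the application.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_candidate(query_feats: Dict[str, Vector],
                    cand_feats: Dict[str, Vector],
                    weights: Dict[str, float]) -> float:
    """Combine one similarity per granularity (assumed: weighted sum)."""
    return sum(weights[g] * cosine(query_feats[g], cand_feats[g])
               for g in query_feats)


def retrieve(query_feats: Dict[str, Vector],
             candidates: Dict[str, Dict[str, Vector]],
             weights: Dict[str, float]) -> List[str]:
    """Rank candidate ids by combined score, best first."""
    return sorted(candidates,
                  key=lambda cid: score_candidate(query_feats, candidates[cid], weights),
                  reverse=True)


# Hypothetical features; in the application these come from the retrieval model.
query = {"coarse": [1.0, 0.0], "fine": [0.0, 1.0]}
candidates = {
    "video_a": {"coarse": [1.0, 0.0], "fine": [0.0, 1.0]},
    "video_b": {"coarse": [0.0, 1.0], "fine": [1.0, 0.0]},
}
weights = {"coarse": 0.5, "fine": 0.5}
ranking = retrieve(query, candidates, weights)
```

Here `video_a` matches the query at both granularities and therefore ranks first. An implementation could equally take a maximum or a learned combination of the per-granularity scores; the abstract leaves this open.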
Opening claim text (preview).
What is claimed is:
1. A video content retrieval method performed by a computer device, the method comprising: obtaining a query text; performing feature extraction processing on the query text through a video content retrieval model, to obtain a plurality of text content features at different feature granularities; calculating, based on the text content feature of each feature granularity, a similarity corresponding to the query and a candidate video content retrieval result at the corresponding feature granularity; and determining, based on the similarities at different feature granularities, a video content retrieval result corresponding to the query text.
2. The content retrieval method according to claim 1, wherein the calculating, based on the text content feature of each feature granularity, a similarity corresponding to the query and a candidate video content retrieval result at the corresponding feature granularity comprises: determining a quantified retrieval result feature corresponding to each feature granularity of the candidate video content retrieval result; and calculating a similarity between the text content feature and the quantified retrieval result feature according to the feature granularity, to obtain the similarity at the corresponding feature granularity.
3. The method according to claim 1, wherein the video content retrieval model is trained by: obtaining sample query text and sample video content retrieval result that matches the sample query text; performing, through a video content retrieval model, feature extraction processing on the sample query text and the sample video content retrieval result, to obtain feature information of a plurality of feature granularities, the feature information comprising a query text content feature and a video content retrieval result content feature corresponding to a respective feature granularity; performing, through the content retrieval model, feature quantification processing on the query text content feature and the video content retrieval result content feature, to obtain quantified feature information of each feature granularity; calculating, based on the feature information and the quantified feature information, a retrieval semantic loss corresponding to each feature granularity; and performing model training on the video content retrieval model based on the retrieval semantic loss corresponding to each feature granularity.
4. The method according to claim 3, wherein the calculating, based on the feature information and the quantified feature information, a retrieval semantic loss corresponding to each feature granularity comprises: calculating, based on the feature information and the quantified feature information, a first retrieval semantic loss of each feature granularity in a first semantic retrieval direction, and a second retrieval semantic loss of each feature granularity in a second semantic retrieval direction; performing loss aggregation processing on the first retrieval semantic loss and the second retrieval semantic loss; and determining, according to a processing result of the loss aggregation processing, the retrieval semantic loss corresponding to each feature granularity.
5. The method according to claim 3, wherein the content retrieval model comprises a feature quantification module corresponding to each feature granularity, and each feature quantification module is configured to perform feature quantification processing on feature information of the feature granularity corresponding to each feature quantification module.
6. The method according to claim 3, wherein the feature granularity comprises a coarse granularity, and the content retrieval model comprises a first coarse-grained feature extraction module for the sample video content retrieval result; and the performing, through a video content retrieval model, feature extraction processing on the sample video content retrieval result comprises: obtaining a modal content feature corresponding to at least one content mode of the sample video content retrieval result; separately performing, through the first coarse-grained feature extraction module, feature encoding processing on the modal content feature based on a self-attention mechanism, to obtain an encoded feature corresponding to each content mode; and performing feature aggregation processing on the encoded feature corresponding to each content mode, to obtain a coarse-grained video content retrieval result content feature of the sample video content retrieval result.
7. The method according to claim 3, wherein the feature granularity comprises a fine granularity, and the content retrieval model comprises a fine-grained feature extraction module shared by the sample query text and the sample video content retrieval result; and the performing, through a video content retrieval model, feature extraction processing on the sample query text and the sample video content retrieval result, to obtain feature information of a plurality of feature granularities comprises: obtaining a plurality of content features of the sample query text and the sample video content retrieval result; performing feature clustering processing on the content features through the fine-grained feature extraction module; and determining feature information of a plurality of fine granularities based on a clustering result of the feature clustering processing, wherein the feature information comprises the query text content feature and the video content retrieval result content feature.
8. The method according to claim 3, wherein the performing model training on the video content retrieval model based on the retrieval semantic loss corresponding to each feature granularity comprises: performing, according to a granularity type of the feature granularity, loss aggregation processing on the retrieval semantic loss corresponding to each feature granularity; and performing model training on the content retrieval model based on an aggregated retrieval semantic loss.
9. A computer device, comprising a memory and a processor, the memory storing a plurality of instructions, and the processor being configured to execute the plurality of instructions in the memory and cause the computer device to perform a video content retrieval method including: obtaining a query text; performing feature extraction processing on the query text through a video content retrieval model, to obtain a plurality of text content features at different feature granularities; calculating, based on the text content feature of each feature granularity, a similarity corresponding to the query and a candidate video content retrieval result at the corresponding feature granularity; and determining, based on the similarities at different feature granularities, a video content retrieval result corresponding to the query text.
10. The computer device according to claim 9, wherein the calculating, based on the text content feature of each feature granularity, a similarity corresponding to the query and a candidate video content retrieval result at the corresponding feature granularity comprises: determining a quantified retrieval result feature corresponding to each feature granularity of the candidate video content retrieval result; and calculating a similarity between the text content feature and the quantified retrieval result feature according to the feature granularity, to obtain the similarity at the corresponding feature granularity.
11. The computer device according to claim 9, wherein the video content retrieval model is trained by: obtaining sample query text and sample video content retrieva
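The training procedure in claims 3, 4 and 8 can be sketched roughly as follows. The claims name neither a quantizer nor a loss function, so this sketch assumes a nearest-codeword quantizer, an InfoNCE-style contrastive loss for each retrieval direction, averaging to aggregate the two directions, and a plain sum across granularities. All names, the codebook, and the sample features are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def quantize(vec: Vector, codebook: List[Vector]) -> Vector:
    """Assumed quantizer: nearest codeword by squared Euclidean distance."""
    return min(codebook, key=lambda c: sum((x - y) ** 2 for x, y in zip(vec, c)))


def directional_loss(queries: List[Vector], keys: List[Vector]) -> float:
    """InfoNCE-style loss: queries[i] should score highest against keys[i]."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, k) for k in keys]
        peak = max(logits)  # subtract the max for numerical stability
        log_z = peak + math.log(sum(math.exp(v - peak) for v in logits))
        total += log_z - logits[i]
    return total / len(queries)


def granularity_loss(text_feats: List[Vector], video_feats: List[Vector],
                     codebook: List[Vector]) -> float:
    """Average the text-to-video and video-to-text losses (claim 4 aggregation)."""
    q_text = [quantize(t, codebook) for t in text_feats]
    q_video = [quantize(v, codebook) for v in video_feats]
    text_to_video = directional_loss(text_feats, q_video)
    video_to_text = directional_loss(video_feats, q_text)
    return 0.5 * (text_to_video + video_to_text)


def total_loss(batches: Dict[str, Tuple[List[Vector], List[Vector], List[Vector]]]) -> float:
    """Sum per-granularity losses (claim 8 aggregation, assumed unweighted)."""
    return sum(granularity_loss(*batch) for batch in batches.values())


# Hypothetical two-pair batch: text[i] is the query matching video[i].
codebook = [[1.0, 0.0], [0.0, 1.0]]
text = [[0.9, 0.1], [0.1, 0.9]]
video = [[0.8, 0.2], [0.2, 0.8]]
aligned = granularity_loss(text, video, codebook)
shuffled = granularity_loss(text, list(reversed(video)), codebook)
combined = total_loss({"coarse": (text, video, codebook),
                       "fine": (text, video, codebook)})
```

With these sample values the correctly paired batch yields a lower loss than the shuffled one, which is the behavior a retrieval loss should reward. A real model would use a separate quantification module per granularity, as claim 5 describes, rather than one shared codebook.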
Graphical querying, e.g. query-by-region, query-by-sketch, query-by-trajectory, GUIs for designating a person/face/object as a query predicate (end-user interface involving hot spots associated with the video H04N21/4725; end-user interface for selecting a Region of Interest H04N21/4728) · CPC title
Querying (for retrieval from the web G06F16/953) · CPC title
Matching criteria, e.g. proximity measures · CPC title
Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
Query formulation · CPC title