Method and apparatus for video searches and index construction

US11782979B2 · US · B2

Patent metadata
Publication number: US-11782979-B2
Application number: US-202017114922-A
Country: US
Kind code: B2
Filing date: Dec 8, 2020
Priority date: Dec 30, 2019
Publication date: Oct 10, 2023
Grant date: Oct 10, 2023

Abstract

Official abstract text for this publication.

Embodiments of the disclosure provide methods and apparatuses for video searches and methods and apparatuses for index construction. In one embodiment, the method comprises: upon receiving a search request input by a user to search for a target video, processing, based on a pre-configured algorithm, multimodal search data for the target video included in the search request; providing a processing result of the multimodal search data with regard to a corresponding pre-constructed index to search to obtain the target video.
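The search flow in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The unspecified "pre-configured algorithm" is replaced by a hypothetical stop-word tokenizer. Image labels are assumed to arrive already produced by some image labeler. The pre-constructed index is a plain dict that maps each semantic label to a set of video IDs. The names `SearchRequest`, `process`, and `search` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical stop-word list standing in for the patent's unspecified
# "pre-configured text algorithm".
STOP_WORDS = {"a", "an", "the", "of", "with", "in", "on", "video"}


@dataclass
class SearchRequest:
    """Multimodal search data carried by a user's search request (assumed shape)."""
    text: str = ""
    # Assumed to be the output of some image-labeling model; not modeled here.
    image_labels: list[str] = field(default_factory=list)


def semantic_text_labels(text: str) -> set[str]:
    """Toy text 'algorithm': lowercase, split on whitespace, drop stop words."""
    return {tok for tok in text.lower().split() if tok not in STOP_WORDS}


def process(request: SearchRequest) -> set[str]:
    """Turn the multimodal search data into one combined set of semantic labels."""
    labels = semantic_text_labels(request.text)
    labels.update(label.lower() for label in request.image_labels)
    return labels


def search(inverted_index: dict[str, set[str]], labels: set[str]) -> list[str]:
    """Rank videos by how many query labels they match; ties broken by video ID."""
    scores: dict[str, int] = defaultdict(int)
    for label in labels:
        for video_id in inverted_index.get(label, set()):
            scores[video_id] += 1
    return sorted(scores, key=lambda vid: (-scores[vid], vid))
```

For example, with the index `{"cat": {"v1", "v2"}, "piano": {"v1"}}`, the call `search(index, process(SearchRequest(text="a cat playing piano")))` returns `["v1", "v2"]`, because v1 matches two labels and v2 matches one.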

First claim

Opening claim text (preview).

The invention claimed is:

1. A method comprising: receiving a search request input by a user to search for a target video, the search request including multimodal search data for the target video; obtaining a processing result, utilizing a pre-configured algorithm, based on the multimodal search data, the processing result comprising one or more semantic labels for the multimodal search data generated by the pre-configured algorithm; and using the processing result to search a corresponding pre-constructed index, the pre-constructed index based on the processing result to obtain the target video.

2. The method of claim 1, the multimodal search data comprising text data and the obtaining a processing result of the multimodal search data comprising: processing the text data based on a pre-configured text algorithm to obtain a semantic text label of the text data.

3. The method of claim 1, the multimodal search data comprising image data and the obtaining a processing result of the multimodal search data comprising: processing the image data based on a pre-configured image algorithm to obtain a semantic image label of the image data; and processing the image data based on a pre-configured vectorization model to obtain a vectorized description of the image data.

4. The method of claim 1, the multimodal search data comprising video data, the method further comprising: processing the video data into video metadata and video stream data; and segmenting the video stream data into a sequence of video frames based on a pre-configured segmenting manner.

5. The method of claim 4, the obtaining a processing result of the multimodal search data based on the multimodal search data comprising: processing the video metadata based on a text algorithm to obtain a semantic text label of the video metadata; processing video frames in the sequence of video frames based on a pre-configured video algorithm to obtain semantic video labels of the video frames; and processing the video frames based on a vectorization model to obtain vectorized descriptions of the video frames.

6. The method of claim 2, the providing the processing result of the multimodal search data with regard to a corresponding pre-constructed index to search to obtain the target video comprising providing a semantic text label of text data with regard to a corresponding pre-constructed inverted index to search to obtain the target video.

7. The method of claim 3, the providing the processing result of the multimodal search data with regard to a corresponding pre-constructed index to search to obtain the target video comprising: providing the semantic image label with regard to a corresponding pre-constructed inverted index to search to obtain a first initial video; providing the vectorized description of the image data with regard to a corresponding pre-constructed vector index to search to obtain a second initial video; and obtaining the target video based on the first initial video and the second initial video.

8. The method of claim 5, the providing the processing result of the multimodal search data with regard to a corresponding pre-constructed index to search to obtain the target video comprising: providing the semantic text label of the video metadata with regard to a corresponding pre-constructed inverted index to search to obtain a third initial video; providing the vectorized descriptions of the video frames with regard to a corresponding pre-constructed vector index to search to obtain a fourth initial video; and obtaining the target video based on the third initial video and the fourth initial video.

9. The method of claim 7, the providing the semantic image label with regard to obtain a first initial video comprising: combining the semantic image label and the semantic text label of the text data to generate a combined label; and providing the combined label with regard to the corresponding pre-constructed inverted index to search to obtain the first initial video.

10. The method of claim 8, the providing the semantic text label of the video metadata with regard to a corresponding pre-constructed inverted index to search to obtain a third initial video comprising: combining the semantic text label of the video metadata, the semantic text label of the text data, and the semantic video labels of the video frames to generate a combined label; and providing the combined label with regard to the corresponding pre-constructed inverted index to search to obtain the third initial video.

11. The method of claim 1, the index being constructed by: obtaining video data; processing the video data into video metadata and video stream data; processing the video metadata based on a pre-configured text algorithm to obtain a text processing result of the video metadata; processing the video stream data based on a pre-configured video algorithm and vectorization model, respectively, to obtain a video processing result and a vectorization processing result of the video stream data; constructing an inverted index based on the text processing result and the video processing result; and constructing a vector index based on the vectorization processing result.

12. An apparatus comprising: a processor; and a storage medium for tangibly storing thereon program logic for execution by the processor, the stored program logic comprising: logic, executed by the processor, for receiving a search request input by a user to search for a target video, the search request including multimodal search data for the target video, logic, executed by the processor, for obtaining a processing result, utilizing a pre-configured algorithm, based on the multimodal search data, the processing result comprising one or more semantic labels for the multimodal search data generated by the pre-configured algorithm; and logic, executed by the processor, for using the processing result to search a corresponding pre-constructed index, the pre-constructed index based on the processing result to obtain the target video.

13. The apparatus of claim 12, the multimodal search data comprising text data; and the logic for obtaining a processing result of the multimodal search data comprising: logic, executed by the processor, for processing the text data based on a pre-configured text algorithm to obtain a semantic text label of the text data.

14. The apparatus of claim 12, the multimodal search data comprising image data; and the logic for obtaining a processing result of the multimodal search data comprising: logic, executed by the processor, for processing the image data based on a pre-configured image algorithm to obtain a semantic image label of the image data, and logic, executed by the processor, for processing the image data based on a pre-configured vectorization model to obtain a vectorized description of the image data.

15. The apparatus of claim 12, the multimodal search data comprising video data; and the logic for obtaining a processing result of the multimodal search data comprising: logic, executed by the processor, for processing the video data into video metadata and video stream data, and logic, executed by the processor, for segmenting the video stream data into a sequence of video frames based on a pre-configured segmenting manner.

16. The apparatus of claim 15, the logic for obtaining a processing result of the multimodal search data comprising: logic, executed by the processor, for processing the video metadata based on a text algorithm to obtain a semantic text label of the video metadata, logic …
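Claims 4, 5, 7, 8, and 11 together describe several steps. An inverted index and a vector index are built from video metadata and segmented frames. Both indexes are then queried, and the two "initial video" results are merged. The sketch below illustrates that structure under stated assumptions:

- Per-frame labels and vectors are supplied directly, as stand-ins for the unspecified video algorithm and vectorization model.
- The "pre-configured segmenting manner" is assumed to be fixed-interval frame sampling.
- The vector index is a brute-force cosine-similarity scan.
- The merge step uses reciprocal rank fusion. This technique is swapped in here because the claims do not say how the two results are combined.

All names (`Frame`, `Video`, `build_indexes`, `fuse`, and so on) are hypothetical.

```python
import math
from collections import defaultdict
from dataclasses import dataclass

STOP_WORDS = {"a", "an", "the", "of", "in", "on"}  # hypothetical text "algorithm"


@dataclass
class Frame:
    labels: list[str]    # stand-in for the output of a per-frame "video algorithm"
    vector: list[float]  # stand-in for the output of a "vectorization model"


@dataclass
class Video:
    video_id: str
    metadata: str        # video metadata, e.g. title or description text
    stream: list[Frame]  # video stream data, already decoded into frames


def text_labels(text: str) -> set[str]:
    """Toy semantic text labels: lowercase tokens minus stop words."""
    return {t for t in text.lower().split() if t not in STOP_WORDS}


def segment(stream: list[Frame], step: int = 2) -> list[Frame]:
    """Assumed segmenting manner: keep every step-th frame."""
    return stream[::step]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_indexes(videos: list[Video], step: int = 2):
    """Build an inverted index (label -> video IDs) and a flat vector index."""
    inverted: dict[str, set[str]] = defaultdict(set)
    vectors: list[tuple[str, list[float]]] = []
    for video in videos:
        for label in text_labels(video.metadata):      # text processing result
            inverted[label].add(video.video_id)
        for frame in segment(video.stream, step):
            for label in frame.labels:                 # video processing result
                inverted[label.lower()].add(video.video_id)
            vectors.append((video.video_id, frame.vector))  # vectorization result
    return dict(inverted), vectors


def inverted_search(inverted: dict[str, set[str]], labels: set[str]) -> list[str]:
    scores: dict[str, int] = defaultdict(int)
    for label in labels:
        for vid in inverted.get(label, ()):
            scores[vid] += 1
    return sorted(scores, key=lambda v: (-scores[v], v))


def vector_search(vectors, query: list[float], top_k: int = 3) -> list[str]:
    """Score each video by its best-matching frame, then keep the top_k videos."""
    best: dict[str, float] = {}
    for vid, vec in vectors:
        best[vid] = max(best.get(vid, -1.0), cosine(query, vec))
    return sorted(best, key=lambda v: (-best[v], v))[:top_k]


def fuse(first: list[str], second: list[str], k: int = 60) -> list[str]:
    """Reciprocal rank fusion of two rankings (an assumed merge strategy)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in (first, second):
        for rank, vid in enumerate(ranking, start=1):
            scores[vid] += 1.0 / (k + rank)
    return sorted(scores, key=lambda v: (-scores[v], v))
```

Reciprocal rank fusion is used here because it merges two ranked lists without requiring label-match counts and cosine scores to share a scale. A weighted score sum would be an equally plausible reading of "obtaining the target video based on" the two initial results.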

Assignees

Alibaba Group Holding Ltd

Inventors

Classifications

  • G06F16/73 (primary): Querying

  • G06F16/71 (primary): Indexing; Data structures therefor; Storage structures

  • Semantic analysis

  • Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items (segmenting video sequences G06V20/49)

  • Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes


Frequently asked questions

What does patent US11782979B2 cover?
It covers methods and apparatuses for searching for a target video with multimodal search data (text, image, or video). The query data is processed by pre-configured algorithms into semantic labels and, for images and video frames, vectorized descriptions. These are matched against a pre-constructed inverted index and vector index. The patent also covers building those indexes from video metadata and segmented video frames.
Who is the assignee on this patent?
Alibaba Group Holding Ltd
What technology area does this patent fall under?
The primary CPC classifications are G06F16/73 (Querying) and G06F16/71 (Indexing; Data structures therefor; Storage structures). Mapped technology areas include Physics.
When was this patent published?
It was published on October 10, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).