Method for searching and device thereof
US-2019392009-A1 · Dec 26, 2019 · US
US11782979B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11782979-B2 |
| Application number | US-202017114922-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 8, 2020 |
| Priority date | Dec 30, 2019 |
| Publication date | Oct 10, 2023 |
| Grant date | Oct 10, 2023 |
Official abstract text for this publication.
Embodiments of the disclosure provide methods and apparatuses for video searches and methods and apparatuses for index construction. In one embodiment, the method comprises: upon receiving a search request input by a user to search for a target video, processing, based on a pre-configured algorithm, multimodal search data for the target video included in the search request; providing a processing result of the multimodal search data with regard to a corresponding pre-constructed index to search to obtain the target video.
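As a rough illustration of the flow the abstract describes, the sketch below turns query text into labels and looks them up in a pre-built inverted index. Every name here (`text_labels`, `build_inverted_index`, `search`, the sample videos) is hypothetical. The abstract does not specify the pre-configured algorithm, so plain lowercase tokenization stands in for it, and ranking by matched-label count is likewise an assumption.

```python
from collections import defaultdict


def text_labels(text):
    """Hypothetical stand-in for the 'pre-configured algorithm'.

    The source does not say how semantic labels are produced; this simply
    lowercases the text and keeps alphanumeric tokens.
    """
    return {tok for tok in text.lower().split() if tok.isalnum()}


def build_inverted_index(videos):
    """Map each label to the set of video ids whose description carries it."""
    index = defaultdict(set)
    for video_id, description in videos.items():
        for label in text_labels(description):
            index[label].add(video_id)
    return index


def search(index, query_text):
    """Rank videos by how many query labels they match (assumed ranking rule)."""
    scores = defaultdict(int)
    for label in text_labels(query_text):
        for video_id in index.get(label, ()):
            scores[video_id] += 1
    # Highest score first; ties broken by video id for a stable order.
    return sorted(scores, key=lambda v: (-scores[v], v))


# Toy corpus, invented for illustration only.
videos = {
    "v1": "cat playing piano",
    "v2": "dog playing fetch",
    "v3": "piano tutorial",
}
index = build_inverted_index(videos)
results = search(index, "cat piano")
```

Here the index is built once ahead of time and only the query is processed at search time, which mirrors the "pre-constructed index" wording of the abstract.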
Opening claim text (preview).
The invention claimed is:

1. A method comprising: receiving a search request input by a user to search for a target video, the search request including multimodal search data for the target video; obtaining a processing result, utilizing a pre-configured algorithm, based on the multimodal search data, the processing result comprising one or more semantic labels for the multimodal search data generated by the pre-configured algorithm; and using the processing result to search a corresponding pre-constructed index, the pre-constructed index based on the processing result to obtain the target video.
2. The method of claim 1, the multimodal search data comprising text data and the obtaining a processing result of the multimodal search data comprising: processing the text data based on a pre-configured text algorithm to obtain a semantic text label of the text data.
3. The method of claim 1, the multimodal search data comprising image data and the obtaining a processing result of the multimodal search data comprising: processing the image data based on a pre-configured image algorithm to obtain a semantic image label of the image data; and processing the image data based on a pre-configured vectorization model to obtain a vectorized description of the image data.
4. The method of claim 1, the multimodal search data comprising video data, the method further comprising: processing the video data into video metadata and video stream data; and segmenting the video stream data into a sequence of video frames based on a pre-configured segmenting manner.
5. The method of claim 4, the obtaining a processing result of the multimodal search data based on the multimodal search data comprising: processing the video metadata based on a text algorithm to obtain a semantic text label of the video metadata; processing video frames in the sequence of video frames based on a pre-configured video algorithm to obtain semantic video labels of the video frames; and processing the video frames based on a vectorization model to obtain vectorized descriptions of the video frames.
6. The method of claim 2, the providing the processing result of the multimodal search data with regard to a corresponding pre-constructed index to search to obtain the target video comprising providing a semantic text label of text data with regard to a corresponding pre-constructed inverted index to search to obtain the target video.
7. The method of claim 3, the providing the processing result of the multimodal search data with regard to a corresponding pre-constructed index to search to obtain the target video comprising: providing the semantic image label with regard to a corresponding pre-constructed inverted index to search to obtain a first initial video; providing the vectorized description of the image data with regard to a corresponding pre-constructed vector index to search to obtain a second initial video; and obtaining the target video based on the first initial video and the second initial video.
8. The method of claim 5, the providing the processing result of the multimodal search data with regard to a corresponding pre-constructed index to search to obtain the target video comprising: providing the semantic text label of the video metadata with regard to a corresponding pre-constructed inverted index to search to obtain a third initial video; providing the vectorized descriptions of the video frames with regard to a corresponding pre-constructed vector index to search to obtain a fourth initial video; and obtaining the target video based on the third initial video and the fourth initial video.
9. The method of claim 7, the providing the semantic image label with regard to obtain a first initial video comprising: combining the semantic image label and the semantic text label of the text data to generate a combined label; and providing the combined label with regard to the corresponding pre-constructed inverted index to search to obtain the first initial video.
10. The method of claim 8, the providing the semantic text label of the video metadata with regard to a corresponding pre-constructed inverted index to search to obtain a third initial video comprising: combining the semantic text label of the video metadata, the semantic text label of the text data, and the semantic video labels of the video frames to generate a combined label; and providing the combined label with regard to the corresponding pre-constructed inverted index to search to obtain the third initial video.
11. The method of claim 1, the index being constructed by: obtaining video data; processing the video data into video metadata and video stream data; processing the video metadata based on a pre-configured text algorithm to obtain a text processing result of the video metadata; processing the video stream data based on a pre-configured video algorithm and vectorization model, respectively, to obtain a video processing result and a vectorization processing result of the video stream data; constructing an inverted index based on the text processing result and the video processing result; and constructing a vector index based on the vectorization processing result.
12. An apparatus comprising: a processor; and a storage medium for tangibly storing thereon program logic for execution by the processor, the stored program logic comprising: logic, executed by the processor, for receiving a search request input by a user to search for a target video, the search request including multimodal search data for the target video, logic, executed by the processor, for obtaining a processing result, utilizing a pre-configured algorithm, based on the multimodal search data, the processing result comprising one or more semantic labels for the multimodal search data generated by the pre-configured algorithm; and logic, executed by the processor, for using the processing result to search a corresponding pre-constructed index, the pre-constructed index based on the processing result to obtain the target video.
13. The apparatus of claim 12, the multimodal search data comprising text data; and the logic for obtaining a processing result of the multimodal search data comprising: logic, executed by the processor, for processing the text data based on a pre-configured text algorithm to obtain a semantic text label of the text data.
14. The apparatus of claim 12, the multimodal search data image data; and the logic for obtaining a processing result of the multimodal search data comprising: logic, executed by the processor, for processing the image data based on a pre-configured image algorithm to obtain a semantic image label of the image data, and logic, executed by the processor, for processing the image data based on a pre-configured vectorization model to obtain a vectorized description of the image data.
15. The apparatus of claim 12, the multimodal search data video data; and the logic for obtaining a processing result of the multimodal search data comprising: logic, executed by the processor, for processing the video data into video metadata and video stream data, and logic, executed by the processor, for segmenting the video stream data into a sequence of video frames based on a pre-configured segmenting manner.
16. The apparatus of claim 15, the logic for obtaining a processing result of the multimodal search data comprising: logic, executed by the processor, for processing the video metadata based on a text algorithm to obtain a semantic text label of the video metadata, logic
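
Claims 7 and 11 pair an inverted index over semantic labels with a vector index over vectorized descriptions, and obtain the target video from the two result sets. The sketch below is one possible reading of that pairing, not the claimed implementation. All names are hypothetical, the vector search is a brute-force cosine scan standing in for whatever vector index is actually used, and the merge rule in `combine` is an assumption because the claims do not state how the first and second initial videos are reconciled.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_inverted(inverted_index, labels):
    """Return the videos that carry at least one of the query labels."""
    hits = set()
    for label in labels:
        hits |= inverted_index.get(label, set())
    return hits


def search_vector(vector_index, query_vec, top_k=2):
    """Brute-force nearest neighbours; a stand-in for a real vector index."""
    ranked = sorted(
        vector_index,
        key=lambda v: (-cosine(vector_index[v], query_vec), v),
    )
    return ranked[:top_k]


def combine(label_hits, vector_hits):
    """Assumed merge rule (not taken from the claims).

    Videos found by both indexes come first in vector order, then the
    remaining vector hits, then the remaining label hits sorted by id.
    """
    both = [v for v in vector_hits if v in label_hits]
    vector_only = [v for v in vector_hits if v not in label_hits]
    label_only = sorted(label_hits - set(vector_hits))
    return both + vector_only + label_only


# Toy indexes, invented for illustration only.
inverted_index = {"cat": {"v1", "v3"}, "dog": {"v2"}}
vector_index = {"v1": [1.0, 0.0], "v2": [0.0, 1.0], "v3": [0.6, 0.8]}

first_initial = search_inverted(inverted_index, ["cat"])
second_initial = search_vector(vector_index, [1.0, 0.0], top_k=2)
target = combine(first_initial, second_initial)
```

Keeping the two lookups separate and merging afterwards follows the claim structure, where each index yields its own "initial video" before the target video is obtained.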
Querying · CPC title
Indexing; Data structures therefor; Storage structures · CPC title
Semantic analysis · CPC title
Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items (segmenting video sequences G06V20/49) · CPC title
Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes · CPC title