Method and apparatus for video searches and index construction

US11782979B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11782979-B2
Application number	US-202017114922-A
Country	US
Kind code	B2
Filing date	Dec 8, 2020
Priority date	Dec 30, 2019
Publication date	Oct 10, 2023
Grant date	Oct 10, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of the disclosure provide methods and apparatuses for video searches and methods and apparatuses for index construction. In one embodiment, the method comprises: upon receiving a search request input by a user to search for a target video, processing, based on a pre-configured algorithm, multimodal search data for the target video included in the search request; providing a processing result of the multimodal search data with regard to a corresponding pre-constructed index to search to obtain the target video.
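The flow in the abstract (turn the search data into labels, then look those labels up in a pre-constructed index) can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not say how the "pre-configured algorithm" produces labels, so the keyword extraction in `text_labels`, the stopword list, the function names, and the sample video IDs are all assumptions made for illustration.

```python
from __future__ import annotations

from collections import defaultdict

# Assumption: a trivial keyword extractor stands in for the unspecified
# "pre-configured algorithm" that produces semantic labels.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "with"}


def text_labels(query: str) -> set[str]:
    """Turn free text into a set of lowercase labels (hypothetical helper)."""
    return {word for word in query.lower().split() if word not in STOPWORDS}


def build_inverted_index(videos: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each label to the set of video IDs tagged with it."""
    index: dict[str, set[str]] = defaultdict(set)
    for video_id, labels in videos.items():
        for label in labels:
            index[label].add(video_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Rank videos by how many query labels they match (ties broken by ID)."""
    hits: dict[str, int] = defaultdict(int)
    for label in text_labels(query):
        for video_id in index.get(label, ()):
            hits[video_id] += 1
    return sorted(hits, key=lambda v: (-hits[v], v))


# Hypothetical corpus: video ID -> labels assigned at index time.
videos = {
    "v1": {"cat", "piano"},
    "v2": {"dog", "beach"},
    "v3": {"cat", "beach"},
}
index = build_inverted_index(videos)
print(search(index, "a cat on the beach"))
```

Ranking by the number of matching labels is one simple choice; the abstract itself does not describe a ranking rule.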

First claim

Opening claim text (preview).

The invention claimed is:
1. A method comprising: receiving a search request input by a user to search for a target video, the search request including multimodal search data for the target video; obtaining a processing result, utilizing a pre-configured algorithm, based on the multimodal search data, the processing result comprising one or more semantic labels for the multimodal search data generated by the pre-configured algorithm; and using the processing result to search a corresponding pre-constructed index, the pre-constructed index based on the processing result to obtain the target video.
2. The method of claim 1, the multimodal search data comprising text data and the obtaining a processing result of the multimodal search data comprising: processing the text data based on a pre-configured text algorithm to obtain a semantic text label of the text data.
3. The method of claim 1, the multimodal search data comprising image data and the obtaining a processing result of the multimodal search data comprising: processing the image data based on a pre-configured image algorithm to obtain a semantic image label of the image data; and processing the image data based on a pre-configured vectorization model to obtain a vectorized description of the image data.
4. The method of claim 1, the multimodal search data comprising video data, the method further comprising: processing the video data into video metadata and video stream data; and segmenting the video stream data into a sequence of video frames based on a pre-configured segmenting manner.
5. The method of claim 4, the obtaining a processing result of the multimodal search data based on the multimodal search data comprising: processing the video metadata based on a text algorithm to obtain a semantic text label of the video metadata; processing video frames in the sequence of video frames based on a pre-configured video algorithm to obtain semantic video labels of the video frames; and processing the video frames based on a vectorization model to obtain vectorized descriptions of the video frames.
6. The method of claim 2, the providing the processing result of the multimodal search data with regard to a corresponding pre-constructed index to search to obtain the target video comprising providing a semantic text label of text data with regard to a corresponding pre-constructed inverted index to search to obtain the target video.
7. The method of claim 3, the providing the processing result of the multimodal search data with regard to a corresponding pre-constructed index to search to obtain the target video comprising: providing the semantic image label with regard to a corresponding pre-constructed inverted index to search to obtain a first initial video; providing the vectorized description of the image data with regard to a corresponding pre-constructed vector index to search to obtain a second initial video; and obtaining the target video based on the first initial video and the second initial video.
8. The method of claim 5, the providing the processing result of the multimodal search data with regard to a corresponding pre-constructed index to search to obtain the target video comprising: providing the semantic text label of the video metadata with regard to a corresponding pre-constructed inverted index to search to obtain a third initial video; providing the vectorized descriptions of the video frames with regard to a corresponding pre-constructed vector index to search to obtain a fourth initial video; and obtaining the target video based on the third initial video and the fourth initial video.
9. The method of claim 7, the providing the semantic image label with regard to obtain a first initial video comprising: combining the semantic image label and the semantic text label of the text data to generate a combined label; and providing the combined label with regard to the corresponding pre-constructed inverted index to search to obtain the first initial video.
10. The method of claim 8, the providing the semantic text label of the video metadata with regard to a corresponding pre-constructed inverted index to search to obtain a third initial video comprising: combining the semantic text label of the video metadata, the semantic text label of the text data, and the semantic video labels of the video frames to generate a combined label; and providing the combined label with regard to the corresponding pre-constructed inverted index to search to obtain the third initial video.
11. The method of claim 1, the index being constructed by: obtaining video data; processing the video data into video metadata and video stream data; processing the video metadata based on a pre-configured text algorithm to obtain a text processing result of the video metadata; processing the video stream data based on a pre-configured video algorithm and vectorization model, respectively, to obtain a video processing result and a vectorization processing result of the video stream data; constructing an inverted index based on the text processing result and the video processing result; and constructing a vector index based on the vectorization processing result.
12. An apparatus comprising: a processor; and a storage medium for tangibly storing thereon program logic for execution by the processor, the stored program logic comprising: logic, executed by the processor, for receiving a search request input by a user to search for a target video, the search request including multimodal search data for the target video, logic, executed by the processor, for obtaining a processing result, utilizing a pre-configured algorithm, based on the multimodal search data, the processing result comprising one or more semantic labels for the multimodal search data generated by the pre-configured algorithm; and logic, executed by the processor, for using the processing result to search a corresponding pre-constructed index, the pre-constructed index based on the processing result to obtain the target video.
13. The apparatus of claim 12, the multimodal search data comprising text data; and the logic for obtaining a processing result of the multimodal search data comprising: logic, executed by the processor, for processing the text data based on a pre-configured text algorithm to obtain a semantic text label of the text data.
14. The apparatus of claim 12, the multimodal search data image data; and the logic for obtaining a processing result of the multimodal search data comprising: logic, executed by the processor, for processing the image data based on a pre-configured image algorithm to obtain a semantic image label of the image data, and logic, executed by the processor, for processing the image data based on a pre-configured vectorization model to obtain a vectorized description of the image data.
15. The apparatus of claim 12, the multimodal search data video data; and the logic for obtaining a processing result of the multimodal search data comprising: logic, executed by the processor, for processing the video data into video metadata and video stream data, and logic, executed by the processor, for segmenting the video stream data into a sequence of video frames based on a pre-configured segmenting manner.
16. The apparatus of claim 15, the logic for obtaining a processing result of the multimodal search data comprising: logic, executed by the processor, for processing the video metadata based on a text algorithm to obtain a semantic text label of the video metadata, logic…
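Claims 7 and 11 describe two indexes, an inverted index over semantic labels and a vector index over vectorized descriptions, whose results are combined into the target video. The sketch below shows one way that pairing could look. It is an illustration under stated assumptions, not the claimed method: the class and function names are hypothetical, the vector index is a brute-force cosine scan, and the merge rule (videos found by both indexes first) is assumed because the claims do not specify how the initial videos are combined.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class VideoIndex:
    """Hypothetical container for the two indexes named in claim 11."""
    inverted: dict[str, set[str]] = field(default_factory=dict)  # label -> video IDs
    vectors: list[tuple[str, list[float]]] = field(default_factory=list)  # (video ID, frame vector)

    def add(self, video_id: str, labels: set[str], frame_vectors: list[list[float]]) -> None:
        # Labels would come from the text/video algorithms and vectors from
        # the vectorization model; here they are supplied directly.
        for label in labels:
            self.inverted.setdefault(label, set()).add(video_id)
        for vec in frame_vectors:
            self.vectors.append((video_id, vec))


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_labels(index: VideoIndex, labels: set[str]) -> set[str]:
    """Inverted-index lookup: videos matching any of the labels."""
    found: set[str] = set()
    for label in labels:
        found |= index.inverted.get(label, set())
    return found


def search_vector(index: VideoIndex, query: list[float], top_k: int = 2) -> list[str]:
    """Brute-force vector search: best frame score per video, top_k videos."""
    best: dict[str, float] = {}
    for video_id, vec in index.vectors:
        score = cosine(query, vec)
        if video_id not in best or score > best[video_id]:
            best[video_id] = score
    return sorted(best, key=lambda v: (-best[v], v))[:top_k]


def hybrid_search(index: VideoIndex, labels: set[str], query: list[float], top_k: int = 2) -> list[str]:
    by_label = search_labels(index, labels)        # cf. "first initial video"
    by_vector = search_vector(index, query, top_k)  # cf. "second initial video"
    # Assumed merge rule: videos found by both indexes rank first.
    both = [v for v in by_vector if v in by_label]
    rest = [v for v in by_vector if v not in by_label]
    rest += sorted(by_label - set(by_vector))
    return both + rest


idx = VideoIndex()
idx.add("v1", {"cat"}, [[1.0, 0.0]])
idx.add("v2", {"dog"}, [[0.9, 0.1]])
idx.add("v3", {"cat"}, [[0.0, 1.0]])
print(hybrid_search(idx, {"cat"}, [1.0, 0.0]))
```

A production vector index would typically use an approximate nearest-neighbor structure rather than a linear scan; the scan is used here only to keep the example self-contained.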

Assignees

Alibaba Group Holding Ltd

Inventors

Classifications

G06F16/73 · Primary
Querying · CPC title
G06F16/71 · Primary
Indexing; Data structures therefor; Storage structures · CPC title
G06F40/30
Semantic analysis · CPC title
G06V20/41
Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items (segmenting video sequences G06V20/49) · CPC title
G06V20/49
Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes · CPC title

Patent family

Related publications grouped by family.

View patent family 76545485

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11782979B2 cover? Embodiments of the disclosure provide methods and apparatuses for video searches and methods and apparatuses for index construction. In one embodiment, the method comprises: upon receiving a search request input by a user to search for a target video, processing, based on a pre-configured algorithm, multimodal search data for the target video included in the search request; providing a processi…
Who is the assignee on this patent? Alibaba Group Holding Ltd
What technology area does this patent fall under? The primary CPC classification is G06F16/73. Mapped technology areas include Physics.
When was this patent published? It was published on Oct 10, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb? We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Method for searching and device thereof

Crowd sourced indexing and/or searching of content

Methods and architecture for indexing and editing compressed video over the world wide web

Search method and device
