US12579797B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12579797-B2 |
| Application number | US-202217852684-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 29, 2022 |
| Priority date | Dec 31, 2019 |
| Publication date | Mar 17, 2026 |
| Grant date | Mar 17, 2026 |
Official abstract text for this publication.
A video clip location technology in the field of computer vision pertaining to artificial intelligence that provides a video processing method and apparatus. The method includes: obtaining a semantic feature of an input sentence; performing semantic enhancement on a video frame based on the semantic feature to obtain a video feature of the video frame, where the video feature includes the semantic feature; and determining, based on the semantic feature and the video feature, whether a video clip to which the video frame belongs is a target video clip corresponding to the input sentence. The method helps improve accuracy of recognizing a target video clip corresponding to an input sentence.
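The three steps in the abstract (sentence feature, sentence-conditioned enhancement of each frame, clip-level decision) can be illustrated with a small sketch. This is not the patented implementation: the sentence feature is taken as a given vector rather than extracted by a network, the "convolution kernel determined based on the semantic feature" is reduced to a 1x1 kernel equal to the sentence vector, and the sigmoid gate, mean pooling, cosine score, and `threshold` value are all assumptions chosen for illustration. The function names `enhance_frame` and `clip_matches` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot(a, b) / (na * nb)


def enhance_frame(frame: Vector, semantic: Vector) -> Vector:
    """Sentence-conditioned enhancement of one frame feature.

    Assumption: the kernel is a 1x1 convolution whose weights are the
    sentence feature itself. Its response gates the frame feature, so
    frames that agree with the sentence are amplified relative to others.
    """
    response = dot(frame, semantic)  # 1x1 convolution across channels
    gate = sigmoid(response)
    return [gate * x for x in frame]


def clip_matches(frames: List[Vector], semantic: Vector,
                 threshold: float = 0.5) -> bool:
    """Decide whether a clip (list of frame features) matches the sentence.

    Assumption: the clip is summarized by mean pooling of the enhanced
    frame features and scored by cosine similarity against the sentence.
    """
    enhanced = [enhance_frame(f, semantic) for f in frames]
    pooled = [sum(col) / len(enhanced) for col in zip(*enhanced)]
    return cosine(pooled, semantic) >= threshold


if __name__ == "__main__":
    sentence = [1.0, 0.0]                      # toy sentence feature
    on_topic = [[2.0, 0.0], [3.0, 0.1]]        # frames aligned with it
    off_topic = [[0.0, 2.0], [0.1, 3.0]]       # frames orthogonal to it
    print(clip_matches(on_topic, sentence))
    print(clip_matches(off_topic, sentence))
```

In a trained system the kernel would be produced by a learned mapping from the sentence feature, and the decision would come from a learned classifier rather than a fixed cosine threshold.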
Opening claim text (preview).
What is claimed is:
1. A video processing method comprising: extracting, with a convolutional neural network (CNN) combined with a recurrent neural network (RNN), a semantic feature of an input sentence; performing semantic enhancement on a video frame based on the semantic feature to obtain a video feature of the video frame, wherein the video feature is obtained through convolution processing using a convolution kernel determined based on the semantic feature, and wherein the semantic enhancement comprises contextual interaction among video frames by exchanging features between the video frame and at least one other video frame in the same video via a context interaction submodule; and determining, based on the semantic feature and the video feature, whether a video clip to which the video frame belongs is a target video clip corresponding to the input sentence.
2. The video processing method according to claim 1, wherein performing the semantic enhancement on the video frame based on the semantic feature, to obtain the video feature of the video frame further comprises: determining a word corresponding to the video frame in the input sentence; and performing semantic enhancement on the video frame based on a semantic feature of the word corresponding to the video frame to obtain the video feature of the video frame.
3. The video processing method according to claim 1, wherein performing the semantic enhancement on the video frame based on the semantic feature, to obtain the video feature of the video frame in the video further comprises: performing feature extraction, with the CNN, on the video frame based on the semantic feature, to obtain the video feature of the video frame.
4. The video processing method according to claim 1, further comprising: obtaining an initial video feature of the video frame; and performing the semantic enhancement on the video frame based on the semantic feature, to obtain the video feature of the video frame further comprises: performing semantic enhancement on the initial video feature based on the semantic feature, to obtain the video feature of the video frame.
5. The video processing method according to claim 1, further comprising: performing feature fusion on the video feature of the video frame by using a video feature of at least one other video frame to obtain a fused video feature of the video frame, wherein the at least one other video frame and the video frame belong to a same video; and determining, based on the semantic feature and the video feature, whether the video clip to which the video frame belongs is the target video clip corresponding to the input sentence further comprises: determining, based on the semantic feature and the fused video feature, whether the video clip to which the video frame belongs is the target video clip corresponding to the input sentence.
6. The video processing method according to claim 1, wherein determining, based on the semantic feature and the video feature, whether the video clip to which the video frame belongs is the target video clip corresponding to the input sentence further comprises: determining a hierarchical structure of the video clip in time domain based on the video feature; and determining, based on the semantic feature and the hierarchical structure, whether the video clip is the target video clip corresponding to the input sentence.
7. A video processing apparatus, comprising a processor and a memory, wherein the memory is configured to store program instructions, and the processor is configured to invoke the program instructions to perform: extracting, with a convolutional neural network (CNN) combined with a recurrent neural network (RNN), a semantic feature of an input sentence; performing semantic enhancement on a video frame based on the semantic feature, to obtain a video feature of the video frame, wherein the video feature is obtained through convolution processing using a convolution kernel determined based on the semantic feature, and wherein the semantic enhancement comprises contextual interaction among video frames by exchanging features between the video frame and at least one other video frame in the same video via a context interaction submodule; and determining, based on the semantic feature and the video feature, whether a video clip to which the video frame belongs is a target video clip corresponding to the input sentence.
8. The video processing apparatus according to claim 7, wherein performing the semantic enhancement on the video frame based on the semantic feature, to obtain the video feature of the video frame further comprises: determining a word corresponding to the video frame in the input sentence; and performing semantic enhancement on the video frame based on a semantic feature of the word corresponding to the video frame to obtain the video feature of the video frame.
9. The video processing apparatus according to claim 7, wherein performing the semantic enhancement on the video frame based on the semantic feature, to obtain the video feature of the video frame in the video further comprises: performing feature extraction, with the CNN, on the video frame based on the semantic feature, to obtain the video feature of the video frame.
10. The video processing apparatus according to claim 7, wherein the processor is further configured to: obtain an initial video feature of the video frame; and performing the semantic enhancement on the video frame based on the semantic feature, to obtain the video feature of the video frame further comprises: performing semantic enhancement on the initial video feature based on the semantic feature, to obtain the video feature of the video frame.
11. The video processing apparatus according to claim 7, wherein the processor is further configured to: perform feature fusion on the video feature of the video frame by using a video feature of at least one other video frame to obtain a fused video feature of the video frame, wherein the at least one other video frame and the video frame belong to a same video; and determining, based on the semantic feature and the video feature, whether the video clip to which the video frame belongs is the target video clip corresponding to the input sentence further comprises: determining, based on the semantic feature and the fused video feature, whether the video clip to which the video frame belongs is the target video clip corresponding to the input sentence.
12. The video processing apparatus according to claim 7, wherein determining, based on the semantic feature and the video feature, whether the video clip to which the video frame belongs is the target video clip corresponding to the input sentence further comprises: determining a hierarchical structure of the video clip in time domain based on the video feature; and determining, based on the semantic feature and the hierarchical structure, whether the video clip is the target video clip corresponding to the input sentence.
13. A non-transitory computer-readable storage medium, wherein the non-transitory computer-readable medium stores program code to be executed by a device, and the program code is used for performing: extracting, with a convolutional neural network (CNN) combined with a recurrent neural network (RNN), a semantic feature of an input sentence; performing semantic enhancement on a video frame based on the semantic feature, to obtain a video feature of the video frame, wherein the video feature is obtained through convolution processing using a convolution kernel determined based on the semantic feature, and wh
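Claims 5 and 11 recite fusing a frame's feature with features of other frames in the same video, and claims 6 and 12 recite a hierarchical structure of the clip in the time domain. The sketch below shows one simple reading of each idea. The claims do not specify these operations, so the sliding-window averaging, the pairwise pooling, the `radius` parameter, and the function names `fuse_with_neighbors` and `temporal_pyramid` are all assumptions for illustration only.

```python
from typing import List

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def fuse_with_neighbors(features: List[Vector], radius: int = 1) -> List[Vector]:
    """Fuse each frame feature with those of nearby frames in the same video.

    Assumption: fusion is a plain average over a window of +/- `radius`
    frames, truncated at the start and end of the video.
    """
    fused = []
    for t in range(len(features)):
        lo = max(0, t - radius)
        hi = min(len(features), t + radius + 1)
        fused.append(mean(features[lo:hi]))
    return fused


def temporal_pyramid(features: List[Vector]) -> List[List[Vector]]:
    """Build a coarse-to-fine hierarchy over time.

    Assumption: each level averages adjacent pairs of the level below,
    so level 0 holds per-frame features and the last level holds a
    single feature for the whole clip.
    """
    levels = [features]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([mean(prev[i:i + 2]) for i in range(0, len(prev), 2)])
    return levels


if __name__ == "__main__":
    frame_features = [[0.0], [2.0], [4.0], [6.0]]  # toy 1-D features
    print(fuse_with_neighbors(frame_features))
    for level in temporal_pyramid(frame_features):
        print(level)
```

A matching stage could then score the sentence feature against entries at each level of the hierarchy, which lets short and long candidate clips be compared with the same sentence.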
Matching video sequences · CPC title
Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames · CPC title
of extracted features · CPC title
Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually · CPC title
of extracted features · CPC title