US2017337271A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2017337271-A1 |
| Application number | US-201615156466-A |
| Country | US |
| Kind code | A1 |
| Filing date | May 17, 2016 |
| Priority date | May 17, 2016 |
| Publication date | Nov 23, 2017 |
| Grant date | — |
Official abstract text for this publication.
An apparatus for visual search and retrieval using semantic information is described herein. The apparatus includes a controller, a scoring mechanism, an extractor, and a comparator. The controller is to segment an incoming video stream into a plurality of activity segments, wherein each frame is associated with an activity. The scoring mechanism is to calculate a score for each segment, wherein the score is based, at least partially, on a classification probability of each frame. The extractor is to extract deep features from a highest ranked segment, and the comparator is to determine the top-K neighbors based on the deep features.
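The segmentation and scoring steps described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration, not something the publication states: each frame is represented as an `(activity_label, classification_probability)` pair, a segment is a run of consecutive frames sharing a label, the segment score is the mean of its frame probabilities, and the names `Segment`, `segment_and_score`, `highest_ranked`, and `min_length` are hypothetical.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Sequence, Tuple


@dataclass
class Segment:
    activity: str  # label shared by every frame in the segment
    start: int     # index of the first frame (inclusive)
    end: int       # index of the last frame (inclusive)
    score: float   # mean classification probability (assumed scoring rule)


def segment_and_score(
    frames: Sequence[Tuple[str, float]], min_length: int = 2
) -> List[Segment]:
    """Group consecutive frames by activity label and score each group.

    Runs shorter than min_length frames are discarded, mirroring the idea
    of dropping segments below a predefined length threshold.
    """
    segments: List[Segment] = []
    index = 0
    for activity, run in groupby(frames, key=lambda frame: frame[0]):
        probs = [prob for _, prob in run]
        if len(probs) >= min_length:
            mean_prob = sum(probs) / len(probs)
            segments.append(
                Segment(activity, index, index + len(probs) - 1, mean_prob)
            )
        index += len(probs)
    return segments


def highest_ranked(segments: Sequence[Segment]) -> Segment:
    """Return the segment with the largest score."""
    return max(segments, key=lambda segment: segment.score)


# Hypothetical per-frame classifier output: (label, probability).
frames = [
    ("cooking", 0.9), ("cooking", 0.8), ("cooking", 0.7),
    ("walking", 0.6),
    ("cooking", 0.95), ("cooking", 0.85),
]
segments = segment_and_score(frames, min_length=2)
best = highest_ranked(segments)
```

In this toy input the single "walking" frame is dropped as too short, and the second "cooking" run ranks highest because its frames carry higher probabilities. A real system would obtain the labels and probabilities from a classifier such as a convolutional neural network.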
Opening claim text (preview).
1. An apparatus, comprising: a controller to segment a video stream into a plurality of segments based on semantic information; a scoring mechanism to calculate a score for each segment, wherein the score is based, at least partially, on a classification probability of each frame; an extractor to extract deep features from a highest ranked segment; and a comparator to determine a top-K neighbors based on the deep features.
2. The apparatus of claim 1, wherein the semantic information is the activity occurring in each frame of the video stream.
3. The apparatus of claim 1, wherein the deep features are derived from a plurality of features.
4. The apparatus of claim 1, wherein the semantic information is segmented by one or more shot boundaries, and each frame is labeled according to a class of semantic information.
5. The apparatus of claim 1, wherein a convolutional neural network is used to classify each segment, and the classification probability is derived from the classification.
6. The apparatus of claim 1, wherein a convolutional neural network is used to obtain deep features for each segment.
7. The apparatus of claim 1, wherein segments lower than a predefined threshold in length are discarded.
8. The apparatus of claim 1, wherein the scoring mechanism determines a score that is the probability that a frame belongs to an activity based on objects in the frame.
9. The apparatus of claim 1, wherein the scoring mechanism determines a score that is the probability that an object of a frame belongs to a class of objects combined with an importance of the object for an activity assigned as semantic information to the frame.
10. The apparatus of claim 1, wherein a summary of the top-K neighbors are rendered for a user.
11. The apparatus of claim 10, wherein a summary of the top-K neighbors is generated by selecting key image frames for each neighbor.
12. The apparatus of claim 10, wherein a summary of the top-K neighbors is generated by selecting key clips of N-seconds for each neighbor.
13. A method for visual search and retrieval, comprising: scoring each segment of a query video; extracting deep features for a highest ranked segment; and determining a plurality of nearest neighbors based on the deep features.
14. The method of claim 13, wherein the plurality of nearest neighbors contain content similar to the query video.
15. The method of claim 13, wherein the deep features are derived from a plurality of features.
16. The method of claim 13, wherein each segment of the query video is segmented based on semantic information.
17. The method of claim 16, wherein the semantic information includes activities, objects, locations, people, or any combination thereof.
18. A system, comprising: a display; an image capture mechanism; a memory that is to store instructions and that is communicatively coupled to the image capture mechanism and the display; and a processor communicatively coupled to the image capture mechanism, the display, and the memory, wherein when the processor is to execute the instructions, the processor is to: score each segment of a query video, wherein the score is based on semantic information; extract deep features for a highest ranked segment of the query video; and determine a plurality of nearest neighbors in a plurality of videos based on the deep features.
19. The system of claim 18, wherein the deep features are derived from a plurality of features.
20. The system of claim 18, wherein the score is based on, at least partially, a classification probability the semantic information for each frame of the query video.
21. The system of claim 18, wherein a convolutional neural network is used to label each frame of the query video according to the semantic information in the video.
22. A tangible, non-transitory, computer-readable medium comprising instructions that, when executed by a processor, direct the processor to: score each segment of an input query video; extract deep features for a highest ranked segment; and determine a top-K neighbors based on the deep features.
23. The computer readable medium of claim 22, wherein a plurality of nearest neighbors contain content similar to the query video.
24. The computer readable medium of claim 22, wherein the deep features are derived from a plurality of features.
25. The computer readable medium of claim 22, wherein each segment of the query video is segmented based on semantic information.
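The comparator step recited in the claims, determining the top-K neighbors from deep features, can be sketched as a nearest-neighbor lookup. This is a minimal illustration under stated assumptions: the claims do not name a similarity measure, so cosine similarity is swapped in here; deep features are stood in for by short plain lists of floats; and the names `cosine_similarity`, `top_k_neighbors`, and the `database` mapping are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k_neighbors(
    query: Sequence[float],
    database: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k database entries most similar to the query, best first."""
    scored = (
        (cosine_similarity(query, features), video_id)
        for video_id, features in database.items()
    )
    return [(video_id, sim) for sim, video_id in heapq.nlargest(k, scored)]


# Hypothetical feature vectors; real deep features would come from a network
# and have far more dimensions.
database = {
    "video_a": [1.0, 0.0],
    "video_b": [0.0, 1.0],
    "video_c": [1.0, 1.0],
}
query_features = [1.0, 0.1]  # features of the highest ranked query segment
neighbors = top_k_neighbors(query_features, database, k=2)
```

With this toy data, `video_a` is the closest match and `video_c` the second. An exhaustive scan like this is fine for small collections; a large video corpus would more likely use an approximate nearest-neighbor index.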
using neural networks · CPC title
Proximity, similarity or dissimilarity measures · CPC title
using classification, e.g. of video objects · CPC title
Distances to closest patterns, e.g. nearest neighbour classification · CPC title
using objects detected or recognised in the video content · CPC title