Visual search and retrieval using semantic information

US2017337271A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2017337271-A1
Application number: US-201615156466-A
Country: US
Kind code: A1
Filing date: May 17, 2016
Priority date: May 17, 2016
Publication date: Nov 23, 2017
Grant date:

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An apparatus for visual search and retrieval using semantic information is described herein. The apparatus includes a controller, a scoring mechanism, an extractor, and a comparator. The controller is to segment an incoming video stream into a plurality of activity segments, wherein each frame is associated with an activity. The scoring mechanism is to calculate a score for each segment, wherein the score is based, at least partially, on a classification probability of each frame. The extractor is to extract deep features from a highest ranked segment, and the comparator is to determine the top-K neighbors based on the deep features.
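
The scoring, ranking, and top-K steps in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the mean-probability aggregation, the cosine similarity measure, the function names, and the toy feature vectors are all assumptions. In the patent, the deep features would come from a convolutional neural network rather than hand-written lists.

```python
import math
from typing import Dict, List, Sequence


def segment_score(frame_probs: Sequence[float]) -> float:
    """Score a segment as the mean per-frame classification probability.

    Averaging is an assumed aggregation; the abstract only says the score
    is based, at least partially, on each frame's classification probability.
    """
    return sum(frame_probs) / len(frame_probs)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_neighbors(query: Sequence[float],
                    database: Dict[str, Sequence[float]],
                    k: int) -> List[str]:
    """Return the names of the k database entries most similar to the query."""
    ranked = sorted(database,
                    key=lambda name: cosine_similarity(query, database[name]),
                    reverse=True)
    return ranked[:k]


# Hypothetical query video, already split into two segments.
segments = {
    "seg_a": {"frame_probs": [0.9, 0.8, 0.7], "features": [1.0, 0.0, 0.0]},
    "seg_b": {"frame_probs": [0.4, 0.5], "features": [0.0, 1.0, 0.0]},
}

# Pick the highest ranked segment, then search with its features only.
best = max(segments, key=lambda s: segment_score(segments[s]["frame_probs"]))

# Hypothetical feature vectors for videos in the searchable collection.
database = {
    "video_1": [0.9, 0.1, 0.0],
    "video_2": [0.0, 1.0, 0.0],
    "video_3": [0.7, 0.7, 0.0],
}
neighbors = top_k_neighbors(segments[best]["features"], database, k=2)
```

Here `seg_a` has the higher mean probability, so its feature vector is the query, and `video_1` and `video_3` come back as the two nearest neighbors.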

First claim

Opening claim text (preview).

1. An apparatus, comprising: a controller to segment a video stream into a plurality of segments based on semantic information; a scoring mechanism to calculate a score for each segment, wherein the score is based, at least partially, on a classification probability of each frame; an extractor to extract deep features from a highest ranked segment; and a comparator to determine a top-K neighbors based on the deep features.
2. The apparatus of claim 1, wherein the semantic information is the activity occurring in each frame of the video stream.
3. The apparatus of claim 1, wherein the deep features are derived from a plurality of features.
4. The apparatus of claim 1, wherein the semantic information is segmented by one or more shot boundaries, and each frame is labeled according to a class of semantic information.
5. The apparatus of claim 1, wherein a convolutional neural network is used to classify each segment, and the classification probability is derived from the classification.
6. The apparatus of claim 1, wherein a convolutional neural network is used to obtain deep features for each segment.
7. The apparatus of claim 1, wherein segments lower than a predefined threshold in length are discarded.
8. The apparatus of claim 1, wherein the scoring mechanism determines a score that is the probability that a frame belongs to an activity based on objects in the frame.
9. The apparatus of claim 1, wherein the scoring mechanism determines a score that is the probability that an object of a frame belongs to a class of objects combined with an importance of the object for an activity assigned as semantic information to the frame.
10. The apparatus of claim 1, wherein a summary of the top-K neighbors are rendered for a user.
11. The apparatus of claim 10, wherein a summary of the top-K neighbors is generated by selecting key image frames for each neighbor.
12. The apparatus of claim 10, wherein a summary of the top-K neighbors is generated by selecting key clips of N-seconds for each neighbor.
13. A method for visual search and retrieval, comprising: scoring each segment of a query video; extracting deep features for a highest ranked segment; and determining a plurality of nearest neighbors based on the deep features.
14. The method of claim 13, wherein the plurality of nearest neighbors contain content similar to the query video.
15. The method of claim 13, wherein the deep features are derived from a plurality of features.
16. The method of claim 13, wherein each segment of the query video is segmented based on semantic information.
17. The method of claim 16, wherein the semantic information includes activities, objects, locations, people, or any combination thereof.
18. A system, comprising: a display; an image capture mechanism; a memory that is to store instructions and that is communicatively coupled to the image capture mechanism and the display; and a processor communicatively coupled to the image capture mechanism, the display, and the memory, wherein when the processor is to execute the instructions, the processor is to: score each segment of a query video, wherein the score is based on semantic information; extract deep features for a highest ranked segment of the query video; and determine a plurality of nearest neighbors in a plurality of videos based on the deep features.
19. The system of claim 18, wherein the deep features are derived from a plurality of features.
20. The system of claim 18, wherein the score is based on, at least partially, a classification probability the semantic information for each frame of the query video.
21. The system of claim 18, wherein a convolutional neural network is used to label each frame of the query video according to the semantic information in the video.
22. A tangible, non-transitory, computer-readable medium comprising instructions that, when executed by a processor, direct the processor to: score each segment of an input query video; extract deep features for a highest ranked segment; and determine a top-K neighbors based on the deep features.
23. The computer readable medium of claim 22, wherein a plurality of nearest neighbors contain content similar to the query video.
24. The computer readable medium of claim 22, wherein the deep features are derived from a plurality of features.
25. The computer readable medium of claim 22, wherein each segment of the query video is segmented based on semantic information.
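
The segmentation and scoring details in claims 4, 7, and 9 can be illustrated with a short sketch. It assumes per-frame activity labels are already available (the claims attribute labeling to a convolutional neural network), and it assumes the "combination" in claim 9 is a sum of probability-times-importance products. The function names, labels, and numbers below are hypothetical.

```python
from itertools import groupby
from typing import Dict, List, Tuple


def segment_by_label(labels: List[str],
                     min_length: int) -> List[Tuple[str, int, int]]:
    """Group consecutive frames with the same label into segments.

    Returns (label, start, end) tuples with an exclusive end index.
    Segments shorter than min_length frames are discarded, mirroring
    the length threshold in claim 7.
    """
    segments = []
    index = 0
    for label, run in groupby(labels):
        length = len(list(run))
        if length >= min_length:
            segments.append((label, index, index + length))
        index += length
    return segments


def frame_score(object_probs: Dict[str, float],
                importance: Dict[str, float]) -> float:
    """Combine object class probabilities with per-activity importance.

    Assumption: the combination is a sum of probability * importance;
    claim 9 does not specify the exact formula.
    """
    return sum(prob * importance.get(obj, 0.0)
               for obj, prob in object_probs.items())


# Hypothetical per-frame activity labels for a six-frame video.
labels = ["cook", "cook", "cook", "walk", "cook", "cook"]
kept = segment_by_label(labels, min_length=2)

# Hypothetical detections in one frame and object importance for "cook".
score = frame_score({"pan": 0.8, "dog": 0.5}, {"pan": 1.0, "dog": 0.0})
```

With a threshold of two frames, the single `walk` frame is dropped and two `cook` segments remain. The frame score counts only the pan, because the dog is given zero importance for the cooking activity.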

Assignees

Intel Corp

Inventors

Classifications

  • using neural networks

  • Proximity, similarity or dissimilarity measures

  • using classification, e.g. of video objects

  • Distances to closest patterns, e.g. nearest neighbour classification

  • using objects detected or recognised in the video content

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2017337271A1 cover?
An apparatus for visual search and retrieval using semantic information is described herein. The apparatus includes a controller, a scoring mechanism, an extractor, and a comparator. The controller is to segment an incoming video stream into a plurality of activity segments, wherein each frame is associated with an activity. The scoring mechanism is to calculate a score for each segment, wherein…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F16/7837. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 23, 2017 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).