Association of visual labels and event context in image data

US9734166B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9734166-B2
Application number: US-201313975497-A
Country: US
Kind code: B2
Filing date: Aug 26, 2013
Priority date: Aug 26, 2013
Publication date: Aug 15, 2017
Grant date: Aug 15, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A first set of contextual dimensions is generated from one or more textual descriptions associated with a given event, which includes one or more examples. A second set of contextual dimensions is generated from one or more visual features associated with the given event, which includes one or more visual example recordings. A similarity structure is constructed from the first set of contextual dimensions and the second set of contextual dimensions. One or more of the textual descriptions is matched with one or more of the visual features based on the similarity structure.
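The four steps in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. The toy `TAXONOMY`, the function names, and the sample data are hypothetical. Jaccard overlap between concept sets is a simple stand-in for the similarity structure, which the patent does not define this way. The visual concept sets are assumed to come from some upstream classifier.

```python
# Hypothetical toy taxonomy: surface term -> concept label (not from the patent).
TAXONOMY = {
    "cake": "object:cake",
    "candles": "object:candle",
    "singing": "activity:singing",
    "kitchen": "context:indoor",
    "balloons": "object:balloon",
    "garden": "context:outdoor",
}

def text_dimensions(description):
    """First set of contextual dimensions: taxonomy concepts named in the text."""
    words = description.lower().replace(",", " ").replace(".", " ").split()
    return {TAXONOMY[w] for w in words if w in TAXONOMY}

def jaccard(a, b):
    """Relatedness of two concept sets (assumed measure): |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def build_similarity(text_dims, visual_dims):
    """Similarity structure: a score for every (description id, recording id) pair."""
    return {(t, v): jaccard(td, vd)
            for t, td in text_dims.items()
            for v, vd in visual_dims.items()}

def match(similarity, visual_ids):
    """Annotate each recording with its best-scoring description id, if any."""
    result = {}
    for v in visual_ids:
        candidates = {t: s for (t, vv), s in similarity.items() if vv == v}
        best = max(candidates, key=candidates.get)
        result[v] = best if candidates[best] > 0 else None
    return result

descriptions = {
    "d1": "Guests are singing while the cake with candles is carried in.",
    "d2": "Balloons are hung in the garden.",
}
# Second set of contextual dimensions: assumed output of a visual classifier.
visual_dims = {
    "img1": {"object:cake", "object:candle", "context:indoor"},
    "img2": {"object:balloon", "context:outdoor"},
}

text_dims = {k: text_dimensions(d) for k, d in descriptions.items()}
similarity = build_similarity(text_dims, visual_dims)
annotations = match(similarity, visual_dims)
print(annotations)
```

In this toy data, each image is paired with the description that shares the most concepts with it, and that description then serves as the image's annotation.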

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: generating a first set of contextual dimensions from one or more textual descriptions associated with a given event, wherein the one or more textual descriptions comprise a corpus of text describing one or more aspects of the given event, and the first set of contextual dimensions results in a first taxonomy for the one or more textual descriptions; generating a second set of contextual dimensions from one or more audio-visual features associated with the given event, wherein the one or more audio-visual features comprise at least one of a video content and an image content that visually depicts the one or more aspects of the given event together with an audio content component, and the second set of contextual dimensions results in a second taxonomy for the one or more audio-visual features; constructing a similarity structure from the first set of contextual dimensions and the second set of contextual dimensions, wherein the similarity structure comprises a visual and textual concept relationship network that links the first taxonomy and the second taxonomy based on relatedness between elements of the first taxonomy and the second taxonomy; and matching one or more of the textual descriptions with one or more of the audio-visual features based on the similarity structure such that the one or more textual descriptions that match the one or more audio-visual features serve to annotate the one or more audio-visual features; wherein the generating, constructing and matching steps are performed via one or more processing devices.

2. The method of claim 1, wherein the step of generating a first set of contextual dimensions for one or more textual descriptions associated with a given event further comprises parsing the one or more textual descriptions associated with the given event by identifying one or more terms or one or more sets of terms appearing in one or more taxonomies or one or more ontologies.

3. The method of claim 2, wherein the step of generating a first set of contextual dimensions for one or more textual descriptions associated with a given event further comprises mapping the one or more identified terms or one or more identified sets of terms to one or more textual objects in the one or more taxonomies or the one or more ontologies.

4. The method of claim 3, wherein the step of generating a first set of contextual dimensions for one or more textual descriptions associated with a given event further comprises classifying the one or more textual objects into one or more classes.

5. The method of claim 4, wherein the step of generating a first set of contextual dimensions for one or more textual descriptions associated with a given event further comprises arranging the one or more classified textual objects in a time sequence describing the given event in one or more event taxonomy graphs.

6. The method of claim 5, wherein the step of generating a second set of contextual dimensions for one or more audio-visual features associated with the given event further comprises extracting the one or more audio-visual features associated with the given event from one or more images or one or more objects from a video frame from one or more videos.

7. The method of claim 6, wherein the step of generating a second set of contextual dimensions for one or more audio-visual features associated with the given event further comprises classifying the one or more audio-visual features into one or more visual concepts associated with one or more taxonomies or one or more ontologies.

8. The method of claim 1, wherein the step of constructing a similarity structure from the first set of contextual dimensions and the second set of contextual dimensions further comprises forming the relationship network by associating each of the one or more visual concepts to the one or more event taxonomy graphs.

9. The method of claim 8, wherein the step of matching one or more of the textual descriptions with one or more of the audio-visual features based on the similarity structure further comprises assigning a relevant one of the one or more textual descriptions to one of the one or more images or the one or more videos based on the formed relationship network.

10. The method of claim 9, wherein the step of classifying the one or more textual objects and the step of classifying the one or more audio-visual features further comprise selecting from context classes, object classes and activity classes.

11. A computer program product comprising a processor-readable storage medium having encoded therein executable code of one or more software programs, wherein the one or more software programs when executed by the one or more processing devices implement steps of: generating a first set of contextual dimensions from one or more textual descriptions associated with a given event, wherein the one or more textual descriptions comprise a corpus of text describing one or more aspects of the given event, and the first set of contextual dimensions results in a first taxonomy for the one or more textual descriptions; generating a second set of contextual dimensions from one or more audio-visual features associated with the given event, wherein the one or more audio-visual features comprise at least one of a video content and an image content that visually depicts the one or more aspects of the given event together with an audio content component, and the second set of contextual dimensions results in a second taxonomy for the one or more visual features; constructing a similarity structure from the first set of contextual dimensions and the second set of contextual dimensions, wherein the similarity structure comprises a visual and textual concept relationship network that links the first taxonomy and the second taxonomy based on relatedness between elements of the first taxonomy and the second taxonomy; and matching one or more of the textual descriptions with one or more of the audio-visual features based on the similarity structure such that the one or more textual descriptions that match the one or more audio-visual features serve to annotate the one or more audio-visual features.

12. An apparatus, comprising: a memory; and a processor operatively coupled to the memory and configured to: generate a first set of contextual dimensions from one or more textual descriptions associated with a given event, wherein the one or more textual descriptions comprise a corpus of text describing one or more aspects of the given event, and the first set of contextual dimensions results in a first taxonomy for the one or more textual descriptions; generate a second set of contextual dimensions from one or more audio-visual features associated with the given event, wherein the one or more audio-visual features comprise at least one of a video content and an image content that visually depicts the one or more aspects of the given event together with an audio content component, and the second set of contextual dimensions results in a second taxonomy for the one or more audio-visual features; construct a similarity structure from the first set of contextual dimensions and the second set of contextual dimensions, wherein the similarity structure comprises a visual and textual concept relationship network that links the first taxonomy and the second taxonomy based on relatedness between elements of the first taxonomy and the second taxonomy; and match one or more of the textual descriptions with one or more of the audio-visual features based on the similarity structure such that the one or more textual descriptions that match the one or more visual audio-visual features serve to annotate the
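The dependent claims on parsing, mapping, classifying, time-ordering, and forming the relationship network can be pictured with a rough sketch. Everything here is an illustrative assumption: the `ONTOLOGY` table, the function names, the use of sentence order as a proxy for the time sequence, and exact name-and-class equality as the association rule. The claims do not prescribe any of these choices.

```python
from collections import defaultdict

# Hypothetical ontology: term -> (textual object, class). The three class names
# follow the context / object / activity split named in the claims.
ONTOLOGY = {
    "stadium": ("Stadium", "context"),
    "player": ("Player", "object"),
    "ball": ("Ball", "object"),
    "kicks": ("Kicking", "activity"),
    "crowd": ("Crowd", "object"),
    "cheers": ("Cheering", "activity"),
}

def build_event_graph(sentences):
    """Parse, map, and classify terms; one node per sentence, linked in order.

    Sentence order stands in for the event's time sequence (an assumption).
    """
    nodes = []
    for step, sentence in enumerate(sentences):
        by_class = defaultdict(set)
        for token in sentence.lower().replace(".", " ").split():
            if token in ONTOLOGY:
                obj, cls = ONTOLOGY[token]
                by_class[cls].add(obj)
        nodes.append({"step": step, "text": sentence, "objects": dict(by_class)})
    edges = [(i, i + 1) for i in range(len(nodes) - 1)]
    return nodes, edges

def link_visual_concepts(nodes, visual_concepts):
    """Relationship network: visual concept -> steps with a same-named textual object."""
    links = defaultdict(list)
    for name, cls in visual_concepts:
        for node in nodes:
            if name in node["objects"].get(cls, set()):
                links[(name, cls)].append(node["step"])
    return dict(links)

sentences = [
    "The player kicks the ball in the stadium.",
    "The crowd cheers.",
]
nodes, edges = build_event_graph(sentences)

# Visual concepts are assumed to come from an upstream image or video classifier.
visual_concepts = [("Ball", "object"), ("Cheering", "activity"), ("Dog", "object")]
links = link_visual_concepts(nodes, visual_concepts)
print(links)
```

A visual concept with no counterpart in the event graph (here the hypothetical `Dog`) simply gets no link, so it contributes nothing when a description is later assigned to an image or video.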

Assignees

Inventors

Classifications

  • Physics · mapped topic

  • using information manually generated, e.g. tags, keywords, comments, manually generated location and time information · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9734166B2 cover?
A first set of contextual dimensions is generated from one or more textual descriptions associated with a given event, which includes one or more examples. A second set of contextual dimensions is generated from one or more visual features associated with the given event, which includes one or more visual example recordings. A similarity structure is constructed from the first set of contextual…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F17/30268. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 15, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).