System and method for multimedia ranking and multi-modal image retrieval using probabilistic semantic models and expectation-maximization (EM) learning

US10614366B1 · US · B1

Patent metadata
Field               Value
Publication number  US-10614366-B1
Application number  US-201615061641-A
Country             US
Kind code           B1
Filing date         Mar 4, 2016
Priority date       Jan 31, 2006
Publication date    Apr 7, 2020
Grant date          Apr 7, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and Methods for multi-modal or multimedia image retrieval are provided. Automatic image annotation is achieved based on a probabilistic semantic model in which visual features and textual words are connected via a hidden layer comprising the semantic concepts to be discovered, to explicitly exploit the synergy between the two modalities. The association of visual features and textual words is determined in a Bayesian framework to provide confidence of the association. A hidden concept layer which connects the visual feature(s) and the words is discovered by fitting a generative model to the training image and annotation words. An Expectation-Maximization (EM) based iterative learning procedure determines the conditional probabilities of the visual features and the textual words given a hidden concept class. Based on the discovered hidden concept layer and the corresponding conditional probabilities, the image annotation and the text-to-image retrieval are performed using the Bayesian framework.
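The learning and retrieval steps summarized in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: each image is reduced to a single scalar feature with one variance per concept (the claims refer to feature vectors and covariance matrices), the means are initialized deterministically, and the function names `fit_em`, `annotate`, and `retrieve` as well as the toy data are hypothetical.

```python
import math

def gauss(x, mu, var):
    """Density of a 1-D Gaussian N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def normalize(values):
    """Scale non-negative values to sum to 1 (uniform if all are zero)."""
    total = sum(values)
    if total == 0.0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]

def fit_em(features, pairs, n_words, n_concepts, iters=50):
    """Fit P(z), a Gaussian p(f|z) and P(w|z) to (image index, word index) pairs."""
    K = n_concepts
    lo, hi = min(features), max(features)
    # Assumption: spread the initial means evenly over the feature range.
    mu = [lo + (hi - lo) * (k + 0.5) / K for k in range(K)]
    var = [((hi - lo) / K) ** 2 + 1e-3] * K
    pz = [1.0 / K] * K
    pw = [[1.0 / n_words] * n_words for _ in range(K)]
    for _ in range(iters):
        # E-step: posterior P(z_k | f_i, w_j) for every observed pair.
        post = [normalize([pz[k] * gauss(features[i], mu[k], var[k]) * pw[k][j]
                           for k in range(K)]) for i, j in pairs]
        # M-step: re-estimate parameters from the posterior-weighted pairs.
        for k in range(K):
            nk = max(sum(p[k] for p in post), 1e-12)
            pz[k] = nk / len(pairs)
            mu[k] = sum(p[k] * features[i] for p, (i, _) in zip(post, pairs)) / nk
            # The small constant is a variance floor that avoids collapse.
            var[k] = sum(p[k] * (features[i] - mu[k]) ** 2
                         for p, (i, _) in zip(post, pairs)) / nk + 1e-3
            pw[k] = [sum(p[k] for p, (_, j) in zip(post, pairs) if j == w) / nk
                     for w in range(n_words)]
    return {"pz": pz, "mu": mu, "var": var, "pw": pw}

def annotate(model, f):
    """Image-to-text: P(w | f) = sum_k P(w | z_k) P(z_k | f) for each word."""
    K = len(model["pz"])
    pz_f = normalize([model["pz"][k] * gauss(f, model["mu"][k], model["var"][k])
                      for k in range(K)])
    n_words = len(model["pw"][0])
    return [sum(model["pw"][k][w] * pz_f[k] for k in range(K))
            for w in range(n_words)]

def retrieve(model, features, w):
    """Text-to-image: rank image indices by p(f_i | w) = sum_k p(f_i | z_k) P(z_k | w)."""
    K = len(model["pz"])
    pz_w = normalize([model["pz"][k] * model["pw"][k][w] for k in range(K)])

    def score(f):
        return sum(pz_w[k] * gauss(f, model["mu"][k], model["var"][k])
                   for k in range(K))

    return sorted(range(len(features)), key=lambda i: score(features[i]), reverse=True)

# Toy data (invented): images 0-2 carry word 0, images 3-5 carry word 1.
features = [0.1, 0.2, 0.0, 9.9, 10.1, 10.0]
pairs = [(0, 0), (1, 0), (2, 0), (3, 1), (4, 1), (5, 1)]
model = fit_em(features, pairs, n_words=2, n_concepts=2)
print(annotate(model, 0.15))         # word probabilities for an unseen feature
print(retrieve(model, features, 1))  # image indices ranked for word 1
```

Both `annotate` and `retrieve` reuse the same learned conditional probabilities, which mirrors the abstract's point that annotation and text-to-image retrieval share one Bayesian framework once the hidden concept layer is fitted.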

First claim

Opening claim text (preview).

The invention claimed is:

1. A method of extracting implicit concepts within a set of multimedia works, comprising:
  (a) receiving a plurality of portions of the set of multimedia works, each portion comprising semantic features and non-semantic features;
  (b) probabilistically determining, with at least one automated data processor, a set of semantic concepts inherent in the respective non-semantic features of the received portions, based on at least a Bayesian model, comprising a hidden concept layer formulated based on at least one joint probability distribution which models a probability that a respective semantic concept annotates a respective non-semantic feature that connects a semantic feature layer and a non-semantic feature layer, wherein the hidden concept layer is discovered by fitting a generative model to a training set comprising non-semantic features and annotation semantic features, the conditional probabilities of the non-semantic features and the annotation semantic features given a hidden concept class being determined based on an Expectation-Maximization (EM) based iterative learning procedure, the non-semantic features being generated from a plurality of respective Gaussian distributions, respectively corresponding to a semantic concept, each non-semantic feature having a conditional probability density function selectively dependent on a covariance matrix of non-semantic features belonging to the respective semantic concept;
  (c) determining, with the at least one automated data processor, a semantic concept vector for a respective multimedia work, dependent on at least the determined semantic concepts inherent in the respective non-semantic features of the received portions; and
  (d) at least one of storing and communicating information representing the determined semantic concept vector.

2. The method according to claim 1, further comprising receiving a word as an input, and outputting at least one image or an identifier of at least one image corresponding to the word.

3. The method according to claim 1, further comprising receiving an image as an input, and outputting at least one word or an identifier of at least one word corresponding to the image.

4. The method according to claim 1, wherein said probabilistically determining comprises employing at least one conditional probability represented in the Bayesian model for associating words with an image, comprising a set of parameters stored in a memory representing the hidden concept layer which connects a non-semantic feature layer comprising a visual feature layer and a semantic feature layer comprising a word layer.

5. The method according to claim 4, further comprising discovering the hidden concept layer by fitting the generative model to a training set comprising image and annotation words, wherein the conditional probabilities of the visual features and the annotation words given the hidden concept class are determined based on the Expectation-Maximization (EM) based iterative learning procedure.

6. The method according to claim 5, wherein the Bayesian model comprises a semantic Bayesian framework representing an association of visual content with a plurality of semantic concepts, comprising at least one hidden layer formulated based on at least one joint probability distribution which models a probability that a word belonging to a respective semantic concept is an annotation word of respective visual content; wherein a set of visual content is mapped to the semantic Bayesian framework dependent on semantic concepts represented in respective visual content, using at least one automated processor which automatically determines a set of annotation words associated with the respective visual content; at least one implicit semantic concept is automatically extracted from a received query seeking elements of the set of visual content corresponding to at least one implicit semantic concept, using at least one automated processor; elements of the mapped set of visual content corresponding to the at least one extracted implicit semantic concept are automatically determined, using at least one automated processor; and the corresponding visual content is ranked in accordance with at least a correspondence to the at least one extracted implicit semantic concept.

7. The method according to claim 5, wherein the hidden concept layer which connects a visual feature layer and a word layer which is discovered by fitting a generative model to a training set comprising images and annotation words.

8. The method according to claim 7, wherein $f_i$, $i \in [1, N]$ denotes a visual feature vector of images in a training database, where $N$ is the size of the database, $w_j$, $j \in [1, M]$ denotes the distinct textual words in a training annotation word set, where $M$ is the size of annotation vocabulary in the training database, the visual features of images in the database, $f_i = [f_i^1, f_i^2, \ldots, f_i^L]$, $i \in [1, N]$ are known i.i.d. samples from an unknown distribution, having a visual feature dimension $L$, the specific visual feature annotation word pairs $(f_i, w_j)$, $i \in [1, N]$, $j \in [1, M]$ are known i.i.d. samples from an unknown distribution, associated with an unobserved semantic concept variable $z \in Z = \{z_1, \ldots, z_k\}$, in which each observation of one visual feature $f \in F = \{f_1, f_2, \ldots, f_N\}$ belongs to one or more concept classes $z_k$ and each observation of one word $w \in V = \{w_1, w_2, \ldots, w_M\}$ in one image $f_i$ belongs to one concept class, in which the observation pairs or random variables $(f_i, w_j)$ are assumed to be generated independently and to be conditionally independent given the respective hidden concept $z_k$, such that $P(f_i, w_j \mid z_k) = p_{\Im}(f_i \mid z_k)\, P_V(w_j \mid z_k)$; the visual feature and word distribution is treated as a randomized data generation process, wherein a probability of a concept is represented as $P_z(z_k)$; a visual feature is selected $f_i \in F$ with probability $P_{\Im}(f_i \mid z_k)$; and a textual word is selected $w_j \in V$ with probability $P_V(w_j \mid z_k)$, from which an observed pair $(f_i, w_j)$ is obtained, such that a joint probability model is expressed as follows:

$$P(f_i, w_j) = P(w_j)\, P(f_i \mid w_j)$$

Assignees

Inventors

Classifications

  • Knowledge representation; Symbolic representation · CPC title

  • using colour · CPC title

  • G06N7/005 · Primary

    Physics · mapped topic

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • Bayesian classification · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10614366B1 cover?
Systems and Methods for multi-modal or multimedia image retrieval are provided. Automatic image annotation is achieved based on a probabilistic semantic model in which visual features and textual words are connected via a hidden layer comprising the semantic concepts to be discovered, to explicitly exploit the synergy between the two modalities. The association of visual features and textual wo…
Who is the assignee on this patent?
Univ New York State Res Found, The Research Foundation For The State Univ O
What technology area does this patent fall under?
Primary CPC classification G06F16/5838. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 7, 2020 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).