Method of classifying a multimodal object

US9569698B2 · US · B2

Patent metadata
Publication number: US-9569698-B2
Application number: US-201314434723-A
Country: US
Kind code: B2
Filing date: Oct 7, 2013
Priority date: Oct 12, 2012
Publication date: Feb 14, 2017
Grant date: Feb 14, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method of classifying a multimodal test object described according to at least one first and one second modality is provided, including offline construction by classification of a multimedia dictionary, defined by a plurality of multimedia words, based on a recoding matrix of representatives of the first modality forming a dictionary of the first modality including a plurality of words of the first modality, the recoding matrix constructed to express the frequency of each word of the second modality of a dictionary of the second modality including a plurality of words of the second modality, for each word of the first modality, classification of a multimodal test object performed online by recoding each representative of the first modality relating to the multimedia object considered on the multimedia dictionary base, followed by aggregating representatives of the first modality coded in the recoding in a single vector representative of the multimodal object.
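The recoding matrix in the abstract can be read as a table with one row per word of the first modality and one column per word of the second modality. The following is a minimal sketch of that reading, not the patented implementation. It assumes a textual first modality (tags) and a visual second modality, a per-image count matrix of visual words, and a binary matrix marking which tags are attached to each image. The function name `build_recoding_matrix`, the argument names, and the toy data are hypothetical.

```python
import numpy as np


def build_recoding_matrix(visual_counts, tag_presence):
    """Aggregate visual-word occurrences per tag over all images.

    visual_counts: (N, K_v) array, occurrences of each visual word per image.
    tag_presence:  (N, K_T) array, 1 if the tag is attached to the image, else 0
                   (assumed binary encoding).
    Returns X with shape (K_T, K_v): X[t, v] sums the occurrences of visual
    word v over every image that carries tag t.
    """
    visual_counts = np.asarray(visual_counts, dtype=float)
    tag_presence = np.asarray(tag_presence, dtype=float)
    if visual_counts.shape[0] != tag_presence.shape[0]:
        raise ValueError("both matrices must describe the same N images")
    return tag_presence.T @ visual_counts


# Toy data (made up): N = 3 images, K_v = 4 visual words, K_T = 2 tags.
V = np.array([[2, 0, 1, 0],
              [0, 3, 0, 1],
              [1, 1, 0, 0]])
T = np.array([[1, 0],
              [0, 1],
              [1, 1]])

X = build_recoding_matrix(V, T)
print(X)
```

Each row of `X` is then a visual-word frequency profile for one tag, which is the kind of representation the later clustering and recoding steps operate on.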

First claim

Opening claim text (preview).

The invention claimed is:

1. A method for classifying a multimodal test object, termed a multimedia test object, described according to at least one first modality and one second modality, said method comprising: constructing a recoding matrix X of representatives of the first modality forming a dictionary of the first modality including a plurality K_T of words of the first modality, wherein each of the components of the recoding matrix X forms information representative of the frequency of each word of the second modality of a dictionary of the second modality including a plurality K_v of words of the second modality, for each word of the first modality, an offline construction, by unsupervised classification, of a multimedia dictionary W_m, defined by a plurality K_m of multimedia words, on the basis of the recoding matrix X, a classification of a multimedia test object comprising: recoding of each representative of the first modality, relating to the multimedia test object, on the multimedia dictionary W_m base, and aggregating the representatives of the first modality coded in the recoding step in a single vector BoMW representative of the multimedia test object.

2. The method of classification of claim 1, wherein constructing the recoding matrix X comprises: constructing a word occurrence matrix for the second modality on a plurality of N images, constructing an intermediate matrix including K_T columns, each column corresponding to a word of the first modality, said intermediate matrix containing, for each image in the plurality N, information representative of the presence or absence of each word of the second modality, constructing, from the occurrence matrix and the intermediate matrix, the recoding matrix X which contains for each word of the first modality and each word of the second modality an aggregation on the plurality of N images of the occurrence of the word of the second modality for the word of the first modality.

3. The method of classification of claim 1, wherein said first modality is textual, and said second modality is visual, the test object being a test image associated with textual tags, said dictionary according to the first modality being a textual dictionary W_T and said dictionary according to the second modality being a visual dictionary W_v.

4. The method of classification of claim 3, comprising a sequence of at least the following steps performed offline: extracting the visual features of a plurality N of images forming a learning base, during which the local features of each image are extracted and coded on the visual dictionary W_v; constructing the recoding matrix X; normalizing the recoding matrix X; performing an unsupervised classification step, referred to as a step of clustering the normalized recoding matrix, for generating the multimedia dictionary W_m.

5. The method of classification of claim 3, comprising a sequence of at least the following steps performed online: recoding of each textual tag of the test image on the multimedia dictionary, W_m, for generating a recoded matrix Z; aggregating the recoded matrix Z and generating a multimedia signature BoMW for the test image.

6. The method of classification of claim 1, wherein recoding is based on a locally constrained linear coding technique.

7. The method of classification of claim 4, wherein said normalizing the recoding matrix comprises a row-wise normalization of the recoding matrix X according to the L1-norm.

8. The classification method of claim 4, wherein said step of clustering is performed based on a K-means algorithm.

9. A device for classifying a test object comprising a microprocessor and a data memory for implementing a method for classifying a multimodal test object, termed a multimedia test object, described according to at least one first modality and one second modality, said method comprising: constructing a recoding matrix X of representatives of the first modality forming a dictionary of the first modality including a plurality K_T of words of the first modality, wherein each of the components of the recoding matrix X forms information representative of the frequency of each word of the second modality of a dictionary of the second modality including a plurality K_v of words of the second modality, for each word of the first modality, an offline construction, by unsupervised classification, of a multimedia dictionary W_m, defined by a plurality K_m of multimedia words, on the basis of the recoding matrix X, a classification of a multimedia test object comprising: recoding of each representative of the first modality, relating to the multimedia test object, on the multimedia dictionary W_m base, aggregating the representatives of the first modality coded in the recoding step in a single vector BoMW representative of the multimedia test object.

10. A computer program comprising instructions stored on a tangible non-transitory storage medium for executing, on a processor, a method for classifying a multimodal test object, termed a multimedia test object, described according to at least one first modality and one second modality, said method comprising: constructing a recoding matrix X of representatives of the first modality forming a dictionary of the first modality including a plurality K_T of words of the first modality, wherein each of the components of the recoding matrix X forms information representative of the frequency of each word of the second modality of a dictionary of the second modality including a plurality K_v of words of the second modality, for each word of the first modality, an offline construction, by unsupervised classification, of a multimedia dictionary W_m, defined by a plurality K_m of multimedia words, on the basis of the recoding matrix X, a classification of a multimedia test object comprising: recoding of each representative of the first modality, relating to the multimedia test object, on the multimedia dictionary W_m base, aggregating the representatives of the first modality coded in the recoding step in a single vector BoMW representative of the multimedia test object.

11. A tangible non-transitory processor-readable recording medium on which a program is recorded comprising instructions for executing a method for classifying a multimodal test object, termed a multimedia test object, described according to at least one first modality and one second modality, said method comprising: constructing a recoding matrix X of representatives of the first modality forming a dictionary of the first modality including a plurality K_T of words of the first modality, wherein each of the components of the recoding matrix X forms information representative of the frequency of each word of the second modality of a dictionary of the second modality including a plurality K_v of words of the second modality, for each word of the first modality, an offline construction, by unsupervised classification, of a multimedia dictionary W_m, defined by a plurality K_m of multimedia words, on the basis of the recoding matrix X, a classification of a multimedia test object comprising: recoding of each representative of the first modality, relating to the multimedia test object, on the multimedia dictionary W_m base, aggregating the representatives of the first modality coded in the recoding step in a single vector BoMW representative of the multimedia test object.
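The offline and online steps named in claims 4 to 8 can be illustrated with a short sketch. This is one possible reading under stated assumptions, not the patented implementation. The claims name L1 row normalization, K-means clustering, locality-constrained linear coding, a recoded matrix Z, and a BoMW signature; everything else here is assumed for illustration: the toy count matrix, all function names, the small Lloyd-style K-means loop, the approximated coding solver with its `n_neighbors` and `lam` parameters, the choice to represent each tag by its normalized row of X, and max pooling as the aggregation operator.

```python
import numpy as np


def l1_normalize_rows(X, eps=1e-12):
    """Divide each row by its L1 norm so every tag profile sums to 1."""
    X = np.asarray(X, dtype=float)
    return X / np.maximum(np.abs(X).sum(axis=1, keepdims=True), eps)


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd iterations; returns the k cluster centers (rows)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers


def llc_code(x, W_m, n_neighbors=2, lam=1e-4):
    """Approximate locality-constrained linear code of x on dictionary W_m.

    Only the n_neighbors nearest multimedia words get nonzero weights,
    and the weights are constrained to sum to 1.
    """
    dist = ((W_m - x) ** 2).sum(axis=1)
    idx = np.argsort(dist)[:n_neighbors]
    B = W_m[idx] - x                       # neighbors shifted so x is the origin
    C = B @ B.T                            # local covariance
    C += (lam * np.trace(C) + 1e-10) * np.eye(n_neighbors)  # regularization
    w = np.linalg.solve(C, np.ones(n_neighbors))
    code = np.zeros(len(W_m))
    code[idx] = w / w.sum()
    return code


def bomw_signature(tag_ids, X_norm, W_m, n_neighbors=2):
    """Recode each tag of a test image (matrix Z), then pool into one vector."""
    Z = np.stack([llc_code(X_norm[t], W_m, n_neighbors) for t in tag_ids])
    return Z.max(axis=0)  # max pooling is an assumed aggregation choice


# Toy recoding matrix (made up): K_T = 6 tags, K_v = 4 visual words.
X = np.array([[8, 1, 0, 1],
              [7, 2, 1, 0],
              [0, 9, 1, 0],
              [1, 8, 0, 1],
              [0, 0, 6, 4],
              [1, 0, 5, 4]], dtype=float)

X_norm = l1_normalize_rows(X)                    # offline: normalize
W_m = kmeans(X_norm, k=3)                        # offline: K_m = 3 multimedia words
signature = bomw_signature([0, 4], X_norm, W_m)  # online: test image with tags 0 and 4
print(signature)
```

The resulting `signature` has one component per multimedia word, so its length is K_m regardless of how many tags the test image carries. That fixed length is what allows it to be passed to a downstream classifier.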

Assignees

Commissariat Energie Atomique

Inventors

Classifications

  • of extracted features · CPC title

  • of results relating to different input data, e.g. multimodal recognition · CPC title

  • G06F16/40 · Primary

    of multimedia data, e.g. slideshows comprising image and additional audio data (retrieval of still image data G06F16/50; retrieval of audio data G06F16/60; retrieval of video data G06F16/70) · CPC title

  • of extracted features · CPC title

  • Clustering; Classification · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9569698B2 cover?
A method of classifying a multimodal test object described according to at least one first and one second modality is provided, including offline construction by classification of a multimedia dictionary, defined by a plurality of multimedia words, based on a recoding matrix of representatives of the first modality forming a dictionary of the first modality including a plurality of words of the…
Who is the assignee on this patent?
Commissariat Energie Atomique
What technology area does this patent fall under?
Primary CPC classification G06F16/40. Mapped technology areas include Physics.
When was this patent published?
Published Feb 14, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).