Cross-media search method

US10719664B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10719664-B2
Application number  US-201616314673-A
Country             US
Kind code           B2
Filing date         Dec 1, 2016
Priority date       Jul 11, 2016
Publication date    Jul 21, 2020
Grant date          Jul 21, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A cross-media search method using a VGG convolutional neural network (VGG net) to extract image features. The 4096-dimensional feature of a seventh fully-connected layer (fc7) in the VGG net, after processing by a ReLU activation function, serves as image features. A Fisher Vector based on Word2vec is utilized to extract text features. Semantic matching is performed on heterogeneous images and the text features by means of logistic regression. A correlation between the two heterogeneous features, which are images and text, is found by means of semantic matching based on logistic regression, and thus cross-media search is achieved. The feature extraction method can effectively indicate deep semantics of image and text, improve cross-media search accuracy, and thus greatly improve the cross-media search effect.
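The semantic-matching idea in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the features are synthetic stand-ins (8 and 6 dimensions instead of fc7 and Fisher Vector features), the softmax regression is a small NumPy version written for this example, and ranking by cosine similarity is an assumption, since the abstract does not name the distance used. All function names are hypothetical.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax with the usual max-shift for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_logistic_regression(X, y, n_classes, lr=0.1, n_iter=500):
    # Multinomial logistic regression trained by plain gradient descent.
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot category labels
    for _ in range(n_iter):
        P = softmax(X @ W + b)
        W -= lr * X.T @ (P - Y) / n
        b -= lr * (P - Y).mean(axis=0)
    return W, b

def to_semantic(X, W, b):
    # Map raw features to posterior probabilities over the C categories.
    return softmax(X @ W + b)

def rank_by_semantic_similarity(query, candidates):
    # Assumption: cosine similarity between posterior vectors is used for ranking.
    q = query / np.linalg.norm(query)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    return np.argsort(-(c @ q))

# Synthetic image-text pairs sharing category labels (toy stand-ins for real features).
rng = np.random.default_rng(0)
C, n_per_class = 3, 20
labels = np.repeat(np.arange(C), n_per_class)
image_feats = rng.normal(size=(labels.size, 8)) + 3.0 * np.eye(C, 8)[labels]
text_feats = rng.normal(size=(labels.size, 6)) + 3.0 * np.eye(C, 6)[labels]

# One classifier per modality; both map into the same C-dimensional semantic space.
W_img, b_img = fit_logistic_regression(image_feats, labels, C)
W_txt, b_txt = fit_logistic_regression(text_feats, labels, C)
image_sem = to_semantic(image_feats, W_img, b_img)
text_sem = to_semantic(text_feats, W_txt, b_txt)

# Image-to-text search: rank all texts against the first image.
ranking = rank_by_semantic_similarity(image_sem[0], text_sem)
print("categories of the top 5 retrieved texts:", labels[ranking[:5]])
```

Because both modalities end up as probability vectors over the same categories, heterogeneous image and text features become directly comparable, which is what makes the cross-media ranking step possible.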

First claim

Opening claim text (preview).

What is claimed is:

1. A cross-media search method which utilizes a VGG convolutional neural network (VGG net) proposed by VGG to extract image features, and utilizes a Fisher Vector based on Word2vec to extract text features, and performs semantic matching on heterogeneous images and the text features by means of logistic regression, to accomplish cross-media search, the method comprising:

Step 1: collecting a cross-media search dataset containing category labels, set as $D=\{D_1, D_2, \ldots, D_n\}$, where $n$ represents the size of dataset, wherein data types in the cross-media search dataset includes image and text media, represented as image-text pairs $D_i$ ($D_i \in D$), $D_i=(D_i^I, D_i^T)$, where $D_i^I$ represents the original data of the image, and $D_i^T$ represents the original text data, wherein the category labels are set as $L$, $L=[l_1, l_2, \ldots, l_n]$, where $l_i \in [1, 2, \ldots, C]$, $C$ is the number of categories, and $l_i$ represents the category to which the ith pair of images and text belong; dividing the cross-media search dataset into training data and test data;

Step 2: for all image data $D^I$ in dataset $D$, where $D^I=\{D_1^I, D_2^I, \ldots, D_n^I\}$, using a VGG convolutional neural network (VGG net) to extract image features, wherein the 4096-dimensional features of a seventh fully-connected layer (fc7) in the VGG net, after processing by a ReLU activation function, are denoted as $I=\{I_1, I_2, \ldots, I_n\}$, where $I_j \in R^{4096}$, $j \in [1,n]$, serving as the image features;

Step 3: for the text feature data $D^T$ in the dataset, where $D^T=\{D_1^T, D_2^T, \ldots, D_n^T\}$, using a Fisher Vector based on Word2vec to extract text features, which includes converting $D^T$ into a word vector set $W=\{W_1, W_2, \ldots, W_n\}$, wherein $W$ is a word vector set of words contained in $D^T$; substituting the word vector set $W_i$ of each text word in $W=\{W_1, W_2, \ldots, W_n\}$ into $X$, and obtaining the Fisher Vector for each text, denoted as $T=\{T_1, T_2, \ldots, T_n\}$, $T_i \in R^{(2 \times dw+1) \times G-1}$, $i \in [1,n]$, where $T_i$ represents the Fisher Vector calculated from the ith text; thus extracting text features;

Step 4: training a semantic matching model based on logistic regression using the image features and text features in the training data obtained by performing Step 2 and Step 3; converting the text feature $T$ into a text semantic feature $\Pi^T$, $\Pi^T=\{\Pi_1^T, \Pi_2^T, \ldots, \Pi_n^T\}$, $\Pi_i^T \in R^c$, $i \in [1,n]$, wherein $c$ is the number of categories, also the dimension of the text semantic feature; and transforming the image feature $I_i$ into the semantic feature composed of the posterior probability, the posterior probability is $P_{L|I_i}(k \mid I_i)$, $k \in [1,C]$, indicating the probability of image $I_i$ belonging to category $k$; and

Step 5: using the semantic matching model trained in Step 4, and the image features and text features of the test data obtained in Step 2 and Step 3, to test an image or text to obtain a cross-media search result comprising related texts or images.

2. The cross-media search method according to claim 1, wherein in Step 3, using a Fisher Vector based on Word2vec to extract text features comprises:

Step 6: converting the original text data $D^T$, where $D^T=\{D_1^T, D_2^T, \ldots, D_n^T\}$, to a word vector set $W=\{W_1, W_2, \ldots, W_n\}$, and $W$ is a word vector set of the words contained in $D^T$;

Step 7: recording the word as $w$, and the word vector corresponding to the word $w$ as $f_{word2vec}(w)$; for $\forall w \in D_i^T$, $f_{word2vec}(w) \in W_i$, $i \in [1,n]$, that is $W_i=\{w_{i,1}, w_{i,2}, \ldots, w_{i,b_i}\}$, where $w_{i,j} \in R^{dw}$, $j \in [1,b_i]$, $w_{i,j}$ is the word vector corresponding to the word contained in $D_i^T$, $dw$ is the dimension of the word vector, and $b_i$ is the number of words contained in $D_i^T$; and

Step 8: using $X=\{x_1, x_2, \ldots, x_{nw}\}$ to represent the word vector set for a text, $nw$ is the number of word vectors; letting the parameters of the mixed Gaussian model GMM be $\lambda$, $\lambda=\{\omega_i, \mu_i, \Sigma_i, i=1 \ldots G\}$, where $\omega_i$, $\mu_i$, $\Sigma_i$ represent the weight, mean vector and covariance matrix of each Gaussian function in a GMM function, respectively, and $G$ represents the number of Gaussian functions in the model, wherein the GMM function is defined in following equation:

$$L(X \mid \lambda)=\sum_{t=1}^{nw} \log p(x_t \mid \lambda), \qquad (1)$$

where $p(x_t \mid \lambda)$ represents probability generated by the GMM function for the vector $x_t$ ($t \in [1,nw]$), expressed in following equation:

$$p(x_t \mid \lambda)=\sum_{i=1}^{G} \omega_i \, p_i(x_t \mid \lambda) \qquad (2)$$

setting the sum of constraints of weight $\omega_i$ to 1, expressed in following equation:

$$\sum_{i=1}^{G} \omega_i = 1, \qquad (3)$$

where $p_i(x \mid \lambda)$ represents the ith Gaussian function in the GMM, given by following equation:

$$p_i(x \mid \lambda)=\frac{\exp\left\{-\frac{1}{2}(x-\mu_i)^{T}\,\Sigma_i^{-1}\,(x-\mu_i)\right\}}{(2\pi)^{dw/2}\,\cdots}$$

…
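Equations 1 to 3 of the claim can be checked numerically with a short sketch. The claim preview cuts off inside the Gaussian density, so the code below uses the standard multivariate normal density and, as a simplifying assumption, diagonal covariance matrices; the function names and toy parameter values are hypothetical and chosen only for illustration. The helper `fisher_vector_dim` simply evaluates the dimension $(2 \times dw+1) \times G-1$ stated in Step 3.

```python
import numpy as np

def gaussian_pdf_diag(X, mu, var):
    # Standard multivariate normal density with diagonal covariance diag(var),
    # evaluated at every row of X. The diagonal form is an assumption.
    dw = X.shape[1]
    diff = X - mu
    exponent = -0.5 * np.sum(diff * diff / var, axis=1)
    norm = (2.0 * np.pi) ** (dw / 2.0) * np.sqrt(np.prod(var))
    return np.exp(exponent) / norm

def gmm_log_likelihood(X, weights, means, variances):
    # Equation 3: the mixture weights must sum to 1.
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("GMM weights must sum to 1")
    # Equation 2: p(x_t | lambda) = sum_i w_i * p_i(x_t | lambda).
    p = np.zeros(X.shape[0])
    for w, mu, var in zip(weights, means, variances):
        p += w * gaussian_pdf_diag(X, mu, var)
    # Equation 1: L(X | lambda) = sum_t log p(x_t | lambda).
    return float(np.sum(np.log(p)))

def fisher_vector_dim(dw, G):
    # Dimension of the Fisher Vector as stated in the claim: (2*dw + 1)*G - 1.
    return (2 * dw + 1) * G - 1

# Toy text: nw = 5 word vectors of dimension dw = 4, modelled by G = 2 Gaussians.
rng = np.random.default_rng(0)
dw, G, nw = 4, 2, 5
X = rng.normal(size=(nw, dw))
weights = np.array([0.4, 0.6])
means = rng.normal(size=(G, dw))
variances = np.ones((G, dw))

ll = gmm_log_likelihood(X, weights, means, variances)
print("log-likelihood:", ll)
print("Fisher Vector dimension:", fisher_vector_dim(dw, G))
```

In practice the GMM parameters would be fitted to Word2vec vectors from the training texts rather than drawn at random as they are here.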

Assignees

Inventors

Classifications

  • using neural networks · CPC title

  • Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN] · CPC title

  • using classification, e.g. of video objects · CPC title

  • Classification; Matching · CPC title

  • Feature extraction · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10719664B2 cover?
A cross-media search method using a VGG convolutional neural network (VGG net) to extract image features. The 4096-dimensional feature of a seventh fully-connected layer (fc7) in the VGG net, after processing by a ReLU activation function, serves as image features. A Fisher Vector based on Word2vec is utilized to extract text features. Semantic matching is performed on heterogeneous images and …
Who is the assignee on this patent?
Univ Peking Shenzhen Graduate School
What technology area does this patent fall under?
Primary CPC classification G06F40/216. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 21, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).