Method, terminal, and computer storage medium for image classification

US11048983B2 · US · B2

Patent metadata
Publication number: US-11048983-B2
Application number: US-202016932599-A
Country: US
Kind code: B2
Filing date: Jul 17, 2020
Priority date: Jan 19, 2018
Publication date: Jun 29, 2021
Grant date: Jun 29, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed are a method, terminal and computer readable storage medium for image classification. The method includes: determining an image feature vector of an image based on a convolutional neural network, where the image comprises textual information; determining a text feature vector based on the textual information and an embedded network; determining an image-text feature vector by joining the image feature vector with the text feature vector; and determining a category of the image based on a result of a deep neural network, where the result is determined based on the image feature vector, the text feature vector and the image-text feature vector.
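The flow described in the abstract can be illustrated with a small sketch. It is not the patented implementation: it assumes that "joining" means plain concatenation, that each branch of the deep neural network can be stood in for by a single bias-free linear layer followed by softmax, and that the three results are combined by a weighted sum as the first claim describes. All names (`linear`, `softmax`, `join_features`, `classify`, `heads`, `fusion_weights`) and all numeric values are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]  # one row of weights per category


def linear(weights: Matrix, x: Sequence[float]) -> Vector:
    """Bias-free linear layer: one dot product per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def softmax(scores: Sequence[float]) -> Vector:
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def join_features(image_vec: Sequence[float], text_vec: Sequence[float]) -> Vector:
    """Assumption: the image-text feature vector is a plain concatenation."""
    return list(image_vec) + list(text_vec)


def classify(
    image_vec: Sequence[float],
    text_vec: Sequence[float],
    heads: Dict[str, Matrix],
    fusion_weights: Sequence[float],
) -> Tuple[int, Vector]:
    """Return (category index, target result vector)."""
    joint_vec = join_features(image_vec, text_vec)
    # One classification result vector per feature vector.
    results = [
        softmax(linear(heads["image"], image_vec)),
        softmax(linear(heads["text"], text_vec)),
        softmax(linear(heads["joint"], joint_vec)),
    ]
    # Weighted sum of the three result vectors, category by category.
    target = [
        sum(w * r[k] for w, r in zip(fusion_weights, results))
        for k in range(len(results[0]))
    ]
    category = max(range(len(target)), key=target.__getitem__)
    return category, target


# Hypothetical two-category example with 2-dimensional feature vectors.
HEADS = {
    "image": [[1.0, 0.0], [0.0, 1.0]],
    "text": [[1.0, 0.0], [0.0, 1.0]],
    "joint": [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]],
}
category, target = classify([2.0, 0.0], [0.0, 1.0], HEADS, (0.3, 0.3, 0.4))
print(category, target)
```

In this sketch the fusion weights decide how much each branch contributes: with weights `(0.3, 0.3, 0.4)` the image and joint branches outvote the text branch, while putting all the weight on the text branch flips the predicted category.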

First claim

Opening claim text (preview).

What is claimed is:

1. A method for image classification, comprising: determining an image feature vector of an image based on a convolutional neural network, wherein the image comprises textual information; determining a text feature vector based on the textual information and an embedded network; determining an image-text feature vector by joining the image feature vector with the text feature vector; determining a first classification result vector corresponding to the image feature vector, a second classification result vector corresponding to the text feature vector, and a third classification result vector corresponding to the image-text feature vector, based on a deep neural network; determining a target result vector by weighting and summing the first classification result vector, the second classification result vector and the third classification result vector; and determining a category of the image based on the target result vector.

2. The method according to claim 1, wherein said that determining a text feature vector based on the textual information and an embedded network comprises: determining multiple segmented words by removing stop words in the textual information; determining position information of each segmented word in a text feature set; generating an index value of the segmented word based on the position information; determining a description vector corresponding to each segmented word based on the index value and the embedded network; and determining a text feature vector by weighting and averaging description vectors corresponding to the multiple segmented words in same dimensions.

3. The method according to claim 1, wherein, said determining an image-text feature vector by joining the image feature vector with the text feature vector comprises: determining a mapped text feature vector and a mapped image feature vector by mapping the text feature vector and the image feature vector in same dimensions; and generating an image-text feature vector by joining the mapped text feature vector with the mapped image feature vector dimensionally.

4. The method according to claim 1, wherein, the method further comprises: acquiring sample images; determining a description set based on each sample image, wherein the description set is null in response to that the sample image has no textual information, and the description set comprises segmented words in response to that the sample image has textual information, wherein the segmented words comprises words except stop words in the textual information; determining a text feature subset based on the description set; and determining a text feature set by combining text feature subsets.

5. A terminal, comprising: a memory; a processor; and a program for image classification that is stored on the memory and runs on the processor; wherein the program, when executed by the processor, implements steps of: determining an image feature vector of an image based on a convolutional neural network, wherein the image comprises textual information; determining a text feature vector based on the textual information and an embedded network; determining an image-text feature vector by joining the image feature vector with the text feature vector; determining a first classification result vector corresponding to the image feature vector, a second classification result vector corresponding to the text feature vector, and a third classification result vector corresponding to the image-text feature vector, based on a deep neural network; determining a target result vector by weighting and summing the first classification result vector, the second classification result vector and the third classification result vector; and determining a category of the image based on the target result vector.

6. The terminal according to claim 5, wherein said that determining a text feature vector based on the textual information and an embedded network comprises: determining multiple segmented words by removing stop words in the textual information; determining position information of each segmented word in a text feature set; generating an index value of the segmented word based on the position information; determining a description vector corresponding to each segmented word based on the index value and the embedded network; and determining a text feature vector by weighting and averaging description vectors corresponding to the multiple segmented words in same dimensions.

7. The terminal according to claim 5, wherein, said determining an image-text feature vector by joining the image feature vector with the text feature vector comprises: determining a mapped text feature vector and a mapped image feature vector by mapping the text feature vector and the image feature vector in same dimensions; and generating an image-text feature vector by joining the mapped text feature vector with the mapped image feature vector dimensionally.

8. The terminal according to claim 5, wherein, the program, when executed by the processor, further implements steps of: acquiring sample images; determining a description set based on each sample image, wherein the description set is null in response to that the sample image has no textual information, and the description set comprises segmented words in response to that the sample image has textual information, wherein the segmented words comprises words except stop words in the textual information; determining a text feature subset based on the description set; and determining a text feature set by combining text feature subsets.

9. A non-transitory computer readable storage medium, wherein, the computer readable storage medium stores a program for image classification thereon, wherein the program, when executed by a processor, implements steps of: determining an image feature vector of an image based on a convolutional neural network, wherein the image comprises textual information; determining a text feature vector based on the textual information and an embedded network; determining an image-text feature vector by joining the image feature vector with the text feature vector; determining a first classification result vector corresponding to the image feature vector, a second classification result vector corresponding to the text feature vector, and a third classification result vector corresponding to the image-text feature vector, based on a deep neural network; determining a target result vector by weighting and summing the first classification result vector, the second classification result vector and the third classification result vector; and determining a category of the image based on the target result vector.

10. The non-transitory computer readable storage medium according to claim 9, wherein said that determining a text feature vector based on the textual information and an embedded network comprises: determining multiple segmented words by removing stop words in the textual information; determining position information of each segmented word in a text feature set; generating an index value of the segmented word based on the position information; determining a description vector corresponding to each segmented word based on the index value and the embedded network; and determining a text feature vector by weighting and averaging description vectors corresponding to the multiple segmented words in same dimensions.

11. The non-transitory computer readable storage medium according to claim 9, wherein, said determining an image-text feature vector by joining the image feature vector with the text feature vector comprises: determining a mapped text
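The text branch in the dependent claims (building a text feature set from sample descriptions, then turning the words of one image's text into a single vector) can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: words are split on whitespace, the stop-word list is a made-up placeholder, the "embedded network" is modelled as a plain lookup table indexed by a word's position in the text feature set, words outside the feature set are skipped, and weights default to 1.0 for every word. The names `segment`, `build_text_feature_set`, and `text_feature_vector` are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "of", "on", "in", "and", "is"}


def segment(text: str) -> List[str]:
    """Split on whitespace, lower-case, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def build_text_feature_set(descriptions: Sequence[str]) -> List[str]:
    """Combine per-sample word subsets into one ordered, duplicate-free list.

    A sample with no text contributes an empty subset.
    """
    feature_set: List[str] = []
    seen = set()
    for description in descriptions:
        for word in segment(description):
            if word not in seen:
                seen.add(word)
                feature_set.append(word)
    return feature_set


def text_feature_vector(
    text: str,
    feature_set: Sequence[str],
    embedding_table: Sequence[Sequence[float]],
    word_weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Weighted average, per dimension, of the description vectors of the words."""
    # Index value of a word = its position in the text feature set.
    index_of = {word: i for i, word in enumerate(feature_set)}
    words = [w for w in segment(text) if w in index_of]
    dim = len(embedding_table[0])
    if not words:
        return [0.0] * dim  # assumption: no known words gives a zero vector
    weights = [(word_weights or {}).get(w, 1.0) for w in words]
    total = sum(weights)
    vectors = [embedding_table[index_of[w]] for w in words]
    return [
        sum(wt * vec[d] for wt, vec in zip(weights, vectors)) / total
        for d in range(dim)
    ]


# Hypothetical data: three sample descriptions, one of them empty.
feature_set = build_text_feature_set(["The cat on a sofa", "", "cat and dog"])
embedding_table = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one row per feature-set word
vector = text_feature_vector("a dog and the cat", feature_set, embedding_table)
print(feature_set, vector)
```

Averaging per dimension keeps the text feature vector the same length regardless of how many words the image's text contains, which is what allows it to be joined with a fixed-length image feature vector afterwards.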

Assignees

Beijing Dajia Internet Information Tech Co Ltd

Inventors

Classifications

  • G06F40/284 (Primary)

    Lexical analysis, e.g. tokenisation or collocates · CPC title

  • Combination of methods, e.g. classifiers, working on the same input data · CPC title

  • Text, e.g. of license plates, overlay texts or captions on TV images · CPC title

  • of classification results, e.g. where the classifiers operate on the same input data · CPC title

  • Classification techniques · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11048983B2 cover?
Disclosed are a method, terminal and computer readable storage medium for image classification. The method includes: determining an image feature vector of an image based on a convolutional neural network, where the image comprises textual information; determining a text feature vector based on the textual information and an embedded network; determining an image-text feature vector by joining …
Who is the assignee on this patent?
Beijing Dajia Internet Information Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06F40/284. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 29, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).