Method, terminal, and computer storage medium for image classification
US-2020356821-A1 · Nov 12, 2020 · US
US11048983B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11048983-B2 |
| Application number | US-202016932599-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 17, 2020 |
| Priority date | Jan 19, 2018 |
| Publication date | Jun 29, 2021 |
| Grant date | Jun 29, 2021 |
Official abstract text for this publication.
Disclosed are a method, terminal and computer readable storage medium for image classification. The method includes: determining an image feature vector of an image based on a convolutional neural network, where the image comprises textual information; determining a text feature vector based on the textual information and an embedded network; determining an image-text feature vector by joining the image feature vector with the text feature vector; and determining a category of the image based on a result of a deep neural network, where the result is determined based on the image feature vector, the text feature vector and the image-text feature vector.
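The fusion step described in the abstract can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the patented implementation: "joining" is read as concatenation, each classifier is reduced to a single linear layer followed by a softmax, the weighted sum of the three result vectors follows claim 1, and all names and numbers (`linear`, `softmax`, `join`, `classify`, the toy matrices and the fusion weights) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def linear(weights: Matrix, x: Sequence[float]) -> Vector:
    """Multiply a (rows x len(x)) weight matrix by the vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def softmax(scores: Sequence[float]) -> Vector:
    """Turn raw scores into a classification result vector that sums to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def join(image_vec: Sequence[float], text_vec: Sequence[float]) -> Vector:
    """Join two feature vectors by concatenation (assumed reading of 'joining')."""
    return list(image_vec) + list(text_vec)


def classify(
    image_vec: Sequence[float],
    text_vec: Sequence[float],
    heads: Sequence[Matrix],
    fusion_weights: Sequence[float],
) -> Tuple[int, Vector]:
    """Return (category index, target result vector).

    heads holds three hypothetical weight matrices, one each for the image,
    text, and image-text feature vectors.
    """
    joint_vec = join(image_vec, text_vec)
    results = [
        softmax(linear(heads[0], image_vec)),  # first classification result
        softmax(linear(heads[1], text_vec)),   # second classification result
        softmax(linear(heads[2], joint_vec)),  # third classification result
    ]
    n = len(results[0])
    # Weight and sum the three result vectors, category by category.
    target = [
        sum(w * r[k] for w, r in zip(fusion_weights, results)) for k in range(n)
    ]
    category = max(range(n), key=target.__getitem__)
    return category, target


# Toy example with two categories; every value here is made up.
image_vec = [1.0, 0.0]
text_vec = [0.0, 1.0]
heads = [
    [[2.0, 0.0], [0.0, 2.0]],
    [[2.0, 0.0], [0.0, 2.0]],
    [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 3.0]],
]
category, target = classify(image_vec, text_vec, heads, [0.2, 0.3, 0.5])
print(category, target)
```

In this toy setup the image-only head favors category 0 while the text and image-text heads favor category 1, so the weighted sum lets the text evidence override the image evidence. The mapping to a common dimension recited in claim 3 could be approximated by passing each vector through `linear` with a projection matrix before calling `join`; that step is omitted here for brevity.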
Opening claim text (preview).
What is claimed is:
1. A method for image classification, comprising: determining an image feature vector of an image based on a convolutional neural network, wherein the image comprises textual information; determining a text feature vector based on the textual information and an embedded network; determining an image-text feature vector by joining the image feature vector with the text feature vector; determining a first classification result vector corresponding to the image feature vector, a second classification result vector corresponding to the text feature vector, and a third classification result vector corresponding to the image-text feature vector, based on a deep neural network; determining a target result vector by weighting and summing the first classification result vector, the second classification result vector and the third classification result vector; and determining a category of the image based on the target result vector.
2. The method according to claim 1, wherein said that determining a text feature vector based on the textual information and an embedded network comprises: determining multiple segmented words by removing stop words in the textual information; determining position information of each segmented word in a text feature set; generating an index value of the segmented word based on the position information; determining a description vector corresponding to each segmented word based on the index value and the embedded network; and determining a text feature vector by weighting and averaging description vectors corresponding to the multiple segmented words in same dimensions.
3. The method according to claim 1, wherein, said determining an image-text feature vector by joining the image feature vector with the text feature vector comprises: determining a mapped text feature vector and a mapped image feature vector by mapping the text feature vector and the image feature vector in same dimensions; and generating an image-text feature vector by joining the mapped text feature vector with the mapped image feature vector dimensionally.
4. The method according to claim 1, wherein, the method further comprises: acquiring sample images; determining a description set based on each sample image, wherein the description set is null in response to that the sample image has no textual information, and the description set comprises segmented words in response to that the sample image has textual information, wherein the segmented words comprises words except stop words in the textual information; determining a text feature subset based on the description set; and determining a text feature set by combining text feature subsets.
5. A terminal, comprising: a memory; a processor; and a program for image classification that is stored on the memory and runs on the processor; wherein the program, when executed by the processor, implements steps of: determining an image feature vector of an image based on a convolutional neural network, wherein the image comprises textual information; determining a text feature vector based on the textual information and an embedded network; determining an image-text feature vector by joining the image feature vector with the text feature vector; determining a first classification result vector corresponding to the image feature vector, a second classification result vector corresponding to the text feature vector, and a third classification result vector corresponding to the image-text feature vector, based on a deep neural network; determining a target result vector by weighting and summing the first classification result vector, the second classification result vector and the third classification result vector; and determining a category of the image based on the target result vector.
6. The terminal according to claim 5, wherein said that determining a text feature vector based on the textual information and an embedded network comprises: determining multiple segmented words by removing stop words in the textual information; determining position information of each segmented word in a text feature set; generating an index value of the segmented word based on the position information; determining a description vector corresponding to each segmented word based on the index value and the embedded network; and determining a text feature vector by weighting and averaging description vectors corresponding to the multiple segmented words in same dimensions.
7. The terminal according to claim 5, wherein, said determining an image-text feature vector by joining the image feature vector with the text feature vector comprises: determining a mapped text feature vector and a mapped image feature vector by mapping the text feature vector and the image feature vector in same dimensions; and generating an image-text feature vector by joining the mapped text feature vector with the mapped image feature vector dimensionally.
8. The terminal according to claim 5, wherein, the program, when executed by the processor, further implements steps of: acquiring sample images; determining a description set based on each sample image, wherein the description set is null in response to that the sample image has no textual information, and the description set comprises segmented words in response to that the sample image has textual information, wherein the segmented words comprises words except stop words in the textual information; determining a text feature subset based on the description set; and determining a text feature set by combining text feature subsets.
9. A non-transitory computer readable storage medium, wherein, the computer readable storage medium stores a program for image classification thereon, wherein the program, when executed by a processor, implements steps of: determining an image feature vector of an image based on a convolutional neural network, wherein the image comprises textual information; determining a text feature vector based on the textual information and an embedded network; determining an image-text feature vector by joining the image feature vector with the text feature vector; determining a first classification result vector corresponding to the image feature vector, a second classification result vector corresponding to the text feature vector, and a third classification result vector corresponding to the image-text feature vector, based on a deep neural network; determining a target result vector by weighting and summing the first classification result vector, the second classification result vector and the third classification result vector; and determining a category of the image based on the target result vector.
10. The non-transitory computer readable storage medium according to claim 9, wherein said that determining a text feature vector based on the textual information and an embedded network comprises: determining multiple segmented words by removing stop words in the textual information; determining position information of each segmented word in a text feature set; generating an index value of the segmented word based on the position information; determining a description vector corresponding to each segmented word based on the index value and the embedded network; and determining a text feature vector by weighting and averaging description vectors corresponding to the multiple segmented words in same dimensions.
11. The non-transitory computer readable storage medium according to claim 9, wherein, said determining an image-text feature vector by joining the image feature vector with the text feature vector comprises: determining a mapped text
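As an illustration of the text-feature steps recited in claims 2 and 4, the sketch below builds a text feature set from sample descriptions and then turns one description into a text feature vector. It rests on several assumptions that the claims do not state: whitespace splitting stands in for a real word segmenter, the stop-word list is a made-up sample, each text feature subset is taken to be the distinct words of its description set, the "embedded network" is reduced to a fixed lookup table indexed by word position, words missing from the feature set are skipped, and every word gets weight 1.0 unless a weight is supplied. All function and variable names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "of", "and", "in", "on", "is"}


def segment(text: str) -> List[str]:
    """Split on whitespace and drop stop words (stand-in for a word segmenter)."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def build_text_feature_set(sample_texts: Sequence[Optional[str]]) -> List[str]:
    """Combine per-sample description sets into one ordered text feature set."""
    feature_set: List[str] = []
    seen = set()
    for text in sample_texts:
        # The description set is empty when the sample image carries no text.
        description_set = segment(text) if text else []
        for word in description_set:
            if word not in seen:
                seen.add(word)
                feature_set.append(word)
    return feature_set


def text_feature_vector(
    text: str,
    feature_set: Sequence[str],
    embedding_table: Sequence[Sequence[float]],
    word_weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Weighted average, per dimension, of the description vectors of the words."""
    index_of = {word: i for i, word in enumerate(feature_set)}  # position -> index
    dim = len(embedding_table[0])
    totals = [0.0] * dim
    weight_sum = 0.0
    for word in segment(text):
        if word not in index_of:
            continue  # assumption: words outside the feature set are ignored
        weight = (word_weights or {}).get(word, 1.0)
        description_vec = embedding_table[index_of[word]]
        for d in range(dim):
            totals[d] += weight * description_vec[d]
        weight_sum += weight
    if weight_sum == 0.0:
        return [0.0] * dim  # no known words: fall back to a zero vector
    return [t / weight_sum for t in totals]


# Toy data; the second sample has no textual information.
samples = ["the red dress in summer", None, "a red leather bag"]
feature_set = build_text_feature_set(samples)
# One made-up 2-dimensional description vector per entry of feature_set.
embedding_table = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [0.0, 2.0]]
vec = text_feature_vector("the red bag", feature_set, embedding_table)
print(feature_set, vec)
```

Here the feature set keeps first-seen order, so a word's list position doubles as its index value into the lookup table. In a trained system the table rows would be learned parameters rather than fixed numbers.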
Lexical analysis, e.g. tokenisation or collocates · CPC title
Combination of methods, e.g. classifiers, working on the same input data · CPC title
Text, e.g. of license plates, overlay texts or captions on TV images · CPC title
of classification results, e.g. where the classifiers operate on the same input data · CPC title
Classification techniques · CPC title