Techniques for sentiment analysis of data using a convolutional neural network and a co-occurrence network

US2018341839A1 · US · A1

Patent metadata
Publication number: US-2018341839-A1
Application number: US-201815976390-A
Country: US
Kind code: A1
Filing date: May 10, 2018
Priority date: May 26, 2017
Publication date: Nov 29, 2018
Grant date: not listed

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques are provided for performing sentiment analysis on words in a first data set. An example embodiment includes generating a word embedding model including a first plurality of features. A value indicating sentiment for the words in the first data set can be determined using a convolutional neural network (CNN). A second plurality of features are generated based on bigrams identified in the data set. The bigrams can be generated using a co-occurrence graph. The model is updated to include the second plurality of features, and sentiment analysis can be performed on a second data set using the updated model.
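The abstract says only that a convolutional neural network produces the sentiment value from a matrix whose rows are words and whose columns are word-embedding features. It does not specify the architecture. The sketch below is illustrative only and assumes a common text-CNN layout rather than the patented design. In that layout, each kernel spans h consecutive word rows and every feature column, followed by ReLU, max-over-time pooling, and a sigmoid output. It is an untrained forward pass in pure Python. All weights, the toy matrix, and the function names (`conv_max_pool`, `sentiment_score`) are hypothetical.

```python
import math


def conv_max_pool(matrix, kernels, bias=0.0):
    """Hypothetical single convolution layer over a words x features matrix.

    Each kernel is a list of h rows; its width is assumed to match the
    number of feature columns. The kernel slides down the word rows, the
    result passes through ReLU, and max-over-time pooling keeps one value
    per kernel.
    """
    pooled = []
    for kernel in kernels:
        h = len(kernel)
        activations = []
        for start in range(len(matrix) - h + 1):
            s = bias
            for r in range(h):
                for c, w in enumerate(kernel[r]):
                    s += w * matrix[start + r][c]
            activations.append(max(0.0, s))  # ReLU
        # If the matrix has fewer rows than the kernel, there is nothing to pool.
        pooled.append(max(activations) if activations else 0.0)
    return pooled


def sentiment_score(matrix, kernels, out_weights, out_bias=0.0):
    """Map pooled CNN features to a score in (0, 1); > 0.5 leans positive."""
    features = conv_max_pool(matrix, kernels)
    z = out_bias + sum(w * f for w, f in zip(out_weights, features))
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid


if __name__ == "__main__":
    # Toy 3-word x 2-feature matrix standing in for word-embedding rows.
    words_by_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    kernels = [[[1.0, 0.0], [0.0, 1.0]]]  # one kernel of height 2
    print(sentiment_score(words_by_features, kernels, out_weights=[0.5]))
```

In a real system, the kernel and output weights would be learned from labeled text. Here they are fixed by hand, so the score only shows how the shapes flow through the network.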

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising, at a computer system: generating a word embedding model comprising a first plurality of features for words; generating an initial matrix, wherein the initial matrix comprises a plurality of rows comprising a plurality of words from a first data set, and a first plurality of columns comprising the first plurality of features from the word embedding model; determining one or more values indicating a measure of sentiment for the plurality of words in the initial matrix in relation to each of the first plurality of features, wherein the one or more values are determined using a convolutional neural network; generating a co-occurrence graph based on the first data set; identifying one or more bigrams in the first data set based on the co-occurrence graph; determining a set of bigrams in the co-occurrence graph having a highest frequency of occurrence; generating an updated matrix to include a second plurality of features corresponding with the set of bigrams; for each of the second plurality of features, generating, in the updated matrix, an indication of an occurrence of a bigram corresponding to each the plurality of words based on the set of bigrams and the co-occurrence graph; and determining a measure of sentiment for a second data set using the updated matrix.

2. The method according to claim 1, wherein a features comprises an attribute that represents a word in the first plurality of words.

3. The method according to claim 1, wherein the set of bigrams in the co-occurrence graph having the highest frequency of occurrence among the plurality of words from the first data set are assigned a higher weight than bigrams having a lower frequency of occurrence among the plurality of words from the first data set.

4. The method according to claim 1, wherein the identifying the one or more bigrams in the first data set comprises: setting a word window size; parsing a first subset of the plurality of words from the first data set within the word window size; and removing one or more stop words from the first subset of the plurality of words in the first data set within the word window size.

5. The method according to claim 4, wherein the identifying the one or more bigrams in the first data set further comprises identifying a first pair of words from the first subset within the word window size.

6. The method according to claim 5, further comprising assigning a weight score to the first pair of words identified from the first subset.

7. The method according to claim 6, further comprising: parsing a second subset of the plurality of words from the first data set within the word window size; and in response to the first pair of words appearing in the second subset of the plurality of words, incrementing the weight score assigned to the first pair of words.

8. The method according to claim 1, wherein each of the plurality of rows of initial matrix comprises a single word of the plurality of words, and wherein each of the first plurality of columns of the initial matrix comprises a feature from one of the first plurality of features.

9. The method according to claim 1, wherein the updated matrix comprises a second plurality of columns, and wherein each of the second plurality of columns comprises a feature corresponding with the second plurality of features of the set of bigrams.

10. The method according to claim 1, wherein the indication of the occurrence of the bigram in the plurality of words comprises a numerical value.

11. A system comprising: one or more processors; a memory accessible to the one or more processors, the memory comprising instructions that, when executed by the one or more processors, cause the one or more processors to perform a method comprising: generating a word embedding model comprising a first plurality of features for words; generating an initial matrix, wherein the initial matrix comprises a plurality of rows comprising a plurality of words from a first data set, and a first plurality of columns comprising the first plurality of features from the word embedding model; determining one or more values indicating a measure of sentiment for the plurality of words in the matrix in relation to each of the first plurality of features, wherein the one or more values are determined using a convolutional neural network; generating a co-occurrence graph based on the first data set; identifying one or more bigrams in the first data set based on the co-occurrence graph; determining a set of bigrams in the co-occurrence graph having a highest frequency of occurrence; generating an updated matrix to include a second plurality of features corresponding with the set of bigrams; for each of the second plurality of features, generating, in the updated matrix, an indication of an occurrence of a bigram corresponding to each the plurality of words based on the set of bigrams and the co-occurrence graph; and determining a measure of sentiment for a second data set using the updated matrix.

12. The system according to claim 11, wherein the identifying the one or more bigrams in the first data set comprises: setting a word window size; parsing a first subset of the plurality of words from the first data set within the word window size; and removing one or more stop words from the first subset of the plurality of words in the first data set within the word window size.

13. The system according to claim 12, wherein the identifying the one or more bigrams in the first data set further comprises identifying a first pair of words from the first subset within the word window size.

14. The system according to claim 13, further comprising assigning a weight score to the first pair of words identified from the first subset.

15. The system according to claim 14, further comprising: parsing a second subset of the plurality of words from the first data set within the word window size; and in response to the first pair of words appearing in the second subset of the plurality of words, incrementing the weight score assigned to the first pair of words.

16. A non-transitory computer readable medium storing a plurality of instructions for controlling a computer system to perform a method comprising: generating a word embedding model comprising a first plurality of features for words; generating an initial matrix, wherein the initial matrix comprises a plurality of rows comprising a plurality of words from a first data set, and a first plurality of columns comprising the first plurality of features from the word embedding model; determining one or more values indicating a measure of sentiment for the plurality of words in the matrix in relation to each of the first plurality of features, wherein the one or more values are determined using a convolutional neural network; generating a co-occurrence graph based on the first data set; identifying one or more bigrams in the first data set based on the co-occurrence graph; determining a set of bigrams in the co-occurrence graph having a highest frequency of occurrence; generating an updated matrix to include a second plurality of features corresponding with the set of bigrams; for each of the second plurality of features, generating, in the updated matrix, an indication of an occurrence of a bigram corresponding to each the plurality of words based on the set of bigrams and the co-occurrence graph; and determining a measure of sentiment for a second data set using the updated matrix.

17. The comput
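Claims 1 and 3 through 10 describe the bigram side of the method. The steps are:

- set a word window;
- drop stop words inside it;
- give each word pair a weight score and increment it when the pair recurs;
- keep the most frequent pairs;
- add one column per kept pair to the word-feature matrix.

The sketch below is one plausible reading in pure Python, not the patented implementation. It makes these assumptions:

- tokens are lowercased and split on whitespace;
- the stop-word list is a small hypothetical set;
- unordered pairs are counted from each anchor word to the words that follow it inside the window;
- self-pairs are ignored;
- a row's indicator is 1.0 when its word is one of the bigram's two words (claim 10 says only that the indication is numerical).

All function names are hypothetical.

```python
from collections import Counter

# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "of", "to", "it", "this"}


def build_cooccurrence_graph(docs, window_size=3, stop_words=STOP_WORDS):
    """Undirected weighted co-occurrence graph: (word1, word2) -> weight.

    For each non-stop anchor word, pair it with the non-stop words that
    follow it within the same window. The first sighting of a pair gives
    it weight 1; every later sighting increments that weight.
    """
    graph = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        for i, anchor in enumerate(tokens):
            if anchor in stop_words:
                continue
            for other in tokens[i + 1 : i + window_size]:
                if other in stop_words or other == anchor:
                    continue  # skip stop words and self-pairs
                edge = tuple(sorted((anchor, other)))  # order-independent key
                graph[edge] += 1
    return graph


def top_bigrams(graph, k):
    """Return the k edges with the highest co-occurrence weight."""
    return [edge for edge, _ in graph.most_common(k)]


def add_bigram_columns(words, base_rows, bigrams):
    """Append one numeric indicator column per bigram to each word's row."""
    updated = []
    for word, row in zip(words, base_rows):
        indicators = [1.0 if word in bigram else 0.0 for bigram in bigrams]
        updated.append(row + indicators)
    return updated


if __name__ == "__main__":
    corpus = [
        "the food was great and the service was great",
        "great food and great service",
    ]
    graph = build_cooccurrence_graph(corpus, window_size=3)
    bigrams = top_bigrams(graph, k=2)
    print(bigrams)
    # Toy two-feature embedding rows for three words.
    rows = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
    print(add_bigram_columns(["great", "food", "service"], rows, bigrams))
```

Claim 3 gives the most frequent bigrams a higher weight. A natural variation would store the edge weight from the graph in the indicator column instead of 1.0.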

Assignees

Oracle Int Corp

Inventors

Classifications

  • using classification, e.g. of video objects (CPC title)

  • Syntactic or semantic context, e.g. balancing (CPC title)

  • G06F40/205 (primary): Parsing

  • based on distances to training or reference patterns (CPC title)

  • Combinations of networks (CPC title)

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2018341839A1 cover?
It covers sentiment analysis on words in a data set. A convolutional neural network scores sentiment over a word-embedding feature matrix. Bigram features identified with a co-occurrence graph are then added to that matrix, and the updated model is used to analyze a second data set.
Who is the assignee on this patent?
Oracle Int Corp
What technology area does this patent fall under?
Primary CPC classification G06F40/205. Mapped technology areas include Physics.
When was this patent published?
It was published on Nov 29, 2018, as an A1 publication. Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).