Performing sentiment analysis on microblogging data, including identifying a new opinion term therein

US9275041B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9275041-B2
Application number: US-201113280031-A
Country: US
Kind code: B2
Filing date: Oct 24, 2011
Priority date: Oct 24, 2011
Publication date: Mar 1, 2016
Grant date: Mar 1, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

There is provided a computer-implemented method of performing sentiment analysis. An exemplary method comprises performing a first sentiment analysis on microblogging data based on a method using an opinion lexicon. The method also includes training a classifier using training data from the first sentiment analysis. Additionally, the method includes identifying a new opinion term in the microblogging data by performing a statistical test. The new opinion terms are not in the opinion lexicon. The method also includes identifying new microblogging data based on the new opinion term. Further, the method includes performing a second sentiment analysis on the new microblogging data using the classifier.
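The first step in the abstract, a lexicon-based pass that produces training data for a classifier, can be illustrated with a minimal sketch. The lexicon contents, the whitespace tokenizer, the count-based scoring rule, and the function names `lexicon_label` and `first_pass` are all assumptions made for illustration; the patent does not specify them.

```python
# Hypothetical toy lexicon of general (non-domain-specific) opinion terms.
# The patent does not list the contents of its lexicon.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}


def lexicon_label(text):
    """Return 'positive', 'negative', or None when the lexicon gives no signal."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None


def first_pass(posts):
    """Keep only posts the lexicon can label; these become training data."""
    labeled = []
    for post in posts:
        label = lexicon_label(post)
        if label is not None:
            labeled.append((post, label))
    return labeled


posts = ["love the new phone", "battery life is awful", "the screen is laggy"]
training_data = first_pass(posts)
print(training_data)
```

In this toy run the third post is left unlabeled because "laggy" is not in the general lexicon. That is the kind of domain-specific term the later steps of the method aim to discover.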

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method of performing sentiment analysis, comprising: performing a first sentiment analysis on microblogging data based on a method using an opinion lexicon that includes non-domain-specific opinion terms, to generate first training data; training a classifier using the first training data; identifying a new opinion term in the microblogging data, by performing a statistical test on results of the first sentiment analysis on the microblogging data, wherein the new opinion term is domain-specific to the microblogging data and is not in the opinion lexicon; adding the new opinion term to the opinion lexicon to grow the opinion lexicon so that the opinion lexicon includes domain-specific opinion terms; identifying new microblogging data other than the microblogging data on which the first sentiment analysis has been performed, based on the opinion lexicon to which the new opinion term has been added; performing a second sentiment analysis on the new microblogging data using the classifier to generate second training data; and retraining the classifier as has been trained using the first training data, using the second training data to improve the classifier.

2. The method recited in claim 1, wherein training the classifier comprises: selecting a text window comprising an entity of a microblog; selecting one or more features from the text window; and training the classifier to determine a sentiment polarity for the entity.

3. The method recited in claim 2, wherein the text window comprises one or more surrounding words associated with the entity, and wherein the selected features are selected from the surrounding words.

4. The method recited in claim 1, wherein the statistical test comprises a Pearsons chi-square method, and wherein the opinion term comprises a Pearsons chi-square value greater than a specified threshold.

5. The method recited in claim 1, wherein the classifier comprises a support vector machine.

6. A computer system for performing sentiment analysis, the computer system comprising: a processor that is adapted to execute stored instructions; and a memory device that stores instructions, the memory device comprising: computer-implemented code adapted to perform a first sentiment analysis on microblogging data based on an opinion lexicon that includes non-domain-specific opinion terms, to generate first training data; computer-implemented code adapted to train a classifier using the first training data; computer-implemented code adapted to perform a statistical test on results of the first sentiment analysis on the microblogging data to identify a new opinion term, the new opinion term being domain-specific to the microblogging data; computer-implemented code adapted to add the new opinion term to the opinion lexicon to grow the opinion lexicon so that the opinion lexicon includes domain-specific opinion terms; computer-implemented code adapted to identify new microblogging data other than the microblogging data on which the first sentiment analysis has been performed, based on the opinion lexicon to which the new opinion term has been added; computer-implemented code adapted to perform a second sentiment analysis on the new microblogging data using the classifier to generate second training data; and computer-implemented code adapted to retrain the classifier as has been trained using the first training data, using the second training data to improve the classifier.

7. The computer system recited in claim 6, wherein the computer-implemented code adapted to train the classifier comprises computer-implemented code adapted to: identify the new opinion term and microblogging data based on a Pearsons chi-square value; select a text window comprising an entity of the new microblogging data; select one or more features from the text window; and train the classifier to determine a sentiment polarity for the entity.

8. The computer system recited in claim 7, wherein the new opinion terms comprise a Pearsons chi-square value greater than a specified threshold.

9. The computer system recited in claim 7, wherein the text window comprises one or more surrounding words associated with the topic, and wherein the selected features are selected from the surrounding words.

10. The computer system recited in claim 6, wherein the classifier comprises a support vector machine.

11. The computer system recited in claim 6, further comprising computer-implemented code adapted to: train a classifier using training data from the second sentiment analysis; identify additional microblogging data using the statistical test; and perform a third sentiment analysis on the additional microblogging data using the classifier.

12. A non-transitory machine-readable medium that stores machine-readable instructions executable by a processor to perform sentiment analysis, the machine-readable instructions comprising: machine-readable instructions that, when executed by the processor, perform a first sentiment analysis on microblogging data based on an opinion lexicon that includes non-domain-specific opinion terms, to generate first training data; machine-readable instructions that, when executed by the processor, train a classifier using the first training data; machine-readable instructions that, when executed by the processor, identify a new opinion term in the microblogging data, by at least performing a statistical test on results of the first sentiment analysis on the microblogging data, the new opinion term being domain-specific to the microblogging data; machine-readable instructions that, when executed by the processor, add the new opinion term to the opinion lexicon to grow the opinion lexicon so that the opinion lexicon includes domain-specific opinion terms; machine-readable instructions that, when executed by the processor, identify new microblogging data other than the microblogging data on which the first sentiment analysis has been performed, based on the opinion lexicon to which the new opinion term has been added; machine-readable instructions that, when executed by the processor, perform a second sentiment analysis on the new microblogging data using the classifier to generate second training data; and machine-readable instructions that, when executed by the processor, retrain the classifier as has been trained using the first training data, using the second training data to improve the classifier.

13. The non-transitory machine-readable medium recited in claim 12, wherein the machine-readable instructions that train the classifier comprise machine-readable instructions that, when executed by the processor: select a text window comprising an entity of the microblogging data; select one or more features from the text window to train the classifier to determine a sentiment polarity for the entity.

14. The non-transitory machine-readable medium recited in claim 13, wherein the new opinion term has a Pearsons chi-square value greater than a specified threshold.

15. The non-transitory machine-readable medium recited in claim 13, wherein the text window comprises one or more words associated with the entity, and wherein the selected features are selected from the one or more words.
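The claims that recite a Pearson chi-square value above a specified threshold can be made concrete with a small sketch. It assumes a 2x2 contingency table of term presence against the sentiment label assigned by the first (lexicon-based) pass. The function names `chi_square_2x2` and `candidate_terms`, the toy posts, and the threshold of 3.84 (the conventional 5% critical value for one degree of freedom) are illustrative assumptions; the claims say only that the threshold is "specified".

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a row or column is empty, so there is no evidence
    return n * (a * d - b * c) ** 2 / denom


def candidate_terms(labeled_posts, lexicon, threshold):
    """Score each non-lexicon term by its association with the sentiment label."""
    pos_total = sum(1 for _, label in labeled_posts if label == "positive")
    neg_total = len(labeled_posts) - pos_total
    counts = {}  # term -> (posts labeled positive, posts labeled negative)
    for text, label in labeled_posts:
        for term in set(text.lower().split()):
            if term in lexicon:
                continue  # only terms not already in the lexicon are candidates
            pos, neg = counts.get(term, (0, 0))
            counts[term] = (pos + 1, neg) if label == "positive" else (pos, neg + 1)
    scores = {}
    for term, (pos, neg) in counts.items():
        # Rows: term present / term absent. Columns: positive / negative.
        score = chi_square_2x2(pos, neg, pos_total - pos, neg_total - neg)
        if score > threshold:
            scores[term] = score
    return scores


# Toy output of a first lexicon-based pass (hypothetical data).
labeled = [
    ("great screen crisp", "positive"),
    ("great camera crisp", "positive"),
    ("awful battery laggy", "negative"),
    ("awful screen laggy", "negative"),
]
new_terms = candidate_terms(labeled, lexicon={"great", "awful"}, threshold=3.84)
print(sorted(new_terms))
```

With only four posts the statistic is far too unstable for real use; the example shows the mechanics only. Note also that the chi-square value measures the strength of association, not its direction, so a real system would still need to assign a polarity to each new term, for example from which label it co-occurs with more often.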

Assignees

Inventors

Classifications

  • Marketing; Price estimation or determination; Fundraising · CPC title

  • Semantic analysis · CPC title

  • Creation or modification of classes or clusters · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9275041B2 cover?
There is provided a computer-implemented method of performing sentiment analysis. An exemplary method comprises performing a first sentiment analysis on microblogging data based on a method using an opinion lexicon. The method also includes training a classifier using training data from the first sentiment analysis. Additionally, the method includes identifying a new opinion term in the microbl…
Who is the assignee on this patent?
Ghosh Riddhiman, Zhang Lei, Dekhil Mohamed E, and 2 more
What technology area does this patent fall under?
Primary CPC classification G06F17/2785. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 1, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).