System and method for topic extraction and opinion mining

US10339184B2 · US · B2

Patent metadata

Publication number: US-10339184-B2
Application number: US-201615357018-A
Country: US
Kind code: B2
Filing date: Nov 21, 2016
Priority date: Sep 28, 2009
Publication date: Jul 2, 2019
Grant date: Jul 2, 2019

Abstract

Techniques for topic extraction and opinion mining are described. For example, a machine accesses a document from a document record of a database. The machine builds a syntax tree of a sentence of the document based on parsing the sentence. The machine assigns a polarity impact to one or more words of the sentence. The machine determines that a plurality of words in the sentence have conflicting polarity based on the syntax tree and the polarity impact of the plurality of words. The machine determines a classification of the sentence based on the determining that the plurality of words in the sentence have conflicting polarity. The machine generates a classification record of the classification in the database.
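The pipeline in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The lexicon, the `MAX_DIST` window, the SQLite table names, and the conflict rule (opposite-signed polarity words near the topic, resolved in favor of the word closest to the topic in the tree, with an equal-distance tie labeled "mixed") are all illustrative assumptions. Parsing is out of scope, so the dependency tree is supplied as a hand-written list of head indices rather than produced by a real parser.

```python
import sqlite3

# Hypothetical toy lexicon (lemma -> polarity impact); the patent does not publish one.
LEXICON = {"great": 1, "good": 1, "slow": -1, "poor": -1}
MAX_DIST = 3  # Hypothetical window: only polarity words this close to the topic count.


def path_to_root(heads, i):
    """Return [i, parent, grandparent, ..., root]; the root has heads[root] == -1."""
    path = [i]
    while heads[i] != -1:
        i = heads[i]
        path.append(i)
    return path


def tree_distance(heads, a, b):
    """Number of edges between words a and b, via their lowest common ancestor."""
    pa, pb = path_to_root(heads, a), path_to_root(heads, b)
    lca = next(n for n in pa if n in pb)
    return pa.index(lca) + pb.index(lca)


def classify_sentence(words, heads, topic):
    """Return 'positive', 'negative', 'mixed' or 'neutral' for the topic word."""
    # Assign a polarity impact to each word from the lexicon (0 = no opinion).
    impacts = [LEXICON.get(w.lower(), 0) for w in words]
    # Collect (tree distance to topic, impact) for polarity words inside the window.
    near = []
    for i, p in enumerate(impacts):
        if p != 0:
            d = tree_distance(heads, i, topic)
            if d <= MAX_DIST:
                near.append((d, p))
    near.sort()
    if not near:
        return "neutral"
    if {p for _, p in near} != {1, -1}:  # no conflicting polarity
        return "positive" if near[0][1] > 0 else "negative"
    # Conflict: the polarity word nearest the topic decides; an opposite-sign tie is mixed.
    closest = {p for d, p in near if d == near[0][0]}
    if len(closest) > 1:
        return "mixed"
    return "positive" if closest.pop() > 0 else "negative"


def classify_document(conn, doc_id, heads, topic):
    # Access the document record; the parse (heads) is supplied by the caller.
    (text,) = conn.execute("SELECT body FROM documents WHERE id = ?", (doc_id,)).fetchone()
    words = text.rstrip(".").split()
    label = classify_sentence(words, heads, topic)
    # Generate a classification record in the same database.
    conn.execute(
        "INSERT INTO classifications (doc_id, topic, label) VALUES (?, ?, ?)",
        (doc_id, words[topic], label),
    )
    return label


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("CREATE TABLE classifications (doc_id INTEGER, topic TEXT, label TEXT)")
conn.execute("INSERT INTO documents VALUES (1, 'The battery is great but the screen is slow.')")
# Hand-written dependency heads for the sentence above (stand-in for a real parser).
HEADS = [1, 3, 3, -1, 8, 6, 8, 8, 3]
print(classify_document(conn, 1, HEADS, topic=1))  # topic "battery" -> positive
print(classify_document(conn, 1, HEADS, topic=6))  # topic "screen" -> negative
```

Resolving the conflict by tree distance rather than by word order means that, in the example sentence, "great" decides the label for "battery" and "slow" decides the label for "screen", even though both opinion words sit in the same sentence.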

Claims

What is claimed is:

1. A system comprising: one or more hardware processors; and a machine-readable medium for storing instructions that, when executed by the one or more hardware processors, cause the one or more hardware processors to perform operations comprising: accessing a document from a document record of a database; building a syntax tree of a sentence of the document based on parsing the sentence; assigning a polarity impact to one or more words of the sentence based on matching one or more lexical patterns and the one or more words of the sentence, a lexical pattern being a token sequence, each token of the token sequence comprising a lemma field, a polarity tag field, and a part-of-speech tag field; determining that a plurality of words in the sentence have conflicting polarity based on the syntax tree and the assigned polarity impact of the plurality of words; determining a classification of the sentence based on the determining that the plurality of words in the sentence have conflicting polarity; and generating a classification record of the classification in the database.

2. The system of claim 1, wherein the operations further comprise identifying one or more key phrases in the document based on a filtering technique.

3. The system of claim 2, wherein the operations further comprise identifying the sentence of the document based on the identified one or more key phrases, the sentence including the one or more key phrases.

4. The system of claim 1, wherein the assigning of the polarity impact to the one or more words of the sentence is further based on a polarity word having a dominant impact on a topic.

5. The system of claim 1, wherein the assigning of the polarity impact to the one or more words of the sentence is further based on a sum of polarities method.

6. The system of claim 1, wherein the assigning of the polarity impact to the one or more words of the sentence is further based on a syntactic distance between the one or more words and a topic key phrase in the syntactic tree.

7. The system of claim 1, wherein the determining of the classification of the sentence includes selecting a polarity classification for the one or more words in the sentence, the selecting resulting in classified one or more words.

8. The system of claim 7, wherein the determining of the classification of the sentence further includes applying one or more heuristic rules to the classified one or more words and the sentence.

9. The system of claim 1, wherein the operations further comprise determining an opinion associated with a document that includes the sentence based on the classification of the sentence.

10. A method comprising: accessing a document from a document record of a database; building, using one or more hardware processors, a syntax tree of a sentence of the document based on parsing the sentence; assigning a polarity impact to one or more words of the sentence based on matching one or more lexical patterns and the one or more words of the sentence, a lexical pattern being a token sequence, each token of the token sequence comprising a lemma field, a polarity tag field, and a part-of-speech tag field; determining that a plurality of words in the sentence have conflicting polarity based on the syntax tree and the assigned polarity impact of the plurality of words; determining a classification of the sentence based on the determining that the plurality of words in the sentence have conflicting polarity; and generating a classification record of the classification in the database.

11. The method of claim 10, further comprising: identifying one or more key phrases in the document based on a filtering technique.

12. The method of claim 11, further comprising: identifying the sentence of the document based on the identified one or more key phrases, the sentence including the one or more key phrases.

13. The method of claim 10, wherein the assigning of the polarity impact to the one or more words of the sentence is further based on a polarity word having a dominant impact on a topic.

14. The method of claim 10, wherein the assigning of the polarity impact to the one or more words of the sentence is further based on a sum of polarities method.

15. The method of claim 10, wherein the assigning of the polarity impact to the one or more words of the sentence is further based on a syntactic distance between the one or more words and a topic key phrase in the syntactic tree.

16. The method of claim 10, wherein the determining of the classification of the sentence includes selecting a polarity classification for the one or more words in the sentence, the selecting resulting in classified one or more words.

17. The method of claim 16, wherein the determining of the classification of the sentence further includes applying one or more heuristic rules to the classified one or more words and the sentence.

18. A non-transitory machine-readable storage medium comprising instructions that, when executed by one or more hardware processors of a machine, cause the one or more hardware processors to perform operations comprising: accessing a document from a document record of a database; building a syntax tree of a sentence of the document based on parsing the sentence; assigning a polarity impact to one or more words of the sentence based on matching one or more lexical patterns and the one or more words of the sentence, a lexical pattern being a token sequence, each token of the token sequence comprising a lemma field, a polarity tag field, and a part-of-speech tag field; determining that a plurality of words in the sentence have conflicting polarity based on the syntax tree and the assigned polarity impact of the plurality of words; determining a classification of the sentence based on the determining that the plurality of words in the sentence have conflicting polarity; and generating a classification record of the classification in the database.
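
Claim 1 defines a lexical pattern as a token sequence whose tokens carry a lemma field, a polarity tag field, and a part-of-speech tag field, and claim 5 adds a sum-of-polarities method. The Python sketch below shows one way those pieces could fit together. The tag names (`POS`, `NEG`, `ADJ`, `PART`), the `target` and `impact` fields, the wildcard convention (a `None` field matches anything), the override order, and the three example patterns are hypothetical choices, not details taken from the patent. The input words are assumed to be already lemmatized and tagged.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Token:
    """One token of a lexical pattern; a field left as None matches anything."""
    lemma: Optional[str] = None
    polarity: Optional[str] = None  # polarity tag, e.g. "POS" or "NEG" (hypothetical tags)
    pos: Optional[str] = None       # part-of-speech tag, e.g. "ADJ"


@dataclass(frozen=True)
class Pattern:
    tokens: Tuple[Token, ...]
    target: int  # hypothetical: index of the token whose word receives the impact
    impact: int  # polarity impact written to that word


@dataclass(frozen=True)
class Word:
    lemma: str
    polarity: str  # "POS", "NEG" or "NONE", assumed to come from a lexicon lookup
    pos: str


def token_matches(tok, word):
    """True when every non-None field of the pattern token equals the word's field."""
    return all(
        want is None or want == have
        for want, have in ((tok.lemma, word.lemma), (tok.polarity, word.polarity), (tok.pos, word.pos))
    )


def assign_impacts(words, patterns):
    """Slide each pattern over the sentence; later patterns overwrite earlier impacts."""
    impacts = [0] * len(words)
    for pat in patterns:
        n = len(pat.tokens)
        for start in range(len(words) - n + 1):
            if all(token_matches(t, w) for t, w in zip(pat.tokens, words[start:start + n])):
                impacts[start + pat.target] = pat.impact
    return impacts


def sum_of_polarities(impacts):
    """Sentence polarity from the sign of the summed word impacts."""
    total = sum(impacts)
    return "positive" if total > 0 else "negative" if total < 0 else "neutral"


PATTERNS = [
    Pattern((Token(polarity="POS", pos="ADJ"),), target=0, impact=1),
    Pattern((Token(polarity="NEG", pos="ADJ"),), target=0, impact=-1),
    # Negation: "not" followed by a positive adjective flips the adjective's impact.
    Pattern((Token(lemma="not", pos="PART"), Token(polarity="POS", pos="ADJ")), target=1, impact=-1),
]

sentence = [  # "The screen is not bright", pre-lemmatized and tagged
    Word("the", "NONE", "DET"), Word("screen", "NONE", "NOUN"), Word("be", "NONE", "AUX"),
    Word("not", "NONE", "PART"), Word("bright", "POS", "ADJ"),
]
impacts = assign_impacts(sentence, PATTERNS)
print(impacts, sum_of_polarities(impacts))  # [0, 0, 0, 0, -1] negative
```

Because the negation pattern is listed after the single-adjective pattern, its impact overwrites the +1 first assigned to "bright". Summing the impacts then yields the sentence polarity; a sentence such as "fast but noisy" sums to zero, which is the kind of conflicting-polarity case the claims single out for further classification.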

Assignees

Ebay Inc

Inventors

Classifications

  • Semantic analysis · CPC title

  • using natural language analysis · CPC title

  • G06F16/93 (primary)

    Document management systems · CPC title

  • using extracted text · CPC title

  • Creation of semantic tools, e.g. ontology or thesauri · CPC title

Frequently asked questions

Who is the assignee on this patent?
Ebay Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/93. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jul 2, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.