Predicting the business impact of tweet conversations

US2016019659A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2016019659-A1
Application number: US-201514748507-A
Country: US
Kind code: A1
Filing date: Jun 24, 2015
Priority date: Jul 15, 2014
Publication date: Jan 21, 2016
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system and methods are provided for identifying conversations in tweet streams. A method includes grouping tweet messages in the tweet streams into tweet groups, responsive to hashtags therefor and time intervals in which the tweet messages were sent. The method further includes splitting the tweet groups into subgroups responsive to secondary hashtags and a time separation between the tweet messages. The method also includes clustering any of the subgroups into a respective same conversation responsive to word occurrences, word frequencies, and account holders. The method additionally includes merging any of the subgroups having different hashtags into the respective same conversation responsive to overlapping glossary and account lists. Each of the tweet groups and each of the subgroups correspond to a respective different one of the conversations when unable to be split, clustered, or merged.
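The grouping, splitting, and merging steps described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the method as implemented in the patent: the tweet tuple layout, the six-hour `MAX_GAP` threshold, and the use of Jaccard overlap with a 0.5 cutoff in `should_merge` are all hypothetical choices. The abstract only says that the time separation and the overlap of glossary and account lists drive these decisions. The clustering step on word frequencies is omitted.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical tweet record: (account, sent_at, primary_hashtag, text).
# The gap threshold is an assumed value; the source only calls it "predetermined".
MAX_GAP = timedelta(hours=6)


def group_by_hashtag(tweets):
    """Group tweets by primary hashtag, each group sorted by send time."""
    groups = defaultdict(list)
    for tweet in tweets:
        groups[tweet[2]].append(tweet)
    return {tag: sorted(items, key=lambda t: t[1]) for tag, items in groups.items()}


def split_by_time_gap(sorted_tweets, max_gap=MAX_GAP):
    """Start a new subgroup when consecutive tweets are more than max_gap apart."""
    subgroups = []
    for tweet in sorted_tweets:
        if subgroups and tweet[1] - subgroups[-1][-1][1] <= max_gap:
            subgroups[-1].append(tweet)
        else:
            subgroups.append([tweet])
    return subgroups


def identify_conversations(tweets, max_gap=MAX_GAP):
    """Group by hashtag, then split each group on large time gaps."""
    conversations = []
    for items in group_by_hashtag(tweets).values():
        conversations.extend(split_by_time_gap(items, max_gap))
    return conversations


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def should_merge(conv_a, conv_b, min_overlap=0.5):
    """Assumed merge rule: word lists and account lists must both overlap enough."""
    words_a = {w.lower() for t in conv_a for w in t[3].split()}
    words_b = {w.lower() for t in conv_b for w in t[3].split()}
    accounts_a = {t[0] for t in conv_a}
    accounts_b = {t[0] for t in conv_b}
    return (jaccard(words_a, words_b) >= min_overlap
            and jaccard(accounts_a, accounts_b) >= min_overlap)
```

In this sketch, two tweets with the same hashtag sent two days apart land in different conversations, which matches the behavior described in claims 2 and 3.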

First claim

Opening claim text (preview).

What is claimed is:

1. A method for identifying conversations in tweet streams, comprising: grouping tweet messages in the tweet streams into tweet groups, responsive to hashtags therefor and time intervals in which the tweet messages were sent; splitting the tweet groups into subgroups responsive to secondary hashtags and a time separation between the tweet messages; clustering any of the subgroups into a respective same conversation responsive to word occurrences, word frequencies, and account holders; and merging any of the subgroups having different hashtags into the respective same conversation responsive to overlapping glossary and account lists, wherein each of the tweet groups and each of the subgroups correspond to a respective different one of the conversations when unable to be split, clustered, or merged.

2. The method of claim 1, wherein the tweet groups are split into the subgroups, when the time separation between the tweet messages is greater than a predetermined amount of time.

3. The method of claim 2, wherein, irrespective of having a same hashtag, the tweet messages in the tweet groups split into the subgroups responsive to the time separation between the tweet messages being greater than the predetermined amount of time are considered to belong to different conversations.

4. A method for predicting the business impact of input tweet conversations, comprising: creating training data that includes pre-selected tweet conversations, pre-selected hashtags from the pre-selected tweet conversations, and labels, each of the labels specifying a respective predicted business impact level for a respective one of the pre-selected tweet conversations and a respective one of the pre-selected hashtags included therein; computing, by a processor, feature vectors for features extracted from the input tweet conversations; and forming a prediction model, trained by the training data, for predicting a respective business impact level for each of the input tweet conversations, by mapping respective predicted business impact levels to one or more feature vectors of each of the input tweet conversations.

5. The method of claim 4, wherein said creating step is performed off-line.

6. The method of claim 4, wherein the corresponding business impacts included in the training data are expert-predicted business impacts.

7. The method of claim 4, further comprising initially grouping the input tweet conversations into groups of input tweet conversations, respective group memberships being based on having a respective same hashtag.

8. The method of claim 4, further comprising initially selecting the features for which the feature vectors are computed responsive to a measure of independence between observed feature values and expected frequencies of the observed feature value.

9. The method of claim 8, wherein the measure of independence is calculated under a null hypothesis that feature values are independent of an impact level.

10. The method of claim 9, wherein the measure of independence is calculated responsive to performing Pearson's chi-square test under the null hypothesis.

11. The method of claim 4, wherein the features comprise at least one of account features, keyword features, location features, language features, and time features.

12. The method of claim 4, wherein the feature weight vectors are calculated to minimize a prediction error of the business impact level responsive to the training data.

13. The method of claim 4, further comprising calculating feature weight vectors for the features, wherein an impact score used for predicting the business impact level for each of the input tweet conversations is determined responsive to the feature vectors and the feature weight vectors corresponding thereto.

14. The method of claim 13, wherein said calculating step comprises retrieving one or more feature weight values from a weight-to-hashtag data association construct that respectively associates different hashtags to respective feature weight values.

15. The method of claim 4, wherein the business impact level is predicted using a binary specifier, the binary specifier being selected from a value of high and a value of low.

16. The method of claim 4, wherein the prediction model predicts the business impact level for each of the input tweet conversations using logistic regression.
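Claims 8 to 10 and 13 to 16 name two standard techniques: Pearson's chi-square test for feature selection and logistic regression for scoring. The sketch below shows one plausible reading of each. It is an illustration only: the contingency-table layout, the function names, the `bias` term, and the 0.5 decision threshold are assumptions that do not come from the claims, and the weights are passed in rather than learned from training data.

```python
import math


def chi_square(observed):
    """Pearson's chi-square statistic for a contingency table.

    Assumed layout: rows are feature values, columns are impact levels,
    and each cell holds an observed count. Under the null hypothesis of
    independence, the expected count is row_total * col_total / total.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                stat += (count - expected) ** 2 / expected
    return stat


def impact_score(features, weights, bias=0.0):
    """Logistic function of the weighted feature sum, in the range 0 to 1."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # numerically stable branch for negative z
    return ez / (1.0 + ez)


def impact_level(features, weights, bias=0.0, threshold=0.5):
    """Binary specifier: 'high' or 'low' (the threshold is an assumed value)."""
    return "high" if impact_score(features, weights, bias) >= threshold else "low"
```

A larger chi-square value means the feature's observed counts deviate more from what independence would predict, so features with a high statistic would be the ones kept for the feature vectors.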

Assignees

Inventors

Classifications

  • Business processes related to social networking or social networking services · CPC title

  • Market predictions or forecasting for commercial activities · CPC title

  • with management of multicast group membership · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016019659A1 cover?
A system and methods are provided for identifying conversations in tweet streams. A method includes grouping tweet messages in the tweet streams into tweet groups, responsive to hashtags therefor and time intervals in which the tweet messages were sent. The method further includes splitting the tweet groups into subgroups responsive to secondary hashtags and a time separation between the tweet …
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06Q30/0202. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 21, 2016 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).