Categorizing hash tags
US-2015220615-A1 · Aug 6, 2015 · US
US2016019659A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016019659-A1 |
| Application number | US-201514748507-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jun 24, 2015 |
| Priority date | Jul 15, 2014 |
| Publication date | Jan 21, 2016 |
| Grant date | — |
Official abstract text for this publication.
A system and methods are provided for identifying conversations in tweet streams. A method includes grouping tweet messages in the tweet streams into tweet groups, responsive to hashtags therefor and time intervals in which the tweet messages were sent. The method further includes splitting the tweet groups into subgroups responsive to secondary hashtags and a time separation between the tweet messages. The method also includes clustering any of the subgroups into a respective same conversation responsive to word occurrences, word frequencies, and account holders. The method additionally includes merging any of the subgroups having different hashtags into the respective same conversation responsive to overlapping glossary and account lists. Each of the tweet groups and each of the subgroups corresponds to a respective different one of the conversations when unable to be split, clustered, or merged.
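The grouping and time-based splitting steps in the abstract can be illustrated with a minimal Python sketch. This is not the applicant's implementation: the function name `split_by_hashtag_and_gap`, the `(timestamp, hashtag, text)` tuple layout, and the one-hour default gap are all assumptions made for illustration. The sketch covers only grouping by hashtag and splitting on a time gap; splitting on secondary hashtags, clustering, and merging are left out.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def split_by_hashtag_and_gap(tweets, max_gap=timedelta(hours=1)):
    """Group tweets by hashtag, then split each group wherever two
    consecutive tweets are more than `max_gap` apart.

    `tweets` is an iterable of (timestamp, hashtag, text) tuples.
    The tuple layout and the one-hour default are hypothetical.
    Returns a list of candidate conversations (lists of tweets).
    """
    by_tag = defaultdict(list)
    for tweet in tweets:
        by_tag[tweet[1]].append(tweet)

    conversations = []
    for tag in sorted(by_tag):
        group = sorted(by_tag[tag], key=lambda t: t[0])
        current = [group[0]]
        for prev, cur in zip(group, group[1:]):
            # A gap larger than the threshold starts a new subgroup,
            # even though the hashtag is the same.
            if cur[0] - prev[0] > max_gap:
                conversations.append(current)
                current = []
            current.append(cur)
        conversations.append(current)
    return conversations
```

With the default threshold, three `#a` tweets sent at 9:00, 9:10, and 12:00 would fall into two candidate conversations, because the last tweet arrives more than an hour after the one before it.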
Opening claim text (preview).
What is claimed is:
1. A method for identifying conversations in tweet streams, comprising: grouping tweet messages in the tweet streams into tweet groups, responsive to hashtags therefor and time intervals in which the tweet messages were sent; splitting the tweet groups into subgroups responsive to secondary hashtags and a time separation between the tweet messages; clustering any of the subgroups into a respective same conversation responsive to word occurrences, word frequencies, and account holders; and merging any of the subgroups having different hashtags into the respective same conversation responsive to overlapping glossary and account lists, wherein each of the tweet groups and each of the subgroups correspond to a respective different one of the conversations when unable to be split, clustered, or merged.
2. The method of claim 1, wherein the tweet groups are split into the subgroups, when the time separation between the tweet messages is greater than a predetermined amount of time.
3. The method of claim 2, wherein, irrespective of having a same hashtag, the tweet messages in the tweet groups split into the subgroups responsive to the time separation between the tweet messages being greater than the predetermined amount of time are considered to belong to different conversations.
4. A method for predicting the business impact of input tweet conversations, comprising: creating training data that includes pre-selected tweet conversations, pre-selected hashtags from the pre-selected tweet conversations, and labels, each of the labels specifying a respective predicted business impact level for a respective one of the pre-selected tweet conversations and a respective one of the pre-selected hashtags included therein; computing, by a processor, feature vectors for features extracted from the input tweet conversations; and forming a prediction model, trained by the training data, for predicting a respective business impact level for each of the input tweet conversations, by mapping respective predicted business impact levels to one or more feature vectors of each of the input tweet conversations.
5. The method of claim 4, wherein said creating step is performed off-line.
6. The method of claim 4, wherein the corresponding business impacts included in the training data are expert-predicted business impacts.
7. The method of claim 4, further comprising initially grouping the input tweet conversations into groups of input tweet conversations, respective group memberships being based on having a respective same hashtag.
8. The method of claim 4, further comprising initially selecting the features for which the feature vectors are computed responsive to a measure of independence between observed feature values and expected frequencies of the observed feature value.
9. The method of claim 8, wherein the measure of independence is calculated under a null hypothesis that feature values are independent of an impact level.
10. The method of claim 9, wherein the measure of independence is calculated responsive to performing Pearson's chi-square test under the null hypothesis.
11. The method of claim 4, wherein the features comprise at least one of account features, keyword features, location features, language features, and time features.
12. The method of claim 4, wherein the feature weight vectors are calculated to minimize a prediction error of the business impact level responsive to the training data.
13. The method of claim 4, further comprising calculating feature weight vectors for the features, wherein an impact score used for predicting the business impact level for each of the input tweet conversations is determined responsive to the feature vectors and the feature weight vectors corresponding thereto.
14. The method of claim 13, wherein said calculating step comprises retrieving one or more feature weight values from a weight-to-hashtag data association construct that respectively associates different hashtags to respective feature weight values.
15. The method of claim 4, wherein the business impact level is predicted using a binary specifier, the binary specifier being selected from a value of high and a value of low.
16. The method of claim 4, wherein the prediction model predicts the business impact level for each of the input tweet conversations using logistic regression.
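Claims 8 to 10 and 13 to 16 name two standard techniques: Pearson's chi-square test for feature selection, and logistic regression yielding a binary high/low impact level. The following Python sketch shows one plausible reading of each, under stated assumptions. The function names, the contingency-table layout (rows for feature values, columns for impact levels), the bias term, and the 0.5 decision threshold are all hypothetical and do not come from the claims.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic for a contingency table.

    `table` is a list of rows of observed counts, for example rows for
    "feature present / absent" and columns for "impact high / low"
    (this layout is an assumption). Every row and column total must be
    nonzero, otherwise an expected count of zero causes a division error.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the null hypothesis that the feature
            # value is independent of the impact level.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


def impact_probability(features, weights, bias=0.0):
    """Logistic score from a feature vector and a feature weight vector."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def impact_level(features, weights, bias=0.0, threshold=0.5):
    """Binary specifier: "high" or "low" (the threshold is an assumption)."""
    score = impact_probability(features, weights, bias)
    return "high" if score >= threshold else "low"
```

A larger chi-square statistic is stronger evidence against independence, so a feature with a larger statistic would be a stronger candidate for selection. In practice the weights would be fitted to the training data, as claim 12 describes, rather than supplied by hand.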
Business processes related to social networking or social networking services · CPC title
Market predictions or forecasting for commercial activities · CPC title
with management of multicast group membership · CPC title
Physics · mapped topic