Trending topic extraction from social media
US-10095686-B2 · Oct 9, 2018 · US
US2019102374A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2019102374-A1 |
| Application number | US-201715723095-A |
| Country | US |
| Kind code | A1 |
| Filing date | Oct 2, 2017 |
| Priority date | Oct 2, 2017 |
| Publication date | Apr 4, 2019 |
| Grant date | — |
Official abstract text for this publication.
A prediction system can predict future trending topics. The prediction system can classify social media posts by region and vertical, extract text from the posts, tokenize the extracted text, and organize the tokens into n-grams. The prediction system can store the n-grams from the posts in a cumulative set of n-grams, with each n-gram tagged with the originating post's identified region, vertical, and a time value. The prediction system can compute, for each n-gram, a frequency within each category defined by a region/vertical pair. The prediction system can fit occurrence data for each n-gram to a polynomial and identify the slope of the polynomial at the point corresponding to the current time. The slope can be used as a prediction of growth or decline for the n-gram. The prediction system can identify n-grams with a comparatively large slope within a given region/vertical category as likely to be trending in the future.
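The first half of the abstract (tokenize, form n-grams, tag them, compute per-category frequency) can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the post dictionary keys (`text`, `region`, `vertical`, `time`), the regex tokenizer, and the choice of relative frequency as the "frequency" measure are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict


def to_ngrams(text, n=2):
    """Normalize (lowercase), tokenize, and group text into overlapping n-grams."""
    # Assumed tokenizer: runs of letters, digits, and apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_cumulative_set(posts, n=2):
    """Tag every n-gram with its post's region, vertical, and time value."""
    records = []
    for post in posts:
        for gram in to_ngrams(post["text"], n):
            records.append((gram, post["region"], post["vertical"], post["time"]))
    return records


def frequency_by_category(records):
    """Share of each n-gram within its (region, vertical) category (assumed measure)."""
    counts = defaultdict(Counter)
    for gram, region, vertical, _time in records:
        counts[(region, vertical)][gram] += 1
    freqs = {}
    for category, counter in counts.items():
        total = sum(counter.values())
        freqs[category] = {gram: c / total for gram, c in counter.items()}
    return freqs


# Hypothetical posts, already classified by region and vertical.
posts = [
    {"text": "New electric car launch today", "region": "US", "vertical": "auto", "time": 1},
    {"text": "Electric car prices drop", "region": "US", "vertical": "auto", "time": 2},
    {"text": "Best pasta recipe", "region": "IT", "vertical": "food", "time": 2},
]
records = build_cumulative_set(posts)
freqs = frequency_by_category(records)
```

Here the bigram `("electric", "car")` appears in two US/auto posts, so it ends up with the largest share in that category. A real system would also need the classification step (region and vertical detection), which this sketch takes as given.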
Opening claim text (preview).
1. A method for identifying future trending n-grams, comprising: for each particular content item of multiple content items: extracting text from the particular content item; identifying one or more classifications for the particular content item; organizing the extracted text into one or more n-grams; adding the one or more n-grams to a cumulative set of n-grams, wherein each n-gram in the cumulative set is associated with a time-based value for the particular content item; sorting the n-grams in the cumulative set into groups by the one or more classifications of the content item that the n-gram originated from; computing a frequency value, within each group, for each unique n-gram in that group; selecting unique n-grams, for at least one of the groups, that have a frequency value above a frequency threshold; computing a predicted change in frequency value for the selected unique n-grams, wherein the predicted change in frequency value is based on the time-based values for the n-grams that have the same sequence of words as the unique n-gram and that are in the same group as the unique n-gram; and selecting, as the future trending n-grams, one or more n-grams with a predicted change in frequency value above a predicted change threshold.
2. The method of claim 1, wherein identifying the one or more classifications for the particular content item comprises identifying a geographical region for the content item.
3. The method of claim 2, wherein the geographical region for the content item is identified based on region data for a user who provided the content item or for a device the content item originated from.
4. The method of claim 1, wherein identifying the one or more classifications for the particular content item comprises identifying a vertical for the content item based on the extracted text from the particular content item.
5. The method of claim 1, wherein extracting text from the particular content item comprises one or more of: converting audio associated with the particular content item to text; performing text recognition on an image associated with the particular content item; performing text recognition on video associated with the particular content item.
6. The method of claim 1, wherein organizing the extracted text into one or more n-grams comprises: normalizing the extracted text; tokenizing the normalized text; and grouping the tokenized text into groups of sequential tokens, the groups having a fixed number of tokens.
7. The method of claim 6, wherein the fixed number of tokens is two tokens.
8. The method of claim 6, wherein at least two of the groups of sequential tokens are overlapping in the normalized text.
9. The method of claim 1 further comprising: identifying at least one invalid n-gram, wherein each particular invalid n-gram is identified as invalid based on an amount of words, of the particular invalid n-gram that match words on a pre-defined stop word list, being above a stop-word threshold; and removing from the cumulative set the identified invalid n-grams.
10. A computer-readable storage medium storing instructions that, when executed by a computing system, cause the computing system to perform operations for identifying one or more future trending n-grams, the operations comprising: for each particular content item of multiple content items: identifying one or more classifications for the particular content item; organizing text associated with the particular content item into one or more n-grams; adding, from the one or more n-grams into a cumulative set of n-grams, at least one n-gram; computing a frequency value for each unique n-gram in the cumulative set of n-grams, the frequency value computed for a frequency within the group of n-grams in the cumulative set of n-grams that have the same one or more classifications; computing a predicted change in frequency value for at least some of the unique n-grams, wherein the predicted change in frequency value is based on time-based values associated with the n-grams in the cumulative set that have the same sequence of words as the unique n-gram and that have the same one or more classifications as the unique n-gram; and selecting, as the future trending n-grams, one or more n-grams with a predicted change in frequency value above a predicted change threshold.
11. The computer-readable storage medium of claim 10, wherein computing the predicted change in frequency value for the at least some of the unique n-grams is performed, for each particular unique n-gram, by: fitting a polynomial to the time-based values for the n-grams that have the same sequence of words as the particular unique n-gram and have the same one or more classifications as the particular unique n-gram; and computing the predicted change in frequency value as a slope of the polynomial at a point corresponding to a current time.
12. The computer-readable storage medium of claim 10, wherein at least one of the one or more classifications for each particular content item is identified by performing natural language topic recognition on the text associated with the particular content item.
13. The computer-readable storage medium of claim 10, wherein the operations further comprise: receiving an indication of user input choosing a selected region and a selected vertical; and in response to the indication of user input, providing a subset of the selected future trending n-grams whose one or more classifications include both a region classification matching the selected region and a vertical classification matching the selected vertical.
14. The computer-readable storage medium of claim 13, wherein at least one chosen n-gram of the provided subset of future trending n-grams is used to generate marketing materials prior to the chosen n-gram reaching a peak in trending among users of a social media system.
15. The computer-readable storage medium of claim 10, wherein the operations further comprise selecting the at least some of the unique n-grams to be used in predicting a change frequency by: selecting unique n-grams, for at least one of the groups, that have a frequency value above a frequency threshold.
16-20. (canceled)
21. The computer-readable storage medium of claim 10, wherein identifying the one or more classifications for the particular content item comprises identifying a geographical region for the content item.
22. The computer-readable storage medium of claim 10, wherein identifying the one or more classifications for the particular content item comprises identifying a vertical for the content item based on text associated with the particular content item.
23. The computer-readable storage medium of claim 10, wherein the operations further comprise extracting text from each particular content item, extracting the text from each particular content item including one or more of: converting audio associated with the particular content item to text; performing text recognition on an image associated with the particular content item; or performing text recognition on video associated with the particular content item.
24. The computer-readable storage medium of claim 23, wherein organizing the text into one or more n-grams comprises: normalizing the text; tokenizing the normalized text; and grouping the tokenized text into groups of sequential tokens, the groups having a fixed number of tokens.
25. The computer-readable storage med
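The prediction step recited in the claims (fit a polynomial to an n-gram's time-based values, take the slope at the current time, keep n-grams whose slope exceeds a threshold) might look roughly like the following pure-Python sketch. The claims do not specify the polynomial degree, the fitting method, or the shape of the input data. The degree-2 default, the normal-equations solver, the per-time-bucket `(time, count)` series, and all function names are assumptions.

```python
def fit_polynomial(xs, ys, degree=2):
    """Least-squares polynomial fit via the normal equations.

    Returns [c0, c1, ..., c_degree] for c0 + c1*x + ... + c_degree*x**degree.
    Needs at least degree + 1 distinct x values, otherwise the system is singular.
    """
    m = degree + 1
    # Augmented normal-equation matrix [A^T A | A^T y].
    rows = [
        [sum(x ** (i + j) for x in xs) for j in range(m)]
        + [sum(y * x ** i for x, y in zip(xs, ys))]
        for i in range(m)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(m):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][m] / rows[i][i] for i in range(m)]


def slope_at(coeffs, x):
    """Derivative of the fitted polynomial evaluated at x."""
    return sum(k * c * x ** (k - 1) for k, c in enumerate(coeffs) if k > 0)


def predict_trending(series, now, slope_threshold, degree=2):
    """Return {n-gram: slope} for n-grams whose slope at `now` exceeds the threshold."""
    trending = {}
    for gram, points in series.items():
        xs, ys = zip(*points)
        slope = slope_at(fit_polynomial(xs, ys, degree), now)
        if slope > slope_threshold:
            trending[gram] = slope
    return trending


# Hypothetical occurrence counts per time bucket within one region/vertical group.
series = {
    ("electric", "car"): [(0, 1), (1, 2), (2, 5), (3, 10), (4, 17)],   # accelerating
    ("pasta", "recipe"): [(0, 9), (1, 7), (2, 5), (3, 3), (4, 1)],     # declining
}
trending = predict_trending(series, now=4, slope_threshold=1.0)
```

With this sample data only the accelerating bigram passes the threshold, since the declining one has a negative slope at the current time. For larger time values or higher degrees, a numerically sturdier fitting routine than the normal equations would be a safer choice.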
Lexical analysis, e.g. tokenisation or collocates · CPC title
Business processes related to social networking or social networking services · CPC title
Classification techniques · CPC title
Semantic analysis · CPC title
Physics · mapped topic