Predicting future trending topics

US2019102374A1 · US · A1

Patent metadata
Field | Value
Publication number | US-2019102374-A1
Application number | US-201715723095-A
Country | US
Kind code | A1
Filing date | Oct 2, 2017
Priority date | Oct 2, 2017
Publication date | Apr 4, 2019
Grant date |

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A prediction system can predict future trending topics. The prediction system can classify social media posts by region and vertical, extract text from the posts, tokenize the extracted text, and organize the tokens into n-grams. The prediction system can store the n-grams from the posts in a cumulative set of n-grams, with each n-gram tagged with the originating post's identified region, vertical, and a time value. The prediction system can compute, for each n-gram, a frequency within each category defined by a region/vertical pair. The prediction system can fit occurrence data for n-grams to a polynomial and identify the slope at the point for the current time. The slope can be used as a prediction of growth or decline for the n-gram. The prediction system can identify n-grams with a comparatively large slope within that region/vertical as likely to be trending in the future.
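
The slope-based prediction described in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not state: a quadratic least-squares fit, one occurrence count per time period, and a fixed slope threshold. The names `fit_polynomial`, `slope_at`, `SLOPE_THRESHOLD`, and the sample counts are hypothetical and are not taken from the patent.

```python
def fit_polynomial(xs, ys, degree=2):
    """Least-squares polynomial fit (degree is an assumption, not from the patent).

    Returns coefficients lowest power first: c[0] + c[1]*x + c[2]*x**2 + ...
    Needs at least degree + 1 distinct x values, otherwise the system is singular.
    """
    n = degree + 1
    # Normal equations (V^T V) c = V^T y, where V is the Vandermonde matrix.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def slope_at(coeffs, x):
    """Derivative of the fitted polynomial evaluated at x."""
    return sum(i * c * x ** (i - 1) for i, c in enumerate(coeffs) if i > 0)


# Hypothetical occurrence counts per time period for one region/vertical pair.
times = [0, 1, 2, 3, 4]
counts_by_ngram = {
    ("solar", "eclipse"): [1, 2, 5, 10, 17],  # accelerating
    ("award", "show"): [9, 8, 6, 5, 3],       # declining
}
SLOPE_THRESHOLD = 2.0  # hypothetical cut-off for "comparatively large"

# Treat the last time period as the current time.
trending = [
    ngram
    for ngram, counts in counts_by_ngram.items()
    if slope_at(fit_polynomial(times, counts), times[-1]) > SLOPE_THRESHOLD
]
```

A positive slope at the current time is read as predicted growth and a negative slope as predicted decline. A real system would likely compare slopes within each region/vertical pair rather than against one global constant.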

First claim

Opening claim text (preview).

1. A method for identifying future trending n-grams, comprising: for each particular content item of multiple content items: extracting text from the particular content item; identifying one or more classifications for the particular content item; organizing the extracted text into one or more n-grams; adding the one or more n-grams to a cumulative set of n-grams, wherein each n-gram in the cumulative set is associated with a time-based value for the particular content item; sorting the n-grams in the cumulative set into groups by the one or more classifications of the content item that the n-gram originated from; computing a frequency value, within each group, for each unique n-gram in that group; selecting unique n-grams, for at least one of the groups, that have a frequency value above a frequency threshold; computing a predicted change in frequency value for the selected unique n-grams, wherein the predicted change in frequency value is based on the time-based values for the n-grams that have the same sequence of words as the unique n-gram and that are in the same group as the unique n-gram; and selecting, as the future trending n-grams, one or more n-grams with a predicted change in frequency value above a predicted change threshold.

2. The method of claim 1, wherein identifying the one or more classifications for the particular content item comprises identifying a geographical region for the content item.

3. The method of claim 2, wherein the geographical region for the content item is identified based on region data for a user who provided the content item or for a device the content item originated from.

4. The method of claim 1, wherein identifying the one or more classifications for the particular content item comprises identifying a vertical for the content item based on the extracted text from the particular content item.

5. The method of claim 1, wherein extracting text from the particular content item comprises one or more of: converting audio associated with the particular content item to text; performing text recognition on an image associated with the particular content item; performing text recognition on video associated with the particular content item.

6. The method of claim 1, wherein organizing the extracted text into one or more n-grams comprises: normalizing the extracted text; tokenizing the normalized text; and grouping the tokenized text into groups of sequential tokens, the groups having a fixed number of tokens.

7. The method of claim 6, wherein the fixed number of tokens is two tokens.

8. The method of claim 6, wherein at least two of the groups of sequential tokens are overlapping in the normalized text.

9. The method of claim 1 further comprising: identifying at least one invalid n-gram, wherein each particular invalid n-gram is identified as invalid based on an amount of words, of the particular invalid n-gram that match words on a pre-defined stop word list, being above a stop-word threshold; and removing from the cumulative set the identified invalid n-grams.

10. A computer-readable storage medium storing instructions that, when executed by a computing system, cause the computing system to perform operations for identifying one or more future trending n-grams, the operations comprising: for each particular content item of multiple content items: identifying one or more classifications for the particular content item; organizing text associated with the particular content item into one or more n-grams; adding, from the one or more n-grams into to a cumulative set of n-grams, at least one n-gram; computing a frequency value for each unique n-gram in the cumulative set of n-grams, the frequency value computed for a frequency within the group of n-grams in the cumulative set of n-grams that have the same one or more classifications; computing a predicted change in frequency value for at least some of the unique n-grams, wherein the predicted change in frequency value is based on time-based values associated the n-grams in the cumulative set that have the same sequence of words as the unique n-gram and that have the same one or more classifications as the unique n-gram; and selecting, as the future trending n-grams, one or more n-grams with a predicted change in frequency value above a predicted change threshold.

11. The computer-readable storage medium of claim 10, wherein computing the predicted change in frequency value for the at least some of the unique n-grams is performed, for each particular unique n-gram, by: fitting a polynomial to the time-based values for the n-grams that have the same sequence of words as the particular unique n-gram and have the same one or more classifications as the particular unique n-gram; and computing the predicted change in frequency value as a slope of the polynomial at a point corresponding to a current time.

12. The computer-readable storage medium of claim 10, wherein at least one of the one or more classifications for each particular content item is identified by performing natural language topic recognition on the text associated with the particular content item.

13. The computer-readable storage medium of claim 10, wherein the operations further comprise: receiving an indication of user input choosing a selected region and a selected vertical; and in response to the indication of user input, providing a subset of the selected future trending n-grams whose one or more classifications include both a region classification matching the selected region and a vertical classification matching the selected vertical.

14. The computer-readable storage medium of claim 13, wherein at least one chosen n-gram of the provided subset of future trending n-grams is used to generate marketing materials prior to the chosen n-gram reaching a peak in trending among users of a social media system.

15. The computer-readable storage medium of claim 10, wherein the operations further comprise selecting the at least some of the unique n-grams to be used in predicting a change frequency by: selecting unique n-grams, for at least one of the groups, that have a frequency value above a frequency threshold.

16-20. (canceled)

21. The computer-readable storage medium of claim 10, wherein identifying the one or more classifications for the particular content item comprises identifying a geographical region for the content item.

22. The computer-readable storage medium of claim 10, wherein identifying the one or more classifications for the particular content item comprises identifying a vertical for the content item based on text associated with the particular content item.

23. The computer-readable storage medium of claim 10, wherein the operations further comprise extracting text from each particular content item, extracting the text from each particular content item including one or more of: converting audio associated with the particular content item to text; performing text recognition on an image associated with the particular content item; or performing text recognition on video associated with the particular content item.

24. The computer-readable storage medium of claim 23, wherein organizing the text into one or more n-grams comprises: normalizing the text; tokenizing the normalized text; and grouping the tokenized text into groups of sequential tokens, the groups having a fixed number of tokens.

25. The computer-readable storage med
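
The n-gram steps recited in claims 6 to 9, together with the per-group frequency step of claim 1, can be sketched in Python as follows. This is an illustrative reading, not the claimed implementation: the normalization rule (lowercasing and stripping punctuation), the stop-word list, the threshold of one stop word, and the definition of frequency as a share of all n-grams in the group are assumptions, and every function name here is hypothetical.

```python
import re
from collections import Counter, defaultdict

# Illustrative stop-word list only; the claims refer to a pre-defined list
# without giving its contents.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "is", "in"}


def to_ngrams(text, n=2):
    """Normalize, tokenize, and group into overlapping runs of n tokens."""
    normalized = re.sub(r"[^a-z0-9\s]", " ", text.lower())  # assumed normalization
    tokens = normalized.split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def is_valid(ngram, stop_word_threshold=1):
    """An n-gram is invalid when its stop-word count is above the threshold."""
    return sum(token in STOP_WORDS for token in ngram) <= stop_word_threshold


def group_frequencies(posts, n=2):
    """Map each (region, vertical) group to {n-gram: relative frequency}.

    posts is an iterable of (text, region, vertical) tuples. Relative
    frequency is an assumed definition of the claimed "frequency value".
    """
    counts = defaultdict(Counter)
    for text, region, vertical in posts:
        for ngram in to_ngrams(text, n):
            if is_valid(ngram):
                counts[(region, vertical)][ngram] += 1
    frequencies = {}
    for group, counter in counts.items():
        total = sum(counter.values())
        frequencies[group] = {ng: c / total for ng, c in counter.items()}
    return frequencies


# Hypothetical sample posts, already tagged with region and vertical.
sample_posts = [
    ("Solar eclipse tonight", "US", "science"),
    ("Solar eclipse photos", "US", "science"),
]
sample_frequencies = group_frequencies(sample_posts)
```

With `n=2` the groups are bigrams, as in claim 7, and adjacent bigrams share a token, which is one way to satisfy the overlap in claim 8. N-grams whose frequency clears a threshold would then be passed to the prediction step.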

Assignees

Facebook Inc

Inventors

Classifications

  • G06F40/284 (Primary)

    Lexical analysis, e.g. tokenisation or collocates · CPC title

  • G06Q10/40 (Primary)

    Business processes related to social networking or social networking services · CPC title

  • Classification techniques · CPC title

  • Semantic analysis · CPC title

  • G06F17/277 (Primary)

    Physics · mapped topic

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2019102374A1 cover?
A prediction system can predict future trending topics. The prediction system can classify social media posts by region and vertical, extract text from the posts, tokenize the extracted text, and organize the tokens into n-grams. The prediction system can store the n-grams from the posts in a cumulative set of n-grams, with each n-gram tagged with the originating post's identified region, ver…
Who is the assignee on this patent?
Facebook Inc
What technology area does this patent fall under?
Primary CPC classification G06F40/284. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 4, 2019 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).