Trending topic extraction from social media

US2016292157A1 · US · A1

Patent metadata
Publication number: US-2016292157-A1
Application number: US-201514679736-A
Country: US
Kind code: A1
Filing date: Apr 6, 2015
Priority date: Apr 6, 2015
Publication date: Oct 6, 2016
Grant date: not listed

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Real-time topic analysis for social listening is performed to help users and organizations in discovering and understanding trending topics in varying degrees of granularity. A density-based sampling method is employed to reduce data input. A lightweight NLP method is utilized for topic extraction which provides an efficient mechanism for handling dynamically-changing content. In embodiments, the social analytics system further helps users understand the topics by ranking topics by relevance, labeling topic categories, and grouping semantically-similar topics.
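
The density-based sampling mentioned in the abstract is spelled out in claims 7 and 19: the stream is split into bins of equal time length, and each bin receives a share of the sample proportional to its share of posts. The Python sketch below illustrates that allocation under stated assumptions. The function names, the numeric timestamps, and the rounding to whole posts are hypothetical and are not taken from the patent.

```python
from typing import List


def bin_post_counts(timestamps: List[float], start: float, end: float, num_bins: int) -> List[int]:
    """Count posts in each of num_bins equal-length time bins covering [start, end)."""
    width = (end - start) / num_bins
    counts = [0] * num_bins
    for t in timestamps:
        if start <= t < end:
            # Clamp the index to guard against floating-point edge cases near `end`.
            index = min(int((t - start) / width), num_bins - 1)
            counts[index] += 1
    return counts


def expected_sample_counts(posts_per_bin: List[int], total_sample: int) -> List[int]:
    """Expected sample count per bin = posts in bin / posts in all bins * total sample count."""
    total_posts = sum(posts_per_bin)
    if total_posts == 0:
        return [0] * len(posts_per_bin)
    # Assumption: fractional counts are rounded to whole posts; the claims do not
    # say how fractions are handled.
    return [round(n * total_sample / total_posts) for n in posts_per_bin]


# Illustrative data: ten post timestamps (arbitrary units) over a 30-unit window.
post_times = [1, 5, 10, 11, 12, 14, 15, 19, 22, 28]
posts_per_bin = bin_post_counts(post_times, start=0, end=30, num_bins=3)
sample_counts = expected_sample_counts(posts_per_bin, total_sample=5)
print(posts_per_bin, sample_counts)
```

Busier bins contribute more posts to the sample, so the reduced input keeps the shape of the original activity over time.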

First claim

Opening claim text (preview).

What is claimed is:

1. A non-transitory computer storage medium storing computer-useable instructions that, when used by one or more computing devices, cause the one or more computing devices to perform operations comprising: retrieving data, via a social engine, from one or more social media streams, the one or more social media streams sampled in accordance with a user selection received via a user device; utilizing natural language processing, at a trending topic tool, to identify candidate topics of the data; ranking the candidate topics, at the trending topic tool, with a relevance score to determine trending topics; classifying, at the trending topic tool, the trending topics into categories; and grouping semantically-similar topics, at the trending topic tool, wherein the semantically-similar topics provide a user, via the user device, with a real-time understanding of social media, in accordance with the user selection.

2. The non-transitory computer storage medium of claim 1, wherein the user selection is a time constraint.

3. The non-transitory computer storage medium of claim 1, wherein a fixed amount of data is sampled in accordance with the user selection.

4. The non-transitory computer storage medium of claim 1, wherein the user selection specifies the data is sampled from all users or popular users.

5. The non-transitory computer storage medium of claim 1, further comprising ranking users contributing to the sampled data to identify popular users.

6. The non-transitory computer storage medium of claim 6, wherein each user is ranked by multiplying the number of followers for the user by the logarithm of the number of posts for the account of the user.

7. The non-transitory computer storage medium of claim 1, wherein the data is retrieved by: splitting the one or more social media streams into bins, the bins being split in accordance with equal time lengths per bin; calculating a number of posts in each bin; determining an expected sample count for each bin by dividing the number of posts for each bin by a total number of posts in all bins, and multiplying by a total expected sample count; and retrieving the data in accordance with the expected sample count for each bin.

8. The non-transitory computer storage medium of claim 1, wherein utilizing natural language processing to identify candidate topics comprises: identifying words and phrases as candidates based on a part-of-speech (POS) tag being a proper noun, a plural proper noun, or a cardinal number; and defining topic boundaries that belong to each candidate.

9. The non-transitory computer storage medium of claim 1, wherein ranking the candidate topics with a relevance score comprises: determining an Accumulated Term Frequency (ATF) for a candidate topic in a document of the data, the ATF not being a term frequency for the candidate topic in the document; determining an Inverse Document Frequency (IDF) for the candidate topic in the data; and determining the relevance score for the candidate topic.

10. The non-transitory computer storage medium of claim 1, wherein classifying the trending topics into categories comprises: applying classification rules to the trending topics, the classification rules being manually crafted and relying on internal evidence and external evidence, wherein the classification rules that rely on internal evidence are applied before the classifying rules relying on external evidence; classifying the trending topics in accordance with the rules, the classifications including organizations, person names, and locations; and utilizing dictionary sources to classify unknown topics.

11. A computer-implemented method comprising: determining, via a first computing process, an Accumulated Term Frequency (ATF) for each candidate topic identified in a data sample retrieved from one or more social media streams via a social engine; determining, via a second computing process, the inverse document frequency for each candidate topic in the data sample; and determining, via a third computing process, a relevance score for each candidate topic to determine trending topics, wherein the trending topics provide a user, via a user device, with a real-time understanding of social media, in accordance with a user selection received from the user device; wherein each of the computing processes is performed by one or more computing devices.

12. The computer-implemented method of claim 11, further comprising, utilizing, via a fourth computing process, natural language processing to identify candidate topics of the data sample.

13. The computer-implemented method of claim 12, wherein utilizing natural language processing to identify candidate topics of the data sample comprises: identifying words and phrases as candidates for the candidate topics based on a part-of-speech (POS) tag being a proper noun, a plural proper noun, or a cardinal number; defining topic boundaries that belong to each candidate topic to identify the start and end of each candidate topic; and extracting each candidate topic.

14. The computer-implemented method of claim 11, further comprising, classifying, via a fifth computing process, the trending topics into categories.

15. The computer-implemented method of claim 14, wherein classifying the trending topics into categories comprises: applying classification rules to the trending topics, the classification rules being manually crafted and relying on internal evidence and external evidence, wherein the classification rules that rely on internal evidence are applied before the classifying rules relying on external evidence; classifying the trending topics in accordance with the rules, the classifications including organization, person, and location; and utilizing dictionary sources to classify unknown topics.

16. The computer-implemented method of claim 11, further comprising grouping, via a sixth computing process, semantically-similar topics.

17. The computer-implemented method of claim 11, wherein the one or more social media streams are sampled in accordance with a user selection of a data source, a time constraint, desired demographics, a product, a service, a feature, an organization, a person, or a location.

18. The computer-implemented method of claim 11, further comprising ranking users contributing to the sampled data to identify popular users, wherein each user is ranked by multiplying the number of followers for the user by the logarithm of the number of posts for the account of the user.

19. The computer-implemented method of claim 11, wherein the data is retrieved by: splitting the one or more social media streams into bins, the bins being split in accordance with equal time lengths per bin; calculating a number of posts in each bin; determining an expected sample count for each bin by dividing the number of posts for each bin by a total number of posts in all bins, and multiplying by a total expected sample count; and retrieving the data in accordance with the expected sample count for each bin.

20. A computerized system comprising: one or more processors; and a non-transitory computer storage media storing computer-useable instructions that, when used by the one or more processors, cause the one or more processors to: retrieve data, via a social engine, from one or more social media streams, the one or more social media streams sampled in accordance with a user selection received via a user
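
The popular-user ranking recited in claims 6 and 18 multiplies a user's follower count by the logarithm of the number of posts for that user's account. The Python sketch below shows one way to compute and sort by that score. The claims do not state the base of the logarithm, so the natural logarithm is assumed here. The function names, the handling of accounts with no posts, and the sample users are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def popularity_score(followers: int, posts: int) -> float:
    """Score = followers * log(posts), per the wording of claims 6 and 18."""
    if posts < 1:
        # Assumption: an account with no posts gets a score of zero,
        # since log(0) is undefined and the claims do not cover this case.
        return 0.0
    # Assumption: natural logarithm; the claims do not give the base.
    return followers * math.log(posts)


def rank_users(users: Dict[str, Tuple[int, int]]) -> List[str]:
    """Return user names ordered from highest to lowest popularity score.

    `users` maps a name to a (followers, posts) pair.
    """
    return sorted(users, key=lambda name: popularity_score(*users[name]), reverse=True)


# Made-up accounts for illustration only.
users = {
    "alice": (1000, 10),
    "bob": (5000, 2),
    "carol": (200, 1000),
}
ranking = rank_users(users)
print(ranking)
```

Taking the logarithm of the post count damps the effect of very prolific accounts, so follower count dominates the ordering while posting activity still acts as a tiebreaker of sorts.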

Assignees

Adobe Systems Inc

Inventors

Classifications

  • Business processes related to social networking or social networking services · CPC title

  • using natural language analysis · CPC title

  • Recognition of textual entities · CPC title

  • G06F40/30 · Primary

    Semantic analysis · CPC title

  • Services · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016292157A1 cover?
Real-time topic analysis for social listening is performed to help users and organizations in discovering and understanding trending topics in varying degrees of granularity. A density-based sampling method is employed to reduce data input. A lightweight NLP method is utilized for topic extraction which provides an efficient mechanism for handling dynamically-changing content. In embodiments, t…
Who is the assignee on this patent?
Adobe Systems Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/3344. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 6, 2016 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).