Trending topic extraction from social media

US10095686B2 · US · B2

Patent metadata
Publication number: US-10095686-B2
Application number: US-201514679736-A
Country: US
Kind code: B2
Filing date: Apr 6, 2015
Priority date: Apr 6, 2015
Publication date: Oct 9, 2018
Grant date: Oct 9, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Real-time topic analysis for social listening is performed to help users and organizations in discovering and understanding trending topics in varying degrees of granularity. A density-based sampling method is employed to reduce data input. A lightweight NLP method is utilized for topic extraction which provides an efficient mechanism for handling dynamically-changing content. In embodiments, the social analytics system further helps users understand the topics by ranking topics by relevance, labeling topic categories, and grouping semantically-similar topics.
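The density-based sampling mentioned in the abstract is spelled out in claim 7: the streams are split into bins of equal time length, and each bin is sampled in proportion to its share of all posts. Claim 6 ranks users by multiplying follower count by the logarithm of post count. Below is a minimal Python sketch of both, assuming each post carries a numeric timestamp. The `Post` class, the function names, the natural logarithm and the rounding of expected counts are illustrative assumptions, not the patented implementation.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Post:
    # Hypothetical record: timestamp in seconds, author handle, text
    timestamp: float
    user: str
    text: str


def density_sample(posts, total_samples, bin_seconds, seed=0):
    """Claim 7 sketch: sample each equal-length time bin in proportion to its size."""
    if not posts:
        return []
    rng = random.Random(seed)
    start = min(p.timestamp for p in posts)
    bins = {}
    for p in posts:
        bins.setdefault(int((p.timestamp - start) // bin_seconds), []).append(p)
    sample = []
    for key in sorted(bins):
        bin_posts = bins[key]
        ratio = len(bin_posts) / len(posts)  # posts in bin / posts in all bins
        expected = round(total_samples * ratio)  # rounding rule is an assumption
        sample.extend(rng.sample(bin_posts, min(expected, len(bin_posts))))
    return sample


def popularity(followers, post_count):
    """Claim 6 sketch: followers times log(post count); natural log assumed."""
    return followers * math.log(post_count) if post_count > 0 else 0.0


# Usage: 30 posts in the first hour, 10 in the second, 8 samples requested
posts = [Post(t, "a", "early") for t in range(30)]
posts += [Post(3600 + t, "b", "late") for t in range(10)]
picked = density_sample(posts, total_samples=8, bin_seconds=3600)
print(sum(p.timestamp < 3600 for p in picked), sum(p.timestamp >= 3600 for p in picked))
# prints: 6 2
```

Because expected counts are rounded bin by bin, the total can drift slightly from `total_samples` when ratios are not whole numbers; a largest-remainder allocation would keep the total exact if that matters.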

First claim

Opening claim text (preview).

What is claimed is:

1. A non-transitory computer storage medium storing computer-useable instructions that, when used by one or more computing devices, cause the one or more computing devices to perform operations comprising: sampling data, via a social engine, from one or more social media streams, in accordance with a user selection received via a user device; assigning part-of-speech (POS) tags to text in the data; applying natural language processing, by a trending topic tool, to extract candidate topics from the data using a first rule comprising: identifying a sequence of a plurality of the assigned POS tags, wherein each POS tag of the sequence is selected from a group consisting of at least one of a proper noun tag, a plural proper noun tag, or a cardinal number tag; defining topic boundaries based on the identified sequence; and extracting a portion of the text corresponding to the topic boundaries as one of the candidate topics; ranking the candidate topics, by the trending topic tool, with a relevance score that quantifies relative importance of each candidate topic to determine trending topics; classifying, by the trending topic tool, the trending topics into categories; grouping the candidate topics into topic clusters of semantically-similar topics, by the trending topic tool, and transmitting the classified and clustered trending topics for display on the user device.

2. The non-transitory computer storage medium of claim 1, wherein the user selection is a time constraint.

3. The non-transitory computer storage medium of claim 1, wherein a fixed amount of data is sampled in accordance with the user selection.

4. The non-transitory computer storage medium of claim 1, wherein the user selection specifies the data is sampled from all users or popular users.

5. The non-transitory computer storage medium of claim 1, further comprising ranking users contributing to the sampled data to identify popular users, and sampling from the popular users in the one or more social media streams to produce the data.

6. The non-transitory computer storage medium of claim 5, wherein each user is ranked by multiplying the number of followers for the user by the logarithm of the number of posts for the account of the user.

7. The non-transitory computer storage medium of claim 1, wherein the data is sampled from the one or more social media streams by: splitting the one or more social media streams into bins, the bins being split in accordance with equal time lengths per bin; calculating, for a given bin, a ratio of posts in the bin to posts in all the bins; determining, for the given bin, an expected sample count by multiplying a defined total number of samples by the ratio for the bin; and sampling, from the given bin, to generate a number of samples corresponding to the expected sample count for the bin.

8. The non-transitory computer storage medium of claim 1, wherein applying natural language processing to extract candidate topics comprises using a second rule that prohibits using a cardinal number as a first word of a candidate topic unless the cardinal number starts with a letter.

9. The non-transitory computer storage medium of claim 1, wherein ranking the candidate topics with a relevance score comprises: determining an Accumulated Term Frequency (ATF) for a candidate topic in a document of the data, the ATF counting an occurrence of the candidate topic once for each document in which the candidate topic appears; determining an Inverse Document Frequency (IDF) for the candidate topic in the data; and determining the relevance score for the candidate topic based on the ATF and the IDF for the candidate topic.

10. The non-transitory computer storage medium of claim 1, wherein classifying the trending topics into categories comprises: applying classification rules to the trending topics, the classification rules being manually crafted and relying on internal evidence and external evidence, wherein the classification rules that rely on internal evidence are applied before the classifying rules relying on external evidence; classifying the trending topics in accordance with the rules, the classifications including organizations, person names, and locations; and utilizing dictionary sources to classify unknown topics.

11. A computer-implemented method comprising: applying, via a first computing process, natural language processing to extract candidate topics from a data sample comprising a plurality of posts retrieved from one or more social media streams via a social engine, the data sample including text with assigned part-of-speech (POS) tags, wherein the first computing process utilizes a first rule comprising: identifying a sequence of a plurality of the assigned POS tags, wherein each POS tag of the sequence is selected from a group consisting of at least one of a proper noun tag, a plural proper noun tag, or a cardinal number tag; defining topic boundaries based on the identified sequence; and extracting a portion of the text corresponding to the topic boundaries as one of the candidate topics; determining, via a second computing process, an Accumulated Term Frequency (ATF) for each candidate topic of the candidate topics, the ATF counting an occurrence of the candidate topic once for each post in which the candidate topic appears; determining, via a third computing process, an inverse document frequency (IDF) for each of the candidate topics in the data sample; determining, via a fourth computing process, a relevance score that quantifies relative importance of the candidate topics using the ATF and the IDF to determine trending topics; and transmitting, via a fifth computing process, the trending topics for display on a user device; wherein each of the computing processes is performed by one or more computing devices.

12. The computer-implemented method of claim 11, wherein applying natural language processing to extract candidate topics from the data sample comprises using a second rule that prohibits using a cardinal number as a first word of a candidate topic unless the cardinal number starts with a letter.

13. The computer-implemented method of claim 11, further comprising, classifying, via a sixth computing process, the trending topics into categories.

14. The computer-implemented method of claim 13, wherein classifying the trending topics into categories comprises: applying classification rules to the trending topics, the classification rules being manually crafted and relying on internal evidence and external evidence, wherein the classification rules that rely on internal evidence are applied before the classifying rules relying on external evidence; classifying the trending topics in accordance with the rules, the classifications including organization, person, and location; and utilizing dictionary sources to classify unknown topics.

15. The computer-implemented method of claim 11, further comprising grouping, via a seventh computing process, semantically-similar topics.

16. The computer-implemented method of claim 11, wherein the one or more social media streams are sampled in accordance with a user selection of a data source, a time constraint, desired demographics, a product, a service, a feature, an organization, a person, or a location.

17. The computer-implemented method of claim 11, further comprising ranking users contributing to the sampled data to identify popular users, and sampling from the popular users in the one or more social media streams to produce the data sample, wherein each user is ran…
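The first rule of claims 1 and 11, the second rule of claims 8 and 12, and the ATF/IDF relevance score of claims 9 and 11 can be sketched together. This sketch assumes posts are already POS-tagged with Penn Treebank tags (NNP, NNPS and CD standing in for the claims' proper noun, plural proper noun and cardinal number tags). It reads "a sequence of a plurality" of tags as a run of at least two, and it multiplies ATF by IDF in the tf-idf manner, whereas the claims say only that the score is based on both. The function names are hypothetical.

```python
import math
from collections import Counter

# Penn Treebank tags assumed for the claims' POS categories
TOPIC_TAGS = {"NNP", "NNPS", "CD"}  # proper noun, plural proper noun, cardinal


def extract_candidates(tagged, min_len=2):
    """First rule: runs of NNP/NNPS/CD tags set the topic boundaries."""
    candidates, run = [], []
    for token, tag in list(tagged) + [("", "END")]:  # sentinel closes the last run
        if tag in TOPIC_TAGS:
            run.append((token, tag))
            continue
        if len(run) >= min_len:  # "a sequence of a plurality" of tags
            # Second rule: no leading cardinal unless it starts with a letter
            while run and run[0][1] == "CD" and not run[0][0][:1].isalpha():
                run = run[1:]
            if run:
                candidates.append(" ".join(tok for tok, _ in run))
        run = []
    return candidates


def relevance_scores(candidates_per_post):
    """ATF counts a topic once per post; IDF = log(N / posts containing topic)."""
    n_posts = len(candidates_per_post)
    atf = Counter()
    for cands in candidates_per_post:
        atf.update(set(cands))
    # On a single sample the document frequency equals the ATF
    return {t: c * math.log(n_posts / c) for t, c in atf.items()}


# Usage with hand-tagged posts
tagged_posts = [
    [("San", "NNP"), ("Francisco", "NNP"), ("Giants", "NNPS"), ("win", "VBP")],
    [("Go", "VB"), ("San", "NNP"), ("Francisco", "NNP"), ("Giants", "NNPS")],
    [("2015", "CD"), ("Apple", "NNP"), ("Watch", "NNP"), ("review", "NN")],
    [("Seven", "CD"), ("Eleven", "CD"), ("opens", "VBZ")],
    [("Paris", "NNP"), ("is", "VBZ"), ("lovely", "JJ")],
    [("so", "RB"), ("tired", "JJ")],
]
cands = [extract_candidates(p) for p in tagged_posts]
print(cands)
print(relevance_scores(cands))
```

Because ATF counts a topic once per post and IDF is computed over the same sample, this product reduces to df·log(N/df). A topic found in every post therefore scores zero, and scores peak for topics present in roughly N/e posts. If that behavior is unwanted, the IDF could be taken from a background corpus instead.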

Assignees

Adobe Systems Inc

Inventors

Classifications

  • Business processes related to social networking or social networking services · CPC title

  • Search customisation based on user profiles and personalisation · CPC title

  • G06F40/30 (primary)

    Semantic analysis · CPC title

  • Recognition of textual entities · CPC title

  • using natural language analysis · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10095686B2 cover?
Real-time topic analysis for social listening is performed to help users and organizations in discovering and understanding trending topics in varying degrees of granularity. A density-based sampling method is employed to reduce data input. A lightweight NLP method is utilized for topic extraction which provides an efficient mechanism for handling dynamically-changing content. In embodiments, t…
Who is the assignee on this patent?
Adobe Systems Inc
What technology area does this patent fall under?
Primary CPC classification G06F40/30. Mapped technology areas include Physics.
When was this patent published?
Published October 9, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).