Iterative Classifier Training on Online Social Networks
US-2016155063-A1 · Jun 2, 2016 · US
US10095686B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10095686-B2 |
| Application number | US-201514679736-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 6, 2015 |
| Priority date | Apr 6, 2015 |
| Publication date | Oct 9, 2018 |
| Grant date | Oct 9, 2018 |
Official abstract text for this publication.
Real-time topic analysis for social listening is performed to help users and organizations in discovering and understanding trending topics in varying degrees of granularity. A density-based sampling method is employed to reduce data input. A lightweight NLP method is utilized for topic extraction which provides an efficient mechanism for handling dynamically-changing content. In embodiments, the social analytics system further helps users understand the topics by ranking topics by relevance, labeling topic categories, and grouping semantically-similar topics.
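The density-based sampling mentioned in the abstract is spelled out in claim 7: split the stream into equal-length time bins, compute each bin's share of all posts, and draw a proportional number of samples from each bin. The following is a minimal sketch of that procedure, not the patented implementation. The function name `density_sample`, the `(timestamp, text)` post format, the `seed` parameter, uniform random selection within a bin, and the rounding of fractional sample counts are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def density_sample(posts, total_samples, bin_seconds, seed=0):
    """Sample posts so that busy time bins contribute proportionally more.

    posts: list of (timestamp_in_seconds, text) tuples (hypothetical format).
    total_samples: the defined total number of samples to aim for.
    bin_seconds: the equal time length of each bin.
    """
    rng = random.Random(seed)

    # Split the stream into bins of equal time length.
    bins = defaultdict(list)
    for ts, text in posts:
        bins[int(ts // bin_seconds)].append((ts, text))

    total_posts = len(posts)
    sample = []
    for key in sorted(bins):
        bucket = bins[key]
        # Ratio of posts in this bin to posts in all bins.
        ratio = len(bucket) / total_posts
        # Expected sample count = total samples * ratio.
        # Rounding is an assumption; the claim does not say how
        # fractional counts are handled.
        expected = round(total_samples * ratio)
        # Uniform selection within the bin is also an assumption.
        sample.extend(rng.sample(bucket, min(expected, len(bucket))))
    return sample
```

Because the per-bin counts are rounded independently, the returned sample can be slightly larger or smaller than `total_samples` when the ratios do not divide evenly.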
Opening claim text (preview).
What is claimed is:

1. A non-transitory computer storage medium storing computer-useable instructions that, when used by one or more computing devices, cause the one or more computing devices to perform operations comprising: sampling data, via a social engine, from one or more social media streams, in accordance with a user selection received via a user device; assigning part-of-speech (POS) tags to text in the data; applying natural language processing, by a trending topic tool, to extract candidate topics from the data using a first rule comprising: identifying a sequence of a plurality of the assigned POS tags, wherein each POS tag of the sequence is selected from a group consisting of at least one of a proper noun tag, a plural proper noun tag, or a cardinal number tag; defining topic boundaries based on the identified sequence; and extracting a portion of the text corresponding to the topic boundaries as one of the candidate topics; ranking the candidate topics, by the trending topic tool, with a relevance score that quantifies relative importance of each candidate topic to determine trending topics; classifying, by the trending topic tool, the trending topics into categories; grouping the candidate topics into topic clusters of semantically-similar topics, by the trending topic tool, and transmitting the classified and clustered trending topics for display on the user device.

2. The non-transitory computer storage medium of claim 1, wherein the user selection is a time constraint.

3. The non-transitory computer storage medium of claim 1, wherein a fixed amount of data is sampled in accordance with the user selection.

4. The non-transitory computer storage medium of claim 1, wherein the user selection specifies the data is sampled from all users or popular users.

5. The non-transitory computer storage medium of claim 1, further comprising ranking users contributing to the sampled data to identify popular users, and sampling from the popular users in the one or more social media streams to produce the data.

6. The non-transitory computer storage medium of claim 5, wherein each user is ranked by multiplying the number of followers for the user by the logarithm of the number of posts for the account of the user.

7. The non-transitory computer storage medium of claim 1, wherein the data is sampled from the one or more social media streams by: splitting the one or more social media streams into bins, the bins being split in accordance with equal time lengths per bin; calculating, for a given bin, a ratio of posts in the bin to posts in all the bins; determining, for the given bin, an expected sample count by multiplying a defined total number of samples by the ratio for the bin; and sampling, from the given bin, to generate a number of samples corresponding to the expected sample count for the bin.

8. The non-transitory computer storage medium of claim 1, wherein applying natural language processing to extract candidate topics comprises using a second rule that prohibits using a cardinal number as a first word of a candidate topic unless the cardinal number starts with a letter.

9. The non-transitory computer storage medium of claim 1, wherein ranking the candidate topics with a relevance score comprises: determining an Accumulated Term Frequency (ATF) for a candidate topic in a document of the data, the ATF counting an occurrence of the candidate topic once for each document in which the candidate topic appears; determining an Inverse Document Frequency (IDF) for the candidate topic in the data; and determining the relevance score for the candidate topic based on the ATF and the IDF for the candidate topic.

10. The non-transitory computer storage medium of claim 1, wherein classifying the trending topics into categories comprises: applying classification rules to the trending topics, the classification rules being manually crafted and relying on internal evidence and external evidence, wherein the classification rules that rely on internal evidence are applied before the classifying rules relying on external evidence; classifying the trending topics in accordance with the rules, the classifications including organizations, person names, and locations; and utilizing dictionary sources to classify unknown topics.

11. A computer-implemented method comprising: applying, via a first computing process, natural language processing to extract candidate topics from a data sample comprising a plurality of posts retrieved from one or more social media streams via a social engine, the data sample including text with assigned part-of-speech (POS) tags, wherein the first computing process utilizes a first rule comprising: identifying a sequence of a plurality of the assigned POS tags, wherein each POS tag of the sequence is selected from a group consisting of at least one of a proper noun tag, a plural proper noun tag, or a cardinal number tag; defining topic boundaries based on the identified sequence; and extracting a portion of the text corresponding to the topic boundaries as one of the candidate topics; determining, via a second computing process, an Accumulated Term Frequency (ATF) for each candidate topic of the candidate topics, the ATF counting an occurrence of the candidate topic once for each post in which the candidate topic appears; determining, via a third computing process, an inverse document frequency (IDF) for each of the candidate topics in the data sample; determining, via a fourth computing process, a relevance score that quantifies relative importance of the candidate topics using the ATF and the IDF to determine trending topics; and transmitting, via a fifth computing process, the trending topics for display on a user device; wherein each of the computing processes is performed by one or more computing devices.

12. The computer-implemented method of claim 11, wherein applying natural language processing to extract candidate topics from the data sample comprises using a second rule that prohibits using a cardinal number as a first word of a candidate topic unless the cardinal number starts with a letter.

13. The computer-implemented method of claim 11, further comprising, classifying, via a sixth computing process, the trending topics into categories.

14. The computer-implemented method of claim 13, wherein classifying the trending topics into categories comprises: applying classification rules to the trending topics, the classification rules being manually crafted and relying on internal evidence and external evidence, wherein the classification rules that rely on internal evidence are applied before the classifying rules relying on external evidence; classifying the trending topics in accordance with the rules, the classifications including organization, person, and location; and utilizing dictionary sources to classify unknown topics.

15. The computer-implemented method of claim 11, further comprising grouping, via a seventh computing process, semantically-similar topics.

16. The computer-implemented method of claim 11, wherein the one or more social media streams are sampled in accordance with a user selection of a data source, a time constraint, desired demographics, a product, a service, a feature, an organization, a person, or a location.

17. The computer-implemented method of claim 11, further comprising ranking users contributing to the sampled data to identify popular users, and sampling from the popular users in the one or more social media streams to produce the data sample, wherein each user is ran
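The extraction rules in claims 1, 8, 11, and 12 can be illustrated with a small sketch that works on posts already carrying POS tags. It is a hedged illustration, not the patented implementation. The Penn Treebank tag names (`NNP`, `NNPS`, `CD`), the function names, the `(word, tag)` input format, and the reading of "a plurality" as a minimum run length of two tags are all assumptions. The relevance score is left out because the claims say only that it is based on the ATF and the IDF, without giving a formula.

```python
from collections import Counter

# Assumed Penn Treebank tags: proper noun, plural proper noun, cardinal number.
TOPIC_TAGS = {"NNP", "NNPS", "CD"}


def extract_candidates(tagged_post, min_len=2):
    """Return candidate topics from one post given as (word, tag) pairs.

    First rule: a run of consecutive TOPIC_TAGS defines the topic boundaries.
    Second rule: a cardinal number may not be the first word of a topic
    unless it starts with a letter (e.g. "Five" is allowed, "2015" is not).
    min_len=2 is one reading of "a plurality" and is an assumption.
    """
    candidates, run = [], []
    # The trailing sentinel flushes a run that reaches the end of the post.
    for word, tag in list(tagged_post) + [("", None)]:
        if tag in TOPIC_TAGS:
            if not run and tag == "CD" and not word[:1].isalpha():
                continue  # skip a leading numeric cardinal
            run.append(word)
            continue
        if len(run) >= min_len:
            candidates.append(" ".join(run))
        run = []
    return candidates


def accumulated_term_frequency(tagged_posts):
    """Count each candidate topic once per post in which it appears."""
    atf = Counter()
    for post in tagged_posts:
        atf.update(set(extract_candidates(post)))
    return atf
```

For example, a post tagged as `iPhone/NNP 6/CD` yields the candidate "iPhone 6", while `2015/CD World/NNP Cup/NNP` yields "World Cup" because the leading numeric cardinal is skipped. A topic repeated within a single post still adds only one to its ATF.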
Business processes related to social networking or social networking services · CPC title
Search customisation based on user profiles and personalisation · CPC title
Semantic analysis · CPC title
Recognition of textual entities · CPC title
using natural language analysis · CPC title