Iterative Classifier Training on Online Social Networks
US-2016155063-A1 · Jun 2, 2016 · US
US2016292157A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016292157-A1 |
| Application number | US-201514679736-A |
| Country | US |
| Kind code | A1 |
| Filing date | Apr 6, 2015 |
| Priority date | Apr 6, 2015 |
| Publication date | Oct 6, 2016 |
| Grant date | — |
Abstract
Real-time topic analysis for social listening is performed to help users and organizations discover and understand trending topics at varying levels of granularity. A density-based sampling method is employed to reduce the volume of input data. A lightweight natural language processing (NLP) method is used for topic extraction, providing an efficient mechanism for handling dynamically changing content. In embodiments, the social analytics system further helps users understand the topics by ranking topics by relevance, labeling topic categories, and grouping semantically similar topics.
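
The density-based sampling mentioned in the abstract is spelled out in claims 7 and 19: split the stream into bins of equal time length, count the posts in each bin, and give each bin a share of the total expected sample count proportional to its post count. The Python sketch below is illustrative only. The function names `expected_sample_counts` and `density_sample`, the use of epoch-second timestamps, rounding each bin's expected count, and uniform random selection within a bin are all assumptions, not details taken from the patent.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def expected_sample_counts(timestamps: List[float], bin_seconds: float,
                           total_expected: int) -> Dict[int, float]:
    """Expected sample count per equal-width time bin (claims 7 and 19)."""
    if not timestamps:
        return {}
    start = min(timestamps)
    posts_per_bin: Dict[int, int] = defaultdict(int)
    for ts in timestamps:
        # Split the stream into bins of equal time length and count posts per bin.
        posts_per_bin[int((ts - start) // bin_seconds)] += 1
    total_posts = sum(posts_per_bin.values())
    # Posts in the bin / posts in all bins, multiplied by the total expected count.
    return {b: n / total_posts * total_expected
            for b, n in sorted(posts_per_bin.items())}


def density_sample(posts: List[Tuple[float, str]], bin_seconds: float,
                   total_expected: int, seed: int = 0) -> List[Tuple[float, str]]:
    """Draw posts bin by bin according to each bin's expected count.

    Assumptions: posts are (timestamp_seconds, text) pairs, expected counts are
    rounded to whole posts, and posts are drawn uniformly at random within a bin.
    """
    if not posts:
        return []
    rng = random.Random(seed)
    start = min(ts for ts, _ in posts)
    by_bin: Dict[int, List[Tuple[float, str]]] = defaultdict(list)
    for post in posts:
        by_bin[int((post[0] - start) // bin_seconds)].append(post)
    expected = expected_sample_counts([ts for ts, _ in posts], bin_seconds, total_expected)
    sample: List[Tuple[float, str]] = []
    for b, k in expected.items():
        n = min(len(by_bin[b]), round(k))  # never request more posts than the bin holds
        sample.extend(rng.sample(by_bin[b], n))
    return sample
```

With ten posts spread over three one-minute bins holding 4, 2, and 4 posts and a budget of 5, the expected counts are 2, 1, and 2, so bursts of activity keep their proportional weight in the sample. Because each bin's count is rounded, the number of posts drawn can differ slightly from the requested budget.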
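
Claims 8 and 13 describe the lightweight NLP step: words tagged as proper nouns, plural proper nouns, or cardinal numbers become candidates, and topic boundaries mark where each candidate starts and ends. A minimal sketch, assuming Penn Treebank tags (`NNP`, `NNPS`, `CD`) produced by any POS tagger and assuming a boundary falls wherever a token with any other tag appears. The function name is hypothetical, and the example tokens are tagged by hand.

```python
from typing import List, Tuple

# Penn Treebank tags for proper noun, plural proper noun, and cardinal number.
CANDIDATE_TAGS = {"NNP", "NNPS", "CD"}


def extract_candidates(tagged_tokens: List[Tuple[str, str]]) -> List[str]:
    """Return maximal runs of candidate-tagged tokens as topic phrases."""
    candidates: List[str] = []
    current: List[str] = []
    for word, tag in tagged_tokens:
        if tag in CANDIDATE_TAGS:
            current.append(word)  # still inside the current topic
        elif current:
            # Any other tag closes the topic boundary.
            candidates.append(" ".join(current))
            current = []
    if current:  # a topic that runs to the end of the text
        candidates.append(" ".join(current))
    return candidates


tagged = [("Apple", "NNP"), ("unveils", "VBZ"), ("the", "DT"), ("iPhone", "NNP"),
          ("6", "CD"), ("in", "IN"), ("San", "NNP"), ("Francisco", "NNP")]
print(extract_candidates(tagged))  # ['Apple', 'iPhone 6', 'San Francisco']
```

Grouping adjacent tokens keeps multi-word names such as "San Francisco" and product names with model numbers such as "iPhone 6" together as single candidates.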
What is claimed is:

1. A non-transitory computer storage medium storing computer-useable instructions that, when used by one or more computing devices, cause the one or more computing devices to perform operations comprising: retrieving data, via a social engine, from one or more social media streams, the one or more social media streams sampled in accordance with a user selection received via a user device; utilizing natural language processing, at a trending topic tool, to identify candidate topics of the data; ranking the candidate topics, at the trending topic tool, with a relevance score to determine trending topics; classifying, at the trending topic tool, the trending topics into categories; and grouping semantically-similar topics, at the trending topic tool, wherein the semantically-similar topics provide a user, via the user device, with a real-time understanding of social media, in accordance with the user selection.

2. The non-transitory computer storage medium of claim 1, wherein the user selection is a time constraint.

3. The non-transitory computer storage medium of claim 1, wherein a fixed amount of data is sampled in accordance with the user selection.

4. The non-transitory computer storage medium of claim 1, wherein the user selection specifies the data is sampled from all users or popular users.

5. The non-transitory computer storage medium of claim 1, further comprising ranking users contributing to the sampled data to identify popular users.

6. The non-transitory computer storage medium of claim 5, wherein each user is ranked by multiplying the number of followers for the user by the logarithm of the number of posts for the account of the user.

7. The non-transitory computer storage medium of claim 1, wherein the data is retrieved by: splitting the one or more social media streams into bins, the bins being split in accordance with equal time lengths per bin; calculating a number of posts in each bin; determining an expected sample count for each bin by dividing the number of posts for each bin by a total number of posts in all bins, and multiplying by a total expected sample count; and retrieving the data in accordance with the expected sample count for each bin.

8. The non-transitory computer storage medium of claim 1, wherein utilizing natural language processing to identify candidate topics comprises: identifying words and phrases as candidates based on a part-of-speech (POS) tag being a proper noun, a plural proper noun, or a cardinal number; and defining topic boundaries that belong to each candidate.

9. The non-transitory computer storage medium of claim 1, wherein ranking the candidate topics with a relevance score comprises: determining an Accumulated Term Frequency (ATF) for a candidate topic in a document of the data, the ATF not being a term frequency for the candidate topic in the document; determining an Inverse Document Frequency (IDF) for the candidate topic in the data; and determining the relevance score for the candidate topic.

10. The non-transitory computer storage medium of claim 1, wherein classifying the trending topics into categories comprises: applying classification rules to the trending topics, the classification rules being manually crafted and relying on internal evidence and external evidence, wherein the classification rules that rely on internal evidence are applied before the classifying rules relying on external evidence; classifying the trending topics in accordance with the rules, the classifications including organizations, person names, and locations; and utilizing dictionary sources to classify unknown topics.

11. A computer-implemented method comprising: determining, via a first computing process, an Accumulated Term Frequency (ATF) for each candidate topic identified in a data sample retrieved from one or more social media streams via a social engine; determining, via a second computing process, the inverse document frequency for each candidate topic in the data sample; and determining, via a third computing process, a relevance score for each candidate topic to determine trending topics, wherein the trending topics provide a user, via a user device, with a real-time understanding of social media, in accordance with a user selection received from the user device; wherein each of the computing processes is performed by one or more computing devices.

12. The computer-implemented method of claim 11, further comprising, utilizing, via a fourth computing process, natural language processing to identify candidate topics of the data sample.

13. The computer-implemented method of claim 12, wherein utilizing natural language processing to identify candidate topics of the data sample comprises: identifying words and phrases as candidates for the candidate topics based on a part-of-speech (POS) tag being a proper noun, a plural proper noun, or a cardinal number; defining topic boundaries that belong to each candidate topic to identify the start and end of each candidate topic; and extracting each candidate topic.

14. The computer-implemented method of claim 11, further comprising, classifying, via a fifth computing process, the trending topics into categories.

15. The computer-implemented method of claim 14, wherein classifying the trending topics into categories comprises: applying classification rules to the trending topics, the classification rules being manually crafted and relying on internal evidence and external evidence, wherein the classification rules that rely on internal evidence are applied before the classifying rules relying on external evidence; classifying the trending topics in accordance with the rules, the classifications including organization, person, and location; and utilizing dictionary sources to classify unknown topics.

16. The computer-implemented method of claim 11, further comprising grouping, via a sixth computing process, semantically-similar topics.

17. The computer-implemented method of claim 11, wherein the one or more social media streams are sampled in accordance with a user selection of a data source, a time constraint, desired demographics, a product, a service, a feature, an organization, a person, or a location.

18. The computer-implemented method of claim 11, further comprising ranking users contributing to the sampled data to identify popular users, wherein each user is ranked by multiplying the number of followers for the user by the logarithm of the number of posts for the account of the user.

19. The computer-implemented method of claim 11, wherein the data is retrieved by: splitting the one or more social media streams into bins, the bins being split in accordance with equal time lengths per bin; calculating a number of posts in each bin; determining an expected sample count for each bin by dividing the number of posts for each bin by a total number of posts in all bins, and multiplying by a total expected sample count; and retrieving the data in accordance with the expected sample count for each bin.

20. A computerized system comprising: one or more processors; and a non-transitory computer storage media storing computer-useable instructions that, when used by the one or more processors, cause the one or more processors to: retrieve data, via a social engine, from one or more social media streams, the one or more social media streams sampled in accordance with a user selection received via a user
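
Claims 6 and 18 rank contributing users by multiplying follower count by the logarithm of post count. A minimal Python sketch of that score follows; the natural logarithm, the zero-post guard, and the names `Account`, `popularity`, and `popular_users` are assumptions for illustration.

```python
import math
from typing import List, NamedTuple


class Account(NamedTuple):
    name: str
    followers: int
    posts: int


def popularity(account: Account) -> float:
    """Score = followers * log(posts), per claims 6 and 18.

    Assumptions: natural log; accounts with no posts score 0 because log(0)
    is undefined.
    """
    if account.posts <= 0:
        return 0.0
    return account.followers * math.log(account.posts)


def popular_users(accounts: List[Account], k: int) -> List[Account]:
    """Return the k highest-scoring accounts."""
    return sorted(accounts, key=popularity, reverse=True)[:k]


accounts = [
    Account("a", followers=1000, posts=10),
    Account("b", followers=100, posts=10000),
    Account("c", followers=5000, posts=1),
]
print([acc.name for acc in popular_users(accounts, 2)])  # ['a', 'b']
```

The logarithm damps prolific posting so that follower count dominates the score. Read literally, the formula also gives an account with exactly one post a score of zero regardless of its followers, as account `c` shows.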
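
Claims 9 and 11 score candidate topics from an Accumulated Term Frequency (ATF) and an inverse document frequency, but the claims shown here do not define how ATF is computed beyond stating that it is not a plain term frequency. The sketch below therefore takes ATF values as given, one dict per document. Combining them as (ATF summed over documents) × log(N / df), in the style of TF-IDF, is an assumption, as are the function names.

```python
import math
from collections import defaultdict
from typing import Dict, List


def relevance_scores(doc_atf: List[Dict[str, float]]) -> Dict[str, float]:
    """Score each candidate topic as (ATF summed over documents) * IDF.

    doc_atf holds one dict per document, mapping candidate topic -> ATF.
    ATF values are supplied by the caller; this sketch does not compute them.
    """
    n_docs = len(doc_atf)
    doc_freq: Dict[str, int] = defaultdict(int)  # documents containing the topic
    atf_total: Dict[str, float] = defaultdict(float)
    for doc in doc_atf:
        for topic, atf in doc.items():
            doc_freq[topic] += 1
            atf_total[topic] += atf
    # IDF = log(N / df): a topic present in every document scores zero.
    return {t: atf_total[t] * math.log(n_docs / doc_freq[t]) for t in doc_freq}


def trending_topics(doc_atf: List[Dict[str, float]], k: int) -> List[str]:
    """Return the k topics with the highest relevance scores."""
    scores = relevance_scores(doc_atf)
    return sorted(scores, key=scores.get, reverse=True)[:k]


docs = [
    {"Apple": 2.0, "iPhone 6": 1.0},
    {"Apple": 1.0},
    {"San Francisco": 1.5},
]
print(trending_topics(docs, 3))  # ['San Francisco', 'Apple', 'iPhone 6']
```

Here "Apple" has the largest accumulated ATF (3.0) but appears in two of three documents, so its IDF of log(3/2) pulls it below "San Francisco".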
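
Claims 10 and 15 classify trending topics with hand-crafted rules, applying rules based on internal evidence (the topic string itself) before rules based on external evidence (its context), and falling back to dictionary sources for topics that remain unknown. The patent's actual rules are not listed here, so every cue in this Python sketch (the suffix, title, and context-word sets) and the function signature are hypothetical; only the three-stage ordering follows the claims.

```python
from typing import Dict, Optional

# Hypothetical hand-crafted cues; the patent's actual rules are not listed.
ORG_SUFFIXES = ("Inc.", "Corp.", "Ltd.", "LLC")  # internal evidence
PERSON_TITLES = ("Mr.", "Mrs.", "Ms.", "Dr.")    # internal evidence
LOCATION_CUES = {"in", "at", "near"}             # external evidence (preceding word)
PERSON_CUES = {"said", "says"}                   # external evidence (following word)


def classify_topic(topic: str, prev_word: str = "", next_word: str = "",
                   dictionary: Optional[Dict[str, str]] = None) -> str:
    words = topic.split()
    # 1. Internal evidence: cues inside the topic string, checked first.
    if words and words[-1] in ORG_SUFFIXES:
        return "organization"
    if words and words[0] in PERSON_TITLES:
        return "person"
    # 2. External evidence: cues in the surrounding context.
    if prev_word.lower() in LOCATION_CUES:
        return "location"
    if next_word.lower() in PERSON_CUES:
        return "person"
    # 3. Dictionary fallback for topics no rule could classify.
    if dictionary and topic in dictionary:
        return dictionary[topic]
    return "unknown"


print(classify_topic("Dr. Smith", prev_word="in"))  # person (internal evidence wins)
print(classify_topic("Paris", prev_word="in"))      # location
```

Checking internal evidence first means a strong cue in the name itself, such as a "Dr." title, is not overridden by a weaker contextual cue such as a preceding "in".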

CPC classification titles:
- Business processes related to social networking or social networking services
- using natural language analysis
- Recognition of textual entities
- Semantic analysis
- Services