Model-based classification of content items

US2017076225A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2017076225-A1
Application number: US-201514856275-A
Country: US
Kind code: A1
Filing date: Sep 16, 2015
Priority date: Sep 16, 2015
Publication date: Mar 16, 2017
Grant date: (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The disclosed embodiments provide a system for processing data. During operation, the system obtains validated training data containing a first set of content items and a first set of classification tags for the first set of content items. Next, the system uses the validated training data to produce a statistical model for classifying content using a set of dimensions represented by the first set of classification tags. The system then uses the statistical model to generate a second set of classification tags for a second set of content items. Finally, the system outputs one or more groupings of the second set of content items by the second set of classification tags to improve understanding of content related to the set of dimensions without requiring a user to manually analyze the second set of content items.
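The abstract's three steps (train on validated tags, tag new items, group by tag) can be illustrated with a minimal sketch. The abstract does not say which statistical model is used, so the multinomial Naive Bayes classifier below is an assumption chosen for brevity, as are the whitespace tokenizer, the function names `train`, `classify`, and `group_by_tag`, and the sample sentiment data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenization (an assumption, not from the patent)."""
    return text.lower().split()


def train(items, tags):
    """Fit a multinomial Naive Bayes model from validated (item, tag) pairs."""
    tag_counts = Counter(tags)
    word_counts = defaultdict(Counter)
    for text, tag in zip(items, tags):
        word_counts[tag].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return {"tag_counts": tag_counts, "word_counts": word_counts, "vocab": vocab}


def classify(model, text):
    """Return the tag with the highest log-posterior, using add-one smoothing."""
    total = sum(model["tag_counts"].values())
    vocab_size = len(model["vocab"])
    best_tag, best_score = None, -math.inf
    for tag, count in model["tag_counts"].items():
        counts = model["word_counts"][tag]
        denom = sum(counts.values()) + vocab_size
        score = math.log(count / total)  # log prior of the tag
        for word in tokenize(text):
            score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_tag, best_score = tag, score
    return best_tag


def group_by_tag(model, items):
    """Tag each unlabeled item and group the items by their generated tag."""
    groups = defaultdict(list)
    for text in items:
        groups[classify(model, text)].append(text)
    return dict(groups)


# Hypothetical validated training data for a "sentiment" dimension.
first_items = [
    "love the new search",
    "great product love it",
    "search is broken",
    "terrible slow broken app",
]
first_tags = ["positive", "positive", "negative", "negative"]

model = train(first_items, first_tags)
second_items = ["love this great app", "app is slow and broken"]
print(group_by_tag(model, second_items))
```

The grouping returned by `group_by_tag` stands in for the "one or more groupings" that the system outputs; a real deployment would likely use richer features and a library-backed model.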

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: obtaining validated training data comprising a first set of content items and a first set of classification tags for the first set of content items; using the validated training data to produce, by one or more computer systems, a statistical model for classifying content using a set of dimensions represented by the first set of classification tags; using the statistical model to generate, by the one or more computer systems, a second set of classification tags for a second set of content items; and outputting, by the one or more computer systems, one or more groupings of the second set of content items by the second set of classification tags to improve understanding of content related to the set of dimensions without requiring a user to manually analyze the second set of content items.

2. The method of claim 1, further comprising: obtaining a validated subset of the second set of classification tags for the second set of content items.

3. The method of claim 2, further comprising: providing the validated subset as additional training data to the statistical model to produce an update to the statistical model; and using the update to generate a third set of classification tags for a third set of content items.

4. The method of claim 2, wherein obtaining the validated subset of the second set of classification tags comprises: displaying the second set of content items and the second set of classification tags in a user interface; and obtaining one or more corrections to the second set of classification tags through the user interface.

5. The method of claim 1, wherein using the training data to produce the statistical model for classifying the relevance of content to the one or more topics comprises: generating a set of features from a content item in the first set of content items; and providing the set of features as input to the statistical model.

6. The method of claim 5, wherein the set of features comprises at least one of: one or more n-grams from the content items; a number of characters; a number of units of speech; an average number of units of speech; and a percentage of a character type.

7. The method of claim 5, wherein the set of features comprises profile data for a creator of the content item.

8. The method of claim 1, wherein the set of dimensions comprises a sentiment.

9. The method of claim 1, wherein the set of dimensions comprises a product associated with an online professional network.

10. The method of claim 1, wherein the set of dimensions comprises a value proposition.

11. An apparatus, comprising: one or more processors; and memory storing instructions that, when executed by the one or more processors, cause the apparatus to: obtain validated training data comprising a first set of content items and a first set of classification tags for the first set of content items; use the validated training data to produce a statistical model for classifying content using a set of dimensions represented by the first set of classification tags; use the statistical model to generate a second set of classification tags for a second set of content items; and output one or more groupings of the second set of content items by the second set of classification tags to improve understanding of content related to the set of dimensions without requiring a user to manually analyze the second set of content items.

12. The apparatus of claim 11, wherein the memory further stores instructions that, when executed by the one or more processors, cause the apparatus to: obtain a validated subset of the second set of classification tags for the first set of content items; provide the validated subset as additional training data to the statistical model to produce an update to the statistical model; and use the update to generate a third set of classification tags for a third set of content items.

13. The apparatus of claim 12, wherein obtaining the validated subset of the first set of classification tags comprises: displaying the second set of content items and the second set of classification tags in a user interface; and obtaining one or more corrections to the second set of classification tags through the user interface.

14. The apparatus of claim 11, wherein using the training data to produce the statistical model for classifying the relevance of content to the one or more topics comprises: generating a set of features from a content item in the first set of content items; and providing the set of features as input to the statistical model.

15. The apparatus of claim 14, wherein the set of features comprises at least one of: one or more n-grams from the content items; a number of characters; a number of units of speech; an average number of units of speech; a percentage of a character type; and profile data for a creator of the content item.

16. The apparatus of claim 11, wherein the set of dimensions comprises a sentiment.

17. The apparatus of claim 11, wherein the set of dimensions comprises a product associated with an online professional network.

18. The apparatus of claim 11, wherein the set of dimensions comprises a value proposition.

19. A system, comprising: an analysis non-transitory computer-readable medium comprising instructions that, when executed by one or more processors, cause the system to: obtain validated training data comprising a first set of content items and a first set of classification tags for the first set of content items; use the validated training data to produce a statistical model for classifying content using a set of dimensions represented by the first set of classification tags; and use the statistical model to generate a second set of classification tags for a second set of content items; and a management non-transitory computer-readable medium comprising instructions that, when executed by one or more processors, cause the system to output one or more groupings of the second set of content items by the second set of classification tags to improve understanding of content related to the set of dimensions without requiring a user to manually analyze the second set of content items.

20. The system of claim 19, wherein the analysis non-transitory computer-readable medium further instructions that, when executed by the one or more processors, cause the system to: obtain a validated subset of the second set of classification tags for the first set of content items; provide the validated subset as additional training data to the statistical model to produce an update to the statistical model; and use the update to generate a third set of classification tags for a third set of content items.
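The feature types listed in claims 6 and 15 can be sketched as a single extraction function. The claims do not define their terms precisely, so several readings here are assumptions: "units of speech" is treated as words, "average number of units of speech" as words per sentence, and "percentage of a character type" as the share of uppercase letters. The function name `extract_features` and the dictionary keys are hypothetical.

```python
import re


def extract_features(text, n=2):
    """Build a feature dictionary for one content item.

    Assumed readings of the claim language (not confirmed by the patent):
    units of speech = words; average = words per sentence;
    character type = uppercase letters.
    """
    words = re.findall(r"[A-Za-z0-9']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Word n-grams over the whole item; they may span sentence boundaries.
    ngrams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    letters = [c for c in text if c.isalpha()]
    upper = sum(1 for c in letters if c.isupper())
    return {
        "ngrams": ngrams,
        "num_chars": len(text),
        "num_words": len(words),
        "avg_words_per_sentence": len(words) / len(sentences) if sentences else 0.0,
        "pct_uppercase": 100.0 * upper / len(letters) if letters else 0.0,
    }


features = extract_features("Great tool. LOVE it!")
print(features)
```

Profile data for the item's creator (claims 7 and 15) would be merged into the same dictionary before the features are passed to the model; it is omitted here because the claims do not say which profile fields are used.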

Assignees

LinkedIn Corp

Inventors

Classifications

  • Enterprise or organisation modelling · CPC title

  • Employment or hiring · CPC title

  • G06N99/005 · Primary

    Physics · mapped topic

  • Computing arrangements based on specific mathematical models · CPC title

  • Machine learning · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2017076225A1 cover?
The disclosed embodiments provide a system for processing data. During operation, the system obtains validated training data containing a first set of content items and a first set of classification tags for the first set of content items. Next, the system uses the validated training data to produce a statistical model for classifying content using a set of dimensions represented by the first s…
Who is the assignee on this patent?
LinkedIn Corp
What technology area does this patent fall under?
Primary CPC classification G06N99/005. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 16, 2017 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).