Social community identification for automatic document classification

US9317594B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9317594-B2
Application number  US-201213727951-A
Country             US
Kind code           B2
Filing date         Dec 27, 2012
Priority date       Dec 27, 2012
Publication date    Apr 19, 2016
Grant date          Apr 19, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for identifying data files that have a common characteristic are provided. A plurality of data files are received. The plurality of data files include one or more data files having the common characteristic. A list of key terms is generated from the plurality of data files. Data files from the plurality of data files that have an association with a social community are identified, where the social community is defined by one or more features. The list of key terms is updated based on an analysis of the identified features. The updated list of key terms is used to identify other data files that have the common characteristic.
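The workflow in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the patented method: the abstract does not say how terms are ranked or how community membership is tested, so ranking by document frequency, exact matching on metadata features, the `min_hits` threshold, and every function and field name here (`key_terms`, `in_community`, `update_terms`, `matches`, `meta`, `forum`) are assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stopword list, chosen only for this example.
STOP = frozenset({"the", "a", "is", "my", "on", "to", "and", "i", "was"})

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())

def key_terms(docs, top_n, stopwords=STOP):
    """Rank terms by document frequency (an assumed scoring rule)."""
    counts = Counter()
    for doc in docs:
        counts.update(set(tokenize(doc["text"])) - stopwords)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_n]]

def in_community(doc, features):
    """Assumed membership test: metadata must match every defining feature."""
    return all(doc["meta"].get(k) == v for k, v in features.items())

def update_terms(terms, community_docs, top_n, stopwords=STOP):
    """Extend the list with frequent community terms not already present."""
    extra = [t for t in key_terms(community_docs, top_n, stopwords) if t not in terms]
    return terms + extra

def matches(doc, terms, min_hits=2):
    """Flag a document that contains at least min_hits of the key terms."""
    return len(set(tokenize(doc["text"])) & set(terms)) >= min_hits

docs = [
    {"text": "My claim was denied and the refund is late",
     "meta": {"forum": "billing"}},
    {"text": "Refund denied again, billing dispute on my claim",
     "meta": {"forum": "billing"}},
    {"text": "The weather is lovely", "meta": {"forum": "chat"}},
]

terms = key_terms(docs, top_n=3)                      # initial key-term list
community = [d for d in docs if in_community(d, {"forum": "billing"})]
updated = update_terms(terms, community, top_n=5)     # list after community analysis

new_doc = {"text": "Billing error again", "meta": {}}
print(matches(new_doc, terms), matches(new_doc, updated))
```

In this toy data the new document is missed by the initial list and caught by the updated one, which is the effect the abstract describes. The hierarchical classification required by claim 1 is not modeled here.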

First claim

Opening claim text (preview).

It is claimed:

1. A computer-implemented method for identifying data files that have a common characteristic, the method comprising: receiving a plurality of data files including one or more data files having the common characteristic; generating a list of key terms from the plurality of data files; classifying each data file of the plurality of data files within a hierarchical structure, the hierarchical structure including upper nodes and lower nodes configured to group data files having similar characteristics, wherein a data file is classified within a lower node of the hierarchical structure based on a psychological characteristic of the classified data file, wherein the psychological characteristic indicates a psychological state of the creator of the classified data file; identifying data files from the plurality of data files having an association with a social community, the social community being a homogenous sub-group of a larger population defined by one or more features, wherein the identified data files having the association with the social community are classified within a particular node of the hierarchical structure that is defined by the one or more features; updating the list of key terms based on an analysis of the identified data files; and using the updated list of key terms to identify other data files that have the common characteristic.

2. The method of claim 1, wherein the upper nodes group the data files based on general similarities, and wherein the lower nodes group the data files based on specific similarities.

3. The method of claim 1, further comprising: using a decision tree to classify the data files within the hierarchical structure, wherein the decision tree employs a criterion sensitive to a presence or an absence of the key terms in the plurality of data files, and wherein data files grouped within the lower nodes have a greater number of similarities than data files grouped within the upper nodes.

4. The method of claim 1, further comprising: building a network including one or more of the plurality of data files, wherein connections between the data files of the network are encoded as links or edges, and wherein the data files are classified within the hierarchical structure by analyzing the network.

5. The method of claim 4, wherein the links or edges are top-down directional links or edges.

6. The method of claim 1, further comprising: classifying data files within an upper node of the hierarchical structure based on a physical connection between the classified data files or based on a semantic connection between the classified data files, wherein the physical connection indicates a message exchange between the classified data files, and wherein the semantic connection indicates shared semantic content in the classified data files.

7. The method of claim 6, wherein the data files classified within the upper node are linked together by a thread.

8. The method of claim 7, wherein the thread is defined by email header fields, a common thread field in a database, a common topic on a discussion forum, or a common social media message.

9. The method of claim 6, wherein the data files classified within the upper node originate from a common geographical location, are associated with a common period of time, or are associated by a shared semantic similarity based on patterns of nouns, verbs, other words, or parts of speech.

10. The method of claim 1, wherein the data file is further classified within a lower node of the hierarchical structure based on a social organization characteristic of the classified data file, an individual descriptive characteristic of the classified data file, or an operational characteristic of the classified data file, wherein the social organization characteristic indicates a social position associated with a creator of the classified data file, wherein the individual descriptive characteristic indicates a personal characteristic of the creator of the classified data file, and wherein the operational characteristic indicates characteristics of message exchange associated with the classified data file.

11. The method of claim 10, further comprising: classifying the data file within the lower node of the hierarchical structure by classifying the data file based on the individual descriptive characteristic first, classifying the data file based on the social organization characteristic second, and classifying the data file based on the operational characteristic third.

12. The method of claim 10, wherein the individual descriptive characteristic includes age, gender, education, marital status, interests, affiliations, or memberships of the creator of the classified data file.

13. The method of claim 1, wherein the psychological characteristic includes mood state of the creator of the classified data file or an introversion or extroversion score of the creator of the classified data file.

14. The method of claim 10, wherein the social organization characteristic includes a geographical location associated with the classified data file; a time associated with the classified data file; a social role associated with the classified data file; an indication of whether the creator of the classified data file has a leader status, a follower status, or a marginal status; a social influence associated with the classified data file; a community size associated with the classified data file; a community density associated with the classified data file; a dispersion of a community associated with the classified data file; or a community character associated with the classified data file.

15. The method of claim 10, wherein the operational characteristic includes a message recency of the classified data file, a frequency of message exchange over a given time period between the data files classified within the lower node, a message mood state, a conversation acceleration rate of the data files classified within the lower node, or a characterization of the message exchange between the data files classified within the lower node as being personal or professional.

16. The method of claim 10, further comprising: classifying the data file within the lower node of the hierarchical structure by classifying the data file based on the individual descriptive characteristic first, classifying the data file based on the operational characteristic second, and classifying the data file based on the social organization characteristic third.

17. The method of claim 10, further comprising: classifying the data file within the lower node of the hierarchical structure by classifying the data file based on the social organization characteristic first, classifying the data file based on the individual descriptive characteristic second, and classifying the data file based on the operational characteristic third.

18. The method of claim 10, further comprising: classifying the data file within the lower node of the hierarchical structure by classifying the data file based on the social organization characteristic first, classifying the data file based on the operational characteristic second, and classifying the data file based on the individual descriptive characteristic third.

19. The method of claim 10, further comprising: classifying the data file within the lower node of the hierarchical structure by classifying the data file based on the operational characteristic first, classifying the data file based on the social organization characteristic second, and classifying the
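Claims 7 and 8 describe files in an upper node being linked by a thread, one option being a thread defined by email header fields. The sketch below shows one way such grouping might look in Python. The claims do not name the header fields, so the use of `Message-ID`, `In-Reply-To`, and `References` is an assumption, as are the function name `group_threads`, the union-find grouping strategy, and the sample messages.

```python
from collections import defaultdict
from email import message_from_string

def group_threads(raw_messages):
    """Group raw email messages into threads using reply headers.

    Assumes every message carries a Message-ID header. Messages are
    merged into one thread when one refers to the other through
    In-Reply-To or References (an assumed choice of header fields).
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    ids = []
    for raw in raw_messages:
        msg = message_from_string(raw)
        mid = msg["Message-ID"].strip()
        ids.append(mid)
        find(mid)  # register the message even if it has no replies
        refs = (msg.get("In-Reply-To", "") + " " + msg.get("References", "")).split()
        for ref in refs:
            union(mid, ref)

    threads = defaultdict(list)
    for mid in ids:
        threads[find(mid)].append(mid)
    return sorted(sorted(members) for members in threads.values())

# Hypothetical sample messages for illustration only.
a = "Message-ID: <1@x>\nSubject: Outage\n\nbody"
b = "Message-ID: <2@x>\nIn-Reply-To: <1@x>\nSubject: Re: Outage\n\nbody"
c = "Message-ID: <3@x>\nReferences: <1@x> <2@x>\n\nbody"
d = "Message-ID: <9@y>\nSubject: Lunch\n\nbody"

print(group_threads([a, b, c, d]))
```

Each resulting group could then serve as an upper node in the hierarchy, with the lower-node characteristics of claim 10 applied within it. That mapping is an interpretation for illustration, not something the claim text spells out.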

Assignees

Sas Inst Inc

Inventors

Classifications

  • G06F16/355 · Primary

    Creation or modification of classes or clusters · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9317594B2 cover?
Systems and methods for identifying data files that have a common characteristic are provided. A plurality of data files are received. The plurality of data files include one or more data files having the common characteristic. A list of key terms is generated from the plurality of data files. Data files from the plurality of data files that have an association with a social community are ident…
Who is the assignee on this patent?
Sas Inst Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/355. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 19, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).