Privacy against inference attacks for large data
US-2015379275-A1 · Dec 31, 2015 · US
US9946783B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-9946783-B1 |
| Application number | US-201615153097-A |
| Country | US |
| Kind code | B1 |
| Filing date | May 12, 2016 |
| Priority date | Dec 27, 2011 |
| Publication date | Apr 17, 2018 |
| Grant date | Apr 17, 2018 |
Official abstract text for this publication.
A method and system for classifying documents is provided. A set of document classifiers is generated by applying a classification algorithm to a trusted corpus that includes a set of training documents representing a taxonomy. One or more of the generated document classifiers are executed against a plurality of input documents to create a plurality of classified documents. Each classified document is associated with a classification within the taxonomy and a classification confidence level. One or more classified documents that are associated with a classification confidence level below a predetermined threshold value are selected to create a set of low-confidence documents. The low-confidence documents are disassociated from each of the associated classifications. A user is prompted to enter a classification within the taxonomy for at least one low-confidence document. The low-confidence document is associated with the entered classification and with a predetermined confidence level to create a newly classified document.
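The review loop in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the names `ClassifiedDocument`, `review_low_confidence`, `ask_user`, and `manual_confidence` are hypothetical, and the "predetermined confidence level" is assumed here to be 1.0.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class ClassifiedDocument:
    """Hypothetical record: a document, its label, and the classifier's confidence."""
    doc_id: str
    classification: Optional[str]
    confidence: float


def review_low_confidence(
    documents: Iterable[ClassifiedDocument],
    threshold: float,
    ask_user: Callable[[ClassifiedDocument], str],
    manual_confidence: float = 1.0,  # assumed value for the "predetermined confidence level"
) -> List[ClassifiedDocument]:
    """Re-label every document whose confidence is below `threshold`.

    Returns the newly classified documents; documents at or above the
    threshold are left untouched.
    """
    reviewed = []
    for doc in documents:
        if doc.confidence >= threshold:
            continue
        # Disassociate the low-confidence label before asking the reviewer,
        # so the reviewer is not shown the classifier's guess.
        doc.classification = None
        # `ask_user` stands in for the user prompt; it must return a
        # classification that exists in the taxonomy.
        doc.classification = ask_user(doc)
        doc.confidence = manual_confidence
        reviewed.append(doc)
    return reviewed
```

In use, `ask_user` might wrap a review UI or a labeling queue; the returned list is the natural candidate for the "newly classified" set that later feeds the trusted corpus.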
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method comprising: identifying, within a set of documents that have been classified within a hierarchical taxonomy using a classification algorithm, documents having a classification confidence level that is below a predetermined confidence level threshold; disassociating the identified documents from their respective classifications based on the classification level being below the predetermined confidence level threshold; obtaining, from a different classifier, a new classification within the hierarchical taxonomy for each of the identified documents; associating each of the newly classified documents with a highest classification confidence level for its respective new classification; including the newly classified documents in a trusted corpus of documents that are used to train the classification algorithm; determining a distribution of classifications of the newly classified documents within the trusted corpus of documents; updating the classification algorithm based on the trusted corpus of documents, such that the classification algorithm is configured to classify documents to promote a classification distribution that is in accordance with the determined distribution of classifications; and applying the updated classification algorithm to at least a portion of the set of documents to obtain new classifications within the taxonomy or new classification confidence levels for the portion of the set of documents, such that the at least a portion of the set of documents are classified in accordance with the classification distribution.
2. The method of claim 1, wherein the hierarchical taxonomy includes a plurality of levels, wherein each level includes one or more nodes that represent a classification.
3. The method of claim 1, wherein a classification confidence level for a given document is indicative of an accuracy of an assignment of a classification of the given document and is based on a measure of a degree to which data included in the given document matches attributes of the classification.
4. The method of claim 1, wherein updating the classification algorithm includes applying a supervised learning model that analyzes the trusted corpus to identify one or more attributes that are associated with classifications of documents in the trusted corpus.
5. The method of claim 1, wherein the classification algorithm includes a plurality of classifiers, the method further comprising assigning, by each of the classifiers, a different classification to documents that are recognized by the classifier as having attributes that match the classification.
6. The method of claim 5, further comprising updating the classification algorithm to include at least one new classifier, the new classifier corresponding to a new classification of at least one of the newly classified documents.
7. The method of claim 1, wherein the at least a portion of the set of documents are classified such that a proportion of documents within the at least a portion of the set of documents that are associated with a given classification is approximate to a proportion of documents within the trusted corpus of documents that have been associated with the given classification.
8. A computer system comprising: one or more memory elements for storing a set of documents that have been classified within a hierarchical taxonomy using a classification algorithm; and one or more processors coupled to the one or more memory elements and including instructions that, when executed, cause the one or more processors to perform operations comprising: identifying, within the set of documents that have been classified within a hierarchical taxonomy using a classification algorithm, documents having a classification confidence level that is below a predetermined confidence level threshold; disassociating the identified documents from their respective classifications based on the classification level being below the predetermined confidence level threshold; obtaining, from a different classifier, a new classification within the hierarchical taxonomy for each of the identified documents; associating each of the newly classified documents with a highest classification confidence level for its respective new classification; including the newly classified documents in a trusted corpus of documents that are used to train the classification algorithm; determining a distribution of classifications of the newly classified documents within the trusted corpus of documents; updating the classification algorithm based on the trusted corpus of documents, such that the classification algorithm is configured to classify documents to promote a classification distribution that is in accordance with the determined distribution of classifications; and applying the updated classification algorithm to at least a portion of the set of documents to obtain new classifications within the taxonomy or new classification confidence levels for the portion of the set of documents, such that the at least a portion of the set of documents are classified in accordance with the classification distribution.
9. The system of claim 8, wherein the hierarchical taxonomy includes a plurality of levels, wherein each level includes one or more nodes that represent a classification.
10. The system of claim 8, wherein a classification confidence level for a given document is indicative of an accuracy of an assignment of a classification of the given document and is based on a measure of a degree to which data included in the given document matches attributes of the classification.
11. The system of claim 8, wherein updating the classification algorithm includes applying a supervised learning model that analyzes the trusted corpus to identify one or more attributes that are associated with classifications of documents in the trusted corpus.
12. The system of claim 8, wherein the classification algorithm includes a plurality of classifiers, the operations further comprising assigning, by each of the classifiers, a different classification to documents that are recognized by the classifier as having attributes that match the classification.
13. The system of claim 12, the operations further comprising updating the classification algorithm to include at least one new classifier, the new classifier corresponding to a new classification of at least one of the newly classified documents.
14. The system of claim 8, wherein the at least a portion of the set of documents are classified such that a proportion of documents within the at least a portion of the set of documents that are associated with a given classification is approximate to a proportion of documents within the trusted corpus of documents that have been associated with the given classification.
15. One or more non-transitory computer-readable media encoded with instructions that when executed by one or more computers cause the one or more computers to perform operations comprising: identifying, within a set of documents that have been classified within a hierarchical taxonomy using a classification algorithm, documents having a classification confidence level that is below a predetermined confidence level threshold; disassociating the identified documents from their respective classifications based on the classification level being below the predetermined confidence level threshold; obtaining, from a different classifier, a new classification within the hierarchical taxonomy for each of the identified documents; associating each of the newly classified documents with a highest class
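The distribution steps in claims 1 and 7 can be illustrated with a short Python sketch. The claims do not say how the updated algorithm "promotes" the determined distribution; the version below assumes one common approach, treating the trusted-corpus distribution as class priors and reweighting classifier scores by them. All function names and the `floor` parameter are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def classification_distribution(labels: Iterable[str]) -> Dict[str, float]:
    """Fraction of documents carrying each classification."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


def reweight_scores(
    scores: Dict[str, float],
    priors: Dict[str, float],
    floor: float = 1e-6,  # assumed small prior for labels absent from the corpus
) -> Dict[str, float]:
    """Multiply raw classifier scores by class priors and renormalize.

    This is an assumed mechanism for biasing predictions toward the
    trusted-corpus distribution, not one stated in the claims.
    """
    weighted = {label: s * priors.get(label, floor) for label, s in scores.items()}
    norm = sum(weighted.values())
    if norm == 0:
        raise ValueError("all weighted scores are zero")
    return {label: w / norm for label, w in weighted.items()}


def max_proportion_gap(predicted: Iterable[str], target: Dict[str, float]) -> float:
    """Largest absolute difference between predicted and target proportions.

    A small gap corresponds to the 'approximate' proportions of claim 7.
    """
    observed = classification_distribution(predicted)
    labels = set(observed) | set(target)
    return max((abs(observed.get(l, 0.0) - target.get(l, 0.0)) for l in labels), default=0.0)
```

A caller would compute `priors` from the labels in the trusted corpus, apply `reweight_scores` to each document's per-class scores, and use `max_proportion_gap` to check how closely the reclassified set tracks the corpus.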
Probabilistic graphical models, e.g. probabilistic networks · CPC title
Physics · mapped topic