Detecting poisoning attacks on neural networks by activation clustering
US-11188789-B2 · Nov 30, 2021 · US
US11538236B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11538236-B2 |
| Application number | US-201916571318-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 16, 2019 |
| Priority date | Sep 16, 2019 |
| Publication date | Dec 27, 2022 |
| Grant date | Dec 27, 2022 |
Abstract
Embodiments relate to a system, program product, and method for processing an untrusted data set to automatically determine which data points therein are poisonous. A neural network is trained using potentially poisoned training data. Each training data point is classified using the network, the activations of at least one hidden layer are retained, and those activations are segmented by the label of the corresponding training data. Clustering is applied to the retained activations of each segment, and a clustering assessment is conducted to remove an identified cluster from the data set, form a new training set, and train a second neural model with the new training set. The removed cluster and its corresponding data are applied to the trained second neural model to analyze the data in the removed cluster and classify it as either legitimate or poisonous.
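The clustering stage in the abstract (segment retained activations by label, then cluster each segment) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the patented implementation. The activations are assumed to be already-extracted hidden-layer vectors (tuples of floats). The unspecified "clustering technique" is assumed to be 2-means with a deterministic farthest-point start. Treating the smaller cluster in each label segment as the suspect one is also an assumed heuristic. The function names `sq_dist`, `two_means`, and `cluster_activations_by_label` are hypothetical.

```python
from collections import defaultdict


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def two_means(points, iters=20):
    """Lloyd's 2-means with a deterministic farthest-point initialization (assumed).

    Returns one cluster id (0 or 1) per point.
    """
    first = points[0]
    second = max(points, key=lambda p: sq_dist(p, first))
    centroids = [first, second]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        assign = [0 if sq_dist(p, centroids[0]) <= sq_dist(p, centroids[1]) else 1
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in (0, 1):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign


def cluster_activations_by_label(activations, labels):
    """Segment retained activations by label, then split each segment into 2 clusters.

    Returns {label: (smaller_cluster_indices, larger_cluster_indices)}.
    Calling the smaller cluster "suspect" is a hypothetical heuristic.
    """
    segments = defaultdict(list)
    for idx, lab in enumerate(labels):
        segments[lab].append(idx)
    result = {}
    for lab, idxs in segments.items():
        if len(idxs) < 2:
            continue  # a single point cannot be split
        assign = two_means([activations[i] for i in idxs])
        groups = [[i for i, a in zip(idxs, assign) if a == c] for c in (0, 1)]
        groups.sort(key=len)
        result[lab] = (groups[0], groups[1])
    return result


# Toy example: six activations of label 0 sit near the origin, and two
# (e.g. poisoned points carrying label 0) sit near (5, 5).
acts = [(0.0, 0.1), (0.1, 0.0), (0.0, 0.0), (0.2, 0.1), (0.1, 0.2), (0.1, 0.1),
        (5.0, 5.0), (5.1, 4.9)]
print(cluster_activations_by_label(acts, [0] * 8))
# {0: ([6, 7], [0, 1, 2, 3, 4, 5])}
```

The farthest-point start keeps the toy example reproducible. A practical pipeline might reduce activation dimensionality first and use a library k-means instead.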
Claims (preview)
What is claimed is:
1. A computer system comprising: a processor operatively coupled to memory; and an artificial intelligence (AI) platform, in communication with the processor, having machine learning (ML) tools to process an untrusted training data set, the tools comprising: a training manager configured to train a first neural model with the untrusted data set; a ML manager, operatively coupled to the training manager, configured to classify each data point in the untrusted data set using the trained first neural model, and to retain activations of one or more designated layers in the trained first neural model; a cluster manager, operatively coupled to the ML manager, configured to apply a clustering technique on the retained activations for each label, and for each cluster to assess integrity of data in the cluster, including the cluster manager configured to: remove a cluster identified as containing suspect data from the data set, and form a new training set with data remaining in the data set without the suspect data; train a second neural model using the new training set; and using the trained second neural model, analyze data in the removed cluster and assess alignment of one or more of the classified data points with respect to a label assignment; and a classification manager, operatively coupled to the cluster manager, the classification manager configured to assign a poisonous classification or a legitimate classification to the removed cluster, the cluster classification corresponding to the alignment assessment.
2. The system of claim 1, wherein the alignment assessment of the one or more classified data points further comprises the cluster manager configured to: compare data classification labels returned from the trained second neural model with one or more original data classification labels, wherein the classification assignment is responsive to the data classification label comparison.
3. The system of claim 2, wherein the comparison further comprises the cluster manager configured to identify a plurality of the returned labels matching the one or more original data classification labels, and assign the legitimate classification to the removed cluster.
4. The system of claim 2, wherein the comparison further comprises the cluster manager configured to identify a plurality of the returned labels conflicting with the one or more original data classification labels, and assign the poisonous classification to the removed cluster.
5. The system of claim 2, wherein the alignment assessment of the one or more classified data points further comprises the cluster manager configured to compare a first value representative of quantity of the data points classified by the trained second neural model with a second value representative of a quantity of the data points classified with a label representing a majority label, and wherein the assignment of the poisonous classification or the legitimate classification to the removed cluster is responsive to the comparison.
6. The system of claim 5, further comprising the cluster manager configured to assign the legitimate classification to the removed cluster when the comparison indicates that the first value is greater than the second value, and assign the poisonous classification to the removed cluster when the first value is less than the second value.
7. The system of claim 1, further comprising a repair manager, operatively coupled to the cluster manager, and configured to repair the removed cluster classified as poisonous data.
8. A computer program product to utilize machine learning to process an untrusted training data set, the computer program product comprising: a computer readable storage medium having program code embodied therewith, the program code executable by a processor to: train a first neural model with the untrusted data set; classify each data point in the untrusted data set using the trained first neural model, and retain activations of one or more designated layers in the trained first neural model; apply a clustering technique on the retained activations for each label, and for each cluster assess integrity of data in the cluster, including program code to: remove a cluster identified as containing suspect data from the data set, and form a new training set with data remaining in the data set without the suspect data; train a second neural model using the new training set; and using the trained second neural model, analyze data in the removed cluster and assess alignment of one or more of the classified data points with respect to a label assignment; and assign a poisonous classification or a legitimate classification to the removed cluster, the cluster classification corresponding to the alignment assessment.
9. The computer program product of claim 8, wherein the alignment assessment of the one or more classified data points further comprises program code executable by the processor to: compare data classification labels returned from the trained second neural model with one or more original data classification labels, wherein the classification assignment is responsive to the data classification label comparison.
10. The computer program product of claim 9, wherein the comparison further comprises program code executable by the processor to identify a plurality of the returned labels matching the one or more original data classification labels, and assign the legitimate classification to the removed cluster.
11. The computer program product of claim 9, wherein the comparison further comprises program code executable by the processor to identify a plurality of the returned labels conflicting with the original data classification label, and assign the poisonous classification to the removed cluster.
12. The computer program product of claim 9, wherein the alignment assessment of the one or more classified data points further comprises program code executable by the processor to compare a first value representative of a quantity of the data points classified by the trained second neural model with a second value representative of a quantity of the data points classified with a label representing a majority label, and wherein the assignment of the poisonous classification or the legitimate classification to the removed cluster is responsive to the comparison.
13. The computer program product of claim 12, further comprising program code executable by the processor to assign the legitimate classification to the removed cluster when the comparison indicates that the first value is greater than the second value, and assign the poisonous classification to the removed cluster when the first value is less than the second value.
14. The computer program product of claim 8, further comprising program code executable by the processor to repair the removed cluster assigned the poisonous classification.
15. A method comprising: receiving, by a neural network, an untrusted training data set, each data point of the untrusted data set having a label; training a first neural model using the untrusted data set; classifying each data point in the untrusted data set using the trained first neural model, and retaining activations of one or more designated layers in the trained first neural model; applying a clustering technique on the retained activations for each label; for each cluster, assessing integrity of data in the cluster including: removing a cluster identified as containing suspect data from the data set, and forming a new training set with data remaining in the data set without the suspect data; training a s
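The per-cluster assessment in claims 1 to 6 can be illustrated with a small sketch. The steps are:

* remove the suspect cluster;
* retrain on the remaining data;
* classify the removed points;
* compare the returned labels with the original label.

The sketch rests on several assumptions:

* A nearest-centroid classifier serves as a hypothetical stand-in for the trained second neural model.
* The "first value" of claim 5 is read as the number of removed points that the second model assigns their original label.
* The "second value" is read as the count of the most frequent other label.
* The claims do not address a tie, so returning "undetermined" in that case is an assumption.

All function names (`remove_cluster`, `fit_nearest_centroid`, `predict`, `assess_removed_cluster`) are hypothetical.

```python
from collections import Counter


def remove_cluster(points, labels, cluster_idx):
    """Split the data set into (new training set, removed cluster points)."""
    drop = set(cluster_idx)
    keep = [i for i in range(len(points)) if i not in drop]
    new_set = ([points[i] for i in keep], [labels[i] for i in keep])
    return new_set, [points[i] for i in cluster_idx]


def fit_nearest_centroid(points, labels):
    """Hypothetical stand-in for the second neural model: one mean vector per label."""
    sums, counts = {}, Counter(labels)
    for p, lab in zip(points, labels):
        acc = sums.setdefault(lab, [0.0] * len(p))
        for j, v in enumerate(p):
            acc[j] += v
    return {lab: tuple(v / counts[lab] for v in acc) for lab, acc in sums.items()}


def predict(centroids, p):
    """Return the label whose centroid is closest to p (squared Euclidean distance)."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(p, centroids[lab])))


def assess_removed_cluster(returned_labels, original_label):
    """Compare the second model's labels for a removed cluster with its original label.

    first  = points the second model gives the original label (assumed reading)
    second = points given the most frequent *other* label (assumed reading)
    first > second -> legitimate; first < second -> poisonous (claims 5-6).
    A tie is not covered by the claims; "undetermined" is an assumption.
    """
    counts = Counter(returned_labels)
    first = counts.pop(original_label, 0)
    second = max(counts.values(), default=0)
    if first > second:
        return "legitimate"
    if first < second:
        return "poisonous"
    return "undetermined"


# Toy data: points 2 and 3 look like "dog" but carry the label "cat".
xs = [(0.0, 0.0), (0.1, 0.1), (4.9, 5.0), (5.2, 5.1), (5.0, 5.0), (5.1, 4.9)]
ys = ["cat", "cat", "cat", "cat", "dog", "dog"]
(new_x, new_y), removed_x = remove_cluster(xs, ys, [2, 3])  # suspect cluster from "cat"
model = fit_nearest_centroid(new_x, new_y)  # retrain without the suspect cluster
returned = [predict(model, p) for p in removed_x]
print(returned, assess_removed_cluster(returned, "cat"))
# ['dog', 'dog'] poisonous
```

In this example the retrained model labels both removed points "dog". That conflicts with their original label "cat", so the cluster is classified as poisonous, as claims 4 and 6 describe. Had most removed points come back as "cat", the same comparison would mark the cluster legitimate.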
Incorporation of unlabelled data, e.g. multiple instance learning [MIL] · CPC title
Distances to prototypes · CPC title
with adaptive number of clusters · CPC title
using classification, e.g. of video objects · CPC title
Learning methods · CPC title