Detecting poisoning attacks on neural networks by activation clustering

US11188789B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11188789-B2
Application number: US-201816057706-A
Country: US
Kind code: B2
Filing date: Aug 7, 2018
Priority date: Aug 7, 2018
Publication date: Nov 30, 2021
Grant date: Nov 30, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

One embodiment provides a method comprising receiving a training set comprising a plurality of data points, where a neural network is trained as a classifier based on the training set. The method further comprises, for each data point of the training set, classifying the data point with one of a plurality of classification labels using the trained neural network, and recording neuronal activations of a portion of the trained neural network in response to the data point. The method further comprises, for each classification label that a portion of the training set has been classified with, clustering a portion of all recorded neuronal activations that are in response to the portion of the training set, and detecting one or more poisonous data points in the portion of the training set based on the clustering.
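The flow in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the patented implementation: it assumes the activations have already been recorded as plain lists of floats, it uses a hand-rolled 2-means as the clustering method, and it flags the smaller cluster in each label segment. The function names (`two_means`, `detect_poison`) and the sample data are hypothetical.

```python
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two activation vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def two_means(points, iterations=20):
    """Assign each point to cluster 0 or 1 using plain k-means with k=2.

    Seeding is deterministic (an assumption of this sketch): the point
    farthest from the overall mean, then the point farthest from that one.
    """
    centroid = mean(points)
    first = max(points, key=lambda p: sq_dist(p, centroid))
    second = max(points, key=lambda p: sq_dist(p, first))
    centers = [first, second]
    assignment = [0] * len(points)
    for _ in range(iterations):
        assignment = [
            0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            for p in points
        ]
        for k in (0, 1):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # keep the old center if a cluster empties out
                centers[k] = mean(members)
    return assignment


def detect_poison(activations, predicted_labels):
    """Return indices of data points whose activations fall in the smaller cluster."""
    # Segment the recorded activations by the label the network assigned.
    segments = defaultdict(list)
    for idx, label in enumerate(predicted_labels):
        segments[label].append(idx)

    flagged = []
    for indices in segments.values():
        if len(indices) < 2:
            continue  # nothing to cluster
        assignment = two_means([activations[i] for i in indices])
        sizes = [assignment.count(0), assignment.count(1)]
        if 0 in sizes:
            continue  # all activations landed in a single cluster
        smaller = 0 if sizes[0] < sizes[1] else 1  # ties fall to cluster 1
        flagged.extend(i for i, a in zip(indices, assignment) if a == smaller)
    return sorted(flagged)


# Hypothetical recorded activations for seven points, all classified as "7".
activations = [
    [0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [0.1, 0.2], [0.0, 0.0],  # tight group
    [5.0, 5.1], [5.1, 5.0],                                       # outlying group
]
predicted_labels = ["7"] * 7
suspects = detect_poison(activations, predicted_labels)
print(suspects)  # prints [5, 6]
```

Note that this sketch flags the smaller cluster in every label segment that splits in two, including segments that contain no poison at all, so its output is a list of candidates for review, not a verdict.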

First claim

Opening claim text (preview).

The invention claimed is:

1. A method comprising: receiving a training set comprising a plurality of data points, wherein a neural network is trained as a classifier based on the training set; for each data point of the training set: classifying the data point with one of a plurality of classification labels using the trained neural network; and recording neuronal activations of a portion of the trained neural network in response to the data point; and for each classification label that a portion of the training set has been classified with: clustering a portion of all recorded neuronal activations that are in response to the portion of the training set; and detecting one or more poisonous data points in the portion of the training set based on the clustering.

2. The method of claim 1, further comprising: training an initial neural network based on the training set, resulting in the trained neural network.

3. The method of claim 1, wherein the training set is an untrusted data set.

4. The method of claim 1, wherein the neural network is a convolutional neural network.

5. The method of claim 4, wherein the portion of the neural network is a last hidden layer in the neural network.

6. The method of claim 1, wherein the neural network is a region-based convolutional neural network (R-CNN).

7. The method of claim 6, wherein the portion of the neural network is a last hidden layer corresponding to a proposed region of interest in the R-CNN.

8. The method of claim 1, further comprising: segmenting all the recorded neuronal activations into one or more segments in accordance with the plurality of classification labels; and for each segment, clustering neuronal activations included in the segment.

9. The method of claim 8, wherein clustering neuronal activations included in the segment comprises: applying a clustering method that clusters the neuronal activations included in the segment into two clusters.

10. The method of claim 9, further comprising: classifying a smallest cluster of the two clusters as poisonous, wherein, for each neuronal activation included in the smallest cluster, a data point in the training set that resulted in the neuronal activation is identified as a poisonous data point.

11. The method of claim 8, wherein clustering neuronal activations included in the segment comprises: applying a clustering method that clusters the neuronal activations included in the segment into a set of clusters; and determining a total number of clusters included in the set of clusters.

12. The method of claim 11, further comprising: classifying the training set as legitimate in response to determining the total number of clusters is one.

13. The method of claim 11, further comprising: in response to determining the total number of clusters is more than one: classifying a largest cluster of the set of clusters as legitimate; and classifying each remaining cluster of the set of clusters as poisonous, wherein, for each neuronal activation included in the remaining cluster, a data point in the training set that resulted in the neuronal activation is identified as a poisonous data point.

14. The method of claim 8, further comprising: for each cluster generated in response to the clustering: for each neuronal activation included in the cluster, identifying a data point in the training set that resulted in the neuronal activation; generating an average of all data points identified; and providing the average to a user to determine whether all the data points identified are poisonous or legitimate.

15. A system comprising: at least one processor; and a non-transitory processor-readable memory device storing instructions that when executed by the at least one processor causes the at least one processor to perform operations including: receiving a training set comprising a plurality of data points, wherein a neural network is trained as a classifier based on the training set; for each data point of the training set: classifying the data point with one of a plurality of classification labels using the trained neural network; and recording neuronal activations of a portion of the trained neural network in response to the data point; and for each classification label that a portion of the training set has been classified with: clustering a portion of all recorded neuronal activations that are in response to the portion of the training set; and detecting one or more poisonous data points in the portion of the training set based on the clustering.

16. The system of claim 15, wherein the operations further comprise: segmenting all the recorded neuronal activations into one or more segments in accordance with the plurality of classification labels; and for each segment, clustering neuronal activations included in the segment.

17. The system of claim 16, wherein clustering neuronal activations included in the segment comprises: applying a clustering method that clusters the neuronal activations included in the segment into two clusters; and classifying a smallest cluster of the two clusters as poisonous, wherein, for each neuronal activation included in the smallest cluster, a data point in the training set that resulted in the neuronal activation is identified as a poisonous data point.

18. The system of claim 16, wherein clustering neuronal activations included in the segment comprises: applying a clustering method that clusters the neuronal activations included in the segment into a set of clusters; determining a total number of clusters included in the set of clusters; in response to determining the total number of clusters is one, classifying the training set as legitimate; and in response to determining the total number of clusters is more than one: classifying a largest cluster of the set of clusters as legitimate; and classifying each remaining cluster of the set of clusters as poisonous, wherein, for each neuronal activation included in the remaining cluster, a data point in the training set that resulted in the neuronal activation is identified as a poisonous data point.

19. The system of claim 16, wherein the operations further comprise: for each cluster generated in response to the clustering: for each neuronal activation included in the cluster, identifying a data point in the training set that resulted in the neuronal activation; generating an average of all data points identified; and providing the average to a user to determine whether all the data points identified are poisonous or legitimate.

20. A computer program product comprising a computer-readable hardware storage medium having program code embodied therewith, the program code being executable by a computer to implement a method comprising: receiving a training set comprising a plurality of data points, wherein a neural network is trained as a classifier based on the training set; for each data point of the training set: classifying the data point with one of a plurality of classification labels using the trained neural network; and recording neuronal activations of a portion of the trained neural network in response to the data point; and for each classification label that a portion of the training set has been classified with: clustering a portion of all recorded neuronal activations that are in response to the portion of the training set; and detecting one or more poisonous data points in the portion of the training set based on the clustering.
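The cluster-count rule of claims 11 to 13 and the per-cluster average of claim 14 can be illustrated with a short Python sketch. It is a hedged illustration, not the claimed implementation: it assumes cluster ids for one label segment were already produced by some clustering method that chooses its own number of clusters (the claims do not name one), and that data points are plain lists of floats. The function names and sample values are hypothetical.

```python
from collections import Counter, defaultdict


def classify_clusters(cluster_ids):
    """Label each cluster id: the largest cluster is legitimate, the rest poisonous.

    With a single cluster, that cluster is the largest, so the whole
    segment comes back as legitimate.
    """
    counts = Counter(cluster_ids)
    largest, _ = counts.most_common(1)[0]
    return {
        cid: "legitimate" if cid == largest else "poisonous"
        for cid in counts
    }


def cluster_averages(data_points, cluster_ids):
    """Average the raw data points behind each cluster, for a human to inspect."""
    groups = defaultdict(list)
    for point, cid in zip(data_points, cluster_ids):
        groups[cid].append(point)
    return {
        cid: [sum(col) / len(points) for col in zip(*points)]
        for cid, points in groups.items()
    }


# Hypothetical segment: six data points and the cluster each activation fell into.
data_points = [
    [1.0, 1.0], [1.0, 3.0], [3.0, 1.0], [8.0, 8.0], [3.0, 3.0], [10.0, 8.0],
]
cluster_ids = [0, 0, 0, 1, 0, 1]

verdicts = classify_clusters(cluster_ids)
averages = cluster_averages(data_points, cluster_ids)
print(verdicts)  # prints {0: 'legitimate', 1: 'poisonous'}
print(averages)  # prints {0: [2.0, 2.0], 1: [9.0, 8.0]}
```

For image data, the average of a cluster is itself an image, which is what makes it practical for a user to judge a whole cluster at a glance.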

Assignees

Inventors

Classifications

  • Character recognition · CPC title

  • Classification techniques · CPC title

  • using neural networks · CPC title

  • Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

  • G06N3/084 · Primary

    Backpropagation, e.g. using gradient descent · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11188789B2 cover?
One embodiment provides a method comprising receiving a training set comprising a plurality of data points, where a neural network is trained as a classifier based on the training set. The method further comprises, for each data point of the training set, classifying the data point with one of a plurality of classification labels using the trained neural network, and recording neuronal activati…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06N3/084. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 30, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).