US11468314B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-11468314-B1 |
| Application number | US-201816129553-A |
| Country | US |
| Kind code | B1 |
| Filing date | Sep 12, 2018 |
| Priority date | Sep 12, 2018 |
| Publication date | Oct 11, 2022 |
| Grant date | Oct 11, 2022 |
Official abstract text for this publication.
Embodiments disclosed herein describe systems, methods, and products that generate trained neural networks that are robust against adversarial attacks. During a training phase, an illustrative computer may iteratively optimize a loss function that may include a penalty for ill-conditioned weight matrices in addition to a penalty for classification errors. Therefore, after the training phase, the trained neural network may include one or more well-conditioned weight matrices. The one or more well-conditioned weight matrices may minimize the effect of perturbations within an adversarial input thereby increasing the accuracy of classification of the adversarial input. By contrast, conventional training approaches may merely reduce the classification errors using backpropagation, and, as a result, any perturbation in an input is prone to generate a large effect on the output.
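The idea in the abstract can be illustrated with a minimal sketch. The abstract only states that the loss adds a penalty for ill-conditioned weight matrices to the penalty for classification errors. The specific penalty form `log(kappa)**2`, the parameter name `lam`, and the restriction to 2x2 matrices (so the singular values have a closed form and no external library is needed) are assumptions made here for illustration, not details taken from the patent.

```python
import math

def singular_values_2x2(w):
    """Closed-form singular values (largest first) of a 2x2 matrix."""
    (a, b), (c, d) = w
    s = a * a + b * b + c * c + d * d   # squared Frobenius norm = s1^2 + s2^2
    det = a * d - b * c                 # |det| = s1 * s2
    disc = math.sqrt(max(s * s - 4.0 * det * det, 0.0))
    return math.sqrt((s + disc) / 2.0), math.sqrt(max((s - disc) / 2.0, 0.0))

def condition_number(w):
    """Ratio of largest to smallest singular value (infinite if singular)."""
    s_max, s_min = singular_values_2x2(w)
    return math.inf if s_min == 0.0 else s_max / s_min

def apply(w, v):
    """Matrix-vector product for a 2x2 matrix and a 2-vector."""
    return [w[0][0] * v[0] + w[0][1] * v[1], w[1][0] * v[0] + w[1][1] * v[1]]

def relative_amplification(w, x, delta):
    """(|W delta| / |W x|) / (|delta| / |x|); never exceeds condition_number(w)."""
    out_change = math.hypot(*apply(w, delta)) / math.hypot(*apply(w, x))
    in_change = math.hypot(*delta) / math.hypot(*x)
    return out_change / in_change

def total_loss(classification_loss, weight_matrices, lam=0.01):
    """Classification penalty plus an ASSUMED conditioning penalty, log(kappa)^2."""
    penalty = sum(math.log(condition_number(w)) ** 2 for w in weight_matrices)
    return classification_loss + lam * penalty

well = [[1.0, 0.0], [0.0, 1.0]]    # condition number 1
ill = [[10.0, 0.0], [0.0, 0.1]]    # condition number 100
x, delta = [0.0, 1.0], [0.01, 0.0]  # input and a small perturbation
print(relative_amplification(well, x, delta), relative_amplification(ill, x, delta))
print(total_loss(0.5, [well]), total_loss(0.5, [ill]))
```

For a single linear layer, the relative change in the output is at most the condition number times the relative change in the input. In this example the same 1% perturbation stays at 1% through the well-conditioned matrix, while the ill-conditioned matrix can turn it into a change as large as the signal itself. That is the effect the second penalty is meant to suppress.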
Opening claim text (preview).
What is claimed is:
1. A system comprising: a non-transitory storage medium storing one or more computer instructions and one or more well-conditioned weight matrices forming a neural network, the neural network generated by being trained to optimize a loss function to generate the one or more well-conditioned weight matrices: a processor coupled to the non-transitory storage medium and configured to execute the computer instructions and access the one or more well-conditioned weight matrices to: receive an adversarial input file containing an input to be classified with calculated perturbations added to a representation of the input; and classify the input in the adversarial input file by propagating portions of the adversarial input file through a plurality of layers of the neural network while constraining the effect of the calculated perturbations utilizing the one or more well-conditioned weight matrices; wherein the loss function comprises a first penalty for classification errors and a second penalty for one or more ill-conditioned weight matrices in the neural network, wherein the second penalty is based on a condition number of weight matrices of the neural network.
2. The system of claim 1, wherein the adversarial input file is selected from the group consisting of a video file, an image file, a text file, and an audio file.
3. The system of claim 1, wherein the input to be classified is selected from the group consisting of a handwritten text, a likeness of an object, and a voice.
4. The system of claim 1, wherein the neural network is a convolutional neural network.
5. The system of claim 1, wherein the second penalty for the one or more ill-conditioned matrices is scaled by a regularization parameter.
6. The system of claim 5, wherein a first layer of the neural network may be associated with a first regularization parameter, and a second layer of the neural network may be associated with a second regularization parameter.
7. A computer-implemented method of training a neural network against adversarial attacks, the method comprising: initializing, by a computer, a neural network with random values to one or more weight matrices; and iteratively optimizing, by the computer, a loss function comprising a first penalty for classification errors and a second penalty for ill-conditioned weight matrices in the neural network by minimizing the second penalty for ill-conditioned weight matrices such that the computer generates a trained neural network with one or more well-conditioned weight matrices, whereby the one or more well-conditioned matrices constrain the effect of calculated perturbations added to an input in an adversarial input file; wherein the second penalty for ill-conditioned weight matrices is based on a condition number of weight matrices of the neural network.
8. The method of claim 7, wherein the neural network is a convolutional neural network.
9. The method of claim 7, wherein training data for the neural network is selected from the group consisting of image data, video data, text data, and audio data.
10. The method of claim 7, wherein the second penalty for the one or more ill-conditioned weight matrices is scaled by a regularization parameter.
11. The method of claim 10, wherein a first layer of the neural network may be associated with a first regularization parameter and a second layer of the neural network may be associated with a second regularization parameter.
12. The method of claim 11, further comprising: dynamically selecting, by the computer, the first regularization parameter for a weight matrix associated with the first layer based upon a condition number of weight matrix during an iteration.
13. The method of claim 7, wherein the iteratively optimizing the loss function by the computer comprises: iteratively modifying, by the computer, a weight matrix of the one or more ill-conditioned weight matrices, such that a normalized modified weight matrix is approximately semi-orthogonal.
14. A non-transitory computer-readable medium containing computer program instructions, which when executed by a processor, cause the processor to perform operations comprising: receiving, by the processor, an adversarial input file, containing an input to be classified with calculated perturbations added to a representation of the input; deploying, by the processor, a neural network on the adversarial input file, the neural network containing one or more well-conditioned weight matrices, the neural network generated by being trained to optimize a loss function including a first penalty for classification errors and a second penalty for one or more ill-conditioned weight matrices in the neural network to generate the one or more well-conditioned weight matrices, wherein the second penalty is based on a condition number of weight matrices of the neural network; and classifying, by the processor, based upon deploying the neural network, the input in the adversarial input file by propagating portions of the adversarial input file through a plurality of layers of the neural network while constraining the effect of the calculated perturbations by utilizing the one or more well-conditioned weight matrices.
15. The non-transitory computer-readable medium of claim 14, wherein the adversarial input file is selected from the group consisting of a video file, an image file, a text file, and an audio file.
16. The non-transitory computer-readable medium of claim 14, wherein the media input to be classified is selected from the group consisting of a handwritten text, a likeness of an object, and a voice.
17. The non-transitory computer-readable medium of claim 14, wherein calculated perturbations are added by an attacker in attempt to cause the neural network to misclassify the input.
18. The non-transitory computer-readable medium of claim 14, wherein the second penalty for the one or more ill-conditioned matrices is scaled by a regularization parameter.
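Claims 10 to 13 describe per-layer regularization parameters, selecting a parameter from the current condition number during an iteration, and modifying a weight matrix until its normalized form is approximately semi-orthogonal. The following is a rough sketch of how those pieces could fit together, not the patented procedure. The selection rule in `select_reg_param`, the penalty `||W^T W - I||_F^2`, the normalization used in `normalized_penalty`, the learning rate, and the layer names are all hypothetical. The classification-loss gradient is omitted so that only the conditioning term is shown, and matrices are limited to two columns so the condition number has a closed form in plain Python.

```python
import math

def gram(w):
    """Return the 2x2 Gram matrix W^T W of an n-by-2 matrix W."""
    g00 = sum(r[0] * r[0] for r in w)
    g01 = sum(r[0] * r[1] for r in w)
    g11 = sum(r[1] * r[1] for r in w)
    return [[g00, g01], [g01, g11]]

def condition_number(w):
    """kappa(W) = sqrt(eig_max / eig_min) of W^T W, for an n-by-2 matrix W."""
    (g00, g01), (_, g11) = gram(w)
    trace, det = g00 + g11, g00 * g11 - g01 * g01
    disc = math.sqrt(max(trace * trace - 4.0 * det, 0.0))
    eig_max, eig_min = (trace + disc) / 2.0, (trace - disc) / 2.0
    return math.inf if eig_min <= 0.0 else math.sqrt(eig_max / eig_min)

def select_reg_param(kappa, base=0.25, cap=1.0):
    """HYPOTHETICAL rule: penalize a layer harder the worse its conditioning."""
    return min(cap, base * kappa)

def orthogonality_step(w, reg_param, lr=0.05):
    """One gradient step on (reg_param / 4) * ||W^T W - I||_F^2.

    The gradient of that term with respect to W is reg_param * W (W^T W - I).
    """
    g = gram(w)
    r = [[g[0][0] - 1.0, g[0][1]], [g[1][0], g[1][1] - 1.0]]
    return [[row[j] - lr * reg_param * (row[0] * r[0][j] + row[1] * r[1][j])
             for j in range(2)] for row in w]

def normalized_penalty(w):
    """||W_n^T W_n - I||_F^2, with W_n = W rescaled so that ||W_n||_F^2 = 2."""
    g = gram(w)
    scale = 2.0 / (g[0][0] + g[1][1])  # W_n^T W_n equals scale * (W^T W)
    return ((g[0][0] * scale - 1.0) ** 2 + 2.0 * (g[0][1] * scale) ** 2
            + (g[1][1] * scale - 1.0) ** 2)

def train_layer(w, steps=500):
    """Repeatedly re-select the layer's parameter, then take one step."""
    for _ in range(steps):
        reg_param = select_reg_param(condition_number(w))
        w = orthogonality_step(w, reg_param)
    return w

# Two illustrative layers; each ends up with its own regularization parameter.
layers = {
    "layer_1": [[2.0, 1.0], [0.0, 0.5], [1.0, 0.0]],
    "layer_2": [[1.0, 0.2], [0.1, 0.9], [0.0, 0.3]],
}
trained = {name: train_layer(w) for name, w in layers.items()}
for name, w in layers.items():
    print(name, condition_number(w), condition_number(trained[name]))
```

Each step moves every singular value of the matrix toward 1, so the condition number falls toward 1 and the columns approach an orthonormal set. In an actual training loop this term would be combined with the classification-error gradient, typically through an automatic-differentiation framework, and the two objectives would be traded off by the regularization parameter.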