Testing adversarial robustness of systems with limited access

US11836256B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11836256-B2
Application number: US-201916256107-A
Country: US
Kind code: B2
Filing date: Jan 24, 2019
Priority date: Jan 24, 2019
Publication date: Dec 5, 2023
Grant date: Dec 5, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An adversarial robustness testing method, system, and computer program product include testing a robustness of a black-box system under different access settings via an accelerator.
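The "different access settings" in the abstract are named in claim 2 as a soft-label setting and a hard-label setting. The sketch below is one way to picture the difference, not the patented implementation: the linear softmax model and the helper names `make_black_box`, `soft_label_query`, and `hard_label_query` are hypothetical stand-ins, and the reading of soft-label as "full score vector" and hard-label as "top-1 class only" is an assumption based on common usage of those terms.

```python
import numpy as np

def make_black_box(weights, bias):
    """Hypothetical stand-in for a black-box classifier (a fixed linear softmax model)."""
    def predict_proba(x):
        logits = weights @ x + bias
        logits = logits - logits.max()  # subtract the max for numerical stability
        exp = np.exp(logits)
        return exp / exp.sum()
    return predict_proba

def soft_label_query(model, x):
    """Soft-label access (assumed meaning): the tester sees every class score."""
    return model(x)

def hard_label_query(model, x):
    """Hard-label access (assumed meaning): the tester sees only the top-1 class index."""
    return int(np.argmax(model(x)))

rng = np.random.default_rng(0)
model = make_black_box(rng.normal(size=(3, 4)), rng.normal(size=3))
x = rng.normal(size=4)

probs = soft_label_query(model, x)  # vector of 3 class probabilities
label = hard_label_query(model, x)  # a single integer class index
```

In the hard-label case the output is piecewise constant in the input, which is presumably why claim 3 pairs that setting with a smoothing function rather than a direct gradient descent step.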

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented adversarial robustness testing method for checking a learning performance of a black-box system, the method comprising: testing a robustness, against an adversarial attack, of a black-box system under different access settings via an accelerator by generating adversarial inputs of a model in a limited access setting of the different access settings, wherein the accelerator includes a function to reduce an attack space in the adversarial attack for query efficiency, wherein a robustness objective for the testing of the robustness of the black-box system uses system defined threat models for adversarial examples, and wherein a perturbed noise at each pixel of the perturbed example is imperceptible up to a predefined ε-tolerant threshold and a non-negative regularization parameter places emphasis on a distortion between the adversarial examples and a legitimate image.

2. The method of claim 1, wherein the different access settings comprise: a soft-label setting; and a hard-label setting.

3. The method of claim 1, further comprising: for a soft-label setting as one of the different access settings, using the accelerator and a gradient descent technique to find the adversarial examples and summarize a robustness statistic; and for a hard-label setting as one of the different access settings, using a smoothing function to summarize a robustness statistic.

4. The method of claim 1, further comprising, given a legitimate input of a plurality of legitimate inputs having a correct class label, determining an optimal adversarial perturbation using the accelerator such that the perturbed example is misclassified to a target class including an incorrect class label by a deep neural network (DNN) model trained on the legitimate inputs.

5. The method of claim 1, wherein the accelerator comprises a function including an efficient gradient estimation via a random directional estimate and averaging.

6. The method of claim 1, wherein the accelerator comprises a function including a dimension reduction of an input.

7. The method of claim 1, wherein the accelerator comprises a function including a problem splitting between a black-box loss function and a white-box adversarial distortion function.

8. The method of claim 1, embodied in a cloud-computing environment.

9. A computer program product for adversarial robustness testing for checking a learning performance of a black-box system, the computer program product comprising a computer-readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to perform: testing a robustness, against an adversarial attack, of a black-box system under different access settings via an accelerator by generating adversarial inputs of a model in a limited access setting of the different access settings, wherein the accelerator includes a function to reduce an attack space in the adversarial attack for query efficiency, wherein a robustness objective for the testing of the robustness of the black-box system uses system defined threat models for adversarial examples, and wherein a perturbed noise at each pixel of the perturbed example is imperceptible up to a predefined ε-tolerant threshold and a non-negative regularization parameter places emphasis on a distortion between the adversarial examples and the legitimate image.

10. The computer program product of claim 9, wherein the different access settings comprise: a soft-label setting; and a hard-label setting.

11. The computer program product of claim 9, further comprising: for a soft-label setting as one of the different access settings, using the accelerator and a gradient descent technique to find adversarial example and summarize a robustness statistic; and for a hard-label setting as one of the different access settings, using a smoothing function to summarize a robustness statistic.

12. The computer program product of claim 9, further comprising, given a legitimate input of a plurality of legitimate inputs having a correct class label, determining an optimal adversarial perturbation using the accelerator such that the perturbed example is misclassified to a target class including an incorrect class label by a deep neural network (DNN) model trained on the legitimate inputs.

13. The computer program product of claim 9, wherein the accelerator comprises a function including an efficient gradient estimation via a random directional estimate and averaging.

14. The computer program product of claim 9, wherein the accelerator comprises a function including a dimension reduction of an input.

15. The computer program product of claim 9, wherein the accelerator comprises a function including a problem splitting between a black-box loss function and a white-box adversarial distortion function.

16. An adversarial robustness testing system for checking a learning performance of a black-box system, the system comprising: a processor; and a memory, the memory storing instructions to cause the processor to perform: testing a robustness, against an adversarial attack, of a black-box system under different access settings via an accelerator by generating adversarial inputs of a model in a limited access setting of the different access settings, wherein the accelerator includes a function to reduce an attack space in the adversarial attack for query efficiency, wherein a robustness objective for the testing of the robustness of the black-box system uses system defined threat models for adversarial examples, and wherein a perturbed noise at each pixel of the perturbed example is imperceptible up to a predefined ε-tolerant threshold and a non-negative regularization parameter places emphasis on a distortion between the adversarial examples and the legitimate image.

17. The system of claim 16, further comprising: for a soft-label setting as one of the different access settings, using the accelerator and a gradient descent technique to find the adversarial examples and summarize a robustness statistic; and for a hard-label setting as one of the different access settings, using a smoothing function to summarize a robustness statistic.

18. A computer-implemented adversarial robustness testing method for checking a learning performance of a black-box system, the method comprising: testing a robustness, against an adversarial attack, of the black-box system under a limited access setting to the black-box system: receiving a first classification of an input as an output from the black-box system; and determining a minimal change to the input such that a second classification is received as the output from the black-box system, wherein the testing includes a function to reduce an attack space in the adversarial attack for query efficiency, wherein a robustness objective for the testing of the robustness of the black-box system uses system defined threat models for adversarial examples, and wherein a perturbed noise at each pixel of the perturbed example is imperceptible up to a predefined ε-tolerant threshold and a non-negative regularization parameter places emphasis on a distortion between the adversarial examples and the legitimate image.

19. A computer-implemented adversarial robustness testing method for checking a learning performance of a black-box system, the method comprising: testing a robustness, against an adversarial attack, of the black-box system under a limited access setting to the black-box system: f
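Claims 5 and 6 describe two accelerator functions: gradient estimation "via a random directional estimate and averaging" and "a dimension reduction of an input". The claims do not give formulas, so the sketch below is only an illustration under stated assumptions. It uses a common zeroth-order estimator (finite differences along random unit directions, averaged over several draws) and a simple repeat-based upsampling as the dimension reduction. The names `estimate_gradient`, `upsample`, `toy_loss`, and the parameters `num_directions` and `mu` are hypothetical, and the quadratic loss is a toy stand-in for a black-box loss that can only be queried.

```python
import numpy as np

def estimate_gradient(loss, x, num_directions=20, mu=1e-3, rng=None):
    """Average random directional finite-difference estimates of the gradient of loss at x.

    Only loss values (queries) are used; no analytic gradient is needed.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x.size
    base = loss(x)  # one query shared by all directions
    grad = np.zeros_like(x, dtype=float)
    for _ in range(num_directions):
        u = rng.normal(size=x.shape)
        u /= np.linalg.norm(u)  # random direction on the unit sphere
        # Directional difference, scaled by d because E[u u^T] = I / d on the unit sphere.
        grad += (d / mu) * (loss(x + mu * u) - base) * u
    return grad / num_directions

def upsample(z, factor):
    """Hypothetical dimension reduction: each low-dimensional coordinate drives `factor` adjacent inputs."""
    return np.repeat(z, factor)

target = np.arange(8, dtype=float)

def toy_loss(x):
    """Toy stand-in for a black-box loss; its true gradient is x - target."""
    return 0.5 * np.sum((x - target) ** 2)

rng = np.random.default_rng(0)
x0 = np.zeros(8)

# Estimate in the full 8-dimensional input space.
g_full = estimate_gradient(toy_loss, x0, num_directions=4000, rng=rng)

# Estimate in a reduced 4-dimensional attack space, mapped back to 8 inputs by upsample.
def reduced_loss(z):
    return toy_loss(x0 + upsample(z, 2))

g_reduced = estimate_gradient(reduced_loss, np.zeros(4), num_directions=4000, rng=rng)
```

Each direction costs one extra query, so the query budget grows with `num_directions`; searching in the reduced space shrinks the number of coordinates to estimate, which is one plausible reading of "reduce an attack space ... for query efficiency" in claim 1.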

Assignees

Inventors

Classifications

  • Auto-encoder networks; Encoder-decoder networks · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Supervised learning · CPC title

  • G06F21/577 · Primary

    Assessing vulnerabilities and evaluating computer system security · CPC title

  • Architecture, e.g. interconnection topology · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11836256B2 cover?
An adversarial robustness testing method, system, and computer program product include testing a robustness of a black-box system under different access settings via an accelerator.
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F21/577. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 5, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).