System and method for counteracting adversarial attacks

US11836249B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11836249-B2
Application number: US-201916691307-A
Country: US
Kind code: B2
Filing date: Nov 21, 2019
Priority date: Nov 21, 2019
Publication date: Dec 5, 2023
Grant date: Dec 5, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Aspects of the present disclosure involve systems, methods, devices, and the like for generating an adversarially resistant model. In one embodiment, a novel architecture is presented that enables the identification of an image that has been adversarially attacked. The system and method used in the identification introduce the use of a denoising module used to reconstruct the original image from the modified image received. Then, further to the reconstruction, an adversarially trained model is used to make a prediction using at least a determination of a loss that may exist between the original image and the denoised image.
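The detection idea in the abstract can be illustrated with a minimal sketch. It assumes the loss is measured between the received image and its denoised reconstruction, and it substitutes a 3x3 median filter for the autoencoder the patent describes. The names median_denoise, reconstruction_loss, and detect_attack, and the threshold value, are hypothetical and do not come from the patent.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def median_denoise(img: Image) -> Image:
    """Stand-in denoiser: 3x3 median filter (the patent uses an autoencoder)."""
    h, w = len(img), len(img[0])
    out: Image = []
    for r in range(h):
        row = []
        for c in range(w):
            # Collect the neighbourhood, clipped at the image border.
            window = sorted(
                img[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
            )
            row.append(window[len(window) // 2])
        out.append(row)
    return out


def reconstruction_loss(a: Image, b: Image) -> float:
    """Mean squared error between two equally sized images."""
    n = len(a) * len(a[0])
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


def detect_attack(
    img: Image, denoise: Callable[[Image], Image], threshold: float
) -> Tuple[bool, float, Image]:
    """Flag the image as attacked when the reconstruction loss meets the threshold."""
    denoised = denoise(img)
    loss = reconstruction_loss(img, denoised)
    return loss >= threshold, loss, denoised
```

For a flat 4x4 image, the loss is zero and nothing is flagged. Changing a single pixel produces a nonzero loss, because the filter removes the outlier, and the image is flagged once that loss reaches the assumed threshold.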

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: a non-transitory memory storing instructions; and a processor configured to execute the instructions to cause the system to perform operations comprising: receiving, via a wireless network communication, a request for an adversarial attack detection, the request including a modified image; identifying, by a machine learning model from within the modified image, whether there is a perturbation from the modified image, wherein the identifying comprises: generating, by an autoencoder, a denoised image, processing, by the machine learning model, the denoised image to determine a reconstruction loss associated with the request for the adversarial attack detection, and determining, by the machine learning model, whether a measurement of the reconstruction loss meets threshold criteria indicating an adversarial attack from the perturbation in the modified image; determining a prediction of an original image of the modified image, the prediction determined based in part on the reconstruction loss and the measurement of the reconstruction loss determined by the machine learning model, and the prediction including a predicted action based in part on the modified image and the original image; and executing the predicted action for the original image.

2. The system of claim 1, wherein the operations further comprise: feeding back the reconstruction loss determined by the machine learning model to the autoencoder.

3. The system of claim 1, wherein the modified image includes the original image having the perturbation added to the original image that causes the adversarial attack when a machine processes the modified image, and wherein the adversarial attack causes the machine to produce an erroneous action.

4. The system of claim 1, wherein the machine learning model is an adversarial trained deep learning model.

5. The system of claim 4, wherein the adversarial trained deep learning model is trained using adversarially attacked images.

6. The system of claim 1, wherein a determination that the measurement of reconstruction loss does not meet the threshold criteria indicates the modified image is the original image.

7. The system of claim 1, wherein generating the denoised image includes reconstructing the original image using an encoder and a decoder.

8. A method comprising: receiving a request to determine an action on a received image; determining the received image is adversarially attacked, the determining including: determining by a denoiser processing the received image, a perturbation in the received image that is adversarially attacked, wherein the determining the perturbation comprises determining, by a machine learning model trained using adversarially attacked images, loss level information associated with the received image from the perturbation in the received image, wherein the loss level information indicates that the received image is adversarially attacked using the perturbation based on a loss threshold processing, by the machine learning model, the received image and the loss level information associated with the perturbation from the received image; making a prediction of an original image from the received image, the prediction determined based on the processing, and the prediction including a predicted action based in part on the received image and the original image; and executing the predicted action for the original image.

9. The method of claim 8, wherein the determining that the received image received is adversarially attacked indicates the original image is modified by noise associated with the perturbation.

10. The method of claim 8, wherein the prediction includes a result with a greater confidence than a previous confidence identifying whether the received image is adversarially attacked.

11. The method of claim 8, further comprising determining, by the denoiser, a code of the received image.

12. The method of claim 11, wherein the determining the code includes determining a node reduction of a network by an encoder.

13. The method of claim 11, wherein the denoiser includes a decoder for reconstructing the original image from the code determined.

14. The method of claim 8, wherein the prediction is based in part on a loss detected.

15. A non-transitory machine-readable medium having stored thereon machine-readable instructions executable to cause a machine to perform operations comprising: receiving, via a wireless network communication, a request for an adversarial attack detection, the request including a modified image; identifying, by a machine learning model from within the modified image, whether there is a perturbation from the modified image, wherein the identifying comprises: generating, by an autoencoder, a denoised image, processing by the machine learning model, the denoised image to determine a loss associated with the request for the adversarial attack detection, and determining, by the machine learning model, whether a loss level of a reconstruction loss meets a threshold criteria indicating an adversarial attack from the perturbation in the modified image; determining a prediction based in part on the reconstruction loss and the loss level determined by the machine learning model, wherein the determining the prediction comprises: executing a feedback loop that removes the perturbation from the modified image using an iterative processing of the modified image and an original image, predicting, using the machine learning model and the executed feedback loop, the original image of the modified image, and determining, based on the original image, a predicted action for the original image; and executing the predicted action for the original image.

16. The non-transitory machine-readable medium of claim 15, wherein the operations further comprise: feeding back the reconstruction loss determined by the machine learning model to the autoencoder.

17. The non-transitory machine-readable medium of claim 15, wherein the modified image includes the original image with the perturbation in a portion of the original image, and wherein the perturbation comprises a machine-readable code that is added to the portion of the original image so that the perturbation is not visible to a human eye.

18. The non-transitory machine-readable medium of claim 15, wherein the machine learning model is an adversarial trained deep learning model.

19. The non-transitory machine-readable medium of claim 18, wherein the adversarial trained deep learning model is trained using adversarially attacked images.

20. The non-transitory machine-readable medium of claim 15, wherein a determination that the loss level does not meet the threshold criteria indicates the modified image is the original image.
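The flow recited in claims 1 and 15 (denoise, measure the reconstruction loss, compare it to a threshold, refine through a feedback loop, then choose an action) can be sketched roughly as follows. This is one illustrative reading, not the patented implementation: the claims do not specify the loss function, the loop's stopping rule, or how the action is chosen. Here the loss is assumed to be mean squared error, the loop is assumed to stop when successive estimates differ by less than the threshold, and handle_request, Result, denoise, and classify are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]  # flattened image, for brevity


@dataclass
class Result:
    attacked: bool    # did the reconstruction loss meet the threshold?
    loss: float       # loss between the received image and its first reconstruction
    iterations: int   # number of denoising passes performed
    action: str       # predicted action returned by the classifier


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def handle_request(
    image: Vector,
    denoise: Callable[[Vector], Vector],   # stands in for the autoencoder
    classify: Callable[[Vector], str],     # stands in for the adversarially trained model
    threshold: float,
    max_iters: int = 10,
) -> Result:
    estimate = denoise(image)
    loss = mse(image, estimate)
    attacked = loss >= threshold
    iterations = 1
    # Assumed feedback loop: keep denoising the estimate until it stabilises.
    while attacked and iterations < max_iters:
        refined = denoise(estimate)
        step = mse(estimate, refined)
        estimate = refined
        iterations += 1
        if step < threshold:
            break
    # Act on the recovered estimate only when an attack was detected.
    source = estimate if attacked else image
    return Result(attacked, loss, iterations, classify(source))
```

With a toy denoiser that pulls each value halfway toward 0.5 and a classifier that returns "review" whenever any value is 0.7 or higher, the input [0.5, 0.5, 1.0, 0.5] is flagged as attacked, refined in two passes, and classified as "approve" on the recovered estimate. The raw input would have been classified as "review", which is the kind of erroneous action the claims aim to prevent.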

Assignees

Paypal Inc

Inventors

Classifications

  • Auto-encoder networks; Encoder-decoder networks · CPC title

  • Adversarial learning · CPC title

  • Supervised learning · CPC title

  • Feedforward networks · CPC title

  • modifying the architecture, e.g. adding, deleting or silencing nodes or connections · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11836249B2 cover?
Aspects of the present disclosure involve systems, methods, devices, and the like for generating an adversarially resistant model. In one embodiment, a novel architecture is presented that enables the identification of an image that has been adversarially attacked. The system and method used in the identification introduce the use of a denoising module used to reconstruct the original image fro…
Who is the assignee on this patent?
Paypal Inc
What technology area does this patent fall under?
Primary CPC classification G06F21/554. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 5, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).