Method and machine learning system for detecting adversarial examples

US2021089957A1 · US · A1

Patent metadata
Publication number: US-2021089957-A1
Application number: US-201916576830-A
Country: US
Kind code: A1
Filing date: Sep 20, 2019
Priority date: Sep 20, 2019
Publication date: Mar 25, 2021
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method and machine learning system for detecting adversarial examples is provided. A first machine learning model is trained with a first machine learning training data set having only training data samples with robust features. A second machine learning model is trained with a second machine learning training data set, the second machine learning training data set having only training data samples with non-robust features. A feature is a distinguishing element in a data sample. A robust feature is more resistant to adversarial perturbations than a non-robust feature. A data sample is provided to each of the first and second trained machine learning models during an inference operation. If the first trained machine learning model classifies the data sample with high confidence, and the second trained machine learning model classifies the data sample differently with a high confidence, then the data sample is determined to be an adversarial example.
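
The decision rule described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function name `is_adversarial`, the representation of each model's output as a list of per-class probabilities, and the 0.9 cutoff for "high confidence" are all assumptions, since the abstract gives no numeric threshold.

```python
from typing import Sequence, Tuple


def top_class(probs: Sequence[float]) -> Tuple[int, float]:
    """Return (index, probability) of the most likely class."""
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs[label]


def is_adversarial(
    robust_probs: Sequence[float],
    non_robust_probs: Sequence[float],
    threshold: float = 0.9,  # assumed value; "high confidence" is not quantified in the abstract
) -> bool:
    """Flag a sample when both models are confident but disagree on the class.

    robust_probs:     output of the model trained only on robust features
    non_robust_probs: output of the model trained only on non-robust features
    """
    robust_label, robust_conf = top_class(robust_probs)
    non_robust_label, non_robust_conf = top_class(non_robust_probs)
    both_confident = robust_conf >= threshold and non_robust_conf >= threshold
    return both_confident and robust_label != non_robust_label


# Both models are confident, but they pick different classes: flagged.
print(is_adversarial([0.95, 0.05], [0.03, 0.97]))  # True
# Both models agree on class 0: not flagged.
print(is_adversarial([0.95, 0.05], [0.92, 0.08]))  # False
```

The intuition is that an adversarial perturbation mostly alters non-robust features, so the second model changes its answer while the first model, which relies on robust features, does not.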

First claim

Opening claim text (preview).

What is claimed is:

1. A method for detecting adversarial examples, the method comprising: training a first machine learning model with a first machine learning training data set having only training data samples with robust features, to produce a first trained machine learning model; training a second machine learning model with a second machine learning training data set, the second machine learning training data set having only training data samples with non-robust features to produce a second trained machine learning model, wherein a feature is a distinguishing element in a data sample, and wherein a robust feature is more resistant to adversarial perturbations than a non-robust feature; and providing a data sample to each of the first and second trained machine learning models during an inference operation, if the first trained machine learning model classifies the data sample with high confidence, and the second trained machine learning model classifies the data sample differently with a high confidence, then the data sample is determined to be an adversarial example.

2. The method of claim 1, wherein the first and second machine learning models include the same machine learning algorithm.

3. The method of claim 1, wherein the first and second machine learning models are based on a neural network.

4. The method of claim 1, wherein if the first and second trained machine learning models classify the data sample the same, the data sample is determined to not be an adversarial example.

5. The method of claim 1, further comprising training a third machine learning model with a third training data set, the third training data set not having any protections against adversarial examples.

6. The method of claim 5, further comprising providing the data sample to the third trained machine learning model if the data sample is determined not to be an adversarial example.

7. The method of claim 1, wherein the data sample is an image having a non-robust feature, the non-robust feature being imperceptible by a human being.

8. A method for detecting adversarial examples, the method comprising: compiling a set of robust features and a set of non-robust features, wherein a feature is a distinguishing element in a data sample, and wherein a robust feature is more resistant to adversarial perturbations than a non-robust feature; creating a first machine learning training data set having only training data samples with the robust features; creating a second machine learning training data set having only training data samples with the non-robust features; training a first machine learning model with the first machine learning training data set to produce a first trained machine learning model; training a second machine learning model with the second machine learning training data set to produce a second trained machine learning model; and providing a data sample to each of the first and second trained machine learning models during an inference operation, if the first trained machine learning model classifies the data sample with high confidence, and the second trained machine learning model classifies the data sample differently with high confidence, the data sample is determined to be an adversarial example.

9. The method of claim 8, wherein if the first trained machine learning model and the second trained machine learning model classify the data sample the same, the data sample is determined to not be an adversarial example.

10. The method of claim 9, wherein the first and second trained machine learning models both include the same machine learning algorithm.

11. The method of claim 10, further comprising providing the data sample that is determined to not be an adversarial example to a third trained machine learning model that has been trained without any protections against adversarial examples.

12. The method of claim 8, wherein the first, second, and third machine learning models all include a machine learning algorithm for classifying images.

13. The method of claim 8, further comprising providing an indication of an attack in response to the adversarial example being detected.

14. The method of claim 8, wherein the first, second, and third machine learning models all include a neural network.

15. A machine learning system comprising: a first trained machine learning model trained with a first training data set including only a plurality of robust features, the first trained machine learning model having an input for receiving an input data sample, and an output for providing a first output classification in response to receiving the input data sample; a second trained machine learning model trained with a second training data set, the second training data set including only a plurality of non-robust features, the second trained machine learning model having an output for providing a second output classification in response to receiving the input data sample, wherein a feature is characterized as being a distinguishing element of a data sample, and wherein a robust feature is more resistant to adversarial perturbations than a non-robust feature; and a distinguisher coupled to an output of both the first and second trained machine learning models for receiving the first and second output classifications, if the first trained machine learning model classifies the data sample with high confidence, and the second trained machine learning model classifies the data sample differently than the first trained machine learning model and with high confidence, the data sample is determined to be an adversarial example.

16. The machine learning system of claim 15, wherein if the first and second trained machine learning models classify the data sample the same, the data sample is determined to not be an adversarial example.

17. The machine learning system of claim 15, further comprising a third trained machine learning model trained with a third training data set, wherein the third training data set not trained to have any protections against adversarial examples.

18. The machine learning model of claim 17, wherein if the first and second trained machine learning models classify the data sample the same, the data sample is determined to not be an adversarial example and the data sample is provided to the third trained machine learning model for classification.

19. The machine learning model of claim 15, wherein the first and second trained machine learning models both use the same machine learning algorithm.

20. The machine learning model of claim 15, wherein the first and second trained machine learning models include a neural network.
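
The system of claims 15 to 18 (two detector models, a distinguisher, and an unprotected third model that classifies samples judged non-adversarial) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: the `Model` type, the `Verdict` record, the `guarded_classify` function, and the 0.9 threshold are hypothetical names and values. The claims do not say what happens when the two detector models disagree without both being highly confident, so the sketch reports that case as "undetermined" rather than guessing.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical model interface: maps an input sample to per-class probabilities.
Model = Callable[[object], List[float]]


@dataclass
class Verdict:
    status: str           # "adversarial", "clean", or "undetermined"
    label: Optional[int]  # class chosen by the third model; set only when "clean"


def top_class(probs: List[float]) -> Tuple[int, float]:
    """Return (index, probability) of the most likely class."""
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs[label]


def guarded_classify(
    sample: object,
    robust_model: Model,      # trained only on robust features
    non_robust_model: Model,  # trained only on non-robust features
    main_model: Model,        # third model, trained without adversarial protections
    threshold: float = 0.9,   # assumed cutoff for "high confidence"
) -> Verdict:
    r_label, r_conf = top_class(robust_model(sample))
    n_label, n_conf = top_class(non_robust_model(sample))

    # Same classification: not adversarial, so forward to the third model.
    if r_label == n_label:
        main_label, _ = top_class(main_model(sample))
        return Verdict("clean", main_label)

    # Confident disagreement: adversarial example; a caller could raise
    # an attack indication here (compare claim 13).
    if r_conf >= threshold and n_conf >= threshold:
        return Verdict("adversarial", None)

    # Disagreement without high confidence is not covered by the claims.
    return Verdict("undetermined", None)
```

A caller would supply three already-trained models and act on `Verdict.status`, for example by logging an attack indication when the status is "adversarial" and using `Verdict.label` only when the status is "clean".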

Assignees

Inventors

Classifications

  • G06N20/00 (Primary)

    Machine learning · CPC title

  • G06N3/08 (Primary)

    Learning methods · CPC title

  • characterised by the process organisation or structure, e.g. boosting cascade · CPC title

  • Classification techniques · CPC title

  • Combinations of networks · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2021089957A1 cover?
A method and machine learning system for detecting adversarial examples is provided. A first machine learning model is trained with a first machine learning training data set having only training data samples with robust features. A second machine learning model is trained with a second machine learning training data set, the second machine learning training data set having only training data s…
Who is the assignee on this patent?
NXP B.V.
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 25, 2021 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).