Systems and methods for improved adversarial training of machine-learned models

US11494667B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11494667-B2
Application number: US-201815874121-A
Country: US
Kind code: B2
Filing date: Jan 18, 2018
Priority date: Jan 18, 2018
Publication date: Nov 8, 2022
Grant date: Nov 8, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Example aspects of the present disclosure are directed to systems and methods that enable improved adversarial training of machine-learned models. An adversarial training system can generate improved adversarial training examples by optimizing or otherwise tuning one or more hyperparameters that guide the process of generating the adversarial examples. The adversarial training system can determine, solicit, or otherwise obtain a realism score for an adversarial example generated by the system. The realism score can indicate whether the adversarial example appears realistic. The adversarial training system can adjust or otherwise tune the hyperparameters to produce improved adversarial examples (e.g., adversarial examples that are still high-quality and effective while also appearing more realistic). Through creation and use of such improved adversarial examples, a machine-learned model can be trained to be more robust against (e.g., less susceptible to) various adversarial techniques, thereby improving model, device, network, and user security and privacy.
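The tuning loop summarized in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the function `tune_step_size`, the toy perturbation, the toy realism score (a stand-in for the user feedback the patent describes), and all default values are assumptions chosen only for illustration. The sketch generates an adversarial example, scores it, and shrinks a step-size hyperparameter until the score exceeds a threshold.

```python
from typing import Callable, List, Optional, Tuple

Example = List[float]


def tune_step_size(
    example: Example,
    perturb: Callable[[Example, float], Example],
    realism_score: Callable[[Example], float],
    step_size: float = 0.5,   # hypothetical starting value
    threshold: float = 0.8,   # hypothetical realism threshold
    shrink: float = 0.5,      # hypothetical reduction factor per iteration
    max_iters: int = 20,
) -> Tuple[Optional[Example], float]:
    """Generate, score, and adjust until the realism score exceeds the threshold."""
    for _ in range(max_iters):
        adversarial = perturb(example, step_size)
        if realism_score(adversarial) > threshold:
            return adversarial, step_size
        # Not realistic enough: reduce the step size and try again.
        step_size *= shrink
    return None, step_size


# Toy stand-ins (assumptions, not part of the patent).
original = [0.2, 0.5, 0.8]
direction = [1.0, -1.0, 1.0]


def toy_perturb(x: Example, step: float) -> Example:
    # Move each value by `step` along a fixed direction.
    return [xi + step * d for xi, d in zip(x, direction)]


def toy_realism(adv: Example) -> float:
    # Score falls as the largest deviation from the original grows.
    return 1.0 - max(abs(a - o) for a, o in zip(adv, original))


adversarial, final_step = tune_step_size(original, toy_perturb, toy_realism)
```

With these toy values the loop rejects step sizes 0.5 and 0.25 and accepts 0.125. In a real system the realism score would come from a human rater or an application rather than a distance formula.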

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method, the method comprising: obtaining, by one or more computing devices, a training example for a machine-learned model, wherein the training example comprises a plurality of image data values; generating, by the one or more computing devices, an adversarial example from the training example by perturbing, by the one or more computing devices, one or more of the plurality of image data values of the training example according to one or more hyperparameters, wherein generating the adversarial example from the training example comprises: determining, by the one or more computing devices, a direction of a gradient of a loss function that evaluates an output provided by the machine-learned model when given at least a portion of the training example as an input, and perturbing, by the one or more computing devices, the training example in a second direction that is opposite to the direction of the gradient of the loss function to generate the adversarial example; determining, by the one or more computing devices, a realism score for the adversarial example, the realism score being based at least in part on user feedback; adjusting, automatically by the one or more computing devices, at least one of the one or more hyperparameters based at least in part on the realism score for the adversarial example; generating, by the one or more computing devices, an additional adversarial example according to the adjusted one or more hyperparameters; and training, by the one or more computing devices, the machine-learned model based at least in part on the additional adversarial example.

2. The computer-implemented method of claim 1, wherein the one or more hyperparameters comprise a step size hyperparameter that controls a magnitude of a step in the second direction performed during said perturbing.

3. The computer-implemented method of claim 1, wherein the one or more hyperparameters comprise one or both of: a norm hyperparameter that controls a norm applied to the gradient prior to said perturbing; and a loss hyperparameter that controls the loss function for which the gradient is determined.

4. The computer-implemented method of claim 1, wherein determining, by the one or more computing devices, the realism score for the adversarial example comprises: providing, by an on-device machine-learning platform, the adversarial example to an application via an application programming interface; and receiving, by the on-device machine-learning platform, the realism score for the adversarial example from the application via the application programming interface.

5. The computer-implemented method of claim 1, wherein determining, by the one or more computing devices, the realism score for the adversarial example comprises: providing, by the one or more computing devices, the adversarial example for display to a human user; and receiving, by the one or more computing devices, feedback from the human user that indicates whether the adversarial example appears realistic.

6. The computer-implemented method of claim 1, further comprising: iteratively performing said generating, determining, and adjusting until a most recent realism score exceeds a threshold score.

7. The computer-implemented method of claim 6, wherein the threshold score is user-configurable.

8. The computer-implemented method of claim 1, further comprising: iteratively performing said generating, determining, and adjusting, wherein iteratively performing said generating comprises iteratively reducing a step size hyperparameter that controls a magnitude of a step performed when generating the adversarial example.

9. The computer-implemented method of claim 1, wherein the one or more computing devices consist of a user computing device, wherein obtaining, by the one or more computing devices, the training example comprises obtaining, by the user computing device, a personal training example that is stored at a local memory of the user computing device, and wherein the machine-learned model is also stored at the local memory of the user computing device.

10. The computer-implemented method of claim 1, further comprising: when the realism score is greater than a threshold score, storing, by the one or more computing devices, the adversarial example for use in training the machine-learned model; and when the realism score is less than the threshold score, discarding, by the one or more computing devices, the adversarial example.

11. A computer-implemented method, comprising: obtaining, by one or more computing devices, a training example for a machine-learned model, wherein the training example comprises a plurality of data values corresponding to a natural language processing input; generating, by the one or more computing devices, an adversarial example from the training example by perturbing, by the one or more computing devices, one or more of the plurality of data values of the training example according to one or more hyperparameters; generating, by the one or more computing devices, a score for the adversarial example, wherein the score indicates nonconformity of the adversarial example to an acceptable input data space for the natural language processing input; automatically adjusting, by the one or more computing devices, at least one of the one or more hyperparameters based at least in part on the score for the adversarial example, wherein the adjusting is based at least in part on a position of the adversarial example in the input data space relative to a boundary of the acceptable input data space for the model; generating, by the one or more computing devices, an additional adversarial example according to the adjusted one or more hyperparameters, wherein the additional adversarial example conforms to the input data space; and training, by the one or more computing devices, the machine-learned model based at least in part on the additional adversarial example.

12. The method of claim 11, wherein generating the score comprises: inputting, by the one or more computing devices, the adversarial example into a scoring function that evaluates one or more properties of the adversarial example, relative to the input data space, to generate the score.

13. The method of claim 11, wherein generating the score comprises: outputting, by the one or more computing devices, the adversarial example to a user; receiving, by the one or more computing devices, a user input indicative of the position of the outputted adversarial example relative to the position of the training example in the input data space; and generating, by the one or more computing devices, the score based on the received user input.

14. A mobile computing device comprising: an application, the application comprising a machine-learned model; one or more processors; and an on-device adversarial training platform implemented by the one or more processors, the on-device adversarial training platform configured to perform operations comprising: obtaining a training example for the machine-learned model, wherein the training example comprises a plurality of image data values; generating an adversarial example from the training example by perturbing one or more of the plurality of image data values of the training example according to one or more hyperparameters, wherein generating the adversarial example from the training example comprises: determining a direction of a gradient of a loss function that evaluates an output provided by the machine-learned model when given at least a portion of the training example as an input, and perturbing th
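The perturbation step recited in claims 1 to 3 and the store-or-discard rule of claim 10 might be sketched as follows. This is only an illustration under stated assumptions: the linear model, the squared-error loss against an attacker-chosen target, the two norm options (`"sign"` and `"l2"`), and every function name are hypothetical and do not come from the patent. Stepping opposite to the gradient, as the claim words it, lowers this particular loss, which corresponds to pushing the model output toward the chosen target.

```python
import math
from typing import List

Vector = List[float]


def loss(x: Vector, w: Vector, target: float) -> float:
    """Squared error between a toy linear model output (w . x) and a target."""
    out = sum(wi * xi for wi, xi in zip(w, x))
    return 0.5 * (out - target) ** 2


def loss_gradient(x: Vector, w: Vector, target: float) -> Vector:
    """Gradient of the loss with respect to the input x."""
    out = sum(wi * xi for wi, xi in zip(w, x))
    return [(out - target) * wi for wi in w]


def normalize(grad: Vector, norm: str) -> Vector:
    """Apply the (hypothetical) norm hyperparameter to the gradient."""
    if norm == "sign":  # keep only the sign of each component
        return [float((g > 0) - (g < 0)) for g in grad]
    if norm == "l2":    # scale to unit Euclidean length
        length = math.sqrt(sum(g * g for g in grad))
        return [g / length for g in grad] if length > 0 else [0.0] * len(grad)
    raise ValueError(f"unknown norm: {norm}")


def perturb(x: Vector, w: Vector, target: float,
            step_size: float, norm: str = "sign") -> Vector:
    """Step opposite to the gradient direction, scaled by the step size."""
    direction = normalize(loss_gradient(x, w, target), norm)
    return [xi - step_size * d for xi, d in zip(x, direction)]


def keep_or_discard(adversarial: Vector, score: float,
                    threshold: float, store: List[Vector]) -> bool:
    """Store the example when its realism score is above the threshold."""
    if score > threshold:
        store.append(adversarial)
        return True
    return False  # example is discarded


x = [0.2, 0.5, 0.8]
w = [1.0, -2.0, 0.5]
x_adv = perturb(x, w, target=1.0, step_size=0.1, norm="sign")
```

Here `step_size` plays the role of the step size hyperparameter in claim 2 and `norm` the norm hyperparameter in claim 3. Swapping `loss` for another function would correspond to the loss hyperparameter, again under the assumptions above.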

Assignees

Google LLC

Inventors

Classifications

  • G06N3/084 (Primary)

    Backpropagation, e.g. using gradient descent

  • G06N5/04 (Primary)

    Inference or reasoning models

  • Combinations of networks

  • Generating training patterns; Bootstrap methods, e.g. bagging or boosting

  • Machine learning

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11494667B2 cover?
Example aspects of the present disclosure are directed to systems and methods that enable improved adversarial training of machine-learned models. An adversarial training system can generate improved adversarial training examples by optimizing or otherwise tuning one or more hyperparameters that guide the process of generating the adversarial examples. The adversarial training system can determin…
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G06N3/084. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 8, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).