Defending against model inversion attacks on neural networks

US10733292B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10733292-B2
Application number  US-201816031330-A
Country             US
Kind code           B2
Filing date         Jul 10, 2018
Priority date       Jul 10, 2018
Publication date    Aug 4, 2020
Grant date          Aug 4, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Mechanisms are provided for protecting a neural network model against model inversion attacks. The mechanisms generate a decoy dataset comprising decoy data for each class recognized by a neural network model. The mechanisms further configure the neural network model to generate a modified output based on the decoy dataset that directs a gradient of the modified output to the decoy dataset. The neural network model receives and processes input data to generate an actual output. The neural network model modifies one or more actual elements of the actual output to be one or more corresponding modified elements of the modified output, and returns the one or more corresponding modified elements, instead of the one or more actual elements, to the source computing device.
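The abstract does not say how the modified output is computed. The sketch below is one minimal illustration and is not the patent's method. It assumes one hypothetical decoy vector per class and a hypothetical temperature `tau`. The model's confidence scores are replaced with scores derived from the query's distance to each decoy. Entries are then swapped so that the model's actual top-1 label still receives the largest score, in the spirit of claims 3 and 5. An attacker who follows the returned score for a class uphill is guided by distance to that class's decoy rather than by what the model learned from its training data. The names `deceptive_scores`, `decoys`, and `tau` are illustrative only.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def argmax(values: Sequence[float]) -> int:
    return max(range(len(values)), key=lambda k: values[k])


def deceptive_scores(actual: Sequence[float], x: Sequence[float],
                     decoys: Sequence[Sequence[float]], tau: float = 1.0) -> List[float]:
    """Hypothetical output modification (illustrative sketch only).

    actual: the first model's real confidence scores, one per class.
    x:      the query input.
    decoys: one hypothetical decoy vector per class, in class order.
    tau:    hypothetical temperature controlling how sharp the scores are.
    """
    # Scores that depend only on how close x is to each class's decoy, so
    # gradient-following on them is steered by the decoys, not the training data.
    modified = softmax([-squared_distance(x, d) / tau for d in decoys])
    # Keep the real top-1 label: move the largest decoy-based score onto it
    # (a no-op when the two already agree).
    true_top, fake_top = argmax(actual), argmax(modified)
    modified[true_top], modified[fake_top] = modified[fake_top], modified[true_top]
    return modified
```

The swap pins only the top label. The remaining scores therefore carry no calibrated information about the real model. That is the point of the deception, but it is also a cost to honest clients who read those scores.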

First claim

Opening claim text (preview).

What is claimed is:

1. A method for protecting a neural network model against model inversion attacks, the method being performed in a data processing system comprising at least one processor and at least one memory, the at least one memory comprising instructions executed by the at least one processor to specifically configure the at least one processor to implement the neural network model and a targeted deceptive gradient engine, the method comprising: generating, by the targeted deceptive gradient engine, a decoy dataset comprising decoy data for each class recognized by the neural network model; configuring, by the targeted deceptive gradient engine, a first neural network model to generate a modified output based on the decoy dataset that directs a gradient of the modified output to the decoy dataset; receiving, by the first neural network model, from a source computing device, input data to be processed by the first neural network model; processing, by the first neural network model, the input data to generate an actual output; modifying, by the first neural network model, one or more actual elements of the actual output to be one or more corresponding modified elements of the modified output; and returning, by the first neural network model, the one or more corresponding modified elements instead of the one or more actual elements, to the source computing device.

2. The method of claim 1, wherein the modified output obscures a gradient of a loss function of the first neural network model.

3. The method of claim 1, wherein the one or more modified elements of the modified output provide a correct classification of the input data, but modified confidence scores associated with the classifications that direct a gradient of a loss function of the first neural network model towards the decoy dataset.

4. The method of claim 1, wherein the modified output equates a gradient of a loss function of the first neural network model to a difference between the decoy data and training data used to train the first neural network model for each class recognized by the first neural network model.

5. The method of claim 1, wherein the modified output maintains a largest class label between the modified output and actual output of the first neural network model to be the same largest class label.

6. The method of claim 1, further comprising: training a second neural network model with an original training dataset and the decoy dataset to identify input data as being either actual input data corresponding to the original training dataset or decoy data corresponding to the decoy dataset; and determining, by the second neural network model, whether the received input data, for processing by the first neural network model, approximates decoy data in the decoy dataset.

7. The method of claim 6, wherein the first neural network model processes the input data in response to the second neural network model determining that the received input data does not approximate decoy data in the decoy dataset.

8. The method of claim 6, further comprising: performing, by a protective action logic engine executing in the data processing system, a protective action in response to a determination by the second neural network model that the received input data approximates decoy data in the decoy dataset.

9. The method of claim 8, wherein the protective action comprises at least one of logging a request associated with the received input data, sending a notification message to a system administrator, or preventing access to a protected resource.

10. The method of claim 1, wherein the data processing system is a cloud computing system comprising a plurality of server computing devices, and wherein the at least one processor and at least one memory comprise at least one processor and at least one memory in each server computing device in the plurality of server computing devices.

11. A computer program product comprising a computer readable storage medium having a computer readable program stored therein, wherein the computer readable program, when executed on a data processing system, causes the data processing system to specifically configure the data processing system to implement a first neural network model and a targeted deceptive gradient engine, the data processing system being further configured by the computer readable program to: generate, by the targeted deceptive gradient engine, a decoy dataset comprising decoy data for each class recognized by the neural network model; configure, by the targeted deceptive gradient engine, a first neural network model to generate a modified output based on the decoy dataset that directs a gradient of the modified output to the decoy dataset; receive, by the first neural network model, from a source computing device, input data to be processed by the first neural network model; process, by the first neural network model, the input data to generate an actual output; modify, by the first neural network model, one or more actual elements of the actual output to be one or more corresponding modified elements of the modified output; and return, by the first neural network model, the one or more corresponding modified elements instead of the one or more actual elements, to the source computing device.

12. The computer program product of claim 11, wherein the modified output obscures a gradient of a loss function of the first neural network model.

13. The computer program product of claim 11, wherein the one or more modified elements of the modified output provide a correct classification of the input data, but modified confidence scores associated with the classifications that direct a gradient of a loss function of the first neural network model towards the decoy dataset.

14. The computer program product of claim 11, wherein the modified output equates a gradient of a loss function of the first neural network model to a difference between the decoy data and training data used to train the first neural network model for each class recognized by the first neural network model.

15. The computer program product of claim 11, wherein the modified output maintains a largest class label between the modified output and actual output of the first neural network model to be the same largest class label.

16. The computer program product of claim 11, wherein the data processing system is further configured by the computer readable program to: train a second neural network model with an original training dataset and the decoy dataset to identify input data as being either actual input data corresponding to the original training dataset or decoy data corresponding to the decoy dataset; and determine, by the second neural network model, whether the received input data, for processing by the first neural network model, approximates decoy data in the decoy dataset.

17. The computer program product of claim 16, wherein the first neural network model processes the input data in response to the second neural network model determining that the received input data does not approximate decoy data in the decoy dataset.

18. The computer program product of claim 16, wherein the data processing system is further configured by the computer readable program to: perform, by a protective action logic engine executing in the data processing system, a protective action in response to a determination by the second neural network model that the received input data approximates decoy data in the decoy dataset.

19. The computer program produc
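Claims 6 to 9 add a second model that screens each incoming query for decoy-like data before the first model answers it. The sketch below is a hedged illustration of that routing and is not the patent's implementation. It assumes a single logistic unit (the simplest possible neural network), trained by plain stochastic gradient descent, as a stand-in for the second neural network model. It implements the protective action as logging plus refusal, two of the options listed in claim 9. The function names, the 0.5 threshold, the logger name, and the training hyperparameters are all hypothetical.

```python
import logging
import math
from typing import Callable, List, Optional, Sequence, Tuple

logger = logging.getLogger("decoy_guard")  # hypothetical logger name

Detector = Tuple[List[float], float]  # (weights, bias) of one logistic unit


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_decoy_detector(real: Sequence[Sequence[float]],
                         decoy: Sequence[Sequence[float]],
                         epochs: int = 1000, lr: float = 0.1) -> Detector:
    """Fit a logistic unit that outputs ~1 for decoy-like inputs, ~0 for real ones."""
    data = [(list(v), 0.0) for v in real] + [(list(v), 1.0) for v in decoy]
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            # Gradient of the log loss w.r.t. the logit is (prediction - label).
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def decoy_probability(x: Sequence[float], detector: Detector) -> float:
    """Detector's estimate that x approximates decoy data."""
    w, b = detector
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def guarded_predict(x: Sequence[float], model: Callable[[Sequence[float]], object],
                    detector: Detector, threshold: float = 0.5) -> Optional[object]:
    """Run the first model only if the detector does not flag the query."""
    p = decoy_probability(x, detector)
    if p >= threshold:
        # Protective action: log the request and refuse to answer.
        logger.warning("query resembles decoy data (p=%.3f); request refused", p)
        return None
    return model(x)
```

A logistic unit can only draw a straight boundary between real and decoy data. A practical detector would more plausibly be a deeper network, as the claims' wording suggests, but the routing logic around it would stay the same.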

Assignees

  • IBM

Classifications

  • Combinations of networks (CPC title)

  • Convolutional networks [CNN, ConvNet] (CPC title)

  • Supervised learning (CPC title)

  • Learning methods (CPC title)

  • G06F21/554 (primary): involving event detection and direct action (CPC title)


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10733292B2 cover?
Mechanisms are provided for protecting a neural network model against model inversion attacks. The mechanisms generate a decoy dataset comprising decoy data for each class recognized by a neural network model. The mechanisms further configure the neural network model to generate a modified output based on the decoy dataset that directs a gradient of the modified output to the decoy dataset. The…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F21/554. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 4, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).