Data privacy protected machine learning systems

US11568306B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-11568306-B2
Application numberUS-201916398757-A
CountryUS
Kind codeB2
Filing dateApr 30, 2019
Priority dateFeb 25, 2019
Publication dateJan 31, 2023
Grant dateJan 31, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Approaches for private and interpretable machine learning systems include a system for processing a query. The system includes one or more teacher modules for receiving a query and generating a respective output, one or more privacy sanitization modules for privacy sanitizing the respective output of each of the one or more teacher modules, and a student module for receiving a query and the privacy sanitized respective output of each of the one or more teacher modules and generating a result. Each of the one or more teacher modules is trained using a respective private data set. The student module is trained using a public data set. In some embodiments, human understandable interpretations of an output from the student module is provided to a model user.

First claim

Opening claim text (preview).

What is claimed is: 1. A method of training a student module with knowledge from one or more teacher modules, the method comprising: generating, via the student module with a first set of parameters, a set of student outputs in response to a batch of queries; computing, by a processor, a loss metric comparing the set of student outputs and a set of teacher outputs that are generated via at least one teacher module using the batch of queries, wherein the at least one teacher module has been trained with a dataset not accessible to the student module; perturbing the loss metric with noise; and generating a second set of parameters for an updated student module based on the perturbed loss metric. 2. The method of claim 1 , wherein the at least one teacher module is trained with a set of private data samples inaccessible to the student module. 3. The method of claim 1 , further comprising: causing the batch of queries to be sent to the at least one teacher module; and obtaining the set of teacher outputs from a last hidden layer of the at least one teacher module in response to an input of the batch of queries. 4. The method of claim 1 , wherein computing the loss metric based on the set of student outputs and the set of teacher outputs comprises: computing a set of student classification outputs based on the set of student outputs; computing a set of teacher classification outputs based on the set of teacher outputs; and computing a cross-entropy loss between the set of student classification outputs and the set of teacher classification outputs. 5. The method of claim 1 , wherein perturbing the loss metric with noise comprises: bounding each loss metric computed corresponding to each teacher module with a pre-defined threshold; computing an average of the bounded loss metrics for all teacher modules; and adding a perturbation noise to the computed average of the bounded loss metrics. 6. The method of claim 5 , wherein adding the perturbation noise to the computed average of the bounded loss metrics comprises: generating a Gaussian noise having a zero mean and a variance based on a private budget parameter and the pre-defined threshold. 7. The method of claim 1 , wherein generating the second set of parameters for an updated student module based on the perturbed loss metric comprises: using backpropagation on the student module by the perturbed loss metric to obtain the second set of parameters. 8. The method of claim 1 , further comprising using hint learning with the at least one teacher module to update at least the subset of the first set of parameters for the student module by: generating a set of intermediate student outputs via a student guided layer configured with a set of student guided layer parameters in response to an input of the batch of queries; and generating a set of adapted intermediate student outputs via an adaptation layer configured with a set of adaptation layer parameters, wherein the at least the subset of the first set of parameters for the student module includes the set of student guided layer parameters and the set of adaptation layer parameters. 9. The method of claim 8 , further comprising: computing a hint loss metric based on the set of adapted intermediate student outputs and the set of teacher outputs that is generated via at least one teacher module using the batch of queries; perturbing the hint loss metric with the noise; and using backpropagation by the perturbed hint loss metric to update the set of student guided layer parameters and the set of adaptation layer parameters. 10. The method of claim 1 , further comprising: incorporating a interpretation module that provides a human understandable interpretation of outputs with the student module. 11. A system for data privacy protected training, the system comprising: one or more teacher modules for generating at least one set of teacher outputs in response to a batch of queries from a data set; a student module for generating a set of student outputs in response to the batch of queries; wherein each of the one or more teacher modules has been trained with a respective dataset not accessible to the student module; one or more privacy sanitization modules for perturbing, a loss metric computed comparing the set of teacher outputs and the set of student outputs, with noise and providing the perturbed loss metric to the student module; and wherein the student module is updated using the perturbed loss metric. 12. The system of claim 11 , further comprising: a loss module for obtaining the set of teacher outputs from a last hidden layer of the at least one teacher module in response to an input of the batch of queries. 13. The system of claim 12 , wherein the loss module is further configured for: computing a set of student classification outputs based on the set of student outputs; computing a set of teacher classification outputs based on the set of teacher outputs; and computing the loss metric as a cross-entropy loss between the set of student classification outputs and the set of teacher classification outputs. 14. The system of claim 11 , wherein the one or more privacy sanitization modules are further configured for: bounding each loss metric computed corresponding to each teacher module with a pre-defined threshold; computing an average of the bounded loss metrics for all teacher modules; and adding a perturbation noise to the computed average of the bounded loss metrics. 15. The system of claim 14 , wherein the one or more privacy sanitization modules are further configured for: generating a Gaussian noise having a zero mean and a variance based on a private budget parameter and the pre-defined threshold. 16. The system of claim 11 , wherein the student module is updated using backpropagation by the perturbed loss metric to obtain an updated set of parameters for the student module. 17. The system of claim 16 , further comprising: a hint loss module for using hint learning with the at least one teacher module to update at least a subset of the set of parameters for the student module before using the perturbed loss metric to update the student module. 18. The system of claim 17 , wherein the student module is further configured for: generating a set of intermediate student outputs via a student guided layer configured with a set of student guided layer parameters in response to an input of the batch of queries; and generating a set of adapted intermediate student outputs via an adaptation layer configured with a set of adaptation layer parameters; wherein the at least the subset of the set of parameters for the student module includes the set of student guided layer parameters and the set of adaptation layer parameters. 19. The system of claim 18 , wherein the hint loss module is further configured for: computing a hint loss metric based on the set of adapted intermediate student outputs and the set of teacher outputs that is generated via at least one teacher module using the batch of queries; perturbing the hint loss metric with the noise; and using backpropagation by the perturbed hint loss metric to update the set of student guided layer parameters and the set of adaptation layer parameters. 20. A processor-readable non-transitory storage medium storing processor-executable instructions of training a student module with perturbed knowledge from one or more teacher modules, the instructions executable by a processor to: generate, via the student modul

Assignees

Inventors

Classifications

  • by anonymising data, e.g. decorrelating personal data from the owner's identification · CPC title

  • Physics · mapped topic

  • G06N20/00Primary

    Machine learning · CPC title

  • Backpropagation, e.g. using gradient descent · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11568306B2 cover?
Approaches for private and interpretable machine learning systems include a system for processing a query. The system includes one or more teacher modules for receiving a query and generating a respective output, one or more privacy sanitization modules for privacy sanitizing the respective output of each of the one or more teacher modules, and a student module for receiving a query and the pri…
Who is the assignee on this patent?
Salesforce Com Inc
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Jan 31 2023 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).