Computing trade-offs between privacy and accuracy of data analysis

US11783083B2 · US · B2

Patent metadata
Publication number: US-11783083-B2
Application number: US-202117206712-A
Country: US
Kind code: B2
Filing date: Mar 19, 2021
Priority date: Mar 19, 2021
Publication date: Oct 10, 2023
Grant date: Oct 10, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In an approach for computing trade-offs between privacy and accuracy of data analysis on building a learning model, a processor receives a dataset for training a model. The dataset includes one or more pre-identified sensitive data fields. The processor determines a weight of each sensitive data field for the model. The processor evaluates resource cost of applying a privacy preservation technique to the one or more pre-identified sensitive data fields. The processor identifies correlation among the sensitive data fields. The processor presents a comparison of options for training the model, in terms of tradeoffs of accuracy for training the model and the resource cost of the privacy preservation technique.
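The steps in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract does not say how a field's weight is computed, so the sketch uses the absolute Pearson correlation with the training target as a stand-in, and the cost figures in `ASSUMED_COST` are invented, unitless placeholders. The names `Option`, `ASSUMED_COST`, and `compare_options` are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (var_x * var_y) ** 0.5


@dataclass
class Option:
    field: str
    technique: str
    weight: float         # assumed proxy for the field's importance to the model
    relative_cost: float  # assumed, unitless resource cost of the technique


# Illustrative placeholders only; the patent text gives no cost numbers.
ASSUMED_COST = {
    "exclude_column": 0.0,
    "differential_privacy": 1.0,
    "secure_multiparty_computation": 10.0,
}


def compare_options(columns, target, sensitive_fields):
    """List every (field, technique) option, most important fields first,
    cheapest technique first within each field."""
    options = []
    for field in sensitive_fields:
        weight = abs(pearson(columns[field], target))
        for technique, cost in ASSUMED_COST.items():
            options.append(Option(field, technique, round(weight, 3), cost))
    return sorted(options, key=lambda o: (-o.weight, o.relative_cost))


if __name__ == "__main__":
    columns = {"age": [1, 2, 3, 4], "zip": [5, 1, 4, 2]}
    target = [2, 4, 6, 8]
    for option in compare_options(columns, target, ["age", "zip"]):
        print(option)
```

Sorting by weight and then by cost puts the fields that matter most to accuracy at the top, with the cheapest protection for each listed first. That is one plausible way to present the trade-off the abstract describes.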

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: receiving, by one or more processors, a dataset for training a model, the dataset including one or more pre-identified sensitive data fields; determining, by one or more processors, a weight of each sensitive data field for the model; evaluating, by one or more processors, resource cost of applying a privacy preservation technique to the one or more pre-identified sensitive data fields; identifying, by one or more processors, correlation among the sensitive data fields; and presenting, by one or more processors, a comparison of options for training the model, in terms of tradeoffs of accuracy for training the model and the resource cost of the privacy preservation technique.

2. The computer-implemented method of claim 1, wherein a sensitivity level of the sensitive data fields is defined as in a range between 0 and 1, where 0 is the least sensitive and 1 is the most sensitive.

3. The computer-implemented method of claim 1, further comprising: replacing, by one or more processors, features that have been declared as more sensitive features with less sensitive features; and proceeding, by one or more processors, with training the model.

4. The computer-implemented method of claim 1, wherein the privacy preservation technique includes excluding a column and securing multi-party computation and differential privacy.

5. The computer-implemented method of claim 1, further comprising: ranking, by one or more processors, the sensitive data fields based on the correlation.

6. The computer-implemented method of claim 1, wherein the sensitive data fields include information selected from the group consisting of: personally identifying information, private health information, and business sensitive information.

7. The computer-implemented method of claim 1, wherein the resource cost of the privacy preservation technique is associated with computational resources and speed of computation of applying the privacy preservation technique to the pre-identified sensitive data fields.

8. A computer program product comprising: one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media, the program instructions comprising: program instructions to receive a dataset for training a model, the dataset including one or more pre-identified sensitive data fields; program instructions to determine a weight of each sensitive data field for the model; program instructions to evaluate resource cost of applying a privacy preservation technique to the one or more pre-identified sensitive data fields; program instructions to identify correlation among the sensitive data fields; and program instructions to present a comparison of options for training the model, in terms of tradeoffs of accuracy for training the model and the resource cost of the privacy preservation technique.

9. The computer program product of claim 8, wherein a sensitivity level of the sensitive data fields is defined as in a range between 0 and 1, where 0 is the least sensitive and 1 is the most sensitive.

10. The computer program product of claim 8, further comprising: program instructions to replace features that have been declared as more sensitive features with less sensitive features; and program instructions to proceed with training the model.

11. The computer program product of claim 8, wherein the privacy preservation technique includes excluding a column and securing multi-party computation and differential privacy.

12. The computer program product of claim 8, further comprising: program instructions to rank the sensitive data fields based on the correlation.

13. The computer program product of claim 8, wherein the sensitive data fields include information selected from the group consisting of: personally identifying information, private health information, and business sensitive information.

14. The computer program product of claim 8, wherein the resource cost of the privacy preservation technique is associated with computational resources and speed of computation of applying the privacy preservation technique to the pre-identified sensitive data fields.

15. A computer system comprising: one or more computer processors, one or more computer readable storage media, and program instructions stored on the one or more computer readable storage media for execution by at least one of the one or more computer processors, the program instructions comprising: program instructions to receive a dataset for training a model, the dataset including one or more pre-identified sensitive data fields; program instructions to determine a weight of each sensitive data field for the model; program instructions to evaluate resource cost of applying a privacy preservation technique to the one or more pre-identified sensitive data fields; program instructions to identify correlation among the sensitive data fields; and program instructions to present a comparison of options for training the model, in terms of tradeoffs of accuracy for training the model and the resource cost of the privacy preservation technique.

16. The computer system of claim 15, wherein a sensitivity level of the sensitive data fields is defined as in a range between 0 and 1, where 0 is the least sensitive and 1 is the most sensitive.

17. The computer system of claim 15, further comprising: program instructions to replace features that have been declared as more sensitive features with less sensitive features; and program instructions to proceed with training the model.

18. The computer system of claim 15, wherein the privacy preservation technique includes excluding a column and securing multi-party computation and differential privacy.

19. The computer system of claim 15, further comprising: program instructions to rank the sensitive data fields based on the correlation.

20. The computer system of claim 15, wherein the sensitive data fields include information selected from the group consisting of: personally identifying information, private health information, and business sensitive information.
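
The dependent claims on sensitivity levels, correlation-based ranking, and feature replacement can be illustrated with a short Python sketch. It is a hedged reading rather than the claimed method: the claims do not say how a replacement feature is chosen, so the sketch assumes a less sensitive field may stand in for a more sensitive one when the two are strongly correlated. The function names and the `min_corr` threshold of 0.8 are hypothetical, and the pairwise correlations are taken as precomputed input.

```python
def validate_sensitivity(levels):
    """Check that every sensitivity level lies in [0, 1],
    where 0 is least sensitive and 1 is most sensitive."""
    for field, level in levels.items():
        if not 0.0 <= level <= 1.0:
            raise ValueError(f"sensitivity of {field!r} must be in [0, 1], got {level}")
    return levels


def rank_by_correlation(pairwise):
    """Rank fields by their strongest absolute correlation with any other
    sensitive field. `pairwise` maps (field_a, field_b) to a correlation."""
    strongest = {}
    for (a, b), r in pairwise.items():
        for field in (a, b):
            strongest[field] = max(strongest.get(field, 0.0), abs(r))
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(strongest, key=lambda f: (-strongest[f], f))


def suggest_replacements(levels, pairwise, min_corr=0.8):
    """Propose replacing a more sensitive field with a less sensitive one
    when the pair is strongly correlated (assumed selection rule)."""
    suggestions = {}
    for (a, b), r in pairwise.items():
        if abs(r) < min_corr or levels[a] == levels[b]:
            continue
        more, less = (a, b) if levels[a] > levels[b] else (b, a)
        # Prefer the least sensitive stand-in if several qualify.
        if more not in suggestions or levels[less] < levels[suggestions[more]]:
            suggestions[more] = less
    return suggestions


if __name__ == "__main__":
    levels = validate_sensitivity({"ssn": 1.0, "zip": 0.4, "age_band": 0.2})
    pairwise = {("ssn", "zip"): 0.1, ("zip", "age_band"): 0.9, ("ssn", "age_band"): -0.3}
    print(rank_by_correlation(pairwise))
    print(suggest_replacements(levels, pairwise))
```

In this made-up example, `zip` and `age_band` are strongly correlated, so the sketch proposes training on the less sensitive `age_band` instead of `zip`. `ssn` has no close substitute and would need one of the other privacy preservation techniques.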

Assignees

Inventors

Classifications

  • Protecting personal data, e.g. for financial or medical purposes

  • Machine learning

  • Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound

  • Ensemble learning

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11783083B2 cover?
In an approach for computing trade-offs between privacy and accuracy of data analysis on building a learning model, a processor receives a dataset for training a model. The dataset includes one or more pre-identified sensitive data fields. The processor determines a weight of each sensitive data field for the model. The processor evaluates resource cost of applying a privacy preservation techni…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F21/6245. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 10, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).