Generating hyper-parameters for machine learning models using modified bayesian optimization based on accuracy and training efficiency
US-2021295191-A1 · Sep 23, 2021 · US
US11783083B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11783083-B2 |
| Application number | US-202117206712-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 19, 2021 |
| Priority date | Mar 19, 2021 |
| Publication date | Oct 10, 2023 |
| Grant date | Oct 10, 2023 |
Official abstract text for this publication.
In an approach for computing trade-offs between privacy and accuracy of data analysis on building a learning model, a processor receives a dataset for training a model. The dataset includes one or more pre-identified sensitive data fields. The processor determines a weight of each sensitive data field for the model. The processor evaluates resource cost of applying a privacy preservation technique to the one or more pre-identified sensitive data fields. The processor identifies correlation among the sensitive data fields. The processor presents a comparison of options for training the model, in terms of tradeoffs of accuracy for training the model and the resource cost of the privacy preservation technique.
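The weighting and correlation steps in the abstract can be illustrated with a minimal sketch. The patent text shown here does not say how weights or correlations are computed, so everything below is an assumption: the absolute Pearson correlation with the target stands in for a field's weight, pairwise absolute Pearson correlation stands in for "correlation among the sensitive data fields", and the column names and toy values are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def field_weights(columns, target, sensitive_fields):
    """Assumed proxy for a field's weight: |correlation with the target|."""
    return {name: abs(pearson(columns[name], target)) for name in sensitive_fields}


def field_correlations(columns, sensitive_fields):
    """Pairwise |correlation| among the sensitive fields."""
    pairs = {}
    for i, a in enumerate(sensitive_fields):
        for b in sensitive_fields[i + 1:]:
            pairs[(a, b)] = abs(pearson(columns[a], columns[b]))
    return pairs


def rank_pairs(pairs):
    """Order field pairs from most to least correlated."""
    return sorted(pairs, key=pairs.get, reverse=True)


# Hypothetical toy dataset; names and values are illustrative only.
columns = {
    "age": [25, 35, 45, 55, 65],
    "income": [30, 40, 50, 60, 70],
    "zip_digit": [3, 1, 4, 1, 5],
}
target = [0, 0, 1, 1, 1]
sensitive = ["age", "income", "zip_digit"]

weights = field_weights(columns, target, sensitive)
correlations = field_correlations(columns, sensitive)
ranking = rank_pairs(correlations)
```

In this toy data `age` and `income` are perfectly linearly related, so they form the top-ranked pair. A strongly correlated pair like this suggests that protecting one field while leaving the other in the clear may still leak information, which is one plausible reason to identify correlation before choosing a technique.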
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method comprising: receiving, by one or more processors, a dataset for training a model, the dataset including one or more pre-identified sensitive data fields; determining, by one or more processors, a weight of each sensitive data field for the model; evaluating, by one or more processors, resource cost of applying a privacy preservation technique to the one or more pre-identified sensitive data fields; identifying, by one or more processors, correlation among the sensitive data fields; and presenting, by one or more processors, a comparison of options for training the model, in terms of tradeoffs of accuracy for training the model and the resource cost of the privacy preservation technique.
2. The computer-implemented method of claim 1, wherein a sensitivity level of the sensitive data fields is defined as in a range between 0 and 1, where 0 is the least sensitive and 1 is the most sensitive.
3. The computer-implemented method of claim 1, further comprising: replacing, by one or more processors, features that have been declared as more sensitive features with less sensitive features; and proceeding, by one or more processors, with training the model.
4. The computer-implemented method of claim 1, wherein the privacy preservation technique includes excluding a column and securing multi-party computation and differential privacy.
5. The computer-implemented method of claim 1, further comprising: ranking, by one or more processors, the sensitive data fields based on the correlation.
6. The computer-implemented method of claim 1, wherein the sensitive data fields include information selected from the group consisting of: personally identifying information, private health information, and business sensitive information.
7. The computer-implemented method of claim 1, wherein the resource cost of the privacy preservation technique is associated with computational resources and speed of computation of applying the privacy preservation technique to the pre-identified sensitive data fields.
8. A computer program product comprising: one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media, the program instructions comprising: program instructions to receive a dataset for training a model, the dataset including one or more pre-identified sensitive data fields; program instructions to determine a weight of each sensitive data field for the model; program instructions to evaluate resource cost of applying a privacy preservation technique to the one or more pre-identified sensitive data fields; program instructions to identify correlation among the sensitive data fields; and program instructions to present a comparison of options for training the model, in terms of tradeoffs of accuracy for training the model and the resource cost of the privacy preservation technique.
9. The computer program product of claim 8, wherein a sensitivity level of the sensitive data fields is defined as in a range between 0 and 1, where 0 is the least sensitive and 1 is the most sensitive.
10. The computer program product of claim 8, further comprising: program instructions to replace features that have been declared as more sensitive features with less sensitive features; and program instructions to proceed with training the model.
11. The computer program product of claim 8, wherein the privacy preservation technique includes excluding a column and securing multi-party computation and differential privacy.
12. The computer program product of claim 8, further comprising: program instructions to rank the sensitive data fields based on the correlation.
13. The computer program product of claim 8, wherein the sensitive data fields include information selected from the group consisting of: personally identifying information, private health information, and business sensitive information.
14. The computer program product of claim 8, wherein the resource cost of the privacy preservation technique is associated with computational resources and speed of computation of applying the privacy preservation technique to the pre-identified sensitive data fields.
15. A computer system comprising: one or more computer processors, one or more computer readable storage media, and program instructions stored on the one or more computer readable storage media for execution by at least one of the one or more computer processors, the program instructions comprising: program instructions to receive a dataset for training a model, the dataset including one or more pre-identified sensitive data fields; program instructions to determine a weight of each sensitive data field for the model; program instructions to evaluate resource cost of applying a privacy preservation technique to the one or more pre-identified sensitive data fields; program instructions to identify correlation among the sensitive data fields; and program instructions to present a comparison of options for training the model, in terms of tradeoffs of accuracy for training the model and the resource cost of the privacy preservation technique.
16. The computer system of claim 15, wherein a sensitivity level of the sensitive data fields is defined as in a range between 0 and 1, where 0 is the least sensitive and 1 is the most sensitive.
17. The computer system of claim 15, further comprising: program instructions to replace features that have been declared as more sensitive features with less sensitive features; and program instructions to proceed with training the model.
18. The computer system of claim 15, wherein the privacy preservation technique includes excluding a column and securing multi-party computation and differential privacy.
19. The computer system of claim 15, further comprising: program instructions to rank the sensitive data fields based on the correlation.
20. The computer system of claim 15, wherein the sensitive data fields include information selected from the group consisting of: personally identifying information, private health information, and business sensitive information.
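
As a rough illustration of the sensitivity range in claims 2, 9, and 16 and of the "comparison of options" in the independent claims, the sketch below validates sensitivity levels in [0, 1] and lists candidate techniques ordered by resource cost. The claims do not specify data structures or how accuracy and cost are estimated; the `Option` type, the function names, and every number here are hypothetical placeholders, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    technique: str
    est_accuracy: float   # placeholder estimate, not a measured result
    resource_cost: float  # placeholder relative cost (higher = more expensive)


def validate_sensitivity(levels):
    """Check that each sensitivity level lies in [0, 1] (0 = least, 1 = most)."""
    for name, level in levels.items():
        if not 0.0 <= level <= 1.0:
            raise ValueError(f"sensitivity of {name!r} must be in [0, 1], got {level}")
    return levels


def most_sensitive_first(levels):
    """Field names ordered from most to least sensitive."""
    return sorted(levels, key=levels.get, reverse=True)


def compare_options(options):
    """Sort options from cheapest to most expensive and format one row each."""
    ordered = sorted(options, key=lambda o: o.resource_cost)
    return [
        f"{o.technique:<32} accuracy={o.est_accuracy:.2f} cost={o.resource_cost:.1f}"
        for o in ordered
    ]


# Hypothetical inputs; techniques are the ones named in claim 4.
levels = validate_sensitivity({"ssn": 1.0, "diagnosis": 0.8, "zip_code": 0.3})
options = [
    Option("differential privacy", est_accuracy=0.88, resource_cost=2.0),
    Option("secure multi-party computation", est_accuracy=0.93, resource_cost=9.0),
    Option("exclude column", est_accuracy=0.81, resource_cost=1.0),
]

for row in compare_options(options):
    print(row)
```

Sorting by cost is only one way to present the trade-off; a real comparison might instead show both axes side by side and leave the choice to the user.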
Protecting personal data, e.g. for financial or medical purposes · CPC title
Machine learning · CPC title
Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound · CPC title
Ensemble learning · CPC title