Data de-identification based on detection of allowable configurations for data de-identification processes

US10915662B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10915662-B2
Application number: US-201715843049-A
Country: US
Kind code: B2
Filing date: Dec 15, 2017
Priority date: Dec 15, 2017
Publication date: Feb 9, 2021
Grant date: Feb 9, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system for de-identifying data determines one or more identifiers that identify an entity of a dataset. One or more data de-identification processes are identified and associated with the determined one or more identifiers. Each data de-identification process is associated with one or more sets of configuration options indicating information to preserve in the dataset. The identified data de-identification processes are executed on the dataset in accordance with the associated sets of configuration options to generate datasets with varying preserved information. The generated datasets are evaluated for privacy vulnerabilities and a data de-identification process and an associated set of configuration options are selected based on the evaluation. The selected data de-identification process is executed on the dataset according to the associated set of configuration options to produce a resulting de-identified data set. Embodiments include a method and computer program product for de-identifying data in substantially the same manner described above.
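The flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the sample rows, the `age_bucket` option, the function names, and the use of a k-anonymity style group-size check as the "privacy vulnerability" test are all hypothetical choices made for the example.

```python
from collections import Counter

# Hypothetical sample data: "age" and "zip" are treated as quasi-identifiers.
ROWS = [
    {"age": "23", "zip": "02139"},
    {"age": "27", "zip": "02139"},
    {"age": "31", "zip": "02139"},
    {"age": "38", "zip": "02139"},
]
QUASI_IDENTIFIERS = ["age", "zip"]

# Each template is one set of configuration options (assumed shape). A smaller
# bucket width preserves more information about the original age.
TEMPLATES = [{"age_bucket": 1}, {"age_bucket": 5}, {"age_bucket": 10}, {"age_bucket": 20}]


def generalize(rows, options):
    """One de-identification process: replace each exact age with a range."""
    width = options["age_bucket"]
    out = []
    for row in rows:
        low = (int(row["age"]) // width) * width
        out.append({**row, "age": f"{low}-{low + width - 1}"})
    return out


def has_vulnerability(rows, quasi_identifiers, k=2):
    """Assumed evaluation: flag a dataset if any quasi-identifier
    combination is shared by fewer than k records."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) < k


def select_configuration(rows, quasi_identifiers, templates):
    """Generate one dataset per option set, evaluate each, and pick the
    safe option set that preserves the most information."""
    candidates = [(opts, generalize(rows, opts)) for opts in templates]
    safe = [opts for opts, data in candidates
            if not has_vulnerability(data, quasi_identifiers)]
    if not safe:
        return None
    return min(safe, key=lambda opts: opts["age_bucket"])


selected = select_configuration(ROWS, QUASI_IDENTIFIERS, TEMPLATES)
# Final step: run the selected process and options on the original dataset.
result = generalize(ROWS, selected) if selected else None
```

In this sketch the bucket widths 1 and 5 leave every record unique, so they are rejected, and the narrowest width that passes the check is selected. A real system would likely combine several processes (masking, suppression, generalization) and a richer vulnerability evaluation.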

First claim

Opening claim text (preview).

What is claimed is:

1. A method of de-identifying data comprising: determining one or more identifiers that identify an entity of a dataset; identifying one or more data de-identification processes associated with the determined one or more identifiers, wherein a plurality of sets of configuration options indicating information to preserve in the dataset are associated with the identified one or more data de-identification processes; executing, via a processor, the identified one or more data de-identification processes on the dataset in accordance with the associated sets of configuration options to generate datasets with varying preserved information; replacing two or more attributes of a generated dataset with intersecting information with a consolidated attribute, wherein the consolidated attribute includes information from an attribute of the two or more attributes having greater precision; evaluating, via a processor, the generated datasets for privacy vulnerabilities; selecting, via a processor, a data de-identification process and an associated set of configuration options producing a generated dataset lacking privacy vulnerabilities based on the evaluation; and executing, via a processor, the selected data de-identification process on the dataset according to the associated set of configuration options to produce a resulting de-identified dataset.

2. The method of claim 1, wherein determining the one or more identifiers further comprises: determining one or more direct identifiers, wherein the identified one or more data de-identification processes associated with the determined one or more identifiers include data masking processes.

3. The method of claim 1, wherein determining the one or more identifiers further comprises: determining a plurality of quasi-identifiers, wherein the identified one or more data de-identification processes associated with the determined one or more identifiers include data generalization or data suppression.

4. The method of claim 1, wherein the generated datasets are in the form of a table, and executing the identified one or more data de-identification processes further comprises: consolidating two or more columns of a first generated dataset to produce a column with information more specific than the two or more columns.

5. The method of claim 1, wherein evaluating the generated datasets for privacy vulnerabilities further comprises: determining a presence of a link between data for an entity in a first generated dataset and data for a known entity in a publicly available dataset to indicate a privacy vulnerability for the first generated dataset.

6. The method of claim 1, wherein evaluating the generated datasets for privacy vulnerabilities further comprises: determining a presence of a set of quasi-identifiers in a first generated dataset introduced by a corresponding data de-identification process and associated set of configuration options to indicate a privacy vulnerability for the first generated dataset.

7. The method of claim 1, further comprising: generating a series of templates for each data de-identification process, wherein each template specifies an associated set of configuration options for that data de-identification process.

8. The method of claim 1, further comprising: reducing processing time for the de-identification by determining a generated dataset lacking privacy vulnerabilities and terminating evaluation of at least one other associated set of configuration options for a corresponding data de-identification process generating one or more datasets with more generalized information than the determined generated dataset.

9. A system for de-identifying data comprising: at least one processor configured to: determine one or more identifiers that identify an entity of a dataset; identify one or more data de-identification processes associated with the determined one or more identifiers, wherein a plurality of sets of configuration options indicating information to preserve in the dataset are associated with the identified one or more data de-identification processes; execute the identified one or more data de-identification processes on the dataset in accordance with the associated sets of configuration options to generate datasets with varying preserved information; replace two or more attributes of a generated dataset with intersecting information with a consolidated attribute, wherein the consolidated attribute includes information from an attribute of the two or more attributes having greater precision; evaluate the generated datasets for privacy vulnerabilities; select a data de-identification process and an associated set of configuration options producing a generated dataset lacking privacy vulnerabilities based on the evaluation; and execute the selected data de-identification process on the dataset according to the associated set of configuration options to produce a resulting de-identified dataset.

10. The system of claim 9, wherein determining the one or more identifiers further comprises: determining one or more direct identifiers, wherein the identified one or more data de-identification processes associated with the determined one or more identifiers include data masking processes.

11. The system of claim 9, wherein determining the one or more identifiers further comprises: determining a plurality of quasi-identifiers, wherein the identified one or more data de-identification processes associated with the determined one or more identifiers include data generalization or data suppression.

12. The system of claim 9, wherein the generated datasets are in the form of a table, and executing the identified one or more data de-identification processes further comprises: consolidating two or more columns of a first generated dataset to produce a column with information more specific than the two or more columns.

13. The system of claim 9, wherein evaluating the generated datasets for privacy vulnerabilities further comprises: determining a presence of a link between data for an entity in a first generated dataset and data for a known entity in a publicly available dataset to indicate a privacy vulnerability for the first generated dataset.

14. The system of claim 9, wherein evaluating the generated datasets for privacy vulnerabilities further comprises: determining a presence of a set of quasi-identifiers in a first generated dataset introduced by a corresponding data de-identification process and associated set of configuration options to indicate a privacy vulnerability for the first generated dataset.

15. The system of claim 9, wherein the at least one processor is further configured to: generate a series of templates for each data de-identification process, wherein each template specifies an associated set of configuration options for that data de-identification process.

16. The system of claim 9, wherein the at least one processor is further configured to: reduce processing time for the de-identification by determining a generated dataset lacking privacy vulnerabilities and terminating evaluation of at least one other associated set of configuration options for a corresponding data de-identification process generating one or more datasets with more generalized information than the determined generated dataset.

17. A computer program product for de-identifying data, the computer program product comprising one or more computer readable storage media collectively having computer readable program code embodied therewith, the computer readable program
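
Two steps recited in the claims above lend themselves to a small illustration: consolidating attributes with intersecting information into the more precise one, and checking a generated dataset for links to a publicly available dataset. The sketch below is only one possible reading. The column names, the sample records, and the rule that a unique match on shared attributes counts as a link are assumptions for the example and are not taken from the patent text.

```python
from collections import Counter


def consolidate(rows, coarse, precise, merged):
    """Replace two overlapping columns (e.g. birth year and full birth date)
    with a single column that keeps the more precise value."""
    out = []
    for row in rows:
        rest = {k: v for k, v in row.items() if k not in (coarse, precise)}
        rest[merged] = row[precise]
        out.append(rest)
    return out


def linked_records(candidate, public, keys):
    """Assumed linkage test: a candidate record is considered linked when
    exactly one public record has the same values for the given keys."""
    index = Counter(tuple(p[k] for k in keys) for p in public)
    return [row for row in candidate
            if index[tuple(row[k] for k in keys)] == 1]


# Hypothetical generated dataset with two intersecting attributes.
generated = [
    {"birth_year": "1980", "birth_date": "1980-04-12", "zip": "02139"},
    {"birth_year": "1975", "birth_date": "1975-09-30", "zip": "02139"},
]
# Hypothetical publicly available dataset containing a known entity.
public = [{"name": "A. Example", "birth": "1980-04-12", "zip": "02139"}]

consolidated = consolidate(generated, "birth_year", "birth_date", "birth")
links = linked_records(consolidated, public, ["birth", "zip"])
```

Here the consolidated `birth` column carries the full date, which is more specific than the year alone, and that extra precision is what lets the first record be matched against the public dataset. A link like this would mark the generated dataset as vulnerable, so a different set of configuration options would need to be selected.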

Assignees

IBM

Inventors

Classifications

  • by anonymising data, e.g. decorrelating personal data from the owner's identification · CPC title

  • Assessing vulnerabilities and evaluating computer system security · CPC title

  • Test or assess software · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10915662B2 cover?
A system for de-identifying data determines one or more identifiers that identify an entity of a dataset. One or more data de-identification processes are identified and associated with the determined one or more identifiers. Each data de-identification process is associated with one or more sets of configuration options indicating information to preserve in the dataset. The identified data de-…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F21/6254. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 9, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).