Iterative execution of data de-identification processes

US11036886B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11036886-B2
Application number  US-201916447064-A
Country             US
Kind code           B2
Filing date         Jun 20, 2019
Priority date       Feb 26, 2018
Publication date    Jun 15, 2021
Grant date          Jun 15, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer system de-identifies data by selecting one or more attributes of a dataset and determining a set of data de-identification techniques associated with each attribute. Each de-identification technique is evaluated with respect to an impact on data privacy and an impact on data utility based on a series of metrics, and a data de-identification technique is recommended for each attribute based on the evaluation. The dataset is de-identified by applying the de-identification technique that is recommended for each attribute. Embodiments of the present invention further include a method and program product for de-identifying data in substantially the same manner described above.
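
The single pass described in the abstract (evaluate each candidate technique per attribute, recommend one, apply it) can be sketched roughly as follows. This is an illustrative sketch, not the patented implementation: the `Technique` class, the `uniqueness` and `distortion` metrics, the equal weighting of privacy and utility, and the toy ZIP-code data are all assumptions introduced here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List

Column = List[str]


@dataclass(frozen=True)
class Technique:
    """Hypothetical wrapper: a named transformation of a single value."""
    name: str
    apply: Callable[[str], str]


def uniqueness(values: Column) -> float:
    """Privacy-risk proxy (assumed): fraction of records with a unique value."""
    counts = Counter(values)
    return sum(1 for v in values if counts[v] == 1) / len(values)


def distortion(original: Column, transformed: Column) -> float:
    """Utility-loss proxy (assumed): mean fraction of characters changed.

    Assumes each transformed value has the same length as its original.
    """
    per_value = [
        sum(a != b for a, b in zip(o, t)) / len(o)
        for o, t in zip(original, transformed)
    ]
    return sum(per_value) / len(per_value)


def recommend(
    dataset: Dict[str, Column],
    candidates: Dict[str, List[Technique]],
    privacy_weight: float = 0.5,  # assumed trade-off between the two metrics
) -> Dict[str, Technique]:
    """Pick, per attribute, the technique with the lowest combined score."""
    picks = {}
    for attr, techniques in candidates.items():
        original = dataset[attr]

        def score(t: Technique) -> float:
            out = [t.apply(v) for v in original]
            return (privacy_weight * uniqueness(out)
                    + (1 - privacy_weight) * distortion(original, out))

        picks[attr] = min(techniques, key=score)
    return picks


def apply_recommendations(
    dataset: Dict[str, Column], picks: Dict[str, Technique]
) -> Dict[str, Column]:
    """Apply the recommended technique to each selected attribute."""
    return {
        attr: [picks[attr].apply(v) for v in col] if attr in picks else list(col)
        for attr, col in dataset.items()
    }


# Toy data: one quasi-identifier column with three candidate techniques.
dataset = {"zip": ["10001", "10002", "94105", "94107"]}
candidates = {
    "zip": [
        Technique("keep", lambda v: v),
        Technique("truncate_to_3_digits", lambda v: v[:3] + "**"),
        Technique("suppress", lambda v: "*" * len(v)),
    ]
}

picks = recommend(dataset, candidates)
deidentified = apply_recommendations(dataset, picks)
print(picks["zip"].name, deidentified["zip"])
```

On this toy column, truncation removes all unique values while masking fewer characters than full suppression, so it scores best under the assumed weighting. A real system would use the privacy and utility metrics named in the claims rather than these stand-ins.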

First claim

Opening claim text (preview).

The invention claimed is:

1. A method, in a data processing system comprising at least one processor and at least one memory, the at least one memory comprising instructions executed by the at least one processor to cause the at least one processor to de-identify data, the method comprising:
    selecting a plurality of attributes of a dataset;
    determining a set of data de-identification techniques, including one or more data de-identification techniques for each attribute of the plurality of attributes;
    evaluating each data de-identification technique with respect to an impact on data privacy and an impact on data utility based on a series of metrics;
    recommending a data de-identification technique for each attribute based on the evaluation, wherein the recommended data de-identification technique for each attribute is presented using a user interface dashboard that indicates a plurality of privacy metrics and a plurality of utility metrics resulting from applying the recommended data de-identification technique, and wherein the user interface dashboard indicates the plurality of privacy metrics and the plurality of utility metrics for each attribute of the plurality of attributes;
    applying the recommended data de-identification technique for each attribute to de-identify the dataset;
    removing each applied data de-identification technique from the recommended data de-identification techniques to be applied to the dataset;
    re-evaluating remaining data de-identification techniques for selected attributes of the de-identified data set with respect to an impact on data privacy and an impact on data utility based on the series of metrics;
    recommending a second data de-identification technique for each selected attribute of the de-identified data set, wherein the recommended second data de-identification technique for each attribute is presented using the user interface dashboard that indicates the plurality of privacy metrics and the plurality of utility metrics resulting from applying the recommended second data de-identification technique, and wherein the user interface dashboard indicates the plurality of privacy metrics and the plurality of utility metrics for each selected attribute of the selected attributes; and
    applying the recommended second data de-identification technique for each selected attribute of the de-identified data set to further de-identify the dataset.

2. The method of claim 1, further comprising: presenting the de-identified data set and recommended data de-identification techniques and corresponding configuration options on the user interface dashboard.

3. The method of claim 1, wherein one or more of the plurality of attributes include a direct identifier associated with one or more of a set of data masking techniques, a set of data pseudonymization techniques, and a set of data encryption techniques.

4. The method of claim 1, wherein one or more of the plurality of attributes include a set of quasi-identifiers associated with a set of data anonymization techniques.

5. The method of claim 1, wherein metrics associated with data privacy include probabilistic data linkage against one or more datasets from a group of publicly available datasets and user provided datasets, and uniqueness criteria of the dataset.

6. The method of claim 1, wherein metrics associated with data utility include data distortion introduced into the dataset by a data de-identification technique and workload-aware metrics that capture usefulness of the de-identified data in supporting certain types of analyses.

7. The method of claim 1, wherein a metric associated with data utility comprises an average relative error metric.
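
The iterative part of claim 1 (apply a technique, remove it from the candidates, re-evaluate the rest on the already de-identified data, apply a second technique) can be illustrated with a small sketch. Everything named here is an assumption for illustration: the function names, the use of a uniqueness ratio as the privacy metric, the count-query form of average relative error as the utility metric (claim 7 names the metric but this page does not define it), the unweighted sum used for scoring, and the rule that an attribute stays "selected" while its uniqueness exceeds a threshold.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Record = Dict[str, int]
Technique = Callable[[int], int]   # maps one value to its de-identified form
Query = Callable[[Record], bool]   # predicate defining a count query


def uniqueness(records: List[Record], attr: str) -> float:
    """Fraction of records whose value for `attr` is unique (assumed metric)."""
    counts = Counter(r[attr] for r in records)
    return sum(1 for r in records if counts[r[attr]] == 1) / len(records)


def average_relative_error(
    original: List[Record], deidentified: List[Record], queries: List[Query]
) -> float:
    """Mean of |true - observed| / true over a workload of count queries.

    The denominator is floored at 1 to avoid division by zero (an assumption).
    """
    errors = []
    for q in queries:
        true_count = sum(1 for r in original if q(r))
        observed = sum(1 for r in deidentified if q(r))
        errors.append(abs(true_count - observed) / max(true_count, 1))
    return sum(errors) / len(errors)


def deidentify_iteratively(
    records: List[Record],
    candidates: Dict[str, Dict[str, Technique]],
    queries: List[Query],
    max_uniqueness: float,
    rounds: int = 2,
) -> Tuple[List[Record], List[Tuple[str, str]]]:
    current = [dict(r) for r in records]
    remaining = {attr: dict(techs) for attr, techs in candidates.items()}
    applied: List[Tuple[str, str]] = []
    for _ in range(rounds):
        for attr, techs in remaining.items():
            # Only attributes that still look risky are (re-)selected.
            if not techs or uniqueness(current, attr) <= max_uniqueness:
                continue

            def score(name: str) -> float:
                trial = [{**r, attr: techs[name](r[attr])} for r in current]
                return (uniqueness(trial, attr)
                        + average_relative_error(records, trial, queries))

            best = min(techs, key=score)
            current = [{**r, attr: techs[best](r[attr])} for r in current]
            del techs[best]              # an applied technique is not offered again
            applied.append((attr, best))
    return current, applied


# Toy data: one quasi-identifier and a two-query workload.
records = [{"age": a} for a in (23, 27, 34, 38, 45, 61)]
candidates = {
    "age": {
        "bin_10": lambda a: a // 10 * 10,
        "bin_20": lambda a: a // 20 * 20,
        "top_code_40": lambda a: min(a, 40),
    }
}
queries = [lambda r: r["age"] < 30, lambda r: r["age"] >= 30]

final, applied = deidentify_iteratively(records, candidates, queries, max_uniqueness=0.2)
print(applied, [r["age"] for r in final])
```

In this toy run the first round picks 10-year binning, which still leaves two unique ages; the second round, evaluating only the remaining techniques on the binned data, picks top-coding at 40. The dashboard presentation required by the claim is not modeled here.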

Assignees

Inventors

Classifications

  • by anonymising data, e.g. decorrelating personal data from the owner's identification

  • Providing cryptographic facilities or services

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11036886B2 cover?
A computer system de-identifies data by selecting one or more attributes of a dataset and determining a set of data de-identification techniques associated with each attribute. Each de-identification technique is evaluated with respect to an impact on data privacy and an impact on data utility based on a series of metrics, and a data de-identification technique is recommended for each attribute…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F21/6254. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 15, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).