Detecting quasi-identifiers in datasets

US9870381B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9870381-B2
Application number: US-201514719663-A
Country: US
Kind code: B2
Filing date: May 22, 2015
Priority date: May 22, 2015
Publication date: Jan 16, 2018
Grant date: Jan 16, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection. Read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Quasi-identifiers (QIDs) are detected in a dataset using a set of computing tasks. The dataset has a plurality of records and a set of attributes. An index is generated for the dataset. The index has an indicator for each attribute value of each record in the dataset. Each indicator specifies all the records in the dataset having the same value for the attribute. Each task is assigned an attribute combination and a subset of the plurality of records in the dataset and is passed to a thread for execution on computing resources. The executing task inspects the set of records specified by the index indicator for each attribute value in the attribute combination to produce a result. The result of at least one task identifies a unique record for the associated attribute combination. The attribute combination producing the unique record is a QID.
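The flow described in the abstract can be illustrated with a small sketch. This is not the patented implementation, only one possible reading of it: the function names (`build_index`, `find_unique`, `detect_qids`), the `max_size` and `chunks` parameters, the use of Python sets as index indicators, and the sample records are all assumptions made for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def build_index(records, attributes):
    """Return one indicator per (record, attribute): the set of ids of all
    records that share that record's value for the attribute."""
    by_value = {a: defaultdict(set) for a in attributes}
    for rid, rec in enumerate(records):
        for a in attributes:
            by_value[a][rec[a]].add(rid)
    return [{a: by_value[a][rec[a]] for a in attributes} for rec in records]


def find_unique(index, combo, record_ids):
    """Task body: return the first record in record_ids that no other record
    matches on every attribute in combo, or None if there is none."""
    for rid in record_ids:
        matching = set.intersection(*[index[rid][a] for a in combo])
        if matching == {rid}:
            return rid  # stop early; the rest of the subset is not inspected
    return None


def detect_qids(records, attributes, max_size=2, chunks=2):
    """Assumed driver: try every attribute combination up to max_size,
    splitting the records into `chunks` subsets, one task per subset."""
    index = build_index(records, attributes)
    ids = list(range(len(records)))
    parts = [ids[i::chunks] for i in range(chunks)]
    qids = []
    with ThreadPoolExecutor() as pool:
        for size in range(1, max_size + 1):
            for combo in combinations(attributes, size):
                futures = [pool.submit(find_unique, index, combo, part)
                           for part in parts]
                if any(f.result() is not None for f in futures):
                    qids.append(combo)
    return qids


# Hypothetical sample data, not taken from the patent.
records = [
    {"zip": "10001", "age": 34, "sex": "F"},
    {"zip": "10001", "age": 34, "sex": "M"},
    {"zip": "10002", "age": 34, "sex": "F"},
    {"zip": "10002", "age": 34, "sex": "M"},
]
print(detect_qids(records, ["zip", "age", "sex"]))  # [('zip', 'sex')]
```

In this sample no single attribute isolates a record, but the pair of `zip` and `sex` does, so that combination is reported as a QID. Intersecting the per-attribute record sets is one straightforward way to "inspect the set of records specified by the index indicator"; the patent text shown here does not say how the inspection is performed.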

First claim

Opening claim text (preview).

What is claimed is:

1. A computer program product for detecting quasi-identifiers in a dataset using a set of computing tasks, the dataset having a plurality of records and further having a set of attributes, each record having an attribute value for each attribute in the set of attributes, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se, the program instructions executable by a processor to perform a method comprising: generating a first index for the dataset, the first index having an index indicator for each attribute value of each record, each index indicator specifying a set of records, the specified set of records including each record in the plurality of records having the same attribute value for the associated attribute as the associated record; assigning an attribute combination to each task in the set of computing tasks, the attribute combination for each task including one or more attributes of the set of attributes; assigning a subset of the plurality of records to each task in the set of computing tasks; detecting at least one quasi-identifier by passing each task to at least one thread for execution on computing resources, the execution of each task comprising inspecting the index indicator for each attribute value in the assigned attribute combination of at least a portion of the assigned subset of the plurality of records to produce a result, the result of at least one task identifying a unique record for the associated attribute combination, the attribute values in the attribute combination for the unique record different from the attribute values in the attribute combination for all other records in the plurality of records, the at least one quasi-identifier being the attribute combination assigned to the at least one task identifying a unique record.

2. The computer program product of claim 1, wherein the method further comprises: assigning a second attribute combination to each task in the set of computing tasks, the second attribute combination for each task including one or more attributes of the set of attributes, the second attribute combination for each task excluding the detected at least one quasi-identifier; detecting a second at least one quasi-identifier by second passing each task to the at least one thread for execution on the computing resources, the execution of each task comprising inspecting the index indicator for each attribute value in the assigned second attribute combination of at least a portion of the assigned subset of the plurality of records to produce a second result, the second result of at least one task identifying a unique record for the associated second attribute combination, the second at least one quasi-identifier being the second attribute combination assigned to the at least one task identifying a unique record.

3. The computer program product of claim 1, wherein the detecting the at least one quasi-identifier includes detecting a first quasi-identifier by passing a first task to a first thread for execution on the computing resources, wherein the first quasi-identifier is detected before inspecting the index indicator for each attribute value in the assigned attribute combination of a last portion of the assigned subset of the plurality of records, and wherein the method further comprises: stopping the first thread upon detecting the first quasi-identifier, the stopping the first thread preventing inspecting the index indicator for each attribute value in the assigned attribute combination of the last portion of the assigned subset.

4. The computer program product of claim 1, wherein each attribute in the set of attributes is represented by a set of distinct attribute values, and wherein the generating the first index for the data set comprises: generating a second index for each attribute in the set of attributes, each second index comprising a tree structure having a hierarchical set of nodes corresponding to the set of distinct attribute values representing the set of attributes, each node specifying a second set of records, the specified second set of records including each record in the plurality of records having the distinct attribute value corresponding to the node; and for each attribute value in the plurality of records, traversing the tree structure associated with the attribute to locate the node corresponding to the attribute value, and causing the index indicator for the attribute value in the first index to specify the second set of records specified by the located node.

5. A computer system for detecting quasi-identifiers in a dataset using a set of computing tasks, the dataset having a plurality of records and further having a set of attributes, each record having an attribute value for each attribute in the set of attributes, the computer system comprising a processor and a computer readable storage medium storing instructions, wherein the computer readable storage medium is not a transitory signal per se, and wherein the processor executes the instructions to perform a method comprising: generating a first index for the dataset, the first index having an index indicator for each attribute value of each record, each index indicator specifying a set of records, the specified set of records including each record in the plurality of records having the same attribute value for the associated attribute as the associated record; and assigning, using a main thread, an attribute combination to each task in the set of computing tasks, the attribute combination for each task including one or more attributes of the set of attributes, the main thread further used by the processor to assign a subset of the plurality of records to each task in the set of computing tasks, and the main thread further used by the processor to detect at least one quasi-identifier by passing each task to at least one thread for execution on computing resources, the execution of each task comprising inspecting the index indicator for each attribute value in the assigned attribute combination of at least a portion of the assigned subset of the plurality of records to produce a result, the result of at least one task identifying a unique record for the associated attribute combination, the attribute values in the attribute combination for the unique record different from the attribute values in the attribute combination for all other records in the plurality of records, the at least one quasi-identifier being the attribute combination assigned to the at least one task identifying a unique record.

6. The computer system of claim 5, wherein the main thread is further used by the processor to assign, before the assigning the attribute combination to each task, a prior attribute combination to each task in the set of computing tasks, the prior attribute combination for each task including one or more attributes of the set of attributes, the prior attribute combination for each task different from all attribute combinations, and wherein the main thread is further used by the processor to detect, before the detecting the at least one quasi-identifier, no quasi-identifier by passing each task to the at least one thread for execution on the computing resources, the execution of each task comprising inspecting the index indicator for each attribute value in the assigned prior attribute combination of the assigned subset of the plurality of records to produce a result, the result of each task identifying no unique record for the associated attribute combination.

7. The computer system of claim 5, wherein the main thread is further used by the processor to assign, before the assigning the subset of the
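The two-level indexing recited in claim 4 (a per-attribute tree whose nodes hold record sets, then a first index whose indicators point at those sets) might look roughly like the sketch below. The claim only says "a tree structure having a hierarchical set of nodes"; the unbalanced binary search tree used here, the names `Node`, `insert`, `lookup`, and `build_first_index`, and the sample records are assumptions chosen for illustration. Values of one attribute are assumed to be mutually comparable.

```python
class Node:
    """One distinct attribute value and the ids of the records that carry it."""

    def __init__(self, value):
        self.value = value
        self.records = set()
        self.left = None
        self.right = None


def insert(root, value, rid):
    """Record that `rid` has `value`, creating a node if needed; return the root."""
    if root is None:
        root = Node(value)
    node = root
    while node.value != value:
        side = "left" if value < node.value else "right"
        child = getattr(node, side)
        if child is None:
            child = Node(value)
            setattr(node, side, child)
        node = child
    node.records.add(rid)
    return root


def lookup(root, value):
    """Traverse the tree and return the node for `value`, or None if absent."""
    node = root
    while node is not None and node.value != value:
        node = node.left if value < node.value else node.right
    return node


def build_first_index(records, attributes):
    """Build one tree per attribute (the 'second index'), then make each
    indicator of the first index refer to the record set of the located node."""
    trees = {a: None for a in attributes}
    for rid, rec in enumerate(records):
        for a in attributes:
            trees[a] = insert(trees[a], rec[a], rid)
    return [{a: lookup(trees[a], rec[a]).records for a in attributes}
            for rec in records]


# Hypothetical sample data, not taken from the patent.
sample = [
    {"zip": "10001", "sex": "F"},
    {"zip": "10001", "sex": "M"},
    {"zip": "10002", "sex": "F"},
]
first_index = build_first_index(sample, ["zip", "sex"])
print(first_index[0]["zip"])  # {0, 1}
```

Because every indicator refers to the set stored in the tree node rather than to a copy, records with the same value share one set object, which keeps the first index compact in this sketch.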

Assignees

Inventors

Classifications

  • considering software capabilities, i.e. software resources associated or available to the machine · CPC title

  • Indexing structures · CPC title

  • to service a request · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9870381B2 cover?
Quasi-identifiers (QIDs) are detected in a dataset using a set of computing tasks. The dataset has a plurality of records and a set of attributes. An index is generated for the dataset. The index has an indicator for each attribute value of each record in the dataset. Each indicator specifies all the records in the dataset having the same value for the attribute. Each task is assigned an attrib…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/2228. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 16, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).