Detecting quasi-identifiers in datasets
US-9870381-B2 · Jan 16, 2018 · US
US11269834B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11269834-B2 |
| Application number | US-201916446674-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 20, 2019 |
| Priority date | May 22, 2015 |
| Publication date | Mar 8, 2022 |
| Grant date | Mar 8, 2022 |
Official abstract text for this publication.
Quasi-identifiers (QIDs) are detected in a dataset using a set of computing tasks. The dataset has a plurality of records and a set of attributes. An index is generated for the dataset. The index has an indicator for each attribute value of each record in the dataset. Each indicator specifies all the records in the dataset having the same value for the attribute. Each task is assigned an attribute combination and a subset of the plurality of records in the dataset and is passed to a thread for execution on computing resources. The executing task inspects the set of records specified by the index indicator for each attribute value in the attribute combination to produce a result. The result of at least one task identifies a unique record for the associated attribute combination. The attribute combination producing the unique record is a QID.
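The indexing and uniqueness check described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`build_index`, `find_unique_record`, `detect_qids`), the `max_size` parameter, and the sample records are all hypothetical, and the index is assumed to be a dictionary that maps each (attribute, value) pair to the set of record ids sharing that value. A record is treated as unique for an attribute combination when intersecting its index sets leaves only the record itself.

```python
from collections import defaultdict
from itertools import combinations


def build_index(records, attributes):
    """Map each (attribute, value) pair to the ids of all records sharing that value."""
    index = defaultdict(set)
    for rid, record in enumerate(records):
        for attr in attributes:
            index[(attr, record[attr])].add(rid)
    return index


def find_unique_record(records, index, combo, record_ids):
    """Return the first id in record_ids that is unique for combo, or None."""
    for rid in record_ids:
        # Intersect the index sets for this record's value of every attribute in combo.
        matches = set.intersection(*(index[(a, records[rid][a])] for a in combo))
        if len(matches) == 1:  # only the record itself carries this value combination
            return rid
    return None


def detect_qids(records, attributes, max_size=2):
    """Sequential stand-in for the task set: one 'task' per attribute combination."""
    index = build_index(records, attributes)
    all_ids = range(len(records))  # assumption: each task is assigned every record
    qids = []
    for size in range(1, max_size + 1):
        for combo in combinations(attributes, size):
            if find_unique_record(records, index, combo, all_ids) is not None:
                qids.append(combo)
    return qids


# Hypothetical sample data: no single attribute isolates a record,
# but age and sex together do.
records = [
    {"zip": "02139", "age": 34, "sex": "F"},
    {"zip": "02139", "age": 34, "sex": "M"},
    {"zip": "02139", "age": 51, "sex": "F"},
    {"zip": "02139", "age": 51, "sex": "M"},
]
qids = detect_qids(records, ["zip", "age", "sex"])
```

In this sketch the combinations are checked one after another; the abstract instead hands each combination, together with a subset of the records, to a thread for execution.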
Opening claim text (preview).
What is claimed is:
1. A method for detecting quasi-identifiers in a dataset, comprising: generating a first index for the dataset, wherein: the dataset comprises a plurality of records and a plurality of attributes; the plurality of records comprises a plurality of attribute values corresponding to the plurality of attributes; the first index comprises a plurality of index indicators for a plurality of attribute values of the plurality of attributes; the plurality of index indicators specifies a specific subset of the plurality of records; and the specific subset of the plurality of records comprises all of the plurality of records having a same attribute value; generating a plurality of tasks by associating the plurality of tasks with a corresponding plurality of attribute combinations of the plurality of attributes, wherein the plurality of attribute combinations comprises one or more of the plurality of attributes; associating a subset of the plurality of records with the plurality of tasks; and detecting at least one quasi-identifier by: passing the plurality of tasks to at least one of a plurality of threads for execution on a computing resource, wherein: the plurality of tasks comprises inspecting the index indicator for one of the attribute values of an associated attribute combination of the plurality of attribute combinations in at least a portion of the assigned subset of the plurality of records to produce a result; the result identifies a unique record of the assigned subset of the plurality of records for the associated attribute combination; the attribute combination for the unique record comprises different attribute values from one or more attribute values in the attribute combination for all other records in the assigned subset of the plurality of records; and the at least one quasi-identifier comprises the attribute combination for the unique record; and passing a dummy task to the one thread, wherein: the one thread is configured to send a message to a main thread indicating that the one thread has processed a last task; and the main thread is configured to terminate the one thread in response to receiving the message.
2. The method of claim 1, further comprising: associating, before associating the plurality of tasks with the plurality of attribute combinations, a plurality of prior attribute combinations with the plurality of tasks, wherein: the plurality of prior attribute combinations comprises a prior plurality of the attributes; and the plurality of prior attribute combinations is different from all of the plurality of attribute combinations; and detecting, before the detecting the at least one quasi-identifier, no quasi-identifier by: passing the plurality of tasks to the at least one of a plurality of threads for execution on the computing resource, wherein: the plurality of tasks comprises inspecting the index indicator for one of the attribute values of an associated prior attribute combination of the plurality of prior attribute combinations in at least a portion of the assigned subset of the plurality of records to produce the result; the result identifies a unique record of the assigned subset of the plurality of records for the associated prior attribute combination; the prior attribute combination for the unique record comprises different attribute values from one or more prior attribute values in the prior attribute combination for all other records in the assigned subset of the plurality of records; and the at least one quasi-identifier comprises the attribute combination for the unique record.
3. The method of claim 1, further comprising: associating, before associating the subset of the plurality of records with the plurality of tasks, a prior subset of the plurality of records with the plurality of tasks, the prior subset of the plurality of records different from all subsets of the plurality of records; detecting, before detecting the at least one quasi-identifier, no quasi-identifier by passing the plurality of tasks to the at least one thread for an additional execution on the computing resource, wherein the additional execution comprises inspecting the index indicator for a plurality of associated attribute values of the associated attribute combination of the prior subset of the plurality of records to produce the result, the result identifying no unique record for the associated attribute combination.
4. The method of claim 1, further comprising: associating a second plurality of attribute combinations to the plurality of tasks, wherein: the second plurality of attribute combinations comprises one or more of the attributes; and the second plurality of attribute combinations does not include the detected at least one quasi-identifier; and detecting a second quasi-identifier by passing the plurality of tasks to the at least one thread for a second execution on the computing resource, wherein the second execution comprises inspecting the index indicator for a second plurality of attribute values in the second plurality of attribute combinations of at least a portion of the assigned subset of the plurality of records to produce a second result, wherein: the second result identifies a unique record for an associated second attribute combination of the second plurality of attribute combinations; and the second quasi-identifier being one of the second plurality of attribute combinations that identify a second unique record.
5. The method of claim 4, wherein: the plurality of attribute combinations comprises a first number of the plurality of attributes; the second plurality of attribute combinations comprises a second number of the plurality of attributes, the second number different from the first number, the method further comprising: ranking, before passing the plurality of tasks to the at least one of the plurality of threads, the plurality of attribute combinations higher than the second plurality of attribute combinations according to predetermined criteria.
6. The method of claim 1, wherein detecting the at least one quasi-identifier comprises: detecting a first quasi-identifier by passing a first task of the plurality of tasks to a first thread of the plurality of threads for additional execution on the computing resource, wherein the first quasi-identifier is detected before inspecting the plurality of index indicators, the method further comprising: stopping the first thread upon detecting the first quasi-identifier, wherein stopping the first thread prevents inspecting the plurality of index indicators for the plurality of attribute values in a last portion of the assigned subset of the plurality of records.
7. The method of claim 1, wherein the plurality of tasks is executed in parallel.
8. The method of claim 1, wherein: the plurality of records comprises a record identifier; the plurality of index indicators is a pointer to a plurality of record identifiers; a first index indicator for a first attribute of a first record is a first pointer to a first plurality of record identifiers; a second index indicator for a second attribute of a second record is a second pointer to a second plurality of record identifiers; a third index indicator for a third attribute of a third record is the first pointer to the first plurality of record identifiers; and the first plurality of record identifiers comprises: a first record identifier for the first record; and a third record identifier for the third record, wherein the first plurality of record identifiers does not comprise a second record identifier for the second record.
9. A computer program product comprising program instructions stored on a computer readable sto
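The thread handling at the end of claim 1 (a dummy task, a message to the main thread, then termination) resembles a common sentinel pattern, sketched below in Python under stated assumptions. The names `DUMMY_TASK`, `worker`, and `run_tasks` are hypothetical, tasks are assumed to be plain callables, and because Python threads cannot be forcibly terminated, "terminate" is approximated here by the main thread joining a worker once its message arrives.

```python
import queue
import threading

DUMMY_TASK = None  # hypothetical sentinel standing in for the claimed "dummy task"


def worker(name, tasks, messages, results):
    """Run tasks until the dummy task arrives, then notify the main thread."""
    while True:
        task = tasks.get()
        if task is DUMMY_TASK:
            messages.put(name)  # message: this thread has processed its last task
            return
        results.put(task())


def run_tasks(task_list, n_threads=2):
    tasks, messages, results = queue.Queue(), queue.Queue(), queue.Queue()
    threads = {}
    for i in range(n_threads):
        name = f"worker-{i}"
        threads[name] = threading.Thread(
            target=worker, args=(name, tasks, messages, results)
        )
        threads[name].start()
    for task in task_list:
        tasks.put(task)
    for _ in range(n_threads):
        tasks.put(DUMMY_TASK)  # one dummy task per thread, queued after the real work
    for _ in range(n_threads):
        finished = messages.get()  # wait for a "last task processed" message
        threads[finished].join()   # assumption: joining stands in for termination
    collected = []
    while not results.empty():
        collected.append(results.get())
    return collected


# Hypothetical tasks; in the claimed method each task would inspect index
# indicators for one attribute combination over its assigned records.
squares = sorted(run_tasks([lambda i=i: i * i for i in range(5)]))
```

Queuing the dummy tasks after all real tasks means a worker only sees one once no real work remains in the queue, so the results are complete by the time every message has been received.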
Indexing structures · CPC title
to service a request · CPC title
considering software capabilities, i.e. software resources associated or available to the machine · CPC title