Privacy-preserving data platform

US11544406B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11544406-B2
Application number  US-202016869170-A
Country             US
Kind code           B2
Filing date         May 7, 2020
Priority date       Feb 7, 2020
Publication date    Jan 3, 2023
Grant date          Jan 3, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for synthesizing and analyzing data are disclosed. A ML model anonymizes microdata to generate synthesized data. This anonymizing is performed by reproducing attributes identified within microdata and by applying constraints to prevent rare attribute combinations from being reproduced in the synthesized data. User input selects attributes to filter the synthesized data, thereby generating a subset of records. A UI displays a synthesized aggregate count representing how many records are in the subset. Pre-computed aggregate counts are accessed to indicate how many records in the microdata embody certain attributes. Based on the user input, there is an attempt to identify a particular count from the pre-computed aggregate counts. This count reflects how many records of the microdata would remain if the selected attributes were used to filter the microdata. That count is displayed along with the synthesized aggregate count. The two counts are juxtaposed next to one another.
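The comparison flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the toy records, and the `selection_limit` parameter are hypothetical, and the synthetic records are written by hand rather than produced by an ML model. The sketch only shows how a count from filtered synthetic data might be paired with a pre-computed count from the microdata.

```python
from collections import Counter
from itertools import combinations

def precompute_counts(microdata, selection_limit):
    """Count microdata records for every attribute combination up to selection_limit."""
    counts = Counter()
    for record in microdata:
        items = sorted(record.items())
        for size in range(1, selection_limit + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return counts

def filter_count(records, selection):
    """Number of records that embody every selected attribute value."""
    return sum(all(r.get(k) == v for k, v in selection.items()) for r in records)

def compare(synthetic, precomputed, selection):
    """Return (synthetic count, pre-computed microdata count or None if not found)."""
    synthetic_count = filter_count(synthetic, selection)
    actual_count = precomputed.get(frozenset(selection.items()))
    return synthetic_count, actual_count

# Toy data (assumed for illustration only).
microdata = [
    {"age": "30-39", "region": "north"},
    {"age": "30-39", "region": "north"},
    {"age": "30-39", "region": "south"},
    {"age": "40-49", "region": "north"},
]
synthetic = [
    {"age": "30-39", "region": "north"},
    {"age": "30-39", "region": "north"},
    {"age": "40-49", "region": "north"},
    {"age": "30-39", "region": "north"},
]

precomputed = precompute_counts(microdata, selection_limit=2)
selection = {"age": "30-39", "region": "north"}
synthetic_count, actual_count = compare(synthetic, precomputed, selection)
print(f"synthetic: {synthetic_count} | microdata: {actual_count}")
```

In this sketch the lookup returns `None` when no pre-computed count exists for the selection, which corresponds to the "attempt to identify" wording: the microdata count is shown only when it is found.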

First claim

Opening claim text (preview).

What is claimed is:

1. A computer system configured to facilitate improved confidence in an accuracy relating to statistics derived from synthetic data generated from microdata, said computer system comprising: one or more processors; and one or more computer-readable hardware storage devices that store computer-executable instructions that configure the computer system to at least: generate synthesized data by anonymizing microdata using a machine learning (ML) model, wherein the ML model generates the synthesized data by: reproducing, within the synthesized data, identified attributes that are identified from within the microdata, and applying a set of constraints that prevent rare combinations of the attributes from being reproduced in the synthesized data, said rare combinations of the attributes being combinations that satisfy a rarity threshold within the microdata; within a user interface (UI), receive user input selecting, from among the attributes, specific attributes that, when selected, filter the synthesized data to thereby generate a subset of data records, each record in the subset of data records embodying a combination of the selected specific attributes; display, within the UI, a resulting synthesized aggregate count that is representative of a number of records included in the subset of data records; access a set of pre-computed microdata aggregate counts that indicate how many records in the microdata embody specific ones of the attributes or embody specific selected combinations of the attributes; based on the user input, attempt to identify, from the set of pre-computed microdata aggregate counts, a particular count corresponding to the selected specific attributes, the particular count reflecting how many records of the microdata would remain if the same selected specific attributes were used to filter the microdata; and upon a condition in which the particular count is identified, display the particular count simultaneously with the resulting synthesized aggregate count, wherein the particular count is juxtaposed for comparison next to the resulting synthesized aggregate count in the UI to facilitate juxtaposed comparison to determine how closely the resulting synthesized aggregate count matches the particular count.

2. The computer system of claim 1, wherein, as a part of generating the synthesized data, the ML model ensures that each record in the synthesized data is decoupled from any specific individual entity who is represented within the microdata.

3. The computer system of claim 1, wherein a parameter is used to control how many times an individual attribute is required to appear in the microdata before being reproduced in the synthesized data.

4. The computer system of claim 1, wherein the particular count is subjected to a fixed rounding precision requirement.

5. The computer system of claim 1, wherein a selection limit influences how many of the pre-computed microdata aggregate counts are computed.

6. The computer system of claim 5, wherein selections of attributes up to the selection limit will dynamically retrieve reportable values from the set of pre-computed microdata aggregate counts while selections of attributes beyond the selection limit will allow further exploration of only the synthetic data.

7. The computer system of claim 6, wherein selections of attributes beyond the selection limit results in no pre-computed microdata aggregate counts being displayed in the UI.

8. The computer system of claim 1, wherein the computing system identifies the particular count corresponding to the selected specific attributes from the set of pre-computed microdata aggregate counts.

9. The computer system of claim 8, wherein a minimum reporting threshold controls whether the particular count is displayed, and wherein, in order to be displayed, a value of the particular count is required to exceed the minimum reporting threshold.

10. The computer system of claim 1, wherein the UI displays the particular count as a first bar in a bar chart and the resulting synthesized aggregate count as a second bar in the bar chart, and wherein the UI displays a relative percentage correlation of the second bar relative to the first bar.

11. A method for facilitating improved confidence in an accuracy relating to statistics derived from synthetic data generated from microdata, said method comprising: generating synthesized data by anonymizing microdata using a machine learning (ML) model, wherein the ML model generates the synthesized data by: reproducing, within the synthesized data, identified attributes that are identified from within the microdata, and applying a set of constraints that prevent rare combinations of the attributes from being reproduced in the synthesized data, said rare combinations of the attributes being combinations that satisfy a rarity threshold within the microdata; within a user interface (UI), receiving user input selecting, from among the attributes, specific attributes that, when selected, filter the synthesized data to thereby generate a subset of data records, each record in the subset of data records embodying a combination of the selected specific attributes; displaying, within the UI, a resulting synthesized aggregate count that is representative of a number of records included in the subset of data records; accessing a set of pre-computed microdata aggregate counts that indicate how many records in the microdata embody specific ones of the attributes or embody specific selected combinations of the attributes; based on the user input, attempting to identify, from the set of pre-computed microdata aggregate counts, a particular count corresponding to the selected specific attributes, the particular count reflecting how many records of the microdata would remain if the same selected specific attributes were used to filter the microdata; and upon a condition in which the particular count is identified, displaying the particular count simultaneously with the resulting synthesized aggregate count, wherein the particular count is juxtaposed for comparison next to the resulting synthesized aggregate count in the UI to facilitate juxtaposed comparison to determine how closely the resulting synthesized aggregate count matches the particular count.

12. The method of claim 11, wherein, as a part of generating the synthesized data, the ML model ensures so that each record in the synthesized data is decoupled from any specific individual entity who is represented within the microdata.

13. The method of claim 11, wherein a parameter is used to control how many times an individual attribute is required to appear in the microdata before being reproduced in the synthesized data.

14. The method of claim 11, wherein a minimum reporting threshold controls whether the particular count is displayed, and wherein, in order to be displayed, a value of the particular count is required to exceed the minimum reporting threshold.

15. The method of claim 11, wherein the particular count is subjected to a fixed rounding precision requirement.

16. The method of claim 11, wherein a selection limit influences how many of the pre-computed microdata aggregate counts are computed.

17. The method of claim 16, wherein selections of attributes up to the selection limit will dynamically retrieve reportable values from the set of pre-computed microdata aggregate counts while selections of attributes beyond the selection limit will allow further exploration of only the synthetic data.

18. The method of claim 17, wherein selections
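The reporting controls named in the dependent claims (minimum reporting threshold, fixed rounding precision, selection limit, and the relative percentage shown beside the bars) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the patented implementation: every function name and default value is hypothetical, and rounding down to a multiple of the precision is an assumed interpretation, since the claims do not state a rounding direction.

```python
def reportable_count(count, min_threshold=10, precision=10):
    """Return the count rounded down to a multiple of precision (assumed rule),
    or None when it is missing or does not exceed the minimum reporting threshold."""
    if count is None or count <= min_threshold:
        return None
    return (count // precision) * precision

def relative_percentage(synthetic_count, reported_count):
    """Synthetic count as a percentage of the reported microdata count."""
    if not reported_count:
        return None
    return 100.0 * synthetic_count / reported_count

def comparison_row(selection, synthetic_count, precomputed,
                   selection_limit=2, min_threshold=10, precision=10):
    """Build the values a UI could display for one attribute selection."""
    reported = None
    # Beyond the selection limit, only the synthetic data is explored.
    if len(selection) <= selection_limit:
        raw = precomputed.get(frozenset(selection.items()))
        reported = reportable_count(raw, min_threshold, precision)
    return {
        "synthetic": synthetic_count,
        "reported": reported,
        "percent": relative_percentage(synthetic_count, reported),
    }

# Hypothetical pre-computed microdata counts keyed by attribute combination.
precomputed = {
    frozenset({("age", "30-39")}): 137,
    frozenset({("age", "30-39"), ("region", "north")}): 8,
}

row_one = comparison_row({"age": "30-39"}, 120, precomputed)
row_two = comparison_row({"age": "30-39", "region": "north"}, 9, precomputed)
print(row_one)
print(row_two)
```

In this sketch a suppressed or missing microdata count is represented as `None`, so a UI built on it would display the synthetic count alone for that selection.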

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

  • Tree-organised classifiers · CPC title

  • Classification techniques · CPC title

  • Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound · CPC title

  • Machine learning · CPC title

  • Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11544406B2 cover?
Techniques for synthesizing and analyzing data are disclosed. A ML model anonymizes microdata to generate synthesized data. This anonymizing is performed by reproducing attributes identified within microdata and by applying constraints to prevent rare attribute combinations from being reproduced in the synthesized data. User input selects attributes to filter the synthesized data, thereby gener…
Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G06F21/6254. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 3, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).