Data-driven techniques for model ensembles
US-2021342707-A1 · Nov 4, 2021 · US
US11544406B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11544406-B2 |
| Application number | US-202016869170-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 7, 2020 |
| Priority date | Feb 7, 2020 |
| Publication date | Jan 3, 2023 |
| Grant date | Jan 3, 2023 |
Official abstract text for this publication.
Techniques for synthesizing and analyzing data are disclosed. A ML model anonymizes microdata to generate synthesized data. This anonymizing is performed by reproducing attributes identified within microdata and by applying constraints to prevent rare attribute combinations from being reproduced in the synthesized data. User input selects attributes to filter the synthesized data, thereby generating a subset of records. A UI displays a synthesized aggregate count representing how many records are in the subset. Pre-computed aggregate counts are accessed to indicate how many records in the microdata embody certain attributes. Based on the user input, there is an attempt to identify a particular count from the pre-computed aggregate counts. This count reflects how many records of the microdata would remain if the selected attributes were used to filter the microdata. That count is displayed along with the synthesized aggregate count. The two counts are juxtaposed next to one another.
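The comparison flow in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`precompute_counts`, `filter_records`, `compare_counts`), the record layout (one dict per record), and the toy data are all assumptions made for illustration, and the ML-based synthesis step is not modeled at all. The sketch only shows how a count from filtered synthesized data might be paired with a pre-computed microdata count for the same selection.

```python
from collections import Counter
from itertools import combinations


def precompute_counts(microdata, selection_limit):
    """Count microdata records for every attribute combination up to selection_limit.

    Hypothetical helper: the abstract only says the counts are pre-computed.
    """
    counts = Counter()
    for record in microdata:
        pairs = sorted(record.items())
        for size in range(1, selection_limit + 1):
            for combo in combinations(pairs, size):
                counts[frozenset(combo)] += 1
    return counts


def filter_records(records, selection):
    """Keep only the records that embody every selected attribute value."""
    return [r for r in records if all(r.get(k) == v for k, v in selection.items())]


def compare_counts(synthetic, precomputed, selection):
    """Return (synthesized count, microdata count or None) for one selection."""
    synthesized_count = len(filter_records(synthetic, selection))
    # The lookup is only an attempt: a missing key yields None, so the caller
    # can show the synthesized count on its own.
    microdata_count = precomputed.get(frozenset(selection.items()))
    return synthesized_count, microdata_count


# Toy records, invented for illustration only.
microdata = [
    {"age": "30-39", "city": "A"},
    {"age": "30-39", "city": "A"},
    {"age": "30-39", "city": "B"},
    {"age": "40-49", "city": "A"},
]
synthetic = [
    {"age": "30-39", "city": "A"},
    {"age": "30-39", "city": "A"},
    {"age": "40-49", "city": "A"},
]

precomputed = precompute_counts(microdata, selection_limit=2)
print(compare_counts(synthetic, precomputed, {"age": "30-39", "city": "A"}))
print(compare_counts(synthetic, precomputed, {"age": "30-39"}))
```

Keying the lookup on a `frozenset` of attribute/value pairs makes the result independent of the order in which the user selects attributes. A UI would then place the two returned numbers side by side whenever the second one is not `None`.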
Opening claim text (preview).
What is claimed is:

1. A computer system configured to facilitate improved confidence in an accuracy relating to statistics derived from synthetic data generated from microdata, said computer system comprising:
one or more processors; and
one or more computer-readable hardware storage devices that store computer-executable instructions that configure the computer system to at least:
generate synthesized data by anonymizing microdata using a machine learning (ML) model, wherein the ML model generates the synthesized data by: reproducing, within the synthesized data, identified attributes that are identified from within the microdata, and applying a set of constraints that prevent rare combinations of the attributes from being reproduced in the synthesized data, said rare combinations of the attributes being combinations that satisfy a rarity threshold within the microdata;
within a user interface (UI), receive user input selecting, from among the attributes, specific attributes that, when selected, filter the synthesized data to thereby generate a subset of data records, each record in the subset of data records embodying a combination of the selected specific attributes;
display, within the UI, a resulting synthesized aggregate count that is representative of a number of records included in the subset of data records;
access a set of pre-computed microdata aggregate counts that indicate how many records in the microdata embody specific ones of the attributes or embody specific selected combinations of the attributes;
based on the user input, attempt to identify, from the set of pre-computed microdata aggregate counts, a particular count corresponding to the selected specific attributes, the particular count reflecting how many records of the microdata would remain if the same selected specific attributes were used to filter the microdata; and
upon a condition in which the particular count is identified, display the particular count simultaneously with the resulting synthesized aggregate count, wherein the particular count is juxtaposed for comparison next to the resulting synthesized aggregate count in the UI to facilitate juxtaposed comparison to determine how closely the resulting synthesized aggregate count matches the particular count.

2. The computer system of claim 1, wherein, as a part of generating the synthesized data, the ML model ensures that each record in the synthesized data is decoupled from any specific individual entity who is represented within the microdata.

3. The computer system of claim 1, wherein a parameter is used to control how many times an individual attribute is required to appear in the microdata before being reproduced in the synthesized data.

4. The computer system of claim 1, wherein the particular count is subjected to a fixed rounding precision requirement.

5. The computer system of claim 1, wherein a selection limit influences how many of the pre-computed microdata aggregate counts are computed.

6. The computer system of claim 5, wherein selections of attributes up to the selection limit will dynamically retrieve reportable values from the set of pre-computed microdata aggregate counts while selections of attributes beyond the selection limit will allow further exploration of only the synthetic data.

7. The computer system of claim 6, wherein selections of attributes beyond the selection limit results in no pre-computed microdata aggregate counts being displayed in the UI.

8. The computer system of claim 1, wherein the computing system identifies the particular count corresponding to the selected specific attributes from the set of pre-computed microdata aggregate counts.

9. The computer system of claim 8, wherein a minimum reporting threshold controls whether the particular count is displayed, and wherein, in order to be displayed, a value of the particular count is required to exceed the minimum reporting threshold.

10. The computer system of claim 1, wherein the UI displays the particular count as a first bar in a bar chart and the resulting synthesized aggregate count as a second bar in the bar chart, and wherein the UI displays a relative percentage correlation of the second bar relative to the first bar.

11. A method for facilitating improved confidence in an accuracy relating to statistics derived from synthetic data generated from microdata, said method comprising:
generating synthesized data by anonymizing microdata using a machine learning (ML) model, wherein the ML model generates the synthesized data by: reproducing, within the synthesized data, identified attributes that are identified from within the microdata, and applying a set of constraints that prevent rare combinations of the attributes from being reproduced in the synthesized data, said rare combinations of the attributes being combinations that satisfy a rarity threshold within the microdata;
within a user interface (UI), receiving user input selecting, from among the attributes, specific attributes that, when selected, filter the synthesized data to thereby generate a subset of data records, each record in the subset of data records embodying a combination of the selected specific attributes;
displaying, within the UI, a resulting synthesized aggregate count that is representative of a number of records included in the subset of data records;
accessing a set of pre-computed microdata aggregate counts that indicate how many records in the microdata embody specific ones of the attributes or embody specific selected combinations of the attributes;
based on the user input, attempting to identify, from the set of pre-computed microdata aggregate counts, a particular count corresponding to the selected specific attributes, the particular count reflecting how many records of the microdata would remain if the same selected specific attributes were used to filter the microdata; and
upon a condition in which the particular count is identified, displaying the particular count simultaneously with the resulting synthesized aggregate count, wherein the particular count is juxtaposed for comparison next to the resulting synthesized aggregate count in the UI to facilitate juxtaposed comparison to determine how closely the resulting synthesized aggregate count matches the particular count.

12. The method of claim 11, wherein, as a part of generating the synthesized data, the ML model ensures so that each record in the synthesized data is decoupled from any specific individual entity who is represented within the microdata.

13. The method of claim 11, wherein a parameter is used to control how many times an individual attribute is required to appear in the microdata before being reproduced in the synthesized data.

14. The method of claim 11, wherein a minimum reporting threshold controls whether the particular count is displayed, and wherein, in order to be displayed, a value of the particular count is required to exceed the minimum reporting threshold.

15. The method of claim 11, wherein the particular count is subjected to a fixed rounding precision requirement.

16. The method of claim 11, wherein a selection limit influences how many of the pre-computed microdata aggregate counts are computed.

17. The method of claim 16, wherein selections of attributes up to the selection limit will dynamically retrieve reportable values from the set of pre-computed microdata aggregate counts while selections of attributes beyond the selection limit will allow further exploration of only the synthetic data.

18. The method of claim 17, wherein selections
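
The reporting controls named in the dependent claims (the fixed rounding precision, the selection limit, the minimum reporting threshold, and the relative percentage shown with the bar chart) can be sketched as follows. This is an illustrative reading under stated assumptions, not the claimed implementation: the claims do not say how rounding is performed, in what order the checks run, or how the percentage is computed, so rounding down to a multiple of `precision`, checking the threshold before rounding, and dividing the synthesized count by the reported count are all choices made here. The names `reportable_count` and `relative_percentage` are hypothetical.

```python
def reportable_count(raw_count, selection_size, *, selection_limit,
                     min_threshold, precision):
    """Return the microdata count to display, or None when it is withheld."""
    # Beyond the selection limit no pre-computed count is shown; only the
    # synthetic data remains available for exploration.
    if selection_size > selection_limit:
        return None
    # The count must exceed the minimum reporting threshold to be displayed.
    if raw_count is None or raw_count <= min_threshold:
        return None
    # Assumption: "fixed rounding precision" is modeled as rounding down
    # to the nearest multiple of `precision`.
    return (raw_count // precision) * precision


def relative_percentage(synthesized_count, reported_count):
    """Express the synthesized count as a percentage of the reported count."""
    if not reported_count:
        return None  # nothing to compare against
    return 100.0 * synthesized_count / reported_count


shown = reportable_count(47, 2, selection_limit=3, min_threshold=10, precision=10)
print(shown, relative_percentage(38, shown))
```

Returning `None` for a withheld value keeps "no count available" distinct from a genuine count of zero, which matters when the UI decides whether to draw the first bar at all.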
Tree-organised classifiers · CPC title
Classification techniques · CPC title
Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound · CPC title
Machine learning · CPC title
Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title