Who is the assignee on this patent?

Microsoft Technology Licensing LLC

What technology area does this patent fall under?

Primary CPC classification G06F21/6254. Mapped technology areas include Physics.

When was this patent published?

Publication date: Jan 3, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Privacy-preserving data platform

US11544406B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11544406-B2
Application number	US-202016869170-A
Country	US
Kind code	B2
Filing date	May 7, 2020
Priority date	Feb 7, 2020
Publication date	Jan 3, 2023
Grant date	Jan 3, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for synthesizing and analyzing data are disclosed. A ML model anonymizes microdata to generate synthesized data. This anonymizing is performed by reproducing attributes identified within microdata and by applying constraints to prevent rare attribute combinations from being reproduced in the synthesized data. User input selects attributes to filter the synthesized data, thereby generating a subset of records. A UI displays a synthesized aggregate count representing how many records are in the subset. Pre-computed aggregate counts are accessed to indicate how many records in the microdata embody certain attributes. Based on the user input, there is an attempt to identify a particular count from the pre-computed aggregate counts. This count reflects how many records of the microdata would remain if the selected attributes were used to filter the microdata. That count is displayed along with the synthesized aggregate count. The two counts are juxtaposed next to one another.
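
The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the toy records, the function names (`precompute_counts`, `synthesized_count`, `compare`), and the choice of a `frozenset` of attribute/value pairs as the lookup key are all assumptions made for illustration. The sketch pre-computes aggregate counts over the microdata, filters the synthesized records by the selected attributes, and returns both counts so they can be shown side by side.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each record maps attribute name -> value.
microdata = [
    {"age": "30-39", "region": "north"},
    {"age": "30-39", "region": "north"},
    {"age": "30-39", "region": "south"},
    {"age": "40-49", "region": "north"},
]
synthetic = [
    {"age": "30-39", "region": "north"},
    {"age": "30-39", "region": "north"},
    {"age": "40-49", "region": "north"},
]


def precompute_counts(records, max_size):
    """Count records for every combination of up to max_size attribute values."""
    counts = Counter()
    for record in records:
        items = sorted(record.items())
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return counts


def synthesized_count(records, selection):
    """Filter records by the selected attribute values and count the matches."""
    return sum(
        all(record.get(key) == value for key, value in selection.items())
        for record in records
    )


def compare(selection, synthetic_records, precomputed):
    """Return (synthesized count, microdata count or None if not pre-computed)."""
    synth = synthesized_count(synthetic_records, selection)
    actual = precomputed.get(frozenset(selection.items()))
    return synth, actual


precomputed = precompute_counts(microdata, max_size=2)
print(compare({"age": "30-39", "region": "north"}, synthetic, precomputed))
print(compare({"age": "30-39"}, synthetic, precomputed))
```

In this sketch a missing lookup yields `None`, mirroring the abstract's wording that there is only an "attempt" to identify a matching pre-computed count.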

First claim

Opening claim text (preview).

What is claimed is:

1. A computer system configured to facilitate improved confidence in an accuracy relating to statistics derived from synthetic data generated from microdata, said computer system comprising: one or more processors; and one or more computer-readable hardware storage devices that store computer-executable instructions that configure the computer system to at least: generate synthesized data by anonymizing microdata using a machine learning (ML) model, wherein the ML model generates the synthesized data by: reproducing, within the synthesized data, identified attributes that are identified from within the microdata, and applying a set of constraints that prevent rare combinations of the attributes from being reproduced in the synthesized data, said rare combinations of the attributes being combinations that satisfy a rarity threshold within the microdata; within a user interface (UI), receive user input selecting, from among the attributes, specific attributes that, when selected, filter the synthesized data to thereby generate a subset of data records, each record in the subset of data records embodying a combination of the selected specific attributes; display, within the UI, a resulting synthesized aggregate count that is representative of a number of records included in the subset of data records; access a set of pre-computed microdata aggregate counts that indicate how many records in the microdata embody specific ones of the attributes or embody specific selected combinations of the attributes; based on the user input, attempt to identify, from the set of pre-computed microdata aggregate counts, a particular count corresponding to the selected specific attributes, the particular count reflecting how many records of the microdata would remain if the same selected specific attributes were used to filter the microdata; and upon a condition in which the particular count is identified, display the particular count simultaneously with the resulting synthesized aggregate count, wherein the particular count is juxtaposed for comparison next to the resulting synthesized aggregate count in the UI to facilitate juxtaposed comparison to determine how closely the resulting synthesized aggregate count matches the particular count.

2. The computer system of claim 1, wherein, as a part of generating the synthesized data, the ML model ensures that each record in the synthesized data is decoupled from any specific individual entity who is represented within the microdata.

3. The computer system of claim 1, wherein a parameter is used to control how many times an individual attribute is required to appear in the microdata before being reproduced in the synthesized data.

4. The computer system of claim 1, wherein the particular count is subjected to a fixed rounding precision requirement.

5. The computer system of claim 1, wherein a selection limit influences how many of the pre-computed microdata aggregate counts are computed.

6. The computer system of claim 5, wherein selections of attributes up to the selection limit will dynamically retrieve reportable values from the set of pre-computed microdata aggregate counts while selections of attributes beyond the selection limit will allow further exploration of only the synthetic data.

7. The computer system of claim 6, wherein selections of attributes beyond the selection limit results in no pre-computed microdata aggregate counts being displayed in the UI.

8. The computer system of claim 1, wherein the computing system identifies the particular count corresponding to the selected specific attributes from the set of pre-computed microdata aggregate counts.

9. The computer system of claim 8, wherein a minimum reporting threshold controls whether the particular count is displayed, and wherein, in order to be displayed, a value of the particular count is required to exceed the minimum reporting threshold.

10. The computer system of claim 1, wherein the UI displays the particular count as a first bar in a bar chart and the resulting synthesized aggregate count as a second bar in the bar chart, and wherein the UI displays a relative percentage correlation of the second bar relative to the first bar.

11. A method for facilitating improved confidence in an accuracy relating to statistics derived from synthetic data generated from microdata, said method comprising: generating synthesized data by anonymizing microdata using a machine learning (ML) model, wherein the ML model generates the synthesized data by: reproducing, within the synthesized data, identified attributes that are identified from within the microdata, and applying a set of constraints that prevent rare combinations of the attributes from being reproduced in the synthesized data, said rare combinations of the attributes being combinations that satisfy a rarity threshold within the microdata; within a user interface (UI), receiving user input selecting, from among the attributes, specific attributes that, when selected, filter the synthesized data to thereby generate a subset of data records, each record in the subset of data records embodying a combination of the selected specific attributes; displaying, within the UI, a resulting synthesized aggregate count that is representative of a number of records included in the subset of data records; accessing a set of pre-computed microdata aggregate counts that indicate how many records in the microdata embody specific ones of the attributes or embody specific selected combinations of the attributes; based on the user input, attempting to identify, from the set of pre-computed microdata aggregate counts, a particular count corresponding to the selected specific attributes, the particular count reflecting how many records of the microdata would remain if the same selected specific attributes were used to filter the microdata; and upon a condition in which the particular count is identified, displaying the particular count simultaneously with the resulting synthesized aggregate count, wherein the particular count is juxtaposed for comparison next to the resulting synthesized aggregate count in the UI to facilitate juxtaposed comparison to determine how closely the resulting synthesized aggregate count matches the particular count.

12. The method of claim 11, wherein, as a part of generating the synthesized data, the ML model ensures so that each record in the synthesized data is decoupled from any specific individual entity who is represented within the microdata.

13. The method of claim 11, wherein a parameter is used to control how many times an individual attribute is required to appear in the microdata before being reproduced in the synthesized data.

14. The method of claim 11, wherein a minimum reporting threshold controls whether the particular count is displayed, and wherein, in order to be displayed, a value of the particular count is required to exceed the minimum reporting threshold.

15. The method of claim 11, wherein the particular count is subjected to a fixed rounding precision requirement.

16. The method of claim 11, wherein a selection limit influences how many of the pre-computed microdata aggregate counts are computed.

17. The method of claim 16, wherein selections of attributes up to the selection limit will dynamically retrieve reportable values from the set of pre-computed microdata aggregate counts while selections of attributes beyond the selection limit will allow further exploration of only the synthetic data.

18. The method of claim 17, wherein selections
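
The reporting controls named in the dependent claims (a selection limit, a minimum reporting threshold, and a fixed rounding precision) can be sketched as a single lookup function. This is an illustration under stated assumptions, not the claimed system: the function name `reportable_count`, the parameter names, the sample counts, and the decision to round down to a multiple of the precision are all hypothetical, since the claims do not specify a rounding direction.

```python
# Hypothetical pre-computed microdata counts, keyed by a frozenset of
# (attribute, value) pairs. The numbers are invented for illustration.
precomputed = {
    frozenset({("age", "30-39")}): 137,
    frozenset({("age", "30-39"), ("region", "north")}): 8,
}


def reportable_count(selection, counts, selection_limit, min_threshold, precision):
    """Return the microdata count to display, or None when nothing is shown."""
    # Selections beyond the limit explore only the synthetic data.
    if len(selection) > selection_limit:
        return None
    count = counts.get(frozenset(selection.items()))
    # The count must exist and must exceed the minimum reporting threshold.
    if count is None or count <= min_threshold:
        return None
    # Assumption: "fixed rounding precision" is modeled as rounding down
    # to the nearest multiple of `precision`.
    return (count // precision) * precision


shown = reportable_count(
    {"age": "30-39"}, precomputed,
    selection_limit=2, min_threshold=10, precision=10,
)
print(shown)
```

A caller would display the returned value next to the synthesized count when it is not `None`, and fall back to showing the synthesized count alone otherwise.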

Assignees

Microsoft Technology Licensing LLC

Classifications

G06F18/24323 · Tree-organised classifiers
G06F18/24 · Classification techniques
G06N5/01 · Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
G06N20/00 · Machine learning
G06F18/214 · Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Patent family

Related publications grouped by family.

View patent family 77177611
