Applying a differential privacy operation on a cluster of data
US-2019087604-A1 · Mar 21, 2019 · US
US12014293B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12014293-B2 |
| Application number | US-202016941561-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 29, 2020 |
| Priority date | Jul 29, 2020 |
| Publication date | Jun 18, 2024 |
| Grant date | Jun 18, 2024 |
Abstract (official text):
The present disclosure relates to a method, system and computer program product for electronic health record (EHR) data synthetization. According to the method, an original EHR dataset X is captured. A latent space Z is generated from the original EHR dataset X, wherein dimensionality of Z is lower than that of X. A stochastic process prior module is applied to the latent space Z. Synthetic EHR dataset X′ is reconstructed from the latent space Z after being applied with the stochastic process prior.
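The abstract does not say which stochastic process prior is used. Claim 1 below narrows it to a Markov process prior, defined by a state space and a transition rate matrix and applied to the discrete latent values. The document does not say how that prior is evaluated. The sketch below is a minimal version that assumes a continuous-time Markov chain (CTMC) scored at irregular observation times. The transition matrix over a gap Δt is P(Δt) = exp(QΔt), and the prior log-probability is the sum of the log transition probabilities. The helper names, the initial-state distribution and the Taylor-series matrix exponential are illustrative assumptions, not the patent's implementation.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_exp(q, t, terms=20):
    """Approximate expm(q * t) by scaling and squaring with a truncated Taylor series."""
    n = len(q)
    a = [[q[i][j] * t for j in range(n)] for i in range(n)]
    norm = max(sum(abs(x) for x in row) for row in a)
    # Choose s so that ||a / 2**s|| <= 1/2, where the Taylor series converges quickly.
    s = max(0, math.ceil(math.log2(norm))) + 1 if norm > 0 else 0
    a = [[x / 2 ** s for x in row] for row in a]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, a)]  # a**k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(s):  # undo the scaling: expm(a) = expm(a / 2**s) ** (2**s)
        result = mat_mul(result, result)
    return result


def markov_prior_log_prob(states, times, rate_matrix, initial_probs):
    """Log-probability of a discrete latent sequence under a CTMC prior (hypothetical helper).

    states: indices into the state space, one per observation
    times: non-decreasing observation times (e.g. visit dates in days)
    rate_matrix: Q, off-diagonals >= 0 and each row summing to 0
    initial_probs: assumed distribution of the first latent state
    """
    for row in rate_matrix:
        if abs(sum(row)) > 1e-9:
            raise ValueError("each row of a transition rate matrix must sum to 0")
    log_p = math.log(initial_probs[states[0]])
    for k in range(1, len(states)):
        p = mat_exp(rate_matrix, times[k] - times[k - 1])
        prob = p[states[k - 1]][states[k]]
        if prob <= 0.0:
            return float("-inf")  # transition impossible within this gap
        log_p += math.log(prob)
    return log_p


# Example: two latent states, three encounters at uneven time gaps.
Q = [[-1.0, 1.0], [2.0, -2.0]]
example_log_prior = markov_prior_log_prob([0, 0, 1], [0.0, 0.5, 2.0], Q, [0.5, 0.5])
```

Scoring with a rate matrix Q, rather than a fixed per-step transition matrix, lets the same prior handle uneven gaps between EHR encounters. This is one plausible reading of the claim's "characteristics of a time series". In a variational autoencoder, a term like this would typically stand in for the usual standard-normal prior. Whether the patent uses it in exactly that way is not stated here.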
Claims (preview):
What is claimed is:

1. A computer-implemented method for electronic health record (EHR) data synthetization, comprising: capturing, by one or more processing units, an original EHR dataset; generating, by one or more processing units in a variational autoencoder module, a latent space from the original EHR dataset, wherein: dimensionality of the latent space is lower than that of the original EHR dataset; and the latent space comprises discrete values; applying, by one or more processing units, a stochastic process prior to the latent space, wherein applying the stochastic process prior further comprises: receiving, by one or more processing units, a state space and a transition rate matrix; and applying, by one or more processing units, Markov process prior as the stochastic process prior to the latent space based on the state space and the transition rate matrix for the discrete values to consider characteristics of a time series of the original EHR dataset; reconstructing, by one or more processing units in the variational autoencoder module, a synthetic EHR dataset from the latent space in response to being applied with the stochastic process prior; comparing, by one or more processing units, the synthetic EHR dataset to the original EHR dataset; and backpropagating, by one or more processing units, error through the variational autoencoder module to update weights to train the variational autoencoder module.

2. The method of claim 1, wherein the latent space further comprises continuous values that are separated by shorter amounts of time than the discrete values, and the applying stochastic process prior further comprises: receiving, by one or more processing units, a covariance and a mean; and applying, by one or more processing units, a Gaussian process prior as the stochastic process prior to the latent space based on the covariance and the mean.

3. The method of claim 1, further comprising: updating, by one or more processing units, the latent space with differential privacy noise.

4. The method of claim 3, wherein the latent space comprises discrete values, and the updating latent space further comprises: receiving, by one or more processing units, a single privacy budget; and updating, by one or more processing units, the latent space by an exponential mechanism based on the single privacy budget.

5. The method of claim 3, wherein the latent space comprises continuous values, and the updating latent space further comprises: receiving, by one or more processing units, a single privacy budget; and updating, by one or more processing units, the latent space by a Laplace mechanism based on the single privacy budget.

6. The method of claim 1, wherein reconstructing the synthetic EHR dataset from the latent space comprises multiple iterations of reconstructing such that the synthetic EHR dataset is larger than the original EHR dataset.

7. A computer program product for electronic health record (EHR) data synthetization, comprising: a computer readable storage medium having program instructions embodied therewith, the program instructions being executable by a computer to cause the computer to perform a method comprising: capturing an original EHR dataset; generating, by a variational autoencoder module, a latent space from the original EHR dataset, wherein: dimensionality of the latent space is lower than that of the original EHR dataset; and the latent space comprises discrete values; applying a stochastic process prior to the latent space, wherein applying the stochastic process prior further comprises: receiving a state space and a transition rate matrix; and applying Markov process prior as the stochastic process prior to the latent space based on the state space and the transition rate matrix for the discrete values to consider characteristics of a time series of the original EHR dataset; reconstructing, by the variational autoencoder module, a synthetic EHR dataset from the latent space after being applied with the stochastic process prior; comparing, by one or more processing units, the synthetic EHR dataset to the original EHR dataset; and backpropagating, by one or more processing units, error through the variational autoencoder module to update weights to train the variational autoencoder module.

8. The computer program product of claim 7, wherein the latent space further comprises continuous values that are separated by shorter amounts of time than the discrete values, and the applying stochastic process prior further comprises: receiving a covariance and a mean; and applying a Gaussian process prior as the stochastic process prior to the latent space based on the covariance and the mean.

9. The computer program product of claim 7, the method further comprising: updating the latent space with differential privacy noise.

10. The computer program product of claim 9, wherein the latent space comprises discrete values, and the updating latent space further comprises: receiving a single privacy budget; and updating the latent space by an exponential mechanism based on the received single privacy budget.

11. The computer program product of claim 9, wherein the latent space comprises continuous values, and the updating latent space further comprises: receiving a single privacy budget; and updating the latent space by a Laplace mechanism based on the received single privacy budget.

12. The computer program product of claim 7, wherein reconstructing the synthetic EHR dataset from the latent space comprises multiple iterations of reconstructing such that the synthetic EHR dataset is larger than the original EHR dataset.

13. A computer system for electronic health record (EHR) data synthetization, comprising: one or more processors; a memory coupled to at least one of the processors; and a set of computer program instructions stored in the memory and executed by at least one of the processors to perform a method comprising: capturing an original EHR dataset; generating, by a variational autoencoder module, a latent space from the original EHR dataset, wherein: dimensionality of the latent space is lower than that of the original EHR dataset; and the latent space comprises discrete values; applying a stochastic process prior to the latent space, wherein applying the stochastic process prior further comprises: receiving state space and a transition rate matrix; and applying Markov process prior as the stochastic process prior to the latent space based on the state space and the transition rate matrix for the discrete values to consider characteristics of a time series of the original EHR dataset; reconstructing, by the variational autoencoder module, a synthetic EHR dataset from the latent space after being applied with the stochastic process prior; comparing, by one or more processing units, the synthetic EHR dataset to the original EHR dataset; and backpropagating, by one or more processing units, error through the variational autoencoder module to update weights to train the variational autoencoder module.

14. The computer system of claim 13, wherein the latent space further comprises continuous values that are separated by shorter amounts of time than the discrete values, and the applying stochastic process prior further comprises: receiving a covariance and a mean; and applying a Gaussian process prior as the stochastic process prior to the latent space based on the covariance and the mean.

15. The computer system of claim 13, the method further comprising: updating the latent space with differential privacy noise.
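Claims 3 to 5, 9 to 11 and 15 add differential privacy noise to the latent space. Discrete values are updated by an exponential mechanism and continuous values by a Laplace mechanism, each based on a single privacy budget. The document does not say how sensitivity is set or how the budget is divided. The sketch below uses the textbook form of each mechanism and makes the following assumptions, all labelled in comments:

* Discrete values use a 0/1 "keep the original state" utility, which has sensitivity 1.
* The budget is split evenly across discrete values.
* For continuous values, the caller supplies the L1 sensitivity.

All function names are hypothetical.

```python
import math
import random


def laplace_mechanism(values, l1_sensitivity, epsilon, rng=random):
    """Add Laplace(0, l1_sensitivity / epsilon) noise to each continuous latent value.

    Assumption: l1_sensitivity bounds the L1 change of the whole latent vector when
    one patient record changes, so a single epsilon covers every coordinate.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rate = epsilon / l1_sensitivity  # rate = 1 / scale
    # The difference of two i.i.d. Exponential(rate) draws is Laplace(0, 1 / rate).
    return [v + rng.expovariate(rate) - rng.expovariate(rate) for v in values]


def exponential_mechanism(candidates, utility, sensitivity, epsilon, rng=random):
    """Sample a candidate with probability proportional to exp(epsilon * u / (2 * sensitivity))."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scores = [epsilon * utility(c) / (2.0 * sensitivity) for c in candidates]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # shift by the max to avoid overflow
    return rng.choices(candidates, weights=weights, k=1)[0]


def privatize_discrete_latents(states, state_space, total_epsilon, rng=random):
    """Hypothetical helper: perturb each discrete latent value with the exponential mechanism.

    Assumption: the single budget is split evenly, so by sequential composition the
    whole sequence consumes total_epsilon. Utility is 1 for the original state, else 0.
    """
    per_value_epsilon = total_epsilon / len(states)
    return [
        exponential_mechanism(
            state_space, lambda c, s=s: 1.0 if c == s else 0.0, 1.0, per_value_epsilon, rng
        )
        for s in states
    ]


# Example usage with a seeded generator (values are illustrative, not from the patent).
example_rng = random.Random(7)
noisy_continuous = laplace_mechanism([0.3, -1.2, 0.8], 1.0, 0.5, example_rng)
noisy_discrete = privatize_discrete_latents([0, 2, 2, 1], [0, 1, 2], 2.0, example_rng)
```

In this sketch, a smaller epsilon has two effects. It widens the Laplace noise on continuous values, and it flattens the exponential-mechanism distribution, so a discrete value is more likely to be swapped for another state. Both effects trade fidelity of the synthetic EHR data for stronger privacy.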
CPC classifications (titles):
- Generative networks
- Auto-encoder networks; Encoder-decoder networks
- Probabilistic or stochastic networks
- for evaluating statistical data {, e.g. average values, frequency distributions, probability functions, regression analysis (forecasting specially adapted for a specific administrative, business or logistic context G06Q10/04)}
- Protecting personal data, e.g. for financial or medical purposes