Electronic health record data synthesization

US12014293B2 · US · B2

Patent metadata
Field | Value
Publication number | US-12014293-B2
Application number | US-202016941561-A
Country | US
Kind code | B2
Filing date | Jul 29, 2020
Priority date | Jul 29, 2020
Publication date | Jun 18, 2024
Grant date | Jun 18, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure relates to a method, system and computer program product for electronic health record (EHR) data synthetization. According to the method, an original EHR dataset X is captured. A latent space Z is generated from the original EHR dataset X, wherein dimensionality of Z is lower than that of X. A stochastic process prior module is applied to the latent space Z. Synthetic EHR dataset X′ is reconstructed from the latent space Z after being applied with the stochastic process prior.
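
The "stochastic process prior" step in the abstract is narrowed in the first claim to a Markov process prior defined by a state space and a transition rate matrix. As a rough illustration only, the sketch below samples one path of a continuous-time Markov chain from those two inputs. The function name, the three state labels, and the rate values are hypothetical and do not come from the patent; the patent text shown here does not say how the prior is sampled or how it enters the training objective.

```python
import random


def sample_ctmc_path(states, rate_matrix, start, horizon, rng):
    """Sample one continuous-time Markov chain path up to time `horizon`.

    states      -- list of discrete latent values (the state space)
    rate_matrix -- Q, where Q[i][j] (i != j) is the jump rate from state i
                   to state j and every row sums to zero
    Returns a list of (time, state) pairs beginning with (0.0, start).
    """
    index = states.index(start)
    t = 0.0
    path = [(t, start)]
    while True:
        exit_rate = -rate_matrix[index][index]
        if exit_rate <= 0.0:  # absorbing state: no further jumps
            break
        t += rng.expovariate(exit_rate)  # exponential holding time
        if t >= horizon:
            break
        # Jump to another state j with probability Q[i][j] / exit_rate.
        weights = [rate if j != index else 0.0
                   for j, rate in enumerate(rate_matrix[index])]
        index = rng.choices(range(len(states)), weights=weights)[0]
        path.append((t, states[index]))
    return path


# Hypothetical state space and rates (per day), for illustration only.
STATES = ["stable", "acute", "recovering"]
Q = [
    [-0.10, 0.10, 0.00],
    [0.00, -0.50, 0.50],
    [0.30, 0.05, -0.35],
]

path = sample_ctmc_path(STATES, Q, "stable", horizon=30.0,
                        rng=random.Random(0))
for time, state in path:
    print(f"t={time:6.2f}  state={state}")
```

Each off-diagonal entry of Q controls how quickly one discrete latent value gives way to another, which is one way a prior of this kind can reflect the time-series character of the original records.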

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for electronic health record (EHR) data synthetization, comprising: capturing, by one or more processing units, an original EHR dataset; generating, by one or more processing units in a variational autoencoder module, a latent space from the original EHR dataset, wherein: dimensionality of the latent space is lower than that of the original EHR dataset; and the latent space comprises discrete values; applying, by one or more processing units, a stochastic process prior to the latent space, wherein applying the stochastic process prior further comprises: receiving, by one or more processing units, a state space and a transition rate matrix; and applying, by one or more processing units, Markov process prior as the stochastic process prior to the latent space based on the state space and the transition rate matrix for the discrete values to consider characteristics of a time series of the original EHR dataset; reconstructing, by one or more processing units in the variational autoencoder module, a synthetic EHR dataset from the latent space in response to being applied with the stochastic process prior; comparing, by one or more processing units, the synthetic EHR dataset to the original EHR dataset; and backpropagating, by one or more processing units, error through the variational autoencoder module to update weights to train the variational autoencoder module.

2. The method of claim 1, wherein the latent space further comprises continuous values that are separated by shorter amounts of time than the discrete values, and the applying stochastic process prior further comprises: receiving, by one or more processing units, a covariance and a mean; and applying, by one or more processing units, a Gaussian process prior as the stochastic process prior to the latent space based on the covariance and the mean.

3. The method of claim 1, further comprising: updating, by one or more processing units, the latent space with differential privacy noise.

4. The method of claim 3, wherein the latent space comprises discrete values, and the updating latent space further comprises: receiving, by one or more processing units, a single privacy budget; and updating, by one or more processing units, the latent space by an exponential mechanism based on the single privacy budget.

5. The method of claim 3, wherein the latent space comprises continuous values, and the updating latent space further comprises: receiving, by one or more processing units, a single privacy budget; and updating, by one or more processing units, the latent space by a Laplace mechanism based on the single privacy budget.

6. The method of claim 1, wherein reconstructing the synthetic EHR dataset from the latent space comprises multiple iterations of reconstructing such that the synthetic EHR dataset is larger than the original EHR dataset.

7. A computer program product for electronic health record (EHR) data synthetization, comprising: a computer readable storage medium having program instructions embodied therewith, the program instructions being executable by a computer to cause the computer to perform a method comprising: capturing an original EHR dataset; generating, by a variational autoencoder module, a latent space from the original EHR dataset, wherein: dimensionality of the latent space is lower than that of the original EHR dataset; and the latent space comprises discrete values; applying a stochastic process prior to the latent space, wherein applying the stochastic process prior further comprises: receiving a state space and a transition rate matrix; and applying Markov process prior as the stochastic process prior to the latent space based on the state space and the transition rate matrix for the discrete values to consider characteristics of a time series of the original EHR dataset; reconstructing, by the variational autoencoder module, a synthetic EHR dataset from the latent space after being applied with the stochastic process prior; comparing, by one or more processing units, the synthetic EHR dataset to the original EHR dataset; and backpropagating, by one or more processing units, error through the variational autoencoder module to update weights to train the variational autoencoder module.

8. The computer program product of claim 7, wherein the latent space further comprises continuous values that are separated by shorter amounts of time than the discrete values, and the applying stochastic process prior further comprises: receiving a covariance and a mean; and applying a Gaussian process prior as the stochastic process prior to the latent space based on the covariance and the mean.

9. The computer program product of claim 7, the method further comprising: updating the latent space with differential privacy noise.

10. The computer program product of claim 9, wherein the latent space comprises discrete values, and the updating latent space further comprises: receiving a single privacy budget; and updating the latent space by an exponential mechanism based on the received single privacy budget.

11. The computer program product of claim 9, wherein the latent space comprises continuous values, and the updating latent space further comprises: receiving a single privacy budget; and updating the latent space by a Laplace mechanism based on the received single privacy budget.

12. The computer program product of claim 7, wherein reconstructing the synthetic EHR dataset from the latent space comprises multiple iterations of reconstructing such that the synthetic EHR dataset is larger than the original EHR dataset.

13. A computer system for electronic health record (EHR) data synthetization, comprising: one or more processors; a memory coupled to at least one of the processors; and a set of computer program instructions stored in the memory and executed by at least one of the processors to perform a method comprising: capturing an original EHR dataset; generating, by a variational autoencoder module, a latent space from the original EHR dataset, wherein: dimensionality of the latent space is lower than that of the original EHR dataset; and the latent space comprises discrete values; applying a stochastic process prior to the latent space, wherein applying the stochastic process prior further comprises: receiving state space and a transition rate matrix; and applying Markov process prior as the stochastic process prior to the latent space based on the state space and the transition rate matrix for the discrete values to consider characteristics of a time series of the original EHR dataset; reconstructing, by the variational autoencoder module, a synthetic EHR dataset from the latent space after being applied with the stochastic process prior; comparing, by one or more processing units, the synthetic EHR dataset to the original EHR dataset; and backpropagating, by one or more processing units, error through the variational autoencoder module to update weights to train the variational autoencoder module.

14. The computer system of claim 13, wherein the latent space further comprises continuous values that are separated by shorter amounts of time than the discrete values, and the applying stochastic process prior further comprises: receiving a covariance and a mean; and applying a Gaussian process prior as the stochastic process prior to the latent space based on the covariance and the mean.

15. The computer system of claim 13, the method further comprising: updating the latent space with differential privacy n
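
The dependent claims on differential privacy name two standard mechanisms: an exponential mechanism for discrete latent values and a Laplace mechanism for continuous ones, each driven by a privacy budget. The sketch below shows the textbook form of both, not the patent's implementation. The sensitivity parameter, the utility function, the state labels, and the example values are assumptions added for illustration; the claim text shown here specifies only the privacy budget.

```python
import math
import random


def laplace_mechanism(values, sensitivity, epsilon, rng):
    """Add Laplace(0, sensitivity / epsilon) noise to each continuous value."""
    scale = sensitivity / epsilon
    noisy = []
    for value in values:
        # The difference of two i.i.d. exponentials with mean `scale`
        # is Laplace-distributed with location 0 and that scale.
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        noisy.append(value + noise)
    return noisy


def exponential_mechanism(candidates, utility, sensitivity, epsilon, rng):
    """Pick a candidate with probability proportional to
    exp(epsilon * utility(candidate) / (2 * sensitivity))."""
    scores = [utility(c) for c in candidates]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity))
               for s in scores]
    return rng.choices(candidates, weights=weights)[0]


# Hypothetical inputs, for illustration only.
STATES = ["stable", "acute", "recovering"]
EPSILON = 1.0  # assumed privacy budget
rng = random.Random(0)

# Continuous latent values perturbed by the Laplace mechanism.
noisy_continuous = laplace_mechanism([0.2, -1.3, 0.7], sensitivity=1.0,
                                     epsilon=EPSILON, rng=rng)

# Discrete latent values re-drawn by the exponential mechanism; the assumed
# utility is 1 for the true value and 0 for every other state.
noisy_discrete = [
    exponential_mechanism(
        STATES, lambda c, true=z: 1.0 if c == true else 0.0,
        sensitivity=1.0, epsilon=EPSILON, rng=rng)
    for z in ["stable", "acute", "stable"]
]

print(noisy_continuous)
print(noisy_discrete)
```

A smaller budget widens the Laplace noise and flattens the exponential mechanism's selection probabilities, so both mechanisms trade fidelity of the latent values for stronger privacy.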

Assignees

IBM

Classifications

  • Generative networks · CPC title

  • Auto-encoder networks; Encoder-decoder networks · CPC title

  • Probabilistic or stochastic networks · CPC title

  • for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis (forecasting specially adapted for a specific administrative, business or logistic context: G06Q10/04) · CPC title

  • Protecting personal data, e.g. for financial or medical purposes · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12014293B2 cover?
The present disclosure relates to a method, system and computer program product for electronic health record (EHR) data synthetization. According to the method, an original EHR dataset X is captured. A latent space Z is generated from the original EHR dataset X, wherein dimensionality of Z is lower than that of X. A stochastic process prior module is applied to the latent space Z. Synthetic EHR…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F21/6245. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 18, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).