Methods and apparatus for transforming and statistically modeling relational databases to synthesize privacy-protected anonymized data
US-2018165475-A1 · Jun 14, 2018 · US
US10664381B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10664381-B2 |
| Application number | US-201916454041-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 26, 2019 |
| Priority date | Jul 6, 2018 |
| Publication date | May 26, 2020 |
| Grant date | May 26, 2020 |
Official abstract text for this publication.
Systems and methods for generating synthetic data are disclosed. For example, a system may include one or more memory units storing instructions and one or more processors configured to execute the instructions to perform operations. The operations may include receiving a dataset that includes time series data having a plurality of dimensions and generating a transformed dataset by performing a first data transformation. The first data transformation may include a time-based data processing method. The operations may include generating a synthetic transformed-dataset by implementing a data model using the transformed dataset. The data model may be configured to generate synthetic transformed-data based on a relationship between data of at least two dimensions of the transformed dataset. The operations may include generating a synthetic dataset by performing a second data transformation on the synthetic transformed-dataset. The second data transformation may include an inverse of the first data transformation.
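As an illustration of the pipeline the abstract describes, the following minimal Python sketch uses first differencing as the time-based transformation and a cumulative sum as its inverse. The function names and the toy two-dimensional series are hypothetical. `sample_deltas` is only a stand-in for the trained data model: it resamples whole rows of differences so that values in different dimensions stay paired, whereas the claims name neural models such as an RNN-CNN.

```python
import random


def difference(series):
    """First transformation: subtract each time point's row from the next row."""
    return [[b - a for a, b in zip(prev, curr)]
            for prev, curr in zip(series, series[1:])]


def undifference(start, deltas):
    """Inverse transformation: cumulatively add deltas onto a starting row."""
    rows = [list(start)]
    for delta in deltas:
        rows.append([x + d for x, d in zip(rows[-1], delta)])
    return rows


def sample_deltas(deltas, n, rng):
    """Stand-in for the data model (an assumption, not the patented model).

    Whole rows are resampled together, so the relationship between the
    dimensions of each difference row is carried into the synthetic data.
    """
    return [list(rng.choice(deltas)) for _ in range(n)]


# Toy time series: 4 time points, 2 dimensions (hypothetical values).
series = [[10, 100], [12, 104], [11, 102], [15, 110]]

deltas = difference(series)                      # transformed dataset
restored = undifference(series[0], deltas)       # inverse recovers the input

rng = random.Random(0)                           # seeded for repeatability
synthetic_deltas = sample_deltas(deltas, 5, rng) # synthetic transformed data
synthetic = undifference(series[0], synthetic_deltas)  # synthetic dataset
```

In this toy series every change in the second dimension is twice the change in the first, and because rows are resampled jointly the synthetic differences keep that ratio. A learned model would generalize beyond the observed rows rather than replaying them.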
Opening claim text (preview).
What is claimed is:

1. A system for generating synthetic data, comprising: one or more memory units storing instructions; and one or more processors that execute the instructions to perform operations comprising: receiving a dataset comprising time series data having a plurality of dimensions; generating a transformed dataset by performing a first data transformation on the dataset, the first data transformation comprising a time-based data processing operation; generating a first synthetic transformed dataset by implementing a data model using the transformed dataset, the data model being configured to generate synthetic transformed data based on a relationship between data of at least two dimensions of the transformed dataset; generating a second synthetic transformed dataset by performing a second data transformation on the first synthetic transformed dataset, the second data transformation comprising an inverse of the first data transformation; receiving a plurality of sample datasets having a plurality of respective dimensions; generating a plurality of transformed sample datasets corresponding to the sample datasets by performing the first data transformation on the sample datasets; and training the data model to generate synthetic transformed data based on the transformed sample datasets.

2. The system of claim 1, wherein: the operations further comprise generating the data model; and training the data model is based on generating the data model.

3. The system of claim 1, wherein the first data transformation comprises encoding the dataset.

4. The system of claim 3, wherein encoding the dataset comprises encoding a character as a number.

5. The system of claim 3, wherein encoding the dataset comprises implementing a natural language model to encode string data as numeric data.

6. The system of claim 3, wherein encoding the dataset comprises implementing an encoder model to reduce the number of dimensions of the dataset.

7. The system of claim 1, wherein the data model comprises a recurrent neural network-convolutional neural network (RNN-CNN) model.

8. The system of claim 1, wherein the data model comprises a Long Short Term Memory Convolutional Neural Network model.

9. The system of claim 1, wherein the data model comprises an attention network model.

10. The system of claim 1, wherein the relationship comprises a correlation between at least two dimensions of the dataset.

11. The system of claim 1, wherein the first data transformation comprises vector subtraction.

12. The system of claim 1, wherein the first data transformation comprises normalization.

13. The system of claim 1, wherein the first data transformation comprises applying a logarithmic function.

14. The system of claim 1, wherein the first data transformation comprises implementing a pooling operation.

15. The system of claim 1, wherein the data model is configured to generate synthetic transformed data by at least assigning a probability to the synthetic transformed-data.

16. The system of claim 1, wherein: receiving the dataset comprises receiving the dataset from a client device; and the operations further comprise transmitting the second synthetic transformed dataset to the client device.

17. The system of claim 1, wherein receiving the dataset comprises receiving the dataset at a cloud service.

18. A method for generating synthetic data, the method comprising: receiving a dataset comprising time series data having a plurality of dimensions; generating a transformed dataset by performing a first data transformation on the dataset, the first data transformation comprising a time-based data processing operation; generating a first synthetic transformed-dataset by implementing a data model using the transformed dataset, the data model being configured to generate synthetic transformed-data based on a relationship between data of at least two dimensions of the transformed dataset; and generating a second synthetic transformed dataset by performing a second data transformation on the first synthetic transformed-dataset, the second data transformation comprising an inverse of the first data transformation; receiving a plurality of sample datasets having a plurality of respective dimensions; generating a plurality of transformed sample-datasets corresponding to the sample datasets by performing the first data transformation on the sample datasets; and training the data model to generate synthetic transformed-data based on the transformed sample-datasets.

19. A system for generating synthetic data, comprising: one or more memory units storing instructions; and one or more processors that execute the instructions to perform operations comprising: receiving, at a server, from a client device, a dataset comprising numeric time-series data having a plurality of dimensions; generating a transformed dataset by performing a first data transformation on the dataset, the first data transformation comprising subtracting data associated with a first time point from data associated with a second time point; generating a synthetic transformed-dataset by implementing a data model using the transformed dataset, the data model comprising an RNN-CNN model configured to generate synthetic transformed-data based on a relationship between data of at least two dimensions of the transformed dataset; generating a synthetic dataset by performing a second data transformation on the synthetic transformed-dataset, the second data transformation comprising an inverse of the first data transformation; and transmitting, to the client device, the synthetic dataset.
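Several dependent claims name specific first transformations: encoding a character as a number (claim 4), normalization (claim 12), and applying a logarithmic function (claim 13). The sketch below pairs each with one possible inverse, since the claims require the second transformation to be an inverse of the first. All function names are hypothetical, and the choices of Unicode code points and min-max scaling are assumptions made for illustration; the claims do not specify them.

```python
import math


def encode_chars(text):
    """Encode each character as a number (here: its Unicode code point)."""
    return [ord(ch) for ch in text]


def decode_chars(codes):
    """Inverse of encode_chars."""
    return "".join(chr(code) for code in codes)


def normalize(values):
    """Min-max scale values into [0, 1]; return the parameters for inversion."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot min-max normalize a constant series")
    return [(v - lo) / (hi - lo) for v in values], (lo, hi)


def denormalize(scaled, params):
    """Inverse of normalize, using the stored (lo, hi) parameters."""
    lo, hi = params
    return [s * (hi - lo) + lo for s in scaled]


def log_transform(values):
    """Apply the natural logarithm; inputs must be strictly positive."""
    return [math.log(v) for v in values]


def exp_transform(values):
    """Inverse of log_transform."""
    return [math.exp(v) for v in values]


codes = encode_chars("AB-12")
scaled, params = normalize([2.0, 4.0, 6.0])
logged = log_transform([1.0, 10.0, 100.0])
```

Each forward function keeps or returns whatever the inverse needs, such as the `(lo, hi)` pair for normalization, so that synthetic values produced in the transformed space can be mapped back to the original scale.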
Ensemble learning · CPC title
for test design, e.g. generating new test cases · CPC title
using kernel methods, e.g. support vector machines [SVM] · CPC title
for test execution, e.g. scheduling of test suites · CPC title
Correlation function computation {including computation of convolution operations (arithmetic circuits for sum of products per se, e.g. multiply-accumulators G06F7/5443; digital filters, e.g. FIR, IIR, adaptive filters H03H17/00)} · CPC title