Using generative adversarial networks (GANs) to enable sharing of sensitive data
US-2022076066-A1 · Mar 10, 2022 · US
US11797705B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-11797705-B1 |
| Application number | US-201916711260-A |
| Country | US |
| Kind code | B1 |
| Filing date | Dec 11, 2019 |
| Priority date | Dec 11, 2019 |
| Publication date | Oct 24, 2023 |
| Grant date | Oct 24, 2023 |
Official abstract text for this publication.
A generative adversarial network (GAN) may be implemented to recognize named entity types in detection of sensitive information in datasets. The GAN may include a generator and a discriminator. The generator may be trained to produce synthetic data to include information that simulates named entity types representing the sensitive information. The discriminator may be fed with real data that are known to include the sensitive information (as positive examples), together with the synthetic data that simulate the sensitive information (as negative examples), to train to classify the real vs. synthetic data. In field operations, the discriminator may be deployed to perform named entity type recognition to identify data having the sensitive information. The generator may be deployed to provide anonymous data in lieu of real data to facilitate sensitive information sharing and disclosure.
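The adversarial training described in the abstract can be illustrated with a small loss sketch. It assumes binary cross-entropy, a common choice for GANs; the abstract does not name the loss functions, and the function names `bce`, `discriminator_loss`, and `generator_loss` are hypothetical. Here `p` stands for the discriminator's estimated probability that an example is real (a positive example).

```python
import math


def bce(p: float, label: int, eps: float = 1e-12) -> float:
    """Binary cross-entropy for one prediction p = P(example is real)."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def discriminator_loss(p_real: float, p_synth: float) -> float:
    """Low when real data is scored as real and synthetic data as synthetic."""
    # Assumption: real (positive) examples carry label 1, synthetic ones label 0.
    return bce(p_real, 1) + bce(p_synth, 0)


def generator_loss(p_synth: float) -> float:
    """Low when the discriminator mistakes synthetic data for a positive example."""
    return bce(p_synth, 1)


if __name__ == "__main__":
    # A discriminator that separates the two classes well has a low loss,
    # while the generator whose output was caught has a high loss.
    print(discriminator_loss(p_real=0.9, p_synth=0.1))
    print(generator_loss(p_synth=0.1))
```

Under this assumed formulation the two losses pull in opposite directions on the same discriminator score for synthetic data, which is what drives the generator toward more realistic simulated sensitive information.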
Opening claim text (preview).
What is claimed is:
1. A system, comprising: a generator of a generative adversarial network (GAN) implemented by one or more processors and memory and configured to: be trained to generate one or more synthetic data based on training sets to the generator, the one or more synthetic data including information that simulates sensitive information; and a discriminator of the GAN implemented by the one or more processors and memory and configured to: be trained to classify each one of training sets to the discriminator that include the one or more synthetic data generated by the generator and one or more positive examples as synthetic data or a positive example, the one or more positive examples each including the sensitive information; and be deployed to identify that a data store includes at least some data matching a named entity type based on samples of data of the data store, wherein the samples are sampled according to a distribution of the data within the data store, the named entity type representing the sensitive information.
2. The system of claim 1, wherein the generator is further configured to: be deployed to generate anonymous data to replace data that include one or more sensitive information.
3. The system of claim 1, wherein the discriminator is implemented based on a bidirectional long short-term memory (LSTM) network.
4. The system of claim 1, wherein the generator is implemented based on a unidirectional LSTM network.
5. The system of claim 1, wherein the generator and the discriminator are implemented as part of a data store classification service offered by a provider network, and wherein data of the data store is received via a network-based interface for the data store classification service and stored in the data store that is implemented as part of a data storage service offered by the provider network.
6. A method, comprising: performing, by one or more computers: training a generator, implemented by one or more processors and memory, to generate one or more synthetic data based on training sets to the generator, the one or more synthetic data including information that simulates sensitive information; training a discriminator, implemented by the one or more processors and memory, to classify each one of training sets to the discriminator that include the one or more synthetic data generated by the generator and one or more positive examples as synthetic data or a positive example, the one or more positive examples each including the sensitive information; and deploying the discriminator to identify that a data store includes at least some data matching a named entity type based on samples of data of the data store, wherein the samples are sampled according to a distribution of the data within the data store, the named entity type representing the sensitive information.
7. The method of claim 6, wherein said training the discriminator to classify each one of the training sets as synthetic data or a positive example comprises: providing, by the discriminator, one or more losses, according to respective loss functions, to the discriminator and the generator based on the classification of the each one of the training sets to the discriminator.
8. The method of claim 7, wherein said providing the one or more losses comprises: responsive to classifying a training set including a positive example as a positive example or classifying a training set including one synthetic data as synthetic data, providing a low loss to the discriminator; and responsive to classifying the training set including the positive example as synthetic data or classifying the training set including the one synthetic data as synthetic data, providing a high loss to the discriminator.
9. The method of claim 7, wherein said providing the one or more losses comprises: responsive to classifying a training set including one synthetic data as a positive example, providing a low loss to the generator; and responsive to classifying the training set including the one synthetic data as synthetic data, providing a high loss to the generator.
10. The method of claim 6, further comprising: deploying the generator to generate anonymous data to replace data that include one or more sensitive information.
11. The method of claim 6, wherein the discriminator is implemented based on a bidirectional LSTM network.
12. The method of claim 6, wherein the generator is implemented based on a unidirectional LSTM network.
13. The method of claim 6, wherein said training the generator and said training and deploying the discriminator are implemented by a data store classification service offered by a provider network, and wherein data of the data store is received via a network-based interface for the data store classification service and stored in the data store that is implemented as part of a data storage service offered by the provider network.
14. One or more non-transitory computer readable media comprising instructions which, when executed on or across one or more processors, cause the one or more processors to: access, using a discriminator, data of a data store; and identify, using the discriminator, that the data store includes at least some data matching a named entity type that represents sensitive information according to a confidence value indicating a probability based on samples of data of the data store, wherein the samples of the data are sampled according to a distribution of the data within the data store, wherein the discriminator is trained to classify each one of training sets to the discriminator that include one or more synthetic data and one or more positive examples as synthetic data or a positive example, the one or more positive examples each including the sensitive information, and the one or more synthetic data including information that simulates the sensitive information and generated by a generator based on training sets to the generator.
15. The one or more non-transitory computer readable media of claim 14 comprising instructions which, when executed on or across the one or more processors, cause the one or more processors to: responsive to classifying a training set including a positive example as a positive example or classifying a training set including one synthetic data as synthetic data, provide a low loss to the discriminator; and responsive to classifying the training set including the positive example as synthetic data or classifying the training set including the one synthetic data as a positive example, provide a high loss to the discriminator.
16. The one or more non-transitory computer readable media of claim 14 comprising instructions which, when executed on or across the one or more processors, cause the one or more processors to: responsive to classifying a training set including one synthetic data as a positive example, provide a low loss to the generator; and responsive to classifying the training set including the one synthetic data as synthetic data, provide a high loss to the generator.
17. The one or more non-transitory computer readable media of claim 15 comprising instructions which, when executed on or across the one or more processors, cause the one or more processors to: generate, using the generator, anonymous data to replace data that include one or more sensitive information.
18. The one or more non-transitory computer readable media of claim 14 comprising instructions which, when executed on or across the one or more processors,
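
The deployment step recited in claims 1, 6, and 14 (sampling a data store according to its data distribution, then flagging it by a confidence value) might be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: "distribution" is read here as sampling each partition in proportion to its size, the store is modeled as a plain dictionary, and `proportional_sample`, `store_has_entity`, and `toy_score` are hypothetical names. `toy_score` is a regex stand-in for a trained discriminator.

```python
import random
import re
from typing import Callable, Dict, List, Tuple


def proportional_sample(store: Dict[str, List[str]], n: int,
                        rng: random.Random) -> List[str]:
    """Draw roughly n records, each partition in proportion to its size."""
    total = sum(len(records) for records in store.values())
    samples: List[str] = []
    for records in store.values():
        if not records:
            continue
        # At least one record per non-empty partition, never more than it holds.
        k = min(len(records), max(1, round(n * len(records) / total)))
        samples.extend(rng.sample(records, k))
    return samples


def store_has_entity(store: Dict[str, List[str]],
                     score: Callable[[str], float],
                     n: int = 10, threshold: float = 0.5,
                     seed: int = 0) -> Tuple[bool, float]:
    """Flag the store if any sampled record scores at or above the threshold."""
    samples = proportional_sample(store, n, random.Random(seed))
    confidence = max((score(s) for s in samples), default=0.0)
    return confidence >= threshold, confidence


def toy_score(text: str) -> float:
    """Hypothetical stand-in for the discriminator: a pattern shaped like an SSN."""
    return 0.95 if re.search(r"\b\d{3}-\d{2}-\d{4}\b", text) else 0.05


if __name__ == "__main__":
    store = {"logs": ["ok"] * 8, "users": ["ssn 123-45-6789"] * 2}
    print(store_has_entity(store, toy_score, n=10))
```

Taking the maximum score over the samples matches the "at least some data" wording; a real service could aggregate scores differently.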
Protecting personal data, e.g. for financial or medical purposes · CPC title
Classification techniques · CPC title
Named entity recognition · CPC title
Combinations of networks · CPC title
using kernel methods, e.g. support vector machines [SVM] · CPC title