Generative adversarial network for named entity recognition

US11797705B1 · US · B1

Patent metadata
Publication number: US-11797705-B1
Application number: US-201916711260-A
Country: US
Kind code: B1
Filing date: Dec 11, 2019
Priority date: Dec 11, 2019
Publication date: Oct 24, 2023
Grant date: Oct 24, 2023

Abstract

A generative adversarial network (GAN) may be implemented to recognize named entity types in detection of sensitive information in datasets. The GAN may include a generator and a discriminator. The generator may be trained to produce synthetic data to include information that simulates named entity types representing the sensitive information. The discriminator may be fed with real data that are known to include the sensitive information (as positive examples), together with the synthetic data that simulate the sensitive information (as negative examples), to train to classify the real vs. synthetic data. In field operations, the discriminator may be deployed to perform named entity type recognition to identify data having the sensitive information. The generator may be deployed to provide anonymous data in lieu of real data to facilitate sensitive information sharing and disclosure.
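
As a rough illustration of the training data the abstract describes, the Python sketch below mixes real positive examples (label 1) with generator output (label 0, the abstract's "negative examples") into one shuffled batch for the discriminator. It assumes the named entity type is a US Social Security number; `toy_generator` and `build_discriminator_batch` are hypothetical names, and the random-string generator only stands in for the trained generator (the claims mention a unidirectional LSTM), which would learn to make its output resemble the positives.

```python
import random
from typing import Callable, List, Tuple


def toy_generator(rng: random.Random) -> str:
    """Hypothetical stand-in for a trained generator: returns a record whose
    value is shaped like a US Social Security number but is random."""
    area = rng.randint(100, 899)      # always 3 digits
    group = rng.randint(10, 99)       # always 2 digits
    serial = rng.randint(1000, 9999)  # always 4 digits
    return f"Customer SSN: {area}-{group}-{serial}"


def build_discriminator_batch(
    positives: List[str],
    generator: Callable[[random.Random], str],
    n_synthetic: int,
    rng: random.Random,
) -> List[Tuple[str, int]]:
    """Label real positive examples 1 and synthetic (negative) examples 0."""
    batch = [(record, 1) for record in positives]
    batch += [(generator(rng), 0) for _ in range(n_synthetic)]
    rng.shuffle(batch)  # interleave real and synthetic items
    return batch


if __name__ == "__main__":
    rng = random.Random(7)
    real = ["Customer SSN: 123-45-6789"]  # made-up example record
    for record, label in build_discriminator_batch(real, toy_generator, 3, rng):
        print(label, record)
```

In an actual GAN the generator's parameters would be updated so that its synthetic records become harder to tell apart from the positives; this toy generator never learns.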
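
For the deployment step in the abstract's last sentence (anonymous data in lieu of real data), the sketch below shows one possible way synthetic values could replace sensitive ones inside a record. It rests on stated assumptions: detection uses a regular expression (`SSN_PATTERN`) instead of the trained discriminator, and the hypothetical helper `fake_like` randomizes digits while keeping the value's format, rather than sampling from a trained generator.

```python
import random
import re

# Assumption: the sensitive named entity type is a US SSN written as NNN-NN-NNNN.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def fake_like(value: str, rng: random.Random) -> str:
    """Hypothetical generator stand-in: keep separators and length,
    replace every digit with a random digit."""
    return "".join(str(rng.randrange(10)) if ch.isdigit() else ch for ch in value)


def anonymize(record: str, rng: random.Random) -> str:
    """Replace each detected sensitive span with a synthetic value of the same shape."""
    return SSN_PATTERN.sub(lambda match: fake_like(match.group(0), rng), record)


if __name__ == "__main__":
    rng = random.Random(42)
    print(anonymize("Patient SSN 123-45-6789 on file", rng))
```

Keeping the format means downstream consumers that validate field shapes still accept the anonymized records; a production system might also want the same input to map to the same fake value, which this sketch does not attempt.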

Claims

Claim text (preview, truncated partway through claim 18).

What is claimed is:

1. A system, comprising: a generator of a generative adversarial network (GAN) implemented by one or more processors and memory and configured to: be trained to generate one or more synthetic data based on training sets to the generator, the one or more synthetic data including information that simulates sensitive information; and a discriminator of the GAN implemented by the one or more processors and memory and configured to: be trained to classify each one of training sets to the discriminator that include the one or more synthetic data generated by the generator and one or more positive examples as synthetic data or a positive example, the one or more positive examples each including the sensitive information; and be deployed to identify that a data store includes at least some data matching a named entity type based on samples of data of the data store, wherein the samples are sampled according to a distribution of the data within the data store, the named entity type representing the sensitive information.

2. The system of claim 1, wherein the generator is further configured to: be deployed to generate anonymous data to replace data that include one or more sensitive information.

3. The system of claim 1, wherein the discriminator is implemented based on a bidirectional long short-term memory (LSTM) network.

4. The system of claim 1, wherein the generator is implemented based on a unidirectional LSTM network.

5. The system of claim 1, wherein the generator and the discriminator are implemented as part of a data store classification service offered by a provider network, and wherein data of the data store is received via a network-based interface for the data store classification service and stored in the data store that is implemented as part of a data storage service offered by the provider network.

6. A method, comprising: performing, by one or more computers: training a generator, implemented by one or more processors and memory, to generate one or more synthetic data based on training sets to the generator, the one or more synthetic data including information that simulates sensitive information; training a discriminator, implemented by the one or more processors and memory, to classify each one of training sets to the discriminator that include the one or more synthetic data generated by the generator and one or more positive examples as synthetic data or a positive example, the one or more positive examples each including the sensitive information; and deploying the discriminator to identify that a data store includes at least some data matching a named entity type based on samples of data of the data store, wherein the samples are sampled according to a distribution of the data within the data store, the named entity type representing the sensitive information.

7. The method of claim 6, wherein said training the discriminator to classify each one of the training sets as synthetic data or a positive example comprises: providing, by the discriminator, one or more losses, according to respective loss functions, to the discriminator and the generator based on the classification of the each one of the training sets to the discriminator.

8. The method of claim 7, wherein said providing the one or more losses comprises: responsive to classifying a training set including a positive example as a positive example or classifying a training set including one synthetic data as synthetic data, providing a low loss to the discriminator; and responsive to classifying the training set including the positive example as synthetic data or classifying the training set including the one synthetic data as synthetic data, providing a high loss to the discriminator.

9. The method of claim 7, wherein said providing the one or more losses comprises: responsive to classifying a training set including one synthetic data as a positive example, providing a low loss to the generator; and responsive to classifying the training set including the one synthetic data as synthetic data, providing a high loss to the generator.

10. The method of claim 6, further comprising: deploying the generator to generate anonymous data to replace data that include one or more sensitive information.

11. The method of claim 6, wherein the discriminator is implemented based on a bidirectional LSTM network.

12. The method of claim 6, wherein the generator is implemented based on a unidirectional LSTM network.

13. The method of claim 6, wherein said training the generator and said training and deploying the discriminator are implemented by a data store classification service offered by a provider network, and wherein data of the data store is received via a network-based interface for the data store classification service and stored in the data store that is implemented as part of a data storage service offered by the provider network.

14. One or more non-transitory computer readable media comprising instructions which, when executed on or across one or more processors, cause the one or more processors to: access, using a discriminator, data of a data store; and identify, using the discriminator, that the data store includes at least some data matching a named entity type that represents sensitive information according to a confidence value indicating a probability based on samples of data of the data store, wherein the samples of the data are sampled according to a distribution of the data within the data store, wherein the discriminator is trained to classify each one of training sets to the discriminator that include one or more synthetic data and one or more positive examples as synthetic data or a positive example, the one or more positive examples each including the sensitive information, and the one or more synthetic data including information that simulates the sensitive information and generated by a generator based on training sets to the generator.

15. The one or more non-transitory computer readable media of claim 14 comprising instructions which, when executed on or across the one or more processors, cause the one or more processors to: responsive to classifying a training set including a positive example as a positive example or classifying a training set including one synthetic data as synthetic data, provide a low loss to the discriminator; and responsive to classifying the training set including the positive example as synthetic data or classifying the training set including the one synthetic data as a positive example, provide a high loss to the discriminator.

16. The one or more non-transitory computer readable media of claim 14 comprising instructions which, when executed on or across the one or more processors, cause the one or more processors to: responsive to classifying a training set including one synthetic data as a positive example, provide a low loss to the generator; and responsive to classifying the training set including the one synthetic data as synthetic data, provide a high loss to the generator.

17. The one or more non-transitory computer readable media of claim 15 comprising instructions which, when executed on or across the one or more processors, cause the one or more processors to: generate, using the generator, anonymous data to replace data that include one or more sensitive information.

18. The one or more non-transitory computer readable media of claim 14 comprising instructions which, when executed on or across the one or more processors,
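
Claims 7 to 9, 15 and 16 describe the discriminator and generator losses only as "low" or "high". The Python sketch below assumes the common binary cross-entropy GAN objective, with the "non-saturating" generator loss; this is a swapped-in choice, not necessarily the loss functions the patent uses. The probabilities stand in for the discriminator's output P(positive example). Claim 8 as reproduced here lists "synthetic data as synthetic data" in both its low-loss and high-loss cases; the sketch follows claim 15, where classifying synthetic data as a positive example is what incurs the high discriminator loss. The function names are hypothetical.

```python
import math


def bce(prob: float, label: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one prediction; prob = P(positive example)."""
    p = min(max(prob, eps), 1.0 - eps)  # clamp so log() never sees 0
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def discriminator_loss(p_positive: float, p_synthetic: float) -> float:
    """Low when a positive example scores near 1 and synthetic data near 0."""
    return bce(p_positive, 1) + bce(p_synthetic, 0)


def generator_loss(p_synthetic: float) -> float:
    """Low when the discriminator mistakes synthetic data for a positive example."""
    return bce(p_synthetic, 1)


if __name__ == "__main__":
    # Discriminator right on both items: low D loss, high G loss.
    print(round(discriminator_loss(0.9, 0.1), 3), round(generator_loss(0.1), 3))  # 0.211 2.303
    # Discriminator fooled by the synthetic item: high D loss, low G loss.
    print(round(discriminator_loss(0.9, 0.9), 3), round(generator_loss(0.9), 3))  # 2.408 0.105
```

The two losses pull in opposite directions on the synthetic item, which is what makes the training adversarial.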
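
Claims 1, 6 and 14 have the deployed discriminator judge a data store from samples "sampled according to a distribution of the data within the data store", and claim 14 adds "a confidence value indicating a probability". The Python sketch below is one hedged reading of that step, under three assumptions not stated in the claims: the distribution is the record count per partition (for example, a key prefix), the store-level confidence is the highest per-sample probability, and a regex-based `toy_discriminator` stands in for the trained (bidirectional LSTM) discriminator. All function names are hypothetical.

```python
import random
import re
from typing import Callable, Dict, List

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def toy_discriminator(record: str) -> float:
    """Hypothetical stand-in for the trained discriminator: returns
    P(record contains the named entity type). A regex is used only so
    the example runs without a model."""
    return 0.95 if SSN_PATTERN.search(record) else 0.02


def sample_by_distribution(
    partitions: Dict[str, List[str]], n_samples: int, rng: random.Random
) -> List[str]:
    """Pick partitions with probability proportional to their record counts,
    then draw one record from each picked partition."""
    names = [name for name, records in partitions.items() if records]  # skip empty partitions
    if not names:
        return []
    weights = [len(partitions[name]) for name in names]
    picked = rng.choices(names, weights=weights, k=n_samples)
    return [rng.choice(partitions[name]) for name in picked]


def store_confidence(samples: List[str], score: Callable[[str], float]) -> float:
    """Store-level confidence: the highest per-sample probability (0.0 if no samples)."""
    return max((score(record) for record in samples), default=0.0)


if __name__ == "__main__":
    rng = random.Random(1)
    store = {
        "logs/": ["GET /index.html 200"] * 90,
        "hr/": ["Employee SSN: 123-45-6789"] * 10,  # made-up record
    }
    samples = sample_by_distribution(store, 30, rng)
    print(store_confidence(samples, toy_discriminator))
```

Taking the maximum avoids the inflation that would come from combining many low per-sample probabilities as if they were independent; the trade-off is that a single false positive sets the store-level score.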

Assignees

Amazon Tech Inc

Classifications (CPC titles)

  • Protecting personal data, e.g. for financial or medical purposes (G06F21/6245, primary)

  • Classification techniques

  • Named entity recognition

  • Combinations of networks

  • using kernel methods, e.g. support vector machines [SVM]

Frequently asked questions

Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06F21/6245. Mapped technology areas include Physics.
When was this patent published?
Published October 24, 2023 (kind code B1).