Method and Apparatus for Augmented Data Anomaly Detection

US2021287071A1 · US · A1

Patent metadata
Field                Value
Publication number   US-2021287071-A1
Application number   US-202117200606-A
Country              US
Kind code            A1
Filing date          Mar 12, 2021
Priority date        Mar 12, 2020
Publication date     Sep 16, 2021
Grant date           (none listed)

Abstract

Official abstract text for this publication.

A data anomaly detection method and apparatus in which a deep neural network is trained on baseline data. Sequences of statistics of each layer of the deep neural network are saved, processed and used to train an LSTM autoencoder across a variety of reconstruction error thresholds, and a preferred threshold is selected for an optimized autoencoder. In an Inference mode, a data sample is presented to the autoencoder; the reconstruction error is calculated and compared to the threshold. If it is above the threshold, then the data sample is an out-of-distribution sample, and the sample is tagged as anomalous.
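The threshold-selection and inference steps described in the abstract can be sketched in a few lines. This is a minimal illustration under assumptions the abstract does not state: reconstruction errors for labeled validation samples are assumed to be already available (the LSTM autoencoder itself is omitted), the candidate thresholds are taken to be the observed error values, and the "preferred" threshold is interpreted as the one with the highest F1 score. All function names are hypothetical.

```python
from typing import Sequence, Tuple


def f1_at_threshold(errors: Sequence[float], is_anomaly: Sequence[bool],
                    threshold: float) -> float:
    """F1 score when every sample whose error exceeds `threshold` is flagged."""
    tp = fp = fn = 0
    for err, label in zip(errors, is_anomaly):
        flagged = err > threshold
        if flagged and label:
            tp += 1
        elif flagged and not label:
            fp += 1
        elif not flagged and label:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_threshold(errors: Sequence[float],
                     is_anomaly: Sequence[bool]) -> Tuple[float, float]:
    """Sweep candidate thresholds and return (best_threshold, best_f1).

    Assumption: candidates are the distinct observed errors, scanned in
    ascending order, so ties keep the smallest threshold.
    """
    best_t, best_f1 = 0.0, -1.0
    for t in sorted(set(errors)):
        score = f1_at_threshold(errors, is_anomaly, t)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1


def is_out_of_distribution(error: float, threshold: float) -> bool:
    """Inference rule from the abstract: error above the threshold => anomalous."""
    return error > threshold


if __name__ == "__main__":
    errors = [0.10, 0.12, 0.15, 0.11, 0.90, 0.75, 0.13, 0.80]
    labels = [False, False, False, False, True, True, False, True]
    best_t, best_f1 = select_threshold(errors, labels)
    print(best_t, best_f1)  # prints: 0.15 1.0
```

F1 is only one possible selection criterion. The abstract says only that a preferred threshold is selected, so a metric suited to the application (for example, a cap on the false-positive rate) could take its place.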

First claim

Opening claim text (preview).

1. A method for detection of data anomalies via a deep multi-layer neural network architecture, the method being implemented by a computer system that comprises one or more processors executing computer program instructions that, when executed, perform the method, the method comprising:
in a neural network training phase:
a. obtaining a first collection of actual data items corresponding to one or more groups of data categories, said first collection of actual data items having a first data distribution;
b. using a first neural network to generate a set of synthetic data items using a synthetic data generation configuration;
c. providing said collection of actual data items and said set of synthetic items to a second neural network;
d. using the second neural network to (i) make a classification determination using a set of classification determination configurations including whether each data item in said set of synthetic data items is synthetic or actual, and (ii) update said set of classification determination configurations;
e. providing said classification determinations to said first neural network;
f. using said classification determinations by said first neural network to update said synthetic data generation configuration;
g. repeating steps b through f until said second neural network cannot make a valid classification determination;
h. generating autoencoder training sequences of updated classification determination configurations for each layer in said second neural network;
in an autoencoder phase:
i. providing said autoencoder training sequences to an autoencoder, and said autoencoder training itself to differentiate anomalous data from real data using said autoencoder training sequences across a range of reconstruction error thresholds;
j. selecting a preferred reconstruction error threshold based on autoencoder performance during said training step to result in said autoencoder being optimized for recognition of anomalous data;
in a data anomaly detection phase:
k. submitting to the second neural network a purported data item;
l. generating by said second neural network new sequences of classification determination configurations corresponding to said purported data item;
m. providing said new sequences to said autoencoder, said autoencoder generating a prediction as to whether said purported data item falls within said first data distribution;
n. classifying by said autoencoder said purported data item as anomalous if said purported data item falls outside said first data distribution;
o. sending said new sequences to said second neural network if said purported data item is determined by said autoencoder to fall within said first data distribution, and making a classification determination by said second neural network for said purported data item using said set of classification configurations; and
p. notifying a user that said purported data item may be anomalous if said second neural network determines that said purported data item is synthetic.

2. A method according to claim 1, wherein said first neural network and said second neural network are a generator and a discriminator, respectively, of a generative adversarial network.

3. A method according to claim 1, wherein said actual data is text data and said anomalous data is malicious text.

4. A system comprising: a computer system that comprises one or more processors executing computer program instructions that, when executed, cause the computer system to:
in a neural network training phase:
a. obtain a first collection of actual data items corresponding to one or more groups of data categories, said first collection of actual data items having a first data distribution;
b. use a first neural network to generate a set of synthetic data items using a synthetic data generation configuration;
c. provide said collection of actual data items and said set of synthetic items to a second neural network;
d. use the second neural network to (i) make a classification determination using a set of classification determination configurations including whether each data item in said set of synthetic data items is synthetic or actual, and (ii) update said set of classification determination configurations;
e. provide said classification determinations to said first neural network;
f. use said classification determinations by said first neural network to update said synthetic data generation configuration;
g. repeat steps b through f until said second neural network cannot make a valid classification determination;
h. generate autoencoder training sequences of updated classification determination configurations for each layer in said second neural network;
in an autoencoder training phase:
i. provide said autoencoder training sequences to an autoencoder to train itself to differentiate anomalous data from real data using said autoencoder training sequences across a range of reconstruction error thresholds;
j. select a preferred reconstruction error threshold based on autoencoder performance during said training step to result in said autoencoder being optimized for recognition of anomalous data;
in a data anomaly detection phase:
k. submit to the second neural network a purported data item;
l. generate by said second neural network new sequences of classification determination configurations corresponding to said purported data item;
m. provide said new sequences to said autoencoder, and generate by said autoencoder a prediction as to whether said purported data item falls within said first data distribution;
n. classify by said autoencoder said purported data item as anomalous if said purported data item falls outside said first data distribution;
o. send said new sequences to said second neural network if said purported data item is determined by said autoencoder to fall within said first data distribution, and make a classification determination by said second neural network for said purported data item using said set of classification configurations; and
p. notify a user that said purported data item may be anomalous or malicious if said second neural network determines that said purported data item is synthetic.

5. A system according to claim 4, wherein said first neural network and said second neural network are a generator and a discriminator, respectively, of a generative adversarial network.

6. A system according to claim 4, wherein said actual data is text data, and said anomalous data is malicious text.

7. An apparatus comprising: a first neural network configured to:
a. generate a set of synthetic data items using a synthetic data generation configuration; and
b. provide a collection of actual text data items and said set of synthetic items to a second neural network, said collection of actual text data items having a first data distribution;
a second neural network configured to (i) make a classification determination using a set of classification determination configurations whether each data item in said set of synthetic data items is synthetic or actual data, (ii) make a classification determination for each data item in said set of synthetic data items and said collection of actual data items using a set of classification configurations, (iii) update said set of classification determination configurations, and (iv) provide said classification determinations to said first neural network;
said first neural network further configured to:
c. use said classification determinations by said second neural network to update said synthetic data generation configuration;
said second neural network further configured to: (v) generate autoencoder training sequences of updated classification determination configurations for each layer in
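Steps k through p of claims 1 and 4 describe a two-stage check: the autoencoder first screens the per-layer sequence produced by the second (discriminator) network, and only items judged to be in distribution go on to the discriminator's real-versus-synthetic classification. The sketch below shows that control flow only. The claims do not say which per-layer quantities make up the sequence; here each layer contributes the mean and standard deviation of its activations, which is an assumption. The toy dense network, the `reference_gap` stand-in for a trained LSTM autoencoder, and all names (`layer_statistics`, `detect`, `Verdict`) are hypothetical placeholders, not the patent's implementation.

```python
import math
from enum import Enum
from typing import Callable, List, Sequence, Tuple

# One dense layer: (weights[out][in], biases[out]). Toy stand-in for the
# second (discriminator) network; not the patent's architecture.
Layer = Tuple[List[List[float]], List[float]]
Stats = List[Tuple[float, float]]


class Verdict(Enum):
    ANOMALOUS = "outside the training distribution (step n)"
    FLAGGED_SYNTHETIC = "in distribution but classified as synthetic (step p)"
    ACCEPTED = "in distribution and classified as actual"


def layer_statistics(x: Sequence[float], layers: Sequence[Layer]) -> Stats:
    """Forward pass that records (mean, std) of each layer's tanh activations.

    The list has one entry per layer and plays the role of the per-layer
    sequence fed to the autoencoder. Using mean/std is an assumption.
    """
    seq: Stats = []
    h = list(x)
    for weights, biases in layers:
        h = [math.tanh(sum(w * v for w, v in zip(row, h)) + b)
             for row, b in zip(weights, biases)]
        mean = sum(h) / len(h)
        std = math.sqrt(sum((v - mean) ** 2 for v in h) / len(h))
        seq.append((mean, std))
    return seq


def reference_gap(reference: Stats) -> Callable[[Stats], float]:
    """Hypothetical stand-in for a trained LSTM autoencoder's reconstruction
    error: mean squared gap to one fixed baseline sequence."""
    def error(seq: Stats) -> float:
        gaps = [(m - rm) ** 2 + (s - rs) ** 2
                for (m, s), (rm, rs) in zip(seq, reference)]
        return sum(gaps) / len(gaps)
    return error


def detect(item: Sequence[float], layers: Sequence[Layer],
           reconstruction_error: Callable[[Stats], float], threshold: float,
           is_synthetic: Callable[[Sequence[float]], bool]) -> Verdict:
    seq = layer_statistics(item, layers)       # step l: per-layer sequence
    if reconstruction_error(seq) > threshold:  # steps m and n: autoencoder gate
        return Verdict.ANOMALOUS
    if is_synthetic(item):                     # step o: discriminator check
        return Verdict.FLAGGED_SYNTHETIC       # step p: notify the user
    return Verdict.ACCEPTED


if __name__ == "__main__":
    layers = [([[0.5, -0.2], [0.1, 0.4]], [0.0, 0.0]),
              ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.1])]
    baseline = layer_statistics([1.0, 2.0], layers)
    verdict = detect([1.1, 1.9], layers, reference_gap(baseline),
                     threshold=0.01, is_synthetic=lambda item: False)
    print(verdict.name)
```

Placing the autoencoder gate before the discriminator check mirrors the order of steps n and o: the real-versus-synthetic decision is consulted only for items that already look in distribution.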

Assignees

Morgan State Univ

Inventors

Classifications

  • Combinations of networks

  • G06N3/088 (primary): Non-supervised learning, e.g. competitive learning

  • Recurrent networks, e.g. Hopfield networks

  • Classification techniques

  • Probabilistic or stochastic networks

Frequently asked questions

Who is the assignee on this patent?
Morgan State Univ
What technology area does this patent fall under?
Primary CPC classification G06N3/088. Mapped technology areas include Physics.
When was this patent published?
Published September 16, 2021 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).