Apparatus and method of high dimensional data analysis in real-time

US11494690B2 · US · B2

Patent metadata
Publication number: US-11494690-B2
Application number: US-201916354207-A
Country: US
Kind code: B2
Filing date: Mar 15, 2019
Priority date: Mar 15, 2019
Publication date: Nov 8, 2022
Grant date: Nov 8, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method of high dimensional data analysis in real-time comprising executing dimension-reducing an input historical data set under a t-SNE model and determining from the resulting dimension-reduced data set a recent; further dimension-reducing the recent group data set under a PCA model; statistical analyzing the further dimension-reduced data set to determine a threshold group for distinguishing abnormal data from normal ones in a real-time data stream. The method may further include training a classifier using the abnormal or normal data set for predicting anomaly in the real-time data source system. Alternatively, a discrepancy training data set is computed from one of the normal and abnormal data sets and be used to train one of independent normal and abnormal data regression models; with the other one trained by transfer learning based on the trained one. The trained regression models are then used to predict discrepancy values.
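The transfer-learning step in the abstract (train one regression model on its discrepancy data, then derive the other from it) can be illustrated with a toy sketch. Everything here is an assumption for illustration: the claimed FC-LSTM-FC networks are swapped for a one-feature linear model, "transfer learning" is read as warm-starting from the trained weights and fine-tuning briefly, and the discrepancy values are made up. It is not the patented method.

```python
from typing import List, Tuple

Pairs = List[Tuple[float, float]]  # (current discrepancy, discrepancy at next time step)


def mse(pairs: Pairs, w: float, b: float) -> float:
    """Mean squared error of the linear predictor y = w*x + b."""
    return sum((w * x + b - y) ** 2 for x, y in pairs) / len(pairs)


def fit_linear(pairs: Pairs, w: float = 0.0, b: float = 0.0,
               lr: float = 0.1, steps: int = 2000) -> Tuple[float, float]:
    """Batch gradient descent on MSE, starting from the given (w, b)."""
    n = len(pairs)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in pairs) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in pairs) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


# Hypothetical data: plenty of normal samples, only a few abnormal ones.
normal_pairs = [(x / 5, 0.5 * (x / 5) + 0.1) for x in range(6)]
abnormal_pairs = [(x, 0.5 * x + 0.3) for x in (0.1, 0.5, 0.9)]

# Step 1: train the normal-data model from scratch.
w_n, b_n = fit_linear(normal_pairs)

# Step 2 (assumed form of transfer): start the abnormal-data model from the
# normal model's weights and fine-tune for only a few steps.
w_t, b_t = fit_linear(abnormal_pairs, w=w_n, b=b_n, steps=20)

# Baseline for comparison: same few steps, but starting from zero.
w_s, b_s = fit_linear(abnormal_pairs, steps=20)

print("transfer MSE:", mse(abnormal_pairs, w_t, b_t))
print("scratch  MSE:", mse(abnormal_pairs, w_s, b_s))
```

With the same small training budget, the warm-started model ends closer to the abnormal data than the model trained from zero, which is the usual motivation for transferring from the data-rich model to the data-poor one.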

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method of high dimensional data analysis for distinguishing abnormal data from normal data in a real-time data stream comprising:
    executing, by a computer processor, a first dimension reduction on an input historical data set to generate a dimension-reduced data set;
    determining, by the computer processor, a recent group comprising:
        clustering data points in the dimension-reduced data set into naturally clustered groups by time of occurrences of the data points; and
        finding the recent group from the naturally clustered groups;
    executing, by the computer processor, a second dimension reduction on the recent group to generate a further dimension-reduced data set;
    determining, by the computer processor, a threshold group from the further dimension-reduced data set;
    distinguishing, by the computer processor, the abnormal data from the normal data in the real-time data stream and generating an abnormal data set of the abnormal data and a normal data set of the normal data, wherein the abnormal data are data points having values outside of the threshold group;
    performing a first training of data regression models or a second training of data regression models;
    wherein the first training of data regression models comprising:
        training an independent normal data regression model using a normal discrepancy training data set to predict a normal data discrepancy value at a future point of time; and
        training an independent abnormal data regression model by transfer learning based on the trained independent normal data regression model to predict an abnormal data discrepancy value at the future point of time;
    wherein the second training of data regression models comprising:
        training an independent abnormal data regression model using an abnormal discrepancy training data set to predict an abnormal data discrepancy value at a future point of time; and
        training an independent normal data regression model by transfer learning based on the trained independent abnormal data regression model to predict a normal data discrepancy value at the future point of time;
    wherein the predicted discrepancy values are used to compare to an actual discrepancy value to identify anomaly in the real-time data source system at the future point of time;
    wherein the trained normal data regression model comprises:
        a normal data first fully connected (FC) layer configured to receive actual normal discrepancy data in the real-time data stream for initial processing;
        a normal data neural network of one or more long-short term memory (LSTM) cells residing in between the normal data first FC layer and a normal data second FC layer, the normal data neural network is configured to receive, classify, and make predictions from output of the normal data first FC layer; and
        the normal data second FC layer configured to receive results from the normal data neural network and predict the normal data discrepancy value at the future point of time; and
    wherein the trained abnormal data regression model comprises:
        an abnormal data first FC layer configured to receive actual abnormal discrepancy data in the real-time data stream for initial processing;
        an abnormal data neural network of one or more LSTM cells residing in between the abnormal data first FC layer and an abnormal data second FC layer, the abnormal data neural network is configured to receive, classify, and make predictions from output of the first FC layer; and
        the abnormal data second FC layer configured to receive results from the abnormal data neural network and predict the abnormal data discrepancy value at the future point of time.

2. The method of claim 1, wherein the first dimension reduction comprising reducing data dimension of the input historical data set under a t-distributed stochastic neighbor embedding (t-SNE) model.

3. The method of claim 1, wherein the determination of the recent group comprising:
    obtaining multiple dimension-reduced data sets from multiple input historical data sets generated from multiple experiments; and
    selecting a group containing most recent data from the multiple dimension-reduced data sets as the recent group.

4. The method of claim 1, wherein the determination of the recent group comprising:
    selecting a group with a smallest loss function value from the naturally clustered groups;
    wherein the recent group is the group with the smallest loss function value.

5. The method of claim 1, wherein the second dimension reduction comprising reducing data dimension of the recent group under a Principle Component Analysis (PCA) model.

6. The method of claim 1, wherein the threshold group comprising a maximum value, a minimum value, a mean value, a standard deviation, and a maximum occurrence frequency;
    wherein point in the real-time data stream having a value larger than the maximum value or smaller than the minimum value or outside of the mean value plus the standard deviation is an abnormal data; and
    wherein an anomaly in the real-time data source system is predicted when abnormal data occurred more frequently than the maximum occurrence frequency.

7. The method of claim 1, further comprising:
    training a classifier with a combination of the abnormal data set and the normal data set;
    identifying, by the trained classifier, each of data points in the real-time data in the real-time data stream as abnormal data or normal data; and
    predicting, by the trained classifier, any anomaly in the real-time data source system.

8. A system of high dimensional data analysis for distinguishing abnormal data from normal data in a real-time data stream comprising:
    a first dimension reduction processor having at least a computer processor configured to:
        execute a first dimension reduction on an input historical data set to generate a dimension-reduced data set;
        determine a recent group comprising: clustering data points in the dimension-reduced data set into naturally clustered groups by time of occurrences of the data points; and finding the recent group from the naturally clustered groups;
    a second dimension reduction processor having at least a computer processor configured to executing a second dimension reduction on the recent group to generate a further dimension-reduced data set;
    a data statistical analyzer having at least a computer processor configured to:
        determine a threshold group from the further dimension-reduced data set; and
        distinguish abnormal data from normal data in a real-time data stream and generating an abnormal data set and a normal data set, wherein the abnormal data are data points having values outside of the threshold group;
        compute from the normal data set a normal discrepancy training data set; and
        compute from the abnormal data set an abnormal discrepancy training data set;
    a discrepancy predictor having at least a processor comprising an independent normal data regression model and an independent abnormal data regression model;
    wherein the independent normal data regression model and the independent abnormal data regression model are trained by a first training and a second training:
    wherein the first training comprising:
        training an independent normal data regression model using the normal discrepancy training data set to predict a normal data discrepancy value at a future point of time; and
        training an independent abnormal data regression model by transfer learning based on the trained independent normal data regression model to predict an abnormal data discrepancy value at the future point of time;
    wherein the second training comprising:
        training an independent abnormal data regression model using the abnormal discrepancy training data set to predict an abnormal data discr
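Claim 6 lists the members of the threshold group and how they separate abnormal from normal points. A minimal sketch of that rule follows, with several assumptions that are not stated in the claim text: the further dimension-reduced data are taken as a single value per point, "outside of the mean value plus the standard deviation" is read as more than one standard deviation from the mean on either side, and the maximum occurrence frequency is treated as a fraction of points in a window. The names `ThresholdGroup`, `split_stream`, and `anomaly_predicted` are hypothetical.

```python
import statistics
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ThresholdGroup:
    maximum: float
    minimum: float
    mean_value: float
    std_dev: float
    max_frequency: float  # assumed: tolerated fraction of abnormal points per window


def build_threshold_group(reduced: Sequence[float], max_frequency: float) -> ThresholdGroup:
    """Derive the threshold group from the further dimension-reduced recent group."""
    return ThresholdGroup(
        maximum=max(reduced),
        minimum=min(reduced),
        mean_value=statistics.mean(reduced),
        std_dev=statistics.pstdev(reduced),
        max_frequency=max_frequency,
    )


def is_abnormal(x: float, t: ThresholdGroup) -> bool:
    """Apply the three conditions of claim 6 (third one under the stated assumption)."""
    return (x > t.maximum
            or x < t.minimum
            or abs(x - t.mean_value) > t.std_dev)


def split_stream(stream: Sequence[float], t: ThresholdGroup) -> Tuple[List[float], List[float]]:
    """Return (normal data set, abnormal data set) for a window of the stream."""
    normal: List[float] = []
    abnormal: List[float] = []
    for x in stream:
        (abnormal if is_abnormal(x, t) else normal).append(x)
    return normal, abnormal


def anomaly_predicted(stream: Sequence[float], t: ThresholdGroup) -> bool:
    """Flag the source system when abnormal points exceed the tolerated frequency."""
    _, abnormal = split_stream(stream, t)
    return len(abnormal) / len(stream) > t.max_frequency


# Hypothetical values standing in for the PCA output and a window of live data.
thresholds = build_threshold_group([1.0, 2.0, 3.0, 4.0, 5.0], max_frequency=0.25)
window = [3.0, 2.5, 9.0, 3.5, 0.0]
normal_set, abnormal_set = split_stream(window, thresholds)
print(normal_set, abnormal_set, anomaly_predicted(window, thresholds))
```

Under this reading the one-standard-deviation band is usually tighter than the minimum and maximum, so the choice of interpretation for that clause largely determines how many points are labeled abnormal.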

Assignees

Hong Kong Applied Science & Tech Research Inst Co Ltd

Inventors

Classifications

  • Recurrent networks, e.g. Hopfield networks · CPC title

  • G06N3/08 (Primary)

    Learning methods · CPC title

  • Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP · CPC title

  • Data stream processing; Continuous queries · CPC title

  • Clustering or classification · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11494690B2 cover?
A method of high dimensional data analysis in real-time comprising executing dimension-reducing an input historical data set under a t-SNE model and determining from the resulting dimension-reduced data set a recent; further dimension-reducing the recent group data set under a PCA model; statistical analyzing the further dimension-reduced data set to determine a threshold group for distinguishi…
Who is the assignee on this patent?
Hong Kong Applied Science & Tech Research Inst Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06N3/08. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 8, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).