Data analytics lifecycle processes
US-9262493-B1 · Feb 16, 2016 · US
US10140576B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10140576-B2 |
| Application number | US-201414455933-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 10, 2014 |
| Priority date | Aug 10, 2014 |
| Publication date | Nov 27, 2018 |
| Grant date | Nov 27, 2018 |
Abstract
A computer-implemented system and method for detecting anomalies using sample-based rule identification is provided. Data for data analytics is maintained in a database. A set of anomaly rules is defined. A rare pattern in the data is statistically identified. The identified rare pattern is labeled as at least one of anomaly and non-anomaly based on verification by a domain expert. The set of anomaly rules is adjusted based on the labeled anomaly. Other anomalies in the data are detected and classified by applying the adjusted set of anomaly rules to the data.
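The loop in the abstract can be sketched in Python. The loop has four steps: statistically flag rare data, have a domain expert verify it, adjust the rules, and apply them to the rest of the data. This is a minimal illustration under stated assumptions, not the patented implementation. The following are all hypothetical choices:

- the absolute z-score statistic;
- the starting threshold of 1.0;
- collapsing the rule set to a single threshold;
- the adjustment that raises the threshold to the highest score the expert rejected;
- the `expert` function standing in for the domain expert.

```python
import statistics
from typing import Callable, Dict, List, Sequence


def zscores(values: Sequence[float]) -> List[float]:
    """Absolute z-score of every point against the whole data set."""
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [abs(v - mu) / sd for v in values]


def learn_threshold(scores: Sequence[float], sample: Sequence[int],
                    expert: Callable[[int], bool], start: float = 1.0) -> float:
    """Flag rare points in a sample, have the expert label them, adjust the rule."""
    flagged = [i for i in sample if scores[i] > start]
    labels: Dict[int, bool] = {i: expert(i) for i in flagged}  # True = anomaly
    rejected = [scores[i] for i, is_anomaly in labels.items() if not is_anomaly]
    # Assumed adjustment: raise the threshold to the highest score the expert
    # rejected, so the rule stops firing on patterns like the false alarms.
    return max([start] + rejected)


def detect(scores: Sequence[float], threshold: float) -> List[int]:
    """Apply the adjusted rule to the full data set."""
    return [i for i, s in enumerate(scores) if s > threshold]


values = [0.0] * 16 + [21.0, 40.0, 20.0, 42.0]
scores = zscores(values)


def expert(i: int) -> bool:
    """Hypothetical stand-in for the domain expert's verification."""
    return values[i] >= 30


threshold = learn_threshold(scores, sample=range(18), expert=expert)
print(detect(scores, threshold))  # [17, 19]
```

In this example the expert confirms index 17 and rejects index 16. The raised threshold therefore still catches the unsampled point at index 19. It ignores index 18, which scores lower than the rejected false alarm.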
Claims (preview)
What is claimed is:

1. A system for detecting anomalies using sample-based rule identification with the aid of a digital computer, comprising: a non-transitory computer readable storage medium comprising program code and further comprising: a database comprising a data set for data analytics, the data set comprising a plurality of data points; and a set of anomaly rules; a computer processor and memory with the computer processor coupled to the storage medium, wherein the computer processor is configured to execute the program code to perform steps to: statistically identify one or more of the data points in the data set comprised in the database as one or more potential anomalies, comprising calculating a statistics for each of the data points; label each of the identified data points as at least one of anomaly and non-anomaly based on verification by a domain expert; adjust the set of anomaly rules comprised in the database based on at least one of the labeled anomalies, comprising creating an additional anomaly rule and adding the rule to the set, further comprising: determine an entropy of at least a portion of a different data set, the different data set comprising the statistics of all of the data points, the at least the portion comprising the statistics for the at least one anomaly; use the entropy to set a threshold; and set the additional anomaly rule to label one or more of the data points other than the at least one labeled anomaly as one or more additional anomalies upon the statistics for these data points exceeding the threshold; detect and classify as the one or more additional anomalies the one or more data points other than the at least one labeled anomaly comprised in the database by applying the adjusted set of anomaly rules comprised in the database to the statistics for the data points; and control manipulative malicious activities in at least one of the fields of social welfare, credit card, transportation systems, the Internet networks, and healthcare systems based on the labeled anomalies and the additional anomalies.

2. A system according to claim 1, wherein the computer processor is further configured to execute the program code to perform steps to: recognize one of the identified data points as the labeled non-anomaly; and modify the set of anomaly rules based on the recognition.

3. A system according to claim 1, wherein the set of the rules are adjusted based on a plurality of the anomalies, and the statistics comprises a ratio along at least two dimensions.

4. A system according to claim 1, wherein the computer processor is further configured to execute the program code to perform steps to: select a classification algorithm; and modify the classification algorithm wherein the computer processor is further configured to execute the program code to perform steps to at least one of: modify criteria of fit; and modify a regularization term.

5. A system according to claim 1, further comprising: the non-transitory computer readable storage medium further comprising: an anomaly threshold comprised in the set of anomaly rules; and a score of the labeled non-anomaly; wherein the computer processor is further configured to execute the program code to perform steps to: refine the set of anomaly rules by comparing the score of the labeled non-anomaly to the anomaly threshold and raising the anomaly threshold if the score of the labeled non-anomaly is below the anomaly threshold.

6. A system according to claim 1, wherein the computer processor is further configured to execute the program code to perform steps to at least one of: statistically detect incidents in the data set comprised in the database that occur less frequently than the rest of the population in the data set comprised in the database; statistically detect trends in the data set comprised in the database with regard to time; and statistically detect a correlation between one or more events in the data set comprised in the database.

7. A method for detecting anomalies using sample-based rule identification with the aid of a digital computer, comprising the steps of: maintaining a data set for data analytics comprised in a storage medium, the data set comprising a plurality of data points; statistically identifying with a computer processor and memory with the computer processor coupled to the non-transitory computer readable storage medium one or more of the data points in the data set comprised in the database as one or more potential anomalies, comprising calculating a statistics for each of the data points; labeling each of the identified data points with the computer processor as at least one of anomaly and non-anomaly based on verification by a domain expert; defining a set of anomaly rules comprised in the storage medium based on at least one of the labeled anomalies, comprising creating one of the anomaly rules and adding the rule to the set, further comprising: determining an entropy of at least a portion of a different data set, the different data set comprising the statistics of all of the data points, the at least the portion comprising the statistics for the at least one anomaly; using the entropy to set a threshold; and setting the additional anomaly rule to label one or more of the data points other than the at least one labeled anomaly as one or more additional anomalies upon the statistics for these data points exceeding the threshold; detecting and classifying as the one or more additional anomalies the one or more data points other than the at least one labeled anomaly in the data comprised in the database with the computer processor by applying the set of anomaly rules to the statistics for the data points; and controlling manipulative malicious activities in at least one of the fields of social welfare, credit card, transportation systems, the Internet networks, and healthcare systems based on the labeled anomalies and the additional anomalies.

8. A method for detecting anomalies using sample-based rule identification with the aid of a digital computer, comprising the steps of: maintaining a data set for data analytics in a database comprised in a non-transitory computer readable storage medium, the data set comprising a plurality of data points; defining a set of anomaly rules comprised in the database comprised in the storage medium; statistically identifying with a computer processor and memory with the computer processor coupled to the non-transitory computer readable storage medium one or more of the data points in the data set comprised in the database as one or more potential anomalies, comprising calculating a statistics for each of the data points; labeling the identified data points with the computer processor as at least one of anomaly and non-anomaly based on verification by a domain expert; adjusting the set of anomaly rules comprised in the database with the computer processor based on the labeled anomalies, comprising creating an additional anomaly rule and adding the additional rule to the set, further comprising: determining an entropy of at least a portion of a different data set, the different data set comprising the statistics of all of the data points, the at least the portion comprising the statistics for the at least one anomaly; using the entropy to set a threshold; and setting the additional anomaly rule to label one or more of the data points other than the at least one labeled anomaly as one or more additional anomalies upon the statistics for these data points exceeding the threshold; detecting and classifying as the one or more additional anomalies the data points other than the at least one labeled anomaly comprised in the database with the computer processor by
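Claim 1 derives the additional rule's threshold from an entropy over the per-point statistics, but it does not spell out the formula. One plausible reading, assumed here, is a decision-stump style split. The points are sorted by their statistic, and the chosen cut is the one that minimizes the weighted Shannon entropy of the expert labels on each side (equivalently, maximizes information gain). This is a sketch of that assumption, not the claimed method itself. The function names and the sample data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Set


def entropy(labels: Sequence[bool]) -> float:
    """Shannon entropy, in bits, of a sequence of anomaly/non-anomaly labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def entropy_threshold(stats: Dict[Hashable, float], anomalies: Set[Hashable]) -> float:
    """Return the cut on the statistic with the lowest weighted label entropy."""
    ordered = sorted(stats.items(), key=lambda kv: kv[1])
    values = [v for _, v in ordered]
    labels = [key in anomalies for key, _ in ordered]
    n = len(ordered)
    best_cut, best_h = None, math.inf
    for i in range(1, n):
        if values[i] == values[i - 1]:
            continue  # cannot cut between equal statistics
        left, right = labels[:i], labels[i:]
        h = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if h < best_h:
            best_cut, best_h = (values[i - 1] + values[i]) / 2, h
    if best_cut is None:
        raise ValueError("need at least two distinct statistic values")
    return best_cut


def additional_anomalies(stats: Dict[Hashable, float], anomalies: Set[Hashable],
                         threshold: float) -> List[Hashable]:
    """The added rule: flag unlabeled points whose statistic exceeds the threshold."""
    return sorted(k for k, v in stats.items() if k not in anomalies and v > threshold)


stats = {"a": 1.0, "b": 1.2, "c": 0.9, "d": 5.0, "g": 5.5, "f": 6.1}
labeled = {"d", "f"}  # anomalies confirmed by the expert
t = entropy_threshold(stats, labeled)
print(round(t, 3), additional_anomalies(stats, labeled, t))  # 3.1 ['g']
```

In this example the expert has confirmed `d` and `f` as anomalies. The lowest-entropy cut falls between 1.2 and 5.0, so the new rule flags the unlabeled point `g` as an additional anomaly.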
Physics · mapped topic
for detecting or protecting against malicious traffic · CPC title
involving long-term monitoring or reporting · CPC title
Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity · CPC title
Extracting rules from data · CPC title