Computer-implemented system and method for detecting anomalies using sample-based rule identification

US10140576B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10140576-B2
Application number  US-201414455933-A
Country             US
Kind code           B2
Filing date         Aug 10, 2014
Priority date       Aug 10, 2014
Publication date    Nov 27, 2018
Grant date          Nov 27, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented system and method for detecting anomalies using sample-based rule identification is provided. Data for data analytics is maintained in a database. A set of anomaly rules is defined. A rare pattern in the data is statistically identified. The identified rare pattern is labeled as at least one of anomaly and non-anomaly based on verification by a domain expert. The set of anomaly rules is adjusted based on the labeled anomaly. Other anomalies in the data are detected and classified by applying the adjusted set of anomaly rules to the data.
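The workflow in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the z-score rarity test, the single-threshold rule form, and the names `flag_rare`, `adjust_rules`, `apply_rules`, and the `expert` callback are all assumptions introduced here for illustration.

```python
from statistics import mean, pstdev
from typing import Callable, List


def flag_rare(sample: List[float], z_cut: float = 2.0) -> List[int]:
    """Indices in the sample whose z-score magnitude exceeds z_cut (assumed rarity test)."""
    mu, sigma = mean(sample), pstdev(sample)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(sample) if abs(v - mu) / sigma > z_cut]


def adjust_rules(rules: List[float], sample: List[float], flagged: List[int],
                 expert: Callable[[float], bool]) -> List[float]:
    """Add one threshold rule at the smallest value the expert confirmed as an anomaly."""
    confirmed = [sample[i] for i in flagged if expert(sample[i])]
    return rules + [min(confirmed)] if confirmed else list(rules)


def apply_rules(rules: List[float], data: List[float]) -> List[int]:
    """Indices of data points that meet or exceed any threshold rule."""
    return [i for i, v in enumerate(data) if any(v >= t for t in rules)]


# Hypothetical data: rules are learned on a sample, then applied to the full set.
sample = [10, 11, 9, 10, 12, 10, 11, 95, 10, 10]
full_data = sample + [11, 120, 9, 98]

flagged = flag_rare(sample)                      # rare patterns in the sample
rules = adjust_rules([], sample, flagged,
                     expert=lambda v: v > 50)    # stand-in for domain-expert review
anomalies = apply_rules(rules, full_data)        # other anomalies in the full data
```

The stand-in `expert` function takes the place of the human verification step. In this sketch only the flagged sample points are reviewed, and the resulting rule is then applied to data the expert never saw.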

First claim

Opening claim text (preview).

What is claimed is:

1. A system for detecting anomalies using sample-based rule identification with the aid of a digital computer, comprising: a non-transitory computer readable storage medium comprising program code and further comprising: a database comprising a data set for data analytics, the data set comprising a plurality of data points; and a set of anomaly rules; a computer processor and memory with the computer processor coupled to the storage medium, wherein the computer processor is configured to execute the program code to perform steps to: statistically identify one or more of the data points in the data set comprised in the database as one or more potential anomalies, comprising calculating a statistics for each of the data points; label each of the identified data points as at least one of anomaly and non-anomaly based on verification by a domain expert; adjust the set of anomaly rules comprised in the database based on at least one of the labeled anomalies, comprising creating an additional anomaly rule and adding the rule to the set, further comprising: determine an entropy of at least a portion of a different data set, the different data set comprising the statistics of all of the data points, the at least the portion comprising the statistics for the at least one anomaly; use the entropy to set a threshold; and set the additional anomaly rule to label one or more of the data points other than the at least one labeled anomaly as one or more additional anomalies upon the statistics for these data points exceeding the threshold; detect and classify as the one or more additional anomalies the one or more data points other than the at least one labeled anomaly comprised in the database by applying the adjusted set of anomaly rules comprised in the database to the statistics for the data points; and control manipulative malicious activities in at least one of the fields of social welfare, credit card, transportation systems, the Internet networks, and healthcare systems based on the labeled anomalies and the additional anomalies.

2. A system according to claim 1, wherein the computer processor is further configured to execute the program code to perform steps to: recognize one of the identified data points as the labeled non-anomaly; and modify the set of anomaly rules based on the recognition.

3. A system according to claim 1, wherein the set of the rules are adjusted based on a plurality of the anomalies, and the statistics comprises a ratio along at least two dimensions.

4. A system according to claim 1, wherein the computer processor is further configured to execute the program code to perform steps to: select a classification algorithm; and modify the classification algorithm wherein the computer processor is further configured to execute the program code to perform steps to at least one of: modify criteria of fit; and modify a regularization term.

5. A system according to claim 1, further comprising: the non-transitory computer readable storage medium further comprising: an anomaly threshold comprised in the set of anomaly rules; and a score of the labeled non-anomaly; wherein the computer processor is further configured to execute the program code to perform steps to: refine the set of anomaly rules by comparing the score of the labeled non-anomaly to the anomaly threshold and raising the anomaly threshold if the score of the labeled non-anomaly is below the anomaly threshold.

6. A system according to claim 1, wherein the computer processor is further configured to execute the program code to perform steps to at least one of: statistically detect incidents in the data set comprised in the database that occur less frequently than the rest of the population in the data set comprised in the database; statistically detect trends in the data set comprised in the database with regard to time; and statistically detect a correlation between one or more events in the data set comprised in the database.

7. A method for detecting anomalies using sample-based rule identification with the aid of a digital computer, comprising the steps of: maintaining a data set for data analytics comprised in a storage medium, the data set comprising a plurality of data points; statistically identifying with a computer processor and memory with the computer processor coupled to the non-transitory computer readable storage medium one or more of the data points in the data set comprised in the database as one or more potential anomalies, comprising calculating a statistics for each of the data points; labeling each of the identified data points with the computer processor as at least one of anomaly and non-anomaly based on verification by a domain expert; defining a set of anomaly rules comprised in the storage medium based on at least one of the labeled anomalies, comprising creating one of the anomaly rules and adding the rule to the set, further comprising: determining an entropy of at least a portion of a different data set, the different data set comprising the statistics of all of the data points, the at least the portion comprising the statistics for the at least one anomaly; using the entropy to set a threshold; and setting the additional anomaly rule to label one or more of the data points other than the at least one labeled anomaly as one or more additional anomalies upon the statistics for these data points exceeding the threshold; detecting and classifying as the one or more additional anomalies the one or more data points other than the at least one labeled anomaly in the data comprised in the database with the computer processor by applying the set of anomaly rules to the statistics for the data points; and controlling manipulative malicious activities in at least one of the fields of social welfare, credit card, transportation systems, the Internet networks, and healthcare systems based on the labeled anomalies and the additional anomalies.

8. A method for detecting anomalies using sample-based rule identification with the aid of a digital computer, comprising the steps of: maintaining a data set for data analytics in a database comprised in a non-transitory computer readable storage medium, the data set comprising a plurality of data points; defining a set of anomaly rules comprised in the database comprised in the storage medium; statistically identifying with a computer processor and memory with the computer processor coupled to the non-transitory computer readable storage medium one or more of the data points in the data set comprised in the database as one or more potential anomalies, comprising calculating a statistics for each of the data points; labeling the identified data points with the computer processor as at least one of anomaly and non-anomaly based on verification by a domain expert; adjusting the set of anomaly rules comprised in the database with the computer processor based on the labeled anomalies, comprising creating an additional anomaly rule and adding the additional rule to the set, further comprising: determining an entropy of at least a portion of a different data set, the different data set comprising the statistics of all of the data points, the at least the portion comprising the statistics for the at least one anomaly; using the entropy to set a threshold; and setting the additional anomaly rule to label one or more of the data points other than the at least one labeled anomaly as one or more additional anomalies upon the statistics for these data points exceeding the threshold; detecting and classifying as the one or more additional anomalies the data points other than the at least one labeled anomaly comprised in the database with the computer processor by…
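The claims recite using an entropy to set a threshold but the preview text does not say how. One common way to do this, shown here purely as an assumption, is an entropy-minimizing split: choose the cut point over the expert-labeled statistics that gives the lowest weighted Shannon entropy of the labels on either side. The function names `entropy` and `entropy_threshold` and the example values are hypothetical.

```python
import math
from typing import List, Tuple


def entropy(labels: List[bool]) -> float:
    """Shannon entropy in bits of a list of boolean labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def entropy_threshold(labeled: List[Tuple[float, bool]]) -> float:
    """Midpoint split over the statistics with the lowest weighted label entropy.

    Each item is (statistic, is_anomaly). This split criterion is an assumed
    stand-in for the unspecified "use the entropy to set a threshold" step.
    """
    if not labeled:
        raise ValueError("need at least one labeled data point")
    pts = sorted(labeled)
    stats = [s for s, _ in pts]
    best_t, best_h = stats[-1], float("inf")
    for lo, hi in zip(stats, stats[1:]):
        if lo == hi:
            continue  # no split point between equal statistics
        t = (lo + hi) / 2
        below = [a for s, a in pts if s <= t]
        above = [a for s, a in pts if s > t]
        h = (len(below) * entropy(below) + len(above) * entropy(above)) / len(pts)
        if h < best_h:
            best_t, best_h = t, h
    return best_t


# Hypothetical expert-labeled statistics: (statistic, confirmed anomaly?)
labeled = [(0.1, False), (0.2, False), (0.3, False), (0.8, True), (0.9, True)]
threshold = entropy_threshold(labeled)

# The new rule labels unreviewed points whose statistic exceeds the threshold.
unreviewed = [0.05, 0.6, 0.95]
additional = [s for s in unreviewed if s > threshold]
```

With these example labels the split falls between the highest labeled non-anomaly and the lowest labeled anomaly, so unreviewed points above that midpoint are classified as additional anomalies.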

Assignees

Inventors

Classifications

  • Physics · mapped topic

  • for detecting or protecting against malicious traffic · CPC title

  • involving long-term monitoring or reporting · CPC title

  • Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity · CPC title

  • G06N5/025 · Primary

    Extracting rules from data · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10140576B2 cover?
A computer-implemented system and method for detecting anomalies using sample-based rule identification is provided. Data for data analytics is maintained in a database. A set of anomaly rules is defined. A rare pattern in the data is statistically identified. The identified rare pattern is labeled as at least one of anomaly and non-anomaly based on verification by a domain expert. The set of a…
Who is the assignee on this patent?
Palo Alto Res Ct Inc
What technology area does this patent fall under?
Primary CPC classification G06N5/025. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 27, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).