Extraction of anomaly related rules using data mining and machine learning

US11568181B2 · US · B2

Patent metadata
Publication number: US-11568181-B2
Application number: US-201916260679-A
Country: US
Kind code: B2
Filing date: Jan 29, 2019
Priority date: Jan 29, 2019
Publication date: Jan 31, 2023
Grant date: Jan 31, 2023

Abstract

Official abstract text for this publication.

Techniques are provided for extracting anomaly related rules from organizational data. One method comprises obtaining anomaly analysis data integrated from multiple data sources of an organization, wherein the multiple data sources comprise at least one set of labeled anomaly data related to anomalous transactions; extracting features from the integrated anomaly analysis data that correlate with an indication of an anomaly; training multiple machine learning models using the extracted features, where the machine learning models are trained using different combinations of the extracted features; evaluating a performance of the trained machine learning models; and extracting rules from the trained machine learning models based on the performance, wherein the extracted rules are used to classify transactions as anomalous. The trained machine learning models comprise a decision tree comprising paths to an anomaly classification. The extracted rules are optionally in a human-readable format.
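
The pipeline summarized in the abstract (select features that correlate with the anomaly label, train several models on different feature combinations, then compare their performance) can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the feature names, the toy transactions, the 0.3 correlation threshold, the nearest-centroid stand-in model, and the use of training-set accuracy as the performance measure are all assumptions chosen to keep the example dependency-free.

```python
from itertools import combinations
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sx * sy)


def select_features(rows, labels, min_abs_corr=0.3):
    """Keep features whose |correlation| with the label meets the threshold."""
    return [
        name for name in rows[0]
        if abs(pearson([r[name] for r in rows], labels)) >= min_abs_corr
    ]


def train_centroids(rows, labels, features):
    """Stand-in model (assumption): one mean vector per class."""
    model = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        model[cls] = [mean(r[f] for r in members) for f in features]
    return model


def predict(model, row, features):
    """Assign the class whose centroid is nearest in squared distance."""
    point = [row[f] for f in features]
    return min(
        model,
        key=lambda cls: sum((p - c) ** 2 for p, c in zip(point, model[cls])),
    )


def accuracy(model, rows, labels, features):
    """Fraction of rows classified correctly (measured on the training rows)."""
    hits = sum(predict(model, r, features) == y for r, y in zip(rows, labels))
    return hits / len(rows)


# Hypothetical labeled transactions: 1 = anomalous, 0 = normal.
rows = [
    {"failed_logins": 0, "order_value": 20, "noise": 5},
    {"failed_logins": 1, "order_value": 35, "noise": 1},
    {"failed_logins": 0, "order_value": 25, "noise": 4},
    {"failed_logins": 6, "order_value": 900, "noise": 5},
    {"failed_logins": 7, "order_value": 850, "noise": 1},
    {"failed_logins": 5, "order_value": 40, "noise": 4},
]
labels = [0, 0, 0, 1, 1, 1]

selected = select_features(rows, labels)

# Train one model per combination of the selected features and score each.
scores = {}
for size in range(1, len(selected) + 1):
    for combo in combinations(selected, size):
        model = train_centroids(rows, labels, combo)
        scores[combo] = accuracy(model, rows, labels, combo)

best_combo = max(scores, key=scores.get)
print("selected features:", selected)
print("best combination:", best_combo, "score:", scores[best_combo])
```

On real data the score would be computed on held-out transactions, and a metric that tolerates label imbalance would usually be a better choice than plain accuracy.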

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: obtaining anomaly analysis data integrated from a plurality of data sources of an organization, wherein the plurality of data sources comprises at least one set of labeled anomaly data comprising information related to transactions that have been labeled as anomalous transactions; extracting features from the integrated anomaly analysis data that correlate with an indication of an anomaly, based on predefined correlation criteria; initiating a training, using at least one processing device, of a plurality of machine learning models using the extracted features, wherein each of the plurality of machine learning models is trained using different combinations of the extracted features, wherein one or more of the trained machine learning models comprise at least one decision tree, wherein the at least one decision tree comprises a plurality of paths to an anomaly classification, wherein each path comprises a logical combination of conditions to a leaf node; evaluating a performance of the plurality of trained machine learning models; and extracting one or more rules from one or more of the trained machine learning models based on the performance, wherein the extracted one or more rules are used to classify transactions as anomalous, wherein each extracted rule is associated with a given leaf node of the at least one decision tree and is extracted by aggregating the conditions associated with at least some of the nodes in the at least one decision tree along the respective path to the given leaf node; wherein the method is performed by at least one processing device comprising a processor coupled to a memory.

2. The method of claim 1, wherein an integration of the anomaly analysis data from the plurality of data sources of an organization comprises one or more of merging tables, removing irrelevant information and removing redundant information.

3. The method of claim 1, wherein the extracted features comprise at least one engineered feature relevant to anomaly classification based on domain knowledge.

4. The method of claim 1, wherein the extracted features comprise one or more of contact information features; online activity features and order processing features.

5. The method of claim 1, wherein the anomaly classification comprises a predefined significance.

6. The method of claim 1, wherein one or more properties of the extracted one or more rules are tunable by a user.

7. The method of claim 1, further comprising the step of adjusting a distribution of instances of at least one label in the anomaly analysis data to address an imbalance of the at least one label.

8. The method of claim 1, wherein one or more of the extracted rules are in a human-readable format for one or more of configuration and modification by a user.

9. A system, comprising: a memory; and at least one processing device, coupled to the memory, operative to implement the following steps: obtaining anomaly analysis data integrated from a plurality of data sources of an organization, wherein the plurality of data sources comprises at least one set of labeled anomaly data comprising information related to transactions that have been labeled as anomalous transactions; extracting features from the integrated anomaly analysis data that correlate with an indication of an anomaly, based on predefined correlation criteria; initiating a training, using at least one processing device, of a plurality of machine learning models using the extracted features, wherein each of the plurality of machine learning models is trained using different combinations of the extracted features, wherein one or more of the trained machine learning models comprise at least one decision tree, wherein the at least one decision tree comprises a plurality of paths to an anomaly classification, wherein each path comprises a logical combination of conditions to a leaf node; evaluating a performance of the plurality of trained machine learning models; and extracting one or more rules from one or more of the trained machine learning models based on the performance, wherein the extracted one or more rules are used to classify transactions as anomalous, wherein each extracted rule is associated with a given leaf node of the at least one decision tree and is extracted by aggregating the conditions associated with at least some of the nodes in the at least one decision tree along the respective path to the given leaf node.

10. The system of claim 9, wherein an integration of the anomaly analysis data from the plurality of data sources of an organization comprises one or more of merging tables, removing irrelevant information and removing redundant information.

11. The system of claim 9, wherein the extracted features comprise at least one engineered feature relevant to anomaly classification based on domain knowledge.

12. The system of claim 9, wherein the anomaly classification comprises a predefined significance.

13. The system of claim 9, wherein one or more properties of the extracted one or more rules are tunable by a user.

14. The system of claim 9, wherein one or more of the extracted rules are in a human-readable format for one or more of configuration and modification by a user.

15. A computer program product, comprising a tangible machine-readable storage medium having encoded therein executable code of one or more software programs, wherein the one or more software programs when executed by at least one processing device perform the following steps: obtaining anomaly analysis data integrated from a plurality of data sources of an organization, wherein the plurality of data sources comprises at least one set of labeled anomaly data comprising information related to transactions that have been labeled as anomalous transactions; extracting features from the integrated anomaly analysis data that correlate with an indication of an anomaly, based on predefined correlation criteria; initiating a training, using at least one processing device, of a plurality of machine learning models using the extracted features, wherein each of the plurality of machine learning models is trained using different combinations of the extracted features, wherein one or more of the trained machine learning models comprise at least one decision tree, wherein the at least one decision tree comprises a plurality of paths to an anomaly classification, wherein each path comprises a logical combination of conditions to a leaf node; evaluating a performance of the plurality of trained machine learning models; and extracting one or more rules from one or more of the trained machine learning models based on the performance, wherein the extracted one or more rules are used to classify transactions as anomalous, wherein each extracted rule is associated with a given leaf node of the at least one decision tree and is extracted by aggregating the conditions associated with at least some of the nodes in the at least one decision tree along the respective path to the given leaf node.

16. The computer program product of claim 15, wherein an integration of the anomaly analysis data from the plurality of data sources of an organization comprises one or more of merging tables, removing irrelevant information and removing redundant information.

17. The computer program product of claim 15, wherein the extracted features comprise at least one engineered feature relevant to anomaly classification based on domain knowledge.

18. The computer program product of claim 15, wherein th
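
The rule-extraction step in the independent claims, where each rule corresponds to a leaf node and aggregates the conditions along the path to it, can be illustrated with a hand-built tree. This is a minimal sketch under stated assumptions: the `Node` class, the feature names, the thresholds, and the labels are hypothetical, and the tree is written by hand rather than trained. The claims allow a rule to use only some of the nodes on a path; this sketch keeps every condition.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Condition = Tuple[str, str, float]  # (feature, operator, threshold)


@dataclass
class Node:
    """Binary decision-tree node; a node with a label is a leaf."""
    feature: Optional[str] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None    # taken when feature <= threshold
    right: Optional["Node"] = None   # taken when feature > threshold
    label: Optional[str] = None


def extract_rules(node: Node, target: str,
                  path: Tuple[Condition, ...] = ()) -> List[List[Condition]]:
    """Collect, for every leaf labeled `target`, the conditions on its path."""
    if node.label is not None:
        return [list(path)] if node.label == target else []
    left_path = path + ((node.feature, "<=", node.threshold),)
    right_path = path + ((node.feature, ">", node.threshold),)
    return (extract_rules(node.left, target, left_path)
            + extract_rules(node.right, target, right_path))


def format_rule(rule: List[Condition], target: str) -> str:
    """Render a rule as a human-readable IF ... THEN ... string."""
    conditions = " AND ".join(f"{f} {op} {t}" for f, op, t in rule)
    return f"IF {conditions} THEN {target}"


def matches(rule: List[Condition], txn: dict) -> bool:
    """True when the transaction satisfies every condition of the rule."""
    return all(txn[f] <= t if op == "<=" else txn[f] > t for f, op, t in rule)


def is_anomalous(txn: dict, rules: List[List[Condition]]) -> bool:
    """A transaction is flagged when any extracted rule matches it."""
    return any(matches(rule, txn) for rule in rules)


# Hypothetical tree with two paths that end in an "anomalous" leaf.
tree = Node(
    "failed_logins", 3,
    left=Node(label="normal"),
    right=Node(
        "order_value", 500,
        left=Node(
            "account_age_days", 30,
            left=Node(label="anomalous"),
            right=Node(label="normal"),
        ),
        right=Node(label="anomalous"),
    ),
)

rules = extract_rules(tree, "anomalous")
for rule in rules:
    print(format_rule(rule, "anomalous"))
```

Keeping each rule as a list of conditions makes it straightforward to let a user edit or tune a threshold before the rule is applied, while `format_rule` provides the human-readable form.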

Assignees

EMC IP Holding Co LLC

Inventors

Classifications

  • Physics · mapped topic

  • Detection or correction of errors, e.g. by rescanning the pattern or by human intervention; Evaluation of the quality of the acquired patterns · CPC title

  • Ensemble learning · CPC title

  • G06K9/6267 · Primary

    Physics · mapped topic

  • Classification techniques · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
EMC IP Holding Co LLC
What technology area does this patent fall under?
Primary CPC classification G06K9/6267. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 31, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).