Methods of unsupervised anomaly detection using a geometric framework

US9306966B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9306966-B2
Application numberUS-201313987690-A
CountryUS
Kind codeB2
Filing dateAug 20, 2013
Priority dateDec 14, 2001
Publication dateApr 5, 2016
Grant dateApr 5, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for unsupervised anomaly detection, which are algorithms that are designed to process unlabeled data. Data elements are mapped to a feature space which is typically a vector space d . Anomalies are detected by determining which points lies in sparse regions of the feature space. Two feature maps are used for mapping data elements to a feature apace. A first map is a data-dependent normalization feature map which we apply to network connections. A second feature map is a spectrum kernel which we apply to system call traces.

First claim

Opening claim text (preview).

What is claimed is: 1. A method for unsupervised detection of an anomaly in the operation of a computer system comprising: (b) mapping a set of unlabeled data instances, which do not indicate any anomaly occurrence, to a feature space; (c) calculating one or more sparse regions in the feature space; and (d) designating one or more data instances from the set of unlabeled data instances as an anomaly if said one or more data instances is located in said one or more sparse regions of the feature space. 2. The method of claim 1 , wherein receiving the set of unlabeled data instances comprises receiving the set of unlabeled data instances from an audit stream. 3. The method of claim 2 , wherein receiving the set of unlabeled data instances comprises receiving a set of system call trace data. 4. The method of claim 2 , wherein receiving the set of unlabeled data instances comprises receiving a set of network connections records data. 5. The method of claim 4 , wherein receiving the set of unlabeled data instances comprises receiving a sequence of TCP packets. 6. The method of claim 5 , wherein the features of the TCP packets comprise at least one of a duration of the network connection, a protocol type, and number of byte transferred by the connection, and an indication of the status of the connection. 7. The method of claim 1 , wherein implicitly mapping the set of unlabeled data instances comprises implicitly mapping the set of unlabeled data instances to a vector space. 8. The method of claim 1 , wherein implicitly mapping the set of unlabeled data instances comprises normalizing the set of unlabeled data instances based on respective values of features of the set of unlabeled data instances. 9. The method of claim 8 , further comprising normalizing each data instance in the set of unlabeled data instances based on a corresponding number of standard deviations of each data instance from a mean of the set of unlabeled data instances. 10. The method of claim 1 , wherein implicitly mapping the set of unlabeled data instances comprises applying a convolution kernel to the set of unlabeled data instances. 11. The method of claim 10 , wherein applying a convolution kernel comprises applying a spectrum kernel to the set of unlabeled data instances. 12. The method of claim 1 , further comprising, after implicitly mapping the set of unlabeled data instances to a feature space, associating each data instance in the set of unlabeled data instances with one of a plurality of clusters. 13. The method of claim 12 , further comprising, determining a distance between a selected data instance and a nearest cluster in the plurality of clusters. 14. The method of claim 13 , further comprising, if the distance between the selected data instance and the nearest cluster is less than or equal to a predetermined cluster width, associating the selected data instance with the selected cluster. 15. The method of claim 13 , further comprising, if the distance between the selected data instance and the selected cluster is greater than the cluster width, creating a new cluster and associating the selected data instance with the new cluster. 16. The method of claim 12 , further comprising, determining a percentage of clusters having the greatest number of data instances respectively associated therewith. 17. The method of claim 16 , wherein the percentage of clusters having the greatest number of data instances are labeled as dense regions in the feature space and wherein the remaining clusters are labeled as sparse regions in the feature space. 18. The method of claim 17 , wherein designating one or more data instances as an anomaly comprises associating each data instance in the set of unlabeled data instances with a respective cluster. 19. The method of claim 1 , further comprising, determining a sum of distances between a selected data instance and k nearest data instances to the selected data instance, wherein k is a predetermined value. 20. The method of claim 19 , wherein determining the sum of the distances comprises determining a nearest cluster as a cluster corresponding to a shortest distance between the respective center of the cluster and the selected data instance. 21. The method of claim 20 , further comprising, if a distance between the selected data instance and each data instance in the nearest cluster is less than a predetermined minimum distance, designating the point the cluster as one of the k nearest neighbors. 22. The method of claim 21 , wherein designating one or more data instances as an anomaly comprises determining whether sum of the distances to the k nearest neighbors exceeds a predetermined threshold. 23. The method of claim 1 , wherein designating one or more data instances as an anomaly comprises determining a decision function to separate the set of data instances from an origin and computing the decision function.

Assignees

Inventors

Classifications

  • Traffic logging, e.g. anomaly detection · CPC title

  • Mapping; Conversion · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9306966B2 cover?
A method for unsupervised anomaly detection, which are algorithms that are designed to process unlabeled data. Data elements are mapped to a feature space which is typically a vector space d . Anomalies are detected by determining which points lies in sparse regions of the feature space. Two feature maps are used for mapping data elements to a feature apace. A first map is a data-dependent n…
Who is the assignee on this patent?
Univ Columbia
What technology area does this patent fall under?
Primary CPC classification H04L63/1425. Mapped technology areas include Electricity.
When was this patent published?
Publication date Tue Apr 05 2016 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).