Systems, methods, and media for generating sanitized data, sanitizing anomaly detection models, and/or generating sanitized anomaly detection models

US10178113B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10178113-B2
Application number  US-201514798006-A
Country             US
Kind code           B2
Filing date         Jul 13, 2015
Priority date       Nov 15, 2006
Publication date    Jan 8, 2019
Grant date          Jan 8, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, methods, and media for generating sanitized data, sanitizing anomaly detection models, and generating anomaly detection models are provided. In some embodiments, methods for sanitizing anomaly detection models are provided. The methods including: receiving at least one abnormal anomaly detection model from at least one remote location; comparing at least one of the at least one abnormal anomaly detection model to a local normal detection model to produce a common set of features common to both the at least one abnormal anomaly detection model and the local normal detection model; and generating a sanitized normal anomaly detection model by removing the common set of features from the local normal detection model.
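
The model-sanitizing step in the abstract can be sketched as a set operation. This is a minimal illustration, not the patented implementation: it assumes each anomaly detection model is represented as a set of byte-string features (for example, n-grams), which the abstract does not specify, and the function name `sanitize_model` is hypothetical.

```python
from __future__ import annotations


def sanitize_model(
    local_normal: set[bytes],
    remote_abnormal_models: list[set[bytes]],
) -> set[bytes]:
    """Remove from the local normal model every feature it shares
    with at least one remote abnormal model.

    Assumption: a model is just a set of features; real models may
    carry counts, probabilities, or other structure.
    """
    common: set[bytes] = set()
    for abnormal in remote_abnormal_models:
        # Features present in both the abnormal model and the local normal model.
        common |= local_normal & abnormal
    # The sanitized normal model is the local model minus the common set.
    return local_normal - common


local = {b"GET ", b"POST", b"\x90\x90\x90\x90"}
remote = [{b"\x90\x90\x90\x90", b"/bin"}]
sanitized = sanitize_model(local, remote)
```

Here the feature that also appears in the remote abnormal model is dropped, and the remaining local features form the sanitized normal model.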

First claim

Opening claim text (preview).

What is claimed is:

1. A method for generating sanitized data, the method comprising: receiving, using a hardware processor, a plurality of data items; generating a plurality of data subsets from the plurality of data items, wherein each of the plurality of data subsets includes at least one data item of the plurality of data items; generating a plurality of anomaly detection models, wherein each of the plurality of anomaly detection models is generated based on one or more of the plurality of data subsets; determining whether to remove at least one data item from the plurality of data items in response to applying the at least one data item to each of the plurality of anomaly detection models; generating a dataset that includes at least a portion of the plurality of data items based on the determination, wherein the dataset includes a set of normal data items; and generating a normal anomaly detection model based on the set of normal data items included in the dataset.

2. The method of claim 1, wherein the plurality of anomaly detection models includes one anomaly detection model for each data subset of the plurality of data subsets.

3. The method of claim 1, wherein the determination further comprises determining a score for the at least one data item by testing the at least one data item against each of the plurality of anomaly detection models.

4. The method of claim 1, wherein the determination further comprises: determining a label for the at least one data item from each of the plurality of anomaly detection models, wherein the label indicates that the at least one data item is normal or abnormal; and assigning an updated label to the at least one data item that indicates the at least one data item is abnormal based on the sum of the labels in comparison with a threshold value.

5. The method of claim 4, further comprising testing a second dataset using the anomaly detection model to determine whether the second dataset contains attack data.

6. The method of claim 1, wherein the determination further comprises: determining a label for the at least one data item from each of the plurality of anomaly detection models, wherein the label indicates that the at least one data item is normal or abnormal; adjusting the label from each of the plurality of anomaly detection models based on a weight assigned to each anomaly detection model; and assigning an updated label to the at least one data item that indicates the at least one data item is abnormal based on the weighted sum of the adjusted labels in comparison with a threshold value.

7. The method of claim 6, wherein at least two anomaly detection models of the plurality of anomaly detection models are assigned different weights.

8. The method of claim 6, wherein at least two anomaly detection models of the plurality of anomaly detection models are assigned the same weight.

9. The method of claim 6, wherein the assigned weight is based on the number of data items in the one or more data subsets used to generate an anomaly detection model.

10. The method of claim 1, wherein the generating further comprises removing the at least one data item from the plurality of data items in response to the determination.

11. The method of claim 1, further comprising generating an anomaly detection model based on the generated dataset.

12. The method of claim 1, wherein the dataset includes a set of abnormal data items and wherein the method further comprises: generating an abnormal anomaly detection model based on the set of abnormal data items included in the dataset; and generating a local abnormal anomaly detection model based on the normal anomaly detection model and the abnormal anomaly detection model.

13. The method of claim 12, wherein at least one of the normal anomaly detection model, the abnormal anomaly detection model, and the local abnormal anomaly detection model is shared with a remote computing device.

14. A system for generating sanitized data, the system comprising: a hardware processor that is configured to: receive a plurality of data items; generate a plurality of data subsets from the plurality of data items, wherein each of the plurality of data subsets includes at least one data item of the plurality of data items; generate a plurality of anomaly detection models, wherein each of the plurality of anomaly detection models is generated based on one or more of the plurality of data subsets; determine whether to remove at least one data item from the plurality of data items in response to applying the at least one data item to each of the plurality of anomaly detection models; generate a dataset that includes at least a portion of the plurality of data items based on the determination, wherein the dataset includes a set of normal data items; and generate a normal anomaly detection model based on the set of normal data items included in the dataset.

15. A non-transitory computer-readable medium containing computer-executable instructions that, when executed by a processor, cause the processor to perform a method for generating sanitized data, the method comprising: receiving a plurality of data items; generating a plurality of data subsets from the plurality of data items, wherein each of the plurality of data subsets includes at least one data item of the plurality of data items; generating a plurality of anomaly detection models, wherein each of the plurality of anomaly detection models is generated based on one or more of the plurality of data subsets; determining whether to remove at least one data item from the plurality of data items in response to applying the at least one data item to each of the plurality of anomaly detection models; generating a dataset that includes at least a portion of the plurality of data items based on the determination, wherein the dataset includes a set of normal data items; and generating a normal anomaly detection model based on the set of normal data items included in the dataset.

16. A method for generating sanitized data, the method comprising: receiving, using a hardware processor, a plurality of data items; generating a plurality of data subsets from the plurality of data items, wherein each of the plurality of data subsets includes at least one data item of the plurality of data items; generating a plurality of anomaly detection models, wherein each of the plurality of anomaly detection models is generated based on one or more of the plurality of data subsets; determining whether to remove at least one data item from the plurality of data items in response to applying the at least one data item to each of the plurality of anomaly detection models; generating a dataset that includes at least a portion of the plurality of data items based on the determination, wherein the dataset includes a set of abnormal data items; and generating an abnormal anomaly detection model based on the set of abnormal data items included in the dataset.

17. A system for generating sanitized data, the system comprising: a hardware processor that is configured to: receive a plurality of data items; generate a plurality of data subsets from the plurality of data items, wherein each of the plurality of data subsets includes at least one data item of the plurality of data items; generate a plurality of anomaly detection models, wherein each of the plurality of anomaly detection models is generated based on one or more of the plurality of data subsets; determine whether to remove at least one data item from the plurality of data items in response to a
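
The data-sanitizing method of claim 1, with the weighted voting of claims 6 and 9, can be sketched as follows. This is a hedged toy example under stated assumptions: the "model" here is simply the set of items seen in a subset, the weighted sum is normalized by the total weight before it is compared with the threshold, and all function names are hypothetical. The claims do not prescribe any of these choices.

```python
from __future__ import annotations


def split_into_subsets(items: list[str], subset_size: int) -> list[list[str]]:
    """Partition the data items into consecutive subsets."""
    return [items[i:i + subset_size] for i in range(0, len(items), subset_size)]


def train_model(subset: list[str]) -> frozenset[str]:
    """Toy anomaly detection model (assumption): the set of items seen."""
    return frozenset(subset)


def label(model: frozenset[str], item: str) -> int:
    """Return 1 (abnormal) if the model has never seen the item, else 0."""
    return 0 if item in model else 1


def sanitize(items: list[str], subset_size: int, threshold: float):
    subsets = split_into_subsets(items, subset_size)
    models = [train_model(s) for s in subsets]   # one model per subset (claim 2)
    weights = [len(s) for s in subsets]          # weight by subset size (claim 9)
    total = sum(weights)
    normal, abnormal = [], []
    for item in items:
        # Weighted sum of per-model labels (claim 6), normalized to [0, 1].
        score = sum(w * label(m, item) for w, m in zip(weights, models)) / total
        (abnormal if score > threshold else normal).append(item)
    return normal, abnormal


items = ["a", "b", "a", "b", "a", "x", "a", "b", "b"]
normal, abnormal = sanitize(items, subset_size=3, threshold=0.5)
normal_model = train_model(normal)  # final step of claim 1
```

In this run, the item "x" appears in only one of the three subsets, so two of the three models label it abnormal and it is removed before the final normal model is built.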

Classifications

  • Physics · mapped topic

  • for detecting or protecting against malicious traffic · CPC title

  • Traffic logging, e.g. anomaly detection · CPC title

  • G06N20/00 · Primary

    Machine learning · CPC title

  • Subject matter not provided for in other groups of this subclass · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10178113B2 cover?
Systems, methods, and media for generating sanitized data, sanitizing anomaly detection models, and generating anomaly detection models are provided. In some embodiments, methods for sanitizing anomaly detection models are provided. The methods including: receiving at least one abnormal anomaly detection model from at least one remote location; comparing at least one of the at least one abnorma…
Who is the assignee on this patent?
Ciocarlie Gabriela F, Stavrou Angelos, Stolfo Salvatore J, and 2 more
What technology area does this patent fall under?
Primary CPC classification H04L63/1425. Mapped technology areas include Electricity.
When was this patent published?
Publication date Jan 8, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).