Detection of anomalous computer behavior

US10248533B1 · US · B1

Patent metadata
Field               Value
Publication number  US-10248533-B1
Application number  US-201715643757-A
Country             US
Kind code           B1
Filing date         Jul 7, 2017
Priority date       Jul 11, 2016
Publication date    Apr 2, 2019
Grant date          Apr 2, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented method for determining features of a dataset that are indicative of anomalous behavior of one or more computers in a large group of computers comprises (1) receiving log files including a plurality of entries of data regarding connections between a plurality of computers belonging to an organization and a plurality of websites outside the organization, each entry being associated with the actions of one computer, (2) executing a time series decomposition algorithm on a portion of the features of the data to generate a first list of features, (3) implementing a plurality of traffic dispersion graphs to generate a second list of features, and (4) implementing an autoencoder and a random forest regressor to generate a third list of features.
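
The abstract names traffic dispersion graphs without saying how they are used; claim 4 describes finding, for each time period, the computers that fall outside the giant connected component, then reporting which computers do so repeatedly and the average number of such computers per graph. The following is a minimal sketch of that idea, not the patented implementation. The function names, the `min_periods` cutoff for "on a repeated basis", and the choice of the largest component as the "giant" one are all assumptions made for illustration.

```python
from collections import Counter, defaultdict, deque


def connected_components(edges):
    """Return the connected components of an undirected graph as sets of nodes."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in list(adjacency):
        if start in seen:
            continue
        # Breadth-first search from an unvisited node collects one component.
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


def computers_outside_giant_component(connections):
    """connections: (computer, website) pairs observed in one time period."""
    # Tag nodes so a computer and a website with the same name stay distinct.
    edges = [(("computer", c), ("website", w)) for c, w in connections]
    components = connected_components(edges)
    if not components:
        return set()
    giant = max(components, key=len)  # assumption: "giant" means largest
    return {
        name
        for component in components
        if component is not giant
        for kind, name in component
        if kind == "computer"
    }


def summarize_periods(periods, min_periods=2):
    """Return (computers repeatedly outside the giant component, average count)."""
    counts, sizes = Counter(), []
    for connections in periods:
        outside = computers_outside_giant_component(connections)
        counts.update(outside)
        sizes.append(len(outside))
    repeated = sorted(c for c, n in counts.items() if n >= min_periods)
    average = sum(sizes) / len(sizes) if sizes else 0.0
    return repeated, average
```

Called with one list of (computer, website) pairs per time period, `summarize_periods` returns the two items that claim 4 places in the second list of features. A real deployment would need a more careful definition of the giant component than "largest", for example a minimum share of all nodes.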

First claim

Opening claim text (preview).

We claim:

1. A computer-implemented method for determining features of a dataset that are indicative of anomalous behavior of one or more computers in a large group of computers, the computer-implemented method comprising, via one or more processors and/or transceivers: receiving log files including a plurality of entries of data regarding connections between a plurality of computers belonging to an organization and a plurality of websites outside the organization, each entry being associated with the actions of one computer; determining a plurality of embedded features that are included in each entry; determining a plurality of derived features that are extracted from the embedded features; creating a plurality of features including the embedded features and the derived features; executing a time series decomposition algorithm on a portion of the features of the data to generate a first list of features; implementing a plurality of traffic dispersion graphs to generate a second list of features; and implementing an autoencoder and a random forest regressor to generate a third list of features.

2. The computer-implemented method of claim 1, wherein the time series decomposition algorithm is executed on a portion of the features of the data to determine one or more outlying values for each computer for a portion of the features for each of a plurality of time periods.

3. The computer-implemented method of claim 2, wherein executing the time series decomposition algorithm includes calculating a first feature score for each computer for each feature and time period combination, and generating the first list of features to include each feature associated with the first feature scores that are greater than a first threshold.

4. The computer-implemented method of claim 1, wherein implementing the traffic dispersion graphs includes creating a first plurality of data structures, one data structure for one traffic dispersion graph for each of a plurality of time periods, each traffic dispersion graph including a plurality of connected points illustrating communication between the computers and the websites, determining the computers for which the connected points form a giant connected component and the computers not in the giant connected component for each traffic dispersion graph, and generating the second list of features that includes which computers are not in the giant connected component on a repeated basis and further includes an average number of computers not in the giant connected component for each traffic dispersion graph.

5. The computer-implemented method of claim 1, wherein implementing the autoencoder includes encoding original data of each entry and decoding the encoded data, calculating an error level between the original data and the decoded data for each entry of data, each entry including all of the features, and generating a first list of entries that includes the entries for which the error level is greater than a second threshold.

6. The computer-implemented method of claim 5, wherein implementing the autoencoder includes implementing a plurality of decision trees to generate the third list of features to include the features which contributed most to the values of the error levels of the entries in the first list of entries from the autoencoder.

7. The computer-implemented method of claim 5, wherein the third list of features includes a ranking of features which contributed most to the values of the error levels of the entries in the first list of entries.

8. A non-transitory computer-readable medium with an executable program stored thereon for determining features of a dataset that are indicative of anomalous behavior of one or more computers in a large group of computers, wherein the program instructs a processing element of a computing device to perform the following: receiving log files including a plurality of entries of data regarding connections between a plurality of computers belonging to an organization and a plurality of websites outside the organization, each entry being associated with the actions of one computer; determining a plurality of embedded features that are included in each entry; determining a plurality of derived features that are extracted from the embedded features; creating a plurality of features including the embedded features and the derived features; executing a time series decomposition algorithm on a portion of the features of the data to generate a first list of features; implementing a plurality of traffic dispersion graphs to generate a second list of features; and implementing an autoencoder and a random forest regressor to generate a third list of features.

9. The non-transitory computer-readable medium of claim 8, wherein the time series decomposition algorithm is executed on a portion of the features of the data to determine one or more outlying values for each computer for a portion of the features for each of a plurality of time periods.

10. The non-transitory computer-readable medium of claim 9, wherein executing the time series decomposition algorithm includes calculating a first feature score for each computer for each feature and time period combination, and generating the first list of features to include each feature associated with the first feature scores that are greater than a first threshold.

11. The non-transitory computer-readable medium of claim 8, wherein implementing the traffic dispersion graphs includes creating a first plurality of data structures, one data structure for one traffic dispersion graph for each of a plurality of time periods, each traffic dispersion graph including a plurality of connected points illustrating communication between the computers and the websites, determining the computers for which the connected points form a giant connected component and the computers not in the giant connected component for each traffic dispersion graph, and generating the second list of features that includes which computers are not in the giant connected component on a repeated basis and further includes an average number of computers not in the giant connected component for each traffic dispersion graph.

12. The non-transitory computer-readable medium of claim 8, wherein implementing the autoencoder includes encoding original data of each entry and decoding the encoded data, calculating an error level between the original data and the decoded data for each entry of data, each entry including all of the features, and generating a first list of entries that includes the entries for which the error level is greater than a second threshold.

13. The non-transitory computer-readable medium of claim 12, wherein implementing the autoencoder includes implementing a plurality of decision trees to generate the third list of features to include the features which contributed most to the values of the error levels of the entries in the first list of entries from the autoencoder.

14. The non-transitory computer-readable medium of claim 13, wherein the third list of features includes a ranking of features which contributed most to the values of the error levels of the entries in the first list of entries.

15. A computing device for determining features of a dataset that are indicative of anomalous behavior of one or more computers in a large group of computers, the device comprising: a communication element configured to receive and transmit communications to and from a plurality of servers and computers within an organization; a memory element electronically coupled to the communication element, the
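
Claims 2 and 3 describe scoring each computer, feature, and time period combination after a time series decomposition, then listing the features whose scores exceed a first threshold. The claims do not say which decomposition or score is used, so the sketch below is only one possible reading: it assumes an additive decomposition into a median level plus a per-phase seasonal term, and scores each residual by its distance from the median residual in units of the median absolute deviation. The names `decompose`, `residual_scores`, `flag_features`, `season_length`, and `threshold` are hypothetical.

```python
from statistics import median


def decompose(series, season_length):
    """Additive split: value = level + seasonal[phase] + residual.

    Assumes len(series) >= season_length so every phase has data.
    """
    level = median(series)
    detrended = [x - level for x in series]
    # Seasonal term for each phase is the median of that phase's values.
    seasonal = [median(detrended[p::season_length]) for p in range(season_length)]
    residual = [d - seasonal[i % season_length] for i, d in enumerate(detrended)]
    return level, seasonal, residual


def residual_scores(series, season_length):
    """Score each time period by how far its residual sits from the typical one."""
    _, _, residual = decompose(series, season_length)
    center = median(residual)
    mad = median([abs(r - center) for r in residual])
    scale = mad if mad > 0 else 1.0  # fall back when residuals are all identical
    return [abs(r - center) / scale for r in residual]


def flag_features(data, season_length, threshold):
    """data: {computer: {feature: [value per time period]}}.

    Returns (sorted feature names with any score above threshold,
             list of (computer, feature, period index, score) hits).
    """
    hits = []
    for computer, features in data.items():
        for feature, series in features.items():
            for period, score in enumerate(residual_scores(series, season_length)):
                if score > threshold:
                    hits.append((computer, feature, period, score))
    return sorted({feature for _, feature, _, _ in hits}), hits
```

The first returned value plays the role of the first list of features in claim 3. A median-based level is used here only because it is robust to the very outliers being sought; a moving-average trend or an STL-style decomposition would fit the same claim language.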

Assignees

State Farm Mutual Automobile Insurance Co

Classifications

  • Traffic logging, e.g. anomaly detection

  • Computing arrangements based on specific mathematical models

  • Error or fault reporting or storing

  • Combinations of networks

  • Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10248533B1 cover?
A computer-implemented method for determining features of a dataset that are indicative of anomalous behavior of one or more computers in a large group of computers comprises (1) receiving log files including a plurality of entries of data regarding connections between a plurality of computers belonging to an organization and a plurality of websites outside the organization, each entry being as…
Who is the assignee on this patent?
State Farm Mutual Automobile Insurance Co
What technology area does this patent fall under?
Primary CPC classification G06F11/3452. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 2, 2019 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).