Who is the assignee on this patent?

State Farm Mutual Automobile Insurance Co

What technology area does this patent fall under?

Primary CPC classification G06F11/3452. Mapped technology areas include Physics.

When was this patent published?

Publication date: Apr 2, 2019 (kind code B1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

Detection of anomalous computer behavior

US10248533B1 · US · B1

Patent metadata
Field	Value
Publication number	US-10248533-B1
Application number	US-201715643757-A
Country	US
Kind code	B1
Filing date	Jul 7, 2017
Priority date	Jul 11, 2016
Publication date	Apr 2, 2019
Grant date	Apr 2, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented method for determining features of a dataset that are indicative of anomalous behavior of one or more computers in a large group of computers comprises (1) receiving log files including a plurality of entries of data regarding connections between a plurality of computers belonging to an organization and a plurality of websites outside the organization, each entry being associated with the actions of one computer, (2) executing a time series decomposition algorithm on a portion of the features of the data to generate a first list of features, (3) implementing a plurality of traffic dispersion graphs to generate a second list of features, and (4) implementing an autoencoder and a random forest regressor to generate a third list of features.
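
Step (2) of the abstract can be illustrated with a small sketch. The abstract does not name a specific decomposition, so everything here is an assumption made for illustration: an additive split into trend, seasonal, and residual parts, a rolling-median trend, scores scaled by the median absolute deviation, and the hypothetical names `decompose`, `outlier_scores`, `flag_features`, `season_length`, and `threshold`.

```python
from statistics import median


def decompose(values, season_length):
    """Additive split of a series into trend, seasonal, and residual parts."""
    n = len(values)
    half = season_length // 2
    # Trend: centered rolling median, with the window clipped at the edges.
    trend = [median(values[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]
    detrended = [v - t for v, t in zip(values, trend)]
    # Seasonal: median of the detrended values at each position in the cycle.
    by_position = [median(detrended[p::season_length]) for p in range(season_length)]
    seasonal = [by_position[i % season_length] for i in range(n)]
    residual = [d - s for d, s in zip(detrended, seasonal)]
    return trend, seasonal, residual


def outlier_scores(values, season_length):
    """Score each time period by its residual, scaled by the median absolute deviation."""
    _, _, residual = decompose(values, season_length)
    center = median(residual)
    mad = median(abs(r - center) for r in residual) or 1.0  # avoid dividing by zero
    return [abs(r - center) / mad for r in residual]


def flag_features(series_by_feature, season_length, threshold):
    """Return the features whose largest score exceeds the threshold."""
    return [
        feature
        for feature, values in series_by_feature.items()
        if max(outlier_scores(values, season_length)) > threshold
    ]


# Hypothetical per-period values for one computer (made-up numbers).
series = {
    "bytes_out": [10, 12, 11, 10, 12, 11, 10, 12, 500, 10, 12, 11],
    "requests": [5, 6, 5, 5, 6, 5, 5, 6, 5, 5, 6, 5],
}
flagged = flag_features(series, season_length=3, threshold=10)
print(flagged)  # ['bytes_out']
```

The spike in `bytes_out` survives the removal of trend and seasonality, so its residual score is large; the regular pattern in `requests` is absorbed by the seasonal component and stays below the threshold.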

First claim

Opening claim text (preview).

We claim:
1. A computer-implemented method for determining features of a dataset that are indicative of anomalous behavior of one or more computers in a large group of computers, the computer-implemented method comprising, via one or more processors and/or transceivers: receiving log files including a plurality of entries of data regarding connections between a plurality of computers belonging to an organization and a plurality of websites outside the organization, each entry being associated with the actions of one computer; determining a plurality of embedded features that are included in each entry; determining a plurality of derived features that are extracted from the embedded features; creating a plurality of features including the embedded features and the derived features; executing a time series decomposition algorithm on a portion of the features of the data to generate a first list of features; implementing a plurality of traffic dispersion graphs to generate a second list of features; and implementing an autoencoder and a random forest regressor to generate a third list of features.
2. The computer-implemented method of claim 1, wherein the time series decomposition algorithm is executed on a portion of the features of the data to determine one or more outlying values for each computer for a portion of the features for each of a plurality of time periods.
3. The computer-implemented method of claim 2, wherein executing the time series decomposition algorithm includes calculating a first feature score for each computer for each feature and time period combination, and generating the first list of features to include each feature associated with the first feature scores that are greater than a first threshold.
4. The computer-implemented method of claim 1, wherein implementing the traffic dispersion graphs includes creating a first plurality of data structures, one data structure for one traffic dispersion graph for each of a plurality of time periods, each traffic dispersion graph including a plurality of connected points illustrating communication between the computers and the websites, determining the computers for which the connected points form a giant connected component and the computers not in the giant connected component for each traffic dispersion graph, and generating the second list of features that includes which computers are not in the giant connected component on a repeated basis and further includes an average number of computers not in the giant connected component for each traffic dispersion graph.
5. The computer-implemented method of claim 1, wherein implementing the autoencoder includes encoding original data of each entry and decoding the encoded data, calculating an error level between the original data and the decoded data for each entry of data, each entry including all of the features, and generating a first list of entries that includes the entries for which the error level is greater than a second threshold.
6. The computer-implemented method of claim 5, wherein implementing the autoencoder includes implementing a plurality of decision trees to generate the third list of features to include the features which contributed most to the values of the error levels of the entries in the first list of entries from the autoencoder.
7. The computer-implemented method of claim 5, wherein the third list of features includes a ranking of features which contributed most to the values of the error levels of the entries in the first list of entries.
8. A non-transitory computer-readable medium with an executable program stored thereon for determining features of a dataset that are indicative of anomalous behavior of one or more computers in a large group of computers, wherein the program instructs a processing element of a computing device to perform the following: receiving log files including a plurality of entries of data regarding connections between a plurality of computers belonging to an organization and a plurality of websites outside the organization, each entry being associated with the actions of one computer; determining a plurality of embedded features that are included in each entry; determining a plurality of derived features that are extracted from the embedded features; creating a plurality of features including the embedded features and the derived features; executing a time series decomposition algorithm on a portion of the features of the data to generate a first list of features; implementing a plurality of traffic dispersion graphs to generate a second list of features; and implementing an autoencoder and a random forest regressor to generate a third list of features.
9. The non-transitory computer-readable medium of claim 8, wherein the time series decomposition algorithm is executed on a portion of the features of the data to determine one or more outlying values for each computer for a portion of the features for each of a plurality of time periods.
10. The non-transitory computer-readable medium of claim 9, wherein executing the time series decomposition algorithm includes calculating a first feature score for each computer for each feature and time period combination, and generating the first list of features to include each feature associated with the first feature scores that are greater than a first threshold.
11. The non-transitory computer-readable medium of claim 8, wherein implementing the traffic dispersion graphs includes creating a first plurality of data structures, one data structure for one traffic dispersion graph for each of a plurality of time periods, each traffic dispersion graph including a plurality of connected points illustrating communication between the computers and the websites, determining the computers for which the connected points form a giant connected component and the computers not in the giant connected component for each traffic dispersion graph, and generating the second list of features that includes which computers are not in the giant connected component on a repeated basis and further includes an average number of computers not in the giant connected component for each traffic dispersion graph.
12. The non-transitory computer-readable medium of claim 8, wherein implementing the autoencoder includes encoding original data of each entry and decoding the encoded data, calculating an error level between the original data and the decoded data for each entry of data, each entry including all of the features, and generating a first list of entries that includes the entries for which the error level is greater than a second threshold.
13. The non-transitory computer-readable medium of claim 12, wherein implementing the autoencoder includes implementing a plurality of decision trees to generate the third list of features to include the features which contributed most to the values of the error levels of the entries in the first list of entries from the autoencoder.
14. The non-transitory computer-readable medium of claim 13, wherein the third list of features includes a ranking of features which contributed most to the values of the error levels of the entries in the first list of entries.
15. A computing device for determining features of a dataset that are indicative of anomalous behavior of one or more computers in a large group of computers, the device comprising: a communication element configured to receive and transmit communications to and from a plurality of servers and computers within an organization; a memory element electronically coupled to the communication element, the
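
The traffic dispersion graph step described in claims 4 and 11 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each time period is given as a list of (computer, website) pairs, treats the largest connected component as the giant connected component, and interprets "on a repeated basis" as "in at least `min_periods` periods". The names `computers_outside_gcc`, `summarize_periods`, and `min_periods` are hypothetical.

```python
from collections import Counter, defaultdict


def computers_outside_gcc(edges):
    """Return the computers that are not in the largest connected component.

    edges: iterable of (computer, website) pairs seen in one time period.
    """
    graph = defaultdict(set)
    computers = set()
    for computer, website in edges:
        c, w = ("computer", computer), ("website", website)  # keep namespaces apart
        graph[c].add(w)
        graph[w].add(c)
        computers.add(c)

    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        # Iterative depth-first search collects one connected component.
        component, stack = set(), [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in graph[node] - seen:
                seen.add(neighbor)
                stack.append(neighbor)
        components.append(component)

    # Ties are broken by taking the first largest component found.
    giant = max(components, key=len) if components else set()
    return {name for _kind, name in computers - giant}


def summarize_periods(edges_by_period, min_periods=2):
    """Return (computers repeatedly outside the GCC, average count outside per graph)."""
    outside_counts = Counter()
    total_outside = 0
    for edges in edges_by_period:
        outside = computers_outside_gcc(edges)
        outside_counts.update(outside)
        total_outside += len(outside)
    repeated = sorted(c for c, k in outside_counts.items() if k >= min_periods)
    average = total_outside / len(edges_by_period) if edges_by_period else 0.0
    return repeated, average


# Hypothetical connection logs for two time periods (made-up hosts).
period_1 = [("pc1", "a.com"), ("pc2", "a.com"), ("pc3", "b.com"),
            ("pc2", "b.com"), ("pc9", "x.net")]
period_2 = [("pc1", "a.com"), ("pc2", "a.com"), ("pc3", "a.com"),
            ("pc9", "x.net"), ("pc4", "y.org")]
repeated, average = summarize_periods([period_1, period_2])
print(repeated, average)  # ['pc9'] 1.5
```

Here `pc9` only ever talks to a site that no other computer visits, so it falls outside the giant component in both periods, while `pc4` does so only once.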

Assignees

State Farm Mutual Automobile Insurance Co

Classifications

H04L63/1425 · Traffic logging, e.g. anomaly detection
G06N7/00 · Computing arrangements based on specific mathematical models
G06F11/0766 · Error or fault reporting or storing
G06N3/045 · Combinations of networks
G06N5/01 · Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound

Patent family

Related publications grouped by family.

View patent family 65898616
