US10248533B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-10248533-B1 |
| Application number | US-201715643757-A |
| Country | US |
| Kind code | B1 |
| Filing date | Jul 7, 2017 |
| Priority date | Jul 11, 2016 |
| Publication date | Apr 2, 2019 |
| Grant date | Apr 2, 2019 |
Official abstract text for this publication.
A computer-implemented method for determining features of a dataset that are indicative of anomalous behavior of one or more computers in a large group of computers comprises (1) receiving log files including a plurality of entries of data regarding connections between a plurality of computers belonging to an organization and a plurality of websites outside the organization, each entry being associated with the actions of one computer, (2) executing a time series decomposition algorithm on a portion of the features of the data to generate a first list of features, (3) implementing a plurality of traffic dispersion graphs to generate a second list of features, and (4) implementing an autoencoder and a random forest regressor to generate a third list of features.
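Step (2) of the abstract can be illustrated with a minimal sketch. The patent text shown here does not say which time series decomposition is used, so the rolling-median trend, the median-absolute-deviation scale, the default threshold, and the feature names `bytes_out` and `n_conn` below are all assumptions chosen for illustration, not the patented method.

```python
from statistics import median


def residual_scores(series, window=3):
    """Score each point by its deviation from a rolling-median trend.

    Assumption: a rolling median stands in for the unspecified
    time series decomposition algorithm.
    """
    n = len(series)
    half = window // 2
    # Trend component: centered rolling median (window shrinks at the edges).
    trend = [
        median(series[max(0, i - half):min(n, i + half + 1)])
        for i in range(n)
    ]
    # Residual component: what the trend does not explain.
    residual = [x - t for x, t in zip(series, trend)]
    # Robust scale: median absolute deviation of the residuals.
    mad = median(abs(r) for r in residual)
    scale = mad if mad > 0 else 1.0
    return [abs(r) / scale for r in residual]


def flag_features(data, threshold=5.0, window=3):
    """Return features whose score exceeds the threshold for any computer.

    data maps (computer, feature) to a list with one value per time period.
    The threshold value is hypothetical.
    """
    flagged = set()
    for (computer, feature), series in data.items():
        if max(residual_scores(series, window)) > threshold:
            flagged.add(feature)
    return sorted(flagged)


# Hypothetical per-period values for one computer.
data = {
    ("pc1", "bytes_out"): [10, 11, 10, 12, 500, 11, 10],
    ("pc1", "n_conn"): [5, 6, 5, 6, 5, 6, 5],
}
first_list = flag_features(data)
```

With this input, only the feature containing the spike at the fifth period ends up in `first_list`; the steadily alternating feature stays below the threshold.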
Opening claim text (preview).
We claim:

1. A computer-implemented method for determining features of a dataset that are indicative of anomalous behavior of one or more computers in a large group of computers, the computer-implemented method comprising, via one or more processors and/or transceivers: receiving log files including a plurality of entries of data regarding connections between a plurality of computers belonging to an organization and a plurality of websites outside the organization, each entry being associated with the actions of one computer; determining a plurality of embedded features that are included in each entry; determining a plurality of derived features that are extracted from the embedded features; creating a plurality of features including the embedded features and the derived features; executing a time series decomposition algorithm on a portion of the features of the data to generate a first list of features; implementing a plurality of traffic dispersion graphs to generate a second list of features; and implementing an autoencoder and a random forest regressor to generate a third list of features.

2. The computer-implemented method of claim 1, wherein the time series decomposition algorithm is executed on a portion of the features of the data to determine one or more outlying values for each computer for a portion of the features for each of a plurality of time periods.

3. The computer-implemented method of claim 2, wherein executing the time series decomposition algorithm includes calculating a first feature score for each computer for each feature and time period combination, and generating the first list of features to include each feature associated with the first feature scores that are greater than a first threshold.

4. The computer-implemented method of claim 1, wherein implementing the traffic dispersion graphs includes creating a first plurality of data structures, one data structure for one traffic dispersion graph for each of a plurality of time periods, each traffic dispersion graph including a plurality of connected points illustrating communication between the computers and the websites, determining the computers for which the connected points form a giant connected component and the computers not in the giant connected component for each traffic dispersion graph, and generating the second list of features that includes which computers are not in the giant connected component on a repeated basis and further includes an average number of computers not in the giant connected component for each traffic dispersion graph.

5. The computer-implemented method of claim 1, wherein implementing the autoencoder includes encoding original data of each entry and decoding the encoded data, calculating an error level between the original data and the decoded data for each entry of data, each entry including all of the features, and generating a first list of entries that includes the entries for which the error level is greater than a second threshold.

6. The computer-implemented method of claim 5, wherein implementing the autoencoder includes implementing a plurality of decision trees to generate the third list of features to include the features which contributed most to the values of the error levels of the entries in the first list of entries from the autoencoder.

7. The computer-implemented method of claim 5, wherein the third list of features includes a ranking of features which contributed most to the values of the error levels of the entries in the first list of entries.

8. A non-transitory computer-readable medium with an executable program stored thereon for determining features of a dataset that are indicative of anomalous behavior of one or more computers in a large group of computers, wherein the program instructs a processing element of a computing device to perform the following: receiving log files including a plurality of entries of data regarding connections between a plurality of computers belonging to an organization and a plurality of websites outside the organization, each entry being associated with the actions of one computer; determining a plurality of embedded features that are included in each entry; determining a plurality of derived features that are extracted from the embedded features; creating a plurality of features including the embedded features and the derived features; executing a time series decomposition algorithm on a portion of the features of the data to generate a first list of features; implementing a plurality of traffic dispersion graphs to generate a second list of features; and implementing an autoencoder and a random forest regressor to generate a third list of features.

9. The non-transitory computer-readable medium of claim 8, wherein the time series decomposition algorithm is executed on a portion of the features of the data to determine one or more outlying values for each computer for a portion of the features for each of a plurality of time periods.

10. The non-transitory computer-readable medium of claim 9, wherein executing the time series decomposition algorithm includes calculating a first feature score for each computer for each feature and time period combination, and generating the first list of features to include each feature associated with the first feature scores that are greater than a first threshold.

11. The non-transitory computer-readable medium of claim 8, wherein implementing the traffic dispersion graphs includes creating a first plurality of data structures, one data structure for one traffic dispersion graph for each of a plurality of time periods, each traffic dispersion graph including a plurality of connected points illustrating communication between the computers and the websites, determining the computers for which the connected points form a giant connected component and the computers not in the giant connected component for each traffic dispersion graph, and generating the second list of features that includes which computers are not in the giant connected component on a repeated basis and further includes an average number of computers not in the giant connected component for each traffic dispersion graph.

12. The non-transitory computer-readable medium of claim 8, wherein implementing the autoencoder includes encoding original data of each entry and decoding the encoded data, calculating an error level between the original data and the decoded data for each entry of data, each entry including all of the features, and generating a first list of entries that includes the entries for which the error level is greater than a second threshold.

13. The non-transitory computer-readable medium of claim 12, wherein implementing the autoencoder includes implementing a plurality of decision trees to generate the third list of features to include the features which contributed most to the values of the error levels of the entries in the first list of entries from the autoencoder.

14. The non-transitory computer-readable medium of claim 13, wherein the third list of features includes a ranking of features which contributed most to the values of the error levels of the entries in the first list of entries.

15. A computing device for determining features of a dataset that are indicative of anomalous behavior of one or more computers in a large group of computers, the device comprising: a communication element configured to receive and transmit communications to and from a plurality of servers and computers within an organization; a memory element electronically coupled to the communication element, the
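The traffic dispersion graph step recited in claims 4 and 11 can be sketched as follows. This is an illustrative reading only: the graph representation, the tie-breaking when two components have equal size, and the `min_repeats` parameter (one possible interpretation of "on a repeated basis") are assumptions, and the host and site names are made up.

```python
from collections import defaultdict


def computers_outside_gcc(edges):
    """Return computers not in the largest connected component.

    edges: iterable of (computer, website) pairs for one time period.
    """
    adj = defaultdict(set)
    computers = set()
    for computer, site in edges:
        # Tag nodes so a computer and a website can never share a key.
        c, s = ("c", computer), ("s", site)
        adj[c].add(s)
        adj[s].add(c)
        computers.add(c)

    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        # Iterative depth-first search collects one component.
        stack, comp = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            comp.add(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(comp)

    # Assumption: the giant component is simply the largest one found.
    giant = max(components, key=len) if components else set()
    return {name for _, name in computers - giant}


def summarize(periods, min_repeats=2):
    """Return (computers repeatedly outside the GCC, average count outside).

    periods: list of edge lists, one per time period.
    min_repeats is a hypothetical cutoff for "on a repeated basis".
    """
    counts = defaultdict(int)
    total = 0
    for edges in periods:
        outside = computers_outside_gcc(edges)
        total += len(outside)
        for c in outside:
            counts[c] += 1
    repeated = sorted(c for c, k in counts.items() if k >= min_repeats)
    average = total / len(periods) if periods else 0.0
    return repeated, average


# Two hypothetical time periods of computer-to-website connections.
p1 = [("pc1", "a.com"), ("pc2", "a.com"), ("pc3", "b.com"),
      ("pc2", "b.com"), ("pc4", "z.net")]
p2 = [("pc1", "a.com"), ("pc2", "a.com"), ("pc3", "a.com"),
      ("pc4", "z.net")]
repeated, average = summarize([p1, p2])
```

In both periods `pc4` talks only to a site nobody else visits, so it falls outside the giant connected component each time and is the only computer reported as repeatedly isolated.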
Traffic logging, e.g. anomaly detection · CPC title
Computing arrangements based on specific mathematical models · CPC title
Error or fault reporting or storing · CPC title
Combinations of networks · CPC title
Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound · CPC title