Spam classification system based on network flow data
US-2017359362-A1 · Dec 14, 2017 · US
US10375090B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10375090-B2 |
| Application number | US-201715469716-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 27, 2017 |
| Priority date | Mar 27, 2017 |
| Publication date | Aug 6, 2019 |
| Grant date | Aug 6, 2019 |
Abstract
In one embodiment, a device in a network receives telemetry data regarding a traffic flow in the network. One or more features in the telemetry data are individually compressed. The device extracts the one or more individually compressed features from the received telemetry data. The device performs a lookup of one or more classifier inputs from an index of classifier inputs using the one or more individually compressed features from the received telemetry data. The device classifies the traffic flow by inputting the one or more classifier inputs to a machine learning-based classifier.
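The abstract's pipeline has three steps: compress each feature separately, use the compressed form as the key for a classifier-input lookup, then classify. The minimal Python sketch below illustrates these steps under stated assumptions. A simplified whole-value dictionary coder (in the spirit of LZ78) stands in for the per-feature compression scheme. Records are assumed to arrive in order and without loss, so the exporter and collector dictionaries stay in sync. A hand-weighted logistic score stands in for the trained classifier. `FeatureContext`, `ClassifierInputIndex`, `toy_featurize`, and `classify` are hypothetical names and do not come from the patent.

```python
import math
from typing import Callable, Dict, List, Tuple, Union

FeatureValue = Tuple[int, ...]                # e.g. an offered ciphersuite list
Token = Tuple[str, Union[int, FeatureValue]]  # ("ref", code) or ("lit", value)


class FeatureContext:
    """Exporter side: one dictionary-based compression context per feature (hypothetical)."""

    def __init__(self) -> None:
        self.codes: Dict[FeatureValue, int] = {}

    def encode(self, value: FeatureValue) -> Token:
        if value in self.codes:
            return ("ref", self.codes[value])  # back-reference to a value seen before
        self.codes[value] = len(self.codes)    # assign the next code in this context
        return ("lit", value)                  # first sighting: send the literal


class ClassifierInputIndex:
    """Collector side: (feature, code) -> precomputed classifier inputs (hypothetical)."""

    def __init__(self, featurize: Callable[[str, FeatureValue], List[float]]) -> None:
        self.featurize = featurize
        self.index: Dict[Tuple[str, int], List[float]] = {}
        self.next_code: Dict[str, int] = {}    # mirrors each exporter context's counter

    def lookup(self, feature: str, token: Token) -> List[float]:
        kind, payload = token
        if kind == "ref":
            # The compressed form is the key: no decompression, no re-featurizing.
            return self.index[(feature, payload)]
        code = self.next_code.get(feature, 0)
        self.next_code[feature] = code + 1
        inputs = self.featurize(feature, payload)
        self.index[(feature, code)] = inputs
        return inputs


def toy_featurize(feature: str, value: FeatureValue) -> List[float]:
    # Hypothetical featurization: item count and normalized mean 16-bit code.
    mean = sum(value) / len(value) if value else 0.0
    return [float(len(value)), mean / 65535.0]


def classify(inputs: List[float], weights: List[float], bias: float) -> float:
    # Logistic-regression score in (0, 1); weights are illustrative, not trained.
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    features = ("ciphersuites", "extensions")
    exporter = {name: FeatureContext() for name in features}
    collector = ClassifierInputIndex(toy_featurize)
    flows = [
        {"ciphersuites": (0xC02F, 0xC030, 0x009E), "extensions": (0, 10, 11)},
        {"ciphersuites": (0xC02F, 0xC030, 0x009E), "extensions": (0, 10, 11, 35)},
    ]
    for flow in flows:
        tokens = {name: exporter[name].encode(flow[name]) for name in features}
        inputs = [x for name in features for x in collector.lookup(name, tokens[name])]
        print(tokens, round(classify(inputs, [0.1, -2.0, 0.05, 1.0], -0.5), 3))
```

The index is keyed on (feature, code) rather than code alone because each feature keeps its own context, so code 0 for ciphersuites and code 0 for extensions refer to different values. A repeated value, such as the same ciphersuite list offered by many clients, then costs one dictionary hit at the collector. A deployable design would also need to bound dictionary growth and resynchronize after lost records. This sketch does neither.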
Claims
What is claimed is:

1. A method comprising: receiving, at a device in a network, telemetry data regarding a traffic flow in the network, wherein each of a plurality of features in the telemetry data are individually compressed so that a separate data compression context is maintained for each of the plurality of features in the telemetry data; extracting, by the device, the plurality of individually compressed features from the received telemetry data; performing, by the device, a lookup of one or more classifier inputs from an index of classifier inputs using at least one of the plurality of individually compressed features from the received telemetry data; and classifying, by the device, the traffic flow by inputting the one or more classifier inputs to a machine learning-based classifier.

2. The method as in claim 1, wherein classifying the traffic flow comprises: determining, by the device, an application associated with the traffic flow.

3. The method as in claim 1, wherein classifying the traffic flow comprises: determining, by the device, whether the traffic flow is associated with malware.

4. The method as in claim 1, wherein the plurality of individually compressed features in the telemetry data comprises at least one of: sequence of packet lengths and time (SPLT) data regarding the traffic flow, sequence of application lengths and time (SALT) data regarding the traffic flow, byte distribution (BD) data regarding the traffic flow, a ciphersuite, or a Transport Layer Security (TLS) extension.

5. The method as in claim 1, wherein the received telemetry data comprises a NetFlow or Internet Protocol Flow Information Export (IPFIX) record.

6. The method as in claim 1, wherein a particular one of the individually compressed one or more features in the telemetry data references a previously observed feature in the network.

7. The method as in claim 1, wherein a particular one of the individually compressed one or more features in the telemetry data is compressed using Lempel-Ziv compression.

8. The method as in claim 1, wherein the machine learning-based classifier comprises a random forest classifier or a regression-based classifier.

9. An apparatus, comprising: one or more network interfaces to communicate with a network; a processor coupled to the one or more network interfaces and configured to execute a process; and a memory configured to store the process executable by the processor, the process when executed configured to: receive telemetry data regarding a traffic flow in the network, wherein each of a plurality of features in the telemetry data are individually compressed so that a separate data compression context is maintained for each of the plurality of features in the telemetry data; extract the plurality of individually compressed features from the received telemetry data; perform a lookup of one or more classifier inputs from an index of classifier inputs using at least one of the plurality of individually compressed features from the received telemetry data; and classify the traffic flow by inputting the one or more classifier inputs to a machine learning-based classifier.

10. The apparatus as in claim 9, wherein the apparatus classifies the traffic flow by: determining an application associated with the traffic flow.

11. The apparatus as in claim 9, wherein the apparatus classifies the traffic flow by: determining whether the traffic flow is associated with malware.

12. The apparatus as in claim 9, wherein the plurality of individually compressed features in the telemetry data comprises at least one of: sequence of packet lengths and time (SPLT) data regarding the traffic flow, sequence of application lengths and time (SALT) data regarding the traffic flow, byte distribution (BD) data regarding the traffic flow, a ciphersuite, or a Transport Layer Security (TLS) extension.

13. The apparatus as in claim 9, wherein the received telemetry data comprises a NetFlow or Internet Protocol Flow Information Export (IPFIX) record.

14. The apparatus as in claim 9, wherein a particular one of the individually compressed one or more features in the telemetry data references a previously observed feature in the network.

15. The apparatus as in claim 9, wherein a particular one of the individually compressed one or more features in the telemetry data is compressed using Lempel-Ziv compression.

16. The apparatus as in claim 9, wherein the machine learning-based classifier comprises a random forest classifier or a regression-based classifier.

17. A tangible, non-transitory, computer-readable medium storing program instructions that cause a device in a network to execute a process comprising: receiving, at the device, telemetry data regarding a traffic flow in the network, wherein each of a plurality of features in the telemetry data are individually compressed so that a separate data compression context is maintained for each of the plurality of features in the telemetry data; extracting, by the device, the plurality of individually compressed features from the received telemetry data; performing, by the device, a lookup of one or more classifier inputs from an index of classifier inputs using at least one of the plurality of individually compressed features from the received telemetry data; and classifying, by the device, the traffic flow by inputting the one or more classifier inputs to a machine learning-based classifier.

18. The computer-readable medium as in claim 17, wherein classifying the traffic flow comprises: determining, by the device, an application associated with the traffic flow or whether the traffic flow is associated with malware.

19. The computer-readable medium as in claim 17, wherein the plurality individually compressed features in the telemetry data comprises at least one of: sequence of packet lengths and time (SPLT) data regarding the traffic flow, sequence of application lengths and time (SALT) data regarding the traffic flow, byte distribution (BD) data regarding the traffic flow, a ciphersuite, or a Transport Layer Security (TLS) extension.

20. The computer-readable medium as in claim 17, wherein the received telemetry data comprises a NetFlow or Internet Protocol Flow Information Export (IPFIX) record.
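Claim 4 lists SPLT and byte distribution (BD) among the compressible features, and claim 7 names Lempel-Ziv compression. The sketch below, written under stated assumptions, shows one way these pieces could fit together. It computes a simple SPLT as payload length plus inter-arrival time in milliseconds for the first n packets, with timestamps assumed sorted. It computes BD as a 256-bin histogram of payload byte values. It serializes each feature with a hypothetical fixed-width layout and compresses each feature in its own zlib stream. zlib's DEFLATE is an LZ77-based format and is used here only as a stand-in; the claims do not specify a Lempel-Ziv variant or a record layout. Flushing with `zlib.Z_SYNC_FLUSH` emits each record immediately while keeping that stream's history window. As a result, a later record for the same feature can back-reference an earlier one (compare claim 6). All function and class names are hypothetical.

```python
import struct
import zlib
from typing import Iterable, List, Tuple

Packet = Tuple[float, bytes]  # (timestamp in seconds, payload bytes)


def splt(packets: List[Packet], n: int = 10) -> List[Tuple[int, int]]:
    """(payload length, inter-arrival ms) for the first n packets; assumes sorted timestamps."""
    out: List[Tuple[int, int]] = []
    prev_ts = packets[0][0] if packets else 0.0
    for ts, payload in packets[:n]:
        out.append((len(payload), round((ts - prev_ts) * 1000)))
        prev_ts = ts
    return out


def byte_distribution(packets: List[Packet]) -> List[int]:
    """Count of each byte value 0..255 across all payloads."""
    counts = [0] * 256
    for _, payload in packets:
        for byte in payload:
            counts[byte] += 1
    return counts


def serialize_splt(seq: List[Tuple[int, int]]) -> bytes:
    # Hypothetical layout: 2-byte length + 4-byte gap per packet, big-endian.
    return b"".join(struct.pack(">HI", length, gap) for length, gap in seq)


def serialize_bd(counts: List[int]) -> bytes:
    # Hypothetical layout: 256 big-endian 4-byte counters.
    return struct.pack(">256I", *counts)


class PerFeatureCompressor:
    """Exporter side: a separate DEFLATE stream (compression context) per feature."""

    def __init__(self, names: Iterable[str]) -> None:
        self.streams = {name: zlib.compressobj() for name in names}

    def compress(self, name: str, data: bytes) -> bytes:
        stream = self.streams[name]
        # Z_SYNC_FLUSH emits all pending output but keeps the history window.
        return stream.compress(data) + stream.flush(zlib.Z_SYNC_FLUSH)


class PerFeatureDecompressor:
    """Collector side: mirrors each exporter stream; chunks must arrive in order."""

    def __init__(self, names: Iterable[str]) -> None:
        self.streams = {name: zlib.decompressobj() for name in names}

    def decompress(self, name: str, chunk: bytes) -> bytes:
        return self.streams[name].decompress(chunk)
```

Each feature has its own stream, so a byte distribution repeated on a later record can shrink to back-references within the BD stream, and SPLT data never enters that stream's window. The cost is that the collector must keep one decompressor per feature and process each feature's chunks in order.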
CPC classification titles:

- Machine learning
- the attack involving the propagation of malware through the network, e.g. viruses, trojans or worms
- relying on flow classification, e.g. using integrated services [IntServ]
- Event detection, e.g. attack signature detection
- by monitoring network traffic (monitoring network traffic per se H04L43/00)