Apparatus method and medium for tracing the origin of network transmissions using N-gram distribution of data

US9003528B2 · US · B2

Patent metadata
Field · Value
Publication number · US-9003528-B2
Application number · US-201213550711-A
Country · US
Kind code · B2
Filing date · Jul 17, 2012
Priority date · Nov 12, 2003
Publication date · Apr 7, 2015
Grant date · Apr 7, 2015

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method, apparatus, and medium are provided for tracing the origin of network transmissions. Connection records are maintained at computer system for storing source and destination addresses. The connection records also maintain a statistical distribution of data corresponding to the data payload being transmitted. The statistical distribution can be compared to that of the connection records in order to identify the sender. The location of the sender can subsequently be determined from the source address stored in the connection record. The process can be repeated multiple times until the location of the original sender has been traced.
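The hop-by-hop tracing loop described in the abstract can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: the `ConnectionRecord` class, the `trace_origin` function, the `records_by_host` lookup table, the Manhattan distance used as a stand-in similarity measure, and the `threshold` value. It assumes each host's connection records can be queried from one place.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ConnectionRecord:
    src: str            # source address of the transmission
    dst: str            # destination address of the transmission
    distribution: list  # relative byte frequencies of the payload (256 values)


def byte_distribution(payload: bytes) -> list:
    """Relative frequency of each byte value 0..255 in the payload."""
    counts = Counter(payload)
    total = len(payload) or 1
    return [counts.get(b, 0) / total for b in range(256)]


def manhattan(p: list, q: list) -> float:
    """Sum of absolute differences; an assumed stand-in for the similarity test."""
    return sum(abs(a - b) for a, b in zip(p, q))


def trace_origin(records_by_host: dict, target: str, suspect: list,
                 threshold: float = 0.1) -> list:
    """Follow matching connection records backwards from the target host."""
    path = [target]
    current = target
    while True:
        # Records received by the current host whose payload distribution
        # is close to the suspect one; hosts already visited are skipped.
        matches = [r for r in records_by_host.get(current, [])
                   if r.dst == current
                   and manhattan(r.distribution, suspect) <= threshold
                   and r.src not in path]
        if not matches:
            return path  # no earlier sender found: last entry is the origin
        best = min(matches, key=lambda r: manhattan(r.distribution, suspect))
        current = best.src
        path.append(current)


# Illustrative data only: three hosts relaying the same suspect payload.
dist = byte_distribution(b"\x90" * 40 + b"GET /index.html")
records = {
    "10.0.0.3": [ConnectionRecord("10.0.0.2", "10.0.0.3", dist)],
    "10.0.0.2": [ConnectionRecord("10.0.0.1", "10.0.0.2", dist),
                 ConnectionRecord("10.0.0.9", "10.0.0.2",
                                  byte_distribution(b"hello world"))],
}
print(trace_origin(records, "10.0.0.3", dist))
# ['10.0.0.3', '10.0.0.2', '10.0.0.1']
```

The returned list runs from the end target back to the earliest sender for which a matching record exists. The unrelated record from 10.0.0.9 is ignored because its distribution is far from the suspect one.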

First claim

Opening claim text (preview).

What is claimed is:

1. A method of tracing the location of an origin computer system that initially transmits a suspect data payload across a computer network to an end target computer system, the method comprising: creating, using a hardware processor, a connection record for a transmission to a first computer system through the computer network of a plurality of computer systems; generating, using the hardware processor, a byte value statistical distribution of data contained in a data payload corresponding to the connection record; calculating, using the hardware processor, a distance between the byte value statistical distribution of data contained in the data payload and a model distribution representative of normal payloads transmitted through the computer network; identifying, using the hardware processor, the data payload as a suspect data payload based on the calculated distance; setting, using the hardware processor, the first computer system as a suspect computer system; upon determining at least one byte value statistical distribution that is similar to the byte value statistical distribution of the data contained in the suspect data payload, determining, using the hardware processor, address information associated with the at least one byte value statistical distribution; and setting, using the hardware processor, a second computer system associated with the address information as the suspect computer system.

2. The method of claim 1, further comprising selecting the model statistical distribution from a plurality of model byte frequency statistical distributions based at least in part on a length of the data contained in the data payload.

3. The method of claim 1, wherein the at least one byte value statistical distribution and the byte value statistical distribution of the data contained in the suspect data payload are byte frequency count.

4. The method of claim 1, wherein the at least one byte value statistical distribution and the byte value statistical distribution of the data contained in the suspect data payload are rank ordered byte frequency count.

5. The method of claim 1, wherein determining at least one byte value distribution that is similar to the byte value statistical distribution of the data contained in the suspect data payload further comprises: measuring a distance metric between the at least one byte value statistical distribution and the byte value statistical distribution of the data contained in the suspect data payload; and determining that the at least one byte value statistical distribution is similar to the byte value statistical distribution of the data contained in the suspect data payload based at least in part on comparing the distance metric to a predetermined distance.

6. The method of claim 5, wherein the distance metric is calculated based on a Mahalanobis distance between the at least one byte value statistical distribution and the byte value statistical distribution of the data contained in the suspect data payload.

7. The method of claim 1, further comprising assigning different weight factors to selected byte values of the byte value statistical distribution of the data contained in the suspect data payload.

8. The method of claim 7, wherein higher weight factors are assigned to byte values corresponding to operational codes of a computer system.

9. A system for tracing the location of an origin computer system that initially transmits a suspect data payload across a computer network to an end target computer system, the system comprising: a processor that: creates a connection record for a transmission to a first computer system through the computer network of a plurality of computer systems; generates a byte value statistical distribution of data contained in a data payload corresponding to the connection record; identifies the data payload as a suspect data payload based on differences detected between the byte value statistical distribution of data contained in the suspect data payload and a model statistical distribution representative of normal payloads transmitted through the computer network; calculates a distance between the byte value statistical distribution of data contained in the data payload and a model distribution representative of normal payloads transmitted through the computer network; identifies the data payload as a suspect data payload based on the calculated distance; sets the first computer system as a suspect computer system; upon determining at least one byte value statistical distribution that is similar to the byte value statistical distribution of the data contained in the suspect data payload, determines address information associated with the at least one byte value statistical distribution; and sets a second computer system associated with the address information as the suspect computer system.

10. The system of claim 9, wherein the processor is further configured to select the model statistical distribution from a plurality of model byte frequency statistical distributions based at least in part on a length of the data contained in the data payload.

11. The system of claim 9, wherein the at least one byte value statistical distribution and the byte value statistical distribution of the data contained in the suspect data payload are byte frequency count.

12. The system of claim 9, wherein the at least one byte value statistical distribution and the byte value statistical distribution of the data contained in the suspect data payload are rank ordered byte frequency count.

13. The system of claim 9, wherein the processor is further configured to: measure a distance metric between the at least one byte value statistical distribution and the byte value statistical distribution of the data contained in the suspect data payload; and determine that the at least one byte value statistical distribution is similar to the byte value statistical distribution of the data contained in the suspect data payload based at least in part on comparing the distance metric to a predetermined distance.

14. The system of claim 13, wherein the distance metric is calculated based on a Mahalanobis distance between the at least one byte value statistical distribution and the byte value statistical distribution of the data contained in the suspect data payload.

15. The system of claim 9, wherein the processor is further configured to assign different weight factors to selected byte values of the byte value statistical distribution of the data contained in the suspect data payload.

16. The system of claim 15, wherein higher weight factors are assigned to byte values corresponding to operational codes of a computer system.

17. A non-transitory computer-readable medium containing computer-executable instructions that, when executed by a processor, cause the processor to perform a method for tracing the location of an origin computer system that initially transmits a suspect data payload across a computer network to an end target computer system, the method comprising: creating a connection record for a transmission to a first computer system through the computer network of a plurality of computer systems; generating a byte value statistical distribution of data contained in a data payload corresponding to the connection record; calculating a distance between the byte value statistical distribution of data contained in the data payload and a model distribution representative of normal payloads transmitted through the computer network; identifying the data payload as a suspect data payload b
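The byte frequency count, rank ordering, weighting, and distance steps recited in the claims can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the function names are hypothetical, the model is fitted from a handful of made-up "normal" payloads, and the distance is a Mahalanobis distance restricted to a diagonal covariance with an assumed smoothing constant `alpha` to avoid division by zero. The length-based model selection of claims 2 and 10 is not shown.

```python
import math
from collections import Counter


def byte_frequency(payload: bytes) -> list:
    """Relative frequency of each byte value 0..255 (byte frequency count)."""
    counts = Counter(payload)
    total = len(payload) or 1
    return [counts.get(b, 0) / total for b in range(256)]


def rank_ordered(freq: list) -> list:
    """Frequencies sorted from most to least common (rank ordered count)."""
    return sorted(freq, reverse=True)


def fit_model(normal_payloads: list) -> tuple:
    """Per-byte mean and variance over payloads assumed to be normal."""
    if not normal_payloads:
        raise ValueError("need at least one normal payload")
    rows = [byte_frequency(p) for p in normal_payloads]
    n = len(rows)
    columns = list(zip(*rows))
    mean = [sum(col) / n for col in columns]
    var = [sum((x - m) ** 2 for x in col) / n for col, m in zip(columns, mean)]
    return mean, var


def diagonal_mahalanobis(freq, mean, var, weights=None, alpha=1e-3) -> float:
    """Mahalanobis distance assuming a diagonal covariance matrix.

    `weights` optionally emphasises selected byte values, in the spirit of
    the weight factors of claims 7 and 15; `alpha` is an assumed smoothing term.
    """
    weights = weights or [1.0] * 256
    return math.sqrt(sum(w * (x - m) ** 2 / (v + alpha)
                         for x, m, v, w in zip(freq, mean, var, weights)))


# Illustrative data only.
normal = [b"GET /index.html HTTP/1.1",
          b"GET /about.html HTTP/1.1",
          b"POST /login HTTP/1.1"]
mean, var = fit_model(normal)
d_normal = diagonal_mahalanobis(byte_frequency(b"GET /contact.html HTTP/1.1"), mean, var)
d_odd = diagonal_mahalanobis(byte_frequency(b"\x90" * 64 + b"\xcc\xcc"), mean, var)
print(d_normal < d_odd)  # True
```

A payload would be flagged as suspect when its distance to the model exceeds a chosen threshold; the threshold value is a tuning decision and is not specified on this page.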

Assignees

Inventors

Classifications

  • G06F21/552 · Primary

    involving long-term monitoring or reporting · CPC title

  • Electricity · mapped topic

  • Static detection · CPC title

  • the attack involving the propagation of malware through the network, e.g. viruses, trojans or worms · CPC title

  • by virus signature recognition · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9003528B2 cover?
A method, apparatus, and medium are provided for tracing the origin of network transmissions. Connection records are maintained at computer system for storing source and destination addresses. The connection records also maintain a statistical distribution of data corresponding to the data payload being transmitted. The statistical distribution can be compared to that of the connection records …
Who is the assignee on this patent?
Stolfo Salvatore J, Univ Columbia
What technology area does this patent fall under?
Primary CPC classification G06F21/552. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 7, 2015 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).