US2016294852A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016294852-A1 |
| Application number | US-201514679757-A |
| Country | US |
| Kind code | A1 |
| Filing date | Apr 6, 2015 |
| Priority date | Apr 6, 2015 |
| Publication date | Oct 6, 2016 |
| Grant date | — |
Official abstract text for this publication.
Examples relate to determining string similarity using syntactic edit distance. In one example, a computing device may receive domain name system (DNS) packets that were sent by a client device, each DNS packet specifying a domain name; generate, for each domain name, a syntax string by replacing each character of the domain name with one of a plurality of metacharacters, each metacharacter representing a category of characters that is different from each other category of characters represented by each other metacharacter; determine, for each domain name, a syntactic edit distance between the domain name and each other domain name, the syntactic edit distance between domain names being determined based on syntax strings of the corresponding domain names; cluster each domain name into one of a plurality of clusters based on the syntactic edit distances; and identify the client device as a potential source of malicious software based on the clusters.
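The abstract's first two steps (building a syntax string, then comparing syntax strings) can be sketched as follows. This is a minimal illustration, not the patented implementation: the metacharacter symbols (`V`, `C`, `D`, `.`, `-`, `?`), the function names, and the choice of Levenshtein distance as the edit distance are all assumptions, since the text above names categories of characters but does not fix symbols or a specific distance algorithm.

```python
# Hypothetical mapping from character categories to metacharacters.
# The categories (vowels, consonants, digits, periods, dashes) are among
# those listed in the claims; the symbols themselves are assumptions.
VOWELS = set("aeiouAEIOU")


def metachar(ch: str) -> str:
    """Return the metacharacter for the category that ch belongs to."""
    if ch in VOWELS:
        return "V"
    if ch.isalpha():
        return "C"  # any other letter is treated as a consonant here
    if ch.isdigit():
        return "D"
    if ch in ".-":
        return ch   # periods and dashes each form their own category
    return "?"      # catch-all category for everything else


def syntax_string(domain: str) -> str:
    """Replace each character of the domain name with its metacharacter."""
    return "".join(metachar(ch) for ch in domain)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (assumed here as the edit distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def syntactic_edit_distance(d1: str, d2: str) -> int:
    """Edit distance between the syntax strings of two domain names."""
    return edit_distance(syntax_string(d1), syntax_string(d2))
```

Under these assumptions, `ab12.com` and `xk93.net` share only one literal character, yet their syntax strings (`VCDD.CVC` and `CCDD.CVC`) differ in a single position, which is the kind of structural similarity the syntactic edit distance is meant to expose.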
Opening claim text (preview).
1. A non-transitory machine-readable storage medium encoded with instructions executable by a hardware processor of a computing device for determining string similarity, the machine-readable storage medium comprising instructions to cause the hardware processor to: receive domain name system (DNS) query packets that were sent by a particular client computing device, each DNS query packet specifying a query domain name; generate, for each query domain name included in the received DNS query packets, a syntax string by replacing each character of the query domain name with one of a plurality of metacharacters, each of the plurality of metacharacters representing a category of characters that is different from each other category of characters represented by each other metacharacter in the plurality of metacharacters; determine, for each query domain name included in the received DNS query packets, a syntactic edit distance between the query domain name and each other query domain name included in the received DNS packets, the syntactic edit distance between query domain names being determined based on syntax strings of the corresponding domain names; cluster each query domain name included in the received DNS query packets into one of a plurality of clusters based on the syntactic edit distances; and identify the particular client computing device as a potential source of malicious software based on the plurality of clusters.

2. The storage medium of claim 1, wherein the instructions further cause the processor to: generate, for each syntax string, a sorted syntax string by sorting the metacharacters of each syntax string, and wherein the syntactic edit distance between query domain names is determined based on the sorted syntax strings of the corresponding domain names.

3. The storage medium of claim 1, wherein each syntactic edit distance between query domain names is determined based on an edit distance between syntax strings of the corresponding query domain names.

4. The storage medium of claim 1, wherein the particular client computing device is identified as a potential source of malicious software in response to determining that one of the plurality of clusters includes a number of query domain names that exceeds a threshold number of query domain names.

5. The storage medium of claim 1, wherein at least one category of characters represented by one of the plurality of metacharacters includes at least one of: alphabetical letters; lower-case letters; upper-case letters; vowel letters; consonant letters; foreign language characters; digits; punctuation marks; dashes; periods; underscores; or unprintable characters.

6. A computing device for determining string similarity, the computing device comprising: a hardware processor; and a data storage device storing instructions that, when executed by the hardware processor, cause the hardware processor to: obtain, from at least one network egress point of a network, domain name system (DNS) query packets that were sent by at least one computing device operating on the network, each DNS query packet specifying a query domain name; generate, for each query domain name included in the DNS query packets, a syntax string by replacing a subset of the characters of the query domain name with one of a plurality of metacharacters, each of the plurality of metacharacters representing a category of characters that is different from each other category of characters represented by each other metacharacter in the plurality of metacharacters; determine, for each query domain name, a syntactic edit distance between the query domain name and each other query domain name included in the DNS query packets, the syntactic edit distance between the query domain name and each other domain name being determined based on the syntax string of the query domain name and each syntax string of each other domain name; cluster each of the query domain names into one of a plurality of domain name clusters based on the syntactic edit distances between the query domain names; and determine, based on the plurality of domain name clusters, use of a domain name generation algorithm by the at least one computing device operating on the network.

7. The system of claim 6 wherein the instructions further cause the processor to: generate, for each syntax string, a sorted syntax string by sorting the metacharacters of each syntax string, and wherein the syntactic edit distance between query domain names is determined by: calculating an edit distance between sorted syntax strings of the corresponding domain names.

8. The system of claim 6, wherein each syntactic edit distance between query domain names is determined by: calculating an edit distance between syntax strings of the corresponding query domain names.

9. The system of claim 8, wherein the instructions further cause the processor to: determine, for each query domain name, a measure of similarity to each other query domain name, each measure of similarity being determined between a first domain name and a second domain name by: determining an edit distance between the first query domain name and the second query domain name; and calculating the measure of similarity between the first query domain name and the second query domain name based on the edit distance and the syntactic edit distance.

10. The system of claim 6, wherein use of the domain name generation algorithm is determined based on a number of query domain names in a particular cluster of the plurality of clusters relative to other numbers of query domain names in each of the other clusters of the plurality of clusters.

11. A computer-implemented method for determining string similarity, implemented by a hardware processor, the method comprising executing on the hardware processor the steps of: receiving over a computer network a first string of characters and a second string of characters from domain name system (DNS) query packets originating from a particular computing device, the second string of characters being different from the first string of characters; generating a first syntax string by replacing each character of the first string with one of a plurality of metacharacters, each of the plurality of metacharacters representing a category of characters that is different from each other category of characters represented by each other metacharacter in the plurality of metacharacters; generating a second syntax string by replacing each character of the second string with one of the plurality of metacharacters; and generating network anomaly data for the particular computing device by determining a measure of similarity between the first string and the second string using a syntactic edit distance between the first string and the second string, the syntactic edit distance between first string and the second string being determined based on the first syntax string and second syntax string.

12. The method of claim 11, further comprising: identifying the particular computing device as a potential source of malicious software based on the measure of similarity between the first string and the second string.

13. The method of claim 11, further comprising: receiving a plurality of additional strings of characters originating from the particular computing device; generating, for each additional string, an additional syntax string by replacing each character of the additional string with one of the plurality of metacharacters; and determining, for each additional string, an additional measure of similarity between the additional string and each of
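
The sorted syntax strings of claim 2 and the cluster-size threshold of claim 4 can be sketched as follows. Everything here is an illustrative assumption rather than the claimed implementation: the metacharacter symbols, the greedy clustering rule, the Levenshtein distance, and the parameter names `max_distance` and `threshold` are hypothetical, because the claim text does not specify a clustering algorithm or concrete values.

```python
from typing import List


def syntax_string(domain: str) -> str:
    """Map each character to an assumed metacharacter (V, C, D, or P)."""
    out = []
    for ch in domain:
        if ch in "aeiouAEIOU":
            out.append("V")   # vowel letters
        elif ch.isalpha():
            out.append("C")   # other letters, treated as consonants
        elif ch.isdigit():
            out.append("D")   # digits
        else:
            out.append("P")   # punctuation and anything else
    return "".join(out)


def sorted_syntax_string(domain: str) -> str:
    """Sort the metacharacters so that only category counts matter."""
    return "".join(sorted(syntax_string(domain)))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (assumed here as the edit distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def cluster(domains: List[str], max_distance: int = 1) -> List[List[str]]:
    """Greedy clustering: join the first cluster whose first member is close enough."""
    clusters: List[List[str]] = []
    for d in domains:
        key = sorted_syntax_string(d)
        for c in clusters:
            if edit_distance(key, sorted_syntax_string(c[0])) <= max_distance:
                c.append(d)
                break
        else:
            clusters.append([d])  # no close cluster found, start a new one
    return clusters


def is_potential_malware_source(domains: List[str],
                                max_distance: int = 1,
                                threshold: int = 3) -> bool:
    """Flag the client if any cluster holds more than `threshold` names."""
    return any(len(c) > threshold for c in cluster(domains, max_distance))
```

With these assumed settings, a client that queried `xk93qz.com`, `pt47wd.com`, `bn25rf.com`, `hj81ls.com`, and `example.org` would produce one cluster of four structurally identical names plus a singleton, and would be flagged because four exceeds the threshold of three. A production system would likely use a more robust clustering method; the greedy rule is chosen only to keep the sketch short.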
Electricity · mapped topic
Event detection, e.g. attack signature detection · CPC title
using domain name system [DNS] · CPC title
Traffic logging, e.g. anomaly detection · CPC title
Name conversion · CPC title