What technology area does this patent fall under?

Primary CPC classification H04L63/1416. Mapped technology areas include Electricity.

When was this patent published?

Publication date Oct 6, 2016 (A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Determining string similarity using syntactic edit distance

US2016294852A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2016294852-A1
Application number	US-201514679757-A
Country	US
Kind code	A1
Filing date	Apr 6, 2015
Priority date	Apr 6, 2015
Publication date	Oct 6, 2016
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection. Read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Examples relate to determining string similarity using syntactic edit distance. In one example, a computing device may receive domain name system (DNS) packets that were sent by a client device, each DNS packet specifying a domain name; generate, for each domain name, a syntax string by replacing each character of the domain name with one of a plurality of metacharacters, each metacharacter representing a category of characters that is different from each other category of characters represented by each other metacharacter; determine, for each domain name, a syntactic edit distance between the domain name and each other domain name, the syntactic edit distance between domain names being determined based on syntax strings of the corresponding domain names; cluster each domain name into one of a plurality of clusters based on the syntactic edit distances; and identify the client device as a potential source of malicious software based on the clusters.
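The core idea in the abstract can be illustrated with a short sketch. This is not the patent's implementation: the metacharacter alphabet (`V`, `C`, `D`, `.`, `-`, `?`) and the function names are assumptions chosen for illustration, and plain Levenshtein distance stands in for whatever edit-distance variant an actual system might use.

```python
# Illustrative sketch only. The category-to-metacharacter mapping below is an
# assumption, not the mapping defined in the patent.

def syntax_string(name: str) -> str:
    """Replace every character with a metacharacter naming its category."""
    out = []
    for ch in name:
        if ch.isascii() and ch.isalpha():
            out.append("V" if ch.lower() in "aeiou" else "C")  # vowel / consonant
        elif ch.isascii() and ch.isdigit():
            out.append("D")  # digit
        elif ch == ".":
            out.append(".")  # period (label separator)
        elif ch in "-_":
            out.append("-")  # dash or underscore
        else:
            out.append("?")  # anything else
    return "".join(out)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insert, delete, and substitute each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete from a
                            curr[j - 1] + 1,             # insert into a
                            prev[j - 1] + (ca != cb)))   # substitute or match
        prev = curr
    return prev[-1]


def syntactic_edit_distance(x: str, y: str) -> int:
    """Edit distance computed on the syntax strings rather than the raw names."""
    return edit_distance(syntax_string(x), syntax_string(y))


if __name__ == "__main__":
    a, b = "xk3f9.com", "qz7t2.com"
    # Both names share one syntax string, so they are syntactically identical
    # even though their raw characters differ in five positions.
    print(syntax_string(a), syntax_string(b))
    print(edit_distance(a, b), syntactic_edit_distance(a, b))
```

Under these assumptions, two algorithmically generated names that look unrelated character by character can still sit at syntactic distance zero, which is what makes the later clustering step meaningful.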

First claim

Opening claim text (preview).

1. A non-transitory machine-readable storage medium encoded with instructions executable by a hardware processor of a computing device for determining string similarity, the machine-readable storage medium comprising instructions to cause the hardware processor to: receive domain name system (DNS) query packets that were sent by a particular client computing device, each DNS query packet specifying a query domain name; generate, for each query domain name included in the received DNS query packets, a syntax string by replacing each character of the query domain name with one of a plurality of metacharacters, each of the plurality of metacharacters representing a category of characters that is different from each other category of characters represented by each other metacharacter in the plurality of metacharacters; determine, for each query domain name included in the received DNS query packets, a syntactic edit distance between the query domain name and each other query domain name included in the received DNS packets, the syntactic edit distance between query domain names being determined based on syntax strings of the corresponding domain names; cluster each query domain name included in the received DNS query packets into one of a plurality of clusters based on the syntactic edit distances; and identify the particular client computing device as a potential source of malicious software based on the plurality of clusters.

2. The storage medium of claim 1, wherein the instructions further cause the processor to: generate, for each syntax string, a sorted syntax string by sorting the metacharacters of each syntax string, and wherein the syntactic edit distance between query domain names is determined based on the sorted syntax strings of the corresponding domain names.

3. The storage medium of claim 1, wherein each syntactic edit distance between query domain names is determined based on an edit distance between syntax strings of the corresponding query domain names.

4. The storage medium of claim 1, wherein the particular client computing device is identified as a potential source of malicious software in response to determining that one of the plurality of clusters includes a number of query domain names that exceeds a threshold number of query domain names.

5. The storage medium of claim 1, wherein at least one category of characters represented by one of the plurality of metacharacters includes at least one of: alphabetical letters; lower-case letters; upper-case letters; vowel letters; consonant letters; foreign language characters; digits; punctuation marks; dashes; periods; underscores; or unprintable characters.

6. A computing device for determining string similarity, the computing device comprising: a hardware processor; and a data storage device storing instructions that, when executed by the hardware processor, cause the hardware processor to: obtain, from at least one network egress point of a network, domain name system (DNS) query packets that were sent by at least one computing device operating on the network, each DNS query packet specifying a query domain name; generate, for each query domain name included in the DNS query packets, a syntax string by replacing a subset of the characters of the query domain name with one of a plurality of metacharacters, each of the plurality of metacharacters representing a category of characters that is different from each other category of characters represented by each other metacharacter in the plurality of metacharacters; determine, for each query domain name, a syntactic edit distance between the query domain name and each other query domain name included in the DNS query packets, the syntactic edit distance between the query domain name and each other domain name being determined based on the syntax string of the query domain name and each syntax string of each other domain name; cluster each of the query domain names into one of a plurality of domain name clusters based on the syntactic edit distances between the query domain names; and determine, based on the plurality of domain name clusters, use of a domain name generation algorithm by the at least one computing device operating on the network.

7. The system of claim 6 wherein the instructions further cause the processor to: generate, for each syntax string, a sorted syntax string by sorting the metacharacters of each syntax string, and wherein the syntactic edit distance between query domain names is determined by: calculating an edit distance between sorted syntax strings of the corresponding domain names.

8. The system of claim 6, wherein each syntactic edit distance between query domain names is determined by: calculating an edit distance between syntax strings of the corresponding query domain names.

9. The system of claim 8, wherein the instructions further cause the processor to: determine, for each query domain name, a measure of similarity to each other query domain name, each measure of similarity being determined between a first domain name and a second domain name by: determining an edit distance between the first query domain name and the second query domain name; and calculating the measure of similarity between the first query domain name and the second query domain name based on the edit distance and the syntactic edit distance.

10. The system of claim 6, wherein use of the domain name generation algorithm is determined based on a number of query domain names in a particular cluster of the plurality of clusters relative to other numbers of query domain names in each of the other clusters of the plurality of clusters.

11. A computer-implemented method for determining string similarity, implemented by a hardware processor, the method comprising executing on the hardware processor the steps of: receiving over a computer network a first string of characters and a second string of characters from domain name system (DNS) query packets originating from a particular computing device, the second string of characters being different from the first string of characters; generating a first syntax string by replacing each character of the first string with one of a plurality of metacharacters, each of the plurality of metacharacters representing a category of characters that is different from each other category of characters represented by each other metacharacter in the plurality of metacharacters; generating a second syntax string by replacing each character of the second string with one of the plurality of metacharacters; and generating network anomaly data for the particular computing device by determining a measure of similarity between the first string and the second string using a syntactic edit distance between the first string and the second string, the syntactic edit distance between first string and the second string being determined based on the first syntax string and second syntax string.

12. The method of claim 11, further comprising: identifying the particular computing device as a potential source of malicious software based on the measure of similarity between the first string and the second string.

13. The method of claim 11, further comprising: receiving a plurality of additional strings of characters originating from the particular computing device; generating, for each additional string, an additional syntax string by replacing each character of the additional string with one of the plurality of metacharacters; and determining, for each additional string, an additional measure of similarity between the additional string and each of
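The clustering and threshold steps recited in the claims (clustering by syntactic edit distance, the sorted-syntax-string variant, and flagging a client when one cluster exceeds a threshold size) might look roughly like the sketch below. The claims do not name a clustering algorithm, so the greedy single-pass grouping here is a stand-in; the two-letter-class alphabet (`L`, `D`, `.`, `?`), the function names, and the default values of `max_distance` and `threshold` are all hypothetical.

```python
# Illustrative sketch only. The clustering method, the metacharacter mapping,
# and the default parameters are assumptions, not taken from the patent.
from typing import List


def syntax_string(name: str) -> str:
    """Map letters to 'L', digits to 'D', keep '.', and use '?' otherwise."""
    return "".join(
        "L" if ch.isalpha() else "D" if ch.isdigit() else "." if ch == "." else "?"
        for ch in name
    )


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def cluster_names(names: List[str], max_distance: int = 1,
                  sort_syntax: bool = False) -> List[List[str]]:
    """Greedy grouping: a name joins the first cluster whose representative
    syntax string is within max_distance, otherwise it starts a new cluster."""
    clusters = []  # each entry: (representative syntax string, member names)
    for name in names:
        syn = syntax_string(name)
        if sort_syntax:
            # Sorted variant: ignore the order of metacharacters.
            syn = "".join(sorted(syn))
        for rep, members in clusters:
            if edit_distance(syn, rep) <= max_distance:
                members.append(name)
                break
        else:
            clusters.append((syn, [name]))
    return [members for _, members in clusters]


def looks_suspicious(names: List[str], threshold: int = 3, **kwargs) -> bool:
    """Flag the client if any cluster holds more than `threshold` names."""
    return any(len(c) > threshold for c in cluster_names(names, **kwargs))


if __name__ == "__main__":
    queries = ["xk3f9.com", "qz7t2.com", "bn4w8.com", "hv2m6.com", "example.org"]
    print(cluster_names(queries))
    print(looks_suspicious(queries))
```

In this sketch the four generated-looking names share one syntax string and land in a single cluster, while `example.org` forms its own. A production system would need a more careful clustering method and tuned thresholds; the defaults above are placeholders.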

Assignees

Trend Micro Inc

Inventors

Hagen Josiah

Classifications

H04L61/1511
Electricity · mapped topic
H04L63/1416 · Primary
Event detection, e.g. attack signature detection · CPC title
H04L61/4511
using domain name system [DNS] · CPC title
H04L63/1425 · Primary
Traffic logging, e.g. anomaly detection · CPC title
H04L61/301
Name conversion · CPC title

Patent family

Related publications grouped by family.

Patent family 57015635

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016294852A1 cover?: Examples relate to determining string similarity using syntactic edit distance. In one example, a computing device may receive domain name system (DNS) packets that were sent by a client device, each DNS packet specifying a domain name; generate, for each domain name, a syntax string by replacing each character of the domain name with one of a plurality of metacharacters, each metacharacter rep…
Who is the assignee on this patent?: Trend Micro Inc