Identification of a dns packet as malicious based on a value
US-2018332056-A1 · Nov 15, 2018 · US
US11288594B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11288594-B2 |
| Application number | US-201815892088-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 8, 2018 |
| Priority date | Aug 31, 2015 |
| Publication date | Mar 29, 2022 |
| Grant date | Mar 29, 2022 |
Official abstract text for this publication.
In one example in accordance with the present disclosure, a method for domain classification includes sorting a set of sample domains into leaves based on syntactical features of the domains. Each sample domain belongs to a family of domains. The method also includes identifying, for each leaf, a regular expression for each family with at least one domain in the leaf. The method also includes determining, for each leaf, at least one lobe with a set of domains in the leaf that matches the regular expression for a first family with at least one domain in the leaf, and that does not match the regular expression for the other families with at least one domain in the leaf. The method also includes creating a classifier for the domains in each lobe by using the set of domains from each family in the lobe as training classes for machine learning.
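The pipeline in the abstract (sort into leaves, derive one regular expression per family per leaf, split each leaf into lobes) can be sketched as follows. This is a minimal illustration, not the patented implementation: `leaf_key`, `infer_regex`, and `build_lobes` are hypothetical names, the leaf key is a simplified stand-in for the syntactical features, and the regular expression is a naive character class plus length range, since this record does not say how the regular expressions are derived. A lobe is represented here by the set of families whose regular expression matches a domain.

```python
import re
from collections import defaultdict


def leaf_key(domain):
    """Hypothetical syntactic key: (last label, label count, length of leftmost label)."""
    labels = domain.lower().split(".")
    return (labels[-1], len(labels), len(labels[0]))


def infer_regex(domains):
    """Naive stand-in for regex identification (an assumption, not the patent's method):
    one character class covering every character seen, plus the observed length range."""
    chars = "".join(sorted(set("".join(domains))))
    lengths = [len(d) for d in domains]
    return re.compile(r"[%s]{%d,%d}" % (re.escape(chars), min(lengths), max(lengths)))


def build_lobes(samples):
    """samples: iterable of (domain, family) pairs.

    Returns {leaf_key: {lobe_signature: {family: [domains]}}}, where a lobe
    signature is the frozenset of families whose regex matches the domain.
    """
    # Step 1: sort sample domains into leaves, keeping the family label.
    leaves = defaultdict(lambda: defaultdict(list))
    for domain, family in samples:
        leaves[leaf_key(domain)][family].append(domain.lower())

    result = {}
    for key, by_family in leaves.items():
        # Step 2: one regular expression per family present in this leaf.
        regexes = {fam: infer_regex(doms) for fam, doms in by_family.items()}
        # Step 3: group domains by which family regexes they match.
        lobes = defaultdict(lambda: defaultdict(list))
        for fam, doms in by_family.items():
            for d in doms:
                signature = frozenset(
                    f for f, rx in regexes.items() if rx.fullmatch(d)
                )
                lobes[signature][fam].append(d)
        result[key] = {sig: dict(fams) for sig, fams in lobes.items()}
    return result
```

In this sketch, a lobe whose signature names a single family holds domains that only that family's regular expression matches, which corresponds to the lobe described in the abstract. Lobes that hold domains from several families are where the per-lobe classifier would be trained, with each family's domains serving as one training class.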
Opening claim text (preview).
The invention claimed is:
1. A method for domain classification, the method comprising: sorting, by a processor, a set of sample domains into a plurality of leaves based on syntactical features of the sample domains, wherein each sample domain belongs to a family of domains; identifying, for each leaf of the plurality of leaves, a regular expression for each family of domains with at least one domain in the leaf; determining a plurality of lobes for each leaf of the plurality of leaves, at least one lobe of the plurality of lobes having a set of domains in the leaf that matches a regular expression for a first family of domains with at least one domain in the leaf, and that does not match a regular expression for other families of domains with at least one domain in the leaf; creating, by the processor, a classifier for each lobe of the plurality of lobes by using domains from each family of domains in the lobe as training classes for machine learning; receiving network traffic over a computer network; and analyzing the network traffic using a classifier for at least one lobe of the plurality of lobes to identify an algorithmically-generated domain employed by malware of an infected host on the computer network.
2. The method of claim 1, wherein the syntactical features are defined by a 4-tuple of a top level domain, a length of a first private domain, a length of a prefix and a total number of levels below the top level domain.
3. The method of claim 1, wherein, for each leaf of the plurality of leaves, a regular expression for each family of domains with at least one domain in the leaf codifies domains within the leaf that are from a particular family of domains.
4. The method of claim 1 further comprising: receiving, by the processor, an unclassified domain from the network traffic; determining, by the processor, a leaf that matches the unclassified domain; determining, by the processor, the at least one lobe that matches the unclassified domain; and applying, by the processor, the classifier for the at least one lobe to the unclassified domain.
5. The method of claim 1, further comprising: calculating, by the processor, a probability that an unclassified domain from the network traffic belongs to a family of domains used to train the classifier for the at least one lobe.
6. The method of claim 1, wherein at least one family of sample domains of the set of sample domains is designated as one of a malicious family or a benign family of domains.
7. The method of claim 1, wherein at least one domain from the network traffic is classified as being benign.
8. The method of claim 1, further comprising: determining for each leaf of the plurality of leaves, a union and an intersection of regular expressions of families of domains with at least one domain in the leaf.
9. A system for domain classification, the system comprising at least one processor and a memory, the memory storing instructions that when executed by the at least one processor cause the system to: determine a value for each domain in a set of sample domains based on syntactical features of the sample domains; create at least one leaf of domains, wherein all domains in the leaf have a same value; identify, for each leaf, a regular expression for each family of domains containing at least one domain in the leaf; determine, for each leaf, a plurality of lobes, at least one lobe of the plurality of lobes having of possible combinations of the regular expressions and a complement of regular expressions for families of domains compatible with at least one domain in the leaf; and create a classifier for each lobe of the plurality of lobes by using domains from each family of domains in the lobe as training classes for machine learning of the classifier to classify an unclassified domain as an algorithmically-generated domain used by a malware of an infected host on a computer network.
10. The system of claim 9, wherein each family of sample domains of the set of sample domains has a set of possible values and each leaf consists of domains with values that are possible for the domains in the leaf.
11. The system of claim 9, wherein the syntactical features are defined by a 4-tuple of a top level domain, a length of a first private domain, a length of a prefix and a total number of levels below the top level domain.
12. A non-transitory machine-readable storage medium comprising instructions executable by a processor of a computing device, the machine-readable storage medium comprising instructions to: sort a set of domains into a plurality of leaves based on syntactical features of the domains; identify each family of domains in each leaf, wherein at least one family of domains defines a set of domain generating algorithms; identify a regular expression for each family of domains in each leaf; determine, for each leaf, a plurality of lobes, at least one lobe of the plurality of lobes having regular expressions and a complement of the regular expressions for families of domains compatible with at least one domain in the leaf; and create a classifier for each lobe of the plurality of lobes by using domains from each family of domains in the lobe as training classes for machine learning of the classifier to classify an unclassified domain as an algorithmically-generated domain used by a malware on an infected host on a computer network.
13. The non-transitory machine-readable storage medium of claim 12, wherein the syntactical features are defined by a 4-tuple of a top level domain, a length of a first private domain, a length of a prefix and a total number of levels below the top level domain.
14. The non-transitory machine-readable storage medium of claim 12, further comprising instructions to: receive a test domain; determine a lobe of the plurality of lobes that matches the test domain; and apply a classifier for the determined lobe to the test domain.
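The 4-tuple of claims 2, 11, and 13 and the lookup flow of claims 4, 5, and 14 can be illustrated with a short sketch. Several points are assumptions rather than anything stated in the claims: the top level domain is taken to be the last label (a real system would likely consult a public suffix list), the first private domain is taken to be the label immediately left of it, and the prefix is everything further left, dots included. The names `syntactic_tuple`, `classify`, and the `model` layout are hypothetical, and each per-lobe classifier is modelled as a plain callable that returns a family-to-probability mapping.

```python
import re


def syntactic_tuple(domain):
    """Return (TLD, len(first private domain), len(prefix), levels below TLD).

    Assumed definitions: TLD = last label; first private domain = label
    directly left of the TLD; prefix = all labels left of that, joined by dots.
    """
    labels = domain.lower().rstrip(".").split(".")
    if len(labels) < 2:
        raise ValueError("expected at least one label below the TLD: %r" % domain)
    tld = labels[-1]
    private = labels[-2]
    prefix = ".".join(labels[:-2])
    return (tld, len(private), len(prefix), len(labels) - 1)


def classify(domain, model):
    """Look up leaf, then lobe, then apply that lobe's classifier.

    model is a hypothetical structure:
        {leaf_tuple: (regex_by_family, classifier_by_lobe)}
    where regex_by_family maps family -> compiled regex and
    classifier_by_lobe maps frozenset(families) -> callable(domain) -> dict.
    Returns the classifier's output, or None if no leaf or lobe matches.
    """
    domain = domain.lower().rstrip(".")
    leaf = model.get(syntactic_tuple(domain))
    if leaf is None:
        return None  # no leaf with this syntactic value was seen in training
    regex_by_family, classifier_by_lobe = leaf
    # The lobe is identified by which family regexes match the domain.
    lobe = frozenset(
        fam for fam, rx in regex_by_family.items() if rx.fullmatch(domain)
    )
    classifier = classifier_by_lobe.get(lobe)
    return classifier(domain) if classifier is not None else None
```

Under these assumptions, `syntactic_tuple("www.example.com")` yields the top level domain `com`, a first private domain of length 7, a prefix of length 3, and 2 levels below the top level domain. Returning `None` for an unmatched leaf or lobe is a design choice of this sketch; the claims do not say how such domains are handled.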
Probabilistic graphical models, e.g. probabilistic networks · CPC title
Address structures or formats · CPC title
by monitoring network traffic (monitoring network traffic per se H04L43/00) · CPC title
Event detection, e.g. attack signature detection · CPC title
Machine learning · CPC title