Domain classification

US2018165607A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2018165607-A1
Application number: US-201815892088-A
Country: US
Kind code: A1
Filing date: Feb 8, 2018
Priority date: Aug 31, 2015
Publication date: Jun 14, 2018
Grant date: (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In one example in accordance with the present disclosure, a method for domain classification includes sorting a set of sample domains into leaves based on syntactical features of the domains. Each sample domain belongs to a family of domains. The method also includes identifying, for each leaf, a regular expression for each family with at least one domain in the leaf. The method also includes determining, for each leaf, at least one lobe with a set of domains in the leaf that matches the regular expression for a first family with at least one domain in the leaf, and that does not match the regular expression for the other families with at least one domain in the leaf. The method also includes creating a classifier for the domains in each lobe by using the set of domains from each family in the lobe as training classes for machine learning.
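The leaf and lobe steps in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the patent's method: the sample domains and family names are invented, `leaf_key` uses a simplified two-part feature, and `family_regex` derives each family's expression as a character class over the characters seen in that family's first labels. Lobes are modelled as groups of domains that match the same set of family regular expressions, which is one possible reading of the abstract.

```python
import re
from collections import defaultdict

# Hypothetical labelled samples (domain, family); names are illustrative only.
SAMPLES = [
    ("abcdefgh.com", "dga_alpha"),
    ("hgfedcba.com", "dga_alpha"),
    ("abcdabcd.com", "dga_alpha"),
    ("a1b2c3d4.com", "dga_mixed"),
    ("d4c3b2a1.com", "dga_mixed"),
    ("dcbadcba.com", "dga_mixed"),
]


def leaf_key(domain):
    """Assumed syntactical feature: (last label, length of first label)."""
    labels = domain.split(".")
    return (labels[-1], len(labels[0]))


def family_regex(domains):
    """Assumed derivation: a character class of every character seen
    in the first labels of the family's domains."""
    chars = sorted({c for d in domains for c in d.split(".")[0]})
    return re.compile("[" + "".join(re.escape(c) for c in chars) + "]+")


def build_lobes(samples):
    # Step 1: sort sample domains into leaves, keeping the family of each.
    leaves = defaultdict(lambda: defaultdict(list))
    for domain, family in samples:
        leaves[leaf_key(domain)][family].append(domain)

    lobes = {}
    for key, by_family in leaves.items():
        # Step 2: one regular expression per family present in the leaf.
        regexes = {fam: family_regex(doms) for fam, doms in by_family.items()}
        # Step 3: group domains by the exact set of family regexes they match.
        leaf_lobes = defaultdict(list)
        for fam, doms in by_family.items():
            for d in doms:
                label = d.split(".")[0]
                matched = frozenset(
                    f for f, rx in regexes.items() if rx.fullmatch(label)
                )
                leaf_lobes[matched].append((d, fam))
        lobes[key] = dict(leaf_lobes)
    return lobes


if __name__ == "__main__":
    for key, leaf in build_lobes(SAMPLES).items():
        for matched, members in leaf.items():
            print(key, sorted(matched), members)
```

In this sketch, a lobe matched by a single family needs no further work, while a lobe matched by several families holds the domains that would serve as training classes for a per-lobe classifier. The classifier training itself is left out because the abstract does not specify a learning algorithm.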

First claim

Opening claim text (preview).

1. A method for domain classification, the method comprising: sorting, by a processor, a set of sample domains into leaves based on syntactical features of the domains, wherein each sample domain belongs to a family of domains; identifying, for each leaf, a regular expression for each family with at least one domain in the leaf; determining, for each leaf, at least one lobe with a set of domains in the leaf that matches the regular expression for a first family with at least one domain in the leaf, and that does not match the regular expression for the other families with at least one domain in the leaf; and creating, by the processor, a classifier for the domains in each lobe by using the set of domains from each family in the lobe as training classes for machine learning.

2. The method of claim 1 wherein the syntactical features are defined by a 4-tuple of a top level domain, a length of a first private domain, a length of a prefix and a total number of levels below the top level domain.

3. The method of claim 1 wherein the regular expression codify domains within a leaf that are from a particular family.

4. The method of claim 1 further comprising: receiving, by the processor, an unclassified domain; determining, by the processor, the leaf that matches the unclassified domain; determining, by the processor, the lobe that matches the unclassified domain; and applying, by the processor, the classifier for the determined lobe to the unclassified domain.

5. The method of claim 1 further comprising: calculating, by the processor, a probability that an unclassified domain belongs to a family of domains.

6. The method of claim 1 wherein at least one family is designated as one of a malicious family or a benign family.

7. The method of claim 1 wherein at least one domain is classified as being benign.

8. The method of claim 1 further comprising: determining a union and an intersection of the regular expressions.

9. The method of claim 1 wherein at least one domain is classified as having been generated by a known domain generation algorithm.

10. A system for domain classification comprising: a value determiner to determine a value for each domain in a set of sample domains based on syntactical features of the domains; a leaf creator to create at least one leaf of domains, wherein each domain in the leaf has a same value; a regex identifier to identify, for each leaf, a regular expression for each family containing at least one domain in the leaf; a lobe determiner to determine, for each leaf, at least one lobe of possible combinations of the regular expressions and a complement of regular expressions for families compatible with at least one domain in the leaf; and a classifier creator to create a classifier for the domains in each set in each lobe by using the set of domains from each family in the lobe as training classes for machine learning.

11. The system of claim 10 wherein each family has a set of possible values and each leaf consists of domains with values are possible for the families in the leaf.

12. The system of claim 10 wherein the syntactical features are defined by a 4-tuple of a top level domain, a length of a first private domain, a length of a prefix and a total number of levels below the top level domain.

13. A non-transitory machine-readable storage medium comprising instructions executable by a processor of a computing device for application launch state determination, the machine-readable storage medium comprising instructions to: sort a set of domains into leaves based on syntactical features of the domains; identify each family of domains in each leaf, wherein at least one family defines a set of domain generating algorithms; identify a regular expression for each family; determine, for each leaf, a lobe of possible combinations of the regular expressions and a complement of the regular expressions for families compatible with at least one domain in the leaf; and create a classifier for the domains in each lobe by using the set of domains from each family in the lobe as training classes for machine learning.

14. The non-transitory machine-readable storage medium of claim 12, wherein the syntactical features are defined by a 4-tuple of a top level domain, a length of a first private domain, a length of a prefix and a total number of levels below the top level domain.

15. The non-transitory machine-readable storage medium of claim 12 further comprising instructions to: receive a test domain; determine the lobe that matches the test domain; and apply the classifier for the determined lobe to the test domain.
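The 4-tuple named in the claims (top level domain, length of the first private domain, length of the prefix, number of levels below the top level domain) can be sketched as a small function. The claims do not define these terms precisely, so this is one possible reading under stated assumptions: the top level domain is taken as the last label, the first private domain as the label immediately to its left, and the prefix as everything to the left of that, dots included. The names `LeafKey` and `feature_tuple` are hypothetical. A production version would likely need a public suffix list to handle suffixes such as `co.uk`.

```python
from typing import NamedTuple


class LeafKey(NamedTuple):
    """Hypothetical container for the claimed 4-tuple."""
    tld: str           # top level domain (assumed: last label)
    private_len: int   # length of the first private domain (assumed: label left of the TLD)
    prefix_len: int    # length of the prefix (assumed: all remaining labels, dots included)
    levels: int        # number of levels below the top level domain


def feature_tuple(domain: str) -> LeafKey:
    # Normalise case and drop a trailing root dot before splitting into labels.
    labels = domain.lower().rstrip(".").split(".")
    if len(labels) < 2 or not all(labels):
        raise ValueError(f"not a multi-label domain: {domain!r}")
    tld = labels[-1]
    private = labels[-2]
    prefix = ".".join(labels[:-2])
    return LeafKey(tld, len(private), len(prefix), len(labels) - 1)


if __name__ == "__main__":
    print(feature_tuple("www.example.com"))
    print(feature_tuple("abcdefgh.net"))
```

Domains that share the same tuple would land in the same leaf, so an unclassified domain could be routed by computing its tuple and looking up the matching leaf before any regular expression or classifier is applied.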

Assignees

Trend Micro Inc

Inventors

Classifications

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • Electricity · mapped topic

  • Domain name generation or assignment · CPC title

  • Physics · mapped topic

  • G06N99/005 · Primary

    Physics · mapped topic

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2018165607A1 cover?
It covers a method for domain classification. Sample domains, each belonging to a family, are sorted into leaves based on syntactical features. For each leaf, a regular expression is identified for each family present, and at least one lobe is determined from the domains that match the regular expression for one family and not the regular expressions for the others. A classifier is then created for each lobe, using the domains from each family in the lobe as training classes for machine learning.
Who is the assignee on this patent?
Trend Micro Inc
What technology area does this patent fall under?
Primary CPC classification G06N99/005. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 14, 2018 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).