System and method for detecting generated domain

US10764246B2 · US · B2

Patent metadata
Publication number: US-10764246-B2
Application number: US-201816220360-A
Country: US
Kind code: B2
Filing date: Dec 14, 2018
Priority date: Aug 14, 2018
Publication date: Sep 1, 2020
Grant date: Sep 1, 2020

Abstract

Official abstract text for this publication.

A computer-implemented method for domain analysis comprises: obtaining, by a computing device, a domain; and inputting, by the computing device, the obtained domain to a trained detection model to determine if the obtained domain was generated by one or more domain generation algorithms. The detection model comprises a neural network model, a n-gram-based machine learning model, and an ensemble layer. Inputting the obtained domain to the detection model comprises inputting the obtained domain to each of the neural network model and the n-gram-based machine learning model. The neural network model and the n-gram-based machine learning model both output to the ensemble layer. The ensemble layer outputs a probability that the obtained domain was generated by the domain generation algorithms.
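The data flow in the abstract (two sub-models scoring the same domain, then an ensemble layer producing one probability) can be illustrated with a minimal sketch. It assumes the ensemble layer is a logistic combination of the sub-model outputs, which is the form claim 6 describes for one embodiment. The function names, weights, and bias below are hypothetical placeholders, and the two sub-models are passed in as plain callables rather than the trained models the patent describes.

```python
import math
from typing import Callable, Sequence


def sigmoid(z: float) -> float:
    """Standard logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def ensemble_probability(
    scores: Sequence[float], weights: Sequence[float], bias: float
) -> float:
    """Combine sub-model scores with a logistic layer.

    The weights and bias are hypothetical; in the patent they would be
    learned ensemble coefficients.
    """
    if len(scores) != len(weights):
        raise ValueError("need exactly one weight per sub-model score")
    z = bias + sum(w * s for w, s in zip(weights, scores))
    return sigmoid(z)


def detect(
    domain: str,
    neural_model: Callable[[str], float],
    ngram_model: Callable[[str], float],
    weights: Sequence[float],
    bias: float,
) -> float:
    """Feed the same domain to both sub-models, then to the ensemble layer."""
    scores = [neural_model(domain), ngram_model(domain)]
    return ensemble_probability(scores, weights, bias)
```

A caller would supply its own trained scorers, for example `detect("example.com", nn_score, ngram_score, weights=[2.0, 2.0], bias=-2.0)`, where `nn_score` and `ngram_score` are stand-ins for the neural network model and the n-gram-based model.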

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method for domain analysis, comprising: obtaining, by a computing device, a domain; and inputting, by the computing device, the obtained domain to a trained detection model to determine if the obtained domain was generated by one or more domain generation algorithms, wherein: the detection model comprises a neural network model, a n-gram-based machine learning model, and an ensemble layer; inputting the obtained domain to the detection model comprises inputting the obtained domain to each of the neural network model and the n-gram-based machine learning model; the neural network model and the n-gram-based machine learning model both output to the ensemble layer; and the ensemble layer outputs a probability that the obtained domain was generated by the domain generation algorithms.

2. The method of claim 1, wherein: obtaining, by the computing device, the domain comprises obtaining, by the computing device, the domain from a log of a local Domain Name Service (DNS) server; and the method further comprises forwarding, by the computing device, the determination to the local DNS server to block queries of the domain.

3. The method of claim 1, wherein: obtaining, by the computing device, the domain comprises obtaining, by the computing device, the domain from an agent software installed on a client device; and the method further comprises forwarding, by the computing device, the determination to the agent software to block communications with an Internet Protocol (IP) address of the domain.

4. The method of claim 1, wherein: obtaining, by the computing device, the domain comprises obtaining, by the computing device, the domain from a log of a network monitoring server; and the method further comprises forwarding, by the computing device, the determination to the network monitoring server to block queries of the domain.

5. The method of claim 1, wherein: the detection model comprises an extra feature layer; inputting the obtained domain to the detection model comprises inputting the obtained domain to the extra feature layer; the extra feature layer outputs to the ensemble layer; the domain is associated with a domain name and a top-level domain (TLD); and the extra feature layer comprises at least of the following features: a length of the domain name, a length of the TLD, whether the length of the domain name exceeds a domain name threshold, whether the length of the TLD exceeds a TLD threshold, a number of numerical characters in the domain name, whether the TLD contains any numerical character, a number of special characters contained in the domain name, or whether the TLD contains any special character.

6. The method of claim 5, wherein: the ensemble layer comprises a top logistic regression model outputting the probability; the top logistic regression model comprises a plurality of ensemble coefficients respectively associated with the features, the output from the neural network model, and the output from the n-gram-based machine learning model; and the detection model is trained by: training the neural network model and the n-gram-based machine learning model separately; and inputting outputs of the trained neural network model and the trained n-gram-based machine learning model to the top logistic regression model to solve the ensemble coefficients.

7. The method of claim 1, wherein: the neural network model comprises a probability network; the domain is associated with a domain name, a top-level domain (TLD), and a domain length as separate inputs to the probability network; the domain name is inputted to a one-hot encoding layer and a recurrent neural network layer, before being inputted to a dense and batch normalization layer; the TLD is inputted to an embedding and batch normalization layer, before being inputted to the dense and batch normalization layer; the domain length is inputted to the dense and batch normalization layer; and the dense and batch normalization layer outputs a predicted probability that the obtained domain was generated by the domain generation algorithms.

8. The method of claim 7, wherein: the recurrent neural network layer comprises long-short term memory (LSTM) units.

9. The method of claim 1, wherein: the neural network model comprises a representation network; the domain is associated with a domain name and a top-level domain (TLD) as separate inputs to the representation network; the domain name is inputted to an embedding and batch normalization layer and a recurrent neural network layer, before being inputted to a dense and batch normalization layer; the TLD is inputted to an embedding and batch normalization layer, before being inputted to the dense and batch normalization layer; and the dense and batch normalization layer outputs a dense representation of the domain.

10. The method of claim 9, wherein: the recurrent neural network layer comprises gated recurrent units (GRU).

11. The method of claim 1, wherein: the n-gram-based machine learning model comprises a gradient boosting based classifier based on bigram features.

12. The method of claim 1, wherein: the obtained domain comprises one or more Chinese Pinyin elements.

13. A system for domain analysis, comprising a processor and a non-transitory computer-readable storage medium storing instructions that, when executed by the processor, cause the processor to perform a method for domain analysis, the method comprising: obtaining a domain; and inputting the obtained domain to a trained detection model to determine if the obtained domain was generated by one or more domain generation algorithms, wherein: the detection model comprises a neural network model, a n-gram-based machine learning model, and an ensemble layer; inputting the obtained domain to the detection model comprises inputting the obtained domain to each of the neural network model and the n-gram-based machine learning model; the neural network model and the n-gram-based machine learning model both output to the ensemble layer; and the ensemble layer outputs a probability that the obtained domain was generated by the domain generation algorithms.

14. The system of claim 13, wherein: obtaining the domain comprises obtaining the domain from a log of a local Domain Name Service (DNS) server; and the method further comprises forwarding the determination to the local DNS server to block queries of the domain.

15. The system of claim 13, wherein: obtaining the domain comprises obtaining the domain from an agent software installed on a client device; and the method further comprises forwarding the determination to the agent software to block communications with an Internet Protocol (IP) address of the domain.

16. The system of claim 13, wherein: obtaining the domain comprises obtaining the domain from a log of a network monitoring server; and the method further comprises forwarding the determination to the network monitoring server to block queries of the domain.

17. The system of claim 13, wherein: the detection model comprises an extra feature layer; inputting the obtained domain to the detection model comprises inputting the obtained domain to the extra feature layer; the extra feature layer outputs to the ensemble layer; the domain is associated with a domain name and a top-level domain (TLD); and the extra feature layer comprises at least of the following features: a length of the domain name, a length of the TLD, whether the length of the domain name exceeds a domai
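The hand-crafted features recited for the extra feature layer in claim 5 are simple enough to sketch directly. The example below is illustrative only and makes several assumptions the claim text does not settle: the domain is split at its last dot, a "special character" is taken to mean any non-alphanumeric character, and the two length thresholds are arbitrary placeholder values. The function and key names are hypothetical.

```python
from typing import Dict, Tuple, Union

Feature = Union[int, bool]


def split_domain(domain: str) -> Tuple[str, str]:
    """Split 'example.com' into ('example', 'com') at the last dot.

    Simplifying assumption: multi-label suffixes such as 'co.uk' are not
    handled, and a domain without a dot yields an empty domain name.
    """
    name, _, tld = domain.lower().strip(".").rpartition(".")
    return name, tld


def is_special(ch: str) -> bool:
    """Assumed definition: any character that is not a letter or digit."""
    return not ch.isalnum()


def extra_features(
    domain: str,
    name_threshold: int = 20,  # hypothetical value, not given in the claim
    tld_threshold: int = 4,    # hypothetical value, not given in the claim
) -> Dict[str, Feature]:
    """Compute the eight features listed in claim 5 for one domain."""
    name, tld = split_domain(domain)
    return {
        "name_length": len(name),
        "tld_length": len(tld),
        "name_too_long": len(name) > name_threshold,
        "tld_too_long": len(tld) > tld_threshold,
        "name_digit_count": sum(ch.isdigit() for ch in name),
        "tld_has_digit": any(ch.isdigit() for ch in tld),
        "name_special_count": sum(is_special(ch) for ch in name),
        "tld_has_special": any(is_special(ch) for ch in tld),
    }
```

For example, `extra_features("abc123-x.com")` reports a domain name length of 8, a TLD length of 3, three digits, and one special character (the hyphen). In the arrangement claim 6 describes, values like these would feed the top logistic regression model alongside the two sub-model outputs.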

Classifications

  • Probabilistic graphical models, e.g. probabilistic networks

  • Domain name generation or assignment

  • Probabilistic or stochastic networks

  • Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound

  • Recurrent networks, e.g. Hopfield networks

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10764246B2 cover?
A computer-implemented method for domain analysis comprises: obtaining, by a computing device, a domain; and inputting, by the computing device, the obtained domain to a trained detection model to determine if the obtained domain was generated by one or more domain generation algorithms. The detection model comprises a neural network model, a n-gram-based machine learning model, and an ensemble…
Who is the assignee on this patent?
Didi Res America Llc
What technology area does this patent fall under?
Primary CPC classification H04L61/3025. Mapped technology areas include Electricity.
When was this patent published?
Publication date: Sep 1, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).