Classifying locator generation kits

US10205704B2 · US · B2

Patent metadata
Publication number: US-10205704-B2
Application number: US-201615200530-A
Country: US
Kind code: B2
Filing date: Jul 1, 2016
Priority date: Jul 1, 2016
Publication date: Feb 12, 2019
Grant date: Feb 12, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and systems for classifying malicious locators. A processor is trained on a set of known malicious locators using a non-supervised learning procedure. Once trained, the processor may classify new locators as being generated by a particular generation kit.

First claim

Opening claim text (preview).

What is claimed is:

1. A method for classifying malicious locators accessible through a network, the method comprising: accessing, through an interface to a computer-readable medium, a plurality of locators, wherein each locator comprises the location of a malicious network-accessible resource; extracting at least one feature from each of the plurality of locators, wherein the at least one extracted feature is organized into a binary tree and selected based on a minimum calculated gini entropy; assigning a membership probability to each of the plurality of locators, the membership probability representing a probability a locator was generated by a specific family, wherein different families are generated by different kits; labeling each of the plurality of locators as being generated by a specific family and kit combination based on the at least one extracted feature and the assigned membership probability; providing the at least one extracted feature and the family and kit combination label for each of the plurality of locators to a classification module to train the classification module; and applying the classification module to a second locator to determine a family and kit source of the second locator.

2. The method of claim 1, wherein at least one locator is a uniform resource locator (URL).

3. The method of claim 1, wherein labeling each of the plurality of locators as being generated by a specific family and kit combination includes labeling each of the plurality of locators as being generated by a specific URL-generation kit.

4. The method of claim 1, wherein the label assigned to each of the plurality of locators is based on a highest membership probability for each of the plurality of locators.

5. The method of claim 1, wherein the at least one feature includes one or more of locator string length, character frequency distribution, domain levels, number of directories, number of words, number of words from a predetermined list of words, number of vowels, and number of consonants in the locator.

6. The method of claim 1, further comprising producing weights from the classification module related to each of the at least one feature to assist in determining a family and kit combination for the second locator.

7. The method of claim 1, further comprising issuing a message indicating the family and kit combination of the second locator.

8. The method of claim 1, further comprising classifying the second locator as malicious or non-malicious.

9. A system for classifying malicious locators accessible through a network, the system comprising: an interface to a computer-readable medium configured to access a plurality of locators, each of the plurality of locators comprising the location of a malicious network-accessible resource; a network interface; and a processor in communication with the medium interface and the network interface, the processor configured to: extract at least one feature from each of the plurality of locators, wherein the at least one extracted feature is organized into a binary tree and selected based on a minimum calculated gini entropy; assign a membership probability to each of the plurality of locators, the membership probability representing a probability a locator was generated by a specific family, wherein different families are generated by different kits; label each of the plurality of locators as being generated by a specific family and kit combination based on the at least one extracted feature and the assigned membership probability; and provide the at least one extracted feature and the family and kit combination label for each of the plurality of locators to a classification module to train the classification module so the classification module can determine a family and kit source of a second locator.

10. The system of claim 9, wherein the locator is a uniform resource locator (URL).

11. The system of claim 9, wherein the processor is configured to label each of the plurality of locators as being generated by a specific URL-generation kit.

12. The system of claim 9, wherein the label assigned to each of the plurality of locators is based on a highest membership probability for each of the plurality of locators.

13. The system of claim 9, wherein the at least one feature includes one or more of locator string length, character frequency distribution, domain levels, number of directories, number of words, number of words from a predetermined list of words, number of vowels, and number of consonants in the locator.

14. The system of claim 9, wherein the processor is configured to produce weights related to each of the at least one feature to assist in determining a family and kit combination for the second locator.

15. The system of claim 9, wherein the processor is configured to issue a message indicating the family and kit combination of the second locator.

16. The system of claim 9, wherein the processor is configured to classify the second locator as malicious or non-malicious.

17. The system of claim 9, wherein the processor is further configured to assign weights to the second locator to determine a family the second locator belongs to and further configured to determine a locator generation kit that generated the second locator based on the family.

18. A computer readable medium containing computer-executable instructions for performing a method for classifying malicious locators accessible through a network, the medium comprising: computer-executable instructions for accessing, through an interface to a computer-readable medium, a plurality of locators, wherein each locator comprises the location of a malicious network-accessible resource; computer-executable instructions for extracting at least one feature from each of the plurality of locators, wherein the at least one extracted feature is organized into a binary tree and selected based on a minimum calculated gini entropy; computer-executable instructions for assigning a membership probability to each of the plurality of locators, the membership probability representing a probability a locator was generated by a specific family, wherein different families are generated by different kits; computer-executable instructions for labeling each of the plurality of locators as being generated by a specific family and kit combination based on the at least one extracted feature and the assigned membership probability; computer-executable instructions for providing the at least one extracted feature and the family and kit combination label for each of the plurality of locators to a classification module to train the classification module; and computer-executable instructions for applying the classification module to a second locator to determine a family and kit source of the second locator.
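
To make the claimed steps more concrete, here is a minimal Python sketch of three of them: extracting a few of the locator features named in the claims, scoring a candidate binary split, and labeling a locator by its highest membership probability. It is an illustration under stated assumptions, not the patented implementation. In particular, it assumes that "gini entropy" refers to the standard Gini impurity used in decision trees, and the function names, the example URL, and the family names are all hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit

VOWELS = set("aeiou")


def extract_features(locator: str) -> dict:
    """Compute a few of the lexical features listed in the claims (illustrative subset)."""
    parts = urlsplit(locator)
    host = parts.hostname or ""
    # Path segments before the final one are counted as directories (an assumption).
    directories = [seg for seg in parts.path.split("/")[:-1] if seg]
    letters = [ch for ch in locator.lower() if ch.isalpha()]
    vowels = sum(ch in VOWELS for ch in letters)
    return {
        "length": len(locator),
        "domain_levels": len(host.split(".")) if host else 0,
        "directories": len(directories),
        "vowels": vowels,
        "consonants": len(letters) - vowels,
    }


def gini_impurity(labels) -> float:
    """Standard Gini impurity: 1 - sum(p_k^2) over the label proportions p_k."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def split_impurity(values, labels, threshold) -> float:
    """Size-weighted Gini impurity of the two branches of a binary split.

    A tree builder would pick the feature and threshold that minimize this value.
    """
    left = [lab for val, lab in zip(values, labels) if val <= threshold]
    right = [lab for val, lab in zip(values, labels) if val > threshold]
    total = len(labels)
    return (len(left) * gini_impurity(left) + len(right) * gini_impurity(right)) / total


def label_by_highest_membership(memberships: dict) -> str:
    """Return the family whose membership probability is largest."""
    return max(memberships, key=memberships.get)


if __name__ == "__main__":
    # Hypothetical locator and family names, for illustration only.
    print(extract_features("http://a.example.com/x/y/z.php"))
    print(split_impurity([1, 2, 10, 11], ["a", "a", "b", "b"], threshold=5))
    print(label_by_highest_membership({"fam1": 0.2, "fam2": 0.7, "fam3": 0.1}))
```

The claims do not say how membership probabilities are computed, so the sketch takes them as given. The remaining claimed features (character frequency distribution, word counts) and the training of the classification module are omitted for brevity.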

Assignees

Inventors

Classifications

  • Filtering by address, protocol, port number or service, e.g. IP-address or URL · CPC title

  • Active attacks involving interception, injection, modification, spoofing of data unit addresses, e.g. hijacking, packet injection or TCP sequence number attacks · CPC title

  • Traffic logging, e.g. anomaly detection · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10205704B2 cover?
Methods and systems for classifying malicious locators. A processor is trained on a set of known malicious locators using a non-supervised learning procedure. Once trained, the processor may classify new locators as being generated by a particular generation kit.
Who is the assignee on this patent?
Rapid7 Inc (also recorded as Rapid 7 Inc)
What technology area does this patent fall under?
Primary CPC classification H04L63/0236. Mapped technology areas include Electricity.
When was this patent published?
Publication date Feb 12, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).