System and method to provide automatic classification of phishing sites

US9282117B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9282117-B2
Application number: US-201313949974-A
Country: US
Kind code: B2
Filing date: Jul 24, 2013
Priority date: Jul 24, 2012
Publication date: Mar 8, 2016
Grant date: Mar 8, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A phishing classification model that detects a phishing website based on one or more feature vectors for the website is provided. The phishing classification model may operate on a server and may further select a website, generate a feature vector for a landing page of the website, create a feature vector for every iframe that is a descendent of the landing page, and derive a final feature vector from the feature vectors of the landing page and the descendent iframe pages. Further, machine learning techniques may be applied to generate, or train, a classification model based upon one or more known phishing websites. Based on the feature vector, the classification modeler may classify a website as either a phishing website or as a non-phishing website. Feedback in the form of human verification may further be incorporated.
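
The derivation described in the abstract (one vector for the landing page, one per descendant iframe, then a single final vector) can be illustrated with a minimal Python sketch. The `Page` class, the function names, and the choice to combine iframe vectors by element-wise averaging and concatenation are assumptions made for illustration; the abstract does not state how the final vector is derived.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Page:
    """Hypothetical page node: its feature vector plus any nested iframe pages."""
    features: List[float]
    iframes: List["Page"] = field(default_factory=list)


def descendants(page: Page) -> Iterator[Page]:
    """Yield every iframe page below `page`, at any nesting depth."""
    for child in page.iframes:
        yield child
        yield from descendants(child)


def final_vector(landing: Page) -> List[float]:
    """Concatenate the landing-page vector with the mean of all iframe vectors.

    Assumption: averaging and concatenation are illustrative choices only.
    A page with no iframes gets a zero vector in the iframe slot.
    """
    child_vectors = [p.features for p in descendants(landing)]
    if child_vectors:
        mean = [sum(col) / len(child_vectors) for col in zip(*child_vectors)]
    else:
        mean = [0.0] * len(landing.features)
    return list(landing.features) + mean


# Landing page with one iframe, which itself contains one nested iframe.
landing = Page([1.0, 0.0], [Page([0.0, 2.0], [Page([2.0, 0.0])])])
print(final_vector(landing))  # [1.0, 0.0, 1.0, 1.0]
```

In this sketch the final vector always has a fixed length regardless of how many iframes a page contains, which is what lets a single trained model score every website.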

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: using a device, creating one or more feature vectors for a landing page of a website, wherein the one or more feature vectors for the landing page are derived from one or more landing page elements; creating one or more feature vectors for one or more child pages that are a descendant of the landing page; deriving a final feature vector from the one or more feature vectors of the landing page and the one or more feature vectors for the child pages; and providing the final feature vector to a model to determine whether the website is a phishing website.

2. The method of claim 1, further comprising: inputting the final feature vector into a model, wherein the model outputs a score associated with a probability of being a phishing site given the input; and classifying the website as a phishing website based on the determined score.

3. The method of claim 2, further comprising: classifying the website as a phishing website given the score and a threshold.

4. The method of claim 2, wherein the final feature vector includes a concatenation of at least some of the following individual feature vectors: a uniform resource locator (URL) feature vector including at least some of a URL string character n-gram, an IP address character n-gram, and URL geo-location information; an average URL feature vector derived from links and hrefs on page; average URL feature vectors derived from links and hrefs on page in bins of similarity to the page URL feature vector; an html content feature vector; a classification service classification result feature vector; and a feature vector based on age of webpage.

5. The method of claim 2, wherein the model utilizes active learning to compute a priority in which the feature vector should be labeled.

6. The method of claim 2, wherein the model utilizes one or more labels to identify whether the website is a phishing website or not a phishing website.

7. The method of claim 2, wherein the model utilizes transductive learning.

8. The method of claim 2, further comprising: an output score indicating an entity that is targeted by the phishing website.

9. The method of claim 1, wherein the feature vector is derived according to the following formula:

$$\vec{p} = \Bigl( \vec{p}_{00},\; \frac{1}{n_{1}} \sum_{k}^{n_{1}} \vec{p}_{1k},\; \frac{1}{n_{11}} \sum_{\{k \mid k \in \mathrm{bin}_{11}\}}^{n_{11}} \vec{p}_{1k},\; \ldots,\; \frac{1}{n_{1m}} \sum_{\{k \mid k \in \mathrm{bin}_{1m}\}} \ldots$$

…
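
As a rough illustration of claims 3, 4, and 9, the sketch below counts character n-grams of a URL string, builds a combined vector from the terms visible in the claim 9 preview (the page vector, the mean of the first-level vectors, and one mean per similarity bin), and applies a score threshold. All function names, the bin assignments, the default threshold of 0.5, and the zero vector used for an empty bin are hypothetical; the claim text shown here is truncated and does not specify them.

```python
from collections import Counter
from typing import List, Sequence


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams, e.g. for a URL string (claim 4)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def mean_vector(vectors: Sequence[Sequence[float]], dim: int) -> List[float]:
    """Element-wise mean; an empty input yields a zero vector (an assumption)."""
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def combine(page_vec: Sequence[float],
            link_vecs: Sequence[Sequence[float]],
            bin_ids: Sequence[int],
            num_bins: int) -> List[float]:
    """Concatenate (page vector, mean of all link vectors, mean per bin).

    `bin_ids[k]` is the hypothetical similarity bin of `link_vecs[k]`.
    """
    dim = len(page_vec)
    out = list(page_vec) + mean_vector(link_vecs, dim)
    for b in range(num_bins):
        members = [v for v, i in zip(link_vecs, bin_ids) if i == b]
        out += mean_vector(members, dim)
    return out


def is_phishing(score: float, threshold: float = 0.5) -> bool:
    """Claim 3 style decision: compare a model score against a threshold."""
    return score >= threshold


print(char_ngrams("abcab", 2))  # counts: 'ab' twice, 'bc' once, 'ca' once
print(combine([1.0], [[2.0], [4.0], [6.0]], [0, 1, 1], 2))  # [1.0, 4.0, 2.0, 5.0]
```

Averaging within bins keeps the combined vector a fixed length no matter how many links a page carries, while still separating links that resemble the page URL from those that do not.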

Assignees

Webroot Inc

Inventors

Classifications

  • service impersonation, e.g. phishing, pharming or web spoofing (detection of rogue wireless access points H04W12/12) · CPC title

  • Computer malware detection or handling, e.g. anti-virus arrangements · CPC title

  • Assessing vulnerabilities and evaluating computer system security · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9282117B2 cover?
A phishing classification model that detects a phishing website based on one or more feature vectors for the website is provided. The phishing classification model may operate on a server and may further select a website, generate a feature vector for a landing page of the website, create a feature vector for every iframe that is a descendent of the landing page, and derive a final feature vect…
Who is the assignee on this patent?
Webroot Inc
What technology area does this patent fall under?
Primary CPC classification H04L63/1483. Mapped technology areas include Electricity.
When was this patent published?
Publication date Mar 8, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).