Phishing detection using HTML
US-2024114053-A1 · Apr 4, 2024 · US
US12309197B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12309197-B2 |
| Application number | US-202218084593-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 20, 2022 |
| Priority date | Dec 20, 2022 |
| Publication date | May 20, 2025 |
| Grant date | May 20, 2025 |
Official abstract text for this publication.
Website phishing detection is enabled using a Message Passing Neural Network (MPNN) that scores requested HTML with a likelihood of being a phishing website. The technique leverages the assumption that the HTML in a phishing website often presents anomalous structure or features when compared with an analogous benign website. Once a phishing site is detected, a given mitigation action is then taken.
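To make the message-passing idea in the abstract concrete, here is a minimal sketch of two rounds of message passing over a small directed graph of HTML nodes, followed by mean pooling and a logistic score. It is not the patented model. The node features, weights, and the function names `message_pass` and `score_graph` are hypothetical, and plain mean aggregation stands in for whatever learned layers a trained MPNN would use.

```python
import math

def message_pass(features, edges):
    """One round of message passing: each node averages its own vector
    with the vectors of the nodes that point to it."""
    incoming = {node: [] for node in features}
    for src, dst in edges:
        incoming[dst].append(features[src])
    updated = {}
    for node, vec in features.items():
        stack = [vec] + incoming[node]
        updated[node] = [sum(col) / len(stack) for col in zip(*stack)]
    return updated

def score_graph(features, edges, weights, bias, rounds=2):
    """Pool node vectors into one graph vector and squash it to (0, 1)."""
    for _ in range(rounds):
        features = message_pass(features, edges)
    vectors = list(features.values())
    pooled = [sum(col) / len(vectors) for col in zip(*vectors)]
    logit = sum(w * x for w, x in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical 2-dimensional node features for three HTML elements.
features = {"form": [1.0, 0.0], "input": [0.0, 1.0], "a": [1.0, 1.0]}
# Directed edges point from parent element to child element.
edges = [("form", "input"), ("form", "a")]
# Illustrative weights only; a real model would learn them from labeled pages.
score = score_graph(features, edges, weights=[0.5, -0.5], bias=0.0)
```

A caller would compare `score` against a chosen threshold and trigger a mitigation action when the threshold is exceeded. The threshold value is a deployment choice that the abstract does not specify.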
Opening claim text (preview).
What I claim is as follows:
1. A method of protecting an online system, comprising: generating a Document Object Model (DOM) associated with a Hypertext Markup Language (HTML) page; generating from the DOM one or more directed graphs, wherein a directed graph (DG) represents an HTML feature in the HTML page; applying an encoded representation of the one or more directed graphs through a Message Passing Neural Network (MPNN), the MPNN having been trained by analyzing interactions between connected HTML nodes of sites in a training data set, to generate a likelihood that the HTML page is a phishing page; and upon a determination that the HTML page is likely a phishing page, taking a given mitigation action with respect to the HTML page.
2. The method as described in claim 1 wherein the MPNN analyzes the interactions using message passing.
3. The method as described in claim 2 wherein the HTML feature is one of: a hyperlink, HTML inner text, and a combination of a hyperlink and HTML inner text.
4. The method as described in claim 3 wherein the one or more directed graphs include a directed graph associated with a hyperlink, and a directed graph associated with HTML inner text.
5. The method as described in claim 1 wherein generating the one or more directed graphs from the DOM comprises: placing the DOM into a JavaScript Object Notation (JSON) file; parsing the JSON file into a tree data structure of defined HTML attributes; and processing the tree data structure into a directed graph.
6. The method as described in claim 5 wherein the processing of the tree data structure is carried out iteratively with respect to filtering criteria to generate the directed graphs.
7. The method as described in claim 6 wherein the filtering criteria comprise HTML tag names.
8. The method as described in claim 1 wherein the encoded representation is generated by a pretrained language encoder.
9. The method as described in claim 8 wherein the pretrained language encoder is BERT.
10. The method as described in claim 1 wherein applying the representation through the MPNN comprises, for each HTML feature: passing the representation through an input layer; passing an output from the input layer through a node representation learning layer; and passing an output from the node representation learning layer through a message-passing attention layer.
11. The method as described in claim 10 wherein the message passing attention layer is a multi-head self-attention layer followed by dense projections that feed a pooling layer.
12. The method as described in claim 11 wherein the multi-head self-attention layer and the pooling layer generate a vector graph representation.
13. The method as described in claim 12 wherein the vector graph representation is supplied to a fully-connected neural network that generates the likelihood that the HTML page is a phishing page.
14. The method as described in claim 1 wherein the determination occurs in real-time.
15. The method as described in claim 1 further including training the MPNN using the training data set, the training data set comprising a labeled corpus of benign and phishing sites.
16. An apparatus for protecting an online system, comprising: one or more hardware processors; and computer memory holding computer program code executed by the one or more hardware processors and configured to: generate a Document Object Model (DOM) associated with a Hypertext Markup Language (HTML) page; generate from the DOM one or more directed graphs, wherein a directed graph (DG) represents an HTML feature in the HTML page; apply an encoded representation of the one or more directed graphs through a Message Passing Neural Network (MPNN), the MPNN having been trained by analyzing interactions between connected HTML nodes of sites in a training data set, to generate a likelihood that the HTML page is a phishing page; and upon a determination that the HTML page is likely a phishing page, take a given mitigation action with respect to the HTML page.
17. The apparatus as described in claim 16 wherein the MPNN analyzes the interactions using message passing.
18. The apparatus as described in claim 17 wherein the HTML feature is one of: a hyperlink, HTML inner text, and a combination of a hyperlink and HTML inner text.
19. The apparatus as described in claim 16 wherein the computer program code is further executed by the one or more hardware processors to train the MPNN using the training data set, the training data set comprising a labeled corpus of benign and phishing sites.
20. A phishing detection system, comprising: a server executing in a hardware platform and configured to interoperate with a requesting client; and a back-end infrastructure configured to receive signaling forwarded by the server during an interaction with the requesting client with respect to a website, the back-end infrastructure comprising hardware and software configured in response to receipt of the signaling to: obtain an HTML page; generate a Document Object Model (DOM) associated with the HTML page; generate from the DOM one or more directed graphs, wherein a directed graph (DG) represents an HTML feature in the HTML page; apply a representation derived from the one or more graphs through a Message Passing Neural Network (MPNN), the MPNN having been trained by analyzing interactions between connected HTML nodes of sites in the DOM a training data set; generate a score indicative of whether the HTML page is a phishing page; and forward the score to the server for handling.
21. The phishing detection system as described in claim 20 wherein, in response to receipt of the score, the server issues a notification to the requesting client that the website is a phishing site.
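The graph-construction steps recited in claims 5 to 7 (DOM to JSON, JSON to a tree of HTML attributes, tree to a directed graph filtered by tag names) can be sketched with the Python standard library. This is an illustrative reading under stated assumptions, not the patentee's implementation. The names `TreeBuilder`, `html_to_tree`, `tree_to_digraph`, and `VOID_TAGS`, the node fields, and the sample page are all hypothetical, and the JSON step is an in-memory round trip where the claim recites a file.

```python
import json
from html.parser import HTMLParser

# Elements that never have a closing tag (partial list, for illustration).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}

class TreeBuilder(HTMLParser):
    """Builds one nested dict per element: tag, attrs, text, children."""
    def __init__(self):
        super().__init__()
        self.root = {"tag": "document", "attrs": {}, "text": "", "children": []}
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": dict(attrs), "text": "", "children": []}
        self.stack[-1]["children"].append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open element, tolerating sloppy markup.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1]["text"] += data.strip()

def html_to_tree(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    # Round-trip through JSON to mirror the JSON step in claim 5.
    return json.loads(json.dumps(builder.root))

def tree_to_digraph(tree, keep_tags):
    """Return (nodes, edges) for elements whose tag is in keep_tags.
    Each edge points from the nearest kept ancestor to the kept descendant."""
    nodes, edges = [], []

    def walk(node, parent_id):
        if node["tag"] in keep_tags:
            node_id = len(nodes)
            nodes.append({"tag": node["tag"],
                          "href": node["attrs"].get("href"),
                          "text": node["text"]})
            if parent_id is not None:
                edges.append((parent_id, node_id))
            parent_id = node_id
        for child in node["children"]:
            walk(child, parent_id)

    walk(tree, None)
    return nodes, edges

# Hypothetical sample page.
page = ('<html><body><form action="/login">'
        '<a href="http://example.test/reset">Reset</a>'
        '<input name="pw"></form></body></html>')
tree = html_to_tree(page)
link_nodes, link_edges = tree_to_digraph(tree, keep_tags={"body", "form", "a"})
```

Calling `tree_to_digraph` again with a different `keep_tags` set would yield a second graph from the same tree, which is one way to read the iterative, filter-driven processing in claim 6.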
Learning methods · CPC title
Recurrent networks, e.g. Hopfield networks · CPC title
Combinations of networks · CPC title
service impersonation, e.g. phishing, pharming or web spoofing (detection of rogue wireless access points H04W12/12) · CPC title