Web threat investigation using advanced web crawling

US12452259B2 · US · B2

Patent metadata
Publication number: US-12452259-B2
Application number: US-202117549313-A
Country: US
Kind code: B2
Filing date: Dec 13, 2021
Priority date: Jun 28, 2018
Publication date: Oct 21, 2025
Grant date: Oct 21, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Examples of the present disclosure describe systems and methods for evaluating malicious web content for associated threats using specialized web crawling techniques. A seed resource identifier is evaluated to determine a second resource identifier associated with the seed resource identifier. A resource corresponding to the second resource identifier is scanned to identify a third resource identifier. The third resource identifier is processed with a machine learning model to classify the third resource identifier according to a classification representing a predicted level of threat. The machine learning model trained to classify resource identifiers into a plurality of classifications. A corrective action can be executed based on the classification of the third resource identifier.
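The abstract describes a chain of steps: seed identifier, related identifier, scan of the related resource, third identifier, model classification, corrective action. The Python sketch below walks through that chain under loud assumptions. The web is simulated with an in-memory `pages` dict. "Scanning" here simply means collecting `href`/`src` attributes with the standard-library `HTMLParser`. The `Threat` levels and the keyword-based `classify` function are hypothetical stand-ins for the patent's trained machine learning model. `corrective_action` only returns a description and changes no real permission. None of these names come from the patent.

```python
from __future__ import annotations

from enum import IntEnum
from html.parser import HTMLParser
from urllib.parse import urljoin


class Threat(IntEnum):
    """Hypothetical scheme: one safe category plus ordered threat levels."""
    SAFE = 0
    SUSPICIOUS = 1
    MALICIOUS = 2


class LinkScanner(HTMLParser):
    """Collects absolute identifiers referenced by href/src attributes on a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.found: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.found.append(urljoin(self.base_url, value))


def scan(url: str, pages: dict[str, str]) -> list[str]:
    """Scan the simulated page at `url`; unknown URLs yield no identifiers."""
    scanner = LinkScanner(url)
    scanner.feed(pages.get(url, ""))
    scanner.close()
    return scanner.found


def classify(url: str) -> Threat:
    """Stand-in for the trained model: a toy keyword rule, not a real classifier."""
    if url.endswith(".exe"):
        return Threat.MALICIOUS
    if "login" in url:
        return Threat.SUSPICIOUS
    return Threat.SAFE


def investigate(seed: str, pages: dict[str, str]) -> dict[str, Threat]:
    """Seed -> related identifiers -> third identifiers -> classification."""
    results: dict[str, Threat] = {}
    for related in scan(seed, pages):      # related (second) identifiers
        thirds = scan(related, pages)     # third identifiers on the related page
        # Assumption: a related identifier takes the worst level among its third identifiers.
        results[related] = max((classify(t) for t in thirds), default=Threat.SAFE)
    return results


def corrective_action(url: str, level: Threat) -> str | None:
    """Describe (but do not perform) a privilege change for risky identifiers."""
    if level >= Threat.SUSPICIOUS:
        return f"lower privilege level for content loaded from {url}"
    return None


pages = {
    "https://seed.example/": '<a href="/promo">promo</a>',
    "https://seed.example/promo": (
        '<script src="https://cdn.example/payload.exe"></script>'
        '<a href="https://bank.example/login">sign in</a>'
    ),
}
for url, level in investigate("https://seed.example/", pages).items():
    print(url, level.name, corrective_action(url, level))
```

Taking the worst level among the third identifiers is only one plausible aggregation rule; the text says the related identifier is classified "based on" the third identifier's classification without fixing how.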

First claim

Full claim text (claims 1 to 18; claims 1 and 10 are independent).

What is claimed is:
1. A computer-implemented method comprising: receiving a seed resource identifier; determining a related resource identifier associated with the seed resource identifier; evaluating the related resource identifier to determine a classification of the related resource identifier, evaluating the related resource identifier comprising: determining a third resource identifier associated with the related resource identifier, wherein determining the third resource identifier comprises scanning a related resource corresponding to the related resource identifier to determine a resource made available via the related resource identifier in a webpage corresponding to the related resource identifier; and processing the third resource identifier with a machine learning model to classify the third resource identifier according to a classification representing a predicted level of threat, the machine learning model trained to classify resource identifiers into a plurality of classifications, the plurality of classifications comprising: a first category for safe resource identifiers; and a plurality of additional categories, the plurality of additional categories representing different levels of threat; classifying the related resource identifier based on a classification of the third resource identifier; and executing a corrective action based on the classification of the related resource identifier, wherein executing the corrective action comprises modifying at least one of a permission or a privilege level.
2. The method of claim 1, further comprising classifying the related resource identifier as malicious based on the classification of the third resource identifier.
3. The method of claim 1, further comprising: based on a determination that the third resource identifier is classified as malicious, providing the third resource identifier to a web crawler to identify further resource identifiers associated with the third resource identifier.
4. The method of claim 1, wherein evaluating the related resource identifier comprises providing the related resource identifier to a web crawler.
5. The method of claim 1, wherein the corrective action comprises quarantining a file.
6. The method of claim 1, wherein the corrective action comprises initiating anti-exploit processing.
7. The method of claim 1, wherein the corrective action comprises terminating an executing process.
8. The method of claim 1, wherein the corrective action comprises installing a security patch.
9. The computer-implemented method of claim 1, wherein determining the related resource identifier comprises investigating at least one of: a root domain and sub-domain of the seed resource identifier, internal and external links associated with the seed resource identifier, an IP address hosting the seed resource identifier, a geolocation of an IP address associated with the seed resource identifier, or other domains owned by a resource.
10. A non-transitory computer-readable media storing computer-executable instructions, the computer-executable instructions comprising instructions for: receiving a seed resource identifier; determining a related resource identifier associated with the seed resource identifier; evaluating the related resource identifier to determine a classification of the related resource identifier, evaluating the related resource identifier comprising: determining a third resource identifier associated with the related resource identifier, wherein determining the third resource identifier comprises scanning a related resource corresponding to the related resource identifier to determine a resource made available via the related resource identifier in a webpage corresponding to the related resource identifier; and processing the third resource identifier with a machine learning model to classify the third resource identifier according to a classification representing a predicted level of threat, the machine learning model trained to classify resource identifiers into a plurality of classifications, the plurality of classifications comprising: a first category for safe resource identifiers; and a plurality of additional categories, the plurality of additional categories representing different levels of threat; classifying the related resource identifier based on a classification of the third resource identifier; and executing a corrective action based on the classification of the related resource identifier, wherein executing the corrective action comprises modifying at least one of a permission or a privilege level.
11. The non-transitory computer-readable media of claim 10, further comprising classifying the related resource identifier as malicious based on the classification of the third resource identifier.
12. The non-transitory computer-readable media of claim 10, further comprising instructions for: based on a determination that the third resource identifier is classified as malicious, providing the third resource identifier to a web crawler to identify further resource identifiers associated with the third resource identifier.
13. The non-transitory computer-readable media of claim 10, wherein evaluating the related resource identifier comprises providing the related resource identifier to a web crawler.
14. The non-transitory computer-readable media of claim 10, wherein the corrective action comprises quarantining a file.
15. The non-transitory computer-readable media of claim 10, wherein the corrective action comprises initiating anti-exploit processing.
16. The non-transitory computer-readable media of claim 10, wherein the corrective action comprises terminating an executing process.
17. The non-transitory computer-readable media of claim 10, wherein the corrective action comprises installing a security patch.
18. The non-transitory computer-readable media of claim 10, wherein determining the related resource identifier comprises investigating at least one of: a root domain and sub-domain of the seed resource identifier, internal and external links associated with the seed resource identifier, an IP address hosting the seed resource identifier, a geolocation of an IP address associated with the seed resource identifier, or other domains owned by a resource.
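Claims 3 and 9 add two mechanisms: deriving leads from the seed's URL structure and links, and feeding identifiers classified as malicious back to a web crawler. The Python sketch below illustrates both under stated assumptions. `root_domain` naively keeps the last two host labels (a production system would consult the Public Suffix List). Only the URL-derived leads of claim 9 are covered; the IP address, geolocation, and ownership lookups need network services and are omitted. `links_of`, `is_malicious`, and the `max_depth` limit are hypothetical parameters, not terms from the patent.

```python
from __future__ import annotations

from collections import deque
from typing import Callable
from urllib.parse import urlsplit


def root_domain(host: str) -> str:
    """Naive root domain (last two labels); real code should use the Public Suffix List."""
    return ".".join(host.lower().split(".")[-2:])


def related_candidates(seed: str, links: list[str]) -> dict[str, list[str]]:
    """Claim-9-style leads: root domain, sub-domain, internal links, external links."""
    host = (urlsplit(seed).hostname or "").lower()
    root = root_domain(host)
    leads: dict[str, list[str]] = {
        "root_domain": [f"https://{root}/"],
        "sub_domain": [f"https://{host}/"] if host != root else [],
        "internal": [],
        "external": [],
    }
    for link in links:
        link_host = urlsplit(link).hostname or ""
        # Internal means "same root domain as the seed", sub-domains included.
        if root_domain(link_host) == root:
            leads["internal"].append(link)
        else:
            leads["external"].append(link)
    return leads


def expand_malicious(
    start: str,
    links_of: Callable[[str], list[str]],
    is_malicious: Callable[[str], bool],
    max_depth: int = 2,
) -> set[str]:
    """Claim-3-style expansion: identifiers judged malicious are crawled for further ones."""
    seen = {start}
    flagged: set[str] = set()
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if not is_malicious(url):
            continue  # safe identifiers are not expanded further
        flagged.add(url)
        if depth >= max_depth:
            continue  # hypothetical limit to keep the crawl bounded
        for nxt in links_of(url):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return flagged
```

The `seen` set and the depth limit keep the expansion finite even when malicious pages link to one another in cycles; the claims themselves do not say how far the crawl should go.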

Assignees

Open Text Inc

Inventors

Classifications

  • Indexing; Web crawling techniques · CPC title

  • Traffic logging, e.g. anomaly detection · CPC title

  • using information identifiers, e.g. uniform resource locators [URL] · CPC title

  • Event detection, e.g. attack signature detection · CPC title

  • service impersonation, e.g. phishing, pharming or web spoofing (detection of rogue wireless access points H04W12/12) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12452259B2 cover?
Examples of the present disclosure describe systems and methods for evaluating malicious web content for associated threats using specialized web crawling techniques. A seed resource identifier is evaluated to determine a second resource identifier associated with the seed resource identifier. A resource corresponding to the second resource identifier is scanned to identify a third resource ide…
Who is the assignee on this patent?
Open Text Inc
What technology area does this patent fall under?
Primary CPC classification H04L63/1416. Mapped technology areas include Electricity.
When was this patent published?
Published October 21, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).