Web threat investigation using advanced web crawling

US11201875B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11201875-B2
Application number: US-201816021630-A
Country: US
Kind code: B2
Filing date: Jun 28, 2018
Priority date: Jun 28, 2018
Publication date: Dec 14, 2021
Grant date: Dec 14, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Examples of the present disclosure describe systems and methods for evaluating malicious web content for associated threats using specialized web crawling techniques. In aspects, a first set of malicious and/or potentially malicious resource identifiers is identified. The first set of resource identifiers is evaluated to determine at least a second set of resource identifiers associated with the first set of resource identifiers. The second set of resource identifiers is provided to a web crawling component, which scans the second set of resource identifiers using a threat detection component. If any resource identifiers in the second set of resource identifiers are identified as malicious (or potentially malicious), those resource identifiers may be classified and recorded, provided to the web crawling component, and/or added to the first set of resource identifiers in subsequent threat detection analyses.
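
The feedback loop described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the function name `investigate`, the callables `find_related` and `is_suspicious`, the `max_rounds` cutoff, and the toy URLs are all assumptions made for illustration, and a real system would replace the lookups with network-backed crawling and a threat detection component.

```python
from typing import Callable, Iterable, Set


def investigate(
    seeds: Iterable[str],
    find_related: Callable[[str], Iterable[str]],  # hypothetical: returns associated identifiers
    is_suspicious: Callable[[str], bool],          # hypothetical: stands in for crawl + threat detection
    max_rounds: int = 3,                           # assumed safety cutoff, not from the patent
) -> Set[str]:
    """Grow the first set with every associated identifier that gets flagged."""
    first_set = set(seeds)      # malicious or potentially malicious identifiers
    frontier = set(first_set)   # identifiers whose associations are not yet explored
    for _ in range(max_rounds):
        # Second set: identifiers associated with the frontier, minus ones already known.
        second_set = {r for s in frontier for r in find_related(s)} - first_set
        # Scan the second set and keep whatever the detector flags.
        flagged = {r for r in second_set if is_suspicious(r)}
        if not flagged:
            break
        first_set |= flagged    # feed detections back for the next analysis round
        frontier = flagged
    return first_set


# Toy in-memory data standing in for real lookups (all URLs are made up).
RELATED = {
    "http://bad.example/a": ["http://bad.example/b", "http://ok.example/"],
    "http://bad.example/b": ["http://worse.example/x"],
}
FLAGGED = {"http://bad.example/b", "http://worse.example/x"}

result = investigate(
    ["http://bad.example/a"],
    find_related=lambda u: RELATED.get(u, []),
    is_suspicious=lambda u: u in FLAGGED,
)
```

In this toy run the seed leads to one flagged neighbour, which in turn leads to a second flagged identifier; the unflagged `http://ok.example/` is scanned but never added to the first set.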

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: at least one processor; and memory coupled to the at least one processor, the memory comprising computer executable instructions that, when executed by the at least one processor, performs a method comprising: accessing a first set of resource identifiers; determining a second set of resource identifiers associated with the first set of resource identifiers, wherein determining the second set of resource identifiers comprises evaluating one or more domains of the first set of resource identifiers; evaluating the second set of resource identifiers to determine a third set of resource identifiers identified using the second set of resource identifiers; evaluating the third set of resource identifiers using a machine learning model to determine whether one or more resource identifiers in the third set of resource identifiers is malicious and applying a classification from a plurality of classifications to each of the one or more resource identifiers in the third set of resource identifiers, wherein evaluating the third set of resource identifiers comprises arranging the third set of resource identifiers into one or more classifications, wherein the classifications represent one or more web threat levels; and based on a first classification applied to a first resource identifier from the third set of resource identifiers by the machine learning model, selecting a remedial action for the first resource identifier from a plurality of remedial actions and executing the remedial action, the plurality of remedial actions comprising: notifying a threat monitoring authority of the first resource identifier; and modifying a set of permission or privilege levels.

2. The system of claim 1, wherein the first set of resource identifiers comprises one or more resource identifiers identified as at least one of malicious and suspicious.

3. The system of claim 1, wherein determining the second set of resource identifiers comprises investigating at least one of: root domains and sub-domains of the first set of resource identifiers, internal and external links associated with the first set of resource identifiers, one or more IP addresses hosting the first set of resource identifiers, a geolocation of one or more IP addresses associated with the first set of resource identifiers, and other domains owned by a resource.

4. The system of claim 3, wherein determining the second set of resource identifiers further comprises at least one of: determining domain registration for the first set of resource identifiers, identifying common execution paths for web threats, and comparing web threat execution paths for various IP addresses associated with the first set of resource identifiers.

5. The system of claim 1, wherein evaluating the second set of resource identifiers comprises providing the second set of resource identifiers to a web crawling utility.

6. The system of claim 5, wherein the web crawling utility scans the second set of resource identifiers to identify one or more links embedded in the second set of resource identifiers.

7. The system of claim 1, wherein evaluating the third set of resource identifiers comprises applying to the third set of resource identifiers at least one of a rule set, an evaluation model, and an algorithm.

8. The system of claim 1, wherein evaluating the third set of resource identifiers further comprises generating a set of metrics identifying at least one of: a number of resource identifiers evaluated, a number of unique resource identifiers evaluated, and a number of threats detected.

9. The system of claim 1, wherein the method further comprises: based on a determination that the first resource identifier is malicious, generating a set of instructions for performing one or more remedial actions.

10. The system of claim 1, wherein the plurality of remedial actions further comprises at least one of: displaying a warning, generating a report, blocking access to the first resource identifier, and initiating an anti-exploit utility.

11. A method comprising: determining, by a computer system, a first set of resource identifiers; determining, by the computer system, a second set of resource identifiers associated with the first set of resource identifiers, wherein determining the second set of resource identifiers comprises evaluating one or more domains of the first set of resource identifiers; evaluating, by the computer system, the second set of resource identifiers to determine a third set of resource identifiers identified using the second set of resource identifiers; evaluating, by the computer system, the third set of resource identifiers using a machine learning model to determine whether one or more resource identifiers in the third set of resource identifiers is malicious and applying a classification from a plurality of classifications to each of the one or more resource identifiers in the third set of resource identifiers, wherein evaluating the third set of resource identifiers comprises arranging the third set of resource identifiers into one or more classifications, wherein the classifications represent one or more web threat levels; and based on a first classification applied to a first resource identifier from the third set of resource identifiers by the machine learning model, selecting a remedial action for the first resource identifier from a plurality of remedial actions and executing the remedial action, the plurality of remedial actions comprising: notifying a threat monitoring authority of the first resource identifier; and modifying a set of permission or privilege levels.

12. The method of claim 11, wherein the first set of resource identifiers comprises one or more resource identifiers previously identified as associated with malicious content.

13. The method of claim 11, wherein determining the second set of resource identifiers further comprises evaluating at least one of: links associated with the first set of resource identifiers, an IP addresses hosting the first set of resource identifiers, one or more IP addresses associated with the IP addresses hosting the first set of resource identifiers, and a geolocation of one or more IP addresses.

14. The method of claim 13, wherein the second set of resource identifiers is determined using an investigation utility for scanning resource identifiers.

15. The method of claim 11, wherein evaluating the second set of resource identifiers comprises providing the second set of resource identifiers to a web crawling utility.

16. The method of claim 15, wherein the web crawling utility is operable to scan the second set of resource identifiers to identify one or more links associated with the second set of resource identifiers.

17. The method of claim 11, wherein evaluating the third set of resource identifiers comprises providing the third set of resource identifiers to a threat detection utility.

18. The method of claim 17, wherein the threat detection utility is operable to determine whether one or more resource identifiers of the third set of resource identifiers is at least suspicious.

19. A non-transitory, computer-readable media storing computer executable instructions that when executed cause a computing system to perform a method comprising: determining a first set of resource identifiers; determining a second set of resource identifiers associated with the first set of resource identifiers, wherein determining the second set of resource identifiers comprises evaluating one or more dom
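
The classification and remediation steps recited in claims 1, 8, and 10 can be sketched as follows. This is a hedged illustration only: the `ThreatLevel` names, the score thresholds, the `REMEDIATION` policy table, and the `model` callable (a stand-in for the machine learning model that returns a score between 0 and 1) are assumptions, not details from the patent. The sketch selects remedial actions by name but does not execute them.

```python
from enum import Enum
from typing import Callable, Dict, Iterable, List, Tuple


class ThreatLevel(Enum):
    # Hypothetical web threat levels; the claims do not name specific classes.
    BENIGN = 0
    SUSPICIOUS = 1
    MALICIOUS = 2


# Assumed policy mapping each classification to remedial actions (claims 1 and 10).
REMEDIATION: Dict[ThreatLevel, List[str]] = {
    ThreatLevel.BENIGN: [],
    ThreatLevel.SUSPICIOUS: ["display_warning"],
    ThreatLevel.MALICIOUS: ["notify_threat_authority", "modify_permissions"],
}


def classify(score: float) -> ThreatLevel:
    """Map a model score in [0, 1] to a threat level (thresholds are assumptions)."""
    if score >= 0.8:
        return ThreatLevel.MALICIOUS
    if score >= 0.4:
        return ThreatLevel.SUSPICIOUS
    return ThreatLevel.BENIGN


def evaluate(
    third_set: Iterable[str],
    model: Callable[[str], float],  # hypothetical stand-in for the ML model
) -> Tuple[Dict[str, ThreatLevel], Dict[str, List[str]], Dict[str, int]]:
    """Classify each unique identifier, select actions, and collect claim 8 style metrics."""
    urls = list(third_set)
    unique = sorted(set(urls))
    labels = {u: classify(model(u)) for u in unique}
    # Only identifiers whose classification has at least one action are listed.
    actions = {u: REMEDIATION[lvl] for u, lvl in labels.items() if REMEDIATION[lvl]}
    metrics = {
        "evaluated": len(urls),
        "unique": len(unique),
        "threats": sum(lvl is not ThreatLevel.BENIGN for lvl in labels.values()),
    }
    return labels, actions, metrics


# Toy scores standing in for model output (URLs and values are made up).
scores = {"http://a.example/": 0.95, "http://b.example/": 0.5, "http://c.example/": 0.1}
labels, actions, metrics = evaluate(
    ["http://a.example/", "http://b.example/", "http://c.example/", "http://a.example/"],
    model=lambda u: scores[u],
)
```

Keeping the policy in a lookup table separates the classification result from the choice of remedial action, which mirrors the claim language of selecting an action "based on a first classification".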

Assignees

Webroot Inc

Inventors

Classifications

  • Service impersonation, e.g. phishing, pharming or web spoofing (detection of rogue wireless access points H04W12/12)

  • Event detection, e.g. attack signature detection

  • Indexing; Web crawling techniques

  • Traffic logging, e.g. anomaly detection

  • Using information identifiers, e.g. uniform resource locators [URL]

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11201875B2 cover?
Examples of the present disclosure describe systems and methods for evaluating malicious web content for associated threats using specialized web crawling techniques. In aspects, a first set of malicious and/or potentially malicious resource identifiers is identified. The first set of resource identifiers is evaluated to determine at least a second set of resource identifiers associated with th…
Who is the assignee on this patent?
Webroot Inc
What technology area does this patent fall under?
Primary CPC classification H04L63/1483. Mapped technology areas include Electricity.
When was this patent published?
Publication date Dec 14, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).