Risk information output device, information output system, risk information output method, and recording medium
US-2024414180-A1 · Dec 12, 2024 · US
US2025337763A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2025337763-A1 |
| Application number | US-202519192671-A |
| Country | US |
| Kind code | A1 |
| Filing date | Apr 29, 2025 |
| Priority date | Apr 30, 2024 |
| Publication date | Oct 30, 2025 |
| Grant date | — |
Official abstract text for this publication.
HyperText Markup Language (HTML) content analysis (HCA) using machine learning is described. A feature vector schema may be generated based on domain names corresponding to HTML webpages and corresponding indications of a status of the HTML webpage. The schema may map each position in a feature vector of a given HTML webpage to a resource identifier. Information may be processed using the schema to generate respective feature vectors. The feature vectors may be used to train a model to generate risk indicators for HTML webpages. A potentially parked domain webpage or a potentially malicious domain webpage may be received. A feature vector for the webpage may be generated and inputted to the model. The model may generate a risk indicator for the webpage. The risk indicator may be output and may cause responsive actions. The model may be updated based on a determination indicating whether the webpage was a parked domain webpage or a malicious domain webpage.
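The schema and feature-vector steps in the abstract can be illustrated with a minimal sketch. It assumes, hypothetically, that a "resource identifier" is the URL found in a `src` or `href` attribute and that positions are assigned in sorted order; the publication does not specify either, and the names `ResourceExtractor`, `build_schema`, and `to_feature_vector` are illustrative only.

```python
from html.parser import HTMLParser


class ResourceExtractor(HTMLParser):
    """Collects values of src/href attributes (assumed resource identifiers)."""

    def __init__(self):
        super().__init__()
        self.resources = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # value is None for attributes written without a value
            if name in ("src", "href") and value:
                self.resources.add(value)


def extract_resources(html):
    """Return the set of resource identifiers referenced in one page."""
    parser = ResourceExtractor()
    parser.feed(html)
    return parser.resources


def build_schema(pages):
    """Map each distinct resource identifier to one vector position."""
    identifiers = set()
    for html in pages:
        identifiers |= extract_resources(html)
    # Sorted order is an assumption made here so positions are reproducible.
    return {rid: pos for pos, rid in enumerate(sorted(identifiers))}


def to_feature_vector(html, schema):
    """Binary vector: 1 where the page references the mapped identifier."""
    vector = [0] * len(schema)
    for rid in extract_resources(html):
        pos = schema.get(rid)
        if pos is not None:  # identifiers outside the schema are ignored
            vector[pos] = 1
    return vector


# Hypothetical training pages and a page to score.
training_pages = [
    '<script src="https://cdn.example/a.js"></script>'
    '<img src="https://ads.example/p.png">',
    '<link href="https://cdn.example/a.js">',
]
schema = build_schema(training_pages)
vectors = [to_feature_vector(page, schema) for page in training_pages]
unseen = to_feature_vector('<script src="https://other.example/x.js"></script>', schema)
```

The resulting vectors, paired with the stored malicious or parked labels, would then be the input to whatever model is trained; the sketch stops before that step because the abstract does not name a model type.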
Opening claim text (preview).
1. A computing device for HTML content analysis, wherein the computer device comprises: one or more processors; and memory storing instructions that, when executed by the one or more processors, cause the computing device to: receive a training set comprising a plurality of training records, wherein each training record comprises, respectively: a domain name corresponding to an HTML webpage; and an indication of a previous determination as to whether the corresponding HTML webpage corresponds to a malicious HTML webpage; generate a feature vector schema for the training set, wherein the feature vector schema corresponds to network assets referenced in the training set, by: parsing the HTML webpage for each domain name of the training set to generate a set of resource identifiers of network assets referenced in the HTML webpages of the training set, wherein parsing a given HTML webpage comprises: extracting resource identifiers of each network asset referenced in the given HTML webpage; and generating the set of resource identifiers based on the extracted resource identifiers of each network asset referenced in the given HTML webpage; and generating the feature vector schema for the training set based on the generated set of resource identifiers of network assets referenced in the HTML webpages, wherein the feature vector schema maps each position in a feature vector of a given HTML webpage to a corresponding resource identifier of the set of resource identifiers; process each training record of the training set, using the feature vector schema, to generate a feature vector corresponding to the HTML webpage for each respective domain name of the training set; train a content analysis model based on inputting, into the content analysis model and for each respective HTML webpage of the training set: the feature vector of the respective HTML webpage; and the corresponding indication of the previous determination as to whether the corresponding HTML webpage corresponds to a malicious HTML webpage; receive a request to perform content analysis on a potentially malicious HTML webpage; generate, based on the request, a feature vector for the potentially malicious HTML webpage, by processing the potentially malicious HTML webpage using the feature vector schema; generate, based on inputting the feature vector for the potentially malicious HTML webpage into the content analysis model, a risk indicator, wherein the risk indicator corresponds to a likelihood that the potentially malicious HTML webpage corresponds to a malicious HTML webpage; cause output of the risk indicator; receive, based on the output of the risk indicator, feedback corresponding to the accuracy of the risk indicator; provide the feature vector for the potentially malicious HTML webpage and the feedback to the content analysis model as a new training record; and update the content analysis model based on the new training record.

2. The computing device of claim 1, wherein processing a given training record comprises: generating the feature vector for the given training record, wherein the feature vector for the given training record comprises one or more binary bits indicating the presence of resource identifiers, of the set of resource identifiers, in the HTML webpage for each respective domain name by: determining, based on the feature vector schema and for each position of the feature vector for the given training record, whether the HTML webpage includes a resource identifier corresponding to the resource identifier mapped to the respective position; and assigning, based on the determining, a binary value to each position of the feature vector for the given training record.

3. The computing device of claim 1, wherein processing the potentially malicious HTML webpage comprises: extracting resource identifiers corresponding to each network asset referenced in the potentially malicious HTML webpage; determining, based on the feature vector schema and for each position of the feature vector for the potentially malicious HTML webpage, whether the potentially malicious HTML webpage includes a resource identifier corresponding to the resource identifier mapped to the respective position; and assigning, based on the determining, a binary value to each position of the feature vector for the potentially malicious HTML webpage.

4. The computing device of claim 1, wherein generating the feature vector schema comprises: determining, by parsing the set of resource identifiers, whether the set of resource identifiers includes one or more duplicate resource identifiers, wherein the one or more duplicate resource identifiers are each identical to a first resource identifier; and based on determining the set of resource identifiers includes one or more duplicate resource identifiers, removing, from the set of resource identifiers, each of the one or more duplicate resource identifiers before mapping each position in the feature vector of the given HTML webpage to the corresponding resource identifier of the set of resource identifiers.

5. The computing device of claim 1, wherein generating the feature vector schema comprises: determining, by parsing the set of resource identifiers, whether the set of resource identifiers includes one or more resource identifiers sharing a same domain name subpart; and based on determining the set of resource identifiers includes one or more resource identifiers sharing the same domain name subpart, mapping, for two given resource identifiers sharing the same domain name subpart, the two given resource identifiers sharing the same domain name subpart to the same position in the feature vector of the given HTML webpage.

6. The computing device of claim 1, wherein generating the feature vector schema comprises: determining, by parsing the set of resource identifiers, whether the set of resource identifiers includes one or more alias resource identifiers, wherein a given alias resource identifier corresponds to a known resource identifier included in the set of resource identifiers; and based on determining the set of resource identifiers includes one or more alias resource identifiers, mapping the given alias resource identifier and the corresponding known resource identifier to the same position in the feature vector of the given HTML webpage.

7. The computing device of claim 1, wherein the receiving the request to perform content analysis is based on monitoring network traffic of a computing device, wherein the monitoring comprises: identifying a list of HTML webpage domain names included in the network traffic; and comparing the list of HTML webpage domain names with a watchlist of potentially malicious domain names.

8. The computing device of claim 1, wherein the receiving the request to perform content analysis is based on determining a given HTML webpage exceeds a risk threshold value, wherein the determining comprises: receiving a set of threat information comprising a plurality of threat records maintained by a cybersecurity application, wherein each threat record comprises: a domain name corresponding to a tracked HTML webpage; and a confidence score associated with the domain name corresponding to the tracked HTML webpage, wherein the confidence score indicates a likelihood that the tracked HTML webpage corresponds to a malicious HTML webpage; receiving an identification of a first HTML webpage; determining, based on comparing a domain name corresponding to the first HTML webpage to the set of threat information, whether or not the domain name corresponding to the first HTML webpage is included in the set of threat information; and determining, based on determining that the domain name correspondi
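
The schema refinements recited in claims 4 through 6 (dropping duplicates, and sharing one position among identifiers with a common domain name subpart or a known alias) might look roughly like the sketch below. Several points are assumptions not stated in the claims: resource identifiers are treated as URLs, the "domain name subpart" is taken naively as the last two host labels (which ignores multi-label public suffixes), and aliases come from a hypothetical host-to-host lookup table. The names `canonical_key` and `build_collapsed_schema` are illustrative.

```python
from urllib.parse import urlparse


def canonical_key(resource_id, aliases):
    """Reduce an identifier to the key that decides its vector position."""
    host = urlparse(resource_id).hostname or resource_id
    # Assumption: an alias table maps an alias host to its known host.
    host = aliases.get(host, host)
    # Assumption: the shared "subpart" is the last two labels of the host.
    return ".".join(host.split(".")[-2:])


def build_collapsed_schema(resource_ids, aliases=None):
    """Map identifiers to positions, sharing positions per canonical key."""
    aliases = aliases or {}
    positions = {}  # canonical key -> vector position
    schema = {}     # resource identifier -> vector position
    for rid in resource_ids:
        key = canonical_key(rid, aliases)
        if key not in positions:
            positions[key] = len(positions)
        # Assigning into a dict means an exact duplicate adds no new entry.
        schema[rid] = positions[key]
    return schema


# Hypothetical identifiers: one exact duplicate, two hosts under the same
# subpart, and one alias of a known tracker host.
resource_ids = [
    "https://a.cdn.example/x.js",
    "https://b.cdn.example/y.js",
    "https://a.cdn.example/x.js",
    "https://tracker.test/t.gif",
    "https://trk.invalid/t.gif",
]
collapsed = build_collapsed_schema(resource_ids, {"trk.invalid": "tracker.test"})
vector_length = len(set(collapsed.values()))
```

Collapsing positions this way shortens the feature vector (two positions here instead of four), at the cost of no longer distinguishing which specific identifier under a shared subpart was present.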
service impersonation, e.g. phishing, pharming or web spoofing (detection of rogue wireless access points H04W12/12) · CPC title
Traffic logging, e.g. anomaly detection · CPC title
Event detection, e.g. attack signature detection · CPC title