HyperText Markup Language (HTML) content analysis using machine learning

US2025337763A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2025337763-A1
Application number	US-202519192671-A
Country	US
Kind code	A1
Filing date	Apr 29, 2025
Priority date	Apr 30, 2024
Publication date	Oct 30, 2025
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

HyperText Markup Language (HTML) content analysis (HCA) using machine learning is described. A feature vector schema may be generated based on domain names corresponding to HTML webpages and corresponding indications of a status of the HTML webpage. The schema may map each position in a feature vector of a given HTML webpage to a resource identifier. Information may be processed using the schema to generate respective feature vectors. The feature vectors may be used to train a model to generate risk indicators for HTML webpages. A potentially parked domain webpage or a potentially malicious domain webpage may be received. A feature vector for the webpage may be generated and inputted to the model. The model may generate a risk indicator for the webpage. The risk indicator may be output and may cause responsive actions. The model may be updated based on a determination indicating whether the webpage was a parked domain webpage or a malicious domain webpage.
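The schema and feature-vector steps in the abstract can be sketched as follows. This is an illustrative sketch only, not the patented implementation: it assumes that a "resource identifier" is the host name found in a `src` or `href` attribute, and the names `ResourceExtractor`, `extract_hosts`, `build_schema`, and `to_feature_vector` are hypothetical. It uses only the Python standard library and stops short of training a model.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ResourceExtractor(HTMLParser):
    """Collects host names of assets referenced via src/href attributes."""

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value:
                host = urlparse(value).hostname  # None for relative URLs
                if host:
                    self.hosts.add(host)


def extract_hosts(html):
    """Return the set of external host names referenced by one page."""
    parser = ResourceExtractor()
    parser.feed(html)
    parser.close()
    return parser.hosts


def build_schema(pages):
    """Map each distinct host seen across the training pages to a position."""
    all_hosts = set()
    for html in pages:
        all_hosts |= extract_hosts(html)
    return {host: index for index, host in enumerate(sorted(all_hosts))}


def to_feature_vector(html, schema):
    """Binary vector: 1 where the page references the host at that position."""
    vector = [0] * len(schema)
    for host in extract_hosts(html):
        if host in schema:
            vector[schema[host]] = 1
    return vector


# Hypothetical training pages (example.* hosts are placeholders).
page_a = ('<script src="https://cdn.example.com/a.js"></script>'
          '<a href="https://ads.example.net/x">x</a>')
page_b = '<img src="https://cdn.example.com/logo.png"><link href="/local.css">'

schema = build_schema([page_a, page_b])
vector_a = to_feature_vector(page_a, schema)
vector_b = to_feature_vector(page_b, schema)
```

Under these assumptions, each vector together with its known label would form one training record for whatever classifier is chosen. Hosts that never appeared in the training pages have no position in the schema and are ignored at inference time.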

First claim

Opening claim text (preview).

1. A computing device for HTML content analysis, wherein the computer device comprises: one or more processors; and memory storing instructions that, when executed by the one or more processors, cause the computing device to: receive a training set comprising a plurality of training records, wherein each training record comprises, respectively: a domain name corresponding to an HTML webpage; and an indication of a previous determination as to whether the corresponding HTML webpage corresponds to a malicious HTML webpage; generate a feature vector schema for the training set, wherein the feature vector schema corresponds to network assets referenced in the training set, by: parsing the HTML webpage for each domain name of the training set to generate a set of resource identifiers of network assets referenced in the HTML webpages of the training set, wherein parsing a given HTML webpage comprises: extracting resource identifiers of each network asset referenced in the given HTML webpage; and generating the set of resource identifiers based on the extracted resource identifiers of each network asset referenced in the given HTML webpage; and generating the feature vector schema for the training set based on the generated set of resource identifiers of network assets referenced in the HTML webpages, wherein the feature vector schema maps each position in a feature vector of a given HTML webpage to a corresponding resource identifier of the set of resource identifiers; process each training record of the training set, using the feature vector schema, to generate a feature vector corresponding to the HTML webpage for each respective domain name of the training set; train a content analysis model based on inputting, into the content analysis model and for each respective HTML webpage of the training set: the feature vector of the respective HTML webpage; and the corresponding indication of the previous determination as to whether the corresponding HTML webpage corresponds to a malicious HTML webpage; receive a request to perform content analysis on a potentially malicious HTML webpage; generate, based on the request, a feature vector for the potentially malicious HTML webpage, by processing the potentially malicious HTML webpage using the feature vector schema; generate, based on inputting the feature vector for the potentially malicious HTML webpage into the content analysis model, a risk indicator, wherein the risk indicator corresponds to a likelihood that the potentially malicious HTML webpage corresponds to a malicious HTML webpage; cause output of the risk indicator; receive, based on the output of the risk indicator, feedback corresponding to the accuracy of the risk indicator; provide the feature vector for the potentially malicious HTML webpage and the feedback to the content analysis model as a new training record; and update the content analysis model based on the new training record.

2. The computing device of claim 1, wherein processing a given training record comprises: generating the feature vector for the given training record, wherein the feature vector for the given training record comprises one or more binary bits indicating the presence of resource identifiers, of the set of resource identifiers, in the HTML webpage for each respective domain name by: determining, based on the feature vector schema and for each position of the feature vector for the given training record, whether the HTML webpage includes a resource identifier corresponding to the resource identifier mapped to the respective position; and assigning, based on the determining, a binary value to each position of the feature vector for the given training record.

3. The computing device of claim 1, wherein processing the potentially malicious HTML webpage comprises: extracting resource identifiers corresponding to each network asset referenced in the potentially malicious HTML webpage; determining, based on the feature vector schema and for each position of the feature vector for the potentially malicious HTML webpage, whether the potentially malicious HTML webpage includes a resource identifier corresponding to the resource identifier mapped to the respective position; and assigning, based on the determining, a binary value to each position of the feature vector for the potentially malicious HTML webpage.

4. The computing device of claim 1, wherein generating the feature vector schema comprises: determining, by parsing the set of resource identifiers, whether the set of resource identifiers includes one or more duplicate resource identifiers, wherein the one or more duplicate resource identifiers are each identical to a first resource identifier; and based on determining the set of resource identifiers includes one or more duplicate resource identifiers, removing, from the set of resource identifiers, each of the one or more duplicate resource identifiers before mapping each position in the feature vector of the given HTML webpage to the corresponding resource identifier of the set of resource identifiers.

5. The computing device of claim 1, wherein generating the feature vector schema comprises: determining, by parsing the set of resource identifiers, whether the set of resource identifiers includes one or more resource identifiers sharing a same domain name subpart; and based on determining the set of resource identifiers includes one or more resource identifiers sharing the same domain name subpart, mapping, for two given resource identifiers sharing the same domain name subpart, the two given resource identifiers sharing the same domain name subpart to the same position in the feature vector of the given HTML webpage.

6. The computing device of claim 1, wherein generating the feature vector schema comprises: determining, by parsing the set of resource identifiers, whether the set of resource identifiers includes one or more alias resource identifiers, wherein a given alias resource identifier corresponds to a known resource identifier included in the set of resource identifiers; and based on determining the set of resource identifiers includes one or more alias resource identifiers, mapping the given alias resource identifier and the corresponding known resource identifier to the same position in the feature vector of the given HTML webpage.

7. The computing device of claim 1, wherein the receiving the request to perform content analysis is based on monitoring network traffic of a computing device, wherein the monitoring comprises: identifying a list of HTML webpage domain names included in the network traffic; and comparing the list of HTML webpage domain names with a watchlist of potentially malicious domain names.

8. The computing device of claim 1, wherein the receiving the request to perform content analysis is based on determining a given HTML webpage exceeds a risk threshold value, wherein the determining comprises: receiving a set of threat information comprising a plurality of threat records maintained by a cybersecurity application, wherein each threat record comprises: a domain name corresponding to a tracked HTML webpage; and a confidence score associated with the domain name corresponding to the tracked HTML webpage, wherein the confidence score indicates a likelihood that the tracked HTML webpage corresponds to a malicious HTML webpage; receiving an identification of a first HTML webpage; determining, based on comparing a domain name corresponding to the first HTML webpage to the set of threat information, whether or not the domain name corresponding to the first HTML webpage is included in the set of threat information; and determining, based on determining that the domain name correspondi
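The schema refinements in claims 4 to 6 (dropping duplicates, and sharing one vector position between identifiers that have a common domain name subpart or that are aliases of a known identifier) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the helper names `registrable_part`, `build_grouped_schema`, and `vectorize` are hypothetical, the "subpart" is naively taken to be the last two labels of a host name (real code would more likely consult the Public Suffix List), and the alias table is supplied by hand.

```python
def registrable_part(host):
    """Naive 'domain name subpart': the last two labels of a host name.

    Assumption for illustration only; this is wrong for suffixes such as
    co.uk, where a Public Suffix List lookup would be needed.
    """
    return ".".join(host.lower().split(".")[-2:])


def build_grouped_schema(resource_hosts, aliases=None):
    """Map each host to a vector position, merging related hosts.

    - Duplicate hosts collapse because the schema is keyed by host.
    - Hosts sharing the same subpart share one position.
    - An alias is resolved to its known host before grouping.
    """
    aliases = aliases or {}
    positions = {}  # subpart -> vector position
    schema = {}     # host -> vector position
    for host in resource_hosts:
        host = host.lower()
        canonical = aliases.get(host, host)
        key = registrable_part(canonical)
        if key not in positions:
            positions[key] = len(positions)
        schema[host] = positions[key]
    return schema


def vectorize(page_hosts, schema):
    """Binary feature vector with one slot per distinct schema position."""
    vector = [0] * len(set(schema.values()))
    for host in page_hosts:
        position = schema.get(host.lower())
        if position is not None:
            vector[position] = 1
    return vector


# Hypothetical extracted identifiers, including one duplicate and one alias.
hosts = [
    "cdn.example.com",
    "img.example.com",
    "cdn.example.com",
    "tracker.example.net",
    "trk.example.org",
]
aliases = {"trk.example.org": "tracker.example.net"}

grouped = build_grouped_schema(hosts, aliases)
vector = vectorize(["img.example.com"], grouped)
```

In this sketch the five input identifiers collapse to two vector positions, which keeps the feature vector short when many pages reference the same provider under different host names. A host that was not in the training set is ignored even if it shares a subpart with a known host; handling that case would be a further design choice.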

Assignees

Centripetal Networks LLC

Inventors

Classifications

H04L63/1483
service impersonation, e.g. phishing, pharming or web spoofing (detection of rogue wireless access points H04W12/12) · CPC title
H04L63/1425 · Primary
Traffic logging, e.g. anomaly detection · CPC title
H04L63/1416 · Primary
Event detection, e.g. attack signature detection · CPC title

Patent family

Related publications grouped by family.

View patent family 97303858

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2025337763A1 cover?: HyperText Markup Language (HTML) content analysis (HCA) using machine learning is described. A feature vector schema may be generated based on domain names corresponding to HTML webpages and corresponding indications of a status of the HTML webpage. The schema may map each position in a feature vector of a given HTML webpage to a resource identifier. Information may be processed using the schem…
Who is the assignee on this patent?: Centripetal Networks LLC
What technology area does this patent fall under?: Primary CPC classification H04L63/1425. Mapped technology areas include Electricity.
When was this patent published?: Publication date Oct 30, 2025 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).