Statistical modeling of email senders to detect business email compromise
US-2024356969-A1 · Oct 24, 2024 · US
US12586068B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12586068-B2 |
| Application number | US-202418428270-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 31, 2024 |
| Priority date | Jan 31, 2024 |
| Publication date | Mar 24, 2026 |
| Grant date | Mar 24, 2026 |
Official abstract text for this publication.
Systems and methods for detecting email manipulation using machine learning are disclosed. In some embodiments, a disclosed method includes: receiving, from a computing device, an email assessment request regarding an email address associated with a user; generating feature data based on the email address; computing, using at least one machine learning model, manipulation indication data of the email address based on the feature data; and transmitting, in response to the email assessment request, the manipulation indication data of the email address to the computing device.
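As a rough illustration of the "feature data" step, the sketch below derives a few numeric features from a single email address. The abstract does not say which features the disclosed system computes, so every feature name here (`username_length`, `digit_ratio`, `special_char_count`, `char_entropy`, `domain_label_count`) and the function `email_numeric_features` itself are hypothetical choices made for illustration only.

```python
import math
from collections import Counter


def email_numeric_features(email: str) -> dict[str, float]:
    """Illustrative numeric features for one address (hypothetical feature set)."""
    username, sep, domain = email.strip().lower().rpartition("@")
    if not sep:  # no "@" present: treat the whole string as the username
        username, domain = domain, ""
    n = len(username)
    counts = Counter(username)
    # Shannon entropy (in bits) of the username's character distribution.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values()) if n else 0.0
    return {
        "username_length": float(n),
        "digit_ratio": sum(ch.isdigit() for ch in username) / n if n else 0.0,
        "special_char_count": float(sum(not ch.isalnum() for ch in username)),
        "char_entropy": entropy,
        "domain_label_count": float(len(domain.split("."))) if domain else 0.0,
    }


features = email_numeric_features("john.doe99@example.com")
```

A feature dictionary like this could then be passed to whatever model produces the manipulation indication data; the model itself is outside the scope of this sketch.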
Opening claim text (preview).
What is claimed is:
1. A system, comprising: a non-transitory memory having instructions stored thereon; and at least one processor operatively coupled to the non-transitory memory, wherein the instructions, when executed, cause the at least one processor to: receive, from a computing device, an email assessment request regarding an email address associated with a user; generate feature data based on the email address, wherein the feature data comprises numeric features related to the email address; execute at least one machine learning model; input the feature data into the executed at least one machine learning model and generate manipulation indication data of the email address based on the inputted feature data, wherein the manipulation indication data comprises a manipulation type indicator indicating a type of manipulation for the email address, and wherein the at least one machine learning model is trained with labelled features generated from a plurality of email addresses to minimize a difference between outputted labels and labels in the labelled features; and transmit, in response to the email assessment request, the manipulation indication data of the email address to the computing device.
2. The system of claim 1, wherein: the manipulation type indicator indicates the email address is an invalid email in accordance with a determination that the email address has an invalid format; the manipulation type indicator indicates the email address is a tumbling email in accordance with a determination that the email address is one of multiple emails that are associated with the user and sharing a similar pattern; and the manipulation type indicator indicates the email address is a gibberish email in accordance with a determination that the email address has gibberish characters.
3. The system of claim 1, wherein the manipulation indication data is computed based on: processing the email address to generate processed data; performing a syntax check based on the processed data using at least one rule-based logic, wherein the at least one rule-based logic is determined based on an email provider or a domain portion of the email address; and determining, based on the syntax check, whether the email address has a valid format.
4. The system of claim 3, wherein the manipulation indication data is computed further based on: performing a tumbling check on the email address to determine whether the email address is one of multiple tumbling emails associated with the user, in accordance with a determination that the email address has a valid format; performing a gibberish check on the email address to determine whether the email address is a gibberish email containing gibberish characters, in accordance with a determination that the email address has a valid format; performing a label adjustment based on results of the tumbling check and the gibberish check; and generating the manipulation indication data of the email address based on results of the label adjustment and the syntax check.
5. The system of claim 4, wherein performing the tumbling check on the email address comprises: generating a plurality of feature sets based on the email address using natural language processing; applying a plurality of rules on the plurality of feature sets to generate a rule-based label indicating whether the email address is a tumbling email, wherein: each of plurality of rules is applied on a respective one of the plurality of feature sets to compute a respective tumbling score based on a respective threshold, the respective threshold is pre-determined based on a quantile analysis on a feature distribution associated with emails of historical and current users, the rule-based label is generated based on all tumbling scores computed by applying the plurality of rules, the plurality of rules comprise: (a) at least one population-based rule related to the email address and other email addresses, and (b) at least one population-free rule related to the email address alone.
6. The system of claim 5, wherein the at least one machine learning model comprises a deep learning model, and wherein performing the tumbling check on the email address further comprises: generating at least one numeric feature based on the email address; transforming the email address to a gene matrix representing an evolutionary process that leads to the email address, wherein the gene matrix is generated based on an image visualization showing patterns and clusters of historical tumbling emails; inputting the at least one numeric feature and the gene matrix to the deep learning model to generate a model-based label indicating whether the email address is a tumbling email; and determining the email address as a tumbling email in accordance with a determination that at least one of the rule-based label or model-based label indicates the email address is a tumbling email.
7. The system of claim 6, wherein the deep learning model is trained based on: obtaining a plurality of emails associated with a plurality of users; automatically generating, for each respective email address of the plurality of emails, a pseudo-label indicating whether the respective email address is tumbling or not; performing a label adjustment on the generated pseudo-labels based on feedbacks from business unit, customer service and/or expert review to generate training labels; generating gene matrices and numeric features for the plurality of emails, to form labelled training data with training labels; and training the deep learning model based on the labelled training data.
8. The system of claim 4, wherein performing the gibberish check on the email address comprises: determining a vocabulary based on letters, digits and special characters; extracting a username string from the email address; processing the username string to generate a processed username; computing, using a machine learning model, a perplexity score for the processed username based on padding, smoothing, and the vocabulary, wherein the perplexity score indicates a probability that the email address is gibberish; and generating a gibberish label indicating whether the email address is a gibberish email based on the perplexity score.
9. The system of claim 4, wherein performing the label adjustment comprises: obtaining at least one adjustment rule that is determined based on a clustering logic, a popular username logic, and/or predetermined email domain lists; applying the at least one adjustment rule to the result of the tumbling check to confirm or adjust a label indicating whether the email address is a tumbling email; and applying the at least one adjustment rule to the result of the gibberish check to confirm or adjust a label indicating whether the email address is a gibberish email.
10. A computer-implemented method by one or more processors, the method comprising: receiving, by the one or more processors and from a computing device, an email assessment request regarding an email address associated with a user; generating, by the one or more processors, feature data based on the email address, wherein the feature data comprises numeric features related to the email address; executing, by the one or more processors, at least one machine learning model; inputting, by the one or more processors, the feature data into the executed at least one machine learning model, and generating manipulation indication data of the email address based on the feature data, wherein the manipulation indication data comprises a manipulation type indicator indicating a type of manipulation for the email address, and wherein the at least one mac
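The gibberish check in claim 8 (vocabulary, padding, smoothing, perplexity) can be pictured with a small character-level language model. The sketch below is one possible reading, not the patented model: it assumes a bigram model with additive (Laplace) smoothing, and the names `CharBigramModel`, `pad`, `gibberish_label`, the toy training usernames, and the threshold of 30 are all hypothetical.

```python
import math
import string
from collections import Counter

# Vocabulary: letters, digits and a few special characters, plus padding symbols.
START, END = "<s>", "</s>"
CHARSET = set(string.ascii_lowercase + string.digits + "._-+")
VOCAB_SIZE = len(CHARSET) + 2  # characters plus the two padding symbols


def pad(username: str) -> list[str]:
    """Lowercase, drop out-of-vocabulary characters, and add start/end padding."""
    return [START] + [ch for ch in username.lower() if ch in CHARSET] + [END]


class CharBigramModel:
    def __init__(self, usernames, alpha: float = 1.0):
        self.alpha = alpha
        self.bigrams = Counter()
        self.contexts = Counter()
        for name in usernames:
            seq = pad(name)
            for prev, cur in zip(seq, seq[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def prob(self, prev: str, cur: str) -> float:
        # Additive smoothing keeps unseen bigrams at a non-zero probability.
        return (self.bigrams[(prev, cur)] + self.alpha) / (
            self.contexts[prev] + self.alpha * VOCAB_SIZE
        )

    def perplexity(self, username: str) -> float:
        seq = pad(username)
        log_prob = sum(math.log(self.prob(p, c)) for p, c in zip(seq, seq[1:]))
        return math.exp(-log_prob / (len(seq) - 1))


def gibberish_label(email: str, model: CharBigramModel, threshold: float) -> bool:
    """True when the username's perplexity exceeds the (assumed) threshold."""
    username = email.rpartition("@")[0]
    return model.perplexity(username) > threshold


# Toy training data; a real model would be fit on a large corpus of usernames.
model = CharBigramModel(
    ["john.smith", "mary.jones", "james.brown", "anna.miller", "john.miller", "mary.smith"]
)
is_gibberish = gibberish_label("xqzvkjwq@example.com", model, threshold=30.0)
```

Higher perplexity means the character sequence is less predictable under the model, which is why a threshold on perplexity can serve as a gibberish label. In practice the threshold would have to be tuned on labelled data.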
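Claim 5 describes rule-based tumbling detection in which each rule compares a feature against a threshold taken from a quantile of a historical feature distribution, with one population-free rule and one population-based rule. The sketch below shows one way such rules might look. The two features (digit count and number of sibling addresses sharing a normalized username), the nearest-rank `quantile` helper, the 90th-percentile choice, the sample histories, and the "any rule fires" combination are all assumptions, not details taken from the claims.

```python
import re


def quantile(values, pct: int):
    """Nearest-rank percentile of a non-empty sample (pct is an integer 1..100)."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]


def normalize(username: str) -> str:
    # Drop a "+tag" suffix, then dots and digits, so variants of one base name collide.
    base = username.lower().split("+", 1)[0]
    return re.sub(r"[.\d]", "", base)


def tumbling_label(email, user_emails, digit_threshold, sibling_threshold, min_score=1):
    username = email.rpartition("@")[0]
    # Population-free rule: depends on this address alone.
    digit_score = int(sum(ch.isdigit() for ch in username) > digit_threshold)
    # Population-based rule: compares this address with the user's other addresses.
    key = normalize(username)
    siblings = sum(
        1
        for other in user_emails
        if other != email and normalize(other.rpartition("@")[0]) == key
    )
    sibling_score = int(siblings > sibling_threshold)
    # Assumed combination: label as tumbling when enough rules fire.
    return digit_score + sibling_score >= min_score


# Hypothetical historical feature distributions used to fix the thresholds.
digit_threshold = quantile([0, 0, 0, 1, 1, 2, 2, 2, 3, 4], 90)
sibling_threshold = quantile([0, 0, 0, 0, 0, 0, 1, 1, 1, 2], 90)

emails = [
    "jane.doe@example.com",
    "j.a.n.e.doe@example.com",
    "janedoe+1@example.com",
    "janedoe2@example.com",
]
flagged = tumbling_label("janedoe2@example.com", emails, digit_threshold, sibling_threshold)
```

Deriving thresholds from quantiles rather than fixed constants lets each rule adapt to how the feature is actually distributed across users, which appears to be the motivation for the quantile analysis in the claim.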
Computer-aided management of electronic mailing [e-mailing] · CPC title
Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars · CPC title
Transaction verification · CPC title
Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists · CPC title