Identifying patterns in large quantities of collected emails

US12488037B2 · US · B2

Patent metadata
Publication number: US-12488037-B2
Application number: US-202318326919-A
Country: US
Kind code: B2
Filing date: May 31, 2023
Priority date: May 31, 2023
Publication date: Dec 2, 2025
Grant date: Dec 2, 2025

Abstract

Official abstract text for this publication.

A system and method of detecting malicious activity in emails using pattern recognition. The method includes maintaining a plurality of associations between a plurality of emails and a plurality of multi-dimensional (MD) vectors of the plurality of emails. Each association is between a respective email of the plurality of emails and a respective MD vector of the plurality of MD vectors that corresponds to the respective email. The method includes identifying, based on one or more keywords, a set of MD vectors of the plurality of MD vectors. The method includes selecting, based on the plurality of associations, a set of emails associated with the set of MD vectors. The method includes generating, by a processing device, based on the set of emails or the set of MD vectors, a set of clusters to represent patterns in the set of emails.
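The maintain, identify, and select steps in the abstract can be illustrated with a minimal sketch. It assumes a toy hashed bag-of-words function in place of a trained NLP model, and a brute-force cosine ranking in place of an approximate near-neighbor index. The names `embed`, `associations`, `select_emails`, `DIM`, and the sample emails are hypothetical and do not come from the patent.

```python
import math
import zlib

DIM = 1024  # assumed vector size, chosen only for this sketch


def embed(text):
    """Toy stand-in for an NLP embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in hash() for str
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a, b):
    """Dot product of two vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical sample data
emails = {
    "e1": "urgent invoice payment overdue click link",
    "e2": "team lunch on friday at noon",
    "e3": "invoice attached please confirm payment today",
}

# Associations: each email id maps to the MD vector computed for that email.
associations = {eid: embed(body) for eid, body in emails.items()}


def select_emails(keywords, top_k=2):
    """Identify the vectors closest to the keywords, then return the associated email ids."""
    query = embed(" ".join(keywords))
    ranked = sorted(
        associations,
        key=lambda eid: cosine(associations[eid], query),
        reverse=True,
    )
    return ranked[:top_k]


selected = select_emails(["invoice", "payment"])
print(selected)
```

A production system would likely swap the linear scan for an approximate near-neighbor index once the number of stored vectors grows; the scan is used here only to keep the example self-contained.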

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving a request for one or more emails associated with one or more keywords; maintaining a plurality of associations between a plurality of emails and a plurality of multi-dimensional (MD) vectors of the plurality of emails, each association is between a respective email of the plurality of emails and a respective MD vector of the plurality of MD vectors that corresponds to the respective email; identifying, based on the one or more keywords, a set of MD vectors of the plurality of MD vectors; selecting, based on the plurality of associations, a set of emails associated with the set of MD vectors; extracting, from the set of emails, a first portion of the set of emails and a second portion of the set of emails; generating, by a processing device, a set of clusters to represent patterns in the set of emails by performing a MinHash function on the first portion of the set of emails to generate a first set of numerical signatures, performing the MinHash function on the second portion of the set of emails to generate a second set of numerical signatures, selecting a first minimum numerical signature from the first set of numerical signatures, and selecting a second minimum numerical signature from the first set of numerical signatures; grouping the first minimum numerical signature into a first group based on a first similarity and the second minimum numerical signature into a second group based on a second similarity; generating training data by extracting a plurality of keywords from labeled data associated with a different set of emails and vectorizing the plurality of keywords by determining a frequency in which the plurality of keywords appear in the different set of emails; providing the plurality of emails to a classifier model trained with the training data to generate a plurality of maliciousness scores indicating likelihoods of the plurality of emails including malicious content; and updating the plurality of associations between the plurality of emails and the plurality of MD vectors of the plurality of emails to further be associated with the plurality of maliciousness scores.

2. The method of claim 1, wherein identifying, based on the one or more keywords, the set of MD vectors of the plurality of MD vectors is responsive to: determining an expiration of a counter.

3. The method of claim 1, further comprising: providing the plurality of emails to a natural language processing (NLP) model trained to vectorize one or more portions of each email of the plurality of emails into a corresponding MD vector of the plurality of MD vectors; and generating, using the NLP model, the plurality of MD vectors.

4. The method of claim 3, further comprising: training the NLP model to vectorize at least one of a subject of the email or a body of the email into the MD vector.

5. The method of claim 1, wherein identifying, based on the one or more keywords, the set of MD vectors of the plurality of MD vectors further comprises: performing an approximate near-neighbor search of the set of MD vectors of the plurality of MD vectors for semantic matches between content of the plurality of MD vectors and the one or more keywords.

6. The method of claim 1, further comprising: causing the set of clusters to appear on a display.

7. A system comprising: a memory; and a processing device, operatively coupled to the memory, to: receive a request for one or more emails associated with one or more keywords; maintain a plurality of associations between a plurality of emails and a plurality of multi-dimensional (MD) vectors of the plurality of emails, each association is between a respective email of the plurality of emails and a respective MD vector of the plurality of MD vectors that corresponds to the respective email; identify, based on the one or more keywords, a set of MD vectors of the plurality of MD vectors; select, based on the plurality of associations, a set of emails associated with the set of MD vectors; extract, from the set of emails, a first portion of the set of emails and a second portion of the set of emails; generate a set of clusters to represent patterns in the set of emails by performing a MinHash function on the first portion of the set of emails to generate a first set of numerical signatures and the second portion of the set of emails to generate a second set of numerical signatures, and selecting a first minimum numerical signature from the first set of numerical signatures and a second minimum numerical signature from the first set of numerical signatures; group the first minimum numerical signature into a first group based on a first similarity and the second minimum numerical signature into a second group based on a second similarity; generate training data by extracting a plurality of keywords from labeled data associated with a different set of emails and vectorizing the plurality of keywords by determining a frequency in which the plurality of keywords appear in the different set of emails; provide the plurality of emails to a classifier model trained with the training data to generate a plurality of maliciousness scores indicating likelihoods of the plurality of emails including malicious content; and update the plurality of associations between the plurality of emails and the plurality of MD vectors of the plurality of emails to further be associated with the plurality of maliciousness scores.

8. The system of claim 7, wherein the processing device to: determine an expiration of a counter.

9. The system of claim 7, wherein the processing device to: provide the plurality of emails to a natural language processing (NLP) model trained to vectorize one or more portions of the email of the plurality of emails into a corresponding MD vector of the plurality of MD vectors; and generate, using the NLP model, the plurality of MD vectors.

10. The system of claim 9, wherein the processing device to: train the NLP model to vectorize at least one of a subject of the email or a body of the email into the MD vector.

11. The system of claim 7, wherein to identify, based on the one or more keywords, the set of MD vectors of the plurality of MD vectors, the processing device to: perform an approximate near-neighbor search of the set of MD vectors of the plurality of MD vectors for semantic matches between content of the plurality of MD vectors and the one or more keywords.

12. A non-transitory computer-readable medium storing instructions that, when execute by a processing device, cause the processing device to: receive a request for one or more emails associated with one or more keywords; maintain a plurality of associations between a plurality of emails and a plurality of multi-dimensional (MD) vectors of the plurality of emails, each association is between a respective email of the plurality of emails and a respective MD vector of the plurality of MD vectors that corresponds to the respective email; identify, based on the one or more keywords, a set of MD vectors of the plurality of MD vectors; select, based on the plurality of associations, a set of emails associated with the set of MD vectors; extract, from the set of emails, a first portion of the set of emails and a second portion of the set of emails; generate, by the processing device, based on the set of emails or the set of MD vectors, a set of clusters to represent patterns in the set of emails by performing a MinHash function on the first portion of the set of emails to generate a first set of numerical signatures and the second portion of the set of emails to generate a second
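For readers unfamiliar with the MinHash step named in the independent claims, the following is a minimal sketch of the textbook technique, not necessarily the procedure the patent implements. It assumes word-bigram shingles, salted BLAKE2b hashes as the hash family, and a greedy threshold grouping. The names `shingles`, `minhash_signature`, `group_by_similarity`, `NUM_HASHES`, the threshold of 0.5, and the sample subjects are all hypothetical.

```python
import hashlib

NUM_HASHES = 128  # assumed signature length


def shingles(text, k=2):
    """Split text into overlapping k-word shingles."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def _hash(seed, item):
    """One member of the hash family, selected by seed (used as the BLAKE2b salt)."""
    digest = hashlib.blake2b(
        item.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(text):
    """For each hash function, keep the minimum hash value over all shingles."""
    items = shingles(text)
    return [min(_hash(seed, s) for s in items) for seed in range(NUM_HASHES)]


def similarity(sig_a, sig_b):
    """Fraction of matching positions, an estimate of Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def group_by_similarity(signatures, threshold=0.5):
    """Greedy grouping: join the first group whose representative is similar enough."""
    groups = []  # each group is a list of ids; the first member is the representative
    for eid, sig in signatures.items():
        for group in groups:
            if similarity(signatures[group[0]], sig) >= threshold:
                group.append(eid)
                break
        else:
            groups.append([eid])
    return groups


# Hypothetical sample data: two near-duplicate subjects and one unrelated subject
subjects = {
    "s1": "your invoice is overdue please pay now using the secure link below ref 1041",
    "s2": "your invoice is overdue please pay now using the secure link below ref 2207",
    "s3": "team lunch on friday at noon in the cafe",
}

signatures = {eid: minhash_signature(text) for eid, text in subjects.items()}
groups = group_by_similarity(signatures)
print(groups)
```

The two near-duplicate subjects share 12 of 14 distinct bigrams, so their estimated similarity sits well above the assumed threshold, while the unrelated subject shares none and starts its own group.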

Assignees

Crowdstrike Inc

Inventors

Classifications

  • Natural language query formulation · CPC title

  • Selection or weighting of terms for indexing · CPC title

  • Hash tables · CPC title

  • G06F16/35 · Primary

    Clustering; Classification · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12488037B2 cover?
A system and method of detecting malicious activity in emails using pattern recognition. The method includes maintaining a plurality of associations between a plurality of emails and a plurality of multi-dimensional (MD) vectors of the plurality of emails. Each association is between a respective email of the plurality of emails and a respective MD vector of the plurality of MD vectors that cor…
Who is the assignee on this patent?
Crowdstrike Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/35. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 2, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).