Updating window representations of sliding window of text using rolling scheme
US-2024273125-A1 · Aug 15, 2024 · US
US12488037B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12488037-B2 |
| Application number | US-202318326919-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 31, 2023 |
| Priority date | May 31, 2023 |
| Publication date | Dec 2, 2025 |
| Grant date | Dec 2, 2025 |
Official abstract text for this publication.
A system and method of detecting malicious activity in emails using pattern recognition. The method includes maintaining a plurality of associations between a plurality of emails and a plurality of multi-dimensional (MD) vectors of the plurality of emails. Each association is between a respective email of the plurality of emails and a respective MD vector of the plurality of MD vectors that corresponds to the respective email. The method includes identifying, based on one or more keywords, a set of MD vectors of the plurality of MD vectors. The method includes selecting, based on the plurality of associations, a set of emails associated with the set of MD vectors. The method includes generating, by a processing device, based on the set of emails or the set of MD vectors, a set of clusters to represent patterns in the set of emails.
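As an illustration of the association and keyword-lookup steps described in the abstract, the following Python sketch keeps a mapping from email identifiers to vectors and ranks emails by cosine similarity to a keyword query. It is not the patented implementation. The `EmailIndex` class, the `embed` function (a sparse bag-of-words stand-in for a trained NLP model), and the exhaustive search (standing in for an approximate near-neighbor search) are all assumptions made for the example.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical sparse "MD vector": one dimension per token.
Vector = Dict[str, float]


def embed(text: str) -> Vector:
    """Toy stand-in for an NLP model: L2-normalised token counts."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two already-normalised sparse vectors."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


class EmailIndex:
    """Maintains the association between each email and its vector."""

    def __init__(self) -> None:
        self.emails: Dict[str, str] = {}
        self.vectors: Dict[str, Vector] = {}

    def add(self, email_id: str, text: str) -> None:
        self.emails[email_id] = text
        self.vectors[email_id] = embed(text)

    def search(self, keywords: List[str], top_k: int = 2) -> List[Tuple[str, str]]:
        """Identify vectors close to the keywords, then select their emails."""
        query = embed(" ".join(keywords))
        scored = [(cosine(query, vec), eid) for eid, vec in self.vectors.items()]
        scored = [(s, eid) for s, eid in scored if s > 0.0]  # drop non-matches
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [(eid, self.emails[eid]) for _, eid in scored[:top_k]]
```

Calling `search(["invoice", "payment"])` on an index holding a few emails would return the identifiers and text of the closest matches. A production system would presumably swap the exhaustive scan for an approximate index, which this sketch does not attempt.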
Opening claim text (preview).
What is claimed is:

1. A method comprising:
receiving a request for one or more emails associated with one or more keywords;
maintaining a plurality of associations between a plurality of emails and a plurality of multi-dimensional (MD) vectors of the plurality of emails, each association is between a respective email of the plurality of emails and a respective MD vector of the plurality of MD vectors that corresponds to the respective email;
identifying, based on the one or more keywords, a set of MD vectors of the plurality of MD vectors;
selecting, based on the plurality of associations, a set of emails associated with the set of MD vectors;
extracting, from the set of emails, a first portion of the set of emails and a second portion of the set of emails;
generating, by a processing device, a set of clusters to represent patterns in the set of emails by performing a MinHash function on the first portion of the set of emails to generate a first set of numerical signatures, performing the MinHash function on the second portion of the set of emails to generate a second set of numerical signatures, selecting a first minimum numerical signature from the first set of numerical signatures, and selecting a second minimum numerical signature from the first set of numerical signatures;
grouping the first minimum numerical signature into a first group based on a first similarity and the second minimum numerical signature into a second group based on a second similarity;
generating training data by extracting a plurality of keywords from labeled data associated with a different set of emails and vectorizing the plurality of keywords by determining a frequency in which the plurality of keywords appear in the different set of emails;
providing the plurality of emails to a classifier model trained with the training data to generate a plurality of maliciousness scores indicating likelihoods of the plurality of emails including malicious content; and
updating the plurality of associations between the plurality of emails and the plurality of MD vectors of the plurality of emails to further be associated with the plurality of maliciousness scores.

2. The method of claim 1, wherein identifying, based on the one or more keywords, the set of MD vectors of the plurality of MD vectors is responsive to: determining an expiration of a counter.

3. The method of claim 1, further comprising: providing the plurality of emails to a natural language processing (NLP) model trained to vectorize one or more portions of each email of the plurality of emails into a corresponding MD vector of the plurality of MD vectors; and generating, using the NLP model, the plurality of MD vectors.

4. The method of claim 3, further comprising: training the NLP model to vectorize at least one of a subject of the email or a body of the email into the MD vector.

5. The method of claim 1, wherein identifying, based on the one or more keywords, the set of MD vectors of the plurality of MD vectors further comprises: performing an approximate near-neighbor search of the set of MD vectors of the plurality of MD vectors for semantic matches between content of the plurality of MD vectors and the one or more keywords.

6. The method of claim 1, further comprising: causing the set of clusters to appear on a display.

7. A system comprising:
a memory; and
a processing device, operatively coupled to the memory, to:
receive a request for one or more emails associated with one or more keywords;
maintain a plurality of associations between a plurality of emails and a plurality of multi-dimensional (MD) vectors of the plurality of emails, each association is between a respective email of the plurality of emails and a respective MD vector of the plurality of MD vectors that corresponds to the respective email;
identify, based on the one or more keywords, a set of MD vectors of the plurality of MD vectors;
select, based on the plurality of associations, a set of emails associated with the set of MD vectors;
extract, from the set of emails, a first portion of the set of emails and a second portion of the set of emails;
generate a set of clusters to represent patterns in the set of emails by performing a MinHash function on the first portion of the set of emails to generate a first set of numerical signatures and the second portion of the set of emails to generate a second set of numerical signatures, and selecting a first minimum numerical signature from the first set of numerical signatures and a second minimum numerical signature from the first set of numerical signatures;
group the first minimum numerical signature into a first group based on a first similarity and the second minimum numerical signature into a second group based on a second similarity;
generate training data by extracting a plurality of keywords from labeled data associated with a different set of emails and vectorizing the plurality of keywords by determining a frequency in which the plurality of keywords appear in the different set of emails;
provide the plurality of emails to a classifier model trained with the training data to generate a plurality of maliciousness scores indicating likelihoods of the plurality of emails including malicious content; and
update the plurality of associations between the plurality of emails and the plurality of MD vectors of the plurality of emails to further be associated with the plurality of maliciousness scores.

8. The system of claim 7, wherein the processing device to: determine an expiration of a counter.

9. The system of claim 7, wherein the processing device to: provide the plurality of emails to a natural language processing (NLP) model trained to vectorize one or more portions of the email of the plurality of emails into a corresponding MD vector of the plurality of MD vectors; and generate, using the NLP model, the plurality of MD vectors.

10. The system of claim 9, wherein the processing device to: train the NLP model to vectorize at least one of a subject of the email or a body of the email into the MD vector.

11. The system of claim 7, wherein to identify, based on the one or more keywords, the set of MD vectors of the plurality of MD vectors, the processing device to: perform an approximate near-neighbor search of the set of MD vectors of the plurality of MD vectors for semantic matches between content of the plurality of MD vectors and the one or more keywords.

12. A non-transitory computer-readable medium storing instructions that, when execute by a processing device, cause the processing device to:
receive a request for one or more emails associated with one or more keywords;
maintain a plurality of associations between a plurality of emails and a plurality of multi-dimensional (MD) vectors of the plurality of emails, each association is between a respective email of the plurality of emails and a respective MD vector of the plurality of MD vectors that corresponds to the respective email;
identify, based on the one or more keywords, a set of MD vectors of the plurality of MD vectors;
select, based on the plurality of associations, a set of emails associated with the set of MD vectors;
extract, from the set of emails, a first portion of the set of emails and a second portion of the set of emails;
generate, by the processing device, based on the set of emails or the set of MD vectors, a set of clusters to represent patterns in the set of emails by performing a MinHash function on the first portion of the set of emails to generate a first set of numerical signatures and the second portion of the set of emails to generate a second
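
The MinHash clustering step recited in the claims can be pictured with a small Python sketch. This is one common reading of MinHash, not the method as implemented by the applicant: the word-bigram shingling, the signature length `NUM_HASHES`, the salted BLAKE2b hash family, the 0.5 threshold, and the greedy grouping in `group` are all assumptions chosen for illustration. The same functions could be run once per extracted portion of the emails (for instance subjects and bodies, itself an assumption about what the "portions" are).

```python
import hashlib
from typing import Dict, List, Set

NUM_HASHES = 64  # hypothetical signature length


def shingles(text: str, k: int = 2) -> Set[str]:
    """Split text into overlapping k-word shingles (assumed tokenisation)."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed: int, item: str) -> int:
    """One member of a seeded hash family, built from salted BLAKE2b."""
    digest = hashlib.blake2b(
        item.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash(text: str, num_hashes: int = NUM_HASHES) -> List[int]:
    """Signature = the minimum hash value under each seeded hash function."""
    items = shingles(text) or {""}  # empty text gets a fixed placeholder shingle
    return [min(_hash(seed, s) for s in items) for seed in range(num_hashes)]


def similarity(a: List[int], b: List[int]) -> float:
    """Fraction of matching positions, an estimate of Jaccard similarity."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def group(signatures: Dict[str, List[int]], threshold: float = 0.5) -> List[List[str]]:
    """Greedy grouping: join the first group whose representative is similar."""
    groups: List[List[str]] = []
    for email_id, sig in signatures.items():
        for members in groups:
            if similarity(sig, signatures[members[0]]) >= threshold:
                members.append(email_id)
                break
        else:
            groups.append([email_id])
    return groups
```

Emails with identical text in the hashed portion produce identical signatures and fall into the same group, while emails that share no shingles almost surely land in separate groups. Near-duplicates are grouped or split depending on the threshold, which would need tuning on real data.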
Natural language query formulation · CPC title
Selection or weighting of terms for indexing · CPC title
Hash tables · CPC title
Clustering; Classification · CPC title