What technology area does this patent fall under?

Primary CPC classification H04L63/1425. Mapped technology areas include Electricity.

When was this patent published?

Publication date: Dec 13, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Systems and methods for machine learning-based digital content clustering, digital content threat detection, and digital content threat remediation in machine learning-based digital threat mitigation platform

US11528290B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11528290-B2
Application number	US-202217714986-A
Country	US
Kind code	B2
Filing date	Apr 6, 2022
Priority date	Mar 4, 2020
Publication date	Dec 13, 2022
Grant date	Dec 13, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A machine learning-based system and method for content clustering and content threat assessment includes generating embedding values for each piece of content of corpora of content data; implementing unsupervised machine learning models that: receive model input comprising the embeddings values of each piece of content of the corpora of content data; and predict distinct clusters of content data based on the embeddings values of the corpora of content data; assessing the distinct clusters of content data; associating metadata with each piece of content defining a member in each of the distinct clusters of content data based on the assessment, wherein the associating the metadata includes attributing to each piece of content within the clusters of content data a classification label of one of digital abuse/digital fraud and not digital abuse/digital fraud; and identifying members or content clusters having digital fraud/digital abuse based on querying the distinct clusters of content data.
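The flow described in the abstract (embed each piece of content, cluster the embeddings without supervision, assess the clusters, attach a label to every member, then query for flagged clusters) can be illustrated with a small Python sketch. This is not the patented implementation. The `embed` function is a toy character-trigram stand-in for a real embedding model, the greedy `cluster` routine stands in for the unspecified unsupervised model, and the `assess` rule (flag any cluster with at least two near-duplicate members) is a hypothetical placeholder for an assessment step the abstract does not detail. All function names, the sample texts, and the thresholds are assumptions.

```python
import math
from collections import Counter

ABUSE = "digital abuse/digital fraud"
NOT_ABUSE = "not digital abuse/digital fraud"

def embed(text, n=3):
    """Toy embedding (assumption): L2-normalised character n-gram counts."""
    s = f" {text.lower()} "
    counts = Counter(s[i:i + n] for i in range(len(s) - n + 1))
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {gram: v / norm for gram, v in counts.items()}

def cosine(a, b):
    """Cosine similarity of two already-normalised sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(gram, 0.0) for gram, v in a.items())

def cluster(embeddings, threshold=0.5):
    """Greedy grouping (assumption): join the first cluster whose first
    member is at least `threshold` similar, otherwise start a new cluster."""
    clusters = []
    for i, vec in enumerate(embeddings):
        for members in clusters:
            if cosine(vec, embeddings[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters

def assess(members, min_size=2):
    """Hypothetical assessment rule: repeated near-duplicates look abusive."""
    return ABUSE if len(members) >= min_size else NOT_ABUSE

corpus = [
    "Win a free iPhone now, click here",
    "Win a free iPhone today, click here",
    "Win a free iPhone now!! click here",
    "Lovely apartment, close to the beach",
    "Great host and a very clean room",
]

embeddings = [embed(text) for text in corpus]
clusters = cluster(embeddings)

# Attach cluster membership and a classification label to every piece of content.
metadata = {}
for cluster_id, members in enumerate(clusters):
    label = assess(members)
    for i in members:
        metadata[i] = {"cluster": cluster_id, "label": label}

def query(label):
    """Return the ids of clusters whose members carry the given label."""
    return sorted({m["cluster"] for m in metadata.values() if m["label"] == label})

flagged = query(ABUSE)
for cluster_id in flagged:
    print(cluster_id, [corpus[i] for i in clusters[cluster_id]])
```

In a real system the size-based rule would likely be replaced by human review or a trained classifier; it is used here only so the labelling and querying steps have something to act on.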

First claim

Opening claim text (preview).

What is claimed:

1. A machine learning-based method for detecting fraudulent spam and identifying a fraud threat mitigation response, the method comprising:
creating a spam corpus that includes spam data samples, wherein each spam data sample of the spam corpus comprises fraudulent text;
creating a spam embeddings corpus of a plurality of fraudulent sentence embeddings based on converting the spam corpus, wherein creating the spam embeddings corpus includes:
(a-i) implementing a machine learning-based transformer model that converts each of the spam data samples to a distinct spam numerical vector representation;
(a-ii) defining a spam vector corpus for a plurality of distinct spam numerical vector representations based on the conversion of the spam data samples;
defining, using a clustering algorithm, a plurality of distinct spam clusters based on the spam vector corpus, wherein defining the plurality of distinct spam clusters includes:
(b-i) setting a clustering similarity parameter that informs a clustering density of an unsupervised machine learning-based clustering model, wherein the clustering similarity parameter governs a size of the plurality of distinct spam clusters;
(b-ii) implementing the unsupervised machine learning-based clustering model that creates the plurality of distinct spam clusters by grouping distinct subsets of the plurality of distinct spam numerical vector representations of the spam vector corpus, wherein each of the plurality of distinct spam clusters includes a distinct centroid;
creating a searchable index of the plurality of distinct spam clusters;
implementing a web-based spam threat interface that:
(c-i) receives input of a target spam data item comprising one or more potentially fraudulent sentences, wherein the target spam data item is converted to a target spam numerical vector value using the machine learning-based transformer model;
(c-ii) performs a search of the searchable index of the plurality of distinct spam clusters based on the target spam numerical vector, wherein performing the search of the searchable index includes pairing the target spam numerical vector value to one or more distinct spam clusters of the plurality of distinct spam clusters that include one or more distinct spam numerical vector representations associated with character substitutions to the target spam data item; and
identifying a fraud threat mitigation response including blocking the target spam data item based on results of the search of the searchable index.

2. The method according to claim 1, wherein performing the search of the searchable index further includes pairing the target spam numerical vector value to one or more distinct spam clusters of the plurality of distinct spam clusters that includes one or more distinct spam numerical vector representations equivalent to the target spam numerical vector value.

3. The method according to claim 1, further comprising displaying, on the web-based spam threat interface, a spam cluster-to-user network map for one of the one or more distinct spam clusters, wherein the spam cluster-to-user network map includes: (a) a textual summary of the one of the one or more distinct spam clusters; (b) a plurality of representations of user accounts associated with the one of the one or more distinct spam clusters; and (c) a plurality of graphical edges, wherein each graphical edge of the plurality of graphical edges extends in a direction from a distinct representation of a user account of the plurality of representations of user accounts to the textual summary.

4. The method according to claim 3, further comprising mitigating, via executing one or more digital threat mitigation actions, a plurality of user accounts associated with the plurality of representations of user accounts that prevents the plurality of user accounts from performing at least one type of digital event.

5. The method according to claim 1, wherein the fraud threat mitigation response further includes implementing an automated decisioning workflow that automatically blocks future content data from publishing on an online resource if the future content data is identical or semantically similar to the target spam data item.

6. The method according to claim 1, wherein the one or more potentially fraudulent sentences of the target spam data item relates to text data, communication data, or media data that is posted to a web or Internet-accessible medium, platform, service, system, or channel.

7. The method according to claim 1, wherein defining the plurality of distinct spam clusters further includes (b-iii) attributing to each of the plurality of distinct spam clusters a classification label indicating digital abuse.

8. The method according to claim 7, further includes in response to performing the search of the searchable index: identifying one or more distinct spam clusters of the plurality of distinct spam clusters comprising at least one distinct spam numerical vector representation that is equivalent or a near-equivalent to the target spam numerical vector value of the target spam data item; and displaying, on the web-based spam threat interface, the one or more distinct spam clusters.

9. The method according to claim 1, wherein the one or more potentially fraudulent sentences of the target spam data item relates to text content observed from an online post.

10. The method according to claim 1, wherein each distinct spam cluster of the plurality of distinct spam clusters generated by the unsupervised machine learning-based clustering model corresponds to a distinct one of a plurality of distinct type of spam content.

11. The method according to claim 1, wherein each distinct spam numerical vector representation, computed by the machine learning-based transformer model, is a representation of one of the spam data samples in a numerical form.

12. The method according to claim 1, wherein setting the clustering similarity parameter includes setting the clustering similarity parameter to a state that permits the unsupervised machine learning-based clustering model, when implemented, to create distinct spam clusters comprising identical spam numerical vector representations and non-identical spam numerical vector representations substantially similar to the identical spam numerical vector representations.

13. A machine learning-based method for detecting fraudulent spam and identifying a fraud threat mitigation response, the method comprising:
implementing an unsupervised machine learning-based clustering model that predicts a plurality of distinct spam clusters based on a plurality of spam numerical vector representations;
creating a searchable index of the plurality of distinct spam clusters, wherein the searchable index includes a searchable representation for each distinct spam cluster of the plurality of distinct spam clusters;
implementing a web-based spam threat interface that:
(i) receives input of a target spam data item comprising one or more potentially fraudulent sentences;
(ii) initiates a search of the searchable index of the plurality of distinct spam clusters based on an embedded representation of the target spam data item;
identifying a fraud threat mitigation response that includes blocking the target spam data item based on results of the search returning at least one distinct spam cluster of the plurality of distinct spam clusters corresponding to digital fraud or digital abuse;
displaying, on the web-based spam threat interface, a spam cluster-to-user network map for the at least one distinct spam cluster corresponding to digital fraud or digital abus
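As a rough illustration of how the steps of claim 1 fit together (vectorise spam samples, cluster them under a similarity parameter so each cluster has a centroid, index the clusters, search the index with a target item, and derive a blocking response), here is a minimal Python sketch. It is not the claimed implementation: `to_vector` uses character bigram counts as a stand-in for the transformer model, the incremental centroid clustering in `build_index` is one simple way to honour a similarity threshold, and the "index" is a plain list scanned linearly. Every name, sample string, and the 0.6 threshold are assumptions chosen for the example.

```python
import math
from collections import Counter

def to_vector(text, n=2):
    """Stand-in for the transformer model (assumption): character n-gram counts."""
    s = f" {text.lower()} "
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Mean of the member vectors of one cluster."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {k: c / len(vectors) for k, c in total.items()}

def build_index(samples, similarity=0.6):
    """Group samples into clusters; `similarity` plays the role of the
    clustering similarity parameter (higher means tighter, smaller clusters)."""
    clusters = []
    for text in samples:
        vec = to_vector(text)
        best = max(clusters, key=lambda c: cosine(vec, c["centroid"]), default=None)
        if best is not None and cosine(vec, best["centroid"]) >= similarity:
            best["members"].append(text)
            best["vectors"].append(vec)
            best["centroid"] = centroid(best["vectors"])
        else:
            clusters.append({"members": [text], "vectors": [vec], "centroid": dict(vec)})
    return clusters

def search(index, target, similarity=0.6):
    """Return ids of clusters whose centroid is close to the target vector."""
    vec = to_vector(target)
    return [i for i, c in enumerate(index) if cosine(vec, c["centroid"]) >= similarity]

def mitigation_response(index, target, similarity=0.6):
    """Block the target when it pairs with at least one spam cluster."""
    return "block" if search(index, target, similarity) else "allow"

spam_corpus = [
    "free crypto giveaway, send 1 BTC get 2 back",
    "free crypto giveaway!! send 1 BTC and get 2 back",
    "cheap meds online no prescription needed",
    "cheap meds online, no prescription required",
]
index = build_index(spam_corpus)

# A target that disguises known spam with character substitutions.
target = "fr3e crypt0 g1veaway, send 1 BTC get 2 back"
print(search(index, target), mitigation_response(index, target))
print(mitigation_response(index, "see you at the meeting tomorrow"))
```

Character n-grams tolerate a few substituted characters because most neighbouring n-grams survive the substitution; a learned sentence embedding would aim for the same robustness by different means. A production index would likely use an approximate nearest-neighbour structure rather than a linear scan.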

Assignees

Sift Science Inc

Inventors

Classifications

G06N3/045
Combinations of networks · CPC title
G06F16/285
Clustering or classification · CPC title
G06F16/217
Database tuning (G06F16/2282 takes precedence; database performance monitoring G06F11/3409) · CPC title
G06N20/00
Machine learning · CPC title
H04L63/1425 (Primary)
Traffic logging, e.g. anomaly detection · CPC title

Patent family

Related publications grouped by family.

View patent family 77556302

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11528290B2 cover?: A machine learning-based system and method for content clustering and content threat assessment includes generating embedding values for each piece of content of corpora of content data; implementing unsupervised machine learning models that: receive model input comprising the embeddings values of each piece of content of the corpora of content data; and predict distinct clusters of content dat…
Who is the assignee on this patent?: Sift Science Inc

Related patents

Methods and arrangements to distribute a fraud detection model

Threat mitigation system and method

Data clean-up method for improving predictive model training

Systems, methods, and apparatuses for implementing machine learning models for smart contracts using distributed ledger technologies in a cloud based computing environment

Systems and methods for detection of infected websites
