Methods and arrangements to distribute a fraud detection model
US-2020065816-A1 · Feb 27, 2020 · US
US11528290B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11528290-B2 |
| Application number | US-202217714986-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 6, 2022 |
| Priority date | Mar 4, 2020 |
| Publication date | Dec 13, 2022 |
| Grant date | Dec 13, 2022 |
Official abstract text for this publication.
A machine learning-based system and method for content clustering and content threat assessment includes generating embedding values for each piece of content of corpora of content data; implementing unsupervised machine learning models that: receive model input comprising the embeddings values of each piece of content of the corpora of content data; and predict distinct clusters of content data based on the embeddings values of the corpora of content data; assessing the distinct clusters of content data; associating metadata with each piece of content defining a member in each of the distinct clusters of content data based on the assessment, wherein the associating the metadata includes attributing to each piece of content within the clusters of content data a classification label of one of digital abuse/digital fraud and not digital abuse/digital fraud; and identifying members or content clusters having digital fraud/digital abuse based on querying the distinct clusters of content data.
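The flow the abstract describes (embed content, cluster the embeddings without supervision, assess each cluster, attach a label to every member, then query by label) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the three-dimensional vectors stand in for real embedding values, `cluster_embeddings` is a simple greedy threshold clustering rather than whatever model the patent contemplates, and the label strings are borrowed from the abstract's wording.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class Cluster:
    members: list = field(default_factory=list)  # (content_id, vector) pairs
    label: str = "unreviewed"                    # metadata set after assessment

    def centroid(self):
        vectors = [vec for _, vec in self.members]
        return [sum(column) / len(vectors) for column in zip(*vectors)]


def cluster_embeddings(embeddings, similarity_threshold):
    """Greedy single-pass clustering (an illustrative stand-in).

    Each vector joins the most similar existing cluster whose centroid is at
    least `similarity_threshold` similar; otherwise it starts a new cluster.
    """
    clusters = []
    for content_id, vec in embeddings.items():
        best, best_sim = None, similarity_threshold
        for cluster in clusters:
            sim = cosine(vec, cluster.centroid())
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            best = Cluster()
            clusters.append(best)
        best.members.append((content_id, vec))
    return clusters


# Hypothetical precomputed embedding values for four pieces of content.
embeddings = {
    "post-1": [0.9, 0.1, 0.0],
    "post-2": [0.8, 0.2, 0.0],
    "post-3": [0.0, 0.1, 0.9],
    "post-4": [0.1, 0.0, 1.0],
}
clusters = cluster_embeddings(embeddings, similarity_threshold=0.9)

# Assessment step (here hard-coded): label each cluster, which in turn
# labels every member of that cluster.
clusters[0].label = "digital abuse/digital fraud"
clusters[1].label = "not digital abuse/digital fraud"

# Query the clusters for members labelled as fraud or abuse.
flagged = [
    content_id
    for cluster in clusters
    if cluster.label == "digital abuse/digital fraud"
    for content_id, _ in cluster.members
]
print(flagged)
```

Raising `similarity_threshold` yields more, tighter clusters; lowering it merges looser variants into the same cluster.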
Opening claim text (preview).
What is claimed:

1. A machine learning-based method for detecting fraudulent spam and identifying a fraud threat mitigation response, the method comprising: creating a spam corpus that includes spam data samples, wherein each spam data sample of the spam corpus comprises fraudulent text; creating a spam embeddings corpus of a plurality of fraudulent sentence embeddings based on converting the spam corpus, wherein creating the spam embeddings corpus includes: (a-i) implementing a machine learning-based transformer model that converts each of the spam data samples to a distinct spam numerical vector representation; (a-ii) defining a spam vector corpus for a plurality of distinct spam numerical vector representations based on the conversion of the spam data samples; defining, using a clustering algorithm, a plurality of distinct spam clusters based on the spam vector corpus, wherein defining the plurality of distinct spam clusters includes: (b-i) setting a clustering similarity parameter that informs a clustering density of an unsupervised machine learning-based clustering model, wherein the clustering similarity parameter governs a size of the plurality of distinct spam clusters; (b-ii) implementing the unsupervised machine learning-based clustering model that creates the plurality of distinct spam clusters by grouping distinct subsets of the plurality of distinct spam numerical vector representations of the spam vector corpus, wherein each of the plurality of distinct spam clusters includes a distinct centroid; creating a searchable index of the plurality of distinct spam clusters; implementing a web-based spam threat interface that: (c-i) receives input of a target spam data item comprising one or more potentially fraudulent sentences, wherein the target spam data item is converted to a target spam numerical vector value using the machine learning-based transformer model; (c-ii) performs a search of the searchable index of the plurality of distinct spam clusters based on the target spam numerical vector, wherein performing the search of the searchable index includes pairing the target spam numerical vector value to one or more distinct spam clusters of the plurality of distinct spam clusters that include one or more distinct spam numerical vector representations associated with character substitutions to the target spam data item; and identifying a fraud threat mitigation response including blocking the target spam data item based on results of the search of the searchable index.

2. The method according to claim 1, wherein performing the search of the searchable index further includes pairing the target spam numerical vector value to one or more distinct spam clusters of the plurality of distinct spam clusters that includes one or more distinct spam numerical vector representations equivalent to the target spam numerical vector value.

3. The method according to claim 1, further comprising displaying, on the web-based spam threat interface, a spam cluster-to-user network map for one of the one or more distinct spam clusters, wherein the spam cluster-to-user network map includes: (a) a textual summary of the one of the one or more distinct spam clusters; (b) a plurality of representations of user accounts associated with the one of the one or more distinct spam clusters; and (c) a plurality of graphical edges, wherein each graphical edge of the plurality of graphical edges extends in a direction from a distinct representation of a user account of the plurality of representations of user accounts to the textual summary.

4. The method according to claim 3, further comprising mitigating, via executing one or more digital threat mitigation actions, a plurality of user accounts associated with the plurality of representations of user accounts that prevents the plurality of user accounts from performing at least one type of digital event.

5. The method according to claim 1, wherein the fraud threat mitigation response further includes implementing an automated decisioning workflow that automatically blocks future content data from publishing on an online resource if the future content data is identical or semantically similar to the target spam data item.

6. The method according to claim 1, wherein the one or more potentially fraudulent sentences of the target spam data item relates to text data, communication data, or media data that is posted to a web or Internet-accessible medium, platform, service, system, or channel.

7. The method according to claim 1, wherein defining the plurality of distinct spam clusters further includes (b-iii) attributing to each of the plurality of distinct spam clusters a classification label indicating digital abuse.

8. The method according to claim 7, further includes in response to performing the search of the searchable index: identifying one or more distinct spam clusters of the plurality of distinct spam clusters comprising at least one distinct spam numerical vector representation that is equivalent or a near-equivalent to the target spam numerical vector value of the target spam data item; and displaying, on the web-based spam threat interface, the one or more distinct spam clusters.

9. The method according to claim 1, wherein the one or more potentially fraudulent sentences of the target spam data item relates to text content observed from an online post.

10. The method according to claim 1, wherein each distinct spam cluster of the plurality of distinct spam clusters generated by the unsupervised machine learning-based clustering model corresponds to a distinct one of a plurality of distinct type of spam content.

11. The method according to claim 1, wherein each distinct spam numerical vector representation, computed by the machine learning-based transformer model, is a representation of one of the spam data samples in a numerical form.

12. The method according to claim 1, wherein setting the clustering similarity parameter includes setting the clustering similarity parameter to a state that permits the unsupervised machine learning-based clustering model, when implemented, to create distinct spam clusters comprising identical spam numerical vector representations and non-identical spam numerical vector representations substantially similar to the identical spam numerical vector representations.

13. A machine learning-based method for detecting fraudulent spam and identifying a fraud threat mitigation response, the method comprising: implementing an unsupervised machine learning-based clustering model that predicts a plurality of distinct spam clusters based on a plurality of spam numerical vector representations; creating a searchable index of the plurality of distinct spam clusters, wherein the searchable index includes a searchable representation for each distinct spam cluster of the plurality of distinct spam clusters; implementing a web-based spam threat interface that: (i) receives input of a target spam data item comprising one or more potentially fraudulent sentences; (ii) initiates a search of the searchable index of the plurality of distinct spam clusters based on an embedded representation of the target spam data item; identifying a fraud threat mitigation response that includes blocking the target spam data item based on results of the search returning at least one distinct spam cluster of the plurality of distinct spam clusters corresponding to digital fraud or digital abus
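
As a rough illustration of the search-and-block steps recited in the independent claims (a per-cluster centroid, a searchable index of clusters, a lookup for a target item, and a blocking response when a spam cluster is returned), the sketch below may help. It rests on several assumptions that are not in the patent text: `embed` is a toy character-trigram counter used in place of a transformer model, chosen only because it keeps text with character substitutions close to the original; the index is a plain in-memory list; and the sample strings, the function names, and the `0.5` threshold are all hypothetical.

```python
import math
from collections import Counter


def embed(text, n=3):
    """Toy stand-in for a transformer embedding: character n-gram counts."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse n-gram vectors."""
    dot = sum(weight * b[gram] for gram, weight in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Mean of a list of sparse vectors."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return Counter({gram: weight / len(vectors) for gram, weight in total.items()})


def build_index(spam_clusters):
    """One searchable entry (cluster id plus centroid) per spam cluster."""
    return [
        {"cluster_id": cid, "centroid": centroid([embed(s) for s in samples])}
        for cid, samples in spam_clusters.items()
    ]


def search(index, target_text, similarity_threshold):
    """Return (cluster_id, similarity) pairs at or above the threshold, best first."""
    target = embed(target_text)
    scored = [(e["cluster_id"], cosine(target, e["centroid"])) for e in index]
    hits = [pair for pair in scored if pair[1] >= similarity_threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


def mitigation_response(hits):
    """Block the item if the search returned any spam cluster."""
    return "block" if hits else "allow"


# Hypothetical, already-clustered spam samples.
spam_clusters = {
    "prize-scam": ["claim your free prize now", "claim your free prize today"],
    "loan-scam": [
        "instant loan approval no credit check",
        "instant loan approved no credit check",
    ],
}
index = build_index(spam_clusters)

# Target with character substitutions ("1" for "i", "3" for "e").
obfuscated_hits = search(index, "cla1m your fr3e prize now", similarity_threshold=0.5)
benign_hits = search(index, "see you at lunch tomorrow", similarity_threshold=0.5)

print(obfuscated_hits, mitigation_response(obfuscated_hits))
print(benign_hits, mitigation_response(benign_hits))
```

The threshold plays a role loosely analogous to the claimed clustering similarity parameter: a lower value matches looser variants at the cost of more false positives. A production system would more likely use dense transformer embeddings and an approximate nearest-neighbour index than a linear scan.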
Combinations of networks · CPC title
Clustering or classification · CPC title
Database tuning (G06F16/2282 takes precedence; database performance monitoring G06F11/3409) · CPC title
Machine learning · CPC title
Traffic logging, e.g. anomaly detection · CPC title