What technology area does this patent fall under?

Primary CPC classification H04L63/1416. Mapped technology areas include Electricity.

When was this patent published?

Publication date May 10, 2022 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Systems and methods for machine learning-based digital content clustering, digital content threat detection, and digital content threat remediation in machine learning task-oriented digital threat mitigation platform

US11330009B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11330009-B2
Application number	US-202117180592-A
Country	US
Kind code	B2
Filing date	Feb 19, 2021
Priority date	Mar 4, 2020
Publication date	May 10, 2022
Grant date	May 10, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection. Read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A machine learning-based system and method for content clustering and content threat assessment includes generating embedding values for each piece of content of corpora of content data; implementing unsupervised machine learning models that: receive model input comprising the embeddings values of each piece of content of the corpora of content data; and predict distinct clusters of content data based on the embeddings values of the corpora of content data; assessing the distinct clusters of content data; associating metadata with each piece of content defining a member in each of the distinct clusters of content data based on the assessment, wherein the associating the metadata includes attributing to each piece of content within the clusters of content data a classification label of one of digital abuse/digital fraud and not digital abuse/digital fraud; and identifying members or content clusters having digital fraud/digital abuse based on querying the distinct clusters of content data.
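The steps in the abstract (embed each piece of content, cluster the embeddings without supervision, then label every member of a cluster) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the character-trigram count vector stands in for a learned embedding, greedy "leader" clustering stands in for whatever unsupervised model is actually used, and the names `embed`, `cosine`, `cluster`, `label_clusters`, and the 0.6 threshold are hypothetical.

```python
import math
from collections import Counter


def embed(text: str, n: int = 3) -> Counter:
    """Toy embedding: counts of character n-grams (assumed stand-in for a learned model)."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(embeddings: list, threshold: float = 0.6) -> list:
    """Greedy leader clustering: join the first cluster whose leader is similar enough."""
    leaders = []      # embedding of the first member of each cluster
    assignments = []  # cluster id for each input, in input order
    for emb in embeddings:
        for cluster_id, leader in enumerate(leaders):
            if cosine(emb, leader) >= threshold:
                assignments.append(cluster_id)
                break
        else:
            # No existing cluster is close enough, so this item starts a new one.
            leaders.append(emb)
            assignments.append(len(leaders) - 1)
    return assignments


def label_clusters(assignments: list, cluster_labels: dict, default: str = "not abuse/fraud") -> list:
    """Bulk labelling: every piece of content inherits the label given to its cluster."""
    return [cluster_labels.get(cluster_id, default) for cluster_id in assignments]


# Hypothetical example content.
posts = [
    "win a free iphone now",
    "win a free iphone today",
    "meeting moved to 3pm tomorrow",
]
assignments = cluster([embed(p) for p in posts])
labels = label_clusters(assignments, {0: "abuse/fraud"})
```

In this sketch the two giveaway posts land in the same cluster and the scheduling post gets its own, so a single label applied to cluster 0 reaches both of its members. Raising the threshold makes clusters tighter and therefore smaller.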

First claim

Opening claim text (preview).

What is claimed:

1. A machine learning-based method for content clustering and content threat assessment in a machine learning task-oriented threat mitigation platform, the method comprising:
generating embedding values for each piece of content of one or more corpora of content data;
implementing one or more unsupervised machine learning models that: (i) receive model input comprising the embeddings values of each piece of content of the one or more corpora of content data; and (ii) predict a plurality of distinct clusters of content data based on the embeddings values of the one or more corpora of content data;
assessing the plurality of distinct clusters of content data;
associating metadata with each of the plurality of distinct clusters of content data based on the assessment, wherein the associating the metadata includes attributing to each piece of content within the plurality of distinct clusters of content data a classification label of one of (a) an adverse label indicating digital abuse or digital fraud and (b) not digital abuse or not digital fraud;
at a machine-learning threat mitigation service: receiving from a subscriber of a threat mitigation service, via an application programming interface (API), a text content query comprising a target piece of online text data associated with one or more online services of the subscriber;
querying the plurality of distinct clusters of content data that have the adverse label with an embedded query representation of the text content query based on a similarity threshold, wherein the similarity threshold is set to identify clusters of content data including both: (1) embedded representations identical to the text content query, and (2) embedded representations that include character substitutions of the text content query; and
identifying the target piece of online text data with the adverse label indicating digital abuse or digital fraud if one or more of the plurality of distinct clusters of content data that have the adverse label is returned in response to the text content query; and
displaying, on a user interface, a content-to-user network map for at least one of the one or more of the plurality of distinct clusters of content data that have the adverse label if the one or more of the plurality of distinct clusters of content data that have the adverse label is returned in response to the text content query, wherein the content-to-user network map includes: (a) a textual summary of the at least one of the one or more of the plurality of distinct clusters of content data that have the adverse label; (b) a plurality of representations of user accounts associated with pieces of content within the at least one of the one or more of the plurality of distinct clusters of content data that have the adverse label; and (c) a plurality of graphical edges, wherein each graphical edge of the plurality of graphical edges visually connects a distinct representation of a user account of the plurality of representations of user accounts to the textual summary of the at least one of the one or more of the plurality of distinct clusters of content data that have the adverse label; and
mitigating, via a bulk mitigation action, a network of user accounts associated with the plurality of representations of user accounts that prevents the network of user accounts from publishing future content on one or more online resources of the subscriber.

2. The method according to claim 1, wherein: the application programming interface (API) is searchably connected to each of the plurality of distinct clusters of content data.

3. The method according to claim 1, wherein: the text content query comprises text content observed from an online post or an electronic communication, the text content is converted to the embedded query representation, and the identifying includes identifying one or more of the plurality of distinct clusters of content data that include pieces of content having the embedded query representation.

4. The method according to claim 1, further comprising: a querying interface that includes a tuning interface object that, when adjusted or acted upon by user input, tunes one or more clustering similarity thresholds to increase or decrease a number of members within a target cluster of the plurality of distinct clusters of content data.

5. The method according to claim 4, further comprising: querying, via the querying interface, the plurality of distinct clusters of content data based on the text content query; returning one or more of the plurality of distinct clusters of content data based on the querying; and increasing or decreasing a number of members within the one or more of the plurality of distinct clusters of content data based on an input to the tuning interface object.

6. The method according to claim 1, further comprising: creating a cluster mapping that associates a search grain with at least one cluster of the plurality of distinct clusters of content data.

7. The method according to claim 6, wherein: the search grain comprises the target piece of online text data, and the method further comprising: using the target piece of online text data to query the plurality of distinct clusters of content data; and returning, based on the target piece of online text data, one or more clusters of a plurality of distinct clusters of identifiers of a plurality of distinct clusters of content data.

8. The method according to claim 1, further comprising: deriving, based on the plurality of distinct clusters of content data, a plurality of distinct clusters of identifiers of a plurality of online users that post online content.

9. The method according to claim 8, further comprising: creating a cluster mapping that associates a search grain with at least one cluster of the plurality of distinct clusters of identifiers of the plurality of online users that post online content, wherein the search grain comprises an online user identifier of a user attempting to post online content or posting online content; using the online user identifier to query the plurality of distinct clusters of identifiers of online users; and returning, based on the online user identifier, one or more clusters of the plurality of distinct clusters of identifiers of the plurality of online users.

10. The method according to claim 6, wherein the search grain comprises an identifier of a subscriber to the machine-learning threat mitigation service, the method further comprising: using the identifier of the subscriber to query the plurality of distinct clusters of identifiers of the plurality of online users; and returning, based on the identifier of the subscriber, one or more cluster members from one or more of the plurality of distinct clusters of identifiers of the plurality of online users.

11. The method according to claim 1, wherein the content data relates to text data, communication data, or media data that is posted to a web or Internet-accessible medium, platform, service, system, or channel.

12. The method according to claim 1, wherein associating the metadata includes: associating the classification label, in bulk, to a target cluster of the plurality of distinct clusters of content data, wherein the associating the classification label in bulk causes an association of a single classification label to all members of the target cluster.

13. The method according to claim 1, wherein: the identifying includes identifying the one or more of the plurality of distinct clusters of content data based on a query comprising a metadata tag, the metadata tag identifying a classification of the one or more content clusters;
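The query, network-map, and bulk-mitigation steps of claim 1 can be sketched as follows. This is an illustrative toy and not the patented implementation: the trigram count vectors, the 0.6 similarity threshold, the `Cluster` record, and the function names `query_adverse`, `content_to_user_edges`, and `bulk_mitigate` are all assumptions introduced here. The point it shows is that a threshold below 1.0 lets a query match both identical text and text with character substitutions (for example "1" for "i" or "0" for "o").

```python
import math
from collections import Counter
from dataclasses import dataclass


def embed(text: str, n: int = 3) -> Counter:
    """Toy embedding: counts of character n-grams (assumed stand-in for a learned model)."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class Cluster:
    summary: str   # textual summary shown at the centre of the network map
    label: str     # "adverse" or "benign" (hypothetical label values)
    members: list  # (user_id, text) pairs


def query_adverse(clusters: list, text: str, threshold: float = 0.6) -> list:
    """Return adverse-labelled clusters with at least one member similar to the query."""
    query = embed(text)
    return [
        c for c in clusters
        if c.label == "adverse"
        and any(cosine(query, embed(member_text)) >= threshold for _, member_text in c.members)
    ]


def content_to_user_edges(cluster: Cluster) -> list:
    """One edge per distinct user account, each connecting the account to the cluster summary."""
    return sorted({(user_id, cluster.summary) for user_id, _ in cluster.members})


def bulk_mitigate(hits: list, blocked_users: set) -> set:
    """Bulk action: block every account that posted content in a returned cluster."""
    for cluster in hits:
        blocked_users.update(user_id for user_id, _ in cluster.members)
    return blocked_users


# Hypothetical clusters and accounts.
clusters = [
    Cluster("free iphone giveaway scam", "adverse",
            [("u1", "win a free iphone now"), ("u2", "win a free iphone today")]),
    Cluster("scheduling chatter", "benign",
            [("u3", "meeting moved to 3pm tomorrow")]),
]

hits = query_adverse(clusters, "w1n a free iph0ne now")  # character-substituted query
edges = content_to_user_edges(hits[0])
blocked = bulk_mitigate(hits, set())
```

Here the substituted query still shares most of its trigrams with the first cluster member, so the adverse cluster is returned, both of its accounts appear as edges, and both are blocked in one action. Lowering or raising `threshold` is one simple way to picture the tuning control described in claims 4 and 5.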

Assignees

Sift Science Inc

Inventors

Classifications

G06N3/045: Combinations of networks
G06N3/094: Adversarial learning
G06N3/09: Supervised learning
G06N3/0895: Weakly supervised learning, e.g. semi-supervised or self-supervised learning
G06N3/0475: Generative networks

Patent family

Related publications grouped by family.

Patent family 77556302

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11330009B2 cover?: A machine learning-based system and method for content clustering and content threat assessment includes generating embedding values for each piece of content of corpora of content data; implementing unsupervised machine learning models that: receive model input comprising the embeddings values of each piece of content of the corpora of content data; and predict distinct clusters of content dat…
Who is the assignee on this patent?: Sift Science Inc

Related patents

Methods and arrangements to distribute a fraud detection model

Threat mitigation system and method

Data clean-up method for improving predictive model training

Systems and methods for detection of infected websites
