Machine Learning and Security Classification of User Accounts

US2020005195A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2020005195-A1
Application number  US-201816026037-A
Country             US
Kind code           A1
Filing date         Jul 2, 2018
Priority date       Jul 2, 2018
Publication date    Jan 2, 2020
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Machine learning techniques are used in combination with graph data structures to perform automated classification of accounts. Graphs may be constructed using a seed node and then expanded outward to second-degree nodes and third-degree nodes that are connected to a seed user account node via direct interaction between the accounts. Characterization information regarding the interaction between accounts can be stored in the graph (e.g., quantity of interactions, types of interactions) as well as other metrics and metadata. A classifier, using random forest or another technique, may be trained using a number of different graphs that can then be used to reach a determination as to whether a user account falls into one particular category or another. These techniques can identify accounts that may be violating terms of service, committing a security violation, and/or performing illegal actions in a way that is not ascertainable from human analysis.
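The expansion described in the abstract can be illustrated with a minimal Python sketch. It is not the patent's implementation: the transaction tuple layout, the function names (`counterparties`, `edge_attributes`, `build_graph`), and the chosen edge attributes (interaction count, total amount, interaction types) are assumptions made for illustration. The seed is treated as the first-degree node, accounts that interacted with it directly become second-degree nodes, and their counterparties become third-degree nodes.

```python
# Hypothetical transaction records: (account_a, account_b, amount, interaction_type).
TRANSACTIONS = [
    ("seed", "a", 20.0, "payment"),
    ("seed", "a", 5.0, "refund"),
    ("seed", "b", 12.5, "payment"),
    ("a", "c", 40.0, "payment"),
    ("b", "d", 7.0, "dispute"),
    ("d", "e", 3.0, "payment"),  # "e" lies beyond the third degree, so it is never added
]


def counterparties(account, transactions):
    """Return the accounts that have interacted directly with `account`."""
    found = set()
    for a, b, _amount, _kind in transactions:
        if a == account and b != account:
            found.add(b)
        elif b == account and a != account:
            found.add(a)
    return found


def edge_attributes(u, v, transactions):
    """Summarize the interactions between two accounts (assumed attribute set)."""
    between = [t for t in transactions if {t[0], t[1]} == {u, v}]
    return {
        "count": len(between),
        "total_amount": sum(t[2] for t in between),
        "types": sorted({t[3] for t in between}),
    }


def build_graph(seed, transactions):
    """Expand outward from the seed to second-degree and third-degree nodes."""
    degree = {seed: 1}
    edges = {}
    # Second-degree nodes: accounts that interacted directly with the seed.
    second_degree = sorted(counterparties(seed, transactions))
    for second in second_degree:
        degree[second] = 2
        edges[(seed, second)] = edge_attributes(seed, second, transactions)
    # Third-degree nodes: counterparties of each second-degree node.
    for second in second_degree:
        for third in sorted(counterparties(second, transactions)):
            if degree.get(third, 3) != 3:
                continue  # simplification: skip the seed and other second-degree nodes
            degree[third] = 3
            edges[(second, third)] = edge_attributes(second, third, transactions)
    return {"seed": seed, "degree": degree, "edges": edges}


if __name__ == "__main__":
    graph = build_graph("seed", TRANSACTIONS)
    print(graph["degree"])
    print(graph["edges"][("seed", "a")])
```

Storing the attributes on the edges keeps each graph self-contained, so a collection of such graphs could later be flattened into feature vectors for whatever classifier is chosen.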

First claim

Opening claim text (preview).

What is claimed is:

1. A machine learning system, comprising: a processor; and a non-transitory computer-readable medium having stored thereon instructions that are executable by the processor to cause the machine learning system to perform operations comprising: creating a seed node based on a seed account that satisfies one or more selection criteria; creating a graph data structure for the seed node that includes information on other nodes connected to the seed node, including: determining one or more second-degree accounts with which the seed account has transacted; adding the one or more second-degree accounts as second-degree nodes connected to the seed node in the graph data structure; creating a first group of edges in the graph data structure indicating links between the seed node and each of the one or more second degree nodes; and for each of the one or more second-degree nodes: determining one or more third-degree accounts with which a second-degree account for that second-degree node has transacted; adding the one or more third-degree accounts as third-degree nodes to the graph data structure; and creating a second respective group of edges in the graph data structure indicating links between that second-degree node and the one or more third-degree nodes; for each of the edges in the graph data structure, calculating and storing in the graph data structure one or more attribute values based on one or more transactions occurring between the nodes connected to that edge; and providing the graph data structure as input to an machine learning model.

2. The machine learning system of claim 1, wherein providing the graph data structure as input to the machine learning model comprises providing a label value for the seed node to the machine learning model, wherein the label value indicates whether the seed node is believed to correspond to a user account that has engaged in collusion.

3. The machine learning system of claim 1, wherein the operations further comprise: providing a plurality of graph data structures to the machine learning model; and the machine learning model producing a trained classifier, based on the plurality of graph data structures, that is configured to accept an unclassified graph data structure and predict a classification value for an unclassified seed node for the unclassified graph data structure.

4. The machine learning system of claim 3, wherein the classification value is a categorization of an account being a colluding account or a non-colluding account.

5. The machine learning system of claim 3, wherein the classification value has a corresponding confidence value.

6. The machine learning system of claim 1, wherein the one or more attribute values for at least one of the edges in the graph include a dispute claim type for one or more transactions.

7. The machine learning system of claim 1, wherein the operations further comprise calculating and storing graph-level attributes for the graph data structure based on attribute values for the nodes in the graph.

8. The machine learning system of claim 7, wherein the graph-level attributes include a proportion of nodes in the graph corresponding to accounts believed to have engaged in fraud.

9. A method for machine-learning based account classification, comprising: accessing, by a computer system, a graph data structure having a seed node that corresponds to an unclassified seed account; providing, by the computer system, the graph data structure to a trained machine learning (ML) classifier, wherein the ML classifier was trained using a plurality of graph data structures each built using operations comprising: determining one or more second-degree accounts with which a seed account for the graph data structure has transacted; adding the one or more second-degree accounts as second-degree nodes connected to the seed node in the graph data structure; creating a first group of edges in the graph data structure indicating links between the seed node and each of the one or more second degree nodes; and for each of the one or more second-degree nodes: determining one or more third-degree accounts with which a second-degree account for that second-degree node has transacted; adding the one or more third-degree accounts as third-degree nodes to the graph data structure; and creating a second respective group of edges in the graph data structure indicating links between that second-degree node and the one or more third-degree nodes; and receiving, by the computer system from the trained ML classifier, a classification of the seed account.

10. The method of claim 9, further comprising: determining, by the computer system, whether to take a corrective action against the seed account based on the classification.

11. The method of claim 10, wherein the classification indicates the seed account is believed to have engaged in collusion, further comprising taking corrective action including cause the suspension of transaction privileges for the seed account.

12. The method of claim 9, wherein the operations to build each of the plurality of graph data structures further comprise: for each of the edges in the graph data structure, calculating and storing in the graph data structure one or more attribute values based on one or more transactions occurring between the nodes connected to that edge.

13. The method of claim 9, wherein the ML classifier comprises a random forest based classifier.

14. The method of claim 9, wherein for each of the one or more second-degree nodes, determining one or more third-degree accounts with which a second-degree account for that second-degree node has transacted includes determining that second-degree node has not transacted with any third-degree accounts within a particular timeframe and not adding any third-degree accounts to the graph data structure for that second-degree node.

15. The method of claim 9, wherein the classification of the seed account indicates that the seed account has violated an authorized use policy (AUP) applicable to the seed account.

16. The method of claim 9, wherein for one or more nodes in the graph, those nodes are already labeled as belonging to one of a plurality of classification categories that include the classification of the seed account.

17. A non-transitory computer-readable medium having stored thereon instructions that are executable by a computer system to cause the computer system to perform operations comprising: accessing, by a computer system, a graph data structure having a seed node that corresponds to an unclassified seed account; providing, by the computer system, the graph data structure to a trained machine learning (ML) classifier, wherein the ML classifier was trained using a plurality of graph data structures each built using operations comprising: determining one or more second-degree accounts with which a seed account for the graph data structure has transacted; adding the one or more second-degree accounts as second-degree nodes connected to the seed node in the graph data structure; creating a first group of edges in the graph data structure indicating links between the seed node and each of the one or more second degree nodes; and for each of the one or more second-degree nodes: determining one or more third-degree accounts with which a second-degree account for that second-degree node has transacted; adding the one or more third-degree accounts as third-degree nodes to the graph data structure; and creating a
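Claims 7 and 8 mention graph-level attributes such as the proportion of nodes believed to have engaged in fraud, and claims 5, 10, and 11 mention a confidence value and a corrective action such as suspending transaction privileges. The Python sketch below shows one way these pieces might fit together. Everything specific in it is an assumption: the function names, the extra features beyond the fraud proportion, the label strings, and the 0.9 confidence threshold do not come from the patent, and the trained classifier itself (for example a random forest, per claim 13) is left out.

```python
def graph_level_attributes(fraud_labels, edge_counts):
    """Summarize a whole graph as a flat dictionary of features.

    fraud_labels: account id -> True if the account is believed fraudulent
                  (hypothetical pre-existing labels).
    edge_counts:  one interaction count per edge in the graph.
    """
    num_nodes = len(fraud_labels)
    num_edges = len(edge_counts)
    flagged = sum(1 for is_fraud in fraud_labels.values() if is_fraud)
    return {
        "num_nodes": num_nodes,
        "num_edges": num_edges,
        # Proportion of nodes believed to have engaged in fraud (claim 8).
        "fraud_proportion": flagged / num_nodes if num_nodes else 0.0,
        # Assumed extra feature, not named in the claims.
        "mean_interactions_per_edge": sum(edge_counts) / num_edges if num_edges else 0.0,
    }


def decide_action(classification, confidence, threshold=0.9):
    """Map a classifier output to an action; the threshold is an assumption."""
    if classification == "colluding" and confidence >= threshold:
        return "suspend_transaction_privileges"
    return "no_action"


if __name__ == "__main__":
    labels = {"seed": False, "a": True, "b": False, "c": True}
    print(graph_level_attributes(labels, [2, 1, 3]))
    print(decide_action("colluding", 0.95))
```

Gating the corrective action on the confidence value is a design choice of this sketch; the claims only say that a determination is made based on the classification.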

Assignees

Paypal Inc

Inventors

Classifications

  • Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound · CPC title

  • Combinations of networks · CPC title

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • G06F21/552 · Primary

    involving long-term monitoring or reporting · CPC title

  • Tools and structures for managing or administering access control systems · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2020005195A1 cover?
Machine learning techniques are used in combination with graph data structures to perform automated classification of accounts. Graphs may be constructed using a seed node and then expanded outward to second-degree nodes and third-degree nodes that are connected to a seed user account node via direct interaction between the accounts. Characterization information regarding the interaction betwee…
Who is the assignee on this patent?
Paypal Inc
What technology area does this patent fall under?
Primary CPC classification G06F21/552. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 2, 2020 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).