Graph-based feature engineering for machine learning models

US2024045907A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2024045907-A1
Application number: US-202217882350-A
Country: US
Kind code: A1
Filing date: Aug 5, 2022
Priority date: Aug 5, 2022
Publication date: Feb 8, 2024
Grant date:

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and systems are presented for assisting a user to identify and evaluate features for use in a machine learning model configured to perform a task. Based on graph data associated with a graph data structure, a user interface is provided on a device. Based on user inputs received via the user interface, a feature candidate for the machine learning model is determined. The feature candidate is associated with a particular way of traversing the graph data structure to obtain attribute values associated with one or more vertices and/or one or more edges in the graph data structure. Based on the attribute values, a value corresponding to the feature candidate can be calculated. The value can be used to evaluate the effectiveness of the feature candidate in performing the task. The feature candidate can then be incorporated into the machine learning model as one of the input features.
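The abstract describes a feature candidate as a way of traversing the graph from a starting vertex and computing a value from the attribute values found along the way. The following is a minimal sketch of that idea, not the implementation disclosed in the patent. The toy graph, the attribute name `txn_amount`, the edge types, and the function `feature_value` are all hypothetical, and returning 0.0 for an empty neighborhood is an assumption made for illustration.

```python
from collections import deque
from statistics import mean

# Hypothetical toy graph: vertices stand for user accounts with attributes,
# edges are typed relationships between accounts.
VERTICES = {
    "a": {"txn_amount": 10.0},
    "b": {"txn_amount": 20.0},
    "c": {"txn_amount": 30.0},
    "d": {"txn_amount": 40.0},
}
EDGES = {
    "a": [("b", "shared_device"), ("c", "sent_payment")],
    "b": [("d", "shared_device")],
    "c": [],
    "d": [],
}

# Aggregations of the kind a feature candidate might apply to collected values.
AGGREGATES = {"sum": sum, "average": mean, "maximum": max, "minimum": min, "count": len}


def feature_value(seed, edge_type, hops, attribute, aggregate):
    """Follow up to `hops` edges of `edge_type` from `seed`, collect `attribute`
    from each newly reached vertex, and aggregate the collected values."""
    seen = {seed}
    frontier = deque([(seed, 0)])
    values = []
    while frontier:
        vertex, depth = frontier.popleft()
        if depth == hops:
            continue  # hop limit reached, do not expand further
        for neighbor, kind in EDGES.get(vertex, []):
            if kind == edge_type and neighbor not in seen:
                seen.add(neighbor)
                values.append(VERTICES[neighbor][attribute])
                frontier.append((neighbor, depth + 1))
    if not values:
        return 0.0  # assumption: an empty neighborhood yields 0.0
    return float(AGGREGATES[aggregate](values))


# Sum of txn_amount over accounts within two shared_device hops of account "a".
two_hop_sum = feature_value("a", "shared_device", 2, "txn_amount", "sum")
```

In this sketch, changing the seed vertex yields the value of the same feature candidate for a different account, which is what makes the candidate usable as a per-account model input.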

First claim

Opening claim text (preview).

What is claimed is:

1. A system, comprising: a non-transitory memory; and one or more hardware processors coupled with the non-transitory memory and configured to read instructions from the non-transitory memory to cause the system to perform operations comprising: providing, on a device, a user interface based on graph data associated with a graph; receiving a user input via the user interface; determining, based on the user input, a feature candidate for a machine learning model configured to perform a task, wherein the feature candidate is associated with a traversal of the graph from a seed vertex and a calculation based on one or more attributes associated with a vertex along the traversal; and configuring the machine learning model to use the feature candidate as an input feature to perform the task.

2. The system of claim 1, wherein the operations further comprise: generating computer programming code that implements the feature candidate.

3. The system of claim 2, wherein the operations further comprise: traversing, based on executing the computer programming code, the graph from a particular vertex within the graph; obtaining, based on the traversing, one or more attribute values; and calculating, for a particular user account corresponding to the particular vertex, a value corresponding to the feature candidate based on the one or more attribute values.

4. The system of claim 2, wherein the operations further comprise incorporating the computer programming code into the machine learning model.

5. The system of claim 1, wherein the operations further comprise: performing a plurality of simulations on the feature candidate based on using different vertices in the graph as the seed vertex; and determining a correlation between the feature candidate and the task based on simulation results from the performing.

6. The system of claim 5, wherein the operations further comprise: determining that the correlation exceeds a threshold, wherein the configuring the machine learning model to use the feature candidate as an input feature is in response to the determining that the correlation exceeds the threshold.

7. The system of claim 1, wherein the user input specifies a number of hops to traverse from the seed vertex for the feature candidate.

8. A method, comprising: receiving, by one or more hardware processors and via a user interface of a device, a user interaction with a graphical element representing at least a portion of a graph associated with a service provider, wherein the graph comprises a plurality of vertices and a plurality of edges; determining, by the one or more hardware processors and based on the user interaction with the graphical element, a feature candidate for a machine learning model configured to perform a task, wherein the feature candidate is associated with a traversal of the graph from a seed vertex and a calculation based on one or more attributes associated with a vertex along the traversal; and configuring, by the one or more hardware processors, the machine learning model to use the feature candidate as an input feature to perform the task.

9. The method of claim 8, wherein the calculation is based on at least one of a sum, an average, a maximum, a minimum, or a count.

10. The method of claim 8, wherein the user interaction specifies a type of edge to traverse from the seed vertex.

11. The method of claim 8, further comprising: assigning, from the plurality of vertices of the graph, a particular vertex as the seed vertex; calculating, based on traversing the graph from the particular vertex, a value corresponding to the feature candidate; and providing the value to the machine learning model.

12. The method of claim 11, further comprising: receiving a request to perform the task based on a particular user account with the service provider; and determining that the particular user account is represented by the particular vertex, wherein the assigning the particular vertex as the seed vertex is in response to the determining that the particular user account is represented by the particular vertex.

13. The method of claim 8, further comprising: generating, for the feature candidate, computer programming code that, when executed, computes values corresponding to the feature candidate for different user accounts with the service provider.

14. The method of claim 13, wherein the computer programming code is associated with a graph query language.

15. A non-transitory machine-readable medium having stored thereon machine-readable instructions executable to cause a machine to perform operations comprising: accessing, from a data storage, graph data associated with a graph, wherein the graph represents relationships among a plurality of user accounts with a service provider; providing, on a device, a user interface based on the graph data; receiving a user input via the user interface; determining, based on the user input, a feature candidate for a machine learning model configured to perform a task, wherein the feature candidate is associated with a traversal of the graph from a seed vertex and a calculation based on one or more attributes associated with a vertex along the traversal; and configuring the machine learning model to use the feature candidate as an input feature to perform the task.

16. The non-transitory machine-readable medium of claim 15, wherein the operations further comprise: generating, for the feature candidate, computer programming code for computing values corresponding to the feature candidate for different user accounts with the service provider.

17. The non-transitory machine-readable medium of claim 16, wherein the operations further comprise: traversing, based on executing the computer programming code, the graph from a particular vertex within the graph to obtain one or more attribute values corresponding to the one or more attributes; and calculating, for a particular user account corresponding to the particular vertex, a value corresponding to the feature candidate based on the one or more attribute values.

18. The non-transitory machine-readable medium of claim 16, wherein the operations further comprise incorporating the computer programming code into the machine learning model.

19. The non-transitory machine-readable medium of claim 1, wherein the operations further comprise: performing a plurality of simulations on the feature candidate based on using different vertices in the graph as the seed vertex; and determining whether a correlation exists between the feature candidate and the task based on simulation results from performing the plurality of simulations.

20. The non-transitory machine-readable medium of claim 19, wherein the operations further comprise: determining that the correlation exists between the feature candidate and the task based on the simulation results, wherein the configuring the machine learning model to use the feature candidate as an input feature is based on the correlation.
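Claims 5, 6, 19, and 20 describe evaluating a feature candidate by computing it from many different seed vertices and checking whether the results correlate with the task. The claims do not say which correlation measure is used, so the sketch below assumes a Pearson correlation against a binary task label and compares its absolute value with a threshold. The functions `pearson` and `evaluate_candidate`, the sample feature values, and the labels are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def evaluate_candidate(feature_fn, labels, threshold):
    """Compute the feature for every labeled seed vertex (one simulation per
    seed) and report the correlation and whether it clears the threshold."""
    seeds = sorted(labels)
    values = [feature_fn(seed) for seed in seeds]
    targets = [labels[seed] for seed in seeds]
    corr = pearson(values, targets)
    # Assumption: a strong negative correlation is as useful as a positive one.
    return corr, abs(corr) > threshold


# Hypothetical precomputed feature values and task labels per seed vertex.
sample_values = {"a": 60.0, "b": 40.0, "c": 0.0, "d": 0.0}
sample_labels = {"a": 1, "b": 1, "c": 0, "d": 0}

corr, accepted = evaluate_candidate(sample_values.get, sample_labels, threshold=0.5)
```

Here `accepted` plays the role of the gate in claim 6: only a candidate whose correlation clears the threshold would be configured as an input feature of the model.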

Assignees

Inventors

Classifications

  • Graphs; Linked lists (G06F16/9027 takes precedence) · CPC title

  • Query predicate definition using graphical user interfaces, including menus and forms (G06F16/2423 takes precedence) · CPC title

  • Ensemble learning · CPC title

  • Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound · CPC title

  • G06N20/00 · Primary

    Machine learning · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2024045907A1 cover?
Methods and systems are presented for assisting a user to identify and evaluate features for use in a machine learning model configured to perform a task. Based on graph data associated with a graph data structure, a user interface is provided on a device. Based on user inputs received via the user interface, a feature candidate for the machine learning model is determined. The feature candidat…
Who is the assignee on this patent?
Paypal Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/9024. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 8, 2024 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).