Performing hyperparameter tuning of models in a massively parallel database system

US2021397975A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2021397975-A1
Application number: US-202017124200-A
Country: US
Kind code: A1
Filing date: Dec 16, 2020
Priority date: Jun 17, 2020
Publication date: Dec 23, 2021
Grant date:

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Hyperparameter tuning for a machine learning model is performed in a massively parallel database system. A computer system comprised of a plurality of compute units executes a relational database management system (RDBMS), wherein the RDBMS manages a relational database comprised of one or more tables storing data. One or more of the compute units perform the hyperparameter tuning for the machine learning model, wherein the hyperparameters are control parameters used in construction of the model, and the tuning of the hyperparameters is implemented as an operation in the RDBMS that accepts training and scoring data for the model, constructs the model using the hyperparameters and the training data, and generates goodness metrics for the model using the scoring data.
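The operation described in the abstract (accept training and scoring data, construct the model from the hyperparameters and the training data, then generate goodness metrics from the scoring data) can be illustrated with a minimal Python sketch. It is not the patented implementation and does not run inside an RDBMS. The one-weight ridge model, the `l2_penalty` hyperparameter, the mean-squared-error metric, and every function name here are assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

Row = Tuple[float, float]  # (feature x, target y); a hypothetical table row


def train(rows: List[Row], hyperparams: Dict[str, float]) -> float:
    """Construct the model: a one-weight ridge regression y ~ w * x.

    `l2_penalty` is the assumed hyperparameter; it controls how the
    model is built but is not itself learned from the data.
    """
    lam = hyperparams["l2_penalty"]
    sxy = sum(x * y for x, y in rows)
    sxx = sum(x * x for x, _ in rows)
    return sxy / (sxx + lam)


def score(weight: float, rows: List[Row]) -> List[float]:
    """Apply the trained model to the scoring rows."""
    return [weight * x for x, _ in rows]


def goodness(predicted: List[float], rows: List[Row]) -> Dict[str, float]:
    """Compare predicted and actual values; lower MSE is better."""
    mse = sum((p - y) ** 2 for p, (_, y) in zip(predicted, rows)) / len(rows)
    return {"mse": mse}


def tune_operation(
    training_rows: List[Row],
    scoring_rows: List[Row],
    hyperparams: Dict[str, float],
) -> Dict[str, float]:
    """One evaluation: train on training data, score on scoring data."""
    model = train(training_rows, hyperparams)
    return goodness(score(model, scoring_rows), scoring_rows)


training_rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
scoring_rows = [(4.0, 8.0)]
print(tune_operation(training_rows, scoring_rows, {"l2_penalty": 0.0}))
print(tune_operation(training_rows, scoring_rows, {"l2_penalty": 14.0}))
```

Calling `tune_operation` once per candidate hyperparameter setting and comparing the returned metrics is the basic loop that the tuning operation repeats; here the unpenalized setting fits the scoring row exactly, while the heavily penalized one does not.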

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented apparatus, comprising: (a) a relational database management system (RDBMS) executing in a computer system comprised of a plurality of compute units, wherein the RDBMS manages a relational database comprised of one or more tables storing data; (b) one or more of the compute units tuning hyperparameters for a machine learning model, wherein the hyperparameters are control parameters used in construction of the model, and the tuning of the hyperparameters is implemented as an operation in the RDBMS that accepts training and scoring data for the model, constructs the model using the hyperparameters and the training data, and generates goodness metrics for the model using the scoring data.

2. The apparatus of claim 1, wherein a search space for the hyperparameters is defined by one or more combinations of the hyperparameters, and the search space is partitioned across the compute units to parallelize the tuning of the hyperparameters.

3. The apparatus of claim 2, wherein: an enumerator enumerates the combinations of hyperparameters in the search space based on an optimization algorithm; and a function evaluator computes an objective function for the combinations of the hyperparameters enumerated in the search space, wherein the objective function computes one or more goodness metrics for the model generated using one or more of the combinations of the hyperparameters enumerated in the search space, to identify an optimal one of the combinations of the hyperparameters.

4. The apparatus of claim 3, wherein a plurality of the compute units perform the function evaluator concurrently using the search space that is partitioned across the compute units.

5. The apparatus of claim 3, wherein the operation includes training and scoring functions used for computation of the objective function.

6. The apparatus of claim 5, wherein the training data is used by the training function to train the model generated using the hyperparameters; the scoring data is used by the scoring function to score the model trained by the training function; and the goodness metrics are used to evaluate the model scored by the scoring function.

7. The apparatus of claim 3, wherein the operation includes one or more optimization algorithms for the enumerator, and the enumerator performs a selective enumeration of the combinations of the hyperparameters in the search space based on the optimization algorithm.

8. The apparatus of claim 7, wherein the enumerator repeats the selective enumeration of the combinations of the hyperparameters in the search space until a convergence is reached based on the optimization algorithm.

9. The apparatus of claim 3, wherein the operation includes one or more arguments for: a ratio of the training and scoring data split used for verification; a k-fold value for cross-validation of the training and scoring data; and the goodness metrics used for comparison of predicted and actual values for the training and scoring data used by the model.

10. A computer-implemented method, comprising: (a) executing a relational database management system (RDBMS) in a computer system comprised of a plurality of compute units, wherein the RDBMS manages a relational database comprised of one or more tables storing data; (b) tuning hyperparameters for a machine learning model in one or more of the compute units, wherein the hyperparameters are control parameters used in construction of the model, and the tuning of the hyperparameters is implemented as an operation in the RDBMS that accepts training and scoring data for the model, constructs the model using the hyperparameters and the training data, and generates goodness metrics for the model using the scoring data.

11. The method of claim 10, wherein a search space for the hyperparameters is defined by one or more combinations of the hyperparameters, and the search space is partitioned across the compute units to parallelize the tuning of the hyperparameters.

12. The method of claim 11, wherein: an enumerator enumerates the combinations of hyperparameters in the search space based on an optimization algorithm; and a function evaluator computes an objective function for the combinations of the hyperparameters enumerated in the search space, wherein the objective function computes one or more goodness metrics for the model generated using one or more of the combinations of the hyperparameters enumerated in the search space, to identify an optimal one of the combinations of the hyperparameters.

13. The method of claim 12, wherein a plurality of the compute units perform the function evaluator concurrently using the search space that is partitioned across the compute units.

14. The method of claim 12, wherein the operation includes training and scoring functions used for computation of the objective function.

15. The method of claim 14, wherein the training data is used by the training function to train the model generated using the hyperparameters; the scoring data is used by the scoring function to score the model trained by the training function; and the goodness metrics are used to evaluate the model scored by the scoring function.

16. The method of claim 12, wherein the operation includes one or more optimization algorithms for the enumerator, and the enumerator performs a selective enumeration of the combinations of the hyperparameters in the search space based on the optimization algorithm.

17. The method of claim 16, wherein the enumerator repeats the selective enumeration of the combinations of the hyperparameters in the search space until a convergence is reached based on the optimization algorithm.

18. The method of claim 12, wherein the operation includes one or more arguments for: a ratio of the training and scoring data split used for verification; a k-fold value for cross-validation of the training and scoring data; and the goodness metrics used for comparison of predicted and actual values for the training and scoring data used by the model.

19. A computer program product, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computer system to cause the computer system to perform a method, comprising: (a) executing a relational database management system (RDBMS) in a computer system comprised of a plurality of compute units, wherein the RDBMS manages a relational database comprised of one or more tables storing data; (b) tuning hyperparameters for a machine learning model in one or more of the compute units, wherein the hyperparameters are control parameters used in construction of the model, and the tuning of the hyperparameters is implemented as an operation in the RDBMS that accepts training and scoring data for the model, constructs the model using the hyperparameters and the training data, and generates goodness metrics for the model using the scoring data.

20. The computer program product of claim 19, wherein a search space for the hyperparameters is defined by one or more combinations of the hyperparameters, and the search space is partitioned across the compute units to parallelize the tuning of the hyperparameters.
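The partitioned search recited in the dependent claims (an enumerator lists hyperparameter combinations, the search space is split across compute units, and a function evaluator runs concurrently on each partition to find the best combination) can be sketched as follows. This is an illustrative sketch only: worker threads stand in for database compute units, exhaustive grid enumeration stands in for the unspecified optimization algorithm, and the objective function and the hyperparameter names `learning_rate` and `depth` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from typing import Dict, List, Sequence, Tuple

Combo = Dict[str, float]


def enumerate_grid(space: Dict[str, Sequence[float]]) -> List[Combo]:
    """Enumerator: list every combination in the search space (grid search)."""
    names = sorted(space)
    return [dict(zip(names, values)) for values in product(*(space[n] for n in names))]


def partition(combos: List[Combo], n_units: int) -> List[List[Combo]]:
    """Split the search space round-robin across n_units compute units."""
    return [combos[i::n_units] for i in range(n_units)]


def objective(combo: Combo) -> float:
    """Hypothetical goodness metric (lower is better).

    A real evaluator would train and score a model for this combination;
    a closed-form stand-in keeps the sketch self-contained.
    """
    return (combo["learning_rate"] - 0.1) ** 2 + (combo["depth"] - 4) ** 2


def evaluate_partition(combos: List[Combo]) -> List[Tuple[float, Combo]]:
    """Function evaluator: compute the objective for one unit's partition."""
    return [(objective(c), c) for c in combos]


def tune(space: Dict[str, Sequence[float]], n_units: int) -> Tuple[float, Combo]:
    """Evaluate all partitions concurrently and return the best combination."""
    parts = partition(enumerate_grid(space), n_units)
    with ThreadPoolExecutor(max_workers=n_units) as pool:
        results = [r for part in pool.map(evaluate_partition, parts) for r in part]
    return min(results, key=lambda r: r[0])


space = {"learning_rate": [0.01, 0.1, 1.0], "depth": [2, 4, 8]}
best_metric, best_combo = tune(space, n_units=3)
print(best_metric, best_combo)
```

Because each combination is evaluated independently, the partitions need no coordination until the final reduction that picks the best metric. A selective enumerator, as in the claims that mention convergence, would replace `enumerate_grid` with a loop that proposes new combinations based on earlier results.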

Assignees

Inventors

Classifications

  • Relational databases · CPC title

  • G06N20/00 · Primary

    Machine learning · CPC title

  • Integrating or interfacing systems involving database management systems · CPC title

  • G06N3/126 · Primary

    Evolutionary algorithms, e.g. genetic algorithms or genetic programming · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2021397975A1 cover?
Hyperparameter tuning for a machine learning model is performed in a massively parallel database system. A computer system comprised of a plurality of compute units executes a relational database management system (RDBMS), wherein the RDBMS manages a relational database comprised of one or more tables storing data. One or more of the compute units perform the hyperparameter tuning for the machi…
Who is the assignee on this patent?
Teradata Us Inc
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 23, 2021 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).