Automated configuration parameter tuning for database performance

US11061902B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11061902-B2
Application number: US-201916298837-A
Country: US
Kind code: B2
Filing date: Mar 11, 2019
Priority date: Oct 18, 2018
Publication date: Jul 13, 2021
Grant date: Jul 13, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments implement a prediction-driven, rather than a trial-driven, approach to automate database configuration parameter tuning for a database workload. This approach uses machine learning (ML) models to test performance metrics resulting from application of particular database parameters to a database workload, and does not require live trials on the DBMS managing the workload. Specifically, automatic configuration (AC) ML models are trained, using a training corpus that includes information from workloads being run by DBMSs, to predict performance metrics based on workload features and configuration parameter values. The trained AC-ML models predict performance metrics resulting from applying particular configuration parameter values to a given database workload being automatically tuned. Based on correlating changes to configuration parameter values with changes in predicted performance metrics, an optimization algorithm is used to converge to an optimal set of configuration parameters. The optimal set of configuration parameter values is automatically applied for the given workload.
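As a rough illustration of the prediction-driven idea in the abstract, the Python sketch below scores candidate configurations with a model instead of running live trials. It is not the patented implementation. The toy corpus, the feature names, the nearest-neighbour predictor that stands in for a trained AC-ML model, and the use of random search as the optimization algorithm are all assumptions made for this example.

```python
import math
import random

# Invented toy corpus (not real measurements). Each row is
# (workload_features, config_values, throughput), all features scaled to [0, 1].
#   workload_features = (joins_per_query, insert_fraction)   # hypothetical names
#   config_values     = (memory_fraction, indexes_enabled)   # hypothetical names
CORPUS = [
    ((0.2, 0.1), (0.2, 0.0), 110.0),
    ((0.2, 0.1), (0.8, 1.0), 180.0),
    ((0.8, 0.1), (0.2, 0.0), 40.0),
    ((0.8, 0.1), (0.8, 1.0), 150.0),
    ((0.8, 0.6), (0.5, 1.0), 90.0),
    ((0.8, 0.6), (0.9, 0.0), 120.0),
    ((0.5, 0.3), (0.5, 0.0), 100.0),
    ((0.5, 0.3), (0.9, 1.0), 160.0),
]


def predict_throughput(workload, config, corpus=CORPUS, k=3):
    """Stand-in for a trained model: mean metric of the k nearest corpus rows."""
    point = tuple(workload) + tuple(config)
    nearest = sorted(corpus, key=lambda row: math.dist(point, row[0] + row[1]))[:k]
    return sum(row[2] for row in nearest) / len(nearest)


def tune(workload, trials=200, seed=0):
    """Random search over candidate configs, scored by prediction only."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(trials):
        # Candidate config: a memory fraction and an on/off index flag.
        candidate = (rng.random(), float(rng.randint(0, 1)))
        score = predict_throughput(workload, candidate)
        if score > best_score:
            best_config, best_score = candidate, score
    return best_config, best_score


if __name__ == "__main__":
    config, predicted = tune(workload=(0.8, 0.1))
    print("suggested config:", config, "predicted throughput:", predicted)
```

No database is touched inside `tune`: every candidate is evaluated by the predictor, and only the returned configuration would be applied to the workload. Grid search or Bayesian optimization could replace the random search loop without changing the predictor.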

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-executed method comprising: extracting one or more workload-specific features of a particular database workload running at a database system; wherein the one or more workload-specific features of the particular database workload characterize utilization of the particular database workload; based, at least in part, on the one or more workload-specific features of the particular database workload, identifying, using one or more trained machine learning models, a particular set of configuration-specific features for the particular database workload to optimize one or more performance metrics; wherein the particular set of configuration-specific features comprises values for a set of configuration parameters of the database system for the particular database workload; and automatically applying the particular set of configuration-specific features to the database system for the particular database workload; wherein the method is performed by one or more computing devices.

2. The method of claim 1, wherein the one or more trained machine learning models are one or more first trained machine learning models, the method further comprising: identifying, using one or more second trained machine learning models, a set of impactful configuration parameters for a particular database management system; wherein identifying the particular set of configuration-specific features comprises: testing, using the one or more first trained machine learning models, various values for each of the set of impactful configuration parameters, and during the testing, keeping constant values for configuration parameters other than those in the set of impactful configuration parameters.

3. The method of claim 1, further comprising identifying the particular set of configuration-specific features from among a plurality of candidate sets of configuration-specific features for the particular database workload using one or more optimization strategies from a group of optimization strategies that comprises: random search, grid search, and Bayesian optimization.

4. The method of claim 1, further comprising: determining whether one or more predicted performance metrics, predicted for the particular database workload given the particular set of configuration-specific features, satisfy one or more target performance metrics for the particular database workload; wherein said automatically applying the particular set of configuration-specific features to the database system for the particular database workload is performed in response to determining that the one or more predicted performance metrics satisfy the one or more target performance metrics for the particular database workload.

5. The method of claim 1, wherein: the particular set of configuration-specific features comprises one or more of: an amount of memory available for the particular database workload, types of joins allowed to be used for queries over the particular database workload, kinds of operators that are allowed to be used for the particular database workload, or whether indexes are enabled for the particular database workload; and the one or more workload-specific features comprise one or more of: a number of different kinds of operations being used in queries over a dataset of the particular database workload, an extent of the queries over the dataset, binary lengths of records being inserted into the dataset, data types being used in queries over the dataset, a number of joins in queries over the dataset, or aggregation-type operations being performed over the dataset.

6. The method of claim 1, wherein: the one or more workload-specific features comprise one or more simple workload-specific features comprising: a number of different kinds of operations being used in queries over a dataset of the particular database workload, an extent of the queries over the dataset, binary lengths of records being inserted into the dataset, data types being used in queries over the dataset, a number of joins in queries over the dataset, or aggregation-type operations being performed over the dataset; and the one or more workload-specific features further comprise an aggregate workload-specific feature that represents a particular ratio of two simple workload-specific features.

7. The method of claim 1, wherein said steps of identifying and applying are performed in response to one of: determining that one or more workload-specific features of the particular database workload have changed; or determining that a particular performance metric of the particular database workload does not satisfy a corresponding performance requirement for the particular database workload.

8. The method of claim 1, further comprising training one or more machine learning models, to produce the one or more trained machine learning models, using a training corpus with data regarding a plurality of historical database workloads, including configuration-specific features, workload-specific features, and performance metrics of the plurality of historical database workloads.

9. The method of claim 8, further comprising, prior to training the one or more machine learning models, automatically identifying one or more features, from the training corpus, to eliminate from use in training the one or more machine-learning models.

10. The method of claim 8, further comprising: prior to training the one or more machine learning models, automatically identifying a particular type of machine learning model, from a plurality of types of machine-learning models, to use for the one or more machine learning models; wherein the one or more trained machine learning models are of the particular type of machine learning model.

11. The method of claim 8, further comprising, prior to training the one or more machine learning models, automatically identifying one or more hyper-parameters to use for the one or more machine-learning models.

12. A computer-executed method comprising: training one or more machine learning models, to produce one or more trained machine learning models, using a training corpus with data regarding database workloads, including benchmarking features and performance metrics of the database workloads; identifying, using the one or more trained machine learning models, one or more experimental values for one or more configuration parameters; automatically running a particular experiment by causing a database management system to manage a particular database workload based on the one or more experimental values for the one or more configuration parameters; and adding, to the training corpus to produce an updated training corpus, data from the particular experiment that comprises one or more resulting performance metrics from the particular experiment and the one or more experimental values for the one or more configuration parameters; wherein the method is performed by one or more computing devices.

13. The method of claim 12, further comprising: scheduling, in scheduling information, the particular experiment to be run on one or more particular computing devices; wherein automatically running the particular experiment is performed in response to a particular computing device of the one or more particular computing devices running the particular experiment according to the scheduling information.

14. The method of claim 12, wherein: the particular experiment is run on a particular computing device of a plurality of computing devices; and the method further comprises: automatically running a second experiment on a second computing
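The experiment loop recited in claim 12 (train, pick experimental values with the model, run the experiment, add the result to the corpus) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the claimed method itself: `run_experiment` is a hypothetical stand-in for a DBMS running the workload, the single `memory_fraction` parameter and its response curve are invented, the "model" is a nearest-row lookup, and benchmarking features are left out.

```python
import random


def run_experiment(memory_fraction):
    """Hypothetical stand-in for a DBMS run; returns an invented metric."""
    return 100.0 - 200.0 * (memory_fraction - 0.7) ** 2


def train(corpus):
    """Toy 'training': keep the rows and predict from the nearest one."""
    rows = list(corpus)

    def predict(value):
        nearest = min(rows, key=lambda row: abs(row[0] - value))
        return nearest[1]

    return predict


def experiment_loop(corpus, rounds=5, seed=0):
    """Repeat: train, choose a value with the model, run it, record the result."""
    rng = random.Random(seed)
    corpus = list(corpus)  # work on a copy so the caller's list is unchanged
    for _ in range(rounds):
        predict = train(corpus)                      # (re)train on current corpus
        candidates = [rng.random() for _ in range(50)]
        value = max(candidates, key=predict)         # model-chosen experimental value
        metric = run_experiment(value)               # run the experiment
        corpus.append((value, metric))               # extend the training corpus
    return corpus


if __name__ == "__main__":
    start = [(0.1, run_experiment(0.1)), (0.9, run_experiment(0.9))]
    for value, metric in experiment_loop(start):
        print(round(value, 3), round(metric, 2))
```

Each round appends one `(value, metric)` pair, so the model used in the next round is trained on an updated corpus. A scheduler that spreads experiments over several machines, as in claims 13 and 14, is outside the scope of this sketch.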

Assignees

Inventors

Classifications

  • Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound

  • Activation functions

  • Combinations of networks

  • Recurrent networks, e.g. Hopfield networks

  • Auto-encoder networks; Encoder-decoder networks

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11061902B2 cover?
Embodiments implement a prediction-driven, rather than a trial-driven, approach to automate database configuration parameter tuning for a database workload. This approach uses machine learning (ML) models to test performance metrics resulting from application of particular database parameters to a database workload, and does not require live trials on the DBMS managing the workload. Specificall…
Who is the assignee on this patent?
Oracle Int Corp
What technology area does this patent fall under?
Primary CPC classification G06F16/217. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 13, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).