Automated configuration parameter tuning for database performance

US11567937B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11567937-B2
Application number: US-202117318972-A
Country: US
Kind code: B2
Filing date: May 12, 2021
Priority date: Oct 18, 2018
Publication date: Jan 31, 2023
Grant date: Jan 31, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments implement a prediction-driven, rather than a trial-driven, approach to automate database configuration parameter tuning for a database workload. This approach uses machine learning (ML) models to test performance metrics resulting from application of particular database parameters to a database workload, and does not require live trials on the DBMS managing the workload. Specifically, automatic configuration (AC) ML models are trained, using a training corpus that includes information from workloads being run by DBMSs, to predict performance metrics based on workload features and configuration parameter values. The trained AC-ML models predict performance metrics resulting from applying particular configuration parameter values to a given database workload being automatically tuned. Based on correlating changes to configuration parameter values with changes in predicted performance metrics, an optimization algorithm is used to converge to an optimal set of configuration parameters. The optimal set of configuration parameter values is automatically applied for the given workload.
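
The prediction-driven idea in the abstract can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the single workload feature (`read_ratio`), the single configuration parameter (`buffer_mb`), the synthetic corpus, the kernel-weighted average standing in for a trained AC-ML model, and the hill-climbing search standing in for the optimization algorithm. The point it shows is only that candidate values are scored by the model, so no trial runs against the DBMS.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    read_ratio: float   # workload feature (hypothetical)
    buffer_mb: float    # configuration parameter value (hypothetical)
    throughput: float   # performance metric recorded for that pair


def predict(corpus, read_ratio, buffer_mb, feat_bw=0.1, param_bw=100.0):
    """Kernel-weighted average of recorded metrics; stands in for a trained model."""
    num = den = 0.0
    for s in corpus:
        d2 = ((s.read_ratio - read_ratio) / feat_bw) ** 2
        d2 += ((s.buffer_mb - buffer_mb) / param_bw) ** 2
        w = math.exp(-0.5 * d2)   # nearby samples count more
        num += w * s.throughput
        den += w
    return num / den


def tune(corpus, read_ratio, start, lo, hi, step=100.0, min_step=1.0):
    """Hill-climb on the predicted metric only; the DBMS is never touched."""
    best = start
    best_score = predict(corpus, read_ratio, best)
    while step >= min_step:
        moved = False
        for cand in (best - step, best + step):
            if lo <= cand <= hi:
                score = predict(corpus, read_ratio, cand)
                if score > best_score:   # the change improved the prediction
                    best, best_score, moved = cand, score, True
        if not moved:
            step /= 2                    # no neighbour is better: refine the step
    return best, best_score


# Synthetic corpus (made-up numbers): a read-heavy workload whose throughput
# peaks at 600 MB and a write-heavy one that peaks at 300 MB.
CORPUS = [
    Sample(ratio, float(mb), top - 10.0 * ((mb - peak) / 100.0) ** 2)
    for ratio, peak, top in ((0.9, 600, 1000.0), (0.2, 300, 800.0))
    for mb in range(100, 1101, 100)
]

if __name__ == "__main__":
    best, score = tune(CORPUS, read_ratio=0.9, start=200.0, lo=100.0, hi=1100.0)
    print(f"suggested buffer_mb={best:.0f}, predicted throughput={score:.1f}")
```

A real system would presumably use many features, many parameters, and a stronger model; the sketch keeps one of each so that the relationship between a parameter change and the predicted metric stays visible.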

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-executed method comprising: training one or more machine learning models, to produce one or more trained machine learning models, using a training corpus comprising performance metrics of database workloads; identifying, using the one or more trained machine learning models, one or more experimental values for one or more configuration parameters; automatically running a particular experiment by causing a database management system to manage a particular database workload based on the one or more experimental values for the one or more configuration parameters; and adding, to the training corpus to produce an updated training corpus, data from the particular experiment that comprises one or more resulting performance metrics from the particular experiment; wherein the method is performed by one or more computing devices.

2. The computer-executed method of claim 1, wherein said data from the particular experiment further comprises the one or more experimental values for the one or more configuration parameters.

3. The computer-executed method of claim 1, further comprising: scheduling, in scheduling information, the particular experiment to be run on one or more particular computing devices; wherein automatically running the particular experiment is performed in response to at least a particular computing device of the one or more particular computing devices running the particular experiment according to the scheduling information.

4. The computer-executed method of claim 3, wherein: at the time of scheduling, a second experiment is scheduled, in the scheduling information, to be run on the one or more particular computing devices; the method further comprises: determining that the particular experiment is a higher-priority experiment than the second experiment; wherein said scheduling comprises scheduling the particular experiment to be run on the one or more particular computing devices before the second experiment based on said determining that the particular experiment is a higher-priority experiment than the second experiment.

5. The computer-executed method of claim 1, wherein: the particular experiment is run on a particular computing device of a plurality of computing devices; and the method further comprises: automatically running a second experiment on a second computing device of the plurality of computing devices, wherein the second experiment is run, at least partially, in parallel with running the particular experiment.

6. The computer-executed method of claim 5, further comprising: maintaining historical data regarding experiments that have been run on the plurality of computing devices; wherein the historical data comprises one or more of: experiment running time, experiment resource requirements, or trends in resource utilization for the plurality of computing devices; based on the historical data, rebalancing a plurality of experiments, running on the plurality of computing devices, comprising the particular experiment and the second experiment; wherein said rebalancing the plurality of experiments comprises moving execution of at least one experiment, of the plurality of experiments, to one or more particular computing devices, of the plurality of computing devices, that include at least one different computing device than one or more previous computing devices executing the at least one experiment.

7. The computer-executed method of claim 1, further comprising: wherein the particular database workload is one of a plurality of workloads available for running experiments; wherein each workload, of the plurality of workloads, is associated with metadata that characterizes a type of said each workload; prior to running the particular experiment: determining, using the one or more trained machine learning models, a selected type of workload for the particular experiment, and selecting the particular database workload based on the particular database workload being associated with metadata indicating the selected type of workload.

8. The computer-executed method of claim 7, wherein the one or more trained machine learning models determine the selected type of workload for the particular experiment based on a lack of information for the selected type of workload in the training corpus.

9. The computer-executed method of claim 1, further comprising retraining the one or more machine learning models, to produce one or more updated trained machine learning models, using the updated training corpus.

10. The computer-executed method of claim 1, wherein identifying the one or more experimental values for the one or more configuration parameters is based on determining that historical changes to the one or more configuration parameters had an impact on one or more performance metrics that is over a threshold amount of change.

11. One or more non-transitory computer-readable media storing one or more sequences of instructions that, when executed by one or more processors, cause: training one or more machine learning models, to produce one or more trained machine learning models, using a training corpus comprising performance metrics of database workloads; identifying, using the one or more trained machine learning models, one or more experimental values for one or more configuration parameters; automatically running a particular experiment by causing a database management system to manage a particular database workload based on the one or more experimental values for the one or more configuration parameters; and adding, to the training corpus to produce an updated training corpus, data from the particular experiment that comprises one or more resulting performance metrics from the particular experiment.

12. The one or more non-transitory computer-readable media of claim 11, wherein said data from the particular experiment further comprises the one or more experimental values for the one or more configuration parameters.

13. The one or more non-transitory computer-readable media of claim 11, wherein the one or more sequences of instructions further comprise instructions that, when executed by one or more processors, cause: scheduling, in scheduling information, the particular experiment to be run on one or more particular computing devices; wherein automatically running the particular experiment is performed in response to at least a particular computing device of the one or more particular computing devices running the particular experiment according to the scheduling information.

14. The one or more non-transitory computer-readable media of claim 13, wherein: at the time of scheduling, a second experiment is scheduled, in the scheduling information, to be run on the one or more particular computing devices; the one or more sequences of instructions further comprise instructions that, when executed by one or more processors, cause: determining that the particular experiment is a higher-priority experiment than the second experiment; wherein said scheduling comprises scheduling the particular experiment to be run on the one or more particular computing devices before the second experiment based on said determining that the particular experiment is a higher-priority experiment than the second experiment.

15. The one or more non-transitory computer-readable media of claim 11, wherein: the particular experiment is run on a particular computing device of a plurality of computing devices; and the one or more sequences of instructions further comprise instructions that, when executed by one or more processors, cau
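
The train, identify, run, and add-to-corpus cycle recited in claim 1, with the retraining step of claim 9, can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the patented implementation: `simulated_dbms` is a made-up function standing in for a real DBMS running a workload, the "model" merely remembers which values already have results, and choosing the midpoint of the widest untested gap is an assumed heuristic, since the claim does not say how the experimental values are identified.

```python
from typing import Callable, List, Tuple

# Each corpus entry pairs an experimental value with its resulting metric
# (claim 2 adds the experimental values to the recorded data).
Corpus = List[Tuple[float, float]]


def simulated_dbms(buffer_mb: float) -> float:
    """Hypothetical stand-in for running a workload and measuring throughput."""
    return 1000.0 - 10.0 * ((buffer_mb - 600.0) / 100.0) ** 2


def train(corpus: Corpus) -> List[float]:
    """Toy 'training': the model is just the sorted list of values already tried."""
    return sorted(value for value, _ in corpus)


def identify_experimental_value(model: List[float], lo: float, hi: float) -> float:
    """Assumed heuristic: probe the midpoint of the widest untested gap."""
    points = [lo] + [v for v in model if lo < v < hi] + [hi]
    gap, left = max((b - a, a) for a, b in zip(points, points[1:]))
    return left + gap / 2


def experiment_loop(corpus: Corpus, dbms: Callable[[float], float],
                    lo: float, hi: float, rounds: int) -> Corpus:
    for _ in range(rounds):
        model = train(corpus)                               # train, then retrain
        value = identify_experimental_value(model, lo, hi)  # pick experimental value
        metric = dbms(value)                                # run the experiment
        corpus.append((value, metric))                      # updated training corpus
    return corpus


if __name__ == "__main__":
    seed = [(100.0, simulated_dbms(100.0)), (1100.0, simulated_dbms(1100.0))]
    for value, metric in experiment_loop(seed, simulated_dbms, 100.0, 1100.0, rounds=3):
        print(f"buffer_mb={value:.0f} throughput={metric:.1f}")
```

Because each round retrains on the corpus that the previous round extended, later experiments depend on earlier results, which is the feedback structure the claim describes.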

Assignees

  • Oracle Int Corp

Inventors

Classifications

  • G06F16/217 · Primary

    Database tuning (G06F16/2282 takes precedence; database performance monitoring G06F11/3409) · CPC title

  • Selectivity estimation or determination · CPC title

  • Machine learning · CPC title

  • Ensemble learning · CPC title

  • Auto-encoder networks; Encoder-decoder networks · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11567937B2 cover?
Embodiments implement a prediction-driven, rather than a trial-driven, approach to automate database configuration parameter tuning for a database workload. This approach uses machine learning (ML) models to test performance metrics resulting from application of particular database parameters to a database workload, and does not require live trials on the DBMS managing the workload. Specificall…
Who is the assignee on this patent?
Oracle Int Corp
What technology area does this patent fall under?
Primary CPC classification G06F16/217. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 31, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).