Automated configuration parameter tuning for database performance
US-11061902-B2 · Jul 13, 2021 · US
US11567937B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11567937-B2 |
| Application number | US-202117318972-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 12, 2021 |
| Priority date | Oct 18, 2018 |
| Publication date | Jan 31, 2023 |
| Grant date | Jan 31, 2023 |
Official abstract text for this publication.
Embodiments implement a prediction-driven, rather than a trial-driven, approach to automate database configuration parameter tuning for a database workload. This approach uses machine learning (ML) models to test performance metrics resulting from application of particular database parameters to a database workload, and does not require live trials on the DBMS managing the workload. Specifically, automatic configuration (AC) ML models are trained, using a training corpus that includes information from workloads being run by DBMSs, to predict performance metrics based on workload features and configuration parameter values. The trained AC-ML models predict performance metrics resulting from applying particular configuration parameter values to a given database workload being automatically tuned. Based on correlating changes to configuration parameter values with changes in predicted performance metrics, an optimization algorithm is used to converge to an optimal set of configuration parameters. The optimal set of configuration parameter values is automatically applied for the given workload.
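The prediction-driven loop in the abstract can be illustrated with a small sketch. Everything here is an assumption made for illustration: the abstract does not name a model family or an optimization algorithm, so a distance-weighted k-nearest-neighbour regressor stands in for the AC-ML models and a greedy hill climb stands in for the optimizer. The knob names (`buffer_gb`, `workers`), the workload feature (`read_fraction`), and the `synthetic_throughput` formula that fills the training corpus are all hypothetical.

```python
import math
from itertools import product

# Hypothetical knobs and their allowed values (not taken from the patent).
BUFFER_GB_CHOICES = [2, 4, 6, 8, 10, 12]
WORKER_CHOICES = [2, 4, 6, 8]
SCALES = (1.0, 12.0, 8.0)  # rough ranges of (read_fraction, buffer_gb, workers)


def synthetic_throughput(read_fraction, buffer_gb, workers):
    """Invented formula standing in for metrics gathered from running DBMSs."""
    best_buffer = 4 + 8 * read_fraction
    best_workers = 8 - 4 * read_fraction
    return 1000 - 10 * (buffer_gb - best_buffer) ** 2 - 20 * (workers - best_workers) ** 2


# Training corpus rows: (workload_features, config_values, observed_metric).
corpus = [
    ((rf,), (b, w), synthetic_throughput(rf, b, w))
    for rf in (0.0, 0.25, 0.5, 0.75, 1.0)
    for b, w in product(BUFFER_GB_CHOICES, WORKER_CHOICES)
]


def predict(features, config, k=4):
    """Distance-weighted k-NN estimate of the metric for (features, config)."""
    query = features + config

    def distance(row):
        point = row[0] + row[1]
        return math.sqrt(sum(((q - p) / s) ** 2 for q, p, s in zip(query, point, SCALES)))

    nearest = sorted(corpus, key=distance)[:k]
    weights = [1.0 / (distance(row) + 1e-9) for row in nearest]
    return sum(wt * row[2] for wt, row in zip(weights, nearest)) / sum(weights)


def neighbours(config):
    """Configurations that differ from `config` by one step in one parameter."""
    b, w = config
    bi, wi = BUFFER_GB_CHOICES.index(b), WORKER_CHOICES.index(w)
    for i in (bi - 1, bi + 1):
        if 0 <= i < len(BUFFER_GB_CHOICES):
            yield (BUFFER_GB_CHOICES[i], w)
    for i in (wi - 1, wi + 1):
        if 0 <= i < len(WORKER_CHOICES):
            yield (b, WORKER_CHOICES[i])


def tune(features, start):
    """Greedy hill climb on predicted metrics only; no live trial is run."""
    current, current_score = start, predict(features, start)
    while True:
        candidate = max(neighbours(current), key=lambda c: predict(features, c))
        candidate_score = predict(features, candidate)
        if candidate_score <= current_score:
            return current, current_score  # no neighbour predicts an improvement
        current, current_score = candidate, candidate_score


workload_features = (0.6,)  # a workload that does not appear in the corpus
default_config = (2, 2)
baseline_score = predict(workload_features, default_config)
tuned_config, tuned_score = tune(workload_features, default_config)
print(f"default {default_config}: {baseline_score:.0f}, tuned {tuned_config}: {tuned_score:.0f}")
```

The point of the sketch is that `tune` only ever calls `predict`: every candidate configuration is scored by the trained model, and the DBMS itself would be touched once, when the final values are applied.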
Opening claim text (preview).
What is claimed is:
1. A computer-executed method comprising: training one or more machine learning models, to produce one or more trained machine learning models, using a training corpus comprising performance metrics of database workloads; identifying, using the one or more trained machine learning models, one or more experimental values for one or more configuration parameters; automatically running a particular experiment by causing a database management system to manage a particular database workload based on the one or more experimental values for the one or more configuration parameters; and adding, to the training corpus to produce an updated training corpus, data from the particular experiment that comprises one or more resulting performance metrics from the particular experiment; wherein the method is performed by one or more computing devices.
2. The computer-executed method of claim 1, wherein said data from the particular experiment further comprises the one or more experimental values for the one or more configuration parameters.
3. The computer-executed method of claim 1, further comprising: scheduling, in scheduling information, the particular experiment to be run on one or more particular computing devices; wherein automatically running the particular experiment is performed in response to at least a particular computing device of the one or more particular computing devices running the particular experiment according to the scheduling information.
4. The computer-executed method of claim 3, wherein: at the time of scheduling, a second experiment is scheduled, in the scheduling information, to be run on the one or more particular computing devices; the method further comprises: determining that the particular experiment is a higher-priority experiment than the second experiment; wherein said scheduling comprises scheduling the particular experiment to be run on the one or more particular computing devices before the second experiment based on said determining that the particular experiment is a higher-priority experiment than the second experiment.
5. The computer-executed method of claim 1, wherein: the particular experiment is run on a particular computing device of a plurality of computing devices; and the method further comprises: automatically running a second experiment on a second computing device of the plurality of computing devices, wherein the second experiment is run, at least partially, in parallel with running the particular experiment.
6. The computer-executed method of claim 5, further comprising: maintaining historical data regarding experiments that have been run on the plurality of computing devices; wherein the historical data comprises one or more of: experiment running time, experiment resource requirements, or trends in resource utilization for the plurality of computing devices; based on the historical data, rebalancing a plurality of experiments, running on the plurality of computing devices, comprising the particular experiment and the second experiment; wherein said rebalancing the plurality of experiments comprises moving execution of at least one experiment, of the plurality of experiments, to one or more particular computing devices, of the plurality of computing devices, that include at least one different computing device than one or more previous computing devices executing the at least one experiment.
7. The computer-executed method of claim 1, further comprising: wherein the particular database workload is one of a plurality of workloads available for running experiments; wherein each workload, of the plurality of workloads, is associated with metadata that characterizes a type of said each workload; prior to running the particular experiment: determining, using the one or more trained machine learning models, a selected type of workload for the particular experiment, and selecting the particular database workload based on the particular database workload being associated with metadata indicating the selected type of workload.
8. The computer-executed method of claim 7, wherein the one or more trained machine learning models determine the selected type of workload for the particular experiment based on a lack of information for the selected type of workload in the training corpus.
9. The computer-executed method of claim 1, further comprising retraining the one or more machine learning models, to produce one or more updated trained machine learning models, using the updated training corpus.
10. The computer-executed method of claim 1, wherein identifying the one or more experimental values for the one or more configuration parameters is based on determining that historical changes to the one or more configuration parameters had an impact on one or more performance metrics that is over a threshold amount of change.
11. One or more non-transitory computer-readable media storing one or more sequences of instructions that, when executed by one or more processors, cause: training one or more machine learning models, to produce one or more trained machine learning models, using a training corpus comprising performance metrics of database workloads; identifying, using the one or more trained machine learning models, one or more experimental values for one or more configuration parameters; automatically running a particular experiment by causing a database management system to manage a particular database workload based on the one or more experimental values for the one or more configuration parameters; and adding, to the training corpus to produce an updated training corpus, data from the particular experiment that comprises one or more resulting performance metrics from the particular experiment.
12. The one or more non-transitory computer-readable media of claim 11, wherein said data from the particular experiment further comprises the one or more experimental values for the one or more configuration parameters.
13. The one or more non-transitory computer-readable media of claim 11, wherein the one or more sequences of instructions further comprise instructions that, when executed by one or more processors, cause: scheduling, in scheduling information, the particular experiment to be run on one or more particular computing devices; wherein automatically running the particular experiment is performed in response to at least a particular computing device of the one or more particular computing devices running the particular experiment according to the scheduling information.
14. The one or more non-transitory computer-readable media of claim 13, wherein: at the time of scheduling, a second experiment is scheduled, in the scheduling information, to be run on the one or more particular computing devices; the one or more sequences of instructions further comprise instructions that, when executed by one or more processors, cause: determining that the particular experiment is a higher-priority experiment than the second experiment; wherein said scheduling comprises scheduling the particular experiment to be run on the one or more particular computing devices before the second experiment based on said determining that the particular experiment is a higher-priority experiment than the second experiment.
15. The one or more non-transitory computer-readable media of claim 11, wherein: the particular experiment is run on a particular computing device of a plurality of computing devices; and the one or more sequences of instructions further comprise instructions that, when executed by one or more processors, cau
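As a rough illustration of how the steps recited in claims 1, 3, 4, and 9 fit together, the sketch below trains a model on a corpus, uses it to pick an experimental value, schedules that experiment ahead of a lower-priority one, runs both, and folds the results back into the corpus before retraining. It is a toy under stated assumptions, not the patented implementation: the "model" is just a count of observations per value, the rule "try the value with the least data" is one possible selection policy that the claims do not prescribe, and `run_on_dbms`, `cache_mb`, the priority numbers, and the sample rows are all hypothetical.

```python
import heapq
from collections import Counter
from itertools import count

CANDIDATE_VALUES = [64, 128, 256, 512]  # hypothetical values for one parameter


def run_on_dbms(workload, cache_mb):
    """Stand-in for a DBMS managing the workload under the experimental value.
    The formula is invented; a real system would collect measured metrics."""
    return {"throughput": 1000.0 - abs(cache_mb - 256) * workload["scale"]}


def train(corpus):
    """Toy 'model': how many corpus rows exist for each parameter value."""
    return Counter(row["cache_mb"] for row in corpus)


def identify_experimental_value(model):
    """Assumed policy: try the value with the least data (ties: smaller value)."""
    return min(CANDIDATE_VALUES, key=lambda v: (model[v], v))


class ExperimentScheduler:
    """Priority queue of experiments; a lower number means higher priority."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker so experiment dicts are never compared

    def schedule(self, priority, experiment):
        heapq.heappush(self._heap, (priority, next(self._seq), experiment))

    def pop_next(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


# Made-up starting corpus and workload.
corpus = [
    {"workload": "oltp-a", "cache_mb": 64, "throughput": 808.0},
    {"workload": "oltp-a", "cache_mb": 128, "throughput": 872.0},
    {"workload": "oltp-a", "cache_mb": 64, "throughput": 810.0},
]
workload = {"name": "oltp-a", "scale": 1.0}

model = train(corpus)
value = identify_experimental_value(model)

scheduler = ExperimentScheduler()
# A second experiment is already queued when the new one is scheduled.
scheduler.schedule(priority=5, experiment={"workload": workload, "cache_mb": 512})
# The new experiment is judged higher priority, so it will run first.
scheduler.schedule(priority=1, experiment={"workload": workload, "cache_mb": value})

run_order = []
while len(scheduler):
    exp = scheduler.pop_next()
    metrics = run_on_dbms(exp["workload"], exp["cache_mb"])
    # The experimental value and the resulting metrics both go into the corpus.
    corpus.append({"workload": exp["workload"]["name"], "cache_mb": exp["cache_mb"], **metrics})
    run_order.append(exp["cache_mb"])

model = train(corpus)  # retrain on the updated corpus
print(run_order, len(corpus))
```

A binary heap keeps scheduling cheap as the queue grows, and the sequence counter preserves insertion order among equal priorities. Parallel execution and rebalancing across devices (claims 5 and 6) are deliberately left out of this sketch.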
Database tuning (G06F16/2282 takes precedence; database performance monitoring G06F11/3409) · CPC title
Selectivity estimation or determination · CPC title
Machine learning · CPC title
Ensemble learning · CPC title
Auto-encoder networks; Encoder-decoder networks · CPC title