Learned resource consumption model for optimizing big data queries
US-2020349161-A1 · Nov 5, 2020 · US
US11567916B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11567916-B2 |
| Application number | US-202016813873-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 10, 2020 |
| Priority date | Mar 10, 2020 |
| Publication date | Jan 31, 2023 |
| Grant date | Jan 31, 2023 |
Official abstract text for this publication.
An approach is provided for evaluating a performance of a query. A risk of selecting a low performance access path for a query is determined. The risk is determined to exceed a risk threshold. Based on the risk exceeding the risk threshold and using a machine learning optimizer, first costs of access paths for the query are determined. Using a cost-based database optimizer, second costs of the access paths are determined. Using a strong classifier operating on the first costs and the second costs, a final access path for the query is selected from the access paths.
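The selection flow in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. All names (`select_access_path`, `strong_classifier_score`, `WeakLearner`), the weights, and the cost values are hypothetical. The risk value is passed in rather than computed, and the strong classifier is modeled as a weighted vote of weak learners, which is the general shape of a boosted classifier. For the low-risk branch, choosing the lowest cost-based estimate is an assumption; the claims say only that the selection is based on those costs.

```python
from typing import Callable, Dict, List, Tuple

# A weak learner scores one candidate path from its (ml_cost, cbo_cost) pair.
# Higher score means "more likely to be the better access path".
WeakLearner = Callable[[float, float], float]


def strong_classifier_score(
    ml_cost: float,
    cbo_cost: float,
    learners: List[Tuple[float, WeakLearner]],
) -> float:
    """Weighted vote of weak learners over the two cost estimates."""
    return sum(weight * learner(ml_cost, cbo_cost) for weight, learner in learners)


def select_access_path(
    paths: List[str],
    risk: float,
    risk_threshold: float,
    ml_costs: Dict[str, float],
    cbo_costs: Dict[str, float],
    learners: List[Tuple[float, WeakLearner]],
) -> str:
    """Pick a final access path from the candidates.

    Low risk: use only the cost-based optimizer's costs (assumption: the
    cheapest path wins). High risk: combine the machine learning costs and
    the cost-based costs through the strong classifier.
    """
    if risk <= risk_threshold:
        return min(paths, key=lambda p: cbo_costs[p])
    return max(
        paths,
        key=lambda p: strong_classifier_score(ml_costs[p], cbo_costs[p], learners),
    )


# Hypothetical example data: the two optimizers disagree about the index scan.
paths = ["index_scan", "table_scan"]
cbo_costs = {"index_scan": 10.0, "table_scan": 40.0}
ml_costs = {"index_scan": 90.0, "table_scan": 30.0}
# Two toy weak learners: each prefers the path that one estimator finds cheap.
learners = [(0.7, lambda ml, cbo: -ml), (0.3, lambda ml, cbo: -cbo)]

high_risk_choice = select_access_path(paths, 0.9, 0.5, ml_costs, cbo_costs, learners)
low_risk_choice = select_access_path(paths, 0.1, 0.5, ml_costs, cbo_costs, learners)
```

With these made-up numbers, the high-risk query is routed through both estimators and the combined score favors `table_scan`, while the low-risk query keeps the cost-based optimizer's cheapest path, `index_scan`.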
Opening claim text (preview).
What is claimed is:
1. A method of evaluating a performance of a query, the method comprising: determining, by one or more processors, a risk of selecting an access path for a query which provides a performance of the query that does not exceed a performance threshold, wherein the determining the risk is based on real count information, explain tables, and a machine learning model, wherein the real count information includes result rows of clauses of the query, the result rows including (i) an amount of rows that are qualified after applying a predicate and (ii) an amount of rows that are returned after tables in the query are joined, wherein the explain tables include information about a performance of SQL statements and functions included in an execution of the query, and wherein the machine learning model is trained for predicting a performance of the access path and is based on the real count information and the explain tables; determining, by the one or more processors, that the risk exceeds a risk threshold; based on the risk exceeding the risk threshold and using a machine learning optimizer that employs a machine learning system, determining, by the one or more processors, first costs of access paths for the query; using a cost-based database optimizer, determining, by the one or more processors, second costs of the access paths for the query; and using a strong classifier operating on the first costs and the second costs, selecting, by the one or more processors, a final access path for the query from the access paths.
2. The method of claim 1, further comprising: performing, by the one or more processors, the determining the risk and the determining that the risk exceeds the risk threshold by using a potential error module; and sending the final access path as feedback to enhance the potential error module and the machine learning system.
3. The method of claim 1, further comprising: prior to the determining the risk, receiving, by the one or more processors, the query; and parsing, by the one or more processors, the query, wherein the query in the determining the risk, the determining the first costs, the determining the second costs, and the selecting the final access path is the parsed query.
4. The method of claim 1, further comprising: receiving and parsing, by the one or more processors, a second query; determining, by the one or more processors and a potential error module, a second risk of selecting a second access path for the parsed second query which provides a second performance of the parsed second query that does not exceed the performance threshold; determining, by the one or more processors, that the second risk does not exceed the risk threshold; based on the second risk not exceeding the risk threshold, using the cost-based database optimizer, and without using the machine learning system, determining, by the one or more processors, third costs of second access paths for the parsed second query; and based on the third costs and without using the strong classifier, selecting, by the one or more processors, a second final access path for the second query from the second access paths.
5. The method of claim 1, further comprising providing a first performance of the query using the final access path that exceeds a second performance of the query using another access path determined by the cost-based database optimizer, without using the machine learning system, and without using the strong classifier.
6. The method of claim 1, wherein the determining the risk includes: receiving historical training data for a risk prediction model; and based on the historical training data and using the risk prediction model and a logical classifier, the machine learning optimizer determining the risk of selecting the access path for the query which provides the performance of the query that does not exceed the performance threshold.
7. The method of claim 1, wherein the selecting the final access path for the query includes employing a machine learning algorithm that uses a boosted classifier to select the final access path based on a combination of the first costs and the second costs.
8. The method of claim 1, further comprising: providing at least one support service for at least one of creating, integrating, hosting, maintaining, and deploying computer readable program code in the computer, the program code being executed by a processor of the computer to implement the determining the risk, the determining that the risk exceeds the risk threshold, the determining the first costs of the access paths, the determining the second costs of the access paths, and the selecting the final access path.
9. A computer program product comprising: a computer readable storage medium having computer readable program code stored on the computer readable storage medium, the computer readable program code being executed by a central processing unit (CPU) of a computer system to cause the computer system to perform a method of evaluating a performance of a query, the method comprising the steps of: the computer system determining a risk of selecting an access path for a query which provides a performance of the query that does not exceed a performance threshold, wherein the determining the risk is based on real count information, explain tables, and a machine learning model, wherein the real count information includes result rows of clauses of the query, the result rows including (i) an amount of rows that are qualified after applying a predicate and (ii) an amount of rows that are returned after tables in the query are joined, wherein the explain tables include information about a performance of SQL statements and functions included in an execution of the query, and wherein the machine learning model is trained for predicting a performance of the access path and is based on the real count information and the explain tables; the computer system determining that the risk exceeds a risk threshold; based on the risk exceeding the risk threshold and using a machine learning optimizer that employs a machine learning system, the computer system determining first costs of access paths for the query; using a cost-based database optimizer, the computer system determining second costs of the access paths for the query; and using a strong classifier operating on the first costs and the second costs, the computer system selecting a final access path for the query from the access paths.
10. The computer program product of claim 9, wherein the method further comprises: the computer system performing the determining the risk and the determining that the risk exceeds the risk threshold by using a potential error module; and the computer system sending the final access path as feedback to enhance the potential error module and the machine learning system.
11. The computer program product of claim 9, wherein the method further comprises: prior to the determining the risk, the computer system receiving the query; and the computer system parsing the query, wherein the query in the determining the risk, the determining the first costs, the determining the second costs, and the selecting the final access path is the parsed query.
12. The computer program product of claim 9, wherein the method further comprises: the computer system receiving and parsing a second query; the computer system determining, by a potential error module, a second risk of selecting a second access path for the parsed second query which provides a second performance of the parsed second query that does not exceed the performance threshold; the computer system determining that the second r
Database tuning (G06F16/2282 takes precedence; database performance monitoring G06F11/3409) · CPC title
Access plan code generation and invalidation; Reuse of access plans · CPC title
Machine learning · CPC title
Feedforward networks · CPC title
Supervised learning · CPC title