Query optimization in hybrid DBMS

US11048701B2 · US · B2

Patent metadata
Publication number: US-11048701-B2
Application number: US-201615263946-A
Country: US
Kind code: B2
Filing date: Sep 13, 2016
Priority date: Sep 13, 2016
Publication date: Jun 29, 2021
Grant date: Jun 29, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A mechanism is provided for generating statistical information for query optimization in a data processing system. The mechanism comprises a first database engine maintaining a current first dataset currently being stored, a second database engine maintaining a second dataset. The second dataset is generated from previous first datasets or from the previous first datasets and current first dataset, the previous first datasets being datasets that were previously maintained by the first database engine. The first database engine receives a database query for accessing the first dataset, the database query involving one or more attributes of the first data set. The first database engine generates a query execution plan for the database query on the first dataset using collected statistical information on at least the second dataset. The first database engine processes the database query according to the query execution plan.
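The flow in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class names `AnalyticsAccelerator` and `FirstEngine`, the `index_threshold` parameter, the two plan names, and the uniform-distribution selectivity estimate are all assumptions made for illustration. The sketch only shows the central idea, namely that the engine holding the current dataset plans its query using statistics gathered on the second dataset.

```python
from dataclasses import dataclass


@dataclass
class ColumnStats:
    """Statistics for one attribute, collected on the second dataset."""
    row_count: int
    distinct_values: int
    null_fraction: float


class AnalyticsAccelerator:
    """Hypothetical stand-in for the second database engine."""

    def __init__(self, rows):
        # Second dataset, assumed to be built from previous first datasets.
        self.rows = rows

    def collect_stats(self, attribute):
        values = [row.get(attribute) for row in self.rows]
        non_null = [v for v in values if v is not None]
        null_fraction = 1 - len(non_null) / len(values) if values else 0.0
        return ColumnStats(len(values), len(set(non_null)), null_fraction)


class FirstEngine:
    """Hypothetical stand-in for the first database engine."""

    def __init__(self, rows, accelerator, index_threshold=0.1):
        self.rows = rows                        # current first dataset
        self.accelerator = accelerator
        self.index_threshold = index_threshold  # assumed tuning knob

    def plan_equality_query(self, attribute):
        # Statistics come from the second dataset, not from self.rows.
        stats = self.accelerator.collect_stats(attribute)
        if stats.distinct_values == 0:
            return "full_scan", 1.0
        # Assumed estimate: non-NULL values are uniformly distributed.
        selectivity = (1 - stats.null_fraction) / stats.distinct_values
        plan = "index_scan" if selectivity <= self.index_threshold else "full_scan"
        return plan, selectivity

    def execute(self, attribute, value):
        plan, _ = self.plan_equality_query(attribute)
        # Both plans return the same rows; only the access path would differ.
        matches = [row for row in self.rows if row.get(attribute) == value]
        return plan, matches


history = [{"customer_id": i % 50, "status": "open" if i % 2 else "closed"}
           for i in range(1000)]
current = [{"customer_id": 7, "status": "open"},
           {"customer_id": 8, "status": "closed"}]
engine = FirstEngine(current, AnalyticsAccelerator(history))
print(engine.plan_equality_query("customer_id"))
print(engine.plan_equality_query("status"))
```

In this toy setup the current dataset is too small to yield useful statistics on its own, which is the situation where borrowing statistics from the larger historical dataset would plausibly help.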

First claim

Opening claim text (preview).

What is claimed is:

1. A computer program product comprising a computer readable storage medium having a computer readable program for generating statistical information for query optimization in a computing device, the computing device comprising a first database engine maintaining a current first dataset analytics accelerator maintaining a second dataset generated from previous first datasets stored therein, wherein the computer readable program, when executed on a computing device, causes the computing device to: receive, at the first database engine, a database query for accessing the current first dataset, the database query involving one or more attributes of the current first data set; obtain, by the first database engine from the analytics accelerator, statistical information on one or more attributes of the current first dataset collected from the second dataset that is generated from previous first datasets, wherein the previous first datasets are datasets that were maintained previous to the current first dataset by the first database engine; generate, by the first database engine, a query execution plan for the database query on the current first dataset using the statistical information collected on the second dataset; and process, by the first database engine, the database query on the current first dataset according to the query execution plan.

2. The computer program product of claim 1, wherein the computer readable program to collect the statistical information further causes the computing device to: receive by the first database engine from the analytics accelerator the statistical information.

3. The computer program product of claim 1, wherein the computer readable program to collect the statistical information further causes the computing device to: receive, by the first database engine, from the analytics accelerator a random sample on the second dataset; and calculate, by the first database engine, the statistical information based on the random sample.

4. The computer program product of claim 3, wherein the computer readable program causing the computing device to receive the random sample is performed in response to the computer readable program causing the computing device to send a request from the first database engine to the analytics accelerator.

5. The computer program product of claim 3, wherein the computer readable program causing the computing device to receive the random sample is automatically performed on a predefined periodic basis.

6. The computer program product of claim 1, wherein the current first dataset comprises records of a given table having a commit date after a predefined date and wherein the second dataset comprises records of the given table having a commit date before that predefined date.

7. The computer program product of claim 1, wherein the current first dataset comprises records of a given table having an access frequency higher than a predefined access frequency threshold and wherein the second dataset comprises records of the given table having an access frequency smaller than the predefined access frequency threshold.

8. The computer program product of claim 1, wherein the computer readable program further causes the computing device to: receive, by the analytics accelerator, another database query for accessing the second dataset; generate, by the analytics accelerator, a query execution plan for the other database query using the collected statistical information; and process, by the analytics accelerator, the other database query according to the query execution plan.

9. The computer program product of claim 1, wherein the statistical information comprises at least one of: the number of distinct values of the one or more attributes; the cardinality of values of the one or more attributes: minimum and maximum values of the one or more attributes; the fraction of NULL values of the one or more attributes; histogram of values of the one or more attributes; or correlation factor between values of different attributes.

10. A data processing system for generating statistical information for query optimization, the system comprising a first database engine maintaining a current first dataset and an analytics accelerator maintaining a second dataset generated from previous first datasets, the system comprising: a processor; and a memory coupled to the processor, wherein the memory comprises instructions which, when executed by the processor, cause the processor to: receive, at the first database engine, a database query for accessing the current first dataset, the database query involving one or more attributes of the current first data set; obtain, by the first database engine from the analytics accelerator, statistical information on one or more attributes of the current first dataset collected from the second dataset that is generated from previous first datasets, wherein the previous first datasets are datasets that were maintained previous to the current first dataset by the first database engine; generate, by the first database engine, a query execution plan for the database query on the current first dataset using the statistical information collected on the second dataset; and process, by the first database engine, the database query on the current first dataset according to the query execution plan.

11. The system of claim 10, wherein the instructions to collect the statistical information further cause the processor to: receive by the first database engine from the analytics accelerator the statistical information.

12. The system of claim 10, wherein the instructions to collect the statistical information further cause the processor to: receive, by the first database engine, from the analytics accelerator a random sample on the second dataset; and calculate, by the first database engine, the statistical information based on the random sample.

13. The system of claim 12, wherein the instructions causing the processor to receive the random sample is performed in response to the instructions causing the processor to send a request from the first database engine to the analytics accelerator.

14. The system of claim 12, wherein the instructions causing the processor to receive the random sample is automatically performed on a predefined periodic basis.

15. The system of claim 10, wherein the current first dataset comprises records of a given table having a commit date after a predefined date and wherein the second dataset comprises records of the given table having a commit date before that predefined date.

16. The system of claim 10, wherein the current first dataset comprises records of a given table having an access frequency higher than a predefined access frequency threshold and wherein the second dataset comprises records of the given table having an access frequency smaller than the predefined access frequency threshold.

17. The system of claim 10, wherein the instructions further cause the processor to: receive, by the analytics accelerator, another database query for accessing the second dataset; generate, by the analytics accelerator, a query execution plan for the other database query using the collected statistical information; and process, by the analytics accelerator, the other database query according to the query execution plan.

18. The system of claim 10, wherein the statistical information comprises at least one of: the number of distinct values of the one or more attributes; the cardinality of values of the one or more attri
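Claims 3 and 9 (and their system counterparts, claims 12 and 18) describe calculating statistical information from a random sample of the second dataset. The following Python sketch shows one way such statistics might be derived. The function names `sample_rows`, `attribute_stats`, and `correlation`, the equal-width histogram, and the use of the Pearson coefficient as the "correlation factor" are assumptions for illustration. The claims do not specify these details.

```python
import random


def sample_rows(rows, sample_size, seed=0):
    """Draw a random sample (without replacement) from the second dataset."""
    rng = random.Random(seed)
    return rng.sample(rows, min(sample_size, len(rows)))


def attribute_stats(sample, attribute, buckets=4):
    """Compute per-attribute statistics for a numeric attribute."""
    values = [row.get(attribute) for row in sample]
    non_null = [v for v in values if v is not None]
    stats = {
        "sample_size": len(values),
        "distinct_values": len(set(non_null)),
        "null_fraction": (len(values) - len(non_null)) / len(values) if values else 0.0,
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "histogram": [0] * buckets,  # equal-width buckets (assumed layout)
    }
    if non_null:
        low, high = stats["min"], stats["max"]
        width = (high - low) / buckets or 1  # avoid zero width for constant columns
        for v in non_null:
            # Clamp so the maximum value falls into the last bucket.
            idx = min(int((v - low) / width), buckets - 1)
            stats["histogram"][idx] += 1
    return stats


def correlation(sample, attr_x, attr_y):
    """Pearson correlation between two numeric attributes, or None if undefined."""
    pairs = [(r[attr_x], r[attr_y]) for r in sample
             if r.get(attr_x) is not None and r.get(attr_y) is not None]
    n = len(pairs)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return None
    return cov / (var_x * var_y) ** 0.5


# Hypothetical second dataset: ten priced rows plus one row of NULLs.
rows = [{"qty": i, "price": 2 * i} for i in range(10)]
rows.append({"qty": None, "price": None})
sample = sample_rows(rows, sample_size=100)  # sample covers all 11 rows here
print(attribute_stats(sample, "qty"))
print(correlation(sample, "qty", "price"))
```

Statistics computed on a sample are estimates. In particular, the distinct-value count of a sample generally underestimates that of the full dataset, so a real system would likely scale or correct it.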

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11048701B2 cover?
A mechanism is provided for generating statistical information for query optimization in a data processing system. The mechanism comprises a first database engine maintaining a current first dataset currently being stored, a second database engine maintaining a second dataset. The second dataset is generated from previous first datasets or from the previous first datasets and current first data…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/24542. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 29, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).