Methods and systems for transforming distributed database structure for reduced compute load
US-2024330289-A1 · Oct 3, 2024 · US
US2016110417A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016110417-A1 |
| Application number | US-201314787326-A |
| Country | US |
| Kind code | A1 |
| Filing date | Apr 30, 2013 |
| Priority date | Apr 30, 2013 |
| Publication date | Apr 21, 2016 |
| Grant date | — |
Official abstract text for this publication.
Incrementally updating statistics includes sampling rows from a database column in a database to generate a first sample, sampling a subset of modified rows from the database column after generating the first sample to generate a second sample, determining whether distribution changes occurred to the database column based on the first and second samples, and updating a database statistic about the database column in response to determining that a distribution change exists.
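The flow in the abstract (sample, resample after modifications, test for a distribution change, update only on change) can be illustrated with a minimal Python sketch. This is not the patent's implementation: the function names, the `stats` dictionary keys, and the choice of a two-sample Kolmogorov-Smirnov test with the standard large-sample critical value are all assumptions made for illustration.

```python
import math
import random
from bisect import bisect_right


def sample_rows(column, fraction=0.01, seed=0):
    """Draw a simple random sample of the column (hypothetical helper)."""
    k = min(len(column), max(1, math.ceil(fraction * len(column))))
    return random.Random(seed).sample(column, k)


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in set(a) | set(b):  # the ECDFs only change at observed values
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def distribution_changed(first, second, alpha=0.05):
    """Flag a change when D exceeds the large-sample critical value."""
    n, m = len(first), len(second)
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(first, second) > critical


def maybe_update_stats(stats, first, second, alpha=0.05):
    """Return refreshed sample-level stats only if a change is detected.

    The dictionary keys are illustrative, not taken from the patent.
    """
    if distribution_changed(first, second, alpha):
        return {"sample_rows": len(second), "sample_unique": len(set(second))}
    return stats
```

In this sketch, `alpha` plays the role of the predetermined confidence threshold: a smaller `alpha` demands stronger evidence before the statistic is refreshed, which trades staleness against the cost of recomputation.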
Opening claim text (preview).
What is claimed is:

1. A method for incrementally updating statistics, comprising: sampling rows from a database column in a database to generate a first sample; sampling a subset of modified rows from the database column after the first sample was generated to generate a second sample; determining whether distribution changes occurred to the database column based on the first sample and the second sample; and updating a database statistic accordingly about the database column in response to determining that a distribution change exists.

2. The method of claim 1, wherein the database statistics comprise the unique entry count, the row count, the frequency of frequencies per histogram interval, and the frequency of frequencies for the entire histogram statistics.

3. The method of claim 1, wherein determining whether the distribution changes occurred to the database column based on the first sample and the second sample includes determining a confidence level associated with a conclusion that the distribution changes exists using a statistical test.

4. The method of claim 3, wherein the statistical test is based on comparing statistical means in two time periods and similarly statistical variances in two time periods.

5. The method of claim 3, wherein the statistical test is a paired t-test when the samples are deemed correlated.

6. The method of claim 3, wherein the statistical test is a non-parametric test, a Kolmogorov-Smirnov test, another statistical test, or combinations thereof when the distributions of the data fail to conform to normal distribution assumptions.

7. The method of claim 3, wherein updating the database statistics about the database column in response to determining whether the distribution change exists includes updating the database statistic in response to the confidence level being greater than a predetermined confidence level threshold.

8. The method of claim 1, wherein the first sample is at least one percent of the rows in the database column.

9. The method of claim 1, further comprising performing a query plan optimization task based on the database statistic.

10. The method of claim 1, wherein sampling the subset of the modified rows from the database column to generate the second sample includes generating the second sample by combining the first sample with deletions and inserts from the subset of the modified rows.

11. A system for incrementally updating statistics, comprising: a sampling engine to sample rows from a database column in a database to generate a first sample and to sample modified rows from the database column after generating the first sample to generate a second sample; a determination engine to determine whether distribution changes occurred based on the first sample and the second sample; and an updating engine to update the unique entry count, the row count and the frequency of frequencies for a histogram interval and for the entire histogram statistics about the database column in response to determining that a distribution change exists.

12. The system of claim 11, further comprising a confidence engine to determine a confidence level of a conclusion that a distribution change exists.

13. The system of claim 11, wherein the confidence engine to further use statistical tests to determine the confidence level.

14. The system of claim 11, wherein the statistical test includes a two sample t-test, paired t-test, a non-parametric test, a Kolmogorov-Smirnov test, a test based on comparing statistical means of two time periods against each other and a statistical variance of two time periods against each other or combinations thereof.

15. A computer program product for incrementally updating statistics, comprising: a non-transitory computer readable storage medium, the non-transitory computer readable storage medium comprising computer readable program code embodied therewith, the computer readable program code comprising program instructions that, when executed, causes a processor to: sample rows from a database column in a database to generate a first sample; sample a subset of the rows, the subset including deleted rows, inserted rows, and updated rows, from the database column after generating the first sample to generate a second sample; build a bloom filter that represents rows of the first sample and the second sample; determine whether distribution changes occurred to the subset based on the first sample and the second sample; and update the unique entry count, the row count and the frequency of frequencies for a histogram interval and for the entire histogram statistics about the database column in response to determining a distribution change exists.
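Three building blocks named in the claims lend themselves to a short illustration: the frequency of frequencies (claims 2, 11, 15), forming the second sample from the first sample plus deletions and inserts (claim 10), and a Bloom filter over sampled rows (claim 15). The Python sketch below is one possible reading, not the claimed implementation. All names, the SHA-256 based hashing, and the filter size are assumptions chosen for illustration.

```python
import hashlib
from collections import Counter


def frequency_of_frequencies(values):
    """Map f -> number of distinct values that occur exactly f times."""
    return dict(Counter(Counter(values).values()))


def apply_changes(first_sample, deleted, inserted):
    """Build a second sample from the first sample, deletions and inserts.

    Multiset subtraction removes one occurrence per deleted row.
    """
    remaining = Counter(first_sample) - Counter(deleted)
    return list(remaining.elements()) + list(inserted)


class BloomFilter:
    """Small Bloom filter; sizes and hashing scheme are illustrative only."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item!r}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # May report false positives, never false negatives.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )
```

For example, `frequency_of_frequencies(["a", "a", "b", "c", "c", "d"])` returns `{2: 2, 1: 2}`: two values occur twice and two occur once. A Bloom filter suits the membership check because it is compact and never misses a row that was added, at the cost of occasional false positives.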
Specific adaptations of the file system to access devices and non-file objects via standard file system access operations, e.g. pseudo file systems (dedicated interfaces to storage systems G06F3/0601) · CPC title
Plan optimisation · CPC title
for evaluating statistical data {, e.g. average values, frequency distributions, probability functions, regression analysis (forecasting specially adapted for a specific administrative, business or logistic context G06Q10/04)} · CPC title
Updating · CPC title
Physics · mapped topic