Incrementally Updating Statistics

US2016110417A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2016110417-A1
Application number: US-201314787326-A
Country: US
Kind code: A1
Filing date: Apr 30, 2013
Priority date: Apr 30, 2013
Publication date: Apr 21, 2016
Grant date:

Abstract

Official abstract text for this publication.

Incrementally updating statistics includes sampling rows from a database column in a database to generate a first sample, sampling a subset of modified rows from the database column after generating the first sample to generate a second sample, determining whether distribution changes occurred to the database column based on the first and second samples, and updating a database statistic about the database column in response to determining that a distribution change exists.
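The flow in the abstract (first sample, second sample of modified rows, change test, conditional update) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names (`ks_statistic`, `distribution_changed`, `summarize`, `maybe_update`), the synthetic data, the 95% confidence threshold, and the choice of a large-sample two-sample Kolmogorov-Smirnov test are all hypothetical. The abstract itself does not name a test; the Kolmogorov-Smirnov test is one of the options listed in the claims.

```python
import bisect
import math
import random


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for x in set(a) | set(b):
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def distribution_changed(first, second, confidence=0.95):
    """Large-sample KS decision rule: reject 'same distribution' at the given confidence."""
    alpha = 1.0 - confidence
    n, m = len(first), len(second)
    # Critical value c(alpha) * sqrt((n + m) / (n * m)), with c(alpha) = sqrt(-ln(alpha / 2) / 2).
    critical = math.sqrt(-math.log(alpha / 2) / 2) * math.sqrt((n + m) / (n * m))
    return ks_statistic(first, second) > critical


def summarize(sample):
    """Hypothetical stand-in for a database statistic derived from sampled values."""
    return {"sampled_rows": len(sample), "unique_entry_count": len(set(sample))}


def maybe_update(stat, first, second, confidence=0.95):
    """Recompute the statistic only when a distribution change is detected."""
    if distribution_changed(first, second, confidence):
        return summarize(first + second)
    return stat


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Assumed column contents: integer values centred on 50.
column = [round(rng.gauss(50, 10)) for _ in range(10_000)]
first_sample = rng.sample(column, 500)
old_stat = summarize(first_sample)

# Assumed workload: rows modified after the first sample now centre on 70.
modified_rows = [round(rng.gauss(70, 10)) for _ in range(2_000)]
second_sample = rng.sample(modified_rows, 200)

new_stat = maybe_update(old_stat, first_sample, second_sample)        # change detected, statistic refreshed
unchanged = maybe_update(old_stat, first_sample, list(first_sample))  # no change, old statistic kept
```

The point of the design is that the test runs on small samples, so the comparatively expensive statistics refresh is skipped whenever the modified rows look like the data already summarized.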

First claim

Opening claim text (preview).

What is claimed is:

1. A method for incrementally updating statistics, comprising: sampling rows from a database column in a database to generate a first sample; sampling a subset of modified rows from the database column after the first sample was generated to generate a second sample; determining whether distribution changes occurred to the database column based on the first sample and the second sample; and updating a database statistic accordingly about the database column in response to determining that a distribution change exists.

2. The method of claim 1, wherein the database statistics comprise the unique entry count, the row count, the frequency of frequencies per histogram interval, and the frequency of frequencies for the entire histogram statistics.

3. The method of claim 1, wherein determining whether the distribution changes occurred to the database column based on the first sample and the second sample includes determining a confidence level associated with a conclusion that the distribution changes exists using a statistical test.

4. The method of claim 3, wherein the statistical test is based on comparing statistical means in two time periods and similarly statistical variances in two time periods.

5. The method of claim 3, wherein the statistical test is a paired t-test when the samples are deemed correlated.

6. The method of claim 3, wherein the statistical test is a non-parametric test, a Kolmogorov-Smirnov test, another statistical test, or combinations thereof when the distributions of the data fail to conform to normal distribution assumptions.

7. The method of claim 3, wherein updating the database statistics about the database column in response to determining whether the distribution change exists includes updating the database statistic in response to the confidence level being greater than a predetermined confidence level threshold.

8. The method of claim 1, wherein the first sample is at least one percent of the rows in the database column.

9. The method of claim 1, further comprising performing a query plan optimization task based on the database statistic.

10. The method of claim 1, wherein sampling the subset of the modified rows from the database column to generate the second sample includes generating the second sample by combining the first sample with deletions and inserts from the subset of the modified rows.

11. A system for incrementally updating statistics, comprising: a sampling engine to sample rows from a database column in a database to generate a first sample and to sample modified rows from the database column after generating the first sample to generate a second sample; a determination engine to determine whether distribution changes occurred based on the first sample and the second sample; and an updating engine to update the unique entry count, the row count and the frequency of frequencies for a histogram interval and for the entire histogram statistics about the database column in response to determining that a distribution change exists.

12. The system of claim 11, further comprising a confidence engine to determine a confidence level of a conclusion that a distribution change exists.

13. The system of claim 11, wherein the confidence engine to further use statistical tests to determine the confidence level.

14. The system of claim 11, wherein the statistical test includes a two sample t-test, paired t-test, a non-parametric test, a Kolmogorov-Smirnov test, a test based on comparing statistical means of two time periods against each other and a statistical variance of two time periods against each other or combinations thereof.

15. A computer program product for incrementally updating statistics, comprising: a non-transitory computer readable storage medium, the non-transitory computer readable storage medium comprising computer readable program code embodied therewith, the computer readable program code comprising program instructions that, when executed, causes a processor to: sample rows from a database column in a database to generate a first sample; sample a subset of the rows, the subset including deleted rows, inserted rows, and updated rows, from the database column after generating the first sample to generate a second sample; build a bloom filter that represents rows of the first sample and the second sample; determine whether distribution changes occurred to the subset based on the first sample and the second sample; and update the unique entry count, the row count and the frequency of frequencies for a histogram interval and for the entire histogram statistics about the database column in response to determining a distribution change exists.
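Two structures named in the claims may be easier to picture with a small example: the statistics of claims 2 and 11 (row count, unique entry count, frequency of frequencies per histogram interval and overall) and the Bloom filter of claim 15. The sketch below is illustrative only. The names `column_statistics` and `BloomFilter`, the interval edges, the bit-array size, and the SHA-256-based hashing are assumptions; the claims do not specify how either structure is built or what the Bloom filter is used for. "Frequency of frequencies" is read here in its usual sense: a map from an occurrence count to the number of distinct values that occur that many times.

```python
import bisect
import hashlib
from collections import Counter


def column_statistics(values, interval_edges):
    """Row count, unique entry count, and frequency of frequencies (overall and per interval).

    interval_edges is an assumed ascending list of histogram boundaries; values
    outside the range are clamped into the first or last interval.
    """
    value_freq = Counter(values)                # value -> number of occurrences
    overall = Counter(value_freq.values())      # occurrences -> number of distinct values
    per_interval = [Counter() for _ in range(len(interval_edges) - 1)]
    for value, freq in value_freq.items():
        idx = bisect.bisect_right(interval_edges, value) - 1
        idx = min(max(idx, 0), len(per_interval) - 1)
        per_interval[idx][freq] += 1
    return {
        "row_count": len(values),
        "unique_entry_count": len(value_freq),
        "frequency_of_frequencies": dict(overall),
        "frequency_of_frequencies_per_interval": [dict(c) for c in per_interval],
    }


class BloomFilter:
    """Minimal Bloom filter: no false negatives, a tunable rate of false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed (an assumed scheme).
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item!r}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


sample = [1, 1, 2, 5, 12, 12, 12, 25]
stats = column_statistics(sample, interval_edges=[0, 10, 20, 30])
# Value 1 occurs twice, 12 three times, and 2, 5, 25 once each, so the overall
# frequency of frequencies is {1: 3, 2: 1, 3: 1}.

seen = BloomFilter()
for row_value in sample:
    seen.add(row_value)
```

A membership check such as `12 in seen` is guaranteed to return `True` for values that were added; a value that was never added may occasionally also test positive, which is the space-for-accuracy trade a Bloom filter makes.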

Assignees

  • Hewlett Packard Development Co

Inventors

Classifications

  • Specific adaptations of the file system to access devices and non-file objects via standard file system access operations, e.g. pseudo file systems (dedicated interfaces to storage systems G06F3/0601) · CPC title

  • Plan optimisation · CPC title

  • G06F17/18 · Primary

    for evaluating statistical data {, e.g. average values, frequency distributions, probability functions, regression analysis (forecasting specially adapted for a specific administrative, business or logistic context G06Q10/04)} · CPC title

  • Updating · CPC title

  • Physics · mapped topic

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016110417A1 cover?
Incrementally updating statistics includes sampling rows from a database column in a database to generate a first sample, sampling a subset of modified rows from the database column after generating the first sample to generate a second sample, determining whether distribution changes occurred to the database column based on the first and second samples, and updating a database statistic about …
Who is the assignee on this patent?
Hewlett Packard Development Co
What technology area does this patent fall under?
Primary CPC classification G06F16/24542. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 21, 2016 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).