Estimating unique entry counts using a counting bloom filter

US9465826B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9465826-B2
Application numberUS-201213686490-A
CountryUS
Kind codeB2
Filing dateNov 27, 2012
Priority dateNov 27, 2012
Publication dateOct 11, 2016
Grant dateOct 11, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method of estimating a number of unique entry counts of an attribute in a database comprises, with a processor: identifying a sample of entries from an attribute database, determining frequencies of a number of input observations of the sample of entries, determining a number of high frequency values of the sample of entries, and estimating a number of unique entry counts of an attribute within the attribute database using a counting Bloom filter and based on the frequencies of the input observations and the high frequency values.

First claim

Opening claim text (preview).

What is claimed is: 1. A method of estimating a number of unique entry counts of an attribute in a database comprising, with a processor: identifying a sample of entries from an attribute database; determining frequencies of a number of input observations of the sample of entries, the determining comprising: retrieving a subsequent observation o of the number of input observations, computing a hash function, h i (o), for i=1 to k, determining if all the corresponding k bits in the counting Bloom filter are set to 1, and if all the corresponding k bits in the counting Bloom filter are not set to 1, then incrementing a UEC count if the subsequent observation o has not been seen before; determining a number of high frequency values of the sample of entries: and estimating a number of unique entry counts of a n attribute within the attribute database using a counting Bloom filter and based on the frequencies of the input observations and the high frequency values, wherein the counting Bloom filter comprises a plurality of counters, the plurality of counters comprisin g a counter for each bit of the counting Bloom filter. 2. The method of claim 1 , further comprising: retrieving a table having those attributes from a source; passing the observation through a counting Bloom filter for a first observation O; and updating the corresponding k counters. 3. The method of claim 1 , further comprising: incrementing a number of corresponding k counters; returning a value for frequency f; determining if the frequency f is less than FREQSIZE; if the frequency f is less than FREQSIZE, then updating a counter for the respective frequency of frequency, and retrieve the next observation. 4. The method of claim 1 , in which the method is performed for a static UEC computation, a dynamic UEC computation, or combinations thereof. 5. The method of claim 1 , further comprising determining a false positive probability based on a number of hash functions used to determine if an input value has been seen before, a number of entries in the sample of entries, and a bit vector representing the counting Bloom filter. 6. The method of claim 1 , further comprising utilizing a number of hash functions to determine if an input value has been seen before to determine uniqueness of a value using the counting Bloom filter. 7. The method of claim 1 , in which determining a number of high frequency values of the sample of entries is performed using the counting Bloom filter. 8. The method of claim 7 , further comprising maintaining a synopsis table. 9. A database management system (DBMS) for estimating a number of unique entry counts of an attribute in a database, comprising: a processor; and a memory device communicatively coupled to the processor, the memory device comprising: a frequency of frequencies module to, when executed by the processor, determine frequencies of a number of input observations, the determining comprising: retrieving a subsequent observation o of the number of input observations, computing a hash function, h i (o), for i=1 to k, determining if all the corresponding k bits in the counting Bloom filter are set to 1, and if all the corresponding k bits in the counting Bloom filter are not set to 1, then incrementing a UEC count if the subsequent observation o has not been seen before; a skewness module to, when executed by the processor, determine a number of high frequency values; and a unique entry count (UEC) estimating module to, when executed by the processor, estimate a number of unique entry counts of an attribute in an attribute database based on the frequencies of the input observations and the high frequency values using the counting Bloom filter, wherein the counting Bloom filter comprises a plurality of counters, the plurality of counters comprising a counter for each bit of the counting Bloom filter. 10. The DBMS of claim 9 , further comprising a n output device to output a representation of a number of histograms, in which the histograms define the unique entry count (UEC) of attribute in a n attribute database. 11. The DBMS of claim 9 , further comprising a synopsis table stored within the memory device, in which the synopsis table stores skew values and their corresponding observations and frequencies. 12. The DBMS of claim 9 , further comprising a frequency of frequencies database stored within the memory device, in which the frequency of frequencies database stores values obtained from the frequency of frequencies module. 13. A computer program product for determining a number of frequencies of frequencies values and a number of skew values estimating a number of unique entry counts of a n attribute in a database, the computer program product-comprising: a non-transitory computer readable medium comprising computer usable program code embodied therewith, the computer usable program code to, when executed by a processor: retrieve a subsequent observation of a number of input observations; compute a hash function, h i (o) for i=1 to k; determine if all the corresponding k bits in a counting Bloom filter are set to 1; increment a unique entrysount if the subsequent observation has not been seen before and if all the corresponding k bits in the counting Bloom filter are not set to 1; and determining a false positive probability based on the k bits in the counting Bloom filter, an n number of entries in a sample of entries of the attribute, and an m bit vector representing the counting Bloom filter. 14. The computer program product of claim 13 , further comprising: computer usable program code to, when executed by a processor, retrieve a table having those attributes from a source; computer usable program code to, when executed by a processor, pass the observation through a Bloom filter for a first observation O; and computer usable program code to, when executed by a processor, update corresponding counters. 15. The computer program product of claim 13 , further comprising: computer usable program code to, when executed by a processor, increment a number of corresponding k counters; computer usable program code to, when executed by a processor, return a value for frequency f; computer usable program code to, when executed by a processor, determine if the frequency f is less than FREQSIZE; computer usable program code to, when executed by a processor, update a frequency of frequency counter if the frequency f is less than FREQSIZE; and computer usable program code to, when executed by a processor, retrieve a next observation. 16. The computer program product of claim 13 , wherein the counting Bloom filter comprises a plurality of counters, the plurality of counters comprising a counter for each bit of the counting Bloom filter.

Assignees

Inventors

Classifications

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9465826B2 cover?
A method of estimating a number of unique entry counts of an attribute in a database comprises, with a processor: identifying a sample of entries from an attribute database, determining frequencies of a number of input observations of the sample of entries, determining a number of high frequency values of the sample of entries, and estimating a number of unique entry counts of an attribute with…
Who is the assignee on this patent?
Hewlett Packard Development Co Lp, Hewlett Packard Entpr Dev Lp
What technology area does this patent fall under?
Primary CPC classification G06F17/30306. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Oct 11 2016 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).