Managing database with counting bloom filters

US10452676B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-10452676-B2
Application numberUS-201415115006-A
CountryUS
Kind codeB2
Filing dateJan 31, 2014
Priority dateJan 31, 2014
Publication dateOct 22, 2019
Grant dateOct 22, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method of managing a database including creating an initial counting bloom filter (CBF) instance having an array of counters and hash functions that map an inserted value to the array of counters, and designating the initial CBF instance as a current CBF instance, and sequentially inserting each value of a sample data set of a table column into the hash functions of the current CBF instance and incrementing counters of the array of counters to which the value is mapped. The method further includes, prior to inserting each value into the hash functions of the current CBF instance, when a number of counters of the array of counters having non-zero values is at least at a threshold level, designating the current CBF instance as an old CBF instance, creating a new CBF instance having an array of counters and hash functions that map an inserted value to the array counters, and designating the new CBF instance as the current CBF instance.

First claim

Opening claim text (preview).

The invention claimed is: 1. A method of managing a database comprising: creating an initial counting bloom filter (CBF) instance having an array of counters and hash functions that map an inserted value to the array of counters; designating the initial CBF instance as a current CBF instance; sequentially inserting each value of a sample data set of a table column into the hash functions of the current CBF instance and incrementing counters of the array of counters to which the value is mapped, wherein prior to inserting each value, when a number of counters of the array of counters having non-zero values is at least at a threshold level: designating the current CBF instance as an old CBF instance; creating a new CBF instance having an array of counters and hash functions that map an inserted value to the array counters; and designating the new CBF instance as the current CBF instance. 2. The method of claim 1 , wherein prior to inserting each value, when the number of counters of the array of counters having non-zero values is below the threshold level, continue employing the current CBF instance without designating the current CBF instance as an old instance, without creating a new CBF instance, and without designating the new CBF instance as the current CBF instance. 3. The method of claim 1 , further including: identifying unique values and corresponding frequencies in the column table based on counter values of the counter arrays of the current and old CBF instances. 4. The method of claim 3 , further including: deeming a value of the sample data set to be a unique value when at least one of the counters to which the value is mapped by the hash functions is incremented from zero; and placing the unique value in a list data structure corresponding to the current CBF instance. 5. The method of claim 4 , wherein after all values of the sample data set have been inserted into a CBF instance, the method further including: for each list data structure, sequentially inserting each unique value into the hash functions of the corresponding CBF instance and taking the lowest value of the counters to which the unique value is mapped as the frequency of the unique value; identifying all unique values and corresponding frequencies in the sample data set; and extrapolating from the unique values and corresponding frequencies in the sample data set to determine frequencies of the unique values in the table column using an estimation tool. 6. The method of claim 3 , further including constructing an equal-height histogram for the table column from the identified unique values and corresponding frequencies. 7. The method of claim 6 , including: constructing an equal-width histogram by sorting the identified unique values into a number of intervals, each interval having a value range of a same magnitude, such that when the intervals are taken in ascending order, any unique value in a given interval is greater than any unique value in an immediately preceding interval; and converting the equal-width histogram to the equal-height histogram, the equal-height histogram having the same number of intervals as the equal-width histogram, by successively redistributing the unique values from the intervals of the equal-width histogram into intervals of the equal-height histogram such that all occurrences of a given unique value are included in a same interval and such that a total number of unique values in each interval of the equal-height histogram, except for the final interval, is at least equal to a desired number. 8. The method of claim 7 , prior to sorting the unique values to the equal-width histogram, encoding the unique values so that each unique value has a same length. 9. The method of claim 1 , wherein when each counter of an array of counters to which a value is mapped is at a maximum value, the method including: inserting the value in an overflow table of the CBF instance; and incrementing a corresponding counter in the overflow table to track the occurrence frequency of the value thereafter. 10. The method of claim 1 , wherein the number of counters in each array of counters of each CBF instance is based on an estimated number of unique values in the sample data set, and wherein the method includes: taking a pilot sample of the sample data set; applying an estimating tool to the pilot sample to estimate the number of unique values in the table column from the pilot sample. 11. A non-transitory computer readable storage medium having a set of machine readable instructions that, when executed, cause a database management system to: receive a list of unique values and corresponding occurrence frequencies present in a sample data set of a table column of the database; construct an equal-width histogram by sorting the identified unique values into a number of intervals, each interval having a value range of a same magnitude, such that when the intervals are taken in ascending order, any unique value in a given interval is greater than any unique value in an immediately preceding interval; and convert the equal-width histogram to the equal-height histogram, the equal-height histogram having the same number of intervals as the equal-width histogram, by successively sorting the unique values from the intervals of the equal-width histogram into intervals of the equal-height histogram such that all occurrences of a given unique value are included in a same interval and such that a total number of unique values in each interval of the equal-height histogram, except for the final interval, is at least equal to a desired number. 12. The non-transitory computer readable storage medium of claim 11 , the set of machine readable instructions further causing the database management system to encode the unique values in the list into fixed width values prior to constructing the equal-width histogram, wherein a lower end of the value range for the first interval of the equal-width histogram is that of the minimum unique value, and the upper end of the value range of the last interval of the equal-width histogram is that of the maximum unique value. 13. A database management system comprising: a memory device to store computer executable instructions; and a processing unit to access the memory device and execute the computer executable instructions, the computer executable instructions providing instructions to: create an initial counting bloom filter (CBF) instance having an array of counters and hash functions that map an inserted value to the array of counters; designate the initial CBF instance as a current CBF instance; sequentially insert each value of a sample data set of a table column into the hash functions of the current CBF instance and incrementing counters of the array of counters to which the value is mapped, wherein prior to inserting each value, when a number of counters of the array of counters having non-zero values is at least at a threshold level: designate the current CBF instance as an old CBF instance; create a new CBF instance having an array of counters and hash functions that map an inserted value to the array counters; and designate the new CBF instance as the current CBF instance. 14. The database management system of claim 13 , the computer executable instructions providing instructions to: wherein prior to inserting each value into the hash functions of the current CBF instance, when there is more than one old CBF instance to insert the value into the current CBF, and when there is only one old CBF instance: to insert the value into the hash functions of the old CBF instance to determine if the va

Assignees

Inventors

Classifications

  • G06F16/258Primary

    Data format conversion from or to a database · CPC title

  • Hash tables · CPC title

  • Search customisation based on user profiles and personalisation · CPC title

  • Approximate or statistical queries · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10452676B2 cover?
A method of managing a database including creating an initial counting bloom filter (CBF) instance having an array of counters and hash functions that map an inserted value to the array of counters, and designating the initial CBF instance as a current CBF instance, and sequentially inserting each value of a sample data set of a table column into the hash functions of the current CBF instance a…
Who is the assignee on this patent?
Hewlett Packard Entpr Dev Lp
What technology area does this patent fall under?
Primary CPC classification G06F16/258. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Oct 22 2019 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).