Extreme value computation

US10915533B2 · US · B2

Patent metadata
Publication number: US-10915533-B2
Application number: US-201916276790-A
Country: US
Kind code: B2
Filing date: Feb 15, 2019
Priority date: May 23, 2016
Publication date: Feb 9, 2021
Grant date: Feb 9, 2021

Abstract

The method may include providing a plurality of synopsis techniques for determining a plurality of attribute value information indicative of the at least one attribute. The method may include determining a data characteristic describing the plurality of data rows of the current data block. The method may include selecting, based on the determined data characteristic, at least one synopsis technique of the provided plurality of synopsis techniques suitable for generating the plurality of attribute value information for the at least one attribute of the current data block. The method may include determining the plurality of attribute value information for the at least one attribute of the plurality of data rows of the current data block using the at least one selected synopsis technique. The method may include storing the determined plurality of attribute value information for the current data block to be used for query processing against the data table.
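
The selection step in the abstract can be illustrated with a small sketch. It is a minimal, hypothetical illustration under stated assumptions, not the patented implementation. Following dependent claims 2 to 8 below, it assumes that the data characteristic is the number of distinct values of the attribute in a block, that blocks with fewer distinct values than a threshold get an in-list synopsis, that other blocks get a Bloom filter bit vector plus extremum values, and that the min/max default is used when no other technique's criterion holds. It also assumes that a tag stored with each synopsis tells the query side how to read it. The thresholds, the bit-vector size, the SHA-256 based hashing, the extra `MAX_BLOOM_DISTINCT` criterion, and all function names are assumptions chosen for illustration. The Bloom filter is a standard one with several hash functions, whereas claim 5 only says each value is inserted into a bit of a bit vector.

```python
import hashlib
from dataclasses import dataclass
from typing import Any, Iterable, List, Sequence

# Hypothetical tuning parameters; the patent does not specify values.
IN_LIST_THRESHOLD = 8     # fewer distinct values than this -> in-list (claims 2-3)
MAX_BLOOM_DISTINCT = 64   # assumed selection criterion for the Bloom technique (claim 7)
BLOOM_BITS = 256          # length of the bit vector (claim 5)
BLOOM_HASHES = 3          # hash functions per value (standard Bloom filter, assumed)


@dataclass
class Synopsis:
    tag: str   # identifies the technique that produced `info` (claim 8)
    info: Any  # the attribute value information itself


def _bit_positions(value: Any) -> List[int]:
    """Derive BLOOM_HASHES bit positions from a SHA-256 digest of repr(value)."""
    digest = hashlib.sha256(repr(value).encode()).digest()
    return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % BLOOM_BITS
            for i in range(BLOOM_HASHES)]


def build_synopsis(values: Sequence[Any]) -> Synopsis:
    """Pick a technique from the number of distinct values and build its synopsis."""
    distinct = set(values)  # data characteristic: the distinct values of the block
    if len(distinct) < IN_LIST_THRESHOLD:
        return Synopsis("in-list", frozenset(distinct))
    if len(distinct) <= MAX_BLOOM_DISTINCT:
        bits = 0
        for v in distinct:
            for pos in _bit_positions(v):
                bits |= 1 << pos
        # Bloom filter bit vector plus the extremum values (claims 5-6)
        return Synopsis("bloom", (bits, min(distinct), max(distinct)))
    # Default technique: extremum values only (claim 7)
    return Synopsis("minmax", (min(distinct), max(distinct)))


def may_contain(syn: Synopsis, value: Any) -> bool:
    """Interpret `info` according to `tag`; False means the block can be skipped."""
    if syn.tag == "in-list":
        return value in syn.info
    if syn.tag == "bloom":
        bits, lo, hi = syn.info
        if not lo <= value <= hi:
            return False
        return all((bits >> pos) & 1 for pos in _bit_positions(value))
    lo, hi = syn.info
    return lo <= value <= hi


def scan_list(blocks: Iterable[Sequence[Any]], value: Any) -> List[int]:
    """Indices of the blocks that must be scanned for the predicate `attr = value`."""
    synopses = [build_synopsis(block) for block in blocks]
    return [i for i, syn in enumerate(synopses) if may_contain(syn, value)]
```

For example, the blocks `[1, 2, 2, 3]`, `range(100, 140)` and `range(0, 1000, 7)` receive the in-list, Bloom and min/max tags respectively. `scan_list(blocks, 2)` then returns `[0, 2]`, because 2 lies below the second block's minimum of 100. Storing the tag next to each synopsis is what lets blocks summarized with different techniques coexist in one table, as claims 8 and 9 describe.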

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for storing a data table, wherein the data table has at least one attribute, and wherein a first data block of the data table comprises a set of data rows of the data table, and wherein a current data block being the first data block, the method comprising: selecting, based on a data characteristic described in data rows of a current data block, at least one synopsis technique suitable for generating a plurality of attribute value information for at least one attribute of the current data block; determining the plurality of attribute value information for the at least one attribute of the plurality of data rows of the current data block using the at least one selected synopsis technique, wherein determining the plurality of attribute value information further comprises: scanning the current data block to identify a fixed number j from a plurality of first distinct values of the at least one attribute for indicating a value range of the at least one attribute in the current data block; dividing the value range into a plurality of sub-ranges; and creating a distribution of a plurality of buckets, wherein each bucket is associated with a respective sub-range of the plurality of sub-ranges; storing the determined plurality of attribute value information for the current data block to be used for query processing against the data table; determining a bucket within the plurality of buckets is available for each further current value j+1 of the at least one attribute; marking the bucket as a non-empty bucket; providing the plurality of attribute value information as comprising a plurality of extremum values of the at least one attribute in the current data block, the distribution of the plurality of buckets, and the value range in each of the distribution of the plurality of buckets; comparing a determined plurality of empty buckets within the distribution of the plurality of buckets with a predetermined maximum number; and using one of the distribution of the plurality of buckets and the plurality of extremum values to decide to scan or not to scan the current data block when evaluating a query based on the comparing of the determined plurality of empty buckets.

2. The method of claim 1, wherein the data characteristic comprises a plurality of distinct values of the at least one attribute, and wherein selecting the at least one synopsis technique further comprises: comparing the plurality of distinct values with a predetermined threshold; and selecting the at least one synopsis technique based on a result of comparing the plurality of distinct values with the predetermined threshold.

3. The method of claim 2, further comprising: in response to determining that the plurality of distinct values is smaller than the predetermined threshold, selecting an in-list technique of the provided plurality of synopsis techniques, wherein determining the plurality of attribute value information comprises providing the plurality of attribute value information as the plurality of distinct values of the at least one attribute.

4. The method of claim 3, wherein the plurality of attribute value information is stored in a memory, and wherein the predetermined threshold comprises a maximum number of a plurality of memory units.

5. The method of claim 2, further comprising: in response to determining that the plurality of distinct values is higher than the predetermined threshold, selecting a bloom filter technique within the provided plurality of synopsis techniques, wherein determining the plurality of attribute value information comprises inserting each value of the at least one attribute of the current data block into a bit of a bit vector, and wherein the plurality of attribute value information comprises at least the bit vector.

6. The method of claim 5, wherein the plurality of attribute value information further comprises a plurality of extremum values of the at least one attribute.

7. The method of claim 1, wherein the plurality of synopsis techniques comprising a default synopsis technique for providing the plurality of attribute value information as the plurality of extremum values of the at least one attribute of the plurality of data rows of the current data block, and wherein the method further comprises: assigning to each synopsis technique of the plurality of synopsis techniques, other than the default synopsis technique, a selection criterion to be fulfilled by a plurality of values of the at least one attribute in order to use each synopsis technique, wherein selecting the at least one synopsis technique comprises, in response to determining the selection criterion is not fulfilled, selecting the default synopsis technique.

8. The method of claim 1, further comprising: assigning a tag to each technique of the plurality of synopsis techniques; storing the assigned tag of the at least one selected synopsis technique in association with the plurality of attribute value information; receiving a query on the at least one attribute of the data table; reading the stored tag to interpret the plurality of attribute value information in accordance with the at least one selected synopsis technique; and using the plurality of attribute value information for deciding to scan or not scan the current data block for evaluating the received query.

9. The method of claim 1, wherein a second plurality of data blocks of the data table comprise a respective second plurality of data rows of the data table, the method further comprising: repeating, for each of the second plurality of data blocks, at least one of providing the plurality of synopsis techniques, determine the data characteristic, selecting at least one synopsis technique, determining a plurality of attribute value information, and storing the determined plurality of attribute information; grouping a resulting plurality of attribute value information of the first and second data blocks based on a respective at least one synopsis technique; assigning to each group a tag indicating the at least one synopsis technique used for the group; receiving a query of the at least one attribute of the data table; and using the plurality of attribute value information group-by-group by reading the tag corresponding to each group for interpreting the plurality of attribute value information of the group in accordance with the at least one synopsis technique of the group to determine a scan list of the plurality of data blocks to be scanned for evaluating the query.

10. The method of claim 1, wherein the method is executed on a hardware component of a computer system, and wherein the hardware component comprises a field-programmable gate array.

11. The method of claim 1, wherein determining the data characteristic is performed using a plurality of metadata descriptive of an overall structure of the data table.

12. The method of claim 1, wherein the plurality of synopsis techniques comprises an in-list technique, a Bloom filter technique, and a default technique; and wherein providing the plurality of synopsis techniques, determine the data characteristic, selecting at least one synopsis technique, determining a plurality of attribute value information, and storing the determined plurality of attribute information are performed while scanning, row-by-row, the current data block, and further comprising: assigning a counter to the at least one attribute; for a current scanned row: inserting a value of the at least one attribute of a current row into a bit of a bit vector; updating the determined data characteristic, wherein the determined data characteristic comprises th
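
The bucket-based synopsis of claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The sub-ranges are assumed to be equal-width. Every value of the block, including the first j, is assumed to mark its bucket, so that a block holding a queried value is never skipped. Values outside the range fixed by the first j distinct values are assumed to be covered only by the extremum values. The claim compares the number of empty buckets with a predetermined maximum number but does not say which outcome selects which structure, so the sketch assumes the buckets are consulted only when at least `empty_bucket_threshold` buckets are empty, and the extremum values are used otherwise. The names `BucketSynopsis`, `build_bucket_synopsis`, `must_scan` and `empty_bucket_threshold` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class BucketSynopsis:
    lo: float              # extremum values of the attribute in the block
    hi: float
    range_lo: float        # value range fixed by the first j distinct values
    range_hi: float
    non_empty: List[bool]  # one flag per equal-width sub-range (bucket)

    def bucket_of(self, v: float) -> Optional[int]:
        """Index of the bucket whose sub-range holds v, or None if v is outside the range."""
        if not self.range_lo <= v <= self.range_hi:
            return None
        n = len(self.non_empty)
        if self.range_hi == self.range_lo:
            return 0
        width = (self.range_hi - self.range_lo) / n
        return min(int((v - self.range_lo) / width), n - 1)


def build_bucket_synopsis(values: Sequence[float], j: int, n_buckets: int) -> BucketSynopsis:
    """Build the synopsis for one non-empty block of attribute values."""
    # Scan for the first j distinct values; their min and max fix the bucket range.
    first: List[float] = []
    for v in values:
        if v not in first:
            first.append(v)
            if len(first) == j:
                break
    syn = BucketSynopsis(min(values), max(values), min(first), max(first),
                         [False] * n_buckets)
    # Mark the bucket of every value as non-empty. Values outside the range
    # have no bucket and are covered only by the extremum values.
    for v in values:
        b = syn.bucket_of(v)
        if b is not None:
            syn.non_empty[b] = True
    return syn


def must_scan(syn: BucketSynopsis, value: float, empty_bucket_threshold: int) -> bool:
    """Decide whether an equality query for `value` has to scan the block."""
    if not syn.lo <= value <= syn.hi:
        return False  # the extremum values already exclude the block
    if syn.non_empty.count(False) < empty_bucket_threshold:
        return True   # too few empty buckets: rely on the extremum values alone
    b = syn.bucket_of(value)
    return b is None or syn.non_empty[b]
```

Consider the sample block `[0, 100, 5, 95, 2, 98, 3, 97]` with `j = 4` and four buckets. The range [0, 100] is split into sub-ranges of width 25, and only the first and last are marked non-empty. A query for 50 passes the min/max check, yet `must_scan(syn, 50, 2)` is `False` because 50 falls in an empty middle bucket. With a threshold of 3 the two empty buckets are not enough, so the sketch falls back to the extremum values and scans.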

Assignees

IBM

Classifications

  • Intermediate data storage techniques for performance improvement · CPC title

  • Query languages · CPC title

Frequently asked questions

Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/24561. Mapped technology areas include Physics.
When was this patent published?
Publication date: Feb 9, 2021 (B2).