Columnar database compression

US11036684B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11036684-B2
Application number  US-201816203650-A
Country             US
Kind code           B2
Filing date         Nov 29, 2018
Priority date       Nov 16, 2015
Publication date    Jun 15, 2021
Grant date          Jun 15, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed is an approach comprising a column partitioned into a plurality of partitions including an empty partition and a plurality of filled partitions each comprising data entries associated with a set of parameters having parameter values, the data entries compressed in accordance with a compression dictionary. The approach comprises receiving forecasted parameter values for an expected set of data entries to be stored in an empty partition; predicting a recurrence frequency of the data entries in the expected set using the forecasted parameter values by evaluating the respective compression dictionaries of the filled partitions with a machine learning algorithm; generating a predictive compression dictionary for the expected set of data entries based on the predicted recurrence frequency of the data entries in the expected set; receiving the expected set of data entries; and compressing at least part of the received expected set of data entries using the predictive compression dictionary.
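The flow in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the sample data, and above all the prediction step are assumptions made for illustration. The abstract calls for a machine learning algorithm; the stand-in below simply weights each filled partition's recurrence counts by how many of its parameter values match the forecast.

```python
from collections import Counter

def predict_frequencies(history, forecast):
    """Stand-in for the ML step (assumption): weight each filled partition's
    recurrence counts by the number of parameter values matching the forecast."""
    predicted = Counter()
    for params, freqs in history:
        weight = sum(1 for key, value in forecast.items() if params.get(key) == value)
        if weight == 0:
            continue  # partition shares no parameter value with the forecast
        for entry, count in freqs.items():
            predicted[entry] += weight * count
    return predicted

def build_dictionary(frequencies):
    """Assign integer codes so the most frequent entries get the smallest codes."""
    ranked = sorted(frequencies, key=lambda e: (-frequencies[e], str(e)))
    return {entry: code for code, entry in enumerate(ranked)}

def encode(entries, dictionary):
    """Replace known entries with their code; keep unknown entries as ('raw', value)."""
    return [dictionary.get(e, ("raw", e)) for e in entries]

# Hypothetical filled partitions: (parameter values, recurrence counts of entries).
history = [
    ({"weather": "rain", "day": "weekday"}, Counter({"umbrella": 8, "sunscreen": 1})),
    ({"weather": "sun", "day": "weekend"}, Counter({"sunscreen": 9, "umbrella": 1})),
]
forecast = {"weather": "rain", "day": "weekday"}  # forecasted parameter values

dictionary = build_dictionary(predict_frequencies(history, forecast))
print(dictionary)                                            # {'umbrella': 0, 'sunscreen': 1}
print(encode(["umbrella", "umbrella", "boots"], dictionary)) # [0, 0, ('raw', 'boots')]
```

Because the dictionary exists before the data arrives, incoming entries can be encoded on receipt; entries the prediction missed fall back to a literal representation in this sketch.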

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: providing a columnar database comprising at least one column partitioned into a plurality of partitions including at least one empty partition and a plurality of filled partitions each comprising data entries associated with a set of parameters, the data entries compressed in accordance with a compression dictionary based on respective recurrence frequencies of the data entries in the filled partition; receiving forecasted parameter values for a set of parameters, having parameter values relevant to a recurrence frequency of a data entry in a partition, for an expected set of data entries to be stored in an empty partition of the column; predicting a recurrence frequency of the data entries in the expected set using the forecasted parameter values by evaluating data entry ranking histories associated with the respective compression dictionaries of the filled partitions with a machine learning algorithm; generating a predictive compression dictionary for the expected set of data entries based on the predicted recurrence frequency of the data entries in the expected set; receiving the expected set of data entries; compressing a defined fraction of the received expected set of data entries using the predictive compression dictionary; calculating a compression ratio for the compressed defined fraction of the received expected set of data entries; and comparing the compression ratio with the target value and, responsive to a difference between the target value and the compression ratio being within a defined range: compressing the received expected set of data entries using the predictive compression dictionary; and storing the compressed received expected set of data entries in the empty partition.

2. The computer-implemented method of claim 1, in which a parameter value of each parameter associated with a data entry is stored in a separate column of the columnar database.

3. The computer-implemented method of claim 1, further comprising, if a difference between the target value and the compression ratio is outside the defined range: determining respective recurrence frequencies of the data entries in the defined fraction of the received expected set; generating an actual compression dictionary for the defined fraction of the received expected set based on the determined respective recurrence frequencies of the data entries in the defined fraction of the received expected set; augmenting the predictive compression dictionary for the expected set of data entries based on an evaluation of the actual compression dictionary; compressing the defined fraction of the received expected set of data entries using the augmented predictive compression dictionary; calculating a further compression ratio for the defined fraction of the received expected set of data entries compressed using the augmented predictive compression dictionary; comparing the further compression ratio with the target value; and, if a difference between the target value and the further compression ratio is within the defined range: compressing the received expected set of data entries using the augmented predictive compression dictionary; and storing the compressed received expected set of data entries in the empty partition.

4. The computer-implemented method of claim 1, further comprising locking the columnar database during storing the compressed received expected set of data entries in the empty partition.

5. The computer-implemented method of claim 1, in which the set of parameters includes at least one of meteorological parameters, economic parameters and temporal parameters.

6. A computer program product comprising: a computer readable storage medium having computer readable program instructions embodied therewith to: provide a columnar database comprising at least one column partitioned into a plurality of partitions including at least one empty partition and a plurality of filled partitions each comprising data entries associated with a set of parameters, the data entries compressed in accordance with a compression dictionary based on respective recurrence frequencies of the data entries in the filled partition; receive forecasted parameter values for a set of parameters, having parameter values relevant to a recurrence frequency of a data entry in a partition, for an expected set of data entries to be stored in an empty partition of the column; predict a recurrence frequency of the data entries in the expected set using the forecasted parameter values by evaluating data entry ranking histories associated with the respective compression dictionaries of the filled partitions with a machine learning algorithm; generate a predictive compression dictionary for the expected set of data entries based on the predicted recurrence frequency of the data entries in the expected set; receive the expected set of data entries; compress a defined fraction of the received expected set of data entries using the predictive compression dictionary; calculate a compression ratio for the compressed defined fraction of the received expected set of data entries; and compare the compression ratio with the target value and, responsive to a difference between the target value and the compression ratio being within a defined range: compress the received expected set of data entries using the predictive compression dictionary; and store the compressed received expected set of data entries in the empty partition.

7. The computer program product of claim 6, in which the computer readable program instructions further cause the processor arrangement to, if a difference between the target value and the compression ratio is outside the defined range: determine respective recurrence frequencies of the data entries in the defined fraction of the received expected set; generate an actual compression dictionary for the defined fraction of the received expected set based on the determined respective recurrence frequencies of the data entries in the defined fraction of the received expected set; augment the predictive compression dictionary for the expected set of data entries based on an evaluation of the actual compression dictionary; compress the defined fraction of the received expected set of data entries using the augmented predictive compression dictionary; calculate a further compression ratio for the defined fraction of the received expected set of data entries compressed using the augmented predictive compression dictionary; compare the further compression ratio with the target value; and, if a difference between the target value and the further compression ratio is within a defined range: compress the received expected set of data entries using the augmented predictive compression dictionary; and store the compressed received expected set of data entries in the empty partition.

8. The computer program product of claim 6, in which the computer readable program instructions further cause the processor arrangement to lock the columnar database during storing the compressed received expected set of data entries in the empty partition.

9. A computer system comprising: a processor arrangement, the processor arrangement being adapted to: provide a columnar database comprising at least one column partitioned into a plurality of partitions including at least one empty partition and a plurality of filled partitions each comprising data entries associated with a set of parameters, the data entries compressed in accordance with a compression dictionary based on respective recurrence frequencies of the data entries in the filled partition; receive forecasted parameter values for
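The acceptance test of claims 1 and 6 and the fallback of claims 3 and 7 can be sketched in Python as follows. This is an illustrative reading under stated assumptions, not the patented implementation. The claims do not define a size model, how "within a defined range" is measured, or how the predictive dictionary is augmented. The sketch assumes one byte per dictionary code, one escape byte plus the UTF-8 bytes for an entry missing from the dictionary, an absolute-difference tolerance, and augmentation by appending the missing entries of the actual dictionary. All names are hypothetical.

```python
from collections import Counter

def compressed_size(entries, dictionary, code_bytes=1):
    """Assumed size model: a code costs code_bytes; an entry missing from the
    dictionary costs one escape byte plus its UTF-8 length."""
    return sum(code_bytes if e in dictionary else 1 + len(e.encode("utf-8"))
               for e in entries)

def compression_ratio(entries, dictionary):
    """Uncompressed size divided by compressed size for a non-empty sample."""
    if not entries:
        raise ValueError("compression ratio is undefined for an empty sample")
    raw = sum(len(e.encode("utf-8")) for e in entries)
    return raw / compressed_size(entries, dictionary)

def augment(predictive, actual):
    """Assumed augmentation: append entries found only in the actual dictionary,
    in their actual rank order, after the predicted codes."""
    merged = dict(predictive)
    for entry in sorted(actual, key=actual.get):
        if entry not in merged:
            merged[entry] = len(merged)
    return merged

def choose_dictionary(entries, predictive, target, tolerance, fraction=0.1):
    """Return the dictionary to use for the whole set, or None if neither the
    predictive nor the augmented dictionary meets the target on the sample."""
    sample = entries[: max(1, int(len(entries) * fraction))]  # the defined fraction
    if abs(target - compression_ratio(sample, predictive)) <= tolerance:
        return predictive
    # Fallback: build the actual dictionary from the sample's recurrence counts.
    freqs = Counter(sample)
    ranked = sorted(freqs, key=lambda e: (-freqs[e], e))
    actual = {entry: code for code, entry in enumerate(ranked)}
    augmented = augment(predictive, actual)
    if abs(target - compression_ratio(sample, augmented)) <= tolerance:
        return augmented
    return None

entries = ["rain"] * 8 + ["snow"] * 2
print(choose_dictionary(entries, {"rain": 0}, 4.0, 0.5, fraction=0.5))  # {'rain': 0}
print(choose_dictionary(entries, {"sun": 0}, 4.0, 0.5, fraction=0.5))   # {'sun': 0, 'rain': 1}
```

Testing only a fraction of the received entries keeps the check cheap: the full set is compressed once, after a dictionary has passed the ratio test.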

Assignees

IBM

Inventors

Classifications

  • using compression, e.g. sparse files · CPC title

  • Column-oriented storage; Management thereof · CPC title

  • Data partitioning, e.g. horizontal or vertical partitioning · CPC title

  • using ranking · CPC title

  • Machine learning · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11036684B2 cover?
Disclosed is an approach comprising a column partitioned into a plurality of partitions including an empty partition and a plurality of filled partitions each comprising data entries associated with a set of parameters having parameter values, the data entries compressed in accordance with a compression dictionary. The approach comprises receiving forecasted parameter values for an expected set…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/1744. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 15, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 10 related publications on this page (citations in our corpus or others sharing the same primary CPC).