What technology area does this patent fall under?

Primary CPC classification G06F16/221. Mapped technology areas include Physics.

When was this patent published?

Publication date: Nov 3, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Adaptive dictionary compression/decompression for column-store databases

US10824596B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10824596-B2
Application number	US-201916255622-A
Country	US
Kind code	B2
Filing date	Jan 23, 2019
Priority date	Dec 23, 2013
Publication date	Nov 3, 2020
Grant date	Nov 3, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Innovations for adaptive compression and decompression for dictionaries of a column-store database can reduce the amount of memory used for columns of the database, allowing a system to keep column data in memory for more columns, while delays for access operations remain acceptable. For example, dictionary compression variants use different compression techniques and implementation options. Some dictionary compression variants provide more aggressive compression (reduced memory consumption) but result in slower run-time performance. Other dictionary compression variants provide less aggressive compression (higher memory consumption) but support faster run-time performance. As another example, a compression manager can automatically select a dictionary compression variant for a given column in a column-store database. For different dictionary compression variants, the compression manager predicts run-time performance and compressed dictionary size, given the values of the column, and selects one of the dictionary compression variants.
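
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the patented method: the `VariantEstimate` fields, the `select_variant` function, the `size_weight` knob, and the normalized weighted-sum cost are all assumptions made for illustration, and the candidate numbers are invented placeholders rather than measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantEstimate:
    """Predicted cost of one dictionary compression variant (hypothetical fields)."""
    name: str
    predicted_size_bytes: int    # estimated compressed dictionary size
    predicted_access_ns: float   # estimated time per dictionary access


def select_variant(estimates, size_weight=0.5):
    """Return the estimate with the lowest weighted, normalized cost.

    size_weight is an assumed tuning knob in [0, 1]:
    1.0 considers only dictionary size, 0.0 considers only access time.
    """
    if not estimates:
        raise ValueError("no variants to evaluate")
    if not 0.0 <= size_weight <= 1.0:
        raise ValueError("size_weight must be in [0, 1]")

    # Normalize both predictions so that bytes and nanoseconds are comparable.
    max_size = max(e.predicted_size_bytes for e in estimates) or 1
    max_time = max(e.predicted_access_ns for e in estimates) or 1.0

    def cost(e):
        return (size_weight * e.predicted_size_bytes / max_size
                + (1.0 - size_weight) * e.predicted_access_ns / max_time)

    return min(estimates, key=cost)


# Placeholder predictions, invented for this example only.
candidates = [
    VariantEstimate("uncompressed array", 1_000_000, 20.0),
    VariantEstimate("front coding", 400_000, 60.0),
    VariantEstimate("re-pair", 250_000, 200.0),
]

balanced = select_variant(candidates, size_weight=0.5)
smallest = select_variant(candidates, size_weight=1.0)
fastest = select_variant(candidates, size_weight=0.0)
```

With these placeholder inputs, a balanced weight picks the middle option, while the two extremes pick the smallest and the fastest variant respectively. A real compression manager would derive the predictions from the column values and access statistics instead of hard-coding them.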

First claim

Opening claim text (preview).

We claim:

1. One or more non-transitory computer-readable media storing computer-executable instructions for causing a computing system, when programmed thereby, to perform operations comprising: evaluating at least some of multiple available compression variants to apply to a dictionary for a column of a table in a column-store database, wherein the dictionary maps distinct values among values of the column to value identifiers, and wherein the evaluating uses compression models for the respective at least some of the multiple available compression variants, a given compression model of the compression models estimating compressed dictionary size of the dictionary for a given compression variant of the multiple available compression variants without applying the given compression variant to the dictionary; selecting, based at least in part on results of the evaluating, one of the multiple available compression variants to apply to the dictionary; and applying the selected compression variant to the dictionary, thereby reducing the compressed dictionary size of the dictionary, including, for each of at least one of the distinct values of the dictionary, replacing at least part of the distinct value with one or more codes that represent the replaced at least part of the distinct value, the one or more codes being shorter than the replaced at least part of the distinct value.

2. The one or more computer-readable media of claim 1, wherein, for domain encoding that uses the dictionary, the values of the column are replaced with corresponding value identifiers, the corresponding value identifiers being oriented as a column vector, and wherein the column-store database is an in-memory column-store database.

3. The one or more computer-readable media of claim 1, wherein the multiple available compression variants include: a first compression variant that uses Huffman coding or Hu-Tucker coding in which the one or more codes include one or more Huffman codes; a second compression variant that uses front coding in which the one or more codes include one or more prefix lengths; a third compression variant that uses bit compression in which the one or more codes include one or more x-bit codes each representing a single character; a fourth compression variant that uses N-gram compression according to which N-tuples are replaced with x-bit codes, for N greater than or equal to 2, as the one or more codes, each of the one or more codes representing N characters; a fifth compression variant that uses Re-Pair compression in which the one or more codes include one or more x-bit codes each representing a combination of characters; and/or a sixth compression variant that uses column-wise bit compression in which the one or more codes include one or more x-bit codes each representing a single character of a column.

4. The one or more computer-readable media of claim 1, wherein the multiple available compression variants include: a first compression variant that uses an array of string data and an array of pointers to locations in the array of string data, wherein the string data is compressed using one of Hu-Tucker coding, bit compression, N-gram compression or Re-Pair compression; a second compression variant that uses an array of fixed-length blocks; a third compression variant that uses one or more data structures for front coding; and/or a fourth compression variant that uses one or more data structures for bit-wise column compression.

5. The one or more computer-readable media of claim 1, wherein the evaluating accounts for the compressed dictionary size and run-time performance, the run-time performance accounting for frequency of access of the column.

6. The one or more computer-readable media of claim 5, wherein the selecting is also based at least in part on a tuning parameter that sets a preference between the compressed dictionary size and the run-time performance.

7. The one or more computer-readable media of claim 5, wherein the frequency of access of the column quantifies an expected number of extract operations from the dictionary and/or an expected number of locate operations from the dictionary, and wherein the run-time performance also accounts for frequency of construction or updating of the dictionary.

8. The one or more computer-readable media of claim 1, wherein the given compression model estimates the compressed dictionary size using only a subset of the values of the column.

9. The one or more computer-readable media of claim 1, wherein the evaluating uses one or more of: characteristics of the respective compression variants, including, for the given compression variant, the given compression model and one or more run time values; characteristics of the column, including an expected number of extract operations until a next merge operation, an expected number of locate operations until the next merge operation, a size of a column vector for the column, a merge frequency, and the values of the column; and characteristics of the computing system for the database, including an amount of free physical memory and an amount of physical memory currently consumed by the database.

10. The one or more computer-readable media of claim 1, wherein the selecting is also based at least in part on user input that indicates the selected compression variant.

11. In a computing system that implements a compression manager, a method comprising: with the computing system, evaluating at least some of multiple available compression variants to apply to a dictionary for a column of a table in a column-store database, wherein the evaluating includes estimating compressed dictionary size of the dictionary according to a compression model for a given compression variant of the multiple available compression variants without applying the given compression variant to the dictionary, and wherein the dictionary maps distinct values among values of the column to value identifiers; with the computing system, selecting, based at least in part on results of the evaluating, one of the multiple available compression variants to apply to the dictionary; and with the computing system, applying the selected compression variant to the dictionary, thereby reducing the compressed dictionary size of the dictionary, including, for each of at least one of the distinct values of the dictionary, replacing at least part of the distinct value with one or more codes that represent the replaced at least part of the distinct value, the one or more codes being shorter than the replaced at least part of the distinct value.

12. The method of claim 11, wherein the estimating the compressed dictionary size of the dictionary uses only a subset of the values of the column.

13. The method of claim 11, wherein the multiple available compression variants include: a first compression variant that uses an array of string data and an array of pointers to locations in the array of string data, wherein the string data is compressed using one of Hu-Tucker coding, bit compression, N-gram compression or Re-Pair compression; a second compression variant that uses an array of fixed-length blocks; a third compression variant that uses one or more data structures for front coding; and/or a fourth compression variant that uses one or more data structures for bit-wise column compression.

14. The method of claim 11, wherein the evaluating accounts for the compressed dictionary size and run-time performance, the run-time performance accounting for frequency of access of the column.

15. The method of claim 11, wherein the evaluating includes: determining
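
To make the claim language more concrete, the following is a minimal sketch of three ideas the claims mention: domain encoding (values replaced by value identifiers), front coding (a shared prefix replaced by a prefix length), and a size model that estimates the front-coded size without building the compressed dictionary. All function names, the choice of the sorted rank as value identifier, the one-byte prefix length, the one-byte-per-character assumption, and the every-k-th-entry sampling are assumptions made for illustration; the patent text shown here does not specify them.

```python
def build_dictionary(column_values):
    """Domain encoding: map distinct values to value IDs (assumed: sorted rank)."""
    dictionary = sorted(set(column_values))
    value_ids = {value: i for i, value in enumerate(dictionary)}
    column_vector = [value_ids[v] for v in column_values]
    return dictionary, column_vector


def common_prefix_len(a, b):
    """Length of the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def front_encode(sorted_values):
    """Replace the prefix shared with the previous entry by its length."""
    encoded, previous = [], ""
    for value in sorted_values:
        k = common_prefix_len(previous, value)
        encoded.append((k, value[k:]))
        previous = value
    return encoded


def front_decode(encoded):
    """Rebuild the original sorted values from (prefix length, suffix) pairs."""
    values, previous = [], ""
    for k, suffix in encoded:
        previous = previous[:k] + suffix
        values.append(previous)
    return values


def estimate_front_coded_size(sorted_values, sample_step=4, prefix_len_bytes=1):
    """Estimate the front-coded size in bytes from a sample of entries.

    Only every sample_step-th entry (and its predecessor) is inspected, and
    the sampled total is scaled up to the full dictionary. Assumes one byte
    per character and a fixed-width prefix length.
    """
    n = len(sorted_values)
    if n == 0:
        return 0
    sampled = range(0, n, sample_step)
    total = 0
    for i in sampled:
        previous = sorted_values[i - 1] if i > 0 else ""
        k = common_prefix_len(previous, sorted_values[i])
        total += prefix_len_bytes + len(sorted_values[i]) - k
    return round(total * n / len(sampled))


column = ["apple", "apply", "apple", "banana", "band", "apply"]
dictionary, column_vector = build_dictionary(column)
encoded = front_encode(dictionary)
exact_model_size = estimate_front_coded_size(dictionary, sample_step=1)
```

Here `column_vector` holds small integers instead of strings, and `encoded` stores `(4, "y")` for `"apply"` because it shares four characters with `"apple"`. With `sample_step=1` the model inspects every entry; larger steps trade accuracy for less work, which is the point of estimating from a subset.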

Assignees

SAP SE

Inventors

Classifications

G06F16/221 · Primary
Column-oriented storage; Management thereof · CPC title
G06F16/17 · Primary
Details of further file system functions · CPC title

Patent family

Related publications grouped by family.

Patent family 51862074

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10824596B2 cover?: Innovations for adaptive compression and decompression for dictionaries of a column-store database can reduce the amount of memory used for columns of the database, allowing a system to keep column data in memory for more columns, while delays for access operations remain acceptable. For example, dictionary compression variants use different compression techniques and implementation options. So…
Who is the assignee on this patent?: SAP SE

Related patents

Creation of a hierarchical dictionary

Paged column dictionary

Large string access and storage

Tables With Unlimited Number Of Sparse Columns And Techniques For An Efficient Implementation