Compression of tables based on occurrence of values

US9852169B2 · US · B2

Patent metadata
Publication number: US-9852169-B2
Application number: US-201414275709-A
Country: US
Kind code: B2
Filing date: May 12, 2014
Priority date: May 21, 2007
Publication date: Dec 26, 2017
Grant date: Dec 26, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and apparatus, including computer program products, for compression of tables based on occurrence of values. In general, a number representing an amount of occurrences of a frequently occurring value in a group of adjacent rows of a column is generated, a vector representing whether the frequently occurring value exists in a row of the column is generated, and the number and the vector are stored to enable searches of the data represented by the number and the vector. The vector may omit a portion representing the group of adjacent rows. The values may be dictionary-based compression values representing business data such as business objects. The compression may be performed in-memory, in parallel, to improve memory utilization, network bandwidth consumption, and processing performance.
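The scheme in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. It assumes the "group of adjacent rows" is the run at the start of the column, represents the bit vector as a plain Python list, and also drops the frequent value from the stored column (the reduced column described in the dependent claims). The names `compress_column` and `decompress_column` are hypothetical.

```python
from collections import Counter


def compress_column(values):
    """Sketch: dictionary-encode a column, then factor out its most frequent value."""
    # Dictionary-based compression: each distinct value gets a small integer ID.
    dictionary = sorted(set(values))
    value_to_id = {v: i for i, v in enumerate(dictionary)}
    ids = [value_to_id[v] for v in values]

    # ID of the most frequently occurring value in the column.
    frequent_id, _ = Counter(ids).most_common(1)[0]

    # The "number": how many adjacent rows at the start of the column
    # hold the frequent value (assumption: "one end" means the start).
    prefix_count = 0
    for value_id in ids:
        if value_id != frequent_id:
            break
        prefix_count += 1

    # The bit vector covers only the rows after that leading run:
    # 1 where the frequent value occurs, 0 elsewhere.
    rest = ids[prefix_count:]
    bit_vector = [1 if value_id == frequent_id else 0 for value_id in rest]

    # Reduced column: only the IDs that are not the frequent value.
    reduced = [value_id for value_id in rest if value_id != frequent_id]
    return dictionary, frequent_id, prefix_count, bit_vector, reduced


def decompress_column(dictionary, frequent_id, prefix_count, bit_vector, reduced):
    """Rebuild the original column from the stored pieces."""
    remaining = iter(reduced)
    ids = [frequent_id] * prefix_count
    ids += [frequent_id if bit else next(remaining) for bit in bit_vector]
    return [dictionary[value_id] for value_id in ids]


column = ["DE", "DE", "DE", "US", "DE", "FR", "DE"]
parts = compress_column(column)
# parts == (["DE", "FR", "US"], 0, 3, [0, 1, 0, 1], [2, 1])
assert decompress_column(*parts) == column
```

In this toy column the leading run of three "DE" rows is stored as the single number 3, so the bit vector needs only four bits instead of seven.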

First claim

Opening claim text (preview).

What is claimed is:

1. A non-transitory computer program product, tangibly embodied in a computer-readable medium, the computer program product being operable to cause data processing apparatus to perform operations comprising: generating columns of dictionary-based compression values, the columns of dictionary-based compression values being based on a dictionary of possible values for each column of a column-based database; generating a bit vector for at least one of the columns, each of the bit vectors representing most-frequently occurring values of a respective column, the generating comprising having each bit of the bit vector represent whether the most-frequently occurring value exists in a respective row of the respective column; generating a number for each of the columns having an associated bit vector, the number representing an amount of occurrences of the most-frequently occurring value of one end of the column, wherein the most frequently occurring value comprises a plurality of bits; removing from each of the bit vectors a representation of the most-frequently occurring value of one end of the respective column based on the number associated with the bit vector; storing the number associated with each of the bit vectors to enable non-volatile memory searches of the data represented by the number associated with each of the bit vectors and each of the bit vectors; and generating a delta index separate from the columns that stores changes to at least one column occurring after the generation and storing of the compression values, wherein the stored changes comprises dictionary values being added to the delta index in chronological order to reflect an ordering of changes made to data in the columns over time.

2. The product of claim 1, wherein the dictionary-based compression values are values representing structured data having data dependencies across a same row of a table.

3. The product of claim 2, wherein the structured data comprises objects modeled as sets of joined tables.

4. The product of claim 1, wherein the operations of the product are performed in parallel on a plurality of hardware servers.

5. The product of claim 4, wherein the operations further comprise removing the most-frequently occurring values from columns corresponding to the one or more numbers to generate reduced columns and storing the reduced columns in lieu of the columns.

6. The product of claim 1, wherein the bit vector is generated all of the columns.

7. The product of claim 1, wherein the non-volatile memory searches comprises searching both the delta index and the data.

8. The product of claim 1, further including sorting the columns such that a first column ordered first in an ordering of the columns has a most-frequently occurring value of the first column occurring more frequently than frequently occurring values of other columns.

9. A method comprising: generating columns of dictionary-based compression values, the columns of dictionary-based compression values being based on a dictionary of possible values for each column of a column-based database; generating, by one or more processors, a bit vector for at least one of the columns, each of the bit vectors representing most-frequently occurring values of a respective column, the generating comprising having each bit of the bit vector represent whether the most-frequently occurring value exists in a respective row of the respective column; generating, by one or more processors, a number for each of the columns having an associated bit vector, the number representing an amount of occurrences of the most-frequently occurring value of one end of the column, wherein the most frequently occurring value comprises a plurality of bits; removing, by one or more processors, from each of the bit vectors a representation of the most-frequently occurring value of one end of the respective column based on the number associated with the bit vector; storing, by one or more processors, the number associated with each of the bit vectors to enable non-volatile memory searches of the data represented by the number associated with each of the bit vectors and each of the bit vectors; and generating a delta index separate from the columns that stores changes to at least one column occurring after the generation and storing of the compression values, wherein the stored changes comprises dictionary values being added to the delta index in chronological order to reflect an ordering of changes made to data in the columns over time.

10. The method of claim 9, wherein the dictionary-based compression values are values representing structured data having data dependencies across a same row of a table.

11. The method of claim 10, wherein the structured data comprises objects modeled as sets of joined tables.

12. The method of claim 9, wherein the operations of the product are performed in parallel on a plurality of hardware servers.

13. The method of claim 12, wherein the operations further comprise removing the most-frequently occurring values from columns corresponding to the one or more numbers to generate reduced columns and storing the reduced columns in lieu of the columns.

14. The method of claim 9, wherein the bit vector is generated for all of the columns.

15. The method of claim 9, wherein the non-volatile memory searches comprises searching both the delta index and the data.

16. The method of claim 9, further including sorting the columns such that a first column ordered first in an ordering of the columns has a most-frequently occurring value of the first column occurring more frequently than frequently occurring values of other columns.

17. A system comprising: at least one processor; and at least one memory including computer program code which when executed by the at least one processor causes operations comprising: generating columns of dictionary-based compression values, the columns of dictionary-based compression values being based on a dictionary of possible values for each column of a column-based database; generating a bit vector for at least one of the columns, each of the bit vectors representing most-frequently occurring values of a respective column, the generating comprising having each bit of the bit vector represent whether the most-frequently occurring value exists in a respective row of the respective column; generating a number for each of the columns having an associated bit vector, the number representing an amount of occurrences of the most-frequently occurring value of one end of the column, wherein the most frequently occurring value comprises a plurality of bits; removing from each of the bit vectors a representation of the most-frequently occurring value of one end of the respective column based on the number associated with the bit vector; storing the number associated with each of the bit vectors to enable non-volatile memory searches of the data represented by the number associated with each of the bit vectors and each of the bit vectors; and generating a delta index separate from the columns that stores changes to at least one column occurring after the generation and storing of the compression values, wherein the stored changes comprises dictionary being added to the delta index in chronological order to reflect an ordering of changes made to data in the columns over time.

18. The system of claim 7, wherein the dictionary-based compression values are values representing structured data having data dependencies across a same row of a table.

19. The
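The delta index recited in the independent claims, and the combined search of dependent claims 7 and 15, can be sketched roughly as follows. This is only an illustration under stated assumptions: the claims do not say how a change is keyed, so the sketch assumes each change is a (row number, new value) pair, and the names `DeltaIndex`, `record`, and `search` are hypothetical.

```python
class DeltaIndex:
    """Sketch of a change log kept separate from the compressed columns."""

    def __init__(self):
        self.dictionary = []  # values in order of first appearance (chronological)
        self.entries = []     # (row, value_id) pairs in the order changes arrived

    def record(self, row, value):
        # New values are appended to the delta dictionary, never re-sorted,
        # so dictionary order reflects the order of changes over time.
        if value not in self.dictionary:
            self.dictionary.append(value)
        self.entries.append((row, self.dictionary.index(value)))


def search(main_column, delta, wanted):
    """Return row numbers holding `wanted`, consulting both main data and delta."""
    # Start from the main column, then replay the changes in chronological
    # order so that a later change to a row overrides an earlier one.
    current = dict(enumerate(main_column))
    for row, value_id in delta.entries:
        current[row] = delta.dictionary[value_id]
    return sorted(row for row, value in current.items() if value == wanted)


main = ["DE", "DE", "US"]       # decompressed main column, for illustration
delta = DeltaIndex()
delta.record(1, "FR")           # row 1 changed after compression
delta.record(3, "DE")           # row 3 appended after compression
assert search(main, delta, "DE") == [0, 3]
assert search(main, delta, "FR") == [1]
```

Keeping the changes in a separate, append-only structure means the compressed columns do not have to be rebuilt on every update; how and when the delta would be merged back is not covered by the claim text shown here.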

Assignees

Inventors

Classifications

  • H03M7/30Primary

    Compression (speech analysis-synthesis for redundancy reduction G10L19/00; for image communication H04N); Expansion; Suppression of unnecessary data, e.g. redundancy reduction · CPC title

  • Physics · mapped topic

  • Data acquisition and logging (for input to computer G06F3/00) · CPC title

  • Query execution · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9852169B2 cover?
Methods and apparatus, including computer program products, for compression of tables based on occurrence of values. In general, a number representing an amount of occurrences of a frequently occurring value in a group of adjacent rows of a column is generated, a vector representing whether the frequently occurring value exists in a row of the column is generated, and the number and the vector …
Who is the assignee on this patent?
Faerber Franz, Radestock Guenter, Ross Andrew, and 1 more
What technology area does this patent fall under?
Primary CPC classification H03M7/30. Mapped technology areas include Electricity.
When was this patent published?
Publication date Dec 26, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).