Parallel load in a column-store database

US9946742B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9946742-B2
Application numberUS-201615080697-A
CountryUS
Kind codeB2
Filing dateMar 25, 2016
Priority dateJan 30, 2014
Publication dateApr 17, 2018
Grant dateApr 17, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In one embodiment, a method includes adding, by a computer processor, two or more compressed columns to one or more pages of a database. The adding is performed in parallel by a plurality of page-formatter threads. Each page-formatter thread adds data to the database from no more than a single compressed column.

First claim

Opening claim text (preview).

What is claimed is: 1. A method, comprising: reading, by a data-reader thread of a hardware device, input data in row-major format comprising values of a plurality of columns per row of a plurality of rows streamed sequentially; generating a plurality of histograms, comprising a respective histogram representing each column of the plurality of columns; granting, to each histogram-builder thread of a plurality of histogram-builder threads of the hardware device, exclusive access to a corresponding histogram of the plurality of histograms; compressing each column of the plurality of columns based at least in part on the respective histogram representing the column; adding, by the hardware device, two or more compressed columns of the plurality of columns to one or more pages of a database; wherein the adding is performed in parallel by a plurality of page-formatter threads of the hardware device; and wherein each page-formatter thread of the hardware device adds data to the database from no more than a single compressed column of the plurality of columns. 2. The method of claim 1 , further comprising dynamically increasing the quantity of the page-formatter threads. 3. The method of claim 1 , wherein a first table of the database comprises two or more insert ranges, the method further comprising: assigning a first page-formatter thread to a first insert range of the first table; and assigning a second page-formatter thread to a second insert range of the first table; wherein the first page-formatter thread and the second page-formatter thread write to the first table in parallel. 4. The method of claim 1 , further comprising: building a plurality of sets of data value frequencies based on a plurality of columns of input data, wherein the building is performed by a plurality of builder threads, and wherein each of the plurality of builder threads is assigned to a corresponding one of the columns at a given time and is configured to build a corresponding set of data value frequencies for the corresponding column; creating a plurality of compression dictionaries, comprising a corresponding compression dictionary for each of the plurality of columns; and generating the two or more compressed columns for the database by compressing each of the columns according to the corresponding compression dictionary. 5. The method of claim 4 , wherein the creating is performed by the plurality of builder threads. 6. The method of claim 4 , wherein two or more builder threads process a first column of the input data, the method further comprising: applying a hash partitioning to divide data of the first column into two or more portions; applying the hash partitioning to the corresponding set of data value frequencies of the first column to divide the corresponding set of data value frequencies into two or more portions; assigning a first builder thread to process the first portion of the first column; and assigning a second builder thread to process the second portion of the first column, wherein the second builder thread is distinct from the first builder thread. 7. The method of claim 4 , further comprising: selecting the input data from a larger set of potential input data, wherein the size of the selected input data as compared to the larger set of potential input data is based on a sampling ratio; wherein the sampling ratio is dynamically modifiable based on operations of the plurality of builder threads. 8. A system, comprising: a memory having computer readable instructions; and a hardware device configured to execute the computer readable instructions, the instructions comprising: reading, by a data-reader thread of the hardware device, input data in row-major format comprising values of a plurality of columns per row of a plurality of rows streamed sequentially; generating a plurality of histograms, comprising a respective histogram representing each column of the plurality of columns; granting, to each histogram-builder thread of a plurality of histogram-builder threads of the hardware device, exclusive access to a corresponding histogram of the plurality of histograms; compressing each column of the plurality of columns based at least in part on the respective histogram representing the column; adding two or more compressed columns of the plurality of columns to one or more pages of a database; wherein the adding is performed in parallel by a plurality of page-formatter threads of the hardware device; and wherein each page-formatter thread of the hardware device adds data to the database from no more than a single compressed column of the plurality of columns. 9. The system of claim 8 , the instructions further comprising dynamically increasing the quantity of the page-formatter threads. 10. The system of claim 8 , wherein a first table of the database comprises two or more insert ranges, the instructions further comprising: assigning a first page-formatter thread to a first insert range of the first table; and assigning a second page-formatter thread to a second insert range of the first table; wherein the first page-formatter thread and the second page-formatter thread write to the first table in parallel. 11. The system of claim 8 , the instructions further comprising: building a plurality of sets of data value frequencies based on a plurality of columns of input data, wherein the building is performed by a plurality of builder threads, and wherein each of the plurality of builder threads is assigned to a corresponding one of the columns at a given time and is configured to build a corresponding set of data value frequencies for the corresponding column; creating a plurality of compression dictionaries, comprising a corresponding compression dictionary for each of the plurality of columns; and generating the two or more compressed columns for the database by compressing each of the columns according to the corresponding compression dictionary. 12. The system of claim 11 , wherein the creating is performed by the plurality of builder threads. 13. The system of claim 11 , wherein two or more builder threads process a first column of the input data, the instructions further comprising: applying a hash partitioning to divide data of the first column into two or more portions; applying the hash partitioning to the corresponding set of data value frequencies of the first column to divide the corresponding set of data value frequencies into two or more portions; assigning a first builder thread to process the first portion of the first column; and assigning a second builder thread to process the second portion of the first column, wherein the second builder thread is distinct from the first builder thread. 14. The system of claim 11 , the instructions further comprising: selecting the input data from a larger set of potential input data, wherein the size of the input data as compared to the larger set of potential input data is based on a sampling ratio; wherein the sampling ratio is dynamically modifiable based on operations of the plurality of builder threads. 15. A computer program product comprising a non-transitory computer readable storage medium having computer readable program code embodied thereon, the computer readable program code executable by a hardware device to perform a method comprising: reading, by a data-reader thread of the hardware device, input data in row-major format comprising values of a plurality of columns per row of a plurality of rows streamed sequentially; generating a plurality of histograms, comprising a respective histogram representin

Assignees

Inventors

Classifications

  • Hash tables · CPC title

  • General implementation details not specific to a particular type of compression · CPC title

  • Intermediate data storage techniques for performance improvement · CPC title

  • Data partitioning, e.g. horizontal or vertical partitioning · CPC title

  • G06F16/221Primary

    Column-oriented storage; Management thereof · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9946742B2 cover?
In one embodiment, a method includes adding, by a computer processor, two or more compressed columns to one or more pages of a database. The adding is performed in parallel by a plurality of page-formatter threads. Each page-formatter thread adds data to the database from no more than a single compressed column.
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/221. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Apr 17 2018 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).