Table scan predicate with integrated semi-join filter
US-2024419650-A1 · Dec 19, 2024 · US
US9946742B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9946742-B2 |
| Application number | US-201615080697-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 25, 2016 |
| Priority date | Jan 30, 2014 |
| Publication date | Apr 17, 2018 |
| Grant date | Apr 17, 2018 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
In one embodiment, a method includes adding, by a computer processor, two or more compressed columns to one or more pages of a database. The adding is performed in parallel by a plurality of page-formatter threads. Each page-formatter thread adds data to the database from no more than a single compressed column.
Opening claim text (preview).
What is claimed is: 1. A method, comprising: reading, by a data-reader thread of a hardware device, input data in row-major format comprising values of a plurality of columns per row of a plurality of rows streamed sequentially; generating a plurality of histograms, comprising a respective histogram representing each column of the plurality of columns; granting, to each histogram-builder thread of a plurality of histogram-builder threads of the hardware device, exclusive access to a corresponding histogram of the plurality of histograms; compressing each column of the plurality of columns based at least in part on the respective histogram representing the column; adding, by the hardware device, two or more compressed columns of the plurality of columns to one or more pages of a database; wherein the adding is performed in parallel by a plurality of page-formatter threads of the hardware device; and wherein each page-formatter thread of the hardware device adds data to the database from no more than a single compressed column of the plurality of columns. 2. The method of claim 1 , further comprising dynamically increasing the quantity of the page-formatter threads. 3. The method of claim 1 , wherein a first table of the database comprises two or more insert ranges, the method further comprising: assigning a first page-formatter thread to a first insert range of the first table; and assigning a second page-formatter thread to a second insert range of the first table; wherein the first page-formatter thread and the second page-formatter thread write to the first table in parallel. 4. The method of claim 1 , further comprising: building a plurality of sets of data value frequencies based on a plurality of columns of input data, wherein the building is performed by a plurality of builder threads, and wherein each of the plurality of builder threads is assigned to a corresponding one of the columns at a given time and is configured to build a corresponding set of data value frequencies for the corresponding column; creating a plurality of compression dictionaries, comprising a corresponding compression dictionary for each of the plurality of columns; and generating the two or more compressed columns for the database by compressing each of the columns according to the corresponding compression dictionary. 5. The method of claim 4 , wherein the creating is performed by the plurality of builder threads. 6. The method of claim 4 , wherein two or more builder threads process a first column of the input data, the method further comprising: applying a hash partitioning to divide data of the first column into two or more portions; applying the hash partitioning to the corresponding set of data value frequencies of the first column to divide the corresponding set of data value frequencies into two or more portions; assigning a first builder thread to process the first portion of the first column; and assigning a second builder thread to process the second portion of the first column, wherein the second builder thread is distinct from the first builder thread. 7. The method of claim 4 , further comprising: selecting the input data from a larger set of potential input data, wherein the size of the selected input data as compared to the larger set of potential input data is based on a sampling ratio; wherein the sampling ratio is dynamically modifiable based on operations of the plurality of builder threads. 8. A system, comprising: a memory having computer readable instructions; and a hardware device configured to execute the computer readable instructions, the instructions comprising: reading, by a data-reader thread of the hardware device, input data in row-major format comprising values of a plurality of columns per row of a plurality of rows streamed sequentially; generating a plurality of histograms, comprising a respective histogram representing each column of the plurality of columns; granting, to each histogram-builder thread of a plurality of histogram-builder threads of the hardware device, exclusive access to a corresponding histogram of the plurality of histograms; compressing each column of the plurality of columns based at least in part on the respective histogram representing the column; adding two or more compressed columns of the plurality of columns to one or more pages of a database; wherein the adding is performed in parallel by a plurality of page-formatter threads of the hardware device; and wherein each page-formatter thread of the hardware device adds data to the database from no more than a single compressed column of the plurality of columns. 9. The system of claim 8 , the instructions further comprising dynamically increasing the quantity of the page-formatter threads. 10. The system of claim 8 , wherein a first table of the database comprises two or more insert ranges, the instructions further comprising: assigning a first page-formatter thread to a first insert range of the first table; and assigning a second page-formatter thread to a second insert range of the first table; wherein the first page-formatter thread and the second page-formatter thread write to the first table in parallel. 11. The system of claim 8 , the instructions further comprising: building a plurality of sets of data value frequencies based on a plurality of columns of input data, wherein the building is performed by a plurality of builder threads, and wherein each of the plurality of builder threads is assigned to a corresponding one of the columns at a given time and is configured to build a corresponding set of data value frequencies for the corresponding column; creating a plurality of compression dictionaries, comprising a corresponding compression dictionary for each of the plurality of columns; and generating the two or more compressed columns for the database by compressing each of the columns according to the corresponding compression dictionary. 12. The system of claim 11 , wherein the creating is performed by the plurality of builder threads. 13. The system of claim 11 , wherein two or more builder threads process a first column of the input data, the instructions further comprising: applying a hash partitioning to divide data of the first column into two or more portions; applying the hash partitioning to the corresponding set of data value frequencies of the first column to divide the corresponding set of data value frequencies into two or more portions; assigning a first builder thread to process the first portion of the first column; and assigning a second builder thread to process the second portion of the first column, wherein the second builder thread is distinct from the first builder thread. 14. The system of claim 11 , the instructions further comprising: selecting the input data from a larger set of potential input data, wherein the size of the input data as compared to the larger set of potential input data is based on a sampling ratio; wherein the sampling ratio is dynamically modifiable based on operations of the plurality of builder threads. 15. A computer program product comprising a non-transitory computer readable storage medium having computer readable program code embodied thereon, the computer readable program code executable by a hardware device to perform a method comprising: reading, by a data-reader thread of the hardware device, input data in row-major format comprising values of a plurality of columns per row of a plurality of rows streamed sequentially; generating a plurality of histograms, comprising a respective histogram representin
Hash tables · CPC title
General implementation details not specific to a particular type of compression · CPC title
Intermediate data storage techniques for performance improvement · CPC title
Data partitioning, e.g. horizontal or vertical partitioning · CPC title
Column-oriented storage; Management thereof · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.