Technologies for dividing work across accelerator devices
US-2024143410-A1 · May 2, 2024 · US
US9325344B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9325344-B2 |
| Application number | US-201113206827-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 10, 2011 |
| Priority date | Dec 3, 2010 |
| Publication date | Apr 26, 2016 |
| Grant date | Apr 26, 2016 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
Data stored in a column-oriented manner is encoded using a data mining algorithm for finding column patterns among a set of data tuples, where each data tuple contains a set of columns, and the data mining algorithm treats all columns and all column combinations and column ordering similarly or in the same manner when looking for column patterns. Column values are ordered occurring in the column patterns based on their frequencies into a prefix tree, where the prefix tree defines a pattern order. The data tuples are sorted according to the pattern order, resulting in sorted data tuples, and columns of the sorted data tuples are encoded using run-length encoding.
Opening claim text (preview).
What is claimed is: 1. A method for encoding data stored in a column-oriented manner, comprising: finding column patterns among a set of data tuples, wherein each data tuple contains a set of columns, and wherein the column patterns are identified based upon a single column individually, and upon a combination of columns together, by grouping each column together with one or more other columns, without regard for column order, and evaluating rows of the grouped columns to determine the column patterns; identifying single column values occurring in the column patterns, in order to reduce a number of nodes of a prefix tree comprising a plurality of nodes; ordering the single column values occurring in the column patterns based on their corresponding frequencies into the prefix tree, wherein the prefix tree defines a pattern order and is constructed to share the nodes corresponding to the single column values occurring more frequently; sorting the data tuples according to the pattern order, resulting in sorted data tuples; and encoding columns of the sorted data tuples using run-length encoding. 2. The method according to claim 1 , further comprising: dividing the sorted data tuples into smaller already precompressed parts; and recursively using the smaller already precompressed parts to find the column patterns among the sorted data tuples. 3. The method according to claim 1 , further comprising using only values of different columns in each data tuple. 4. The method according to claim 1 , further comprising using a limited number of columns of a single table as the set of columns. 5. The method according to claim 4 , further comprising interpreting the single table as a transactional database with the data tuples as transactions and pairs of column ID and column value as items. 6. The method according to claim 5 , further comprising: generating an output list containing at least one of all column value combinations and all closed column value combinations under a minimum support threshold, which is dynamically adaptable to reduce a size of the output list; and bounding a maximum length of each column value combination by the number of columns used in each data tuple. 7. The method according to claim 4 , further comprising using only columns of the single table having certain characteristics promising a high compression ratio for building the data tuples. 8. The method according to claim 7 , further comprising: estimating cardinality for columns of the single table using an incremental cardinality algorithm; and using the cardinality as a characteristic promising a high compression ratio. 9. The method according to claim 1 , further comprising: assigning a pair of column ID and column value to each node of the prefix tree, each node having an ID field which is unique within the prefix tree and two counter fields; and linking the nodes together by at least one of pattern links, column links or rank links. 10. The method according to claim 1 , further comprising modifying paths of the prefix tree of the column patterns by weighting or swapping the nodes of the prefix tree to optimize the prefix tree. 11. A data processing system for encoding data stored in a column-oriented manner, comprising: a processor configured to: find column patterns among a set of data tuples, wherein each data tuple contains a set of columns, and wherein the column patterns are identified based upon a single column individually, and upon a combination of columns together, by grouping each column together with one or more other columns, without regard for column order, and evaluating rows of the grouped columns to determine the column patterns; identify single column values occurring in the column patterns, in order to reduce a number of nodes of a prefix tree comprising a plurality of nodes; order the single column values occurring in the column patterns based on their corresponding frequencies into the prefix tree, wherein the prefix tree defines a pattern order and is constructed to share the nodes corresponding to the single column values occurring more frequently; sort the data tuples according to the pattern order, resulting in sorted data tuples; and encode columns of the sorted data tuples using run-length encoding. 12. The data processing system according to claim 11 , wherein the sorting of the data tuples by the processor includes dividing the sorted data tuples into smaller already precompressed parts, and performing a recursive run for finding the column patterns among the sorted data tuples. 13. A computer program product comprising: a non-transitory computer readable storage medium having a computer readable program code embodied therewith, the computer readable program code configured to perform operations of: finding column patterns among a set of data tuples, wherein each data tuple contains a set of columns, and wherein the column patterns are identified based upon a single column individually, and upon a combination of columns together, by grouping each column together with one or more other columns, without regard for column order, and evaluating rows of the grouped columns to determine the column patterns; identifying single column values occurring in the column patterns, in order to reduce a number of nodes of a prefix tree comprising a plurality of nodes; ordering the single column values occurring in the column patterns based on their corresponding frequencies into the prefix tree, wherein the prefix tree defines a pattern order and is constructed to share the nodes corresponding to the single column values occurring more frequently; sorting the data tuples according to the pattern order, resulting in sorted data tuples; and encoding columns of the sorted data tuples using run-length encoding. 14. The computer program product of claim 13 , where the operations further comprise: dividing the sorted data tuples into smaller already precompressed parts; and recursively using the smaller already precompressed parts to find the column patterns among the sorted data tuples. 15. The computer program product of claim 13 , wherein the operations use only values of different columns in each data tuple. 16. The computer program product of claim 13 , wherein the operations use a limited number of columns of a single table as the set of columns. 17. The computer program product of claim 16 , the operations further comprising interpreting the single table as a transactional database with the data tuples as transactions and pairs of column ID and column value as items. 18. The computer program product of claim 17 , wherein the operations further comprise: generating an output list containing at least one of all column value combinations and all closed column value combinations under a minimum support threshold, which is dynamically adaptable to reduce a size of the output list; and bounding a maximum length of each column value combination by the number of columns used in each data tuple. 19. The computer program product of claim 16 , the operations further comprising using only columns of the single table having certain characteristics promising a high compression ratio for building the data tuples. 20. The computer program product of claim 19 , the operations further comprising: estimating cardinality for columns of the single table using an incremental cardinality algorithm; and using the cardinality as a characteristic promising a high compression ratio. 21. The comput
Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code · CPC title
Conversion to or from run-length codes, i.e. by representing the number of consecutive digits, or groups of digits, of the same kind by a code word and a digit indicative of that kind · CPC title
Physics · mapped topic
Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.