US9378231B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9378231-B2 |
| Application number | US-201113107399-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 13, 2011 |
| Priority date | Aug 27, 2007 |
| Publication date | Jun 28, 2016 |
| Grant date | Jun 28, 2016 |
Official abstract text for this publication.
Embodiments of the present invention provide one or more hardware-friendly data structures that enable efficient hardware acceleration of database operations. In particular, the present invention employs a column-store format for the database. In the database, column-groups are stored with implicit row ids (RIDs) and a RID-to-primary key column having both column-store and row-store benefits via column hopping and a heap structure for adding new data. Fixed-width column compression allows for easy hardware database processing directly on the compressed data. A global database virtual address space is utilized that allows for arithmetic derivation of any physical address of the data regardless of its location. A word compression dictionary with token compare and sort index is also provided to allow for efficient hardware-based searching of text. A tuple reconstruction process is provided as well that allows hardware to reconstruct a row by stitching together data from multiple column groups.
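The abstract's combination of implicit RIDs, fixed-width entries, arithmetic address derivation, and tuple reconstruction can be illustrated with a small sketch. This is not the patented implementation: the `ColumnGroup` class, the flat `bytearray` standing in for the global virtual address space, and the `base` offsets are all assumptions made for illustration.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnGroup:
    """Hypothetical fixed-width column group inside one flat byte buffer."""
    name: str
    base: int   # assumed start offset of the group in the address space
    fmt: str    # struct format of one fixed-width entry, e.g. "<I"

    @property
    def width(self) -> int:
        return struct.calcsize(self.fmt)

    def address_of(self, rid: int) -> int:
        # The row id is implicit: it is the entry's position and is never
        # stored, so the address follows from arithmetic alone.
        return self.base + rid * self.width


def write_group(memory: bytearray, group: ColumnGroup, entries) -> None:
    """Store one tuple of column values per row at its derived address."""
    for rid, entry in enumerate(entries):
        struct.pack_into(group.fmt, memory, group.address_of(rid), *entry)


def reconstruct_row(memory: bytearray, groups, rid: int) -> tuple:
    """Stitch one row together by reading the same RID from every group."""
    row = []
    for group in groups:
        row.extend(struct.unpack_from(group.fmt, memory, group.address_of(rid)))
    return tuple(row)


memory = bytearray(64)
ids = ColumnGroup("id", base=0, fmt="<I")                 # one 4-byte column
price_qty = ColumnGroup("price_qty", base=16, fmt="<HH")  # two 2-byte columns

write_group(memory, ids, [(101,), (102,), (103,)])
write_group(memory, price_qty, [(5, 1), (7, 2), (9, 3)])

row = reconstruct_row(memory, [ids, price_qty], rid=1)
print(row)  # (102, 7, 2)
```

Because every entry in a group has the same width, locating row `rid` needs no index lookup, which is the property that makes the layout straightforward to process in hardware.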
Opening claim text (preview).
What is claimed is:

1. A method of encoding data into a hardware-favorable form for a database, said method comprising: profiling columns of the data; compressing the columns of the data into column groups having one or more columns based on the profile of the columns of data; determining a fixed width for each of the column groups; and writing the column groups with the selected fixed width into a column-store database.
2. The method of claim 1, wherein determining the fixed width comprises determining a fixed width that is a multiple of a machine word.
3. The method of claim 1, wherein determining the fixed width comprises determining a fixed width that is a multiple of a machine word and minimizes an amount space required to store the column group.
4. The method of claim 1, wherein profiling the columns of data comprises sampling columns of the data.
5. The method of claim 1, wherein profiling the columns of data comprises determining whether data in the column comprises a range of floating point numbers.
6. The method of claim 1, wherein profiling the columns of data comprises determining whether data in the column comprises a range of integers.
7. The method of claim 1, wherein profiling the columns of data comprises determining whether data in the column comprises a finite set of tokens.
8. The method of claim 7, wherein determining whether data in the column comprises determining a sequence of characters that are separated by a delimiter.
9. The method of claim 1, wherein profiling the columns of data comprises determining whether data in the column comprises an enumerated range of values.
10. The method of claim 1, wherein profiling the columns of data comprises determining whether data in the column comprises information relating to dates.
11. The method of claim 1, wherein profiling the columns of data comprises determining whether data in the column comprises a range of telephone numbers.
12. The method of claim 1, wherein profiling the columns of data comprises determining whether data in the column comprises a range of address information.
13. The method of claim 1, wherein compressing the columns of the data into a column group comprises compressing the columns based on tokens.
14. The method of claim 1, wherein compressing the columns of the data into a column group comprises compressing the columns based on enumeration.
15. The method of claim 1, wherein compressing the columns of the data into a column group comprises compressing the columns based on a reduced alphabet.
16. The method of claim 1, wherein compressing the columns of the data into a column group comprises compressing the columns based on a range of integer values.
17. The method of claim 1, further comprising: determining a dictionary of tokens for data in the columns of data; and determining a sort index for the dictionary of tokens based on assigning the tokens a monotonically increasing sequence of numbers.
18. The method of claim 17, further comprising determining a sorted array of tokens that define a sort order of the tokens.
19. A computer system for encoding data into a hardware-favorable form for a database, said computer system comprising: a computer storage device for storing the data in compressed column format; and a processor for: profiling columns of the data; compressing the columns of the data into column groups having one or more columns based on the profile of the columns of data; determining a fixed width for each of the column groups; and writing the column groups with the selected fixed width into a column-store database on said computer storage device.
20. A non-transitory, tangible computer readable medium comprising program code for performing a method of encoding data into a hardware-favorable form for a database, said computer readable medium comprising: program code executed by a processor for profiling columns of the data; program code executed by said processor for compressing the columns of the data into column groups having one or more columns based on the profile of the columns of data; program code executed by said processor for determining a fixed width for each of the column groups; and program code executed by said processor for writing the column groups with the selected fixed width into a column-store database on a computer storage device.
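The steps recited in claims 1 to 3, 17 and 18 can be sketched roughly as follows. This is one possible reading, not the claimed method itself: profiling is reduced here to collecting the distinct tokens of a column, token ids are assigned in order of first appearance, and the function names and the `word_bits` parameter are hypothetical.

```python
def fixed_width_bits(distinct_codes: int, word_bits: int = 32) -> int:
    """Smallest multiple of the (assumed) machine word that holds every code."""
    bits_needed = max(1, (distinct_codes - 1).bit_length())
    words = -(-bits_needed // word_bits)  # ceiling division
    return words * word_bits


def build_token_dictionary(values):
    """Return (token -> id, sort index, sorted token array).

    Ids grow monotonically in order of first appearance. sort_index[id]
    gives the rank of that token in sorted order, so comparing two ranks
    is equivalent to comparing the two underlying tokens.
    """
    token_ids = {}
    for value in values:
        token_ids.setdefault(value, len(token_ids))
    sorted_tokens = sorted(token_ids)
    sort_index = [0] * len(token_ids)
    for rank, token in enumerate(sorted_tokens):
        sort_index[token_ids[token]] = rank
    return token_ids, sort_index, sorted_tokens


def encode_column(values, word_bits: int = 32):
    """Compress one column into fixed-width codes (word_bits must be a multiple of 8)."""
    token_ids, sort_index, sorted_tokens = build_token_dictionary(values)
    width_bits = fixed_width_bits(len(token_ids), word_bits)
    width_bytes = width_bits // 8
    data = b"".join(token_ids[v].to_bytes(width_bytes, "little") for v in values)
    return data, width_bits, sort_index, sorted_tokens


column = ["pear", "apple", "pear", "fig"]
data, width_bits, sort_index, sorted_tokens = encode_column(column, word_bits=8)
print(list(data), width_bits)     # [0, 1, 0, 2] 8
print(sort_index, sorted_tokens)  # [2, 0, 1] ['apple', 'fig', 'pear']
```

With the sort index in place, an ordering comparison between two rows can be made on small integers (`sort_index[code_a] < sort_index[code_b]`) without decompressing the text.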
Column-oriented storage; Management thereof · CPC title
Physics · mapped topic