Reducing data i/o using in-memory data structures
US-2017116136-A1 · Apr 27, 2017 · US
US11567939B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11567939-B2 |
| Application number | US-202217814110-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 21, 2022 |
| Priority date | Dec 26, 2019 |
| Publication date | Jan 31, 2023 |
| Grant date | Jan 31, 2023 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
A pruning index is generated for a source table organized into a set of batch units. The source table comprises a column of semi-structured data. The pruning index comprises a set of filters that index distinct values in each column of the source table. Rather than reassembling an entire tree structure of the semi-structured data prior to indexing, the generating of the pruning index comprises traversing a reassembly hook object that represents a first portion of the semi-structured data that is subcolumnarized and traversing a residual object that represents a second portion of the semi-structured data that is not subcolumnarized. The reassembly hook object is traversed to identify values corresponding to the first portion of the semi-structured data and the residual object is traversed to identify values corresponding to the second portion. The pruning index is stored with an association with the source table.
Opening claim text (preview).
What is claimed is: 1. A system comprising: at least one hardware processor; and at least one memory storing instructions that cause the at least one hardware processor to perform operations comprising: generating a pruning index for a source table organized into a set of batch units, the source table comprising a column of semi-structured data, the pruning index comprising a set of filters that index distinct values in each column of the source table, the generating of the pruning index comprising: accessing a reassembly hook object corresponding to a first portion of the semi-structured data that is subcolumnarized, the reassembly hook comprising a first data structure that represents the first portion of the semi-structured data; traversing the reassembly hook object to identify a first set of values corresponding to the first portion of the semi-structured data; accessing a residual object corresponding to a second portion of the semi-structured data that is not subcolumnarized, the residual object comprising a second data structure that represents at least a portion of the second portion of the semi-structured data; traversing the residual object to identify a second set of values corresponding to the second portion of the semi-structured data; and storing the pruning index with an association with the source table. 2. The system of claim 1 , wherein: the first portion of the semi-structured data comprises a subcolumnarized path; and the first data structure comprises a tree structure, the tree structure including a root node and a primitive node corresponding to the subcolumnarized path, the primitive node comprising a pointer to one or more primitive values for the subcolumnarized path. 3. The system of claim 1 , wherein the first portion of the semi-structured data that is subcolumnarized is stored in a columnar form within the column. 4. The system of claim 1 , wherein the generating of the pruning index further comprises generating a filter for the pruning index based on a value from the first set of values or the second set of values. 5. The system of claim 1 , wherein the generating of the pruning index further comprises generating one or more indexing transformations based on a value from the first set of values or the second set of values. 6. The system of claim 5 , wherein the generating of the pruning index further comprises: generating a set of fingerprints for the value based on the one or more indexing transformations; and populating a filter in the set of filters with the set of fingerprints. 7. The system of claim 6 , wherein the generating of the set of fingerprints for the value comprises generating a fingerprint for the value by computing a hash based on an indexing transformation for the value. 8. The system of claim 6 , wherein the generating of the one or more indexing transformations comprises converting the value to one or more stored data types. 9. The system of claim 8 , wherein the converting of the value to one or more stored data types comprises executing a cast function on the value. 10. The system of claim 5 , wherein generating the one or more indexing transformations comprises: attempting to convert the value to a stored data type; and in response to a failed attempt to convert the value, storing an indicator that values in the column are unable to be converted to the stored data type. 11. The system of claim 5 , wherein generating the one or more indexing transformations comprises: attempting to convert the value to a stored data type; and in response to a successful attempt, saving a result as an indexing transformation for the object. 12. The system of claim 6 , wherein the operations further comprise: receiving a query including a predicate directed at the column; generating one or more indexing transformations based on a value in the predicate; generating a set of search fingerprints based on the one or more indexing transformations; pruning the set of batch units to scan for data matching the predicate using the pruning index and the set of search fingerprints; and processing the query by scanning a subset of batch units resulting from pruning the set of batch units. 13. A method comprising: generating, by one or more hardware processors, a pruning index for a source table organized into a set of batch units, the source table comprising a column of semi-structured data, the pruning index comprising a set of filters that index distinct values in each column of the source table, the generating of the pruning index comprising: accessing a reassembly hook object corresponding to a first portion of the semi-structured data that is subcolumnarized, the reassembly hook comprising a first data structure that represents the first portion of the semi-structured data; traversing the reassembly hook object to identify a first set of values corresponding to the first portion of the semi-structured data; accessing a residual object corresponding to a second portion of the semi-structured data that is not subcolumnarized, the residual object comprising a second data structure that represents at least a portion of the second portion of the semi-structured data; traversing the residual object to identify a second set of values corresponding to the second portion of the semi-structured data; and storing the pruning index with an association with the source table. 14. The method of claim 13 , wherein: the first portion of the semi-structured data comprises a subcolumnarized path; and the first data structure comprises a tree structure, the tree structure including a root node and a primitive node corresponding to the subcolumnarized path, the primitive node comprising a pointer to one or more primitive values for the subcolumnarized path. 15. The method of claim 13 , wherein the first portion of the semi-structured data that is subcolumnarized is stored in a columnar form within the column. 16. The method of claim 13 , wherein the generating of the pruning index further comprises generating a filter for the pruning index based on a value from the first set of values or the second set of values. 17. The method of claim 13 , wherein the generating of the pruning index further comprises generating one or more indexing transformations based on a value from the first set of values or the second set of values. 18. The method of claim 17 , wherein the generating of the pruning index further comprises: generating a set of fingerprints for the value based on the one or more indexing transformations; and populating a filter in the set of filters with the set of fingerprints. 19. The method of claim 18 , wherein the generating of the set of fingerprints for the value comprises generating a fingerprint for the value by computing a hash based on an indexing transformation for the value. 20. The method of claim 19 , wherein the generating of the one or more indexing transformations comprises converting the value to one or more stored data types. 21. A computer-storage medium comprising instructions that, when executed by one or more processors of a machine, configure the machine to perform operations comprising: generating, by one or more hardware processors, a pruning index for a source table organized into a set of batch units, the source table comprising a column of semi-structured data, the pruning index comprising a set of filters that index distinct values in each column of the source table, the generating of the pruning index comprising
Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP · CPC title
Efficient disk access during query execution · CPC title
Management thereof · CPC title
for evaluating statistical data {, e.g. average values, frequency distributions, probability functions, regression analysis (forecasting specially adapted for a specific administrative, business or logistic context G06Q10/04)} · CPC title
Filtering based on additional data, e.g. user or group profiles · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.