Workload-driven database reorganization
US-2022092049-A1 · Mar 24, 2022 · US
US12118402B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12118402-B2 |
| Application number | US-202318540004-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 14, 2023 |
| Priority date | Aug 5, 2020 |
| Publication date | Oct 15, 2024 |
| Grant date | Oct 15, 2024 |
Official abstract text for this publication.
A record processing and storage system is operable to receive a set of records for storage. The set of records are included in a plurality of pages stored by a page storage system, and each page of the plurality of pages includes a plurality of records in the set of records. Key value-based record distribution data is generated for the set of records based on a plurality of cluster key values of the set of records. A cluster key domain spanned by the plurality of cluster key values is divided into a plurality of key space sub-intervals based on the key value-based record distribution data. The set of records are segregated into a plurality of row subsets corresponding to the plurality of key space sub-intervals. A plurality of sets of segments are generated by processing the plurality of row subsets in parallel.
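The steps named in the abstract can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration, not the patented implementation: records are modeled as dicts with hypothetical `key` and `value` fields, the function names are invented, the sub-intervals are chosen by a simple cumulative-count rule, and `ThreadPoolExecutor` merely stands in for the parallel processing resources.

```python
from bisect import bisect_left
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def key_distribution(pages):
    """Count how many records carry each cluster key value (distribution data)."""
    return Counter(rec["key"] for page in pages for rec in page)


def split_points(distribution, num_intervals):
    """Choose inclusive upper bounds so sub-intervals hold roughly equal record counts.

    Assumption: a simple cumulative-count rule; the source does not prescribe this.
    """
    total = sum(distribution.values())
    target = total / num_intervals
    bounds, running = [], 0
    for key in sorted(distribution):
        running += distribution[key]
        if len(bounds) < num_intervals - 1 and running >= target * (len(bounds) + 1):
            bounds.append(key)
    return bounds  # the last sub-interval has no explicit upper bound


def segregate(pages, bounds):
    """Route every record of every page into the row subset of its sub-interval."""
    subsets = [[] for _ in range(len(bounds) + 1)]
    for page in pages:
        for rec in page:
            # bisect_left sends a key equal to a bound into that bound's interval.
            subsets[bisect_left(bounds, rec["key"])].append(rec)
    return subsets


def build_segments(rows, group_size):
    """Turn one row subset into a list of column-formatted segments."""
    rows = sorted(rows, key=lambda r: r["key"])
    segments = []
    for start in range(0, len(rows), group_size):
        group = rows[start:start + group_size]
        segments.append({
            "key": [r["key"] for r in group],
            "value": [r["value"] for r in group],
        })
    return segments


def generate_segments(pages, num_workers, group_size=1000):
    """Run the whole sketch: distribution, split, segregate, parallel build."""
    bounds = split_points(key_distribution(pages), num_workers)
    subsets = segregate(pages, bounds)
    # Threads are a stand-in for the parallel resources described in the source.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(lambda rows: build_segments(rows, group_size), subsets))
```

Calling `generate_segments(pages, num_workers=2)` on a list of pages, each a list of such record dicts, returns one list of segments per row subset. Because records from a single page can land in different row subsets, every page is scanned once and its records are routed by key rather than by page.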
Opening claim text (preview).
What is claimed is:

1. A method for execution by a record processing and storage system, comprising: receiving a set of records for storage, wherein the set of records are included in a plurality of pages stored by a page storage system, and wherein each page of the plurality of pages includes a plurality of records in the set of records; generating key value-based record distribution data for the set of records based on a plurality of cluster key values of the set of records; dividing a cluster key domain spanned by the plurality of cluster key values into a plurality of key space sub-intervals based on the key value-based record distribution data; segregating the set of records into a plurality of row subsets corresponding to the plurality of key space sub-intervals; and generating a plurality sets of segments by processing the plurality of row subsets in parallel.
2. The method of claim 1, further comprising: generating the plurality of pages; and determining to convert the plurality of pages into the plurality of records based on storage utilization data.
3. The method of claim 1, wherein segregating the set of records into the plurality of row subsets is based on: accessing each of the plurality of pages; and extracting ones of the plurality of records in the each of the plurality of pages having cluster key values included in a corresponding one of the plurality of key space sub-intervals for inclusion in a corresponding one of the plurality of row subsets.
4. The method of claim 1, wherein one plurality of records of one page of the plurality of pages includes: a first record having a first cluster key value included in a first one of the plurality of key space sub-intervals; and a second record having a second cluster key value included in a second one of the plurality of key space sub-intervals; wherein another plurality of records of another page of the plurality of pages includes: a third record having a third cluster key value included in the first one of the plurality of key space sub-intervals; and a fourth record having a fourth cluster key value included in the second one of the plurality of key space sub-intervals.
5. The method of claim 1, wherein the plurality of sets of segments are generated from the set of records via a plurality of processing core resources, wherein each processing core resource in the plurality of processing core resources generates a subset of the plurality of sets of segments by: identifying, via each processing core resource, a corresponding row subset of the plurality of row subsets based on corresponding to a key space sub-interval of the plurality of key space sub-intervals assigned to the each processing core resource; and generating, via the each processing core resource, the subset of the plurality of sets of segments to include ones of the set of records included in the corresponding row subset.
6. The method of claim 5, further comprising: determining a selected number of key space sub-intervals to be generated based on a number of processing core resources in the plurality of processing core resources: wherein the cluster key domain is segregated into the selected number of key space sub-intervals.
7. The method of claim 5, further comprising: determining a target number of records to be included in each row subset of the plurality of row subsets based on at least one of: a total number of records in the set of records, or a selected number of key space sub-intervals to be generated: wherein the cluster key domain is segregated into number of key space sub-intervals selected based on the target number of records.
8. The method of claim 1, wherein each the plurality of key space sub-intervals includes a corresponding one of a plurality of proper subsets of the plurality of cluster key values of the cluster key domain, wherein each of the plurality of proper subsets of the plurality of cluster key values are mutually exclusive and collectively exhaustive with respect to the plurality of cluster key values, and wherein each of the plurality of proper subsets of the plurality of cluster key values include sequential ones of the plurality of cluster key values in accordance with an ordering of the plurality of cluster key values.
9. The method of claim 8, wherein a first proper subset of the plurality of proper subsets includes a first number of cluster key values, and wherein a second proper subset of the plurality of proper subsets includes a second number of cluster key values that is different from the first number of cluster key values.
10. The method of claim 1, wherein generating the plurality of sets of segments from the set of records is based on accessing the set of records from storage in a row-based format, wherein each of the plurality of sets of segments are generated to include a corresponding subset of the set of records in a column-based format.
11. The method of claim 10, wherein generating the plurality of sets of segments from the set of records is further based on, for each row subset of the plurality of row subsets: generating a plurality of record groups from the each row subset based on cluster key values of records included in the each row subset: generating a set of column-formatted record data for each of the plurality of record groups; and generating a set of segments from each set of column-formatted record data.
12. The method of claim 11, wherein generating the set of segments from each set of column-formatted record data includes generating segment metadata for each set of segments.
13. The method of claim 11, wherein generating the set of segments from each set of column-formatted record data includes applying a redundancy storage error coding scheme to each set of column-formatted record data to generate a corresponding set of segments.
14. The method of claim 1, wherein the key value-based record distribution data is based on empirical data indicating a number of records in the set of records having each of the plurality of cluster key values.
15. The method of claim 1, wherein dividing the cluster key domain spanned by the plurality of cluster key values into the plurality of key space sub-intervals is based on recursively splitting intervals of the cluster key domain into two intervals until a plurality of intervals that includes a target number of intervals are created, wherein each of the plurality of key space sub-intervals corresponds to one of the plurality of intervals.
16. The method of claim 15, wherein a given interval is split into two corresponding intervals, wherein the two corresponding intervals includes a first corresponding interval and a second corresponding interval, wherein a first subset of records of the set of records have first corresponding cluster key values included in the first corresponding interval, wherein a second subset of records of the set of records have second corresponding cluster key values included in the second corresponding interval, and wherein splitting of the given interval into the two corresponding intervals is based on minimizing a difference between a first number of records included in the first subset of records and a second number of records included in the second subset of records.
17. The method of claim 15, wherein a given interval is split into two corresponding intervals, wherein the two corresponding intervals includes a first corresponding interval and a second corresponding interval, wherein the first corresponding interval of the cluster key domain includes a first subset of cluster keys of
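Claims 15 and 16 describe recursively splitting intervals of the cluster key domain in two until a target number of intervals exists, with each split chosen to minimize the difference in record counts between the two halves. The Python sketch below illustrates that idea under stated assumptions: the distribution is a plain dict mapping each cluster key value to its record count, the function names are hypothetical, and the rule for choosing which interval to split next (the one holding the most records) is an assumption, since the claim text shown here does not specify an order.

```python
def best_split(keys, counts):
    """Return index i so that keys[:i] and keys[i:] differ least in record count."""
    total = sum(counts)
    best_i, best_diff, left = None, None, 0
    for i in range(1, len(keys)):
        left += counts[i - 1]
        diff = abs(left - (total - left))
        if best_diff is None or diff < best_diff:
            best_i, best_diff = i, diff
    return best_i


def recursive_intervals(distribution, target):
    """Split the sorted key domain in two, repeatedly, until `target` intervals exist.

    Each interval is a run of consecutive cluster key values in sorted order.
    """
    intervals = [sorted(distribution)]
    while len(intervals) < target:
        # Only intervals with at least two key values can be split further.
        splittable = [iv for iv in intervals if len(iv) > 1]
        if not splittable:
            break
        # Assumption: split the interval that currently holds the most records.
        heaviest = max(splittable, key=lambda iv: sum(distribution[k] for k in iv))
        i = best_split(heaviest, [distribution[k] for k in heaviest])
        pos = intervals.index(heaviest)
        intervals[pos:pos + 1] = [heaviest[:i], heaviest[i:]]
    return intervals
```

With a skewed distribution such as `{1: 4, 2: 1, 3: 1, 4: 2}`, the resulting intervals contain different numbers of key values, which matches the situation described in claim 9: balancing by record count does not mean balancing by key count.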
Trees · CPC title
Clustering; Classification · CPC title
Efficient disk access during query execution · CPC title
Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs (mapping at compile time, see G06F8/451) · CPC title
of parallel queries · CPC title