Utilizing key value-based record distribution data to perform parallelized segment generation in a database system

US12118402B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12118402-B2
Application number: US-202318540004-A
Country: US
Kind code: B2
Filing date: Dec 14, 2023
Priority date: Aug 5, 2020
Publication date: Oct 15, 2024
Grant date: Oct 15, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A record processing and storage system is operable to receive a set of records for storage. The set of records are included in a plurality of pages stored by a page storage system, and each page of the plurality of pages includes a plurality of records in the set of records. Key value-based record distribution data is generated for the set of records based on a plurality of cluster key values of the set of records. A cluster key domain spanned by the plurality of cluster key values is divided into a plurality of key space sub-intervals based on the key value-based record distribution data. The set of records are segregated into a plurality of row subsets corresponding to the plurality of key space sub-intervals. A plurality sets of segments are generated by processing the plurality of row subsets in parallel.
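The flow in the abstract can be sketched in Python as below. This is a minimal illustration, not the patented implementation: the function names, the dict-based record layout, the equal-count boundary rule, and the use of a thread pool to stand in for parallel processing resources are all assumptions introduced here.

```python
from bisect import bisect_right
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def build_distribution(pages, key):
    """Count how many records carry each cluster key value across all pages."""
    return Counter(rec[key] for page in pages for rec in page)


def choose_boundaries(distribution, num_intervals):
    """Pick key values that start a new sub-interval, aiming for equal record counts.

    Assumption: a simple cumulative-count rule; the source does not fix one.
    """
    target = sum(distribution.values()) / num_intervals
    boundaries, running = [], 0
    for value in sorted(distribution):
        if len(boundaries) < num_intervals - 1 and running >= target * (len(boundaries) + 1):
            boundaries.append(value)
        running += distribution[value]
    return boundaries


def segregate(pages, key, boundaries):
    """Route every record on every page to the row subset of its sub-interval."""
    subsets = [[] for _ in range(len(boundaries) + 1)]
    for page in pages:
        for rec in page:
            subsets[bisect_right(boundaries, rec[key])].append(rec)
    return subsets


def build_segment(row_subset, key):
    """Hypothetical stand-in for segment generation: sort rows, then pivot to columns."""
    rows = sorted(row_subset, key=lambda r: r[key])
    return {name: [r[name] for r in rows] for name in rows[0]} if rows else {}


def generate_segments(pages, key, num_workers):
    """Run the pipeline; one worker handles one row subset."""
    distribution = build_distribution(pages, key)
    boundaries = choose_boundaries(distribution, num_workers)
    subsets = segregate(pages, key, boundaries)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(lambda s: build_segment(s, key), subsets))


# Each page mixes records from both halves of the key domain.
pages = [[{"k": 1}, {"k": 5}], [{"k": 2}, {"k": 6}],
         [{"k": 3}, {"k": 7}], [{"k": 4}, {"k": 8}]]
segments = generate_segments(pages, "k", 2)
```

Because the row subsets cover disjoint key ranges, each worker can build its segments without coordinating with the others. A production system would likely use processes or separate cores rather than Python threads.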

First claim

Opening claim text (preview).

What is claimed is:

1. A method for execution by a record processing and storage system, comprising: receiving a set of records for storage, wherein the set of records are included in a plurality of pages stored by a page storage system, and wherein each page of the plurality of pages includes a plurality of records in the set of records; generating key value-based record distribution data for the set of records based on a plurality of cluster key values of the set of records; dividing a cluster key domain spanned by the plurality of cluster key values into a plurality of key space sub-intervals based on the key value-based record distribution data; segregating the set of records into a plurality of row subsets corresponding to the plurality of key space sub-intervals; and generating a plurality sets of segments by processing the plurality of row subsets in parallel.

2. The method of claim 1, further comprising: generating the plurality of pages; and determining to convert the plurality of pages into the plurality of records based on storage utilization data.

3. The method of claim 1, wherein segregating the set of records into the plurality of row subsets is based on: accessing each of the plurality of pages; and extracting ones of the plurality of records in the each of the plurality of pages having cluster key values included in a corresponding one of the plurality of key space sub-intervals for inclusion in a corresponding one of the plurality of row subsets.

4. The method of claim 1, wherein one plurality of records of one page of the plurality of pages includes: a first record having a first cluster key value included in a first one of the plurality of key space sub-intervals; and a second record having a second cluster key value included in a second one of the plurality of key space sub-intervals; wherein another plurality of records of another page of the plurality of pages includes: a third record having a third cluster key value included in the first one of the plurality of key space sub-intervals; and a fourth record having a fourth cluster key value included in the second one of the plurality of key space sub-intervals.

5. The method of claim 1, wherein the plurality of sets of segments are generated from the set of records via a plurality of processing core resources, wherein each processing core resource in the plurality of processing core resources generates a subset of the plurality of sets of segments by: identifying, via each processing core resource, a corresponding row subset of the plurality of row subsets based on corresponding to a key space sub-interval of the plurality of key space sub-intervals assigned to the each processing core resource; and generating, via the each processing core resource, the subset of the plurality of sets of segments to include ones of the set of records included in the corresponding row subset.

6. The method of claim 5, further comprising: determining a selected number of key space sub-intervals to be generated based on a number of processing core resources in the plurality of processing core resources: wherein the cluster key domain is segregated into the selected number of key space sub-intervals.

7. The method of claim 5, further comprising: determining a target number of records to be included in each row subset of the plurality of row subsets based on at least one of: a total number of records in the set of records, or a selected number of key space sub-intervals to be generated: wherein the cluster key domain is segregated into number of key space sub-intervals selected based on the target number of records.

8. The method of claim 1, wherein each the plurality of key space sub-intervals includes a corresponding one of a plurality of proper subsets of the plurality of cluster key values of the cluster key domain, wherein each of the plurality of proper subsets of the plurality of cluster key values are mutually exclusive and collectively exhaustive with respect to the plurality of cluster key values, and wherein each of the plurality of proper subsets of the plurality of cluster key values include sequential ones of the plurality of cluster key values in accordance with an ordering of the plurality of cluster key values.

9. The method of claim 8, wherein a first proper subset of the plurality of proper subsets includes a first number of cluster key values, and wherein a second proper subset of the plurality of proper subsets includes a second number of cluster key values that is different from the first number of cluster key values.

10. The method of claim 1, wherein generating the plurality of sets of segments from the set of records is based on accessing the set of records from storage in a row-based format, wherein each of the plurality of sets of segments are generated to include a corresponding subset of the set of records in a column-based format.

11. The method of claim 10, wherein generating the plurality of sets of segments from the set of records is further based on, for each row subset of the plurality of row subsets: generating a plurality of record groups from the each row subset based on cluster key values of records included in the each row subset: generating a set of column-formatted record data for each of the plurality of record groups; and generating a set of segments from each set of column-formatted record data.

12. The method of claim 11, wherein generating the set of segments from each set of column-formatted record data includes generating segment metadata for each set of segments.

13. The method of claim 11, wherein generating the set of segments from each set of column-formatted record data includes applying a redundancy storage error coding scheme to each set of column-formatted record data to generate a corresponding set of segments.

14. The method of claim 1, wherein the key value-based record distribution data is based on empirical data indicating a number of records in the set of records having each of the plurality of cluster key values.

15. The method of claim 1, wherein dividing the cluster key domain spanned by the plurality of cluster key values into the plurality of key space sub-intervals is based on recursively splitting intervals of the cluster key domain into two intervals until a plurality of intervals that includes a target number of intervals are created, wherein each of the plurality of key space sub-intervals corresponds to one of the plurality of intervals.

16. The method of claim 15, wherein a given interval is split into two corresponding intervals, wherein the two corresponding intervals includes a first corresponding interval and a second corresponding interval, wherein a first subset of records of the set of records have first corresponding cluster key values included in the first corresponding interval, wherein a second subset of records of the set of records have second corresponding cluster key values included in the second corresponding interval, and wherein splitting of the given interval into the two corresponding intervals is based on minimizing a difference between a first number of records included in the first subset of records and a second number of records included in the second subset of records.

17. The method of claim 15, wherein a given interval is split into two corresponding intervals, wherein the two corresponding intervals includes a first corresponding interval and a second corresponding interval, wherein the first corresponding interval of the cluster key domain includes a first subset of cluster keys of
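The interval splitting described in claims 15 and 16 can be illustrated with a short Python sketch. It is one possible reading, not the claimed implementation: the function names are hypothetical, the splitting is written as a loop rather than literal recursion, and the rule of always splitting the interval that currently holds the most records is an assumption, since the claim text shown here does not say which interval is split next.

```python
from collections import Counter


def balanced_split(keys, counts):
    """Split a sorted run of key values in two, minimizing the record-count difference."""
    total = sum(counts[k] for k in keys)
    best_i, best_diff, left = 1, None, 0
    for i in range(1, len(keys)):
        left += counts[keys[i - 1]]          # records that fall in the left part
        diff = abs(left - (total - left))    # imbalance between the two parts
        if best_diff is None or diff < best_diff:
            best_i, best_diff = i, diff
    return keys[:best_i], keys[best_i:]


def split_key_domain(counts, target):
    """Repeatedly split intervals in two until `target` intervals exist."""
    intervals = [sorted(counts)]
    while len(intervals) < target:
        splittable = [iv for iv in intervals if len(iv) > 1]
        if not splittable:
            break  # every interval holds a single key value; cannot split further
        # Assumption: split the interval holding the most records next.
        heaviest = max(splittable, key=lambda iv: sum(counts[k] for k in iv))
        idx = intervals.index(heaviest)
        intervals[idx:idx + 1] = list(balanced_split(heaviest, counts))
    return intervals


# Record counts per cluster key value (the distribution data of claim 14).
counts = Counter({1: 10, 2: 1, 3: 1, 4: 8})
two = split_key_domain(counts, 2)
three = split_key_domain(counts, 3)
```

With this skewed distribution, the first split isolates key value 1 because it alone holds half of the records. The resulting intervals contain different numbers of key values, which matches the situation described in claim 9.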

Assignees

Inventors

Classifications

  • Trees · CPC title

  • Clustering; Classification · CPC title

  • Efficient disk access during query execution · CPC title

  • G06F9/5066 · Primary

    Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs (mapping at compile time, see G06F8/451) · CPC title

  • of parallel queries · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12118402B2 cover?
A record processing and storage system is operable to receive a set of records for storage. The set of records are included in a plurality of pages stored by a page storage system, and each page of the plurality of pages includes a plurality of records in the set of records. Key value-based record distribution data is generated for the set of records based on a plurality of cluster key values o…
Who is the assignee on this patent?
Ocient Holdings LLC
What technology area does this patent fall under?
Primary CPC classification G06F9/5066. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 15, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).