Parallelized segment generation via key-based subdivision in database systems

US2022043690A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2022043690-A1
Application number  US-202016985957-A
Country             US
Kind code           A1
Filing date         Aug 5, 2020
Priority date       Aug 5, 2020
Publication date    Feb 10, 2022
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for execution by a record processing and storage system includes assigning each of a plurality of key space sub-intervals of a cluster key domain to a corresponding one of a plurality of processing core resources, and generating a plurality of segments from the set of records via the plurality of processing core resources. Each processing core resource in the plurality of processing core resources generates a subset of the plurality of segments by identifying a proper subset of the set of records based on having cluster key values included in a corresponding one of the plurality of key space sub-intervals, and by generating the subset of the plurality of segments to include the proper subset of the set of records.
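The flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the record layout, the `BOUNDS` and `SEGMENT_SIZE` values, the function names, the use of threads to stand in for processing core resources, and the sorting of each subset before it is cut into segments are all assumptions made for the example.

```python
from bisect import bisect_right
from concurrent.futures import ThreadPoolExecutor

# Hypothetical records: (cluster_key, payload) tuples.
RECORDS = [(17, "q"), (3, "a"), (42, "z"), (8, "c"), (25, "m"), (11, "f")]

# Hypothetical sub-interval boundaries over the cluster key domain.
# They define three sub-intervals: key < 10, 10 <= key < 20, key >= 20.
BOUNDS = [10, 20]

# Illustrative segment size (records per segment); not taken from the source.
SEGMENT_SIZE = 2


def sub_interval_of(key):
    """Return the index of the key space sub-interval that contains key."""
    return bisect_right(BOUNDS, key)


def generate_segments(worker_id):
    """One worker: pick the records whose keys fall in its assigned
    sub-interval, then cut that proper subset into segments."""
    mine = sorted(r for r in RECORDS if sub_interval_of(r[0]) == worker_id)
    return [mine[i:i + SEGMENT_SIZE] for i in range(0, len(mine), SEGMENT_SIZE)]


def generate_all_segments():
    """Assign one sub-interval per worker and run the workers concurrently."""
    n_workers = len(BOUNDS) + 1
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(generate_segments, range(n_workers)))


if __name__ == "__main__":
    for worker_id, segments in enumerate(generate_all_segments()):
        print(worker_id, segments)
```

Because the sub-intervals do not overlap, no record is selected by more than one worker, so the workers need no coordination while building their segments.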

First claim

Opening claim text (preview).

What is claimed is:

1. A method for execution by a record processing and storage system, comprising: assigning each of a plurality of key space sub-intervals of a cluster key domain spanned by a plurality of cluster key values of a set of records to a corresponding one of a plurality of processing core resources; and generating a plurality of segments from the set of records via the plurality of processing core resources, wherein each processing core resource in the plurality of processing core resources generates a subset of the plurality of segments by: identifying, via each processing core resource, a proper subset of the set of records based on having cluster key values included in a corresponding one of the plurality of key space sub-intervals; and generating, via the each processing core resource, the subset of the plurality of segments to include the proper subset of the set of records.

2. The method of claim 1, further comprising segregating the cluster key domain into the plurality of key space sub-intervals.

3. The method of claim 2, further comprising: determining a selected number of key space sub-intervals to be generated based on a number of processing core resources in the plurality of processing core resources; wherein the cluster key domain is segregated into the selected number of key space sub-intervals.

4. The method of claim 2, further comprising: determining a target number of records to be included in each proper subset of the set of records based on: a total number of records in the set of records, and a selected number of key space sub-intervals to be generated; wherein the cluster key domain is segregated into the selected number of key space sub-intervals based on the target number of records.

5. The method of claim 1, wherein each the plurality of key space sub-intervals includes a corresponding one of a plurality of proper subsets of the plurality of cluster key values of the cluster key domain, wherein each of the plurality of proper subsets of the plurality of cluster key values are mutually exclusive and collectively exhaustive with respect to the plurality of cluster key values, and wherein each of the plurality of proper subsets of the plurality of cluster keys include sequential ones of the plurality of cluster key values in accordance with an ordering of the plurality of cluster key values.

6. The method of claim 5, wherein a first proper subset of the plurality of proper subsets includes a first number of cluster key values, and wherein a second proper subset of the plurality of proper subsets includes a second number of cluster key values that is different from the first number of cluster key values.

7. The method of claim 1, wherein generating the plurality of segments from the set of records via the plurality of processing core resources further comprises: accessing, via the each processing core resource, the proper subset of the set of records from storage in a row-based format; wherein the subset of the plurality of segments are generated to include the proper subset of the set of records in a column-based format.

8. The method of claim 7, wherein generating the plurality of segments from the set of records via the plurality of processing core resources further comprises: generating a plurality of record groups from the proper subset of the set of records based on cluster key values of the proper subset of the set of records; generating a set of column-formatted record data for each of the plurality of record groups; and generating a set of segments from each set of column-formatted record data.

9. The method of claim 8, wherein generating the set of segments from each set of column-formatted record data includes generating segment metadata for each set of segments.

10. The method of claim 8, wherein generating the set of segments from each set of column-formatted record data includes applying a redundancy storage error coding scheme to each set of column-formatted record data to generate a corresponding set of segments.

11. The method of claim 1, wherein the set of records are included in a plurality of pages stored by a page storage system, and wherein each page of the plurality of pages includes a plurality of records in the set of records.

12. The method of claim 11, further comprising: generating the plurality of pages; and determining to convert the plurality of pages into the plurality of records based on storage utilization data.

13. The method of claim 11, wherein identifying the proper subset of the set of records via the each processing core resource includes: accessing, via the each processing core resource, each of the plurality of pages; extracting, via the each processing core resource, ones of the plurality of records in the each of the plurality of pages having cluster key values included in the corresponding one of the plurality of key space sub-intervals.

14. The method of claim 13, wherein identifying the proper subset of the set of records via the each processing core resource further includes: populating a data structure with location data for the ones of the plurality of records in corresponding ones of the plurality of pages, wherein the data structure is organized based on an ordering of cluster key values of the ones of the plurality of records; extracting records from the plurality of pages in accordance with the ordering of cluster key values by utilizing the data structure.

15. The method of claim 14, wherein the data structure implements a min-heap organized by cluster key values.

16. The method of claim 11, wherein one plurality of records of one page of the plurality of pages includes: a first record having a first cluster key value included in a first one of the plurality of key space sub-intervals; and a second record having a second cluster key value included in a second one of the plurality of key space sub-intervals; wherein another plurality of records of another page of the plurality of pages includes: a third record having a third cluster key value included in the first one of the plurality of key space sub-intervals; and a fourth record having a fourth cluster key value included in the second one of the plurality of key space sub-intervals.

17. The method of claim 16, wherein generating the plurality of segments from the set of records via the plurality of processing core resources includes: accessing, via a first processing core resource, the one page and the another page; identifying, via the first processing core resource, a corresponding first proper subset of the set of records to include the first record and the third record, and to not include the second record and the fourth record, by identifying cluster key values included in the first one of the plurality of key space sub-intervals based on the first one of the plurality of key space sub-intervals being assigned to the first processing core resource; accessing, via a second processing core resource, the one page and the another page; and identifying, via the second processing core resource, a corresponding second proper subset of the set of records to include the second record and the fourth record, and to not include the first record and the third record, by identifying cluster key values included in the second one of the plurality of key space sub-intervals based on the second one of the plurality of key space sub-intervals being assigned to the second processing core resource.

18. The method of claim
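Claims 4 and 13 to 15 describe two steps that a short sketch may make easier to picture: choosing sub-interval boundaries from a target record count, and using a min-heap of record locations to pull records out of unordered pages in cluster key order. The code below is only an illustration under stated assumptions. The page layout, the function names `choose_bounds` and `extract_in_key_order`, and the approach of sorting all keys to find boundaries are hypothetical and are not taken from the patent text.

```python
import heapq

# Hypothetical pages: each page holds (cluster_key, payload) records in
# arrival order, so every page mixes keys from several sub-intervals.
PAGES = [
    [(40, "d"), (5, "a"), (22, "x")],
    [(9, "b"), (31, "y"), (14, "c")],
]


def choose_bounds(pages, n_intervals):
    """Pick boundaries so each sub-interval holds about
    total_records / n_intervals records (assumes at least n_intervals
    records). Sorting every key is a simplification for the example."""
    keys = sorted(key for page in pages for key, _ in page)
    target = len(keys) // n_intervals  # target records per sub-interval
    return [keys[i * target] for i in range(1, n_intervals)]


def extract_in_key_order(pages, lo, hi):
    """Scan every page, push the location of each in-range record onto a
    min-heap keyed by cluster key, then pop to read records in key order.
    lo is inclusive, hi is exclusive, and None means unbounded."""
    heap = []
    for page_no, page in enumerate(pages):
        for slot, (key, _) in enumerate(page):
            if (lo is None or key >= lo) and (hi is None or key < hi):
                # Location data only: (key, page number, slot in page).
                heapq.heappush(heap, (key, page_no, slot))
    ordered = []
    while heap:
        _, page_no, slot = heapq.heappop(heap)
        ordered.append(pages[page_no][slot])
    return ordered


if __name__ == "__main__":
    bounds = choose_bounds(PAGES, 2)
    print(bounds)
    print(extract_in_key_order(PAGES, None, bounds[0]))
    print(extract_in_key_order(PAGES, bounds[0], None))
```

In this toy data both pages contain records for both sub-intervals, which mirrors the situation in claims 16 and 17: each worker reads every page but keeps only the records in its own sub-interval.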

Assignees

Ocient Holdings LLC

Inventors

Classifications

  • of parallel queries · CPC title

  • Efficient disk access during query execution · CPC title

  • Trees · CPC title

  • Clustering; Classification · CPC title

  • G06F9/5066 · Primary

    Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs (mapping at compile time, see G06F8/451) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2022043690A1 cover?
A method for execution by a record processing and storage system includes assigning each of a plurality of key space sub-intervals of a cluster key domain to a corresponding one of a plurality of processing core resources, and generating a plurality of segments from the set of records via the plurality of processing core resources. Each processing core resource in the plurality of processing co…
Who is the assignee on this patent?
Ocient Holdings LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/24532. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 10, 2022 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).