High performance index creation on sorted data using parallel query plans

US9959312B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9959312-B2
Application numberUS-201314019392-A
CountryUS
Kind codeB2
Filing dateSep 5, 2013
Priority dateSep 5, 2013
Publication dateMay 1, 2018
Grant dateMay 1, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Creation of an index for a table of sorted data for use by a data storage application is initiated. Thereafter, N+1 logical partition of rows of the table are defined so that each logical partition has a corresponding worker process. Each worker process then builds a sub-index based on the corresponding logical partition which are later merged to form the index. Related apparatus, systems, techniques and articles are also described.

First claim

Opening claim text (preview).

What is claimed is: 1. A method for implementation by one or more data processors of at least one computing system comprising: initiating creation of an index for a table of sorted data for use by a data storage application, the sorted data being sorted according to index key values in an index key column of the table; partitioning the table into a plurality of logical partitions each comprising one or more rows of the table, a number of logical partitions in the plurality of logical partitions being equal to a number of worker processes in a plurality of worker processes, each logical partition of the plurality of logical partitions having a corresponding worker process of the plurality of worker processes, each logical partition being a range of values defined by an upper partition boundary and a lower partition boundary, the partitioning comprising collecting a sample from the table and determining, based on the sample, a row identifier for the lower partition boundary and a row identifier for the upper partition boundary for each partition of the plurality of partitions, and an upper partition boundary of a first logical partition of the plurality of partitions being a lower partition boundary of a second logical partition of the plurality of partitions; and building, by each of the plurality of worker processes, a sub-index based on the logical partition to which that worker process corresponds, the building of the sub-index by each of the plurality of worker processes resulting in a plurality of sub-indexes with one sub-index of the plurality of sub-indexes being based on each of the plurality of logical partitions, and the building of the sub-index comprising: initiating a scan of the table at a row having a row identifier that matches a row identifier of the lower partition boundary; identifying a first row that qualifies for inclusion in the sub-index by at least performing a comparison to an index key value of the lower partition boundary, the first row qualifying for inclusion in the sub-index based at least on an index key value of the first row being greater than the index key value of the lower partition boundary; in response to identifying the first row, identifying a second row that qualifies for inclusion in the sub-index by at least performing a comparison to a row identifier of the upper partition boundary, the second row qualifying for inclusion in the sub-index based at least on a row identifier of the second row being less than and/or equal to the row identifier of the upper partition boundary; in response to identifying the second row, identifying a third row that qualifies for inclusion in the sub-index by at least performing a comparison to an index key value of the upper partition boundary, the third row qualifying for inclusion in the sub-index based on an index key value of the third row matching the index key value of the upper partition boundary; and in response to identifying the third row, terminating the building of the sub-index based at least on an index key value of a fourth row being greater than the index key value of the upper partition boundary; and merging the plurality of sub-indexes to form the index. 2. A method as in claim 1 , wherein the building comprises: creating a parallel execution query plan to be executed by the number of worker processes and one coordinating worker process. 3. A method as in claim 2 , wherein the coordinating worker process causes the merging of the plurality of sub-indexes to form the index. 4. A method as in claim 2 , wherein the building further comprises: executing the parallel execution query plan. 5. A method as in claim 4 , wherein the executing comprises: reading data rows from a beginning of the table by a worker process assigned to a first partition of the plurality of partitions. 6. A method as in claim 1 , wherein at least one of the initiating of the creation of the index, the partitioning of the table, the building of the sub-index, and the merging of the plurality of sub-indexes is implemented by at least one data processor. 7. A non-transitory computer program product storing instructions which when executed by at least one data processor of at least one computing system result in operations comprising: initiating creation of an index for a table of sorted data for use by a data storage application, the sorted data being sorted according to index key values in an index key column of the table; partitioning the table into a plurality of logical partitions each comprising one or more rows of the table, a number of logical partitions in the plurality of logical partitions being equal to a number of worker processes in a plurality of worker processes, each logical partition of the plurality of logical partitions having a corresponding worker process of the plurality of worker processes, each logical partition being a range of values defined by an upper partition boundary and a lower partition boundary, the partitioning comprising collecting a sample from the table and determining, based on the sample, a row identifier for the lower partition boundary and a row identifier for the upper partition boundary for each partition of the plurality of partitions, and an upper partition boundary of a first logical partition of the plurality of partitions being a lower partition boundary of a second logical partition of the plurality of partitions; and building, by each of the plurality of worker processes, a sub-index based on the logical partition to which that worker process corresponds, the building of the sub-index by each of the plurality of worker processes resulting in a plurality of sub-indexes with one sub-index of the plurality of sub-indexes being based on each of the plurality of logical partitions, and the building of the sub-index comprising: initiating a scan of the table at a row having a row identifier that matches a row identifier of the lower partition boundary; identifying a first row that qualifies for inclusion in the sub-index by at least performing a comparison to an index key value of the lower partition boundary, the first row qualifying for inclusion in the sub-index based at least on an index key value of the first row being greater than the index key value of the lower partition boundary; in response to identifying the first row, identifying a second row that qualifies for inclusion in the sub-index by at least performing a comparison to a row identifier of the upper partition boundary, the second row qualifying for inclusion in the sub-index based at least on a row identifier of the second row being less than and/or equal to the row identifier of the upper partition boundary; in response to identifying the second row, identifying a third row that qualifies for inclusion in the sub-index by at least performing a comparison to an index key value of the upper partition boundary, the third row qualifying for inclusion in the sub-index based on an index key value of the third row matching the index key value of the upper partition boundary; and in response to identifying the third row, terminating the building of the sub-index based at least on an index key value of a fourth row being greater than the index key value of the upper partition boundary; and merging the plurality of sub-indexes to form the index. 8. A non-transitory computer program product as in claim 7 , wherein the building comprises: creating a parallel execution query plan to be executed by the number of worker processes and one coordinating worker process. 9. A non-transitory computer program product as in claim 8 , wherein the coordinating worker process causes the merging of the plurality of sub-indexes to form the index. 10. A non-transitory comp

Assignees

Inventors

Classifications

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9959312B2 cover?
Creation of an index for a table of sorted data for use by a data storage application is initiated. Thereafter, N+1 logical partition of rows of the table are defined so that each logical partition has a corresponding worker process. Each worker process then builds a sub-index based on the corresponding logical partition which are later merged to form the index. Related apparatus, systems, tech…
Who is the assignee on this patent?
Schneider Peter, Sybase Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/24532. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue May 01 2018 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).