Parallel Processing Of Data
US-2024338235-A1 · Oct 10, 2024 · US
US9959312B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9959312-B2 |
| Application number | US-201314019392-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 5, 2013 |
| Priority date | Sep 5, 2013 |
| Publication date | May 1, 2018 |
| Grant date | May 1, 2018 |
Official abstract text for this publication.
Creation of an index for a table of sorted data for use by a data storage application is initiated. Thereafter, N+1 logical partitions of rows of the table are defined so that each logical partition has a corresponding worker process. Each worker process then builds a sub-index based on its corresponding logical partition, and the sub-indexes are later merged to form the index. Related apparatus, systems, techniques and articles are also described.
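The flow in the abstract (define partitions, build one sub-index per worker, merge) can be sketched as follows. This is a minimal illustration, not the patented implementation: the table is modeled as a Python list of index keys whose position is the row identifier, the partitions are split by row position rather than by sampling, and threads stand in for worker processes. The names `partition_bounds`, `build_sub_index`, and `build_index` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_bounds(num_rows, num_partitions):
    """Split row positions [0, num_rows) into contiguous, near-equal ranges."""
    step, extra = divmod(num_rows, num_partitions)
    bounds, start = [], 0
    for i in range(num_partitions):
        # The first `extra` partitions take one additional row each.
        end = start + step + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def build_sub_index(table, start, end):
    """One worker: emit (key, row_id) entries for its slice of the sorted table."""
    return [(table[row_id], row_id) for row_id in range(start, end)]


def build_index(table, num_partitions):
    """Build sub-indexes concurrently, then merge them into one index."""
    bounds = partition_bounds(len(table), num_partitions)
    # Assumption: threads stand in for the worker processes of the abstract.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        sub_indexes = list(pool.map(lambda b: build_sub_index(table, *b), bounds))
    # The input is already sorted and the partitions are contiguous, so
    # concatenating the sub-indexes in partition order yields a sorted index.
    index = []
    for sub in sub_indexes:
        index.extend(sub)
    return index
```

Because the data is already sorted, the merge step in this sketch needs no comparison-based sort; it only has to keep the sub-indexes in partition order.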
Opening claim text (preview).
What is claimed is:

1. A method for implementation by one or more data processors of at least one computing system comprising:

initiating creation of an index for a table of sorted data for use by a data storage application, the sorted data being sorted according to index key values in an index key column of the table;

partitioning the table into a plurality of logical partitions each comprising one or more rows of the table, a number of logical partitions in the plurality of logical partitions being equal to a number of worker processes in a plurality of worker processes, each logical partition of the plurality of logical partitions having a corresponding worker process of the plurality of worker processes, each logical partition being a range of values defined by an upper partition boundary and a lower partition boundary, the partitioning comprising collecting a sample from the table and determining, based on the sample, a row identifier for the lower partition boundary and a row identifier for the upper partition boundary for each partition of the plurality of partitions, and an upper partition boundary of a first logical partition of the plurality of partitions being a lower partition boundary of a second logical partition of the plurality of partitions; and

building, by each of the plurality of worker processes, a sub-index based on the logical partition to which that worker process corresponds, the building of the sub-index by each of the plurality of worker processes resulting in a plurality of sub-indexes with one sub-index of the plurality of sub-indexes being based on each of the plurality of logical partitions, and the building of the sub-index comprising:

initiating a scan of the table at a row having a row identifier that matches a row identifier of the lower partition boundary;

identifying a first row that qualifies for inclusion in the sub-index by at least performing a comparison to an index key value of the lower partition boundary, the first row qualifying for inclusion in the sub-index based at least on an index key value of the first row being greater than the index key value of the lower partition boundary;

in response to identifying the first row, identifying a second row that qualifies for inclusion in the sub-index by at least performing a comparison to a row identifier of the upper partition boundary, the second row qualifying for inclusion in the sub-index based at least on a row identifier of the second row being less than and/or equal to the row identifier of the upper partition boundary;

in response to identifying the second row, identifying a third row that qualifies for inclusion in the sub-index by at least performing a comparison to an index key value of the upper partition boundary, the third row qualifying for inclusion in the sub-index based on an index key value of the third row matching the index key value of the upper partition boundary; and

in response to identifying the third row, terminating the building of the sub-index based at least on an index key value of a fourth row being greater than the index key value of the upper partition boundary; and

merging the plurality of sub-indexes to form the index.

2. A method as in claim 1, wherein the building comprises: creating a parallel execution query plan to be executed by the number of worker processes and one coordinating worker process.

3. A method as in claim 2, wherein the coordinating worker process causes the merging of the plurality of sub-indexes to form the index.

4. A method as in claim 2, wherein the building further comprises: executing the parallel execution query plan.

5. A method as in claim 4, wherein the executing comprises: reading data rows from a beginning of the table by a worker process assigned to a first partition of the plurality of partitions.

6. A method as in claim 1, wherein at least one of the initiating of the creation of the index, the partitioning of the table, the building of the sub-index, and the merging of the plurality of sub-indexes is implemented by at least one data processor.

7. A non-transitory computer program product storing instructions which when executed by at least one data processor of at least one computing system result in operations comprising:

initiating creation of an index for a table of sorted data for use by a data storage application, the sorted data being sorted according to index key values in an index key column of the table;

partitioning the table into a plurality of logical partitions each comprising one or more rows of the table, a number of logical partitions in the plurality of logical partitions being equal to a number of worker processes in a plurality of worker processes, each logical partition of the plurality of logical partitions having a corresponding worker process of the plurality of worker processes, each logical partition being a range of values defined by an upper partition boundary and a lower partition boundary, the partitioning comprising collecting a sample from the table and determining, based on the sample, a row identifier for the lower partition boundary and a row identifier for the upper partition boundary for each partition of the plurality of partitions, and an upper partition boundary of a first logical partition of the plurality of partitions being a lower partition boundary of a second logical partition of the plurality of partitions; and

building, by each of the plurality of worker processes, a sub-index based on the logical partition to which that worker process corresponds, the building of the sub-index by each of the plurality of worker processes resulting in a plurality of sub-indexes with one sub-index of the plurality of sub-indexes being based on each of the plurality of logical partitions, and the building of the sub-index comprising:

initiating a scan of the table at a row having a row identifier that matches a row identifier of the lower partition boundary;

identifying a first row that qualifies for inclusion in the sub-index by at least performing a comparison to an index key value of the lower partition boundary, the first row qualifying for inclusion in the sub-index based at least on an index key value of the first row being greater than the index key value of the lower partition boundary;

in response to identifying the first row, identifying a second row that qualifies for inclusion in the sub-index by at least performing a comparison to a row identifier of the upper partition boundary, the second row qualifying for inclusion in the sub-index based at least on a row identifier of the second row being less than and/or equal to the row identifier of the upper partition boundary;

in response to identifying the second row, identifying a third row that qualifies for inclusion in the sub-index by at least performing a comparison to an index key value of the upper partition boundary, the third row qualifying for inclusion in the sub-index based on an index key value of the third row matching the index key value of the upper partition boundary; and

in response to identifying the third row, terminating the building of the sub-index based at least on an index key value of a fourth row being greater than the index key value of the upper partition boundary; and

merging the plurality of sub-indexes to form the index.

8. A non-transitory computer program product as in claim 7, wherein the building comprises: creating a parallel execution query plan to be executed by the number of worker processes and one coordinating worker process.

9. A non-transitory computer program product as in claim 8, wherein the coordinating worker process causes the merging of the plurality of sub-indexes to form the index.

10. A non-transitory comp
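The per-worker scan recited in claims 1 and 7 can be illustrated with a short sketch. It is one reading of the claim language under stated assumptions, not the patented implementation: the table is a sorted Python list of index keys, a row's list position serves as its row identifier, and a partition boundary is a hypothetical `(row_id, index_key)` tuple. Passing `None` for the lower boundary models the first partition, which reads from the beginning of the table (claim 5); `None` for the upper boundary models the last partition.

```python
from typing import List, Optional, Tuple

# Hypothetical boundary representation: (row identifier, index key value).
Boundary = Tuple[int, int]


def scan_partition(
    keys: List[int],
    lower: Optional[Boundary],
    upper: Optional[Boundary],
) -> List[Tuple[int, int]]:
    """Return (key, row_id) sub-index entries for one logical partition."""
    n = len(keys)
    # Start the scan at the row identified by the lower boundary.
    row_id = 0 if lower is None else lower[0]
    if lower is not None:
        # Skip rows until the key is greater than the lower boundary key;
        # rows at or below that key belong to the preceding partition.
        while row_id < n and keys[row_id] <= lower[1]:
            row_id += 1
    entries = []
    while row_id < n:
        key = keys[row_id]
        # Past the upper boundary row, keep only rows whose key matches the
        # upper boundary key. The table is sorted, so any other key here is
        # greater than the boundary key and ends the scan.
        if upper is not None and row_id > upper[0] and key != upper[1]:
            break
        entries.append((key, row_id))
        row_id += 1
    return entries
```

Under this reading, adjacent workers share a boundary, yet every row with a duplicated boundary key lands in exactly one sub-index: the upper-side worker keeps scanning through matching keys, and the lower-side worker skips them.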
of parallel queries · CPC title
Plan optimisation · CPC title
Management thereof · CPC title
Physics · mapped topic