Merging multiple sorted lists in a distributed computing system

US11301210B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-11301210-B2
Application numberUS-202016775141-A
CountryUS
Kind codeB2
Filing dateJan 28, 2020
Priority dateNov 13, 2019
Publication dateApr 12, 2022
Grant dateApr 12, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A technique is described for merging multiple lists of ordinal elements such as keys into a sorted output. In an example embodiment, a merge window is defined, based on the bounds of the multiple lists of ordinal elements, that is representative of a portion of an overall element space associated with the multiple lists. Lists of elements to be sorted can be placed into one of at least two different heaps based on whether they overlap the merge window. For example, lists that overlap the merge window may be placed into an active or “hot” heap, while lists that do not overlap the merge window may be placed into a separate inactive or “cold” heap. A sorted output can then be generated by iteratively processing the active heap. As the processing of the active heap progresses, the merge window advances, and lists may move between the active and inactive heaps.

First claim

Opening claim text (preview).

What is claimed is: 1. A method for performing an ordered scan on a table in a distributed computing system, the table comprising a plurality of tablets stored across a plurality of nodes of the distributed computing system, each of the plurality of tablets comprising one or more of a plurality of rowsets, the method comprising: accessing a plurality of per-rowset iterators, each of the plurality of per-rowset iterators corresponding to a different one of the plurality of rowsets, each of the plurality of per-rowset iterators including a run of keys associated with rows of a corresponding rowset; maintaining a first heap including one or more of the plurality of per-rowset iterators that overlap a merge window, the merge window being representative of a portion of a keyspace of the plurality of per-rowset iterators; maintaining a second heap including a remainder of the plurality of per-rowset iterators that do not overlap the merge window; and iteratively processing the first heap so as to generate an output comprising a sorted list of keys from the plurality of rowsets. 2. The method of claim 1 , wherein the run of keys in each of the plurality of per-rowset iterators has a lower bound and an upper bound, and wherein the merge window is based on the lower bounds and upper bounds of the plurality of per-rowset iterators. 3. The method of claim 2 , further comprising: updating the lower bound and/or upper bound of a particular per-rowset iterator of the plurality of per-rowset iterators during the iterative processing of the first heap; and updating the merge window based on the updated the lower bound and/or upper bound of the particular per-rowset iterator. 4. The method of claim 1 , wherein the run of keys in each of the plurality of per-rowset iterators have a lower bound and an upper bound, wherein the merge window has a start and an end, wherein: the start of the merge window is based on a particular lower bound of a first per-rowset iterator of the plurality of per-rowset iterators, wherein the particular lower bound is the smallest of all the lower bounds of all of the plurality of per-rowset iterators; and the end of the merge window is based on a particular upper bound of a second per-rowset iterator, wherein the particular upper bound is the smallest of all of the upper bounds of all of the per-rowset iterators that have lower bounds that are less than or equal to an upper bound of the first per-rowset iterator. 5. The method of claim 4 , wherein the first per-rowset iterator is the same as the second per-rowset iterator. 6. The method of claim 1 , wherein the run of keys in each of the plurality of per-rowset iterators have a lower bound and an upper bound, wherein the merge window has a start and an end, wherein: the start of the merge window is based on a particular lower bound of a particular per-rowset iterator of the plurality of per-rowset iterators, wherein the particular lower bound is the smallest of all the lower bounds of all of the plurality of per-rowset iterators; and the end of the merge window is based on a particular upper bound of the particular per-rowset iterator. 7. The method of claim 1 , wherein the run of keys in each of the plurality of per-rowset iterators has a lower bound and an upper bound, and wherein the lower bound and upper bound of each of the plurality of per-rowset iterators are predefined by a distributed comping engine that manages the table. 8. The method of claim 1 , further comprising: maintaining a third heap including one or more entries, each of the one or more entries based on an upper bound of a different one of the one or more per-rowset iterators in the first heap. 9. The method of claim 8 , further comprising: comparing a lower bound of a particular per-rowset iterator in the second heap to a top-most entry in the third heap; determining, based on the comparing, that the lower bound of the particular per-rowset iterator is less than or equal to the top-most entry in the third heap; and moving the particular per-rowset iterator from the second heap to the first heap in response to determining that the lower bound of the particular per-rowset iterator is less than or equal to the top-most entry in the third heap. 10. The method of claim 8 , further comprising: comparing a lower bound of a particular per-rowset iterator in the first heap to a top-most entry in the third heap; determining, based on the comparing, that the lower bound of the particular per-rowset iterator is greater than the top-most entry in the third heap; and moving the particular per-rowset iterator from the first heap to the second heap in response to determining that the lower bound of the particular per-rowset iterator is greater than the top-most entry in the third heap. 11. The method of claim 1 , further comprising: moving at least some of the plurality of per-rowset iterators between the first heap and the second heap during the iterative processing of the first heap. 12. The method of claim 1 , wherein iteratively processing the first heap includes: performing a merge process using the first heap, the merge process including: popping a particular per-rowset iterator from a top-most position in the first heap; copying a particular non-exhausted key of one or more non-exhausted keys in the run of keys of the particular per-rowset iterator to the output, the particular non-exhausted key corresponding to a current lower bound of the particular per-rowset iterator, wherein the non-exhausted keys include any keys in the run of keys that are not yet copied to the output; designating a subsequent key of the particular per-rowset iterator as an updated lower bound of the particular per-rowset iterator; returning the particular per-rowset iterator to the first heap; and updating the ordering of the one or more per-rowset iterators in the first heap based on the updated lower bound of the particular per-rowset iterator; and repeating the merge process until a merge criterion is satisfied. 13. The method of claim 12 , wherein the merge criterion is satisfied when all keys associated with the plurality of per-rowset iterators are copied to the output. 14. The method of claim 12 , wherein the merge process further includes: determining that the particular per-rowset iterator is the only per-rowset iterator in the first heap; and copying all the non-exhausted keys in the run of keys of the particular per-rowset iterator to the output in response to determining that the particular per-rowset iterator is the only per-rowset iterator in the first heap. 15. The method of claim 12 , further comprising: updating the merge window based on the updated lower bound of the particular per-rowset iterator. 16. The method of claim 1 , wherein the first heap and second heap are both min-heaps, wherein the first heap is ordered based on the lower bounds of the one or more of the plurality of per-rowset iterators that overlap the merge window, and wherein the second heap is ordered based on the lower bounds of the remainder of the plurality of per-rowset iterators that do not overlap the merge window. 17. The method of claim 1 , further comprising: receiving a query request from a client, the query request including one or more query conditions; wherein the runs of keys of the plurality of per-rowset iterators are associated with rows in the table that satisfy the one or more query conditions. 18. The method of claim 1 , wherein the table is managed in the distributed computing system using Apache Kudu™.

Assignees

Inventors

Classifications

  • of query operations · CPC title

  • G06F7/36Primary

    Combined merging and sorting · CPC title

  • Tablespace storage structures; Management thereof · CPC title

  • G06F7/08Primary

    Sorting, i.e. grouping record carriers in numerical or other ordered sequence according to the classification of at least some of the information they carry (by merging two or more sets of carriers in ordered sequence G06F7/16) · CPC title

  • Distributed queries · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11301210B2 cover?
A technique is described for merging multiple lists of ordinal elements such as keys into a sorted output. In an example embodiment, a merge window is defined, based on the bounds of the multiple lists of ordinal elements, that is representative of a portion of an overall element space associated with the multiple lists. Lists of elements to be sorted can be placed into one of at least two diff…
Who is the assignee on this patent?
Cloudera Inc
What technology area does this patent fall under?
Primary CPC classification G06F7/36. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Apr 12 2022 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).