Merging multiple sorted lists in a distributed computing system

US11726743B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-11726743-B2
Application numberUS-202217717999-A
CountryUS
Kind codeB2
Filing dateApr 11, 2022
Priority dateNov 13, 2019
Publication dateAug 15, 2023
Grant dateAug 15, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A technique is described for merging multiple lists of ordinal elements such as keys into a sorted output. In an example embodiment, a merge window is defined, based on the bounds of the multiple lists of ordinal elements, that is representative of a portion of an overall element space associated with the multiple lists. Lists of elements to be sorted can be placed into one of at least two different heaps based on whether they overlap the merge window. For example, lists that overlap the merge window may be placed into an active or “hot” heap, while lists that do not overlap the merge window may be placed into a separate inactive or “cold” heap. A sorted output can then be generated by iteratively processing the active heap. As the processing of the active heap progresses, the merge window advances, and lists may move between the active and inactive heaps.

First claim

Opening claim text (preview).

What is claimed is: 1. A method for performing an ordered scan on a table in a distributed computing system, the table comprising a plurality of tablets, each of the plurality of tablets comprising one or more of a plurality of rowsets, the method comprising: accessing a plurality of per-rowset iterators, each of the plurality of per-rowset iterators including a run of keys associated with rows of a corresponding rowset; maintaining a first heap including one or more of the plurality of per-rowset iterators that overlap a merge window, the merge window being representative of a portion of a keyspace of the plurality of per-rowset iterators; and iteratively processing the first heap so as to generate an output comprising a sorted list of keys from the plurality of rowsets. 2. The method of claim 1 , wherein the run of keys in each of the plurality of per-rowset iterators has a lower bound and an upper bound, and wherein the merge window is based on the lower bounds and upper bounds of the plurality of per-rowset iterators. 3. The method of claim 2 , further comprising: updating the lower bound and/or upper bound of a particular per-rowset iterator of the plurality of per-rowset iterators during the iterative processing of the first heap; and updating the merge window based on the updated the lower bound and/or upper bound of the particular per-rowset iterator. 4. The method of claim 1 , wherein the run of keys in each of the plurality of per-rowset iterators have a lower bound and an upper bound, wherein the merge window has a start and an end, wherein: the start of the merge window is based on a particular lower bound of a first per-rowset iterator of the plurality of per-rowset iterators, wherein the particular lower bound is the smallest of all the lower bounds of all of the plurality of per-rowset iterators; and the end of the merge window is based on a particular upper bound of a second per-rowset iterator, wherein the particular upper bound is the smallest of all of the upper bounds of all of the per-rowset iterators that have lower bounds that are less than or equal to an upper bound of the first per-rowset iterator. 5. The method of claim 4 , wherein the first per-rowset iterator is the same as the second per-rowset iterator. 6. The method of claim 1 , wherein the run of keys in each of the plurality of per-rowset iterators have a lower bound and an upper bound, wherein the merge window has a start and an end, wherein: the start of the merge window is based on a particular lower bound of a particular per-rowset iterator of the plurality of per-rowset iterators, wherein the particular lower bound is the smallest of all the lower bounds of all of the plurality of per-rowset iterators; and the end of the merge window is based on a particular upper bound of the particular per-rowset iterator. 7. The method of claim 1 , wherein the run of keys in each of the plurality of per-rowset iterators has a lower bound and an upper bound, and wherein the lower bound and upper bound of each of the plurality of per-rowset iterators are predefined by a distributed computing engine that manages the table. 8. The method of claim 1 , further comprising: maintaining a second heap including one or more entries, each of the one or more entries based on an upper bound of a different one of the one or more of the plurality of per-rowset iterators in the first heap. 9. The method of claim 8 , further comprising: comparing a lower bound of a particular per-rowset iterator in a third heap to a top-most entry in the second heap, the third heap including a remainder of the plurality of per-rowset iterators that do not overlap the merge window; determining, based on the comparing, that the lower bound of the particular per-rowset iterator is less than or equal to the top-most entry in the second heap; and moving the particular per-rowset iterator from the third heap to the first heap in response to determining that the lower bound of the particular per-rowset iterator is less than or equal to the top-most entry in the second heap. 10. The method of claim 8 , further comprising: comparing a lower bound of a particular per-rowset iterator in the first heap to a top-most entry in the second heap; determining, based on the comparing, that the lower bound of the particular per-rowset iterator is greater than the top-most entry in the second heap; and moving the particular per-rowset iterator from the first heap to a third heap in response to determining that the lower bound of the particular per-rowset iterator is greater than the top-most entry in the second heap. 11. The method of claim 1 , further comprising: moving at least some of the plurality of per-rowset iterators between the first heap and a second heap during the iterative processing of the first heap. 12. The method of claim 1 , wherein iteratively processing the first heap includes: performing a merge process using the first heap, the merge process including: popping a particular per-rowset iterator from a top-most position in the first heap; copying a particular non-exhausted key of one or more non-exhausted keys in the run of keys of the particular per-rowset iterator to the output, the particular non-exhausted key corresponding to a current lower bound of the particular per-rowset iterator, wherein the non-exhausted keys include any keys in the run of keys that are not yet copied to the output; designating a subsequent key of the particular per-rowset iterator as an updated lower bound of the particular per-rowset iterator; returning the particular per-rowset iterator to the first heap; and updating the ordering of the one or more per-rowset iterators in the first heap based on the updated lower bound of the particular per-rowset iterator; and repeating the merge process until a merge criterion is satisfied. 13. The method of claim 12 , wherein the merge criterion is satisfied when all keys associated with the plurality of per-rowset iterators are copied to the output. 14. The method of claim 12 , wherein the merge process further includes: determining that the particular per-rowset iterator is the only per-rowset iterator in the first heap; and copying all the non-exhausted keys in the run of keys of the particular per-rowset iterator to the output in response to determining that the particular per-rowset iterator is the only per-rowset iterator in the first heap. 15. The method of claim 12 , further comprising: updating the merge window based on the updated lower bound of the particular per-rowset iterator. 16. The method of claim 1 , wherein the first heap is a min-heap and is ordered based on lower bounds of the one or more of the plurality of per-rowset iterators that overlap the merge window. 17. The method of claim 1 , further comprising: receiving a query request from a client, the query request including one or more query conditions; wherein the runs of keys of the plurality of per-rowset iterators are associated with rows in the table that satisfy the one or more query conditions. 18. The method of claim 1 , wherein the table is managed in the distributed computing system using Apache Kudu™. 19. A non-transitory computer-readable storage medium storing instructions for performing an ordered scan on a table in a distributed computing system, the table comprising a plurality of tablets, each of the plurality of tablets comprising one or more of a plurality of rowsets, wherein the instructions when executed cause a computer processor to perform a metho

Assignees

Inventors

Classifications

  • G06F7/08Primary

    Sorting, i.e. grouping record carriers in numerical or other ordered sequence according to the classification of at least some of the information they carry (by merging two or more sets of carriers in ordered sequence G06F7/16) · CPC title

  • Tablespace storage structures; Management thereof · CPC title

  • G06F7/36Primary

    Combined merging and sorting · CPC title

  • Distributed queries · CPC title

  • of query operations · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11726743B2 cover?
A technique is described for merging multiple lists of ordinal elements such as keys into a sorted output. In an example embodiment, a merge window is defined, based on the bounds of the multiple lists of ordinal elements, that is representative of a portion of an overall element space associated with the multiple lists. Lists of elements to be sorted can be placed into one of at least two diff…
Who is the assignee on this patent?
Cloudera Inc
What technology area does this patent fall under?
Primary CPC classification G06F7/08. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Aug 15 2023 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).