Systems and methods for performing scalable Log-Structured Merge (LSM) tree compaction using sharding

US10909102B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-10909102-B2
Application numberUS-201816212550-A
CountryUS
Kind codeB2
Filing dateDec 6, 2018
Priority dateDec 6, 2018
Publication dateFeb 2, 2021
Grant dateFeb 2, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Certain aspects provide systems and methods of compacting data within a log-structured merge tree (LSM tree) using sharding. In certain aspects, a method includes determining a size of the LSM tree, determining a compaction time for a compaction of the LSM tree based on the size, determining a number of compaction entities for performing the compaction in parallel based on the compaction time, determining a number of shards based on the number of compaction entities, and determining a key range associated with the LSM tree. The method further comprises dividing the key range by the number of shards into a number of sub key ranges, wherein each of the number of sub key ranges corresponds to a shard of the number of shards and assigning the number of shards to the number of compaction entities for compaction.

First claim

Opening claim text (preview).

We claim: 1. A method of compacting data within a log-structured merge tree (LSM tree) using sharding, comprising: determining a size of the LSM tree; determining a compaction time for a compaction of the LSM tree based on the size; determining a number of compaction entities for performing the compaction in parallel based on the compaction time; determining a number of shards based on the number of compaction entities; determining a key range associated with the LSM tree; dividing the key range by the number of shards into a number of sub key ranges, wherein each of the number of sub key ranges corresponds to a shard of the number of shards; assigning the number of shards to the number of compaction entities for compaction; and compacting, using at least one hardware processor associated with the number of compaction entities, the data within the LSM tree by compacting the number of shards. 2. The method of claim 1 , wherein determining the compaction time is further based on an input/output (I/O) bandwidth and a recovery point objective corresponding to data stored within the LSM tree. 3. The method of claim 2 , wherein the I/O bandwidth is associated with at least one of a network over which I/O to the LSM tree is performed or storage resources that store the LSM tree. 4. The method of claim 1 , wherein determining the key range is based on examining a catalogue file associated with the LSM tree. 5. The method of claim 1 , wherein determining the number of compaction entities is based on a likelihood relating to one or more of the number of compaction entities failing during the compaction. 6. The method of claim 1 , wherein: determining the number of compaction entities is based on a following formula: Number of Compaction Entities=N*(1+E); E is a likelihood relating to one or more of the number of compaction entities failing during the compaction; N equals 2S/(B*T); N corresponds to an estimated amount of time it takes to compact the data within the LSM tree; S corresponds to the size of the LSM tree; B corresponds to an input/output (I/O) bandwidth, which comprises an I/O bandwidth associated with a storage resource storing the data within the LSM tree; and T corresponds to a recovery point objective associated with the data within the LSM tree. 7. The method of claim 1 , wherein the number of shards is higher than the number of compaction entities. 8. The method of claim 1 , wherein the number of sub key ranges are non-overlapping. 9. An apparatus, comprising: a non-transitory memory comprising executable instructions; and a processor in data communication with the memory and configured to execute the instructions to cause the apparatus to: determine a size of the LSM tree; determine a compaction time for a compaction of the LSM tree based on the size; determine a number of compaction entities for performing the compaction in parallel based on the compaction time; determine a number of shards based on the number of compaction entities; determine a key range associated with the LSM tree; divide the key range by the number of shards into a number of sub key ranges, wherein each of the number of sub key ranges corresponds to a shard of the number of shards; assign the number of shards to the number of compaction entities for compaction; and compact, using at least one hardware processor associated with the number of compaction entities, the data within the LSM tree by compacting the number of shards. 10. The apparatus of claim 9 , wherein the processor being configured to cause the apparatus to determine the compaction time comprises the processor being configured to cause the apparatus to determine the compaction time based on an input/output (I/O) bandwidth and a recovery point objective corresponding to data stored within the LSM tree. 11. The apparatus of claim 10 , wherein the I/O bandwidth is associated with at least one of a network over which I/O to the LSM tree is performed or storage resources that store the LSM tree. 12. The apparatus of claim 9 , wherein the processor being configured to cause the apparatus to determine the key range comprises the processor being configured to cause the apparatus to determine the key range is based on examining a catalogue file associated with the LSM tree. 13. The apparatus of claim 9 , wherein the processor being configured to cause the apparatus to determine the number of compaction entities comprises the processor being configured to cause the apparatus to determine the number of compaction entities is based on a likelihood relating to one or more of the number of compaction entities failing during the compaction. 14. The apparatus of claim 9 , wherein: the processor being configured to cause the apparatus to determine the number of compaction entities comprises the processor being configured to cause the apparatus to determine the number of compaction entities based on a following formula: Number of Compaction Entities=N*(1+E); E corresponds to a likelihood relating to one or more of the number of compaction entities failing during the compaction; N equals 2S/(B*T); N corresponds to an estimated amount of time it takes to compact the data within the LSM tree; S corresponds to the size of the LSM tree; B corresponds to an input/output (I/O) bandwidth, which comprises an I/O bandwidth associated with a storage resource storing the data within the LSM tree; and T corresponds to a recovery point objective associated with the data within the LSM tree. 15. The apparatus of claim 9 , wherein the number of shards is higher than the number of compaction entities. 16. The apparatus of claim 9 , wherein the number of sub key ranges are non-overlapping. 17. A non-transitory computer readable medium having instructions stored thereon that, when executed by a computing system, cause the computing system to perform a method comprising: determining a size of the LSM tree; determining a compaction time for a compaction of the LSM tree based on the size; determining a number of compaction entities for performing the compaction in parallel based on the compaction time; determining a number of shards based on the number of compaction entities; determining a key range associated with the LSM tree; dividing the key range by the number of shards into a number of sub key ranges, wherein each of the number of sub key ranges corresponds to a shard of the number of shards; assigning the number of shards to the number of compaction entities for compaction; and compacting, using at least one hardware processor associated with the number of compaction entities, the data within the LSM tree by compacting the number of shards. 18. The non-transitory computer readable medium of claim 17 , wherein determining the compaction time is further based on an input/output (I/O) bandwidth and a recovery point objective corresponding to data stored within the LSM tree. 19. The non-transitory computer readable medium of claim 18 , wherein the I/O bandwidth is associated with at least one of a network over which I/O to the LSM tree is performed or storage resources that store the LSM tree. 20. The non-transitory computer readable medium of claim 17 , wherein determining the key range is based on examining a catalogue file associated with the LSM tree. 21. The non-transitory computer readable medium of claim 17 , wherein determining the number of compaction entities is based on a likelihood relating to one or more of the number of c

Assignees

Inventors

Classifications

  • Trees, e.g. B+trees · CPC title

  • Parallelization · CPC title

  • Compression (speech analysis-synthesis for redundancy reduction G10L19/00; for image communication H04N); Expansion; Suppression of unnecessary data, e.g. redundancy reduction · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10909102B2 cover?
Certain aspects provide systems and methods of compacting data within a log-structured merge tree (LSM tree) using sharding. In certain aspects, a method includes determining a size of the LSM tree, determining a compaction time for a compaction of the LSM tree based on the size, determining a number of compaction entities for performing the compaction in parallel based on the compaction time, …
Who is the assignee on this patent?
Vmware Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/2246. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Feb 02 2021 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).