Key-value compaction
US-2019004768-A1 · Jan 3, 2019 · US
US10909102B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10909102-B2 |
| Application number | US-201816212550-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 6, 2018 |
| Priority date | Dec 6, 2018 |
| Publication date | Feb 2, 2021 |
| Grant date | Feb 2, 2021 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
Certain aspects provide systems and methods of compacting data within a log-structured merge tree (LSM tree) using sharding. In certain aspects, a method includes determining a size of the LSM tree, determining a compaction time for a compaction of the LSM tree based on the size, determining a number of compaction entities for performing the compaction in parallel based on the compaction time, determining a number of shards based on the number of compaction entities, and determining a key range associated with the LSM tree. The method further comprises dividing the key range by the number of shards into a number of sub key ranges, wherein each of the number of sub key ranges corresponds to a shard of the number of shards and assigning the number of shards to the number of compaction entities for compaction.
Opening claim text (preview).
We claim: 1. A method of compacting data within a log-structured merge tree (LSM tree) using sharding, comprising: determining a size of the LSM tree; determining a compaction time for a compaction of the LSM tree based on the size; determining a number of compaction entities for performing the compaction in parallel based on the compaction time; determining a number of shards based on the number of compaction entities; determining a key range associated with the LSM tree; dividing the key range by the number of shards into a number of sub key ranges, wherein each of the number of sub key ranges corresponds to a shard of the number of shards; assigning the number of shards to the number of compaction entities for compaction; and compacting, using at least one hardware processor associated with the number of compaction entities, the data within the LSM tree by compacting the number of shards. 2. The method of claim 1 , wherein determining the compaction time is further based on an input/output (I/O) bandwidth and a recovery point objective corresponding to data stored within the LSM tree. 3. The method of claim 2 , wherein the I/O bandwidth is associated with at least one of a network over which I/O to the LSM tree is performed or storage resources that store the LSM tree. 4. The method of claim 1 , wherein determining the key range is based on examining a catalogue file associated with the LSM tree. 5. The method of claim 1 , wherein determining the number of compaction entities is based on a likelihood relating to one or more of the number of compaction entities failing during the compaction. 6. The method of claim 1 , wherein: determining the number of compaction entities is based on a following formula: Number of Compaction Entities=N*(1+E); E is a likelihood relating to one or more of the number of compaction entities failing during the compaction; N equals 2S/(B*T); N corresponds to an estimated amount of time it takes to compact the data within the LSM tree; S corresponds to the size of the LSM tree; B corresponds to an input/output (I/O) bandwidth, which comprises an I/O bandwidth associated with a storage resource storing the data within the LSM tree; and T corresponds to a recovery point objective associated with the data within the LSM tree. 7. The method of claim 1 , wherein the number of shards is higher than the number of compaction entities. 8. The method of claim 1 , wherein the number of sub key ranges are non-overlapping. 9. An apparatus, comprising: a non-transitory memory comprising executable instructions; and a processor in data communication with the memory and configured to execute the instructions to cause the apparatus to: determine a size of the LSM tree; determine a compaction time for a compaction of the LSM tree based on the size; determine a number of compaction entities for performing the compaction in parallel based on the compaction time; determine a number of shards based on the number of compaction entities; determine a key range associated with the LSM tree; divide the key range by the number of shards into a number of sub key ranges, wherein each of the number of sub key ranges corresponds to a shard of the number of shards; assign the number of shards to the number of compaction entities for compaction; and compact, using at least one hardware processor associated with the number of compaction entities, the data within the LSM tree by compacting the number of shards. 10. The apparatus of claim 9 , wherein the processor being configured to cause the apparatus to determine the compaction time comprises the processor being configured to cause the apparatus to determine the compaction time based on an input/output (I/O) bandwidth and a recovery point objective corresponding to data stored within the LSM tree. 11. The apparatus of claim 10 , wherein the I/O bandwidth is associated with at least one of a network over which I/O to the LSM tree is performed or storage resources that store the LSM tree. 12. The apparatus of claim 9 , wherein the processor being configured to cause the apparatus to determine the key range comprises the processor being configured to cause the apparatus to determine the key range is based on examining a catalogue file associated with the LSM tree. 13. The apparatus of claim 9 , wherein the processor being configured to cause the apparatus to determine the number of compaction entities comprises the processor being configured to cause the apparatus to determine the number of compaction entities is based on a likelihood relating to one or more of the number of compaction entities failing during the compaction. 14. The apparatus of claim 9 , wherein: the processor being configured to cause the apparatus to determine the number of compaction entities comprises the processor being configured to cause the apparatus to determine the number of compaction entities based on a following formula: Number of Compaction Entities=N*(1+E); E corresponds to a likelihood relating to one or more of the number of compaction entities failing during the compaction; N equals 2S/(B*T); N corresponds to an estimated amount of time it takes to compact the data within the LSM tree; S corresponds to the size of the LSM tree; B corresponds to an input/output (I/O) bandwidth, which comprises an I/O bandwidth associated with a storage resource storing the data within the LSM tree; and T corresponds to a recovery point objective associated with the data within the LSM tree. 15. The apparatus of claim 9 , wherein the number of shards is higher than the number of compaction entities. 16. The apparatus of claim 9 , wherein the number of sub key ranges are non-overlapping. 17. A non-transitory computer readable medium having instructions stored thereon that, when executed by a computing system, cause the computing system to perform a method comprising: determining a size of the LSM tree; determining a compaction time for a compaction of the LSM tree based on the size; determining a number of compaction entities for performing the compaction in parallel based on the compaction time; determining a number of shards based on the number of compaction entities; determining a key range associated with the LSM tree; dividing the key range by the number of shards into a number of sub key ranges, wherein each of the number of sub key ranges corresponds to a shard of the number of shards; assigning the number of shards to the number of compaction entities for compaction; and compacting, using at least one hardware processor associated with the number of compaction entities, the data within the LSM tree by compacting the number of shards. 18. The non-transitory computer readable medium of claim 17 , wherein determining the compaction time is further based on an input/output (I/O) bandwidth and a recovery point objective corresponding to data stored within the LSM tree. 19. The non-transitory computer readable medium of claim 18 , wherein the I/O bandwidth is associated with at least one of a network over which I/O to the LSM tree is performed or storage resources that store the LSM tree. 20. The non-transitory computer readable medium of claim 17 , wherein determining the key range is based on examining a catalogue file associated with the LSM tree. 21. The non-transitory computer readable medium of claim 17 , wherein determining the number of compaction entities is based on a likelihood relating to one or more of the number of c
Trees, e.g. B+trees · CPC title
Parallelization · CPC title
Compression (speech analysis-synthesis for redundancy reduction G10L19/00; for image communication H04N); Expansion; Suppression of unnecessary data, e.g. redundancy reduction · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.