Method and apparatus for content derived data placement in memory
US-9875183-B2 · Jan 23, 2018 · US
US10678654B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10678654-B2 |
| Application number | US-201715793188-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 25, 2017 |
| Priority date | Oct 26, 2016 |
| Publication date | Jun 9, 2020 |
| Grant date | Jun 9, 2020 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
Disclosed are methods and systems for performing data backup which implement data binning using log-structured merge (LSM) trees during deduplication. An exemplary method includes: calculating a reduced hash value (RHV) associated with each of a plurality of data blocks; partitioning the plurality of reduced hash values into groups; selecting a representative hash value for each group; determining whether the representative hash value occurs in a first LSM tree, the first LSM tree stored in a volatile memory; and when the representative hash value occurs in the first LSM tree: loading the RHVs in the representative hash value's group into volatile memory; comparing each of the RHVs to one or more hash values in a second LSM tree to identify a matching hash value; and writing a segment identifier (ID) corresponding to the matching hash value in an archive, which references a data block in a segment store.
Opening claim text (preview).
The invention claimed is: 1. A computer-implemented method for data backup, comprising: calculating, by a processor, a reduced hash value associated with each of a plurality of data blocks; partitioning, by the processor, the plurality of reduced hash values into groups; selecting, by the processor, a representative hash value for each group; determining, by the processor, whether the representative hash value occurs in a first log-structured merge (LSM) tree, the first LSM tree stored in a volatile memory; and when the representative hash value occurs in the first LSM tree: loading the reduced hash values in the representative hash value's group into the volatile memory; comparing each of the reduced hash values to one or more hash values in a second LSM tree to identify a matching hash value; and writing a segment identifier (ID) corresponding to the matching hash value in an archive, the segment ID referencing a data block in a segment store. 2. The method of claim 1 , wherein the representative hash value for each group is selected from the plurality of reduced hash values assigned to the group. 3. The method of claim 1 , wherein the segment ID is generated by incrementing a previous segment ID. 4. The method of claim 1 , further comprising: identifying the segment ID based on the determination that the representative hash value occurs in the first LSM tree, wherein the one or more reduced hash values in the second LSM tree are each associated with the segment ID. 5. The method of claim 1 , wherein calculating the reduced hash value comprises selecting a plurality of bits at a beginning of a first hash value as the reduced hash value. 6. The method of claim 1 , wherein the second LSM tree is stored in non-volatile memory. 7. The method of claim 1 , further comprising: when the representative hash value does not occur in the first LSM tree, storing each of the plurality of data blocks in the representative hash value's group in the segment store. 8. A system for data backup, comprising: one or more hardware or virtual processors configured to: calculate, for each of a plurality of data blocks, a reduced hash value associated with each of the data blocks; partition, by the processor, the plurality of reduced hash values into groups; select, by the processor, a representative hash value for each group; determine, by the processor, whether the representative hash value occurs in a first log-structured merge (LSM) tree, the first LSM tree stored in a volatile memory; and when the representative hash value occurs in the first LSM tree: load the reduced hash values in the representative hash value's group into the volatile memory; compare each of the reduced hash values to one or more hash values in a second LSM tree to identify a matching hash value; and write a segment identifier (ID) corresponding to the matching hash value in an archive, the segment ID referencing a data block in a segment store. 9. The system of claim 8 , wherein the representative hash value for each group is selected from the plurality of reduced hash values assigned to the group. 10. The system of claim 8 , wherein the segment ID is generated by incrementing a previous segment ID. 11. The system of claim 8 , wherein the processor is further configured to: identify the segment ID based on the determination that the representative hash value occurs in the first LSM tree, wherein the one or more reduced hash values in the second LSM tree are each associated with the segment ID. 12. The system of claim 8 , wherein calculating the reduced hash value comprises selecting a plurality of bits at a beginning of a first hash value as the reduced hash value. 13. The system of claim 8 , wherein the second LSM tree is stored in non-volatile memory. 14. The system of claim 8 , wherein the processor is further configured to: when the representative hash value does not occur in the first LSM tree, store each of the plurality of data blocks in the representative hash value's group in the segment store. 15. A non-transitory computer readable medium comprising computer executable instructions for data backup, including instructions for: calculating, for each of a plurality of data blocks, a reduced hash value associated with each of the data blocks; partitioning, by the processor, the plurality of reduced hash values into groups; selecting, by the processor, a representative hash value for each group; determining, by the processor, whether the representative hash value occurs in a first log-structured merge (LSM) tree, the first LSM tree stored in a volatile memory; and when the representative hash value occurs in the first LSM tree: loading the reduced hash values in the representative hash value's group into the volatile memory; comparing each of the reduced hash values to one or more hash values in a second LSM tree to identify a matching hash value; and writing a segment identifier (ID) corresponding to the matching hash value in an archive, the segment ID referencing a data block in a segment store. 16. The non-transitory computer readable medium of claim 15 , wherein the representative hash value for each group is selected from the plurality of reduced hash values assigned to the group. 17. The non-transitory computer readable medium of claim 15 , wherein the segment ID is generated by incrementing a previous segment ID. 18. The non-transitory computer readable medium of claim 15 , further comprising instructions for: identifying the segment ID based on the determination that the representative hash value occurs in the first LSM tree, wherein the one or more reduced hash values in the second LSM tree are each associated with the segment ID. 19. The non-transitory computer readable medium of claim 15 , wherein calculating the reduced hash value comprises selecting a plurality of bits at a beginning of a first hash value as the reduced hash value. 20. The non-transitory computer readable medium of claim 15 , wherein the second LSM tree is stored in non-volatile memory.
Indexing; Data structures therefor; Storage structures · CPC title
Using snapshots, i.e. a logical point-in-time copy of the data · CPC title
Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor · CPC title
hash tables · CPC title
using de-duplication of the data · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.