Systems and methods for data backup using data binning and deduplication

US10678654B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-10678654-B2
Application numberUS-201715793188-A
CountryUS
Kind codeB2
Filing dateOct 25, 2017
Priority dateOct 26, 2016
Publication dateJun 9, 2020
Grant dateJun 9, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed are methods and systems for performing data backup which implement data binning using log-structured merge (LSM) trees during deduplication. An exemplary method includes: calculating a reduced hash value (RHV) associated with each of a plurality of data blocks; partitioning the plurality of reduced hash values into groups; selecting a representative hash value for each group; determining whether the representative hash value occurs in a first LSM tree, the first LSM tree stored in a volatile memory; and when the representative hash value occurs in the first LSM tree: loading the RHVs in the representative hash value's group into volatile memory; comparing each of the RHVs to one or more hash values in a second LSM tree to identify a matching hash value; and writing a segment identifier (ID) corresponding to the matching hash value in an archive, which references a data block in a segment store.

First claim

Opening claim text (preview).

The invention claimed is: 1. A computer-implemented method for data backup, comprising: calculating, by a processor, a reduced hash value associated with each of a plurality of data blocks; partitioning, by the processor, the plurality of reduced hash values into groups; selecting, by the processor, a representative hash value for each group; determining, by the processor, whether the representative hash value occurs in a first log-structured merge (LSM) tree, the first LSM tree stored in a volatile memory; and when the representative hash value occurs in the first LSM tree: loading the reduced hash values in the representative hash value's group into the volatile memory; comparing each of the reduced hash values to one or more hash values in a second LSM tree to identify a matching hash value; and writing a segment identifier (ID) corresponding to the matching hash value in an archive, the segment ID referencing a data block in a segment store. 2. The method of claim 1 , wherein the representative hash value for each group is selected from the plurality of reduced hash values assigned to the group. 3. The method of claim 1 , wherein the segment ID is generated by incrementing a previous segment ID. 4. The method of claim 1 , further comprising: identifying the segment ID based on the determination that the representative hash value occurs in the first LSM tree, wherein the one or more reduced hash values in the second LSM tree are each associated with the segment ID. 5. The method of claim 1 , wherein calculating the reduced hash value comprises selecting a plurality of bits at a beginning of a first hash value as the reduced hash value. 6. The method of claim 1 , wherein the second LSM tree is stored in non-volatile memory. 7. The method of claim 1 , further comprising: when the representative hash value does not occur in the first LSM tree, storing each of the plurality of data blocks in the representative hash value's group in the segment store. 8. A system for data backup, comprising: one or more hardware or virtual processors configured to: calculate, for each of a plurality of data blocks, a reduced hash value associated with each of the data blocks; partition, by the processor, the plurality of reduced hash values into groups; select, by the processor, a representative hash value for each group; determine, by the processor, whether the representative hash value occurs in a first log-structured merge (LSM) tree, the first LSM tree stored in a volatile memory; and when the representative hash value occurs in the first LSM tree: load the reduced hash values in the representative hash value's group into the volatile memory; compare each of the reduced hash values to one or more hash values in a second LSM tree to identify a matching hash value; and write a segment identifier (ID) corresponding to the matching hash value in an archive, the segment ID referencing a data block in a segment store. 9. The system of claim 8 , wherein the representative hash value for each group is selected from the plurality of reduced hash values assigned to the group. 10. The system of claim 8 , wherein the segment ID is generated by incrementing a previous segment ID. 11. The system of claim 8 , wherein the processor is further configured to: identify the segment ID based on the determination that the representative hash value occurs in the first LSM tree, wherein the one or more reduced hash values in the second LSM tree are each associated with the segment ID. 12. The system of claim 8 , wherein calculating the reduced hash value comprises selecting a plurality of bits at a beginning of a first hash value as the reduced hash value. 13. The system of claim 8 , wherein the second LSM tree is stored in non-volatile memory. 14. The system of claim 8 , wherein the processor is further configured to: when the representative hash value does not occur in the first LSM tree, store each of the plurality of data blocks in the representative hash value's group in the segment store. 15. A non-transitory computer readable medium comprising computer executable instructions for data backup, including instructions for: calculating, for each of a plurality of data blocks, a reduced hash value associated with each of the data blocks; partitioning, by the processor, the plurality of reduced hash values into groups; selecting, by the processor, a representative hash value for each group; determining, by the processor, whether the representative hash value occurs in a first log-structured merge (LSM) tree, the first LSM tree stored in a volatile memory; and when the representative hash value occurs in the first LSM tree: loading the reduced hash values in the representative hash value's group into the volatile memory; comparing each of the reduced hash values to one or more hash values in a second LSM tree to identify a matching hash value; and writing a segment identifier (ID) corresponding to the matching hash value in an archive, the segment ID referencing a data block in a segment store. 16. The non-transitory computer readable medium of claim 15 , wherein the representative hash value for each group is selected from the plurality of reduced hash values assigned to the group. 17. The non-transitory computer readable medium of claim 15 , wherein the segment ID is generated by incrementing a previous segment ID. 18. The non-transitory computer readable medium of claim 15 , further comprising instructions for: identifying the segment ID based on the determination that the representative hash value occurs in the first LSM tree, wherein the one or more reduced hash values in the second LSM tree are each associated with the segment ID. 19. The non-transitory computer readable medium of claim 15 , wherein calculating the reduced hash value comprises selecting a plurality of bits at a beginning of a first hash value as the reduced hash value. 20. The non-transitory computer readable medium of claim 15 , wherein the second LSM tree is stored in non-volatile memory.

Assignees

Inventors

Classifications

  • Indexing; Data structures therefor; Storage structures · CPC title

  • Using snapshots, i.e. a logical point-in-time copy of the data · CPC title

  • Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor · CPC title

  • hash tables · CPC title

  • using de-duplication of the data · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10678654B2 cover?
Disclosed are methods and systems for performing data backup which implement data binning using log-structured merge (LSM) trees during deduplication. An exemplary method includes: calculating a reduced hash value (RHV) associated with each of a plurality of data blocks; partitioning the plurality of reduced hash values into groups; selecting a representative hash value for each group; determin…
Who is the assignee on this patent?
Acronis Int Gmbh
What technology area does this patent fall under?
Primary CPC classification G06F16/9014. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Jun 09 2020 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).