System performing data deduplication using a dense tree data structure

US9798728B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9798728-B2
Application numberUS-201414339890-A
CountryUS
Kind codeB2
Filing dateJul 24, 2014
Priority dateJul 24, 2014
Publication dateOct 24, 2017
Grant dateOct 24, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In one embodiment, as new blocks of data are written to storage devices of a storage system, fingerprints are generated for those new blocks and inserted as entries into a top level (L0) of a dense tree data structure. When L0 is filled, the contents from L0 may be merged with level 1 (L1). After the initial merge, new fingerprints are added to L0 until L0 fills up again, which triggers a new merge. Duplicate fingerprints in L0 and L1 are identified which, in turn, indicates duplicate data blocks. A post-processing deduplication operation is then performed to remove duplicate data blocks corresponding to the duplicate fingerprints. In a different embodiment, as new fingerprint entries are loaded into L0, those new fingerprints may be compared with existing fingerprints loaded into L0 and/or other levels to facilitate inline deduplication to identify duplicate fingerprints and subsequently perform the deduplication operation.

First claim

Opening claim text (preview).

What is claimed is: 1. A method, comprising: generating, by a processor, a fingerprint identifying data to be stored on one or more storage devices; inserting the fingerprint into a first level of a dense tree data structure having a plurality of levels; initiating a merge operation between the first level and a second level of the dense tree data structure based on the first level of the dense tree being filled to a threshold capacity, wherein the first level of the dense tree stores a first set of fingerprints and the second level stores a second set of fingerprints; and in response to initiating the merge operation, comparing the first set of fingerprints stored in the first level with the second set of fingerprints stored in the second level of the dense tree data structure to identify one or more duplicate fingerprints and performing data deduplication for selected data, corresponding to the one or more identified duplicate fingerprints, stored on the one or more storage devices. 2. The method of claim 1 , wherein the first level is level 0 of the dense tree data structure. 3. The method of claim 1 , further comprising: generating a different fingerprint; comparing the different fingerprint with other fingerprints stored in a third level of the dense tree data structure to identify another duplicate fingerprint; and in response to identifying the another duplicate fingerprint in the dense tree data structure, performing data deduplication for identified data corresponding to the identified another duplicate fingerprint. 4. The method of claim 1 wherein a capacity of the second level of the dense tree data structure is greater than a capacity of the first level of the dense tree data structure. 5. The method of claim 1 wherein the dense tree data structure is a B+ tree. 6. A method comprising: generating, by a processor, a fingerprint identifying data to be stored on one or more storage devices; comparing the fingerprint with other fingerprints stored in a first level of a dense tree data structure having a plurality of levels, to identify a duplicate fingerprint; in response to identifying the duplicate fingerprint in the dense tree data structure, performing data deduplication associated with the identified data stored on the one or more storage devices, wherein the identified data corresponds to the identified duplicate fingerprint, and storing the fingerprint in the first level of the dense tree data structure; and in response to not identifying the duplicate fingerprint in the dense tree data structure, storing the fingerprint in the first level of the dense tree data structure. 7. The method of claim 6 , wherein the first level is level 0 of the dense tree data structure. 8. The method of claim 7 , wherein the level 0 of the dense tree data structure is stored in a memory coupled to the processor. 9. The method of claim 6 wherein the dense tree data structure is a B+ tree. 10. The method of claim 6 wherein the first level of the dense tree has a smaller storage capacity than each other level of the dense tree data structure. 11. The method of claim 6 , further comprising comparing the fingerprint with the other fingerprints stored in one or more other levels of the dense tree data structure to identify one or more other duplicate fingerprints and performing data deduplication for the one or more other identified duplicate fingerprints. 12. A system, comprising: a processor configured to execute one or more processes; and a memory configured to store a process executable by the processor, the process configured to: generate a fingerprint identifying data to be stored on one or more storage devices; insert the fingerprint into a first level of a dense tree data structure having a plurality of levels; initiate a merge operation between the first level and a second level of the dense tree data structure in response to the first level being filled to a threshold capacity, wherein the first level of the dense tree stores a first set of fingerprints and the second level stores a second set of fingerprints; compare the first set of fingerprints stored in the first level and with the second set of fingerprints stored in the second level of the dense tree data structure to identify one or more duplicate fingerprints in response to initiating the merge operation; and perform data deduplication for selected data, corresponding to the one or more identified duplicate fingerprints, stored on the one or more storage device. 13. The system of claim 12 , wherein the first level is level 0 of the dense tree data structure. 14. The system of claim 13 , wherein the level 0 of the dense tree data structure is stored in the memory. 15. The system of claim 14 , wherein the second level of the dense tree data structure is stored on the one or more storage devices. 16. The system of claim 12 wherein the fingerprint and an identification value associated with the data are stored in an entry of the dense tree data structure. 17. The system of claim 12 , wherein the processor, configured to perform data deduplication, is further configured to: update a pointer associated with the selected data to reference other data already stored on the one or more storage devices; and erase the selected data from the one or more storage devices. 18. The system of claim 12 , wherein the dense tree data structure is a B+ tree. 19. The system of claim 12 , wherein the first level of the dense tree has a smaller storage capacity than each other level of the dense tree data structure.

Assignees

Inventors

Classifications

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9798728B2 cover?
In one embodiment, as new blocks of data are written to storage devices of a storage system, fingerprints are generated for those new blocks and inserted as entries into a top level (L0) of a dense tree data structure. When L0 is filled, the contents from L0 may be merged with level 1 (L1). After the initial merge, new fingerprints are added to L0 until L0 fills up again, which triggers a new m…
Who is the assignee on this patent?
Netapp Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/137. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Oct 24 2017 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).