Two-stage front end for extent map database

US10353884B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-10353884-B2
Application numberUS-201715601388-A
CountryUS
Kind codeB2
Filing dateMay 22, 2017
Priority dateDec 3, 2014
Publication dateJul 16, 2019
Grant dateJul 16, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Multiple key-value stores may be employed to smooth out random updates (based on the extent ID) to the EMAP database. The updates to the EMAP database occur in a two-stage manner: (i) using an append-only log store for the first stage and (ii) using an on-disk hash store for the second stage. The append-only log store is used to convert the random updates to sequential write operations on the EMAP database. Once full, the contents of the log store are sorted and moved to the on-disk hash store, which holds the updates for a transient period of time to enable batching of the updates. Once sufficient batching of the extent map entries are accumulated, those entries are sorted and moved to the EMAP database. Thereafter, the EMAP database can be scanned to find extent map entries having identical checksum bits to perform data deduplication.

First claim

Opening claim text (preview).

What is claimed is: 1. A method comprising: appending extent identifiers of extents of a storage system into a log; based upon a threshold amount of the log being populated with extent identifiers, sorting and moving the extent identifiers into a hash store until expiration of a time period; sorting the extent identifiers in the hash store based upon expiration of the time period; and moving the extent identifiers from the hash store into a mapping structure mapping extent identifiers to storage location identifiers of corresponding extents. 2. The method of claim 1 , further comprising prioritizing deduplication of extents in the storage system based on the mapping structure. 3. The method of claim 1 , wherein an extent identifier comprises a checksum generated from an extent and a counter, wherein the counter distinguishes matching checksums. 4. The method of claim 1 , wherein the log comprises a key-value store and the hash store comprises a second key-value store, wherein the extent identifiers are the keys and storage location identifiers are the values. 5. The method of claim 1 , wherein the storage location identifiers are physical volume block numbers. 6. The method of claim 1 , wherein the appending comprises appending extent entries into the log, wherein an extent entry comprises an extent identifier and a corresponding storage location identifier. 7. The method of claim 6 , wherein moving the extent identifier from the hash store comprises: based on a determination that a node of the mapping structure includes the extent entry with a checksum of the extent identifier of the extent entry, inserting the extent entry from the hash store into the node and updating a score for the node based on a number of the extent entries in the node with a same checksum. 8. The method of claim 7 , comprising marking a dirty bit for a second node based on a determination that the second node has a second extent entry with a second checksum matching the checksum of the extent entry being inserted into the node. 9. The method of claim 1 , further comprising searching the log with a partial cuckoo hash of an extent based on a write operation to the storage system and performing deduplication based on finding a match. 10. A non-transitory computer-readable medium comprising instructions for performing a method, which when executed by a machine, causes the machine to: append extent identifiers of extents of a storage system into a first structure; based upon a threshold amount of the log being populated with extent identifiers, sort and move the extent identifiers into a second structure until expiration of a time period; sort the extent identifiers in the second structure based upon expiration of the time period; and update an extent map database with the extent identifiers from the second structure. 11. The non-transitory computer-readable medium of claim 10 , wherein the instructions cause the machine to prioritize deduplication of extents in the storage system based on the extent map database. 12. The non-transitory computer-readable medium of claim 10 , wherein an extent identifier comprises a checksum generated from an extent and a counter, wherein the counter distinguishes matching checksums. 13. The non-transitory computer-readable medium of claim 10 , wherein the first structure comprises a key-value store and the second structure comprises a second key-value store, wherein the extent identifiers are the keys and storage location identifiers are the values. 14. The non-transitory computer-readable medium of claim 13 , wherein the storage location identifiers are physical volume block numbers. 15. The non-transitory computer-readable medium of claim 10 , wherein the instructions cause the machine to: append extent entries into the first structure, wherein an extent entry comprises an extent identifier and a corresponding storage location identifier. 16. The non-transitory computer-readable medium of claim 15 , wherein the instructions cause the machine to: based on a determination that a node of an extent map database does not include the extent entry with a checksum of the extent identifier, inserting the extent entry from the first structure into a node of the extent map database. 17. The non-transitory computer-readable medium of claim 16 , wherein the instructions cause the machine to mark a dirty bit for a second node based on a determination that the second node has a second extent entry with a second checksum matching the checksum of the extent entry being inserted into the node. 18. The non-transitory computer-readable medium of claim 10 , wherein the instructions cause the machine to search the first structure with a partial cuckoo hash of an extent based on a write operation to the storage system and to perform deduplication based on finding a match. 19. A computing device comprising: a processor; a computer-readable medium comprising instructions executable by the processor to cause the processor to: append extent identifiers of extents of a storage system into a first structure; based upon a threshold amount of the log being populated with extent identifiers, sort and move the extent identifiers into a second structure until expiration of a time period; sort the extent identifiers in the second structure based upon expiration of the time period; and update an extent map database with the extent identifiers from the second structure. 20. The computing device of claim 19 , wherein the instructions cause the processor to search the second structure with a partial cuckoo hash of an extent based on a write operation to the storage system and to perform deduplication based on finding a match.

Assignees

Inventors

Classifications

  • De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641) · CPC title

  • Database tuning (G06F16/2282 takes precedence; database performance monitoring G06F11/3409) · CPC title

  • G06F16/137Primary

    Hash-based (content-based indexing of textual data G06F16/31) · CPC title

  • Saving storage space on storage systems · CPC title

  • Trees, e.g. B+trees · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10353884B2 cover?
Multiple key-value stores may be employed to smooth out random updates (based on the extent ID) to the EMAP database. The updates to the EMAP database occur in a two-stage manner: (i) using an append-only log store for the first stage and (ii) using an on-disk hash store for the second stage. The append-only log store is used to convert the random updates to sequential write operations on the E…
Who is the assignee on this patent?
Netapp Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/137. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Jul 16 2019 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).