Consistency checker for global de-duplication clustered file system

US10049118B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10049118-B2
Application number  US-201514727005-A
Country             US
Kind code           B2
Filing date         Jun 1, 2015
Priority date       Jun 1, 2015
Publication date    Aug 14, 2018
Grant date          Aug 14, 2018

Abstract

Official abstract text for this publication.

A cluster-wide consistency checker ensures that two file systems of a storage input/output (I/O) stack executing on each node of a cluster are self-consistent as well as consistent with respect to each other. The file systems include a deduplication file system and a host-facing file system that cooperate to provide a layered file system of the storage I/O stack. The deduplication file system is a log-structured file system managed by an extent store layer of the storage I/O stack, whereas the host-facing file system is managed by a volume layer of the stack. Illustratively, each log-structured file system implements a key-value store and cooperates with other nodes of the cluster to provide a cluster-wide (global) key-value store. The consistency checker verifies and/or fixes on-disk structures of the layered file system to ensure its consistency. To that end, the consistency checker may determine whether there are inconsistencies in the key-value store and, if so, reconciles those inconsistencies from a client (volume layer) perspective.
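As a rough illustration of what reconciling "from a client (volume layer) perspective" could look like, the Python sketch below treats the extent store as a plain dictionary from extent key to reference count and the volume layer as a dictionary from LUN address to extent key. These in-memory shapes, the function name `check_and_reconcile`, and the report fields are all assumptions made for illustration; the patent describes on-disk hash tables and dense trees, not this code.

```python
from collections import Counter


def check_and_reconcile(extent_store, volume_mappings):
    """Compare extent-store refcounts with the references the volume layer holds.

    extent_store:    dict mapping extent key -> reference count (hypothetical shape)
    volume_mappings: dict mapping LUN address -> extent key (hypothetical shape)

    The volume layer is treated as the source of truth, so mismatched
    refcounts in extent_store are overwritten in place.
    """
    # Count how many LUN addresses point at each extent key.
    expected = Counter(volume_mappings.values())

    report = {"missing": [], "fixed": {}, "unreferenced": []}
    for key, count in expected.items():
        if key not in extent_store:
            # The volume layer references an extent the store does not know.
            report["missing"].append(key)
        elif extent_store[key] != count:
            # Record (old, new) and repair the refcount.
            report["fixed"][key] = (extent_store[key], count)
            extent_store[key] = count

    # Extents that no volume mapping references at all.
    report["unreferenced"] = sorted(k for k in extent_store if k not in expected)
    report["missing"].sort()
    return report


if __name__ == "__main__":
    store = {"k1": 2, "k2": 5, "k3": 1}
    volume = {0: "k1", 1: "k1", 2: "k2", 3: "k4"}
    print(check_and_reconcile(store, volume))
```

The sketch only reports missing and unreferenced extents rather than deleting or recreating them, since the abstract does not say how such cases are handled.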

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: checking self-consistency of first on-disk data structures of an extent store, in response to recovery of a layered file system, the extent store included as a first layer of the layered file system serviced by a storage system coupled to one or more storage devices storing extents of the extent store, the first on-disk data structures of the extent store including one or more hash tables configured to maintain an extent key and a reference count for each of the extents; checking self-consistency of second on-disk data structures of a volume layer in response to the recovery of the layered file system, the volume layer included as a second layer of the layered file system, the second on-disk structures of the volume layer including one or more dense tree structures configured to maintain mappings from host-accessible logical unit number (LUN) addresses to the extent keys; and reconciling, for each extent key in the volume layer mappings, the reference count of a same extent key maintained in the extent store.

2. The method of claim 1 wherein checking the self-consistency of the extent store further comprises repairing the reference count for a de-duplicated extent.

3. The method of claim 1 wherein checking the self-consistency of the extent store further comprises validating an addressable location for each extent of the extent store.

4. The method of claim 1 further comprising: publishing each extent key maintained in the volume layer mappings to a first set of map reducer bins, wherein each map reducer bin is a key-value store maintaining the extent key as a key of the key-value store and the reference count as a value of the key-value store.

5. The method of claim 4 further comprising: merging the first set of map reducer bins associated with a first host-accessible LUN with a second set of map reducer bins associated with a second host-accessible LUN; and reconciling for each extent key of a merged result of the first and second sets of map reducer bins, the reference count of the same extent key maintained in the extent store.

6. The method of claim 4 wherein publishing the extent key maintained in the volume layer to the first set of map reducer bins increases the reference count of the same extent key maintained in the key-value store of the map reducer bin.

7. The method of claim 4 further comprising: publishing to the first set of map reducer bins, a refcount log extent key maintained in a refcount log for unreferencing an extent of the extent store, wherein update to the extent store from the refcount log is deferred; and increasing the reference count of the extent key for the unreferenced extent maintained in the key-value store of the map reducer bin.

8. The method of claim 1 wherein the checked extent store is written to the storage devices.

9. The method of claim 4 wherein the mappings of the volume layer are maintained in a plurality of regions, and wherein each extent key maintained in the volume layer mappings per region is published in parallel with other regions.

10. The method of claim 1 wherein reconciling each extent key in the volume layer mappings is performed while the layered file system is offline.

11. A system comprising: a storage array coupled to a storage system, the storage system having a memory connected to a processor via a bus; a storage input/output (I/O) stack executing on the processor of the storage system, the storage I/O stack configured to: check self-consistency of first on-disk data structures of an extent store, in response to recovery of a layered file system, the extent store included as a first layer of the layered file system executed by the storage I/O stack, the storage array storing extents of the extent store, the first on-disk data structures of the extent store configured to maintain an extent key and a reference count for each of the extents; check self-consistency of second on-disk data structures of a volume layer in response to the recovery of the layered file system, the volume layer included as a second layer of the layered file system, the second on-disk structures of the volume layer including one or more dense tree structures configured to maintain mappings from host-accessible logical unit number (LUN) addresses to the extent keys; and reconcile, for each extent key in the volume layer mappings, the reference count of a same extent key maintained in the extent store.

12. The system of claim 11 wherein checking the self-consistency of the extent store further comprises repairing the reference count for a de-duplicated extent.

13. The system of claim 11 wherein checking the self-consistency of the extent store further comprises validating an addressable location for each extent of the extent store.

14. The system of claim 11 wherein the storage I/O stack is further configured to: publish each extent key maintained in the volume layer mappings to a first set of map reducer bins, wherein each map reducer bin is a key-value store maintaining the extent key as a key of the key-value store and the reference count as a value of the key-value store.

15. The system of claim 14 wherein the storage I/O stack is further configured to: merge the first set of map reducer bins associated with a first host-accessible LUN with a second set of map reducer bins associated with a second host-accessible LUN; and reconcile, for each extent key of a merged result of the first and second sets of map reducer bins, the reference count of the same extent key maintained in the extent store.

16. The system of claim 14 wherein publishing the extent key maintained in the volume layer to the first set of map reducer bins increases the reference count of the same extent key maintained in the key-value store of the map reducer bin.

17. The system of claim 14 wherein the storage I/O stack is further configured to: publish to the first set of map reducer bins, a refcount log extent key maintained in a refcount log for unreferencing an extent of the extent store, wherein update to the extent store from the refcount log is deferred; and increase the reference count of an extent key for the unreferenced extent maintained in the key-value store of the map reducer bin.

18. The system of claim 11 wherein the checked extent store is written to the storage array.

19. The system of claim 14 wherein the mappings of the volume layer are maintained in a plurality of regions, and wherein each extent key maintained in the volume layer mappings per region is published in parallel with other regions.

20. A non-transitory computer readable storage medium containing executable program instructions for execution by a processor included in a storage system having a memory, the storage system coupled to one or more storage devices, comprising program instructions that: check self-consistency of first on-disk data structures of an extent store, in response to recovery of a layered file system, the extent store included as a first layer of the layered file system, the one or more storage devices storing extents of the extent store, the first on-disk data structures of the extent store including one or more hash tables configured to maintain an extent key and a reference count for each of the extents; check self-consistency of second on-disk data structures of a volume layer in response to the recovery of the layered file system, the volume layer included as a second layer of the layered file system, the second on-disk structu…
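The publish, merge, and reconcile steps recited in claims 4 through 7 can be pictured with a small Python sketch. Everything concrete here is an assumption for illustration: the bin count `NUM_BINS`, the CRC32-based bin selection, the use of `collections.Counter` as each bin's key-value store, and the function names. The claims only state that each bin maps an extent key to a reference count. One reading of claim 7, also an assumption, is that a deferred unreference in the refcount log has not yet reached the extent store, so the store's count still includes that reference and the bin count is raised to match.

```python
import zlib
from collections import Counter

NUM_BINS = 4  # hypothetical; the claims do not specify a bin count


def bin_index(extent_key):
    """Pick a bin for a key (CRC32 is an illustrative choice, not from the patent)."""
    return zlib.crc32(extent_key.encode()) % NUM_BINS


def new_bins():
    """One set of map reducer bins; each bin maps extent key -> refcount."""
    return [Counter() for _ in range(NUM_BINS)]


def publish(bins, extent_keys):
    """Publishing a key increases its refcount in its bin (claims 6 and 7)."""
    for key in extent_keys:
        bins[bin_index(key)][key] += 1


def merge(bins_a, bins_b):
    """Merge the bin sets of two LUNs bin by bin, summing refcounts (claim 5)."""
    return [a + b for a, b in zip(bins_a, bins_b)]


def reconcile(bins, extent_store):
    """Overwrite extent-store refcounts that disagree with the bins.

    Returns {extent_key: (old_count_or_None, new_count)} for each fix.
    """
    fixes = {}
    for b in bins:
        for key, count in b.items():
            if extent_store.get(key) != count:
                fixes[key] = (extent_store.get(key), count)
                extent_store[key] = count
    return fixes


if __name__ == "__main__":
    lun1, lun2 = new_bins(), new_bins()
    publish(lun1, ["a", "b", "a"])   # keys from the first LUN's mappings
    publish(lun2, ["a", "c"])        # keys from the second LUN's mappings
    publish(lun2, ["d"])             # key from a refcount log entry (deferred unreference)
    store = {"a": 3, "b": 2, "c": 1, "d": 1}
    print(reconcile(merge(lun1, lun2), store))
```

Because a key always lands in the same bin index, bins can be merged and reconciled independently of one another, which is what would make the per-region parallel publishing of claims 9 and 19 straightforward to add to a sketch like this.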

Assignees

Netapp Inc

Classifications

  • G06F16/1748 · De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks: G06F3/0641) · CPC title

  • Physics · mapped topic

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Netapp Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/1748. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 14, 2018 (B2). Legal status and post-grant events are not shown on this page.