Virtual chunk service based data recovery in a distributed data storage system

US9921910B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9921910-B2
Application number: US-201514696001-A
Country: US
Kind code: B2
Filing date: Apr 24, 2015
Priority date: Feb 19, 2015
Publication date: Mar 20, 2018
Grant date: Mar 20, 2018

Abstract

Official abstract text for this publication.

Technology is disclosed for storing data in a distributed storage system using a virtual chunk service (VCS). In the VCS based storage technique, a storage node (“node”) is split into multiple VCSs and each of the VCSs can be assigned a unique ID in the distributed storage. A set of VCSs from a set of nodes form a storage group, which also can be assigned a unique ID in the distributed storage. When a data object is received for storage, a storage group is identified for the data object, the data object is encoded to generate multiple fragments and each fragment is stored in a VCS of the identified storage group. The data recovery process is made more efficient by using metadata, e.g., VCS to storage node mapping, storage group to VCS mapping, VCS to objects mapping, which eliminates resource intensive read and write operations during recovery.
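
The write path described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `Cluster` class, the `encode`, `store_object`, and `build_example` helpers, and the node, VCS, and storage group IDs are all hypothetical names chosen for illustration. The abstract does not name an encoding scheme here, so a single XOR parity fragment is assumed as a simple stand-in for a real erasure code.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical in-memory model of the three metadata mappings."""
    vcs_to_node: dict[str, str] = field(default_factory=dict)        # VCS ID -> storage node
    group_to_vcs: dict[str, list[str]] = field(default_factory=dict)  # storage group ID -> VCS IDs
    vcs_to_objects: dict[str, set[str]] = field(default_factory=dict)  # VCS ID -> object IDs
    # (VCS ID, object ID) -> fragment bytes; stands in for on-disk storage.
    fragments: dict[tuple[str, str], bytes] = field(default_factory=dict)


def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal-size fragments plus one XOR parity fragment.

    Assumption: single parity is only a stand-in for a real erasure code.
    """
    size = -(-len(data) // k)                 # ceiling division
    padded = data.ljust(size * k, b"\0")      # zero-pad to a multiple of k
    frags = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = bytearray(size)
    for frag in frags:
        for i, byte in enumerate(frag):
            parity[i] ^= byte
    return frags + [bytes(parity)]


def store_object(cluster: Cluster, group_id: str, object_id: str, data: bytes) -> None:
    """Encode an object and store one fragment in each VCS of the storage group."""
    vcs_ids = cluster.group_to_vcs[group_id]
    frags = encode(data, k=len(vcs_ids) - 1)  # last VCS receives the parity fragment
    for vcs_id, frag in zip(vcs_ids, frags):
        cluster.fragments[(vcs_id, object_id)] = frag
        cluster.vcs_to_objects.setdefault(vcs_id, set()).add(object_id)


def build_example() -> Cluster:
    """Three nodes split into two VCSs each; one storage group spans the nodes."""
    cluster = Cluster()
    for node in ("node-a", "node-b", "node-c"):
        for i in (1, 2):
            cluster.vcs_to_node[f"{node}/vcs{i}"] = node
    cluster.group_to_vcs["sg-1"] = ["node-a/vcs1", "node-b/vcs1", "node-c/vcs1"]
    return cluster
```

Calling `store_object(build_example(), "sg-1", "obj-1", b"hello world")` would place two data fragments and one parity fragment in the three VCSs of `sg-1`, and record the object under each VCS in the VCS to objects mapping.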

First claim

Opening claim text (preview).

We claim:

1. A computer-implemented method comprising: determining that a first data storage node of a distributed storage system having multiple data storage nodes has failed or is about to fail; based on determining that the first data storage node has failed or is about to fail, identifying a first set of virtual chunk spaces assigned to the first data storage node based on first metadata that indicate mappings between data storage nodes and virtual chunk spaces, wherein a plurality of virtual chunk spaces across the distributed storage system includes the first set of virtual chunk spaces; reassigning the first set of virtual chunk spaces to a second data storage node of the distributed storage system; identifying a first set of data objects of which fragments had been stored to the first set of virtual chunk spaces; and after reassignment of the first set of virtual chunk spaces to the second data storage node and after regeneration of a first set of fragments of the first set of data objects that were stored to the first set of virtual chunk spaces when assigned to the first data storage node, storing the first set of fragments to the first set of virtual chunk spaces.

2. The computer-implemented method of claim 1, wherein reassigning the first set of virtual chunk spaces to the second data storage node comprises: updating the mappings in the first metadata to indicate that the first set of virtual chunk spaces is assigned to the second data storage node instead of the first data storage node.

3. The computer-implemented method of claim 1 further comprising: identifying a first set of erasure coding groups associated with the first set of virtual chunk spaces based on second metadata that indicate associations of erasure coding groups with virtual chunk spaces, wherein the first data storage node is assigned to no more than one of the first set of virtual chunk spaces from each of the first set of erasure coding groups, wherein identifying the first set of data objects of which fragments have been stored to the first set of virtual chunk spaces is based on third metadata that indicates mappings of data objects to erasure coding groups.

4. The computer-implemented method of claim 3 further comprising: based on a client request for retrieving a first data object of the first set of data objects from the distributed storage system, the request including a storage group identification (ID) of a first of the first set of erasure coding groups with which the first data object is associated: identifying a second set of the virtual chunk spaces associated with the first erasure coding group according to the second metadata and retrieving fragments of the first data object from the second set of virtual chunk spaces, wherein the second set of virtual chunk spaces includes only one of the first set of virtual chunk spaces; and returning the first data object to the requesting client after generating the first data object from the retrieved fragments.

5. The computer-implemented method of claim 1, wherein identifying the first set of data objects of which fragments have been stored to the first set of virtual chunk spaces is based on third metadata that indicates mappings of virtual chunk spaces to data objects.

6. A non-transitory computer-readable storage medium having computer-executable instructions, comprising instructions to: determine that a first data storage node of a distributed storage system having multiple data storage nodes has failed or is about to fail; based on a determination that the first data storage node has failed or is about to fail, identify a first set of virtual chunk spaces assigned to the first data storage node based on first metadata that indicate mappings between data storage nodes and virtual chunk spaces, wherein a plurality of virtual chunk spaces across the distributed storage system includes the first set of virtual chunk spaces; reassign the first set of virtual chunk spaces to a second data storage node of the distributed storage system; identify a first set of data objects of which fragments had been stored to the first set of virtual chunk spaces; and after reassignment of the first set of virtual chunk spaces to the second data storage node and after regeneration of a first set of fragments of the first set of data objects that were stored to the first set of virtual chunk spaces when assigned to the first data storage node, store the first set of fragments to the first set of virtual chunk spaces.

7. The non-transitory computer-readable storage medium of claim 6, wherein the instructions to reassign the first set of virtual chunk spaces to the second data storage node comprise instructions to: update the mappings in the first metadata to indicate that the first set of virtual chunk spaces is assigned to the second data storage node instead of the first data storage node.

8. The non-transitory computer-readable storage medium of claim 6, wherein the computer-executable instructions further comprise instructions to: identify a first set of erasure coding groups associated with the first set of virtual chunk spaces based on second metadata that indicate associations of erasure coding groups with virtual chunk spaces, wherein the first data storage node is assigned to no more than one of the first set of virtual chunk spaces from each of the first set of erasure coding groups, wherein the instructions to identify the first set of data objects of which fragments have been stored to the first set of virtual chunk spaces comprise instructions to identify the first set of data objects based on third metadata that indicate mappings of data objects to erasure coding groups.

9. The non-transitory computer-readable storage medium of claim 8, wherein the computer-executable instructions further comprise instructions to: based on a request from a client computer for retrieving a first data object of the first set of data objects from the distributed storage system, the request including a storage group identification (ID) of a first of the first set of erasure coding groups with which the first data object is associated: identify a second set of virtual chunk spaces associated with the first erasure coding group according to the second metadata and retrieve fragments of the first data object from the second set of virtual chunk spaces, wherein the second set of virtual chunk spaces includes only one of the first set of virtual chunk spaces; and return the first data object to the client computer after generation of the first data object from the retrieved fragments.

10. The non-transitory computer-readable storage medium of claim 6, wherein the instructions to identify the first set of data objects of which fragments have been stored to the first set of virtual chunk spaces comprise instructions to identify the first set of data objects based on third metadata that indicate mappings of virtual chunk spaces to data objects.

11. A system comprising: a processor; a non-transitory computer-readable medium comprising instructions executable by the processor to cause the system to, determine that a first data storage node of a distributed storage system having multiple data storage nodes has failed or is about to fail; based on a determination that the first data storage node has failed or is about to fail, identify a first set of virtual chunk spaces assigned to the first data storage node based on first metadata that indicate mappings between data storage nodes and virtual chunk spaces, wherein a plurality of virtual chunk spaces across the distributed storage system includes the first set of virtual chunk spaces; reassign the first se
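
The recovery steps recited in claims 1 and 2 can be sketched as follows. This is a hedged illustration, not the claimed implementation: the function names `xor_bytes` and `recover_node`, the dictionary layouts, and all IDs are hypothetical. Regeneration is assumed to use a single XOR parity fragment per group as a stand-in for a real erasure code, which works here only because, as in claim 3, a node holds at most one virtual chunk space from each erasure coding group.

```python
def xor_bytes(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte strings together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def recover_node(
    failed: str,
    replacement: str,
    vcs_to_node: dict[str, str],                  # first metadata: VCS ID -> node
    group_to_vcs: dict[str, list[str]],           # second metadata: group ID -> VCS IDs
    vcs_to_objects: dict[str, set[str]],          # third metadata: VCS ID -> object IDs
    fragments: dict[tuple[str, str], bytes],      # (VCS ID, object ID) -> fragment
) -> int:
    """Reassign the failed node's VCSs and regenerate their fragments.

    Returns the number of fragments restored.
    """
    # Identify the virtual chunk spaces assigned to the failed node.
    lost = [vcs for vcs, node in vcs_to_node.items() if node == failed]

    # Reassign them by updating only the VCS -> node mapping. The VCS IDs
    # stay the same, so group membership and object mappings are untouched.
    for vcs in lost:
        vcs_to_node[vcs] = replacement

    restored = 0
    for vcs in lost:
        # Find the group this VCS belongs to; skip VCSs not in any group.
        group = next((g for g, members in group_to_vcs.items() if vcs in members), None)
        if group is None:
            continue
        peers = [p for p in group_to_vcs[group] if p != vcs]
        # Identify affected objects from metadata instead of scanning storage.
        for obj in vcs_to_objects.get(vcs, ()):
            # With single XOR parity, a missing fragment is the XOR of the rest.
            fragments[(vcs, obj)] = xor_bytes([fragments[(p, obj)] for p in peers])
            restored += 1
    return restored
```

In this sketch the regenerated fragments are stored under the same VCS IDs they had before the failure, so a later read that resolves a storage group ID to its VCSs needs no changes beyond the updated VCS to node mapping.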

Assignees

NetApp Inc

Inventors

Classifications

  • involving logging of persistent data for recovery

  • Saving, restoring, recovering or retrying

  • Parity data used in redundant arrays of independent storages, e.g. in RAID systems

  • using recovery blocks

  • Errors handling and recovery, e.g. reprinting (G06F3/1261 takes precedence)

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
NetApp Inc
What technology area does this patent fall under?
Primary CPC classification G06F11/1076. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 20, 2018 (B2). Legal status and post-grant events are not shown on this page.