Method and system for determining data integrity for garbage collection of data storage systems

US9367448B1 · US · B1

Patent metadata
FieldValue
Publication numberUS-9367448-B1
Application numberUS-201313909875-A
CountryUS
Kind codeB1
Filing dateJun 4, 2013
Priority dateJun 4, 2013
Publication dateJun 14, 2016
Grant dateJun 14, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A garbage collector of a storage system traverses a namespace of a file system of the storage system to verify data integrity of segments. The namespace identifies files that are represented by segments arranged in multiple levels in a hierarchy, where an upper level segment includes one or more references to one or more lower level segments, and at least one segment is referenced by multiple files. Traversing the namespace includes computing and verifying checksums all segments in a level-by-level manner, where checksums of an upper level are verified before any of checksums of a lower level are verified. Upon all checksums of all levels have been verified, a garbage collection process is performed on the segments stored in the storage system.

First claim

Opening claim text (preview).

What is claimed is: 1. A computer-implemented method of verifying data integrity for garbage collection, the method comprising: traversing, by a garbage collector executed by a processor, a namespace of a file system of a storage system to verify data integrity of segments, the namespace identifying a plurality of files that are represented by a plurality of segments arranged in a plurality of levels in a hierarchy, wherein an upper level segment includes one or more references to one or more lower level segments, wherein at least one segment is referenced by multiple files, wherein traversing the namespace includes verifying data integrity for all segments in a level-by-level manner comprising: verifying data integrity from checksums for an upper level segment, and verifying, in response to data integrity verification of the upper level segment, data integrity of a lower level segment from a lower level segment checksum, for each of the levels in the hierarchy, iteratively performing the following: obtaining fingerprints of all segments of a current level, computing checksums from the fingerprints of all segments of the current level, and adding the checksums of the current level to a parent checksum of the current level, computing checksums from the fingerprints of the current level that are read from the storage device, adding the checksums of the current level to a child checksum of the current level, for each of the fingerprints of the current level, marking a corresponding bit of a walk vector to indicate that a corresponding segment has been processed, wherein the walk vector includes a plurality of bits, each bit corresponding to one of the segments in the namespace, for each of the segments of the current level, retrieving a fingerprint of the segment from its storage location of a storage device, and marking a corresponding bit of a read vector to indicate that the segment has been read from its storage location; and upon verifying data integrity for the plurality of levels, performing a garbage collection process to reclaim storage from segments not referenced by a file in the storage system. 2. The method of claim 1 , further comprising: retrieving a data portion of each segment of the current level from the storage device; identifying one or more child segments of a child level with respect to the current level; and performing checksum verification on the child level after all segments of the current level have been processed. 3. The method of claim 2 , further comprising: for each checksum added to a parent checksum of each level, incrementing a parent counter associated with the corresponding level; and for each checksum added to a child checksum of each level, incrementing a child counter associated with the corresponding level. 4. The method of claim 3 , further comprising: after all bits of the walk vector have been populated, comparing a parent checksum with a child checksum of each level to determine if they match; comparing a parent counter and a child counter of each level to determine if they match; and performing the garbage collection process if all parent checksums and child checksums match and all parent counters and child counters match. 5. A non-transitory machine-readable medium having instructions stored therein, which when executed by a processor, cause the processor to perform operations of verifying data integrity for garbage collection, the operations comprising: traversing, by a garbage collector executed by a processor, a namespace of a file system of a storage system to verify data integrity of segments, the namespace identifying a plurality of files that are represented by a plurality of segments arranged in a plurality of levels in a hierarchy, wherein an upper level segment includes one or more references to one or more lower level segments, wherein at least one segment is referenced by multiple files, wherein traversing the namespace includes verifying data integrity for all segments in a level-by-level manner comprising: verifying data integrity from checksums for an upper level segment, and verifying, in response to data integrity verification of the upper level segment, data integrity of a lower level segment from a lower level segment checksum, for each of the levels in the hierarchy, iteratively performing the following: obtaining fingerprints of all segments of a current level, computing checksums from the fingerprints of all segments of the current level, and adding the checksums of the current level to a parent checksum of the current level, computing checksums from the fingerprints of the current level that are read from the storage device, adding the checksums of the current level to a child checksum of the current level, for each of the fingerprints of the current level, marking a corresponding bit of a walk vector to indicate that a corresponding segment has been processed, wherein the walk vector includes a plurality of bits, each bit corresponding to one of the segments in the namespace, for each of the segments of the current level, retrieving a fingerprint of the segment from its storage location of a storage device, and marking a corresponding bit of a read vector to indicate that the segment has been read from its storage location; and upon verifying data integrity for the plurality of levels, performing a garbage collection process to reclaim storage from segments not referenced by a file in the storage system. 6. The medium of claim 5 , wherein the operations further comprise: retrieving a data portion of each segment of the current level from the storage device; identifying one or more child segments of a child level with respect to the current level; and performing checksum verification on the child level after all segments of the current level have been processed. 7. The medium of claim 6 , wherein the operations further comprise: for each checksum added to a parent checksum of each level, incrementing a parent counter associated with the corresponding level; and for each checksum added to a child checksum of each level, incrementing a child counter associated with the corresponding level. 8. The medium of claim 7 , wherein the operations further comprise: after all bits of the walk vector have been populated, comparing a parent checksum with a child checksum of each level to determine if they match; comparing a parent counter and a child counter of each level to determine if they match; and performing the garbage collection process if all parent checksums and child checksums match and all parent counters and child counters match. 9. A data storage system, comprising: a processor; and a memory coupled to the processor for storing instructions which when executed from the memory, cause the processor to perform operations, the operations including traversing, by a garbage collector executed by a processor, a namespace of a file system of a storage system to verify data integrity of segments, the namespace identifying a plurality of files that are represented by a plurality of segments arranged in a plurality of levels in a hierarchy, wherein an upper level segment includes one or more references to one or more lower level segments, wherein at least one segment is referenced by multiple files, wherein traversing the namespace includes verifying data integrity for all segments in a level-by-level manner comprising: verifying data integrity from checksums for an upper level segment, and verifying, in response to data integrity verification of the upper level segment, data integrity of a lower level segment from a lower level segment checksum, for each of the levels in the hierarchy, iteratively performing the fol

Assignees

Inventors

Classifications

  • G06F3/0608Primary

    Saving storage space on storage systems · CPC title

  • Garbage collection, i.e. reclamation of unreferenced memory · CPC title

  • Erasing, e.g. deleting, data cleaning, moving of data to a wastebasket · CPC title

  • Disk arrays, e.g. RAID, JBOD · CPC title

  • De-duplication techniques · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9367448B1 cover?
A garbage collector of a storage system traverses a namespace of a file system of the storage system to verify data integrity of segments. The namespace identifies files that are represented by segments arranged in multiple levels in a hierarchy, where an upper level segment includes one or more references to one or more lower level segments, and at least one segment is referenced by multiple f…
Who is the assignee on this patent?
Emc Corp
What technology area does this patent fall under?
Primary CPC classification G06F3/0608. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Jun 14 2016 00:00:00 GMT+0000 (Coordinated Universal Time) (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).