Physical file verification

US10810162B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10810162-B2
Application number: US-201816034282-A
Country: US
Kind code: B2
Filing date: Jul 12, 2018
Priority date: Jul 12, 2018
Publication date: Oct 20, 2020
Grant date: Oct 20, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A perfect hash vector (PHVEC) is created to track segments in a deduplication file system. Files are represented by segment trees having hierarchical segment levels. Containers store the segments and fingerprints of segments. Upper-level segments are traversed to identify a first set of fingerprints of each level. These fingerprints correspond to segments that should be present. The first set of fingerprints are hashed and bits are set in the PHVEC corresponding to positions from the hashing. The containers are read to identify a second set of fingerprints. These fingerprints correspond to segments that are present. The second set of fingerprints are hashed and bits are cleared in the PHVEC corresponding to positions from the hashing. If a bit was set and not cleared, a determination is that there is at least one segment missing. If all bits set were also cleared, a determination is that no segments are missing.
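The set-then-clear check described in the abstract can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the names `segments_missing` and `make_toy_position` are hypothetical, and the position function is a plain hash used as a stand-in. A real perfect hash function would be built from the known fingerprints so that they map to distinct bit positions, which this stand-in does not guarantee.

```python
import hashlib
from typing import Callable, Iterable


def make_toy_position(nbits: int) -> Callable[[bytes], int]:
    """Hypothetical stand-in for the perfect hash function.

    Assumption: a plain hash reduced modulo the vector size. Unlike a true
    perfect hash, it can map two known fingerprints to the same position.
    """
    def position(fingerprint: bytes) -> int:
        digest = hashlib.blake2b(fingerprint, digest_size=8).digest()
        return int.from_bytes(digest, "big") % nbits
    return position


def segments_missing(
    expected: Iterable[bytes],          # fingerprints found by walking the segment trees
    present: Iterable[bytes],           # fingerprints read from the containers
    position: Callable[[bytes], int],   # fingerprint -> bit position
    nbits: int,
) -> bool:
    """Return True if some bit was set and never cleared."""
    bits = bytearray((nbits + 7) // 8)

    # Pass 1: set a bit for every segment that should be present.
    for fp in expected:
        p = position(fp)
        bits[p >> 3] |= 1 << (p & 7)

    # Pass 2: clear the bit for every segment that is actually present.
    for fp in present:
        p = position(fp)
        bits[p >> 3] &= ~(1 << (p & 7)) & 0xFF

    # Any surviving bit means at least one expected segment was not found.
    return any(bits)
```

A caller would build `position = make_toy_position(nbits)` once and pass the same function to both passes. Note that the vector holds only bits, never the fingerprints themselves, so the result says whether something is missing but not which segment.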

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: a processor; and memory configured to store one or more sequences of instructions which, when executed by the processor, cause the processor to carry out the steps of: creating a perfect hash vector to track segments in a deduplication file system comprising files, segment trees, and containers, the files being represented by the segment trees, the segment trees having multiple segment levels arranged in a hierarchy, and the containers storing the segments, and fingerprints corresponding to the segments; traversing upper-level segments to identify a first set of fingerprints of each level of the segment trees, the first set of fingerprints corresponding to segments that should be present; hashing the first set of fingerprints; setting bits in the perfect hash vector corresponding to positions calculated from hashing the first set of fingerprints; reading the containers to identify a second set of fingerprints stored in the containers, the second set of fingerprints corresponding to segments that are present; hashing the second set of fingerprints; clearing bits in the perfect hash vector corresponding to positions calculated from hashing the second set of fingerprints; reviewing the perfect hash vector to determine whether there are any bits in the perfect hash vector that were set and not cleared; if a bit was set and not cleared, determining that at least one segment is missing from the deduplication file system; and if all bits set were also cleared, determining that no segments are missing from the deduplication file system.

2. The system of claim 1 wherein the processor further carries out the steps of: retrieving the fingerprints from an index mapping the fingerprints to the containers in which the segments are stored, each fingerprint being listed in the index having a corresponding segment stored in a container; and seeding a perfect hash function with the fingerprints retrieved from the index to create the perfect hash vector.

3. The system of claim 1 wherein the processor further carries out the steps of: retrieving the fingerprints from an index mapping the fingerprints to the containers in which the segments are stored, each fingerprint being listed in the index having a corresponding segment stored in a container; seeding a perfect hash function with the fingerprints retrieved from the index to create the perfect hash vector; and sizing the perfect hash vector with a number of bits that is substantially greater than a number of fingerprints in the index to decrease a probability of a collision in the perfect hash vector between a fingerprint of a segment that is present and a fingerprint of a segment that is not present but should be present.

4. The system of claim 1 wherein the processor further carries out the steps of: requesting that all available memory be allocated to the perfect hash vector, the perfect hash vector thereby comprising a number of bits that is substantially greater than a count of the fingerprints.

5. The system of claim 1 wherein the processor further carries out the steps of: reading containers storing segments at upper levels of the segment trees; and based on the reading of containers storing segments at the upper levels, identifying fingerprints of the segments at the upper levels, and fingerprints of segments at a lowest level of the segment tree that are referenced by segments at a last upper level of the segment trees.

6. The system of claim 1 wherein the perfect hash vector does not store the fingerprints.

7. A method comprising: creating a perfect hash vector to track segments in a deduplication file system comprising files, segment trees, and containers, the files being represented by the segment trees, the segment trees having multiple segment levels arranged in a hierarchy, and the containers storing the segments, and fingerprints corresponding to the segments; traversing upper-level segments to identify a first set of fingerprints of each level of the segment trees, the first set of fingerprints corresponding to segments that should be present; hashing the first set of fingerprints; setting bits in the perfect hash vector corresponding to positions calculated from hashing the first set of fingerprints; reading the containers to identify a second set of fingerprints stored in the containers, the second set of fingerprints corresponding to segments that are present; hashing the second set of fingerprints; clearing bits in the perfect hash vector corresponding to positions calculated from hashing the second set of fingerprints; reviewing the perfect hash vector to determine whether there are any bits in the perfect hash vector that were set and not cleared; if a bit was set and not cleared, determining that at least one segment is missing from the deduplication file system; and if all bits set were also cleared, determining that no segments are missing from the deduplication file system.

8. The method of claim 7 comprising: retrieving the fingerprints from an index mapping the fingerprints to the containers in which the segments are stored, each fingerprint being listed in the index having a corresponding segment stored in a container; seeding a perfect hash function with the fingerprints retrieved from the index to create the perfect hash vector; and sizing the perfect hash vector with a number of bits that is substantially greater than a number of fingerprints in the index, wherein the sizing increases a probability that a fingerprint of a missing segment will map to a bit position in the perfect hash vector that is not also mapped to by a fingerprint of a segment that is present.

9. The method of claim 7 comprising: retrieving the fingerprints from an index mapping the fingerprints to the containers in which the segments are stored, each fingerprint being listed in the index having a corresponding segment stored in a container; seeding a perfect hash function with the fingerprints retrieved from the index to create the perfect hash vector; and sizing the perfect hash vector with a number of bits that is substantially greater than a number of fingerprints in the index to decrease a probability of a collision in the perfect hash vector between a fingerprint of a segment that is present and a fingerprint of a segment that is not present but should be present.

10. The method of claim 7 comprising: requesting that all available memory be allocated to the perfect hash vector, the perfect hash vector thereby comprising a number of bits that is substantially greater than a count of the fingerprints.

11. The method of claim 7 comprising: reading containers storing segments at upper levels of the segment trees; and based on the reading of containers storing segments at the upper levels, identifying fingerprints of the segments at the upper levels, and fingerprints of segments at a lowest level of the segment tree that are referenced by segments at a last upper level of the segment trees.

12. The method of claim 7 wherein the perfect hash vector does not store the fingerprints.

13. A computer program product, comprising a non-transitory computer-readable medium having a computer-readable program code embodied therein, the computer-readable program code adapted to be executed by one or more processors to implement a method comprising: creating a perfect hash vector to track segments in a deduplication file system comprising files, segment trees, and containers, the files being represented by the segment trees, the segment trees having multiple segment levels arranged in a hierarchy, and the containers storing the seg
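Claims 3, 8, and 9 size the vector with substantially more bits than there are fingerprints in the index, so that a missing segment's fingerprint is less likely to share a bit position with a present one. The sketch below gives a rough feel for that trade-off under a simplifying model that is an assumption here, not something stated in the patent: indexed fingerprints occupy distinct positions, and each missing fingerprint lands on a uniformly random position, independently of the others. The function name `masked_probability` is hypothetical.

```python
def masked_probability(num_index_fps: int, nbits: int, num_missing: int = 1) -> float:
    """Estimate the chance that every missing segment goes undetected.

    Assumed model: num_index_fps present fingerprints occupy distinct bits
    out of nbits, and each missing fingerprint hashes to a uniformly random
    bit. A missing segment is masked when its bit is one that a present
    fingerprint clears, which happens with probability num_index_fps / nbits.
    """
    if nbits <= 0 or not 0 <= num_index_fps <= nbits:
        raise ValueError("need 0 <= num_index_fps <= nbits and nbits > 0")
    if num_missing < 1:
        raise ValueError("num_missing must be at least 1")
    per_segment = num_index_fps / nbits
    # All missing segments must be masked for the check to report nothing.
    return per_segment ** num_missing
```

Under this model, a vector with four times as many bits as indexed fingerprints masks a single missing segment a quarter of the time, and doubling the vector halves that figure.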

Assignees

Inventors

Classifications

  • Hash-based (content-based indexing of textual data G06F16/31)

  • De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641)

  • Trees

  • Memory management, e.g. access or allocation

  • Hypervisor-specific management and integration aspects

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10810162B2 cover?
A perfect hash vector (PHVEC) is created to track segments in a deduplication file system. Files are represented by segment trees having hierarchical segment levels. Containers store the segments and fingerprints of segments. Upper-level segments are traversed to identify a first set of fingerprints of each level. These fingerprints correspond to segments that should be present. The first set o…
Who is the assignee on this patent?
EMC IP Holding Co LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/1748. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 20, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).