Indirect block containing references to blocks of a persistent fingerprint index

US11468030B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-11468030-B2
Application numberUS-201916669993-A
CountryUS
Kind codeB2
Filing dateOct 31, 2019
Priority dateOct 31, 2019
Publication dateOct 11, 2022
Grant dateOct 11, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In some examples, a system performs data deduplication using a deduplication fingerprint index in a hash data structure comprising a plurality of blocks, wherein the hash data structure is stored in persistent storage, and a block of the plurality of blocks comprises fingerprints computed based on content of respective data units. The system uses an indirect block in a memory to access a given block of the plurality of blocks in the hash data structure, the indirect block containing references to blocks of the hash data structure containing the deduplication fingerprint index, and the references indicating storage locations of the plurality of blocks in the persistent storage.

First claim

Opening claim text (preview).

What is claimed is: 1. A non-transitory machine-readable storage medium comprising instructions that upon execution cause a system to: perform data deduplication using a deduplication fingerprint index in a hash data structure comprising a plurality of blocks, wherein the hash data structure is stored in persistent storage, and a block of the plurality of blocks comprises fingerprints computed based on content of respective data units, wherein a first block of the plurality of blocks in the hash data structure comprises a plurality of buckets, and wherein each bucket of the plurality of buckets associates a respective fingerprint to a storage location indicator of a corresponding data unit, the respective fingerprint computed based on the corresponding data unit; and use an indirect block in a memory to access a given block of the plurality of blocks in the hash data structure, the indirect block containing references to blocks of the hash data structure containing the deduplication fingerprint index, and the references identifying storage locations of the plurality of blocks in the persistent storage. 2. The non-transitory machine-readable storage medium of claim 1 , wherein the indirect block does not include any of the fingerprints computed based on the content of the respective data units. 3. The non-transitory machine-readable storage medium of claim 1 , wherein the indirect block does not include size information regarding sizes of the blocks of the hash data structure. 4. The non-transitory machine-readable storage medium of claim 1 , wherein the hash data structure is a log structured hash table, and wherein the instructions upon execution cause the system to: for updating the log structured hash table, append new blocks containing fingerprints to the log structured hash table. 5. The non-transitory machine-readable storage medium of claim 4 , wherein the log structured hash table comprises segments, each segment of the segments comprising multiple blocks, and wherein the appending comprises appending a new segment comprising the new blocks to a boundary of the log structured hash table. 6. The non-transitory machine-readable storage medium of claim 1 , wherein a bucket of the plurality of buckets associates a plurality of fingerprints to respective storage location indicators of corresponding data units. 7. The non-transitory machine-readable storage medium of claim 1 , wherein the hash data structure comprises header information indicating a number of fingerprints included in each bucket of the plurality of buckets. 8. The non-transitory machine-readable storage medium of claim 1 , wherein the given block of the plurality of blocks in the hash data structure is accessible using the indirect block without performing a binary search of the deduplication fingerprint index. 9. The non-transitory machine-readable storage medium of claim 1 , wherein the instructions upon execution cause the system to: for a data unit, compute a corresponding fingerprint based on the data unit; compute a respective block index based on the corresponding fingerprint, the respective block index referring to a corresponding block of the deduplication fingerprint index in the hash data structure; use the respective block index to look up a given reference, from among the references, in the indirect block; and determine, from the given reference, the storage location of the given corresponding block. 10. The non-transitory machine-readable storage medium of claim 1 , wherein the plurality of blocks in the hash data structure are variable sized blocks. 11. The non-transitory machine-readable storage medium of claim 1 , wherein the references include addresses of the plurality of blocks in the persistent storage. 12. The non-transitory machine-readable storage medium of claim 1 , wherein the hash data structure in the persistent storage contains a persistent version of the indirect block. 13. A system comprising: persistent storage to store a hash data structure comprising a deduplication fingerprint index, the hash data structure comprising a plurality of blocks, and a block of the plurality of blocks comprises fingerprints computed based on content of respective data units, the fingerprints included in the deduplication fingerprint index, wherein a first block of the plurality of blocks in the hash data structure comprises a plurality of buckets, and wherein each bucket of the plurality of buckets associates a respective fingerprint to a storage location indicator of a corresponding data unit, the respective fingerprint computed based on the corresponding data unit; a memory to store an indirect block containing block references indicating storage locations of blocks of the hash data structure containing the deduplication fingerprint index; and a processor to execute instructions stored on a machine-readable storage medium to: compute block indexes based on respective fingerprints for the deduplication fingerprint index, wherein different block indexes map to different block references in the indirect block, and use the computed block indexes to look up respective block references in the indirect block. 14. The system of claim 13 , wherein the processor is to execute instructions stored on the machine-readable storage medium to access a given block of the plurality of blocks in the hash data structure using a block reference retrieved from the indirect block, and wherein the block references of the indirect block identify storage locations of the plurality of blocks in the persistent storage. 15. The system of claim 13 , wherein the processor is to execute instructions stored on the machine-readable storage medium to: receive an incoming data unit; compute a fingerprint based on the incoming data unit; based on the computed fingerprint, access an entry of the indirect block and retrieve a respective block reference from the accessed entry; and as part of a deduplication process for the incoming data unit, determine if the computed fingerprint is included in a block, of the deduplication fingerprint index, referenced by the respective block reference. 16. The system of claim 15 , wherein the block referenced by the respective block reference includes fingerprints in sorted order. 17. The system of claim 13 , wherein the hash data structure is a log structured hash table, and wherein the processor is to execute instructions stored on the machine-readable storage medium to: for updating the log structured hash table, append new blocks containing fingerprints to a boundary of the log structured hash table. 18. A method performed by a system comprising a hardware processor, comprising: performing data deduplication using a deduplication fingerprint index in a hash data structure comprising a plurality of blocks, wherein the hash data structure is stored in persistent storage, and a block of the plurality of blocks comprises fingerprints computed based on content of respective data units, the fingerprints arranged in sorted order in the block, wherein a first block of the plurality of blocks in the hash data structure comprises a plurality of buckets, and wherein each bucket of the plurality of buckets associates a respective fingerprint to a storage location indicator of a corresponding data unit, the respective fingerprint computed based on the corresponding data unit; storing an indirect block in a memory, the indirect block containing an array of references to blocks of the hash data structure containing the deduplication fingerprint index, the references ident

Assignees

Inventors

Classifications

  • De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641) · CPC title

  • Hash tables · CPC title

  • Pointer or reference processing operations · CPC title

  • Trees, e.g. B+trees · CPC title

  • Management thereof · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11468030B2 cover?
In some examples, a system performs data deduplication using a deduplication fingerprint index in a hash data structure comprising a plurality of blocks, wherein the hash data structure is stored in persistent storage, and a block of the plurality of blocks comprises fingerprints computed based on content of respective data units. The system uses an indirect block in a memory to access a given …
Who is the assignee on this patent?
Hewlett Packard Entpr Dev Lp
What technology area does this patent fall under?
Primary CPC classification G06F16/1748. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Oct 11 2022 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).