Method, electronic device and computer program product for managing data blocks

US11822803B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-11822803-B2
Application numberUS-202117308481-A
CountryUS
Kind codeB2
Filing dateMay 5, 2021
Priority dateDec 25, 2020
Publication dateNov 21, 2023
Grant dateNov 21, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for managing data blocks involve: generating, based on a first hash algorithm, a first fingerprint for a first block. The techniques further involve: if it is determined that there is a second fingerprint, in a fingerprint database, that is generated for a second block based on the first hash algorithm and matches the first fingerprint, determining whether there is a third fingerprint, in the fingerprint database, that is generated for the second block based on a second hash algorithm. The techniques further involve: if it is determined that the third fingerprint exists in the fingerprint database, generating a fourth fingerprint for the first block based on the second hash algorithm; and determining whether the first block and the second block are duplicate by comparing the third fingerprint and the fourth fingerprint. Such techniques can effectively reduce the overhead of identifying duplicate data blocks in data deduplication.

First claim

Opening claim text (preview).

The invention claimed is: 1. A method for managing data blocks, comprising: generating, based on a first hash algorithm, a first fingerprint for a first data block to be stored to a storage device; if it is determined that there is a second fingerprint, in a fingerprint database, that is generated for a second data block based on the first hash algorithm and matches the first fingerprint, determining whether there is a third fingerprint, in the fingerprint database, that is generated for the second data block based on a second hash algorithm, wherein the fingerprint database records fingerprints for data blocks stored in the storage device; if it is determined that the third fingerprint exists in the fingerprint database, generating a fourth fingerprint for the first data block based on the second hash algorithm; determining whether the first data block and the second data block are duplicate by comparing the third fingerprint and the fourth fingerprint; and prior to generating the first fingerprint, generating the third fingerprint based on the second hash algorithm and storing the third fingerprint in the fingerprint database; and wherein determining whether there is the third fingerprint includes querying the fingerprint database for the third fingerprint, wherein it is determined that the third fingerprint exists in the fingerprint database. 2. The method according to claim 1 , further comprising: generating, based on the first hash algorithm, a new fingerprint for a new data block to be stored to the storage device; determining that no fingerprint matching the new fingerprint exists in the fingerprint database; if it is determined that no fingerprint matching the new fingerprint exists in the fingerprint database, storing the new data block in the storage device; and storing the new fingerprint in the fingerprint database. 3. The method according to claim 1 , further comprising: generating, based on the first hash algorithm, a new fingerprint for a new data block to be stored to the storage device; determining that there is an existing fingerprint, in a fingerprint database, that is generated for a stored data block based on the first hash algorithm and matches the new fingerprint, determining that another fingerprint that is generated for the stored data block based on the second hash algorithm does not exist in the fingerprint database; if it is determined that the another fingerprint does not exist in the fingerprint database, acquiring the stored data block from the storage device; and determining whether the new data block and the stored data block are duplicate by comparing the new data block and the stored data block. 4. The method according to claim 3 , wherein determining whether the new data block and the stored data block are duplicate includes determining that the new data block and the stored data block are duplicate; wherein the method further comprises: if it is determined that the new data block and the stored data block are duplicate, generating the another fingerprint for the stored data block based on the second hash algorithm; and storing the another fingerprint in the fingerprint database. 5. The method according to claim 3 , wherein determining whether the new data block and the stored data block are duplicate includes determining that the new data block and the stored data block are not duplicate; wherein the method further comprises: if it is determined that the new data block and the stored data block are not duplicate, storing the new data block in the storage device; and storing the new fingerprint in the fingerprint database. 6. The method according to claim 1 , wherein determining whether the first data block and the second data block are duplicate includes: if the third fingerprint matches the fourth fingerprint, determining that the first data block and the second data block are duplicate; and if the third fingerprint does not match the fourth fingerprint, determining that the first data block and the second data block are not duplicate. 7. The method according to claim 6 , further comprising: if it is determined that the first data block and the second data block are not duplicate, storing the first data block in the storage device; and storing the first fingerprint and the fourth fingerprint in the fingerprint database. 8. The method according to claim 1 , wherein a collision probability of the second hash algorithm is lower than a collision probability of the first hash algorithm. 9. The method according to claim 1 , wherein the first hash algorithm is a Murmur3 hash algorithm, and the second hash algorithm is a SHA-1 hash algorithm. 10. An electronic device, comprising: at least one processing unit; and at least one memory that is coupled to the at least one processing unit and stores instructions for execution by the at least one processing unit, wherein the instructions, when executed by the at least one processing unit, cause the electronic device to perform actions including: generating, based on a first hash algorithm, a first fingerprint for a first data block to be stored to a storage device; if it is determined that there is a second fingerprint, in a fingerprint database, that is generated for a second data block based on the first hash algorithm and matches the first fingerprint, determining whether there is a third fingerprint, in the fingerprint database, that is generated for the second data block based on a second hash algorithm, wherein the fingerprint database records fingerprints for data blocks stored in the storage device; if it is determined that the third fingerprint exists in the fingerprint database, generating a fourth fingerprint for the first data block based on the second hash algorithm; determining whether the first data block and the second data block are duplicate by comparing the third fingerprint and the fourth fingerprint; and prior to generating the first fingerprint, generating the third fingerprint based on the second hash algorithm and storing the third fingerprint in the fingerprint database; and wherein determining whether there is the third fingerprint includes querying the fingerprint database for the third fingerprint, wherein it is determined that the third fingerprint exists in the fingerprint database. 11. The electronic device according to claim 10 , wherein the actions further include: generating, based on the first hash algorithm, a new fingerprint for a new data block to be stored to the storage device; determining that no fingerprint matching the new fingerprint exists in the fingerprint database; if it is determined that no fingerprint matching the new fingerprint exists in the fingerprint database, storing the new data block in the storage device; and storing the new fingerprint in the fingerprint database. 12. The electronic device according to claim 10 , wherein the actions further include: generating, based on the first hash algorithm, a new fingerprint for a new data block to be stored to the storage device; determining that there is an existing fingerprint, in a fingerprint database, that is generated for a stored data block based on the first hash algorithm and matches the new fingerprint, determining that another fingerprint that is generated for the stored data block based on the second hash algorithm does not exist in the fingerprint database; if it is determined that the another fingerprint does not exist in the fingerprint database, acquiring the stored data block from the storage device; and determining whether the new data block and the stored data block are duplicate by comparing the new data block and

Assignees

Inventors

Classifications

  • G06F3/0641Primary

    De-duplication techniques · CPC title

  • Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title

  • in relation to data integrity, e.g. data losses, bit errors · CPC title

  • Command handling arrangements, e.g. command buffers, queues, command scheduling · CPC title

  • Hash functions, e.g. MD5, SHA, HMAC or f9 MAC · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11822803B2 cover?
Techniques for managing data blocks involve: generating, based on a first hash algorithm, a first fingerprint for a first block. The techniques further involve: if it is determined that there is a second fingerprint, in a fingerprint database, that is generated for a second block based on the first hash algorithm and matches the first fingerprint, determining whether there is a third fingerprin…
Who is the assignee on this patent?
Emc Ip Holding Co Llc
What technology area does this patent fall under?
Primary CPC classification G06F3/0641. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Nov 21 2023 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).