Optimized record lookups
US-11010300-B2 · May 18, 2021 · US
US11822803B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11822803-B2 |
| Application number | US-202117308481-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 5, 2021 |
| Priority date | Dec 25, 2020 |
| Publication date | Nov 21, 2023 |
| Grant date | Nov 21, 2023 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
Techniques for managing data blocks involve: generating, based on a first hash algorithm, a first fingerprint for a first block. The techniques further involve: if it is determined that there is a second fingerprint, in a fingerprint database, that is generated for a second block based on the first hash algorithm and matches the first fingerprint, determining whether there is a third fingerprint, in the fingerprint database, that is generated for the second block based on a second hash algorithm. The techniques further involve: if it is determined that the third fingerprint exists in the fingerprint database, generating a fourth fingerprint for the first block based on the second hash algorithm; and determining whether the first block and the second block are duplicate by comparing the third fingerprint and the fourth fingerprint. Such techniques can effectively reduce the overhead of identifying duplicate data blocks in data deduplication.
Opening claim text (preview).
The invention claimed is: 1. A method for managing data blocks, comprising: generating, based on a first hash algorithm, a first fingerprint for a first data block to be stored to a storage device; if it is determined that there is a second fingerprint, in a fingerprint database, that is generated for a second data block based on the first hash algorithm and matches the first fingerprint, determining whether there is a third fingerprint, in the fingerprint database, that is generated for the second data block based on a second hash algorithm, wherein the fingerprint database records fingerprints for data blocks stored in the storage device; if it is determined that the third fingerprint exists in the fingerprint database, generating a fourth fingerprint for the first data block based on the second hash algorithm; determining whether the first data block and the second data block are duplicate by comparing the third fingerprint and the fourth fingerprint; and prior to generating the first fingerprint, generating the third fingerprint based on the second hash algorithm and storing the third fingerprint in the fingerprint database; and wherein determining whether there is the third fingerprint includes querying the fingerprint database for the third fingerprint, wherein it is determined that the third fingerprint exists in the fingerprint database. 2. The method according to claim 1 , further comprising: generating, based on the first hash algorithm, a new fingerprint for a new data block to be stored to the storage device; determining that no fingerprint matching the new fingerprint exists in the fingerprint database; if it is determined that no fingerprint matching the new fingerprint exists in the fingerprint database, storing the new data block in the storage device; and storing the new fingerprint in the fingerprint database. 3. The method according to claim 1 , further comprising: generating, based on the first hash algorithm, a new fingerprint for a new data block to be stored to the storage device; determining that there is an existing fingerprint, in a fingerprint database, that is generated for a stored data block based on the first hash algorithm and matches the new fingerprint, determining that another fingerprint that is generated for the stored data block based on the second hash algorithm does not exist in the fingerprint database; if it is determined that the another fingerprint does not exist in the fingerprint database, acquiring the stored data block from the storage device; and determining whether the new data block and the stored data block are duplicate by comparing the new data block and the stored data block. 4. The method according to claim 3 , wherein determining whether the new data block and the stored data block are duplicate includes determining that the new data block and the stored data block are duplicate; wherein the method further comprises: if it is determined that the new data block and the stored data block are duplicate, generating the another fingerprint for the stored data block based on the second hash algorithm; and storing the another fingerprint in the fingerprint database. 5. The method according to claim 3 , wherein determining whether the new data block and the stored data block are duplicate includes determining that the new data block and the stored data block are not duplicate; wherein the method further comprises: if it is determined that the new data block and the stored data block are not duplicate, storing the new data block in the storage device; and storing the new fingerprint in the fingerprint database. 6. The method according to claim 1 , wherein determining whether the first data block and the second data block are duplicate includes: if the third fingerprint matches the fourth fingerprint, determining that the first data block and the second data block are duplicate; and if the third fingerprint does not match the fourth fingerprint, determining that the first data block and the second data block are not duplicate. 7. The method according to claim 6 , further comprising: if it is determined that the first data block and the second data block are not duplicate, storing the first data block in the storage device; and storing the first fingerprint and the fourth fingerprint in the fingerprint database. 8. The method according to claim 1 , wherein a collision probability of the second hash algorithm is lower than a collision probability of the first hash algorithm. 9. The method according to claim 1 , wherein the first hash algorithm is a Murmur3 hash algorithm, and the second hash algorithm is a SHA-1 hash algorithm. 10. An electronic device, comprising: at least one processing unit; and at least one memory that is coupled to the at least one processing unit and stores instructions for execution by the at least one processing unit, wherein the instructions, when executed by the at least one processing unit, cause the electronic device to perform actions including: generating, based on a first hash algorithm, a first fingerprint for a first data block to be stored to a storage device; if it is determined that there is a second fingerprint, in a fingerprint database, that is generated for a second data block based on the first hash algorithm and matches the first fingerprint, determining whether there is a third fingerprint, in the fingerprint database, that is generated for the second data block based on a second hash algorithm, wherein the fingerprint database records fingerprints for data blocks stored in the storage device; if it is determined that the third fingerprint exists in the fingerprint database, generating a fourth fingerprint for the first data block based on the second hash algorithm; determining whether the first data block and the second data block are duplicate by comparing the third fingerprint and the fourth fingerprint; and prior to generating the first fingerprint, generating the third fingerprint based on the second hash algorithm and storing the third fingerprint in the fingerprint database; and wherein determining whether there is the third fingerprint includes querying the fingerprint database for the third fingerprint, wherein it is determined that the third fingerprint exists in the fingerprint database. 11. The electronic device according to claim 10 , wherein the actions further include: generating, based on the first hash algorithm, a new fingerprint for a new data block to be stored to the storage device; determining that no fingerprint matching the new fingerprint exists in the fingerprint database; if it is determined that no fingerprint matching the new fingerprint exists in the fingerprint database, storing the new data block in the storage device; and storing the new fingerprint in the fingerprint database. 12. The electronic device according to claim 10 , wherein the actions further include: generating, based on the first hash algorithm, a new fingerprint for a new data block to be stored to the storage device; determining that there is an existing fingerprint, in a fingerprint database, that is generated for a stored data block based on the first hash algorithm and matches the new fingerprint, determining that another fingerprint that is generated for the stored data block based on the second hash algorithm does not exist in the fingerprint database; if it is determined that the another fingerprint does not exist in the fingerprint database, acquiring the stored data block from the storage device; and determining whether the new data block and the stored data block are duplicate by comparing the new data block and
De-duplication techniques · CPC title
Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title
in relation to data integrity, e.g. data losses, bit errors · CPC title
Command handling arrangements, e.g. command buffers, queues, command scheduling · CPC title
Hash functions, e.g. MD5, SHA, HMAC or f9 MAC · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.