Method, device, and computer program product for de-duplicating data

US12293102B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12293102-B2
Application number: US-202318215414-A
Country: US
Kind code: B2
Filing date: Jun 28, 2023
Priority date: Dec 22, 2022
Publication date: May 6, 2025
Grant date: May 6, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for de-duplicating data involve: determining a target physical block in a first storage device. The techniques further involve: determining a compression ratio of a target data block in a plurality of data blocks to be transferred. The techniques further involve: determining a target hash value of the target data block in response to the compression ratio being lower than a threshold compression ratio. The techniques further involve: determining a de-duplication operation for the target data block based on the target hash value and a de-duplication hash table, the de-duplication hash table storing hash values of data blocks that have been transferred from the first storage device to the second storage device. Accordingly, the amount of data that needs to be transferred can be reduced, and the storage space of the storage devices can be improved, thus increasing the resource utilization and improving the user experience.
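
The per-block decision described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the function names, the numeric threshold, the use of zlib to estimate a compression ratio (defined here as original size divided by compressed size), and SHA-256 as the hash are all assumptions that do not appear in the patent text.

```python
import hashlib
import zlib

# Hypothetical value: the patent text does not state a numeric threshold.
THRESHOLD_COMPRESSION_RATIO = 2.0


def compression_ratio(block: bytes) -> float:
    """Original size divided by zlib-compressed size (an assumed definition)."""
    return len(block) / len(zlib.compress(block))


def plan_transfer(blocks, dedup_table, threshold=THRESHOLD_COMPRESSION_RATIO):
    """Decide, per block, whether to transfer it or de-duplicate it.

    dedup_table is a set of hashes of blocks already transferred to the
    second storage device. It is updated in place.
    """
    decisions = []
    for index, block in enumerate(blocks):
        if compression_ratio(block) >= threshold:
            # Ratio at or above the threshold: transfer without hashing.
            decisions.append(("transfer", index))
            continue
        # Ratio below the threshold: hash the block and consult the table.
        digest = hashlib.sha256(block).hexdigest()
        if digest in dedup_table:
            decisions.append(("dedup", index))
        else:
            dedup_table.add(digest)
            decisions.append(("transfer", index))
    return decisions
```

In this sketch, a block that compresses well skips hashing entirely, so the hash table is consulted only for blocks whose ratio falls below the threshold.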

First claim

Opening claim text (preview).

The invention claimed is:

1. A method for de-duplicating data, comprising: determining a target physical block in a first storage device, a plurality of data blocks in the target physical block being to be transferred to a second storage device; determining a compression ratio of a target data block in the plurality of data blocks; determining a target hash value of the target data block in response to the compression ratio being lower than a threshold compression ratio; and determining a de-duplication operation for the target data block based on the target hash value and a de-duplication hash table, the de-duplication hash table storing hash values of data blocks that have been transferred from the first storage device to the second storage device; wherein determining the de-duplication operation comprises: determining from a plurality of logically contiguous data blocks a group of logically contiguous data blocks starting from the target data block, hash values of the group of logically contiguous data blocks hitting data blocks in the de-duplication hash table that are located in contiguous physical space; determining whether the number of data blocks in the group of logically contiguous data blocks exceeds a threshold number; and de-duplicating the group of logically contiguous data blocks in response to the number exceeding the threshold number.

2. The method according to claim 1, wherein determining the target physical block comprises: determining a heat of a candidate physical block in the first storage device, the heat indicating how frequently the candidate physical block is accessed; and determining the candidate physical block as the target physical block in response to the heat being less than a threshold heat.

3. The method according to claim 1, wherein determining the target physical block comprises: determining a storage density of a candidate physical block in the first storage device, the storage density indicating the extent to which the candidate physical block is utilized; and determining the candidate physical block as the target physical block in response to the storage density being lower than a threshold storage density.

4. The method according to claim 1, wherein determining the compression ratio comprises: acquiring metadata of the target data block; and determining the compression ratio based on the metadata.

5. The method according to claim 1, wherein determining the de-duplication operation comprises: determining whether the target hash value exists in the de-duplication hash table; and de-duplicating the target data block in response to the target hash value existing in the de-duplication hash table.

6. The method according to claim 5, wherein determining the de-duplication operation further comprises: transferring the target data block to the second storage device in response to the hash value not existing in the de-duplication hash table; and adding the target hash value to the de-duplication hash table.

7. The method according to claim 1, wherein determining the de-duplication operation further comprises: determining the plurality of logically contiguous data blocks based on the target data block, the plurality of logically contiguous data blocks taking the target data block as a starting data block; and determining hash values of other data blocks following the target data block in the plurality of logically contiguous data blocks.

8. The method according to claim 7, wherein determining the de-duplication operation further comprises: determining, in response to the number not exceeding the threshold number, whether the target hash value exists in the de-duplication hash table; and de-duplicating the target data block in response to the target hash value existing in the de-duplication hash table.

9. The method according to claim 1, wherein the threshold compression ratio is a first threshold compression ratio, and the method further comprises: transferring the target data block to the second storage device in response to the compression ratio being greater than or equal to the first threshold compression ratio.

10. The method according to claim 9, wherein transferring the target data block to the second storage device comprises: generating a group of data blocks comprising the target data block; compressing the group of data blocks as a whole to determine a group compression ratio; determining whether the group compression ratio is greater than a second threshold compression ratio; and transferring the compressed group of data blocks to the second storage device in response to the group compression ratio being greater than the second threshold compression ratio.

11. The method according to claim 10, further comprising: decompressing the compressed group of data blocks in response to the group compression ratio being less than or equal to the second threshold compression ratio; compressing the target data block in the decompressed group of data blocks alone; and transferring the compressed target data block to the second storage device.

12. The method according to claim 1, wherein the first storage device has a shorter device access time than the second storage device.

13. An electronic device, comprising: at least one processor; and a memory coupled to the at least one processor and having instructions stored therein, wherein the instructions, when executed by the at least one processor, cause the device to perform actions comprising: determining a target physical block in a first storage device, a plurality of data blocks in the target physical block being to be transferred to a second storage device; determining a compression ratio of a target data block in the plurality of data blocks; determining a target hash value of the target data block in response to the compression ratio being lower than a threshold compression ratio; and determining a de-duplication operation for the target data block based on the target hash value and a de-duplication hash table, the de-duplication hash table storing hash values of data blocks that have been transferred from the first storage device to the second storage device; wherein determining the de-duplication operation comprises: determining from a plurality of logically contiguous data blocks a group of logically contiguous data blocks starting from the target data block, hash values of the group of logically contiguous data blocks hitting data blocks in the de-duplication hash table that are located in contiguous physical space; determining whether the number of data blocks in the group of logically contiguous data blocks exceeds a threshold number; and de-duplicating the group of logically contiguous data blocks in response to the number exceeding the threshold number.

14. The electronic device according to claim 13, wherein determining the target physical block comprises: determining a heat of a candidate physical block in the first storage device, the heat indicating how frequently the candidate physical block is accessed; and determining the candidate physical block as the target physical block in response to the heat being less than a threshold heat.

15. The electronic device according to claim 13, wherein determining the target physical block comprises: determining a storage density of a candidate physical block in the first storage device, the storage density indicating the extent to which the candidate physical block is utilized; and determining the candidate physical block as the target physical block in response to the storage density being lower than a
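
The group-level step in claim 1, together with the single-block fallback in claim 8, can be illustrated with a short sketch. It is a loose reading of the claim language rather than the patented implementation. It assumes the de-duplication hash table maps each hash to an integer physical address on the second storage device, and it treats "contiguous physical space" as consecutive addresses; the function names, SHA-256, and the returned action labels are all hypothetical.

```python
import hashlib


def block_hash(block: bytes) -> str:
    """Hash of a data block (SHA-256 is an assumed choice)."""
    return hashlib.sha256(block).hexdigest()


def contiguous_hit_run(blocks, start, dedup_table):
    """Count blocks from `start` whose hashes hit table entries that sit
    at consecutive physical addresses (dedup_table maps hash -> address)."""
    run = 0
    prev_addr = None
    for block in blocks[start:]:
        addr = dedup_table.get(block_hash(block))
        if addr is None:
            break  # hash miss ends the run
        if prev_addr is not None and addr != prev_addr + 1:
            break  # hit exists but is not physically adjacent
        prev_addr = addr
        run += 1
    return run


def dedup_decision(blocks, start, dedup_table, threshold_number):
    """Pick an action for the target block at index `start`."""
    run = contiguous_hit_run(blocks, start, dedup_table)
    if run > threshold_number:
        # Claim 1: de-duplicate the whole group of contiguous blocks.
        return ("dedup_group", run)
    if block_hash(blocks[start]) in dedup_table:
        # Claim 8: fall back to de-duplicating the target block alone.
        return ("dedup_single", 1)
    # Claim 6: no hit, so the block would be transferred instead.
    return ("transfer", 1)
```

With a table holding blocks A, B, C at addresses 10, 11, 12 and block D at address 40, a logical sequence A, B, C, D yields a run of three, because D hits the table but is not physically adjacent to C.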

Assignees

Inventors

Classifications

  • G06F3/0608 (Primary)

    Saving storage space on storage systems · CPC title

  • Plurality of storage devices · CPC title

  • Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title

  • G06F3/0641 (Primary)

    De-duplication techniques · CPC title

  • Hybrid storage device · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12293102B2 cover?
Techniques for de-duplicating data involve: determining a target physical block in a first storage device. The techniques further involve: determining a compression ratio of a target data block in a plurality of data blocks to be transferred. The techniques further involve: determining a target hash value of the target data block in response to the compression ratio being lower than a threshold…
Who is the assignee on this patent?
Dell Products Lp
What technology area does this patent fall under?
Primary CPC classification G06F3/0608. Mapped technology areas include Physics.
When was this patent published?
The publication date is May 6, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).