Method, device, and computer readable medium for data deduplication

US11829624B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11829624-B2
Application number  US-202016843004-A
Country             US
Kind code           B2
Filing date         Apr 8, 2020
Priority date       Apr 29, 2019
Publication date    Nov 28, 2023
Grant date          Nov 28, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques provide for data deduplication. Such techniques involve: allocating a storage area in a storage device, the storage area including a first storage segment for storing an incompressible data block and a second storage segment for storing a compressed data block, a first size of the first storage segment being greater than a second size of the second storage segment; in response to receiving a write request, determining whether a data block to which the write request is related is compressible; in response to determining that the data block is incompressible, adding header information to the data block to generate a first data segment of the first size; and storing the first data segment in the first storage segment through a deduplication operation. Accordingly, such techniques can increase the flexibility and efficiency of data deduplication.
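
The segment-building step in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the block size, header layout, segment sizes, the use of zlib, and the names `make_header` and `build_segment` are all assumptions chosen for illustration. The compressibility test follows the wording of claim 5 (a compression ratio above a threshold means the block is treated as incompressible), with the ratio assumed to be compressed length divided by original length.

```python
import os
import struct
import zlib

BLOCK_SIZE = 4096                        # assumed predetermined block size
HEADER_SIZE = 16                         # assumed header length
FIRST_SIZE = BLOCK_SIZE + HEADER_SIZE    # segment size for incompressible blocks
SECOND_SIZE = 2048                       # assumed smaller segment size for compressed blocks
# Assumed threshold: the compressed payload plus header must fit in SECOND_SIZE.
RATIO_THRESHOLD = (SECOND_SIZE - HEADER_SIZE) / BLOCK_SIZE


def make_header(payload: bytes, compressed: bool) -> bytes:
    """Hypothetical 16-byte header: flag, payload length, CRC32, reserved bytes."""
    return struct.pack("<B3xII4x", int(compressed), len(payload), zlib.crc32(payload))


def build_segment(block: bytes) -> bytes:
    """Return a data segment of FIRST_SIZE (incompressible) or SECOND_SIZE (compressed)."""
    if len(block) != BLOCK_SIZE:
        raise ValueError("block must have the predetermined block size")
    compressed = zlib.compress(block)
    ratio = len(compressed) / len(block)  # assumed definition of compression ratio
    if ratio > RATIO_THRESHOLD:
        # Incompressible: header plus the raw block yields the larger first size.
        return make_header(block, False) + block
    # Compressible: header plus compressed data, padded up to the second size.
    segment = make_header(compressed, True) + compressed
    return segment.ljust(SECOND_SIZE, b"\x00")


random_segment = build_segment(os.urandom(BLOCK_SIZE))   # random data rarely compresses
repeated_segment = build_segment(b"a" * BLOCK_SIZE)      # repeated data compresses well
print(len(random_segment), len(repeated_segment))
```

In this sketch the first segment is larger than the block itself because the header is added without shrinking the payload, which mirrors the relationship between the first size and the predetermined block size stated in claim 1.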

First claim

Opening claim text (preview).

We claim:

1. A method for data deduplication, comprising: receiving a first write request; in response to receiving the first write request, determining that a first data block to which the first write request is related is incompressible, the first data block having a predetermined block size; in response to determining that the first data block is incompressible, adding first header information to the incompressible first data block to generate a first data segment having a first size greater than the predetermined block size, wherein a first storage segment has been allocated in a storage area of a storage device, the first storage segment having the first size greater than the predetermined block size; and performing a first deduplication operation for the incompressible first data block, the performing the first deduplication operation comprising: creating a first logical block corresponding to the first data segment; identifying, by the first header information, a uniqueness of the incompressible first data block, wherein the identifying the uniqueness of the incompressible first data block includes generating a first feature value from the first header information, the first feature value being a first hash value, the first header information being metadata identifying a length of the incompressible first data block, a storage location of the incompressible first data block, and the uniqueness of the incompressible first data block; determining whether or not a set of feature values includes the first feature value, the set of feature values including different feature values for different data blocks; in response to determining that the set of feature values includes the first feature value, mapping, by a mapper, the first logical block to the first storage segment in which the incompressible first data block has previously been stored; and in response to determining that the set of feature values does not include the first feature value, adding the first feature value into the set of feature values, storing the incompressible first data block in the first storage segment, and mapping, by the mapper, the first logical block to the first storage segment.

2. The method of claim 1 further comprising: receiving a second write request; in response to receiving the second write request, determining that a second data block to which the second write request is related is compressible, the second data block having the predetermined block size; in response to determining that the second data block is compressible: compressing the second data block; and adding second header information to the compressed second data block to generate a second data segment having a second size smaller than the first size of the first data segment, wherein a second storage segment has been allocated in the storage area of the storage device, the second storage segment having the second size; and performing a second deduplication operation for the compressed second data block.

3. The method of claim 2, wherein the performing the second deduplication operation comprises: creating a second logical block corresponding to the second data segment; identifying, by the second header information, a uniqueness of the compressed second data block, wherein the identifying the uniqueness of the compressed second data block includes generating a second feature value from the second header information, the second feature value being a second hash value, the second header information being metadata identifying a length of the compressed second data block, a storage location of the compressed second data block, and the uniqueness of the compressed second data block; determining whether or not the set of feature values includes the second feature value; in response to determining that the set of feature values includes the second feature value, mapping, by the mapper, the second logical block to the second storage segment in which the compressed second data block has previously been stored; and in response to determining that the set of feature values does not include the second feature value, adding the second feature value into the set of feature values, storing the compressed second data block in the second storage segment, and mapping, by the mapper, the second logical block to the second storage segment.

4. The method of claim 2, further comprising: in further response to determining that the second data block is compressible, adding padding information to the compressed second data block to generate the second data segment of the second size.

5. The method of claim 1, wherein determining whether the first data block is compressible or incompressible comprises: compressing the first data block; determining a compression ratio of the compressing of the first data block; determining that the compression ratio is greater than a threshold; and in response to determining that the compression ratio is greater than the threshold, determining that the first data block is incompressible.

6. An electronic device, comprising: one or more processors; and a storage device for storing one or more programs, the one or more programs, when executed by the one or more processors, causing the one or more processors to perform acts comprising: receiving a write request; in response to receiving the write request, determining that a data block to which the write request is related is incompressible, the data block having a predetermined block size; in response to determining that the data block is incompressible, adding header information to the incompressible data block to generate a first data segment having a first size greater than the predetermined block size, wherein a first storage segment has been allocated in a storage area of a storage device, the first storage segment having the first size greater than the predetermined block size; and performing a first deduplication operation for the incompressible first data block, the performing the first deduplication operation comprising: creating a first logical block corresponding to the first data segment; identifying, by the first header information, a uniqueness of the incompressible first data block, wherein the identifying the uniqueness of the incompressible first data block includes generating a first feature value from the first header information, the first feature value being a first hash value, the first header information being metadata identifying a length of the incompressible first data block, a storage location of the incompressible first data block, and the uniqueness of the incompressible first data block; determining whether or not a set of feature values includes the first feature value, the set of feature values including different feature values for different data blocks; in response to determining that the set of feature values includes the first feature value, mapping, by a mapper, the first logical block to the first storage segment in which the incompressible first data block has previously been stored; and in response to determining that the set of feature values does not include the first feature value, adding the first feature value into the set of feature values, storing the incompressible first data block in the first storage segment, and mapping, by the mapper, the first logical block to the first storage segment.

7. The device of claim 6, wherein the acts further comprise: receiving a second write request; in response to receiving the second write request, determining that a second data block to which the second write request is related is compressible, the second data block having the predetermined block size; in response to determining that the second data block is compressible: compres…
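
The deduplication operation recited in claims 1 and 3 can be sketched roughly as follows. This is an illustration under stated assumptions, not the claimed implementation: the header here carries only a length and a content digest (the claims also mention a storage location), the feature value is assumed to be a SHA-256 hash of that header, and `DedupStore`, `make_header`, and `write` are hypothetical names.

```python
import hashlib
import struct
from dataclasses import dataclass, field


def make_header(block: bytes) -> bytes:
    """Hypothetical header: 4-byte block length followed by a 32-byte content digest."""
    return struct.pack("<I", len(block)) + hashlib.sha256(block).digest()


@dataclass
class DedupStore:
    segments: list = field(default_factory=list)  # stored data segments
    features: dict = field(default_factory=dict)  # feature value -> segment index
    mapper: dict = field(default_factory=dict)    # logical block -> segment index

    def write(self, logical_block: int, block: bytes) -> bool:
        """Store or deduplicate a block; return True if it was deduplicated."""
        header = make_header(block)
        # Feature value: a hash generated from the header information (assumed SHA-256).
        feature = hashlib.sha256(header).hexdigest()
        if feature in self.features:
            # Known feature value: map the logical block to the existing segment.
            self.mapper[logical_block] = self.features[feature]
            return True
        # New feature value: record it, store the segment, then map the logical block.
        self.segments.append(header + block)
        self.features[feature] = len(self.segments) - 1
        self.mapper[logical_block] = self.features[feature]
        return False


store = DedupStore()
store.write(0, b"x" * 4096)   # first copy is stored
store.write(1, b"x" * 4096)   # duplicate is mapped to the same segment
store.write(2, b"y" * 4096)   # different content gets its own segment
print(store.mapper)
```

The claims speak of a set of feature values; the sketch keeps a dictionary keyed by feature value so that the segment holding the earlier copy can be looked up when a duplicate arrives.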

Assignees

Inventors

Classifications

  • G06F3/0641 (Primary)

    De-duplication techniques · CPC title

  • Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title

  • G06F3/0608 (Primary)

    Saving storage space on storage systems · CPC title

  • by allocating resources to storage systems · CPC title

  • Data buffering arrangements · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11829624B2 cover?
Techniques provide for data deduplication. Such techniques involve: allocating a storage area in a storage device, the storage area including a first storage segment for storing an incompressible data block and a second storage segment for storing a compressed data block, a first size of the first storage segment being greater than a second size of the second storage segment; in response to rec…
Who is the assignee on this patent?
EMC IP Holding Co LLC
What technology area does this patent fall under?
Primary CPC classification G06F3/0641. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 28, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).