Method to optimize ingest in dedupe systems by using compressibility hints

US11977525B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11977525-B2
Application number: US-202117192544-A
Country: US
Kind code: B2
Filing date: Mar 4, 2021
Priority date: Mar 4, 2021
Publication date: May 7, 2024
Grant date: May 7, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method, system and computer-readable storage medium for transferring data segments from one computer system to a second computing system. Prior to transfer of the data segments, the first system calculates compressibility ratio of each segment and compares the compressibility ratio to a preset threshold. Based on the comparison, the first system assigns a compressibility hint to each segment. The first system transfers the segments to the second system, together with the corresponding compressibility hint. The second system stores each segment in a compressible region or in a non-compressible region based on the hint. Then the second system compresses the compressible region and stores the compressed region in a container, and stores the non-compressible region uncompressed in the container.
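
The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the use of `zlib`, the definition of the ratio as compressed size divided by original size, the `THRESHOLD` value of 0.9, and the names `compressibility_hint`, `Container`, and `store` are all assumptions made for illustration.

```python
import os
import zlib
from dataclasses import dataclass

# Assumed threshold: a segment whose compressed/original size ratio is at or
# above this value is treated as not worth compressing.
THRESHOLD = 0.9


def compressibility_hint(segment: bytes, threshold: float = THRESHOLD) -> bool:
    """First system: return True if the segment should go to the compressible region."""
    if not segment:
        return False
    # A fast trial compression (level 1) estimates the ratio.
    ratio = len(zlib.compress(segment, 1)) / len(segment)
    return ratio < threshold


@dataclass
class Container:
    """Hypothetical container holding one compressed and one uncompressed region."""
    compressed_region: bytes
    uncompressed_region: bytes


def store(segments_with_hints) -> Container:
    """Second system: route each segment by its hint, then compress one region only."""
    compressible = bytearray()
    non_compressible = bytearray()
    for segment, hint in segments_with_hints:
        (compressible if hint else non_compressible).extend(segment)
    return Container(
        compressed_region=zlib.compress(bytes(compressible)),
        uncompressed_region=bytes(non_compressible),
    )


# First system: attach a hint to each segment before transfer.
segments = [b"A" * 4096, os.urandom(4096)]  # repetitive data vs. random data
hinted = [(segment, compressibility_hint(segment)) for segment in segments]

# Second system: store according to the hints it received.
container = store(hinted)
print([hint for _, hint in hinted])
print(len(container.compressed_region), len(container.uncompressed_region))
```

In this sketch the second system never runs a trial compression on data flagged as non-compressible; it trusts the hint computed by the first system, which is the saving the abstract describes.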

First claim

Opening claim text (preview).

What is claimed is:

1. A method of transferring data segments from a client to a storage system, comprising:
    for an uncompressed first data segment in the data segments, generating at the client a first compressibility hint indicating whether the uncompressed first data segment should be compressed based on a threshold;
    determining that the first compressibility hint indicates not to compress the uncompressed first data segment;
    only in response to determining that the first compressibility hint indicates not compress the uncompressed first data segment:
        compressing the uncompressed first data segment to produce a compressed first data segment;
        determining a size difference between the compressed first data segment and the uncompressed first data segment;
        determining an actual amount of time required to compress the uncompressed first data segment;
        determining to adjust the threshold based on analyzing the size difference with the actual amount of time required to compress the uncompressed first data segment; and
        adjusting the threshold based on the determining to adjust the threshold; and
    sending the uncompressed first data segment to the storage system with the first compressibility hint to assist the storage system in determining whether to compress the uncompressed first data segment prior to storing the first uncompressed data segment.

2. The method of claim 1, further comprising: generating a first ratio based on a size of the compressed first data segment and a size of the uncompressed first data segment; and generating the first compressibility hint based on comparing the first ratio to the threshold.

3. The method of claim 2, wherein the first compressibility hint comprises a binary value corresponding to whether the first ratio surpasses the threshold.

4. The method of claim 1, further comprising: at the storage system, storing the data segments in a compressible region or in a non-compressible region according to their corresponding compressibility hint.

5. The method of claim 4, further comprising: compressing the compressible region and writing resulting compressed region into a container object, and directly writing the non-compressible region into the container object.

6. The method of claim 1, wherein generating the first compressibility hint comprises reading a header of a file corresponding to the uncompressed first data segment and when the header indicates a compressed format, generating the first compressibility hint to indicate the uncompressed first data segment should not be compressed.

7. The method of claim 1, further comprising: generating a compressibility hint file that maps a unique compressibility hint for each of the data segments and transferring the compressibility hint file to the storage system.

8. The method of claim 1, further comprising: generating a second compressibility hint indicating whether a second uncompressed data segment in the data segments should be compressed based on the adjusted threshold; and sending the uncompressed second data segment with the second compressibility hint to the storage system.

9. A compute system for performing deduplication operations of data segments of a client by a deduplication system, the system comprising a processor and a memory storing executable instructions that, in response to execution by the processor, cause the system to:
    generate fingerprints for all data segments;
    filter the fingerprints to identify all uncompressed unique data segments;
    for each uncompressed unique data segment, generate a compressibility value that indicates whether the uncompressed unique data segment should be compressed based on a threshold;
    determine that the compressibility value indicates not to compress the uncompressed unique data segment;
    only in response to determining that the compressibility value indicates to not compress the uncompressed unique data segment:
        compress the uncompressed unique data segment to produce a compressed unique data segment;
        determine a size difference between the compressed unique data segment and the uncompressed unique data segment;
        determine an actual amount of time required to compress the uncompressed unique data segment;
        determine to adjust the threshold based on analyzing the size difference with the actual amount of time required to compress the uncompressed unique data segment; and
        adjust the threshold based on the determining to adjust the threshold;
    send the uncompressed unique data segments together with their corresponding compressibility values to the deduplication system; and
    store the uncompressed unique data segments having a compressibility value without compression and compress remaining uncompressed unique data segments prior to storage.

10. The system of claim 9, wherein the compressibility value comprises a binary value, a first compressibility value indicating an unacceptable compression ratio of a corresponding uncompressed unique data segment and a second compressibility value indicating an acceptable compression ratio of a corresponding uncompressed unique data segment.

11. The system of claim 9, wherein the compressibility value is further determined by comparing a ratio to the threshold, wherein the ratio is based on the compressing the uncompressed unique data segment.

12. The system of claim 10, wherein the compressibility value is obtained by reading a header of a file corresponding to the uncompressed unique data segment and when the header indicates a compressed format, setting the compressibility value to the first compressibility value indicating the unacceptable compression ratio of a corresponding uncompressed unique segment.

13. The system of claim 9, wherein the system further to: generate a subsequent compressibility value indicating whether a subsequent uncompressed unique data segment should be compressed based on the adjusted threshold; and send the subsequent uncompressed unique data segment with the subsequent compressibility value to the deduplication system.

14. A computer-readable storage medium for transferring data segments from a first computing system to a second computing system, the computer-readable storage medium being non-transitory and having computer-readable program code stored therein that in response to execution by a processor, causes the first and second computing systems to:
    determine uncompressed unique data segments to be transferred from the first computing system to the second computing system;
    assign a compressibility hint to each of the uncompressed unique data segments that indicates whether each of the uncompressed unique data segments should be compressed based on a threshold;
    determine that one of the compressibility hints indicates to not compress one of the uncompressed unique data segments;
    only in response to determining that one of the compressibility hints indicates to not compress one of the uncompressed unique data segments:
        compress the uncompressed unique data segment to produce a compressed unique data segment;
        determine a size difference between the compressed unique data segment and the uncompressed unique data segment;
        determine an actual amount of time required to compress the uncompressed unique data segment;
        determine to adjust the threshold based on analyzing the size difference with the actual amount of time required to compress the uncompressed unique data segment; and
        adjust the threshold based on the determining to adjust the threshold;
    transfer the uncompressed unique data segments together with the compressibility hints from the first computing system to the second computing sy
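
The threshold-adjustment steps recited in the independent claims can be sketched roughly as follows. The claims do not say how the size difference and the compression time are analyzed, so the rule used here (raise the threshold when bytes saved per second of compression time exceed a minimum rate) is purely an assumption, as are the class name `AdaptiveHinter`, its parameters, the use of `zlib`, and the injectable `clock`.

```python
import os
import time
import zlib


class AdaptiveHinter:
    """Hypothetical client-side hint generator with a self-adjusting threshold."""

    def __init__(self, threshold=0.9, min_bytes_saved_per_sec=1_000_000.0,
                 step=0.02, clock=time.perf_counter):
        self.threshold = threshold              # compress if ratio < threshold
        self.min_rate = min_bytes_saved_per_sec  # assumed payoff cut-off
        self.step = step                        # assumed adjustment increment
        self.clock = clock                      # injectable for deterministic tests

    def hint(self, segment: bytes) -> bool:
        """Return True if the segment should be compressed by the storage system."""
        if not segment:
            return False
        start = self.clock()
        compressed = zlib.compress(segment, 1)
        elapsed = self.clock() - start
        ratio = len(compressed) / len(segment)
        should_compress = ratio < self.threshold
        if not should_compress:
            # Only for "do not compress" hints: weigh the size difference
            # against the time compression actually took.
            self._maybe_adjust(len(segment) - len(compressed), elapsed)
        return should_compress

    def _maybe_adjust(self, size_difference: int, elapsed: float) -> None:
        if size_difference <= 0:
            return  # compression saved nothing, keep the threshold
        rate = size_difference / max(elapsed, 1e-9)
        if rate >= self.min_rate:
            # Savings were cheap to obtain, so admit more segments next time.
            self.threshold = min(1.0, self.threshold + self.step)


# A half-random, half-zero segment compresses to roughly half its size.
hinter = AdaptiveHinter(threshold=0.3)
segment = os.urandom(2048) + bytes(2048)
print(hinter.hint(segment), hinter.threshold)
```

Later segments are then hinted against the adjusted threshold, which corresponds to the second or subsequent hint described in claims 8 and 13.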

Assignees

EMC IP Holding Co LLC

Classifications

  • G06F16/215 (Primary)

    Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title

  • using compression, e.g. sparse files · CPC title

  • Compression (speech analysis-synthesis for redundancy reduction G10L19/00; for image communication H04N); Expansion; Suppression of unnecessary data, e.g. redundancy reduction · CPC title

  • H03M7/3091 (Primary)

    Data deduplication · CPC title

  • De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11977525B2 cover?
A method, system and computer-readable storage medium for transferring data segments from one computer system to a second computing system. Prior to transfer of the data segments, the first system calculates compressibility ratio of each segment and compares the compressibility ratio to a preset threshold. Based on the comparison, the first system assigns a compressibility hint to each segment.…
Who is the assignee on this patent?
EMC IP Holding Co LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/215. Mapped technology areas include Physics.
When was this patent published?
Publication date May 7, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).