System and method for improving data compression of a storage system in an online manner

US9767154B1 · US · B1

Patent metadata
Publication number: US-9767154-B1
Application number: US-201615336769-A
Country: US
Kind code: B1
Filing date: Oct 27, 2016
Priority date: Sep 26, 2013
Publication date: Sep 19, 2017
Grant date: Sep 19, 2017

Abstract

Official abstract text for this publication.

Techniques for improving data compression of a storage system in an online manner are described herein. According to one embodiment, in response to a sequence of data to be stored, the sequence of data is partitioned into a plurality of data chunks according to a predetermined chunking algorithm. A sketch for each of the data chunks is generated based on one or more features extracted from the data chunk. Each of the data chunks of the sequence of data is associated with one of a plurality of groups based on the sketch, wherein each group is represented by a sketch. The data chunks of each group are compressed and stored in a compression region of the storage system, such that similar data chunks are compressed and stored in the same compression region.
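
The pipeline in the abstract (chunk, sketch, group, compress per group) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent text on this page: fixed-size chunking stands in for the unspecified "predetermined chunking algorithm", the sketch is a single feature (the maximum hash over sliding windows), and zlib stands in for the compressor. The function names are hypothetical.

```python
import hashlib
import zlib
from collections import defaultdict


def chunk_fixed(data: bytes, size: int = 64) -> list:
    """Split data into fixed-size chunks (assumed stand-in for the chunking algorithm)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def _h(b: bytes) -> int:
    """64-bit hash of a byte string."""
    return int.from_bytes(hashlib.blake2b(b, digest_size=8).digest(), "big")


def sketch(chunk: bytes, window: int = 8) -> int:
    """Assumed sketch: the maximum hash over all sliding windows of the chunk.

    Chunks that share most of their content tend to share the maximal window,
    so they tend to receive the same sketch value.
    """
    if len(chunk) <= window:
        return _h(chunk)
    return max(_h(chunk[i:i + window]) for i in range(len(chunk) - window + 1))


def group_and_compress(data: bytes) -> dict:
    """Group chunks by sketch, then compress each group into its own region."""
    groups = defaultdict(list)
    for chunk in chunk_fixed(data):
        groups[sketch(chunk)].append(chunk)
    # One compressed region per sketch value: similar chunks end up adjacent.
    return {s: zlib.compress(b"".join(chunks)) for s, chunks in groups.items()}
```

This sketch does not record the original chunk order, so it cannot restore the input on its own; a real system would presumably also keep a per-file index mapping each chunk to its region.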

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for improving data compression of data chunks of a storage system, the method comprising: in response to a sequence of data to be stored in a memory, partitioning, by using a computer system, the sequence of data into a plurality of data chunks according to a predetermined chunking algorithm; generating, by using a computer system, a sketch for each data chunk of the data chunks based on one or more features extracted from the data chunk; identifying, by using the computer system, an existing compression region for each data chunk based on a sketch of the data chunk; grouping, by using the computer system, said each data chunk of the data chunks of the sequence of data with one of a plurality of groups based on the sketch of the data chunk, wherein each group of the plurality of groups is represented by a sketch, wherein the grouping said each data chunk of the data chunks of the sequence of data comprises merging the data chunk of the sequence of data with data chunks of the identified existing compression region and reorganizing the data chunks of the sequence of data and data chunks of the existing compression region from an original sequence order to a second sequence order, wherein similar data chunks are positioned adjacent to each other; determining, by using the computer system, whether a number of data chunks of said each group of the plurality of the groups reaches a predetermined threshold; and in response to the determining that the number of data chunks of said each group of the plurality of the groups reaches a predetermined threshold, compressing, by using the computer system, and storing in the memory of the computer system data chunks of said each group of the plurality of groups in a corresponding existing compression region of the storage system so that the similar data chunks are compressed and stored in the same compression region, wherein said each group of the plurality of the groups is related with a buffer that stores a plurality of data chunks from the sequence of data that have the same sketch.

2. The method of claim 1, wherein each group is associated with a buffer that buffers a plurality of existing data chunks having the same sketch that have been previously stored in the storage system.

3. The method of claim 2, wherein associating each data chunk of the sequence of data with one of a plurality of groups based on the sketch comprises: identifying an existing compression region based on the sketch of data chunk of the sequence of data.

4. The method of claim 3, further comprising: retrieving and decompressing data chunks from the identified existing compression region; and storing the data chunks in one of the groups associated with the sketch of the data chunk.

5. The method of claim 4, further comprising writing the group of the merged data chunks to a new compression region, wherein a previous compression region space is reclaimed after the group of the merged data chunks have been written to the new compression region.

6. A non-transitory machine-readable medium having instructions stored therein, which when executed by a processor, cause the processor perform operations of improving data compression of data chunks of a storage system, the operations comprising: in response to a sequence of data to be stored in a memory, partitioning, by using a computer system, the sequence of data into a plurality of data chunks according to a predetermined chunking algorithm; generating, by using a computer system, a sketch for each data chunk of the data chunks based on one or more features extracted from the data chunk; identifying, by using the computer system, an existing compression region for each data chunk based on a sketch of the data chunk; grouping, by using the computer system, said each data chunk of the data chunks of the sequence of data with one of a plurality of groups based on the sketch of the data chunk, wherein each group of the plurality of groups is represented by a sketch, wherein the grouping said each data chunk of the data chunks of the sequence of data comprises merging the data chunk of the sequence of data with data chunks of the identified existing compression region and reorganizing the data chunks of the sequence of data and data chunks of the existing compression region from an original sequence order to a second sequence order, wherein similar data chunks are positioned adjacent to each other; determining, by using the computer system, whether a number of data chunks of said each group of the plurality of the groups reaches a predetermined threshold; and in response to the determining that the number of data chunks of said each group of the plurality of the groups reaches the predetermined threshold, compressing, by using the computer system, and storing in the memory of the computer system data chunks of said each group of the plurality of groups in a corresponding existing compression region of the storage system so that the similar data chunks are compressed and stored in the same compression region, wherein said each group of the plurality of the groups is related with a buffer that stores a plurality of data chunks from the sequence of data that have the same sketch.

7. The machine-readable medium of claim 6, wherein each group is associated with a buffer that buffers a plurality of existing data chunks having the same sketch that have been previously stored in the storage system.

8. The machine-readable medium of claim 7, wherein associating each data chunk of the sequence of data with one of a plurality of groups based on the sketch comprises: identifying an existing compression region based on the sketch of data chunk of the sequence of data.

9. The machine-readable medium of claim 8, wherein the operations further comprise: retrieving and decompressing data chunks from the identified existing compression region; and storing the data chunks in one of the groups associated with the sketch of the data chunk.

10. The machine-readable medium of claim 9, wherein the operations further comprise writing the group of the merged data chunks to a new compression region, wherein a previous compression region space is reclaimed after the group of the merged data chunks have been written to the new compression region.

11. A computer system for processing data, comprising: a processor; and a memory coupled to the processor to store instructions, which when executed by the processor, cause the processor to perform operations, the operations including: in response to a sequence of data to be stored in the memory, partitioning the sequence of data into a plurality of data chunks according to a predetermined chunking algorithm, generating a sketch for each data chunk of the data chunks based on one or more features extracted from the data chunk, identifying an existing compression region for each data chunk based on a sketch of the data chunk, grouping each data chunk of the data chunks of the sequence of data with one of a plurality of groups based on the sketch of the data chunk, wherein each group of the plurality of groups is represented by a sketch, wherein the grouping said each data chunk of the data chunks of the sequence of data comprises merging the data chunk of the sequence of data with data chunks of the identified existing compression region and reorganizing the data chunks of the sequence of data and data chunks of the existing compression region from an original sequence order to a second sequence order, wherein similar data chunks are positioned adjacent to each other, determining whether a number of data chunks of said each group of the groups reaches a predetermined t
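
The claimed flow (buffer chunks per sketch, flush a group once it reaches a threshold, and merge it with the chunks of an existing compression region) can be sketched roughly in Python. This is a toy under stated assumptions, not the patented implementation: the class name `SketchGroupedStore`, the in-memory dictionaries, the one-region-per-sketch layout, the length-prefixed region format, and the use of zlib are all hypothetical choices made for illustration.

```python
import zlib
from collections import defaultdict


class SketchGroupedStore:
    """Toy in-memory store with one compression region per sketch (an assumption)."""

    def __init__(self, threshold: int = 4):
        self.threshold = threshold          # chunks per group before a flush
        self.buffers = defaultdict(list)    # sketch -> buffered incoming chunks
        self.regions = {}                   # sketch -> compressed region bytes

    @staticmethod
    def _pack(chunks):
        # Length-prefix each chunk so the region can be split again later.
        return zlib.compress(b"".join(len(c).to_bytes(4, "big") + c for c in chunks))

    @staticmethod
    def _unpack(region):
        raw, out, i = zlib.decompress(region), [], 0
        while i < len(raw):
            n = int.from_bytes(raw[i:i + 4], "big")
            out.append(raw[i + 4:i + 4 + n])
            i += 4 + n
        return out

    def add(self, sketch, chunk):
        """Buffer a chunk under its sketch; flush when the threshold is reached."""
        self.buffers[sketch].append(chunk)
        if len(self.buffers[sketch]) >= self.threshold:
            self.flush(sketch)

    def flush(self, sketch):
        """Merge buffered chunks with the existing region and write a new region."""
        pending = self.buffers.pop(sketch, [])
        if not pending:
            return
        existing = self._unpack(self.regions[sketch]) if sketch in self.regions else []
        # Replacing the entry drops the old region, a stand-in for reclaiming its space.
        self.regions[sketch] = self._pack(existing + pending)

    def chunks(self, sketch):
        """Return the stored chunks of a region, decompressed."""
        return self._unpack(self.regions[sketch])
```

For example, with `threshold=2`, two `add(1, ...)` calls produce a region for sketch 1, and two further calls decompress that region, append the new chunks, and rewrite it, so chunks with the same sketch stay adjacent in a single region.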

Assignees

Inventors

Classifications

  • Physics · mapped topic

  • H03M7/3091 · Primary

    Data deduplication · CPC title

  • Intermediate data storage techniques for performance improvement · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9767154B1 cover?
Techniques for improving data compression of a storage system in an online manner are described herein. According to one embodiment, in response to a sequence of data to be stored, the sequence of data is partitioned into a plurality of data chunks according to a predetermined chunking algorithm. A sketch for each of the data chunks is generated based on one or more features extracted from the …
Who is the assignee on this patent?
EMC Corp, EMC IP Holding Co LLC
What technology area does this patent fall under?
Primary CPC classification G06F17/30501. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 19, 2017 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).