System and method for improving data compression in a deduplicated storage system
US-9411815-B1 · Aug 9, 2016 · US
US9843702B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-9843702-B1 |
| Application number | US-201213436677-A |
| Country | US |
| Kind code | B1 |
| Filing date | Mar 30, 2012 |
| Priority date | Mar 30, 2012 |
| Publication date | Dec 12, 2017 |
| Grant date | Dec 12, 2017 |
Official abstract text for this publication.
A computer-implemented method for compressing a data set, the method comprising receiving a first data block of the data set, selecting automatically by a compression management module a compression module from a plurality of compression modules to apply to the first data block based on projected compression efficacy or resource utilization, and compressing the first data block with the selected compression module to generate a first compressed data block.
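As a rough illustration of the abstract, the following Python sketch picks one of several compression modules for a block and then compresses the block with it. Everything concrete here is an assumption made for illustration and is not taken from the patent: the module names (`fast`, `balanced`, `strong`), the use of `zlib` and `lzma` as stand-ins for the compression modules, the idea of projecting efficacy by trial-compressing a short prefix of the block, and the `sample_size` and `min_gain` parameters.

```python
import lzma
import zlib
from typing import Callable, Dict, Tuple

# Hypothetical "plurality of compression modules"; names and codecs are illustrative.
MODULES: Dict[str, Callable[[bytes], bytes]] = {
    "fast": lambda b: zlib.compress(b, 1),
    "balanced": lambda b: zlib.compress(b, 6),
    "strong": lambda b: lzma.compress(b),
}


def projected_saving(module: Callable[[bytes], bytes], sample: bytes) -> float:
    """Fraction of bytes saved on the sample, clamped at 0.0 (assumed efficacy metric)."""
    if not sample:
        return 0.0
    return max(0.0, 1.0 - len(module(sample)) / len(sample))


def select_module(block: bytes, sample_size: int = 4096, min_gain: float = 0.05) -> str:
    """Choose a module name from the projected saving on a prefix of the block.

    Starts from the cheap module and only moves to a costlier one when it
    projects at least `min_gain` more saving (an assumed stand-in for the
    efficacy versus resource trade-off).
    """
    sample = block[:sample_size]
    chosen = "fast"
    best = projected_saving(MODULES[chosen], sample)
    for name in ("balanced", "strong"):
        saving = projected_saving(MODULES[name], sample)
        if saving >= best + min_gain:
            chosen, best = name, saving
    return chosen


def compress_block(block: bytes) -> Tuple[str, bytes]:
    """Select a module for the block and return (module name, compressed block)."""
    name = select_module(block)
    return name, MODULES[name](block)
```

Returning the module name alongside the compressed bytes keeps enough information to pick the matching decompressor later, which is one simple way to carry per-block compression information.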
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method for compressing all data blocks of a data set of a client, the method comprising:
   receiving a plurality of data blocks of the data set from the client, across a network, by a backup storage management server comprising a compression management module, wherein all data blocks in the plurality of data blocks are to be compressed and stored by the backup storage management server;
   selecting, automatically by the compression management module, a compression module from a plurality of compression modules, wherein the plurality of compression modules comprises a fast compression module as a default compression module;
   selecting an uncompressed first data block from the plurality of data blocks of the data set for compression;
   compressing the first data block using the selected compression module to generate a compressed first data block;
   analyzing the compression efficacy for the first data block;
   wherein analyzing the compression efficacy for the first data block comprises:
     in response to determining that the selected compression module results in a compression ratio for the first data block being between ten percent and ninety percent:
       determining resource utilization for compressing the first data block,
       upon determining that the resource utilization was within resource usage constraints, automatically selecting a compression module from the plurality of compression modules that has a higher compression ratio for use in compressing a second data block in the plurality of data blocks, and
       upon determining that the resource utilization was not within resource usage constraints, automatically selecting the fast compression module to compress the second data block;
     in response to determining that the selected compression module results in less than ten percent, or greater than ninety percent, compression ratio, selecting the fast compression module for compressing the second data block;
   adding compression module information for the first data block to the compression meta-data for the data set;
   compressing the second data block using the selected compression module for the second data block;
   analyzing the compression efficacy for the second data block; and
   adding compression module information for the second data block to the compression meta-data for the data set.

2. The computer-implemented method of claim 1, wherein analyzing the compression efficacy for the first data block further comprises analyzing a sliding scale trade-off between available computational resources and the computational requirements of a compression module.

3. The computer-implemented method of claim 1, wherein analyzing the compression efficacy for the first data block comprises analyzing a previously compressed data block.

4. The computer-implemented method of claim 3, wherein analyzing the previously compressed data block comprises:
   determining an identifier for the previously compressed data block; and
   looking up the compression module of the previously compressed data block, by a compression tracking module, using the identifier.

5. The computer-implemented method of claim 1, further comprising:
   storing statistical data of the compressed first data block with statistical data for the data set, wherein statistical data comprises at least one of content type of the compressed first data block, the compression module for the first data block, and compression ratio for the compressed first data block; and
   using the compression meta-data for the data set to select a compression module for another data set having similar data or for a particular client.

6. The computer-implemented method of claim 1, further comprising: scheduling further processing of the compressed first data block.

7. The computer-implemented method of claim 1, wherein the first data block and the second data block are data blocks of a single file.

8. The computer-implemented method of claim 1, further comprising scheduling a future off line compression of a data set that was previously compressed and stored in the backup storage management server, the off line compression for a data block using a compression module that provides a higher level of compression than the compression module that was selected to compress the data block.

9. The computer-implemented method of claim 8, wherein the off line compression of the data block is scheduled in response to determining that one or more data blocks were compressed using a more resource efficient compression module to maintain a compression throughput ratio.

10. A non-transitory computer-readable storage medium having instructions stored therein, which when executed by a computer, cause the computer to perform a set of operations for compressing all data blocks of a data set of a client, the set of operations comprising:
   receiving a plurality of data blocks of the data set from the client, across a network, by a backup storage management server comprising a compression management module, wherein all data blocks in the plurality of data blocks are to be compressed and stored by the backup storage management server;
   selecting, automatically by the compression management module, a compression module from a plurality of compression modules, wherein the plurality of compression modules comprises a fast compression module as a default compression module;
   selecting an uncompressed first data block from the plurality of data blocks of the data set for compression;
   compressing the first data block using the selected compression module to generate a compressed first data block;
   analyzing the compression efficacy for the first data block;
   wherein analyzing the compression efficacy for the first data block comprises:
     in response to determining that the selected compression module results in a compression ratio for the first data block being between ten percent and ninety percent:
       determining resource utilization for compressing the first data block,
       upon determining that the resource utilization was within resource usage constraints, automatically selecting a compression module from the plurality of compression modules that has a higher compression ratio for use in compressing a second data block in the plurality of data blocks, and
       upon determining that the resource utilization was not within resource usage constraints, automatically selecting the fast compression module to compress the second data block;
     in response to determining that the selected compression module results in less than ten percent, or greater than ninety percent, compression ratio, selecting the fast compression module for compressing the second data block;
   adding compression module information for the first data block to the compression meta-data for the data set;
   compressing the second data block using the selected compression module for the second data block;
   analyzing the compression efficacy for the second data block; and
   adding compression module information for the second data block to the compression meta-data for the data set.

11. The non-transitory computer-readable storage medium of claim 10, wherein analyzing the compression efficacy for the first data block further comprises analyzing a sliding scale trade-off between available computational resources and the computational requirements of a compression module.

12. The non-transitory computer-readable storage medium of claim 10, wherein analyzing the compression efficacy for the first data block comprises analyzing a previously compressed data block.

13. The non-transitory computer-readable storage medium of claim 12, wherein analyzing the previou
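The per-block feedback rule recited in claim 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: "compression ratio" is read here as percentage size reduction, the ten and ninety percent bounds are treated as inclusive, "a compression module that has a higher compression ratio" is modelled as the next rung of a hypothetical ladder (`fast`, `balanced`, `strong`), and resource utilization is approximated by wall-clock time per block against an assumed `budget_s`. The `zlib` and `lzma` codecs are stand-ins chosen for illustration.

```python
import lzma
import time
import zlib
from typing import Callable, Dict, List, Tuple

# Hypothetical compression modules, ordered from cheapest to strongest.
MODULES: Dict[str, Callable[[bytes], bytes]] = {
    "fast": lambda b: zlib.compress(b, 1),
    "balanced": lambda b: zlib.compress(b, 6),
    "strong": lambda b: lzma.compress(b),
}
LADDER = ["fast", "balanced", "strong"]
FAST = LADDER[0]  # default module, as in claim 1


def reduction_pct(original: bytes, compressed: bytes) -> float:
    """Percentage size reduction (assumed meaning of 'compression ratio')."""
    if not original:
        return 0.0
    return 100.0 * (1.0 - len(compressed) / len(original))


def next_module(current: str, ratio_pct: float, within_constraints: bool) -> str:
    """Pick the module for the next block from the result of the current one."""
    if 10.0 <= ratio_pct <= 90.0 and within_constraints:
        # Mid-range ratio and resources available: step up one rung (assumption).
        i = LADDER.index(current)
        return LADDER[min(i + 1, len(LADDER) - 1)]
    # Ratio below 10% or above 90%, or resources exceeded: use the fast module.
    return FAST


def compress_data_set(
    blocks: List[bytes], budget_s: float = 0.05
) -> Tuple[List[bytes], List[dict]]:
    """Compress every block and record per-block module information as meta-data."""
    compressed_blocks: List[bytes] = []
    metadata: List[dict] = []
    module = FAST
    for index, block in enumerate(blocks):
        start = time.perf_counter()
        compressed = MODULES[module](block)
        elapsed = time.perf_counter() - start  # assumed resource-utilization proxy
        pct = reduction_pct(block, compressed)
        compressed_blocks.append(compressed)
        metadata.append({"block": index, "module": module, "reduction_pct": pct})
        module = next_module(module, pct, elapsed <= budget_s)
    return compressed_blocks, metadata
```

Keeping `next_module` as a pure function separates the decision rule from the timing measurement, so the rule can be checked without depending on how fast the host machine is.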
using adaptive coding · CPC title
specifically adapted to content descriptors, e.g. coding, compressing or processing of metadata · CPC title
Bandwidth or redundancy reduction (by scanning H04N1/17 {; methods or arrangements for coding, decoding, compressing or decompressing digital video signals H04N19/00}) · CPC title
Conversion of a code where information is represented by a given sequence or number of digits to a code where the same {, similar or subset of} information is represented by a different sequence or number of digits · CPC title
Data deduplication · CPC title