System and method for improving data compression in a deduplicated storage system
US-9411815-B1 · Aug 9, 2016 · US
US9843802B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-9843802-B1 |
| Application number | US-201213436641-A |
| Country | US |
| Kind code | B1 |
| Filing date | Mar 30, 2012 |
| Priority date | Mar 30, 2012 |
| Publication date | Dec 12, 2017 |
| Grant date | Dec 12, 2017 |
Official abstract text for this publication.
A computer-implemented method for compressing a data set, the method comprising receiving a first data block of the data set, selecting automatically by a compression management module a compression module from a plurality of compression modules to apply to the first data block based on projected compression efficacy or resource utilization, and compressing the first data block with the selected compression module to generate a first compressed data block.
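One way to project compression efficacy, and the one the claims on this page recite, is to compress a small sample of the block with every candidate module and keep the module that produces the smallest output. The following is a minimal sketch of that idea, not the patented implementation. The choice of `zlib`, `bz2`, and `lzma` as the "compression modules", the 4096-byte prefix sample, and the names `COMPRESSORS`, `DECOMPRESSORS`, `select_compressor`, and `compress_block` are all assumptions made for illustration.

```python
import bz2
import lzma
import zlib
from typing import Tuple

# Hypothetical stand-ins for the "plurality of compression modules";
# the patent text does not name specific algorithms.
COMPRESSORS = {
    "zlib": zlib.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}
DECOMPRESSORS = {
    "zlib": zlib.decompress,
    "bz2": bz2.decompress,
    "lzma": lzma.decompress,
}


def select_compressor(block: bytes, sample_size: int = 4096) -> str:
    """Return the name of the module whose compressed sample is smallest."""
    # The sample contains only a portion of the block (assumed: its prefix).
    sample = block[:sample_size]
    # Apply every module to the sample and record each compressed size.
    sizes = {name: len(fn(sample)) for name, fn in COMPRESSORS.items()}
    # Pick the best result; no minimum compression threshold is applied.
    return min(sizes, key=sizes.get)


def compress_block(block: bytes, sample_size: int = 4096) -> Tuple[str, bytes]:
    """Compress the entire block with the module chosen from the sample."""
    name = select_compressor(block, sample_size)
    return name, COMPRESSORS[name](block)


if __name__ == "__main__":
    data = b"deduplicated storage block " * 2000
    name, compressed = compress_block(data)
    print(name, len(data), len(compressed))
```

Returning the module name alongside the compressed bytes lets the caller record which module was used, which is what a later decompression step needs in order to pick the matching entry in `DECOMPRESSORS`.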
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method for compressing a data set, the method comprising:
receiving a first data block of a data set and a plurality of additional data blocks of the data set having similar characteristics as the first data block of the data set;
sampling the first data block to identify a content type of the first data block, the sampled first data block containing only a portion of the first data block;
selecting automatically, by a compression management module executed by a processor, a compression module from a plurality of compression modules to apply to the first data block based on the identified content type of the first data block, wherein automatically selecting a compression module comprises: individually applying each of the plurality of compression modules to the sampled first data block, generating a plurality of compressed samples of the first data block, determining a compression efficacy for each of the compressed samples of the first data block, and selecting the compression module whose compressed sample of the first data block has a best compression efficacy, independent of whether the best compression efficacy meets a threshold amount of compression;
compressing the entire first data block, and a plurality of additional data blocks of the data set having similar characteristics as the first data block of the data set, with the selected compression module to generate a first compressed data block and a plurality of compressed additional data blocks;
storing the first compressed data block, and the plurality of compressed additional data blocks of the data set, in a storage device, such that a storage space of the storage device required to store the first compressed data block and the plurality of compressed additional data blocks is less than the storage space required to store the first data block and the plurality of additional data blocks;
constraining or expanding the plurality of compression modules from a set of available compression modules by applying a sliding scale of trade-off between available computational resources and a computational requirement of each compression module of the set of available compression modules;
analyzing a plurality of recently compressed data blocks to determine a pattern of the data blocks having been recently compressed that indicates that a different compression module of the plurality of constrained or expanded compression modules would be more efficient than the selected compression module, wherein a compression module is more efficient if it produces more data reduction with the same resources or uses fewer resources for the same amount of data reduction; and
selecting the different compression module of the plurality of constrained or expanded compression modules to compress a next data block.

2. The computer-implemented method of claim 1, further comprising: adding compression module information to compression meta-data for the data set.

3. The computer-implemented method of claim 2, further comprising: determining an identifier for the first data block; and looking up the compression module, by a compression tracking module, using the identifier.

4. The computer-implemented method of claim 1, further comprising: storing statistical data of the first compressed data block with statistical data for the data set.

5. The computer-implemented method of claim 1, further comprising: scheduling further processing of the first compressed data block.

6. The computer-implemented method of claim 1, further comprising: receiving a second data block of the data set; and selecting another compression module from the plurality of compression modules to apply to the second data block based on projected compression efficacy or resource utilization.

7. The computer-implemented method of claim 6, further comprising: adding compression module information for the second data block to compression meta-data for the data set.

8. The computer-implemented method of claim 6, further comprising: compressing the second data block of the data set with the another compression module to generate a second compressed data block.

9. The computer-implemented method of claim 6, wherein the first data block and the second data block are data blocks of a single file.

10. The computer-implemented method of claim 1, wherein the selection of the compression module for the first data block is further based on available computational resources.

11. A non-transitory computer-readable storage medium having instructions stored therein, which when executed by a computer, cause the computer to perform a set of operations for compressing a data set, the set of operations comprising:
receiving a first data block of the data set;
sampling the first data block to identify a content type of the first data block, the sampled first data block containing only a portion of the first data block;
selecting automatically by a compression management module a compression module from a plurality of compression modules to apply to the first data block based on the identified content type of the first data block, wherein automatically selecting a compression module comprises: individually applying each of the plurality of compression modules to the sampled first data block, generating a plurality of compressed samples of the first data block, determining a compression efficacy for each of the compressed samples of the first data block, and selecting the compression module whose compressed sample of the first data block has a best compression efficacy, independent of whether the best compression efficacy meets a threshold amount of compression;
compressing the entire first data block and a plurality of additional data blocks of the data set having similar characteristics as the first data block of the data set with the selected compression module to generate a first compressed data block and a plurality of compressed additional data blocks;
storing the first compressed data block and the plurality of compressed additional data blocks in a storage device, such that a storage space of the storage device required to store the first compressed data block and the plurality of compressed additional data blocks is less than the storage space required to store the first data block and the plurality of additional data blocks;
constraining or expanding the plurality of compression modules from a set of available compression modules by applying a sliding scale of trade-off between available computational resources and a computational requirement of each compression module of the set of available compression modules;
analyzing a plurality of recently compressed data blocks to determine a pattern of the data blocks having been recently compressed that indicates that a different compression module of the plurality of constrained or expanded compression modules would be more efficient than the selected compression module, wherein a compression module is more efficient if it produces more data reduction with the same resources or uses fewer resources for the same amount of data reduction; and
selecting the different compression module of the constrained or expanded compression modules to compress a next data block.

12. The non-transitory computer-readable storage medium of claim 11, further including operations comprising: adding compression module information to compression meta-data for the data set.

13. The non-transitory computer-readable storage medium of claim 12, further including operations comprising: determining an identifier for the first data block; and looking up the compr
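The resource trade-off and re-selection steps recited in claims 1 and 11 can be sketched roughly as follows. This is an illustrative reading under stated assumptions, not the patented implementation: the `Module` record, the relative `cost` values, the `cpu_headroom` parameter, the fixed-size history window, and the rule "lowest average compressed-to-original ratio wins" are all hypothetical. The sketch also assumes that per-module ratios for recent blocks are available (for example from trial compression of samples), which the claim text does not spell out.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Module:
    name: str
    cost: float  # hypothetical relative CPU cost: 0.0 (cheap) to 1.0 (expensive)


# Hypothetical "set of available compression modules".
AVAILABLE = [Module("fast", 0.1), Module("balanced", 0.4), Module("dense", 0.9)]


def candidate_modules(available: List[Module], cpu_headroom: float) -> List[Module]:
    """Sliding scale: more spare CPU admits more expensive modules."""
    allowed = [m for m in available if m.cost <= cpu_headroom]
    # Always keep the cheapest module so compression can still proceed.
    return allowed or [min(available, key=lambda m: m.cost)]


class RecentBlockHistory:
    """Keeps per-module compression ratios for the most recent blocks."""

    def __init__(self, window: int = 8) -> None:
        self.records: deque = deque(maxlen=window)

    def record(self, ratios: Dict[str, float]) -> None:
        # ratios maps module name -> compressed_size / original_size for one block.
        self.records.append(ratios)

    def _average(self, name: str) -> Optional[float]:
        values = [r[name] for r in self.records if name in r]
        return sum(values) / len(values) if values else None

    def better_module(self, current: str, candidates: List[Module]) -> str:
        """Return a candidate that out-compressed `current` on average, else `current`."""
        best, best_avg = current, self._average(current)
        for module in candidates:
            avg = self._average(module.name)
            if avg is not None and (best_avg is None or avg < best_avg):
                best, best_avg = module.name, avg
        return best


if __name__ == "__main__":
    history = RecentBlockHistory()
    history.record({"fast": 0.8, "balanced": 0.5})
    history.record({"fast": 0.8, "balanced": 0.5})
    pool = candidate_modules(AVAILABLE, cpu_headroom=0.5)
    print(history.better_module("fast", pool))
```

Filtering the candidate pool before consulting the history means an expensive module is never chosen when resources are tight, even if it compressed recent blocks better.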
using adaptive coding · CPC title
by adapting coding or compression rate · CPC title
Compression (speech analysis-synthesis for redundancy reduction G10L19/00; for image communication H04N); Expansion; Suppression of unnecessary data, e.g. redundancy reduction · CPC title
Conversion to or from non-linear codes, e.g. companding · CPC title
Selection between different types of compressors · CPC title