Systems and methods for predicting compressibility of data
US-2016306561-A1 · Oct 20, 2016 · US
US10579591B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-10579591-B1 |
| Application number | US-201615385740-A |
| Country | US |
| Kind code | B1 |
| Filing date | Dec 20, 2016 |
| Priority date | Dec 20, 2016 |
| Publication date | Mar 3, 2020 |
| Grant date | Mar 3, 2020 |
Official abstract text for this publication.
Techniques for performing incremental block compression using a processor are described herein. The processor receives a request to compress input data, the request including compression parameters for the compression and a target block size. The processor divides the input data into portions. The processor iteratively compresses the input data to an output block, until compressing another portion of data would increase a file size of the output block over a threshold value that is based at least on the target block size.
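The loop described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the fixed-size portions, the 4-byte length prefix on each compressed portion, the use of `zlib`, and the names `split_portions`, `fill_block`, and `read_block` are all assumptions made for illustration. Each portion is compressed on its own, and the loop stops as soon as storing the next compressed portion would push the output block over the threshold.

```python
import zlib


def split_portions(data: bytes, portion_size: int) -> list[bytes]:
    """Divide the input into fixed-size portions (the last one may be shorter)."""
    return [data[i:i + portion_size] for i in range(0, len(data), portion_size)]


def fill_block(portions: list[bytes], threshold: int, level: int = 6) -> tuple[bytes, int]:
    """Compress portions into one output block without exceeding `threshold` bytes.

    Returns the block and the number of portions it holds. If even the first
    compressed portion is too large, the block is empty and the count is 0.
    """
    block = bytearray()
    consumed = 0
    for portion in portions:
        compressed = zlib.compress(portion, level)
        # Hypothetical framing: a 4-byte big-endian length, then the payload.
        framed = len(compressed).to_bytes(4, "big") + compressed
        if len(block) + len(framed) > threshold:
            break  # storing this portion would exceed the threshold: stop
        block += framed
        consumed += 1
    return bytes(block), consumed


def read_block(block: bytes) -> bytes:
    """Decompress every framed portion in a block, in order."""
    out = bytearray()
    pos = 0
    while pos < len(block):
        size = int.from_bytes(block[pos:pos + 4], "big")
        pos += 4
        out += zlib.decompress(block[pos:pos + size])
        pos += size
    return bytes(out)


data = b"incremental block compression " * 2000
portions = split_portions(data, 1024)
block, consumed = fill_block(portions, threshold=512)
```

A caller would write `block` to storage and resume from `portions[consumed:]` for the next block. Compressing each portion independently keeps the size check simple, at the cost of a lower compression ratio than one continuous stream would give.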
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method, comprising: receiving a request for compression of data, the request specifying one or more compression parameters; determining a compression target block size; compressing a portion of the data to produce a set of compressed data; and as a result of determining that compressing another portion of the data to produce a compressed another portion and storing the compressed another portion of the data in an output data block would increase a file size of the output data block over a threshold value for the output data block: stopping the compression; writing the compressed portion to the output data block; and associating the output data block with metadata that provides an indication of content of the output data block.

2. The computer-implemented method of claim 1, wherein: the request for compression further specifies a storage location for storing the output data block; and determining the compression target block size comprises determining the compression target block size based on a storage block size in the storage location.

3. The computer-implemented method of claim 1, wherein the portion of the data is selected based at least in part on a maximum inflation ratio of a compression algorithm, the compression algorithm specified in the compression parameters.

4. The computer-implemented method of claim 1, wherein the portion of the data is selected, based at least in part on a difference between a file size of the output data block and the compression target block size.

5. A system, comprising: a processing unit, that: receives a request indicating input data and including compression parameters; and iteratively compresses a portion of the input data to an output data block, based at least in part on the compression parameters, until compressing another portion of the input data to produce a compressed another portion and storing the compressed another portion in the output data block would increase a file size of the output data block over a threshold value for the output data block.

6. The system of claim 5, wherein the processing unit is a compression co-processor.

7. The system of claim 5, wherein the request indicates the threshold value.

8. The system of claim 5, wherein the compression parameters specify a compression algorithm and one or more compression settings for the compression algorithm.

9. The system of claim 5, wherein the request indicates input data by including a scatter-gather list, the scatter-gather list indicating one or more memory locations of the data.

10. The system of claim 5, wherein: the input data comprises a plurality of rows from a key-value store; the portion of the input data comprises one or more rows from the plurality of rows; and each row of the one or more rows is compressed as an atomic unit.

11. The system of claim 5, wherein the processing unit is at least one of: a central processing unit, a graphics processing unit, a field programmable gate array, a direct memory access circuit, a system-on-a-chip, or an application-specific integrated circuit.

12. The system of claim 5, wherein the processing unit is further configured to: store the output data block into a persistent storage on a condition that compressing another portion of the input data and storing the compressed another portion in the output data block would increase a file size of the output data block over the threshold value; and associate the stored output data block with metadata that provides an indication of the portions of data compressed to the output data block.

13. A system, comprising: a processor configured to: receive a request to compress input data using incremental compression, the request specifying the input data and compression parameters; determine a compression target block size; compress a portion of the input data into an output data file; determine that a file size of the output data file would exceed the compression target block size if an additional portion of input data were compressed into the output data file; and add padding to the output data file in an amount such that the file size of the output data file is equal to the compression target block size.

14. The system of claim 13, wherein the request specifies the compression target block size.

15. The system of claim 13, wherein the processor is configured to determine the compression target block size based at least in part on a block size used in a storage location of the output data file.

16. The system of claim 13, wherein the processor is configured to divide the input data into a plurality of portions of input data by determining a first portion of the input data such that the first portion of the input data has a file size that is smaller than the compression target block size.

17. The system of claim 13, wherein: the input data comprises a plurality of atomic units; and the processor is configured to divide the input data into portions of input data, each portion of input data comprising a respective one of the atomic units.

18. The system of claim 13, wherein the processor includes a storage location for storing a plurality of descriptors, each descriptor of the plurality of descriptors at least including a location of a corresponding block of compressed data to be retrieved from data storage and decompressed by the processor.

19. The system of claim 13, wherein the processor is configured to: store the output data file in a persistent storage; and associate the output data file in the persistent storage with a set of metadata that provides an indication of the portions of the input data compressed to the output data file.

20. The system of claim 13, wherein the processor is configured to: compress the portion of input data into a second output data file until a file size of the second output data file would exceed the compression target block size if an additional portion of input data were compressed into the second output data file; and add padding to the second output data file in an amount such that the file size of the second output data file is equal to the compression target block size.
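The padding and metadata steps recited in claims 13, 19, and 20 can be sketched as follows. This is an illustrative reading only, not the claimed implementation: zero-byte padding, the 4-byte length prefix, the `BlockMetadata` fields, and the function names are hypothetical choices, and `zlib` stands in for whatever algorithm the compression parameters would name. Each input portion is treated as an atomic unit that is never split across blocks.

```python
import zlib
from dataclasses import dataclass


@dataclass
class BlockMetadata:
    """Hypothetical per-block metadata: which portions the block holds."""
    first_portion: int   # index of the first portion stored in the block
    portion_count: int   # number of portions stored in the block
    payload_size: int    # bytes of compressed data before padding


def seal(block: bytearray, first: int, count: int, target: int) -> tuple[bytes, BlockMetadata]:
    """Pad a block with zero bytes up to `target` and attach its metadata."""
    meta = BlockMetadata(first, count, len(block))
    return bytes(block) + b"\x00" * (target - len(block)), meta


def build_padded_blocks(portions: list[bytes], target: int) -> list[tuple[bytes, BlockMetadata]]:
    """Pack compressed portions into blocks of exactly `target` bytes each."""
    blocks = []
    block, first, count = bytearray(), 0, 0
    for index, portion in enumerate(portions):
        compressed = zlib.compress(portion)
        framed = len(compressed).to_bytes(4, "big") + compressed
        if len(framed) > target:
            raise ValueError("portion does not fit in an empty block")
        if len(block) + len(framed) > target:
            # The next portion would overflow: pad, seal, and start a new block.
            blocks.append(seal(block, first, count, target))
            block, first, count = bytearray(), index, 0
        block += framed
        count += 1
    if count:
        blocks.append(seal(block, first, count, target))
    return blocks


# Forty small "rows", each compressed as one atomic unit.
rows = [bytes([i % 251]) * 900 for i in range(40)]
blocks = build_padded_blocks(rows, target=128)
```

Because every block is exactly the target size, a reader can locate block `n` at offset `n * target` in storage, and `payload_size` tells it where the padding begins.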
Details of free space management performed by the file system (saving storage space on storage systems G06F3/0608; management of blocks in storage devices G06F3/064) · CPC title