Real-time reduction of CPU overhead for data compression

US9564918B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9564918-B2
Application numberUS-201313738300-A
CountryUS
Kind codeB2
Filing dateJan 10, 2013
Priority dateJan 10, 2013
Publication dateFeb 7, 2017
Grant dateFeb 7, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Real-time reduction of CPU overhead for data compression is performed by a processor device in a computing environment. Non-compressing heuristics are applied on a randomly selected data sample from data sequences for determining whether to compress the data sequences. A compression potential is calculated based on the non-compressing heuristics. The compression potential is compared to a threshold value. The data sequences are either compressed if the compress threshold is matched, compressed using Huffman coding if Huffman coding threshold is matched, or stored without compression.

First claim

Opening claim text (preview).

What is claimed is: 1. A method for real-time reduction of CPU overhead for data compression by a processor device in a computing environment, the method comprising: applying non-compressing heuristics on a randomly selected data sample from data sequences for determining whether to compress the data sequences by calculating a compression potential based on the non-compressing heuristics, wherein the compression potential is compared to a threshold value; and performing: compressing the data sequences if the compress threshold is matched, compressing the data sequences using Huffman coding if Huffman coding threshold is matched, storing the data sequences without compression, using at least one of the non-compressing heuristics, selecting a randomly selected data sample, and computing core characters that compose a predefined percentage of bytes in the randomly selected data sample, an entropy of the data, a relation between appearances of character pairs and a random distribution of the character pairs, and the entropy of the character pairs for determining whether to compress the data sequences, for real-time reduction of CPU overhead for data compression. 2. The method of claim 1 , further including calculating the non-compressing heuristics one after another for developing a heuristic score for making a decision for determining whether to compress the data sequences, wherein the CPU speed is optimized. 3. The method of claim 2 , further including calculating the compression potential based on the heuristic score of the non-compressing heuristics. 4. The method of claim 1 , further including: estimating a relation between appearances of character pairs and a random distribution of the character pairs by: comparing a number of pairs from core characters in the randomly selected data sample against the number of pairs from the core characters expected to appear in a random distribution in the randomly selected data sample, and calculating a L 2 norm distance between a first vector of distributions representing an observed distribution of the number of pairs from the core characters in the randomly selected data sample and a second vector of distributions representing an expected distribution of single values assuming there is no correlation between subsequent pairs from the core characters. 5. The method of claim 1 , further including performing one of turning off and turning on the calculating the compression potential, and determining whether to compress the data sequences according to a predefined setting. 6. The method of claim 1 , wherein the predefined setting, further includes at least one of: continuously applying the non-compressing heuristics and providing an indication as to whether to compress the data sequences, applying the non-compressing heuristics on demand when a compression ratio is above the predetermined threshold for a predefined number of the data sequences, applying the non-compressing heuristics according to a size of a buffer, and applying a prefix compression estimation and deciding whether to compress the data sequences based on a prefix compression ratio when the prefix compression ratio of the data sequences is below a threshold. 7. The method of claim 1 , further including applying the non-compressing heuristics to a relation between character pairs of core characters and random distributions of the character pairs of the core characters from the randomly selected data sample from data sequences for determining whether to compress the data sequences by calculating the compression potential based on the non-compressing heuristics. 8. A system for real-time reduction of CPU overhead for data compression in a computing environment, the system comprising: a processor device operable in the computing storage environment, wherein the processor device: applies non-compressing heuristics on a randomly selected data sample from data sequences for determining whether to compress the data sequences by calculating a compression potential based on the non-compressing heuristics, wherein the compression potential is compared to a threshold value, and performs: compressing the data sequences if the compress threshold is matched, compressing the data sequences using Huffman coding if Huffman coding threshold is matched, storing the data sequences without compression, using at least one of the non-compressing heuristics, selecting a randomly selected data sample, and computing core characters that compose a predefined percentage of bytes in the randomly selected data sample, an entropy of the data, a relation between appearances of character pairs and a random distribution of the character pairs, and the entropy of the character pairs for determining whether to compress the data sequences, for real-time reduction of CPU overhead for data compression. 9. The system of claim 8 , wherein the processor device calculates the non-compressing heuristics one after another for developing a heuristic score for making a decision for determining whether to compress the data sequences, wherein the CPU speed is optimized. 10. The system of claim 9 , wherein the processor device calculates the compression potential based on the heuristic score of the non-compressing heuristics. 11. The system of claim 8 , wherein the processor device: estimates a relation between appearances of character pairs and a random distribution of the character pairs by: comparing a number of pairs from core characters in the randomly selected data sample against the number of pairs from the core characters expected to appear in a random distribution in the randomly selected data sample, and calculating a L 2 norm distance between a first vector of distributions representing an observed distribution of the number of pairs from the core characters in the randomly selected data sample and a second vector of distributions representing an expected distribution of single values assuming there is no correlation between subsequent pairs from the core characters. 12. The system of claim 8 , wherein the processor device performs one of turning off and turning on the calculating the compression potential, and determining whether to compress the data sequences according to a predefined setting. 13. The system of claim 8 , wherein the processor device performs at least one of: continuously applying the non-compressing heuristics and providing an indication as to whether to compress the data sequences, applying the non-compressing heuristics on demand when a compression ratio is above the predetermined threshold for a predefined number of the data sequences, applying the non-compressing heuristics according to a size of a buffer, and applying a prefix compression estimation and deciding whether to compress the data sequences based on a prefix compression ratio when the prefix compression ratio of the data sequences is below a threshold. 14. The system of claim 8 , wherein the processor device applies the non-compressing heuristics to a relation between character pairs of core characters and random distributions of the character pairs of the core characters from the randomly selected data sample from data sequences for determining whether to compress the data sequences by calculating the compression potential based on the non-compressing heuristics. 15. A computer program product for real-time reduction of CPU overhead for data compression by a processor device by a processor device, the computer program product comprising a non-transitory computer-readable storage medium having computer-readable program code port

Assignees

Inventors

Classifications

  • Selection between different types of compressors · CPC title

  • according to the data type · CPC title

  • H03M7/4037Primary

    Prefix coding · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9564918B2 cover?
Real-time reduction of CPU overhead for data compression is performed by a processor device in a computing environment. Non-compressing heuristics are applied on a randomly selected data sample from data sequences for determining whether to compress the data sequences. A compression potential is calculated based on the non-compressing heuristics. The compression potential is compared to a thres…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification H03M7/4037. Mapped technology areas include Electricity.
When was this patent published?
Publication date Tue Feb 07 2017 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).