Data compression algorithm selection and tiering

US10387375B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10387375-B2
Application number: US-201715398859-A
Country: US
Kind code: B2
Filing date: Jan 5, 2017
Priority date: May 28, 2008
Publication date: Aug 20, 2019
Grant date: Aug 20, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A data storage subsystem having a plurality of data compression engines configured to compress data, each having a different compression algorithm. A data handling system is configured to determine a present rate of access to data; select at least one sample of data; determine the greatest degree of compression of said data compression engines; determine the compression ratios of the operated data compression engines with respect to the selected sample(s); compressing said selected at least one sample with a plurality of said data compression engines at said selected tier; operate a selected data compression engines with respect to the selected sample and determine the greatest degree of compression of the data compression engines; compress the data from which the sample was selected with one of the operated data compression engines determined to have the greatest degree of compression; and store the compressed data in data storage repositories.
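
The sample-then-select flow in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the three standard-library codecs stand in for the "data compression engines", and the names `ENGINES`, `pick_samples`, `select_engine`, and `compress_and_store`, along with the sample size and sample count, are assumptions made for illustration only.

```python
import bz2
import lzma
import random
import zlib

# Hypothetical engine table: each entry stands in for one "data compression
# engine" with its own algorithm. The patent does not name specific codecs.
ENGINES = {
    "zlib": zlib.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}


def pick_samples(data, sample_size=4096, count=3, seed=None):
    """Randomly select up to `count` fixed-size samples from `data`."""
    if len(data) <= sample_size:
        return [data]
    rng = random.Random(seed)
    starts = [rng.randrange(0, len(data) - sample_size + 1) for _ in range(count)]
    return [data[s:s + sample_size] for s in starts]


def compression_ratio(engine, samples):
    """Original size divided by compressed size, summed over all samples."""
    original = sum(len(s) for s in samples)
    compressed = sum(len(engine(s)) for s in samples)
    return original / compressed


def select_engine(data, engines=ENGINES, **sample_options):
    """Operate every engine on the samples and return the best one by ratio."""
    samples = pick_samples(data, **sample_options)
    ratios = {name: compression_ratio(fn, samples) for name, fn in engines.items()}
    best = max(ratios, key=ratios.get)
    return best, ratios


def compress_and_store(data, repositories, engines=ENGINES):
    """Compress the full data with the winning engine and file it under a
    repository keyed by that engine (a plain dict of lists in this sketch)."""
    best, ratios = select_engine(data, engines, seed=0)
    repositories.setdefault(best, []).append(engines[best](data))
    return best, ratios
```

Only the samples are run through every engine; the full data is compressed once, with the engine that achieved the highest ratio on the samples.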

First claim

Opening claim text (preview).

What is claimed is:

1. A method for selectively compressing data for a data storage system having a plurality of data compression engines, each having a different compression algorithm, comprising the steps of: determining a present rate of access to data; selecting at least one sample of said data; determining a greatest degree of compression of a plurality of data compression engines with respect to said selected at least one sample; compressing said selected at least one sample with a plurality of said data compression engines at a selected tier; operating said selected data compression engines with respect to said selected at least one sample and determining the greatest degree of compression of said data compression engines from said operation of said data compression engines with respect to said selected at least one sample; compressing said data from which said at least one sample was selected with the one of said operated data compression engines determines to have said greatest degree of compression with respect to said selected at least one sample; and storing said compressed data in data storage repositories associated with the data compression engine employed to compress said data; wherein said step of compressing said data comprises compressing said data for a parent category repository, and recompressing said data for a child category repository.

2. The method of claim 1, further including performing at least one of: if said rate of access indicates said data is to be compressed, selecting a tier of data compression engines with respect to said data that is inverse to said present rate of access; randomly selecting at least one sample of said data to be compressed and stored; determining compression ratios of said data engines from said operation of said data compression engines with respect to said selected at least one sample; arranging said plurality of data compression engines in a plurality of tiers from low to high in accordance with expected latency to compress data and to uncompress compressed data; and moving data between said parent and said child category, wherein at least two of said repositories are classified into parent and child categories, each at a different said tier, said parent having a lesser degree of compression than said child, and said computer program product computer readable program code, when executed on a computer processing system, causes said computer processing system to additionally move data between said parent and said child category repositories in accordance with the inverse of said present rate of access.

3. The method of claim 1, wherein said present rate of access to said data comprises the inactivity time from the most recent access to at least a portion of said data, the less inactivity time, the greater the rate of access.

4. The method of claim 1, wherein said present rate of access to said data comprises the number of accesses to at least a portion of said data within a time window, the greater the number of accesses, the greater the rate of access.

5. The method of claim 1, further including the step of: compressing said data with the one of said data compression engines determined to have said greatest degree of compression with respect to said selected at least one sample, and determining the compression ratios of said operated data compression engines with respect to said selected at least one sample.

6. The method of claim 1, wherein said step of compressing said data comprises compressing said data for said parent category repository, and uncompressing and again compressing said data for said child category repository.

7. The method of claim 1, additionally comprising the steps of determining whether none of said plurality of compression engines exceeds a minimum degree of compression; and, if so, disallowing compression of said data.

8. A data storage subsystem, comprising: data storage; a plurality of data compression engines configured to compress data, each having a different compression algorithm, said compression engines arranged in a plurality of tiers from low to high in accordance with expected latency to compress data and to uncompress compressed data; at least one input configured to receive data to be compressed and stored by said data storage; and at least one data handling system configured to perform steps comprising: determining a present rate of access to data; selecting at least one sample of said data; determining a greatest degree of compression of said data compression engines with respect to said selected at least one sample; compressing said selected at least one sample with a plurality of said data compression engines at a selected tier; operating said selected data compression engines with respect to said selected at least one sample and determining the greatest degree of compression of said data compression engines from said operation of said data compression engines with respect to said selected at least one sample; compressing said data from which said at least one sample was selected with the one of said operated data compression engines determines to have said greatest degree of compression with respect to said selected at least one sample; and storing said compressed data in data storage repositories associated with the data compression engine employed to compress said data; wherein said step of compressing said data comprises compressing said data for a parent category repository, and recompressing said data for a child category repository.

9. The data storage subsystem of claim 8, wherein said data handling system is configured to: if said rate of access indicates said data is to be compressed, selecting a tier of data compression engines with respect to said data that is inverse to said present rate of access; randomly select at least one sample of said data to be compressed and stored; determine compression ratios of said data engines from said operation of said data compression engines with respect to said selected at least one sample, arrange said plurality of data compression engines in a plurality of tiers from low to high in accordance with expected latency to compress data and to uncompress compressed data, and move data between said parent and said child category, wherein at least two of said repositories are classified into parent and child categories, each at a different said tier, said parent having a lesser degree of compression than said child, and said computer program product computer readable program code, when executed on a computer processing system, causes said computer processing system to additionally move data between said parent and said child category repositories in accordance with the inverse of said present rate of access.

10. The data storage subsystem of claim 8, wherein said present rate of access to said data comprises the inactivity time from the most recent access to at least a portion of said data, the less inactivity time, the greater the rate of access.

11. The data storage subsystem of claim 8, wherein said present rate of access to said data comprises the number of accesses to at least a portion of said data within a time window, the greater the number of accesses, the greater the rate of access.

12. The data storage subsystem of claim 8, wherein said data handling system is additionally configured to perform the step of: compressing said data with the one of said data compression engines determined to have said greatest degree of compression with respect to said selected at least one sample, and determining the compression ratios of said operated data compression engines with respect to
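
The dependent claims add access-rate measurement (claims 3 and 4), tier selection inverse to the access rate (claim 2), a minimum-compression cutoff (claim 7), and uncompress-then-recompress movement from a parent to a child repository (claim 6). The Python sketch below is one possible reading of those steps, not the patented implementation. The grouping of codecs into `TIERS`, every threshold value, and all function names are hypothetical.

```python
import bz2
import lzma
import zlib

# Tiers ordered from low to high expected latency. Each engine is a
# (compress, decompress) pair. The grouping is an illustrative assumption.
TIERS = [
    {"zlib-1": (lambda d: zlib.compress(d, 1), zlib.decompress)},
    {"zlib-9": (lambda d: zlib.compress(d, 9), zlib.decompress),
     "bz2": (bz2.compress, bz2.decompress)},
    {"lzma": (lzma.compress, lzma.decompress)},
]


def inactivity_time(access_times, now):
    """Claim 3 style measure: time since the most recent access."""
    return now - max(access_times)


def accesses_in_window(access_times, now, window):
    """Claim 4 style measure: number of accesses within a time window."""
    return sum(1 for t in access_times if now - window <= t <= now)


def select_tier(access_count, no_compress_above=100, tier_floors=(10, 1)):
    """Pick a tier inverse to the access rate: busy data gets a low-latency
    tier, idle data a high-compression tier. Returns None when the data is
    accessed too often to be compressed at all (thresholds are made up)."""
    if access_count > no_compress_above:
        return None
    for tier, floor in enumerate(tier_floors):
        if access_count >= floor:
            return tier
    return len(tier_floors)


def best_engine_in_tier(sample, tier, min_ratio=1.1):
    """Operate each engine of the tier on the sample. Returns (name, ratio)
    for the best engine, or None if no engine exceeds min_ratio, in which
    case compression is disallowed."""
    best_name, best_ratio = None, 0.0
    for name, (compress, _) in TIERS[tier].items():
        ratio = len(sample) / len(compress(sample))
        if ratio > best_ratio:
            best_name, best_ratio = name, ratio
    if best_ratio <= min_ratio:
        return None
    return best_name, best_ratio


def demote(blob, parent_engine, child_tier, sample_size=4096):
    """Uncompress data held in a parent repository and compress it again
    for a child repository at a higher-compression tier."""
    raw = parent_engine[1](blob)
    choice = best_engine_in_tier(raw[:sample_size], child_tier)
    if choice is None:
        return None
    name, _ = choice
    return name, TIERS[child_tier][name][0](raw)
```

In this sketch the sample is simply the leading bytes of the data; the random sampling mentioned in claim 2 could be substituted without changing the rest of the flow.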

Assignees

IBM

Inventors

Classifications

  • Intermediate data storage techniques for performance improvement · CPC title

  • Database tuning (G06F16/2282 takes precedence; database performance monitoring G06F11/3409) · CPC title

  • Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title

  • G06F3/0608 · Primary

    Saving storage space on storage systems · CPC title

  • Organizing or formatting or addressing of data · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10387375B2 cover?
A data storage subsystem having a plurality of data compression engines configured to compress data, each having a different compression algorithm. A data handling system is configured to determine a present rate of access to data; select at least one sample of data; determine the greatest degree of compression of said data compression engines; determine the compression ratios of the operated d…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F3/0608. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 20, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).