Memory preserving parse tree based compression with entropy coding

US10303759B2 · US · B2

Patent metadata
Publication number: US-10303759-B2
Application number: US-201514958493-A
Country: US
Kind code: B2
Filing date: Dec 3, 2015
Priority date: Dec 3, 2015
Publication date: May 28, 2019
Grant date: May 28, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method, computer program product, and system includes a processor obtaining data including values and generating a value conversion dictionary by applying a parse tree based compression algorithm to the data, where the value conversion dictionary includes dictionary entries that represent the values. The processor obtains a distribution of the values and estimates a likelihood for each based on the distribution. The processor generates a code word to represent each value, a size of each code word is inversely proportional to the likelihood of the word. The processor assigns a rank to each code word, the rank for each represents the likelihood of the value represented by the code word; and based on the rank associated with each code word, the processor reorders each dictionary entry in the value conversion dictionary to associate each dictionary entry with an equivalent rank, the reordered value conversion dictionary comprises an architected dictionary.
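The entropy-coding steps in the abstract (estimate a likelihood per value, generate code words that get shorter as likelihood grows, then rank them) can be illustrated with a small sketch. It assumes Huffman code lengths as the way to size the code words and the canonical ordering as the rank, which is in line with claim 9's mention of Canonical Huffman Code but is not taken from the patent's detailed description. The function names and the sample counts are hypothetical.

```python
import heapq
from collections import Counter


def huffman_code_lengths(counts):
    """Return {value: code length in bits} for a Huffman code over the counts."""
    if len(counts) == 1:
        return {value: 1 for value in counts}
    # Heap items: (total count, unique tie-breaker, values in this subtree).
    heap = [(n, i, [v]) for i, (v, n) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    lengths = dict.fromkeys(counts, 0)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, s1 = heapq.heappop(heap)
        n2, _, s2 = heapq.heappop(heap)
        for value in s1 + s2:
            lengths[value] += 1  # each merge adds one bit to the merged values
        heapq.heappush(heap, (n1 + n2, tie, s1 + s2))
        tie += 1
    return lengths


def canonical_codes(lengths):
    """Return {value: (rank, code word)}; rank is the position in canonical order."""
    if not lengths:
        return {}
    # Canonical order: shorter codes first, ties broken by the value itself.
    ordered = sorted(lengths, key=lambda v: (lengths[v], v))
    table = {}
    code = 0
    prev_len = lengths[ordered[0]]
    for rank, value in enumerate(ordered):
        code <<= lengths[value] - prev_len  # pad with zeros when length grows
        prev_len = lengths[value]
        table[value] = (rank, format(code, "0{}b".format(lengths[value])))
        code += 1
    return table


# Hypothetical distribution of dictionary values.
counts = Counter({"the": 8, "cat": 4, "sat": 2, "mat": 2})
table = canonical_codes(huffman_code_lengths(counts))
for value, (rank, word) in sorted(table.items(), key=lambda kv: kv[1][0]):
    print(rank, value, word)
# 0 the 0
# 1 cat 10
# 2 mat 110
# 3 sat 111
```

In this sketch the likeliest value receives rank 0 and the shortest code word, so placing each dictionary entry at the index equal to its rank is what ties the dictionary order to the code.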

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method, comprising: obtaining, by a processor, data comprised of values, wherein the values are of variable size, and generating a value conversion dictionary by applying a parse tree based compression algorithm to the data, wherein the value conversion dictionary is comprised of dictionary entries that represent the values; obtaining, by the processor, a distribution of the values and estimating a likelihood for each value based on the distribution; generating, by the processor, a code word to represent each value, wherein a size of each code word is inversely proportional to the likelihood of the value represented by the code word; assigning, by the processor, a rank to each code word, wherein the rank for each code word represents the likelihood of the value represented by the code word; based on the rank associated with each code word, reordering, by the processor, each dictionary entry in the value conversion dictionary to associate each dictionary entry with an equivalent rank, wherein the reordered value conversion dictionary comprises an architected dictionary; and storing, by the processor, the architected dictionary as at least one tree structure in a memory, wherein the processor utilizes the architected dictionary to compress data subsequently obtained by the processor, comprising walking from the data, wherein the data comprises dictionary entries, to code words, without the processor performing a memory lookup, and to decompress code words subsequently obtained by the processor, comprising walking from the subsequently obtained code words to the data comprising dictionary entries, without the processor performing a memory lookup, wherein the dictionary entries are of a fixed size, and wherein utilizing the architected dictionary to compress the data subsequently obtained by the processor and to decompress the code words subsequently obtained by the processor comprises locating ranks relevant to the dictionary entries.

2. The computer-implemented method of claim 1, further comprising: obtaining, by the processor, the additional data; and compressing, by the processor, the additional data utilizing the architected dictionary.

3. The computer-implemented method of claim 1, further comprising: obtaining, by the processor, a given code word; and decompressing, by the processor, the given code word utilizing the architected dictionary.

4. The computer-implemented method of claim 3, wherein the decompressing comprises walking, by the processor, from the ranks to the architected dictionary.

5. The computer-implemented method of claim 1, wherein the architected dictionary comprises references for each dictionary entry describing parent and child relationships associated with the dictionary entry.

6. The computer-implemented method of claim 5, the reordering comprising: associating, by the processor, each dictionary entry with a rank; sorting, by the processor, each dictionary entry according to the rank assigned; updating, by the processor, the references for each dictionary entry; and discarding, by the processor, locations for each dictionary entry in the value conversion dictionary prior to the updating.

7. The computer-implemented method of claim 6, the sorting further comprising, retaining, in a memory, the locations for each dictionary entry in the value conversion dictionary.

8. The computer-implemented method of claim 1, wherein the parse tree based compression algorithm is a Ziv-Lempel compression algorithm.

9. The computer-implemented method of claim 1, wherein the generating and the assigning comprise generating Canonical Huffman Code.

10. The computer-implemented method of claim 1, wherein the values are of variable size.

11. A computer program product comprising: a computer readable storage medium readable by one or more processor and storing instructions for execution by the one or more processor for performing a method comprising: obtaining, by a processor, data comprised of values, wherein the values are of variable size, and generating a value conversion dictionary by applying a parse tree based compression algorithm to the data, wherein the value conversion dictionary is comprised of dictionary entries that represent the values; obtaining, by the processor, a distribution of the values and estimating a likelihood for each value based on the distribution; generating, by the processor, a code word to represent each value, wherein a size of each code word is inversely proportional to the likelihood of the value represented by the code word; assigning, by the processor, a rank to each code word, wherein the rank for each code word represents the likelihood of the value represented by the code word; based on the rank associated with each code word, reordering, by the processor, each dictionary entry in the value conversion dictionary to associate each dictionary entry with an equivalent rank, wherein the reordered value conversion dictionary comprises an architected dictionary; and storing, by the processor, the architected dictionary as at least one tree structure in a memory, wherein the processor utilizes the architected dictionary to compress data subsequently obtained by the processor, comprising walking from the data, wherein the data comprises dictionary entries, to code words, without the processor performing a memory lookup, and to decompress code words subsequently obtained by the processor, comprising walking from the subsequently obtained code words to the data comprising dictionary entries, without the processor performing a memory lookup, wherein the dictionary entries are of a fixed size, and wherein utilizing the architected dictionary to compress the data subsequently obtained by the processor and to decompress the code words subsequently obtained by the processor comprises locating ranks relevant to the dictionary entries.

12. The computer program product of claim 11, further comprising: obtaining, by the processor, the additional data; and compressing, by the processor, the additional data utilizing the architected dictionary, wherein the compressing comprises walking, by the processor, from dictionary entries to ranks.

13. The computer program product of claim 11, further comprising: obtaining, by the processor, a given code word; and decompressing, by the processor, the given code word utilizing the architected dictionary, wherein the decompressing comprises walking, by the processor, from ranks to dictionary entries.

14. The computer program product of claim 11, wherein the value conversion dictionary comprises references for each dictionary entry describing parent and child relationships associated with the dictionary entry.

15. The computer program product of claim 14, the reordering comprising: associating, by the processor, each dictionary entry with a rank; sorting, by the processor, each dictionary entry according to the rank assigned; updating, by the processor, the references for each dictionary entry; and discarding, by the processor, locations for each dictionary entry in the value conversion dictionary prior to the updating.

16. A system comprising: a memory; one or more processor in communication with the memory; and program instructions executable by the one or more processor via the memory to perform a method, the method comprising: obtaining, by a processor, data comprised of values, wherein the values are of variable size, and generating a value conversion dictionary by applying a parse tree based compression algorithm to the data, wherein th
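The reordering described in claims 5 and 6 (entries that reference their parent and children are moved so that each entry's position equals its rank, and the references are then rewritten) can be sketched as follows. This is an illustrative reading of the claim language, not the patent's implementation: the `Entry` layout, the `rank_of` list, and the helpers `reorder_by_rank`, `phrase`, and `find` are all hypothetical, and the ranks are supplied by hand rather than derived from an entropy coder.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entry:
    symbol: str                    # character this node appends to its parent's phrase
    parent: Optional[int] = None   # index of the parent entry, None for a root
    children: Dict[str, int] = field(default_factory=dict)  # next char -> child index


def reorder_by_rank(entries: List[Entry], rank_of: List[int]) -> List[Entry]:
    """Place the entry at old index i at position rank_of[i], rewriting references."""
    if sorted(rank_of) != list(range(len(entries))):
        raise ValueError("rank_of must be a permutation of 0..n-1")
    reordered: List[Optional[Entry]] = [None] * len(entries)
    for old, entry in enumerate(entries):
        reordered[rank_of[old]] = Entry(
            symbol=entry.symbol,
            parent=None if entry.parent is None else rank_of[entry.parent],
            children={ch: rank_of[c] for ch, c in entry.children.items()},
        )
    # Old positions are not stored anywhere in the result.
    return reordered


def phrase(entries: List[Entry], index: Optional[int]) -> str:
    """Decompression direction: walk parent references from an entry to its root."""
    chars = []
    while index is not None:
        chars.append(entries[index].symbol)
        index = entries[index].parent
    return "".join(reversed(chars))


def find(entries: List[Entry], roots: Dict[str, int], text: str) -> int:
    """Compression direction: walk child references; the index reached is the rank."""
    index = roots[text[0]]
    for ch in text[1:]:
        index = entries[index].children[ch]
    return index


# Hypothetical parse tree for the phrases "a", "ab", "ac", "aca".
entries = [
    Entry("a", None, {"b": 1, "c": 2}),
    Entry("b", 0),
    Entry("c", 0, {"a": 3}),
    Entry("a", 2),
]
rank_of = [1, 3, 0, 2]  # assumed ranks: "ac" is likeliest, "ab" least likely
reordered = reorder_by_rank(entries, rank_of)
print(phrase(reordered, 0))                  # ac
print(find(reordered, {"a": 1}, "aca"))      # 2
```

After the rewrite, an entry's index in `reordered` is its rank, so walking the tree in either direction yields the rank directly without a separate position-to-rank table.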

Assignees

IBM

Inventors

Classifications (CPC titles)

  • Organizing or formatting or addressing of data

  • Ensuring data consistency and integrity

  • Trees

  • employing the use of a dictionary, e.g. LZ78

  • Saving storage space on storage systems

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10303759B2 cover?
A method, computer program product, and system includes a processor obtaining data including values and generating a value conversion dictionary by applying a parse tree based compression algorithm to the data, where the value conversion dictionary includes dictionary entries that represent the values. The processor obtains a distribution of the values and estimates a likelihood for each based …
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification H03M7/3079. Mapped technology areas include Electricity.
When was this patent published?
Publication date: May 28, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).