Memory preserving parse tree based compression with entropy coding

US11263398B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-11263398-B2
Application numberUS-201916384449-A
CountryUS
Kind codeB2
Filing dateApr 15, 2019
Priority dateDec 3, 2015
Publication dateMar 1, 2022
Grant dateMar 1, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method, computer program product, and system includes a processor obtaining data including values and generating a value conversion dictionary by applying a parse tree based compression algorithm to the data, where the value conversion dictionary includes dictionary entries that represent the values. The processor obtains a distribution of the values and estimates a likelihood for each based on the distribution. The processor generates a code word to represent each value, a size of each code word is inversely proportional to the likelihood of the word. The processor assigns a rank to each code word, the rank for each represents the likelihood of the value represented by the code word; and based on the rank associated with each code word, the processor reorders each dictionary entry in the value conversion dictionary to associate each dictionary entry with an equivalent rank, the reordered value conversion dictionary comprises an architected dictionary.

First claim

Opening claim text (preview).

What is claimed is: 1. A computer-implemented method, comprising: obtaining, by a processor, data comprised of values, wherein the values are of variable size, and generating a value conversion dictionary by applying a parse tree based compression algorithm to the data, wherein the value conversion dictionary is comprised of dictionary entries that represent the values; obtaining, by the processor, a distribution of the values and estimating a likelihood for each value based on the distribution; generating, by the processor, a code word to represent each value, wherein a size of each code word is inversely proportional to the likelihood of the value represented by the code word; assigning, by the processor, a rank to each code word, wherein the rank for each code word represents the likelihood of the value represented by the code word; based on the rank associated with each code word, reordering, by the processor, each dictionary entry in the value conversion dictionary to associate each dictionary entry with an equivalent rank, wherein the reordered value conversion dictionary comprises an architected dictionary; and storing, by the processor, the architected dictionary as a first tree structure and a second tree structure, wherein the storing comprises storing the first tree structure and the second tree structure separately in one or more memories, wherein the processor utilizes the first tree structure to compress data subsequently obtained by the processor and the second tree structure to decompress code words subsequently obtained by the processor, wherein the dictionary entries are of a fixed size, and wherein utilizing the architected dictionary to compress the data subsequently obtained by the processor and to decompress the code words subsequently obtained by the processor comprises locating ranks relevant to the dictionary entries. 2. The computer-implemented method of claim 1 , further comprising: obtaining, by the processor, the additional data; and compressing, by the processor, the additional data utilizing the first tree structure. 3. The computer-implemented method of claim 2 , wherein the compressing comprises walking, by the processor, from entries in the architected dictionary to the ranks. 4. The computer-implemented method of claim 1 , wherein utilizing the first tree structure to compress data subsequently obtained by the processor and utilizing the second tree structure for decompressing code words subsequently obtained by the processor do not comprise the processor performing a memory lookup. 5. The computer-implemented method of claim 1 , further comprising: obtaining, by the processor, a given code word; and decompressing, by the processor, the given code word utilizing the second tree structure. 6. The computer-implemented method of claim 5 , wherein the decompressing comprises walking, by the processor, from the ranks to the second tree structure. 7. The computer-implemented method of claim 1 , wherein the architected dictionary comprises references for each dictionary entry describing parent and child relationships associated with the dictionary entry. 8. The computer-implemented method of claim 7 , the reordering comprising: associating, by the processor, each dictionary entry with a rank; sorting, by the processor, each dictionary entry according to the rank assigned; updating, by the processor, the references for each dictionary entry; and discarding, by the processor, locations for each dictionary entry in the value conversion dictionary prior to the updating. 9. The computer-implemented method of claim 8 , the sorting further comprising, retaining, in a memory, the locations for each dictionary entry in the value conversion dictionary. 10. The computer-implemented method of claim 1 , wherein the parse tree based compression algorithm is a Ziv-Lempel compression algorithm. 11. The computer-implemented method of claim 1 , wherein the generating and the assigning comprise generating Canonical Huffman Code. 12. The computer-implemented method of claim 1 , wherein the values are of variable size. 13. A computer program product comprising: a computer readable storage medium readable by one or more processor and storing instructions for execution by the one or more processor for performing a method comprising: obtaining, by a processor, data comprised of values, wherein the values are of variable size, and generating a value conversion dictionary by applying a parse tree based compression algorithm to the data, wherein the value conversion dictionary is comprised of dictionary entries that represent the values; obtaining, by the processor, a distribution of the values and estimating a likelihood for each value based on the distribution; generating, by the processor, a code word to represent each value, wherein a size of each code word is inversely proportional to the likelihood of the value represented by the code word; assigning, by the processor, a rank to each code word, wherein the rank for each code word represents the likelihood of the value represented by the code word; based on the rank associated with each code word, reordering, by the processor, each dictionary entry in the value conversion dictionary to associate each dictionary entry with an equivalent rank, wherein the reordered value conversion dictionary comprises an architected dictionary; and storing, by the processor, the architected dictionary as a first tree structure and a second tree structure, wherein the storing comprises storing the first tree structure and the second tree structure separately in one or more memories, wherein the processor utilizes the first tree structure to compress data subsequently obtained by the processor and the second tree structure to decompress code words subsequently obtained by the processor, wherein the dictionary entries are of a fixed size, and wherein utilizing the architected dictionary to compress the data subsequently obtained by the processor and to decompress the code words subsequently obtained by the processor comprises locating ranks relevant to the dictionary entries. 14. The computer program product of claim 13 , further comprising: obtaining, by the processor, the additional data; and compressing, by the processor, the additional data utilizing the first tree structure, wherein the compressing comprises walking, by the processor, from dictionary entries to ranks. 15. The computer program product of claim 13 , further comprising: obtaining, by the processor, a given code word; and decompressing, by the processor, the given code word utilizing the second tree structure, wherein the decompressing comprises walking, by the processor, from ranks to dictionary entries. 16. The computer program product of claim 13 , wherein the value conversion dictionary comprises references for each dictionary entry describing parent and child relationships associated with the dictionary entry. 17. The computer program product of claim 16 , the reordering comprising: associating, by the processor, each dictionary entry with a rank; sorting, by the processor, each dictionary entry according to the rank assigned; updating, by the processor, the references for each dictionary entry; and discarding, by the processor, locations for each dictionary entry in the value conversion dictionary prior to the updating. 18. A system comprising: one or more memories; one or more processor in communication with the one or more memories; and program instructions executable by the one or more processor via the

Assignees

Inventors

Classifications

  • H03M7/3079Primary

    Context modeling · CPC title

  • Ensuring data consistency and integrity · CPC title

  • Saving storage space on storage systems · CPC title

  • Reducing size or complexity of storage systems · CPC title

  • employing the use of a dictionary, e.g. LZ78 · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11263398B2 cover?
A method, computer program product, and system includes a processor obtaining data including values and generating a value conversion dictionary by applying a parse tree based compression algorithm to the data, where the value conversion dictionary includes dictionary entries that represent the values. The processor obtains a distribution of the values and estimates a likelihood for each based …
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification H03M7/3079. Mapped technology areas include Electricity.
When was this patent published?
Publication date Tue Mar 01 2022 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).