Memory preserving parse tree based compression with entropy coding
US-2019243894-A1 · Aug 8, 2019 · US
US11263398B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11263398-B2 |
| Application number | US-201916384449-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 15, 2019 |
| Priority date | Dec 3, 2015 |
| Publication date | Mar 1, 2022 |
| Grant date | Mar 1, 2022 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
A method, computer program product, and system includes a processor obtaining data including values and generating a value conversion dictionary by applying a parse tree based compression algorithm to the data, where the value conversion dictionary includes dictionary entries that represent the values. The processor obtains a distribution of the values and estimates a likelihood for each based on the distribution. The processor generates a code word to represent each value, a size of each code word is inversely proportional to the likelihood of the word. The processor assigns a rank to each code word, the rank for each represents the likelihood of the value represented by the code word; and based on the rank associated with each code word, the processor reorders each dictionary entry in the value conversion dictionary to associate each dictionary entry with an equivalent rank, the reordered value conversion dictionary comprises an architected dictionary.
Opening claim text (preview).
What is claimed is: 1. A computer-implemented method, comprising: obtaining, by a processor, data comprised of values, wherein the values are of variable size, and generating a value conversion dictionary by applying a parse tree based compression algorithm to the data, wherein the value conversion dictionary is comprised of dictionary entries that represent the values; obtaining, by the processor, a distribution of the values and estimating a likelihood for each value based on the distribution; generating, by the processor, a code word to represent each value, wherein a size of each code word is inversely proportional to the likelihood of the value represented by the code word; assigning, by the processor, a rank to each code word, wherein the rank for each code word represents the likelihood of the value represented by the code word; based on the rank associated with each code word, reordering, by the processor, each dictionary entry in the value conversion dictionary to associate each dictionary entry with an equivalent rank, wherein the reordered value conversion dictionary comprises an architected dictionary; and storing, by the processor, the architected dictionary as a first tree structure and a second tree structure, wherein the storing comprises storing the first tree structure and the second tree structure separately in one or more memories, wherein the processor utilizes the first tree structure to compress data subsequently obtained by the processor and the second tree structure to decompress code words subsequently obtained by the processor, wherein the dictionary entries are of a fixed size, and wherein utilizing the architected dictionary to compress the data subsequently obtained by the processor and to decompress the code words subsequently obtained by the processor comprises locating ranks relevant to the dictionary entries. 2. The computer-implemented method of claim 1 , further comprising: obtaining, by the processor, the additional data; and compressing, by the processor, the additional data utilizing the first tree structure. 3. The computer-implemented method of claim 2 , wherein the compressing comprises walking, by the processor, from entries in the architected dictionary to the ranks. 4. The computer-implemented method of claim 1 , wherein utilizing the first tree structure to compress data subsequently obtained by the processor and utilizing the second tree structure for decompressing code words subsequently obtained by the processor do not comprise the processor performing a memory lookup. 5. The computer-implemented method of claim 1 , further comprising: obtaining, by the processor, a given code word; and decompressing, by the processor, the given code word utilizing the second tree structure. 6. The computer-implemented method of claim 5 , wherein the decompressing comprises walking, by the processor, from the ranks to the second tree structure. 7. The computer-implemented method of claim 1 , wherein the architected dictionary comprises references for each dictionary entry describing parent and child relationships associated with the dictionary entry. 8. The computer-implemented method of claim 7 , the reordering comprising: associating, by the processor, each dictionary entry with a rank; sorting, by the processor, each dictionary entry according to the rank assigned; updating, by the processor, the references for each dictionary entry; and discarding, by the processor, locations for each dictionary entry in the value conversion dictionary prior to the updating. 9. The computer-implemented method of claim 8 , the sorting further comprising, retaining, in a memory, the locations for each dictionary entry in the value conversion dictionary. 10. The computer-implemented method of claim 1 , wherein the parse tree based compression algorithm is a Ziv-Lempel compression algorithm. 11. The computer-implemented method of claim 1 , wherein the generating and the assigning comprise generating Canonical Huffman Code. 12. The computer-implemented method of claim 1 , wherein the values are of variable size. 13. A computer program product comprising: a computer readable storage medium readable by one or more processor and storing instructions for execution by the one or more processor for performing a method comprising: obtaining, by a processor, data comprised of values, wherein the values are of variable size, and generating a value conversion dictionary by applying a parse tree based compression algorithm to the data, wherein the value conversion dictionary is comprised of dictionary entries that represent the values; obtaining, by the processor, a distribution of the values and estimating a likelihood for each value based on the distribution; generating, by the processor, a code word to represent each value, wherein a size of each code word is inversely proportional to the likelihood of the value represented by the code word; assigning, by the processor, a rank to each code word, wherein the rank for each code word represents the likelihood of the value represented by the code word; based on the rank associated with each code word, reordering, by the processor, each dictionary entry in the value conversion dictionary to associate each dictionary entry with an equivalent rank, wherein the reordered value conversion dictionary comprises an architected dictionary; and storing, by the processor, the architected dictionary as a first tree structure and a second tree structure, wherein the storing comprises storing the first tree structure and the second tree structure separately in one or more memories, wherein the processor utilizes the first tree structure to compress data subsequently obtained by the processor and the second tree structure to decompress code words subsequently obtained by the processor, wherein the dictionary entries are of a fixed size, and wherein utilizing the architected dictionary to compress the data subsequently obtained by the processor and to decompress the code words subsequently obtained by the processor comprises locating ranks relevant to the dictionary entries. 14. The computer program product of claim 13 , further comprising: obtaining, by the processor, the additional data; and compressing, by the processor, the additional data utilizing the first tree structure, wherein the compressing comprises walking, by the processor, from dictionary entries to ranks. 15. The computer program product of claim 13 , further comprising: obtaining, by the processor, a given code word; and decompressing, by the processor, the given code word utilizing the second tree structure, wherein the decompressing comprises walking, by the processor, from ranks to dictionary entries. 16. The computer program product of claim 13 , wherein the value conversion dictionary comprises references for each dictionary entry describing parent and child relationships associated with the dictionary entry. 17. The computer program product of claim 16 , the reordering comprising: associating, by the processor, each dictionary entry with a rank; sorting, by the processor, each dictionary entry according to the rank assigned; updating, by the processor, the references for each dictionary entry; and discarding, by the processor, locations for each dictionary entry in the value conversion dictionary prior to the updating. 18. A system comprising: one or more memories; one or more processor in communication with the one or more memories; and program instructions executable by the one or more processor via the
Context modeling · CPC title
Ensuring data consistency and integrity · CPC title
Saving storage space on storage systems · CPC title
Reducing size or complexity of storage systems · CPC title
employing the use of a dictionary, e.g. LZ78 · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.