What technology area does this patent fall under?

Primary CPC classification H03M7/3066. Mapped technology areas include Electricity.

When was this patent published?

Publication date: Oct 8, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

OZIP compression and decompression

US10437781B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10437781-B2
Application number	US-201715640286-A
Country	US
Kind code	B2
Filing date	Jun 30, 2017
Priority date	Mar 19, 2014
Publication date	Oct 8, 2019
Grant date	Oct 8, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method, apparatus, and system for OZIP, a data compression and decompression codec, is provided. OZIP utilizes a fixed size static dictionary, which may be generated from a random sampling of input data to be compressed. Compression by direct token encoding to the static dictionary streamlines the encoding and avoids expensive conditional branching, facilitating hardware implementation and high parallelism. By bounding token definition sizes and static dictionary sizes to hardware architecture constraints such as word size or processor cache size, hardware implementation can be made fast and cost effective. For example, decompression may be accelerated by using SIMD instruction processor extensions. A highly granular block mapping in optional stored metadata allows compressed data to be accessed quickly at random, bypassing the processing overhead of dynamic dictionaries. Thus, OZIP can support low latency random data access for highly random workloads, such as for OLTP systems.
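The encoding idea in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the limits `MAX_ENTRIES = 1024` and `MAX_DEF_BYTES = 8`, the greedy longest-match strategy, the big-endian bit packing, and all function names are assumptions chosen for illustration. The sketch shows how a fixed token width follows from the dictionary size, and how decoding reduces to one table lookup per token.

```python
# Illustrative sketch only; constants and names are assumptions, not taken from the patent.
MAX_ENTRIES = 1024                           # assumed fixed dictionary size
MAX_DEF_BYTES = 8                            # assumed bound, e.g. one 64-bit word
TOKEN_BITS = (MAX_ENTRIES - 1).bit_length()  # bits needed to address every entry


def tokenize(data: bytes, dictionary: list[bytes]) -> list[int]:
    """Greedy longest-match tokenization against a static dictionary."""
    if len(dictionary) > MAX_ENTRIES:
        raise ValueError("dictionary exceeds MAX_ENTRIES")
    lookup = {definition: token for token, definition in enumerate(dictionary)}
    tokens, pos = [], 0
    while pos < len(data):
        # Try the longest allowed definition first, then shorter ones.
        for length in range(min(MAX_DEF_BYTES, len(data) - pos), 0, -1):
            token = lookup.get(data[pos:pos + length])
            if token is not None:
                tokens.append(token)
                pos += length
                break
        else:
            raise ValueError(f"no dictionary entry covers byte at offset {pos}")
    return tokens


def pack(tokens: list[int]) -> bytes:
    """Pack fixed-width tokens back to back, zero-padding the final byte."""
    acc = 0
    for token in tokens:
        acc = (acc << TOKEN_BITS) | token
    nbits = len(tokens) * TOKEN_BITS
    nbytes = (nbits + 7) // 8
    acc <<= nbytes * 8 - nbits
    return acc.to_bytes(nbytes, "big")


def unpack(packed: bytes, count: int) -> list[int]:
    """Recover `count` fixed-width tokens from a packed byte string."""
    acc = int.from_bytes(packed, "big")
    total_bits = len(packed) * 8
    mask = (1 << TOKEN_BITS) - 1
    return [(acc >> (total_bits - (i + 1) * TOKEN_BITS)) & mask
            for i in range(count)]


def detokenize(tokens: list[int], dictionary: list[bytes]) -> bytes:
    """Decoding is a plain table lookup per token."""
    return b"".join(dictionary[token] for token in tokens)
```

Under these assumptions, a 1024-entry dictionary gives 10-bit tokens. Including every 1-gram (all 256 single-byte values) in the dictionary guarantees that any input can be tokenized, which is why the `else` branch only triggers for an incomplete dictionary.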

First claim

Opening claim text (preview).

What is claimed is:
1. A method comprising: compressing a database table comprising a plurality of columns of data, by: determining a static dictionary from a portion of input data from one or more columns of data of the plurality of columns of data, the static dictionary comprising a plurality of entries up to a maximum number of dictionary entries, each of the plurality of entries mapping a token to a definition having a length up to a maximum byte size that is bounded by a hardware specification; tokenizing the one or more columns of data using the static dictionary to generate a packed sequential plurality of tokens, each of the packed sequential plurality of tokens having a fixed token size that is configured to address the maximum number of dictionary entries; and storing the static dictionary and the packed sequential plurality of tokens; wherein the method is performed by one or more computing devices.
2. The method of claim 1, wherein the method further comprises: repeating the determining, tokenizing, and storing for all columns of data of the plurality of columns of data.
3. The method of claim 1, wherein the method further comprises storing metadata including a block offset mapping that indicates, for each of a plurality of sequential data blocks of a defined uncompressed block size, a token offset within the packed sequential plurality of tokens, wherein the plurality of sequential data blocks corresponds to the input data.
4. The method of claim 1, wherein said determining comprises: searching said one or more columns of data to build a candidate dictionary having a candidate number of dictionary entries greater than the maximum number of dictionary entries, the candidate dictionary including all 1-gram entries and most frequently occurring N-gram entries, wherein N is an integer value from 2 to the maximum byte size; pruning the candidate dictionary to form the static dictionary having the maximum number of dictionary entries, wherein the pruning is configured to attempt to minimize a size of the packed sequential plurality of tokens.
5. The method of claim 1, wherein said static dictionary further includes a frequency count for each of the plurality of entries.
6. The method of claim 1, wherein said portion of said input data is randomly sampled from said input data.
7. The method of claim 1, wherein the hardware specification is based on a word size of the one or more computing devices.
8. The method of claim 1, wherein the maximum number of dictionary entries is configured such that each definition of the plurality of entries in the static dictionary can fit within a processor cache of the one or more computing devices.
9. One or more non-transitory computer-readable storage media storing instructions that, when executed by one or more processors, cause: compressing a database table comprising a plurality of columns of data, by: determining a static dictionary from a portion of input data from one or more columns of data of the plurality of columns of data, the static dictionary comprising a plurality of entries up to a maximum number of dictionary entries, each of the plurality of entries mapping a token to a definition having a length up to a maximum byte size that is bounded by a hardware specification; tokenizing the one or more columns of data using the static dictionary to generate a packed sequential plurality of tokens, each of the packed sequential plurality of tokens having a fixed token size that is configured to address the maximum number of dictionary entries; and storing the static dictionary and the packed sequential plurality of tokens.
10. The one or more non-transitory computer-readable storage media of claim 9, wherein the instructions comprise instructions that, when executed by one or more processors, cause: repeating the determining, tokenizing, and storing for all columns of data of the plurality of columns of data.
11. The one or more non-transitory computer-readable storage media of claim 9, wherein the instructions further comprise instructions that, when executed by one or more processors, cause storing metadata including a block offset mapping that indicates, for each of a plurality of sequential data blocks of a defined uncompressed block size, a token offset within the packed sequential plurality of tokens, wherein the plurality of sequential data blocks corresponds to the input data.
12. The one or more non-transitory computer-readable storage media of claim 9, wherein said determining comprises: searching said one or more columns of data to build a candidate dictionary having a candidate number of dictionary entries greater than the maximum number of dictionary entries, the candidate dictionary including all 1-gram entries and most frequently occurring N-gram entries, wherein N is an integer value from 2 to the maximum byte size; pruning the candidate dictionary to form the static dictionary having the maximum number of dictionary entries, wherein the pruning is configured to attempt to minimize a size of the packed sequential plurality of tokens.
13. The one or more non-transitory computer-readable storage media of claim 9, wherein said static dictionary further includes a frequency count for each of the plurality of entries.
14. The one or more non-transitory computer-readable storage media of claim 9, wherein said portion of said input data is randomly sampled from said input data.
15. The one or more non-transitory computer-readable storage media of claim 9, wherein the hardware specification is based on a word size of the one or more computing devices.
16. The one or more non-transitory computer-readable storage media of claim 9, wherein the maximum number of dictionary entries is configured such that each definition of the plurality of entries in the static dictionary can fit within a processor cache of the one or more computing devices.
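The dictionary construction recited in claims 4 and 12 (collect candidates, then prune to the maximum size) might be sketched as follows. The claims do not specify a pruning rule beyond attempting to minimize the packed token sequence, so the score used here (occurrence count times bytes saved per use) is a stand-in heuristic, and the constants and the function names `build_candidates`, `prune`, and `build_static_dictionary` are hypothetical. The input is assumed to be the sampled portion of a column.

```python
# Illustrative sketch only; constants, names, and the scoring rule are assumptions.
from collections import Counter

MAX_ENTRIES = 1024         # assumed final dictionary size
MAX_DEF_BYTES = 8          # assumed maximum definition length in bytes
CANDIDATE_NGRAMS = 4096    # assumed candidate pool, larger than MAX_ENTRIES


def build_candidates(sample: bytes) -> Counter:
    """Count N-grams for N = 2..MAX_DEF_BYTES and keep the most frequent ones."""
    counts = Counter()
    for n in range(2, MAX_DEF_BYTES + 1):
        for i in range(len(sample) - n + 1):
            counts[sample[i:i + n]] += 1
    return Counter(dict(counts.most_common(CANDIDATE_NGRAMS)))


def prune(candidates: Counter) -> list[bytes]:
    """Keep all 1-grams, then fill the remaining slots with the best N-grams."""
    if MAX_ENTRIES < 256:
        raise ValueError("MAX_ENTRIES must leave room for all 256 1-grams")
    one_grams = [bytes([b]) for b in range(256)]
    budget = MAX_ENTRIES - len(one_grams)
    # Heuristic score (an assumption): each use of an N-gram replaces N
    # single-byte tokens with one token, saving roughly N - 1 tokens.
    ranked = sorted(
        candidates.items(),
        key=lambda item: (-(item[1] * (len(item[0]) - 1)), item[0]),
    )
    return one_grams + [gram for gram, _ in ranked[:budget]]


def build_static_dictionary(sample: bytes) -> list[bytes]:
    """Candidate collection followed by pruning; list index is the token."""
    return prune(build_candidates(sample))
```

A real pruning pass would likely account for overlap between N-grams, since two candidates covering the same bytes cannot both save tokens; the independent score above ignores that and only approximates the stated goal.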

Assignees

Oracle Int Corp

Inventors

Classifications

H03M7/6011
Encoder aspects · CPC title
H03M7/3066 · Primary
by means of a mask or a bit-map · CPC title
H03M7/6005
Decoder aspects · CPC title
H03M7/3088
employing the use of a dictionary, e.g. LZ78 · CPC title
G06F16/1744 · Primary
using compression, e.g. sparse files · CPC title

Patent family

Related publications grouped by family.

View patent family 54142300
