Query predicate evaluation and computation for hierarchically compressed data

US10909078B2 · US · B2

Patent metadata
Publication number: US-10909078-B2
Application number: US-201514630813-A
Country: US
Kind code: B2
Filing date: Feb 25, 2015
Priority date: Feb 25, 2015
Publication date: Feb 2, 2021
Grant date: Feb 2, 2021

Abstract

According to embodiments of the present invention, machines, systems, methods and computer program products for processing data are provided. Compressed data is received and a requested operation for uncompressed data is performed on the compressed data by determining an intermediate location in a compression hierarchy of compression nodes and applying the requested operation to the data at that intermediate location.
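To make the abstract's idea concrete, here is a minimal Python sketch, assuming a two-level hierarchy in which a dictionary-encoding node is followed by a run-length-encoding node over the dictionary codes. The hierarchy shape and the names `DictNode`, `RleNode` and `push_predicate` are illustrative assumptions, not IBM's implementation. The predicate is evaluated once per distinct dictionary value (the claims' "value parameter"), and the result is pushed through the RLE node, whose run lengths play the role of the "attribute parameter", without materializing the uncompressed column.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class DictNode:
    """Output of a dictionary-encoding node (hypothetical layout)."""
    values: List[int]  # value parameter: the distinct values (dictionary)
    codes: List[int]   # attribute parameter: per-row index into `values`


@dataclass
class RleNode:
    """Output of a run-length-encoding node (hypothetical layout)."""
    values: list        # value parameter: one value per run
    lengths: List[int]  # attribute parameter: rows covered by each run


def dict_encode(column: Sequence[int]) -> DictNode:
    dictionary = sorted(set(column))
    index = {v: i for i, v in enumerate(dictionary)}
    return DictNode(dictionary, [index[v] for v in column])


def rle_encode(data: Sequence) -> RleNode:
    node = RleNode([], [])
    for x in data:
        if node.values and node.values[-1] == x:
            node.lengths[-1] += 1  # extend the current run
        else:
            node.values.append(x)  # start a new run
            node.lengths.append(1)
    return node


def push_predicate(dictionary: List[int], runs: RleNode,
                   pred: Callable[[int], bool]) -> RleNode:
    # Apply the predicate at the dictionary node: once per distinct value.
    truth = [pred(v) for v in dictionary]
    # Feed the result through the subsequent RLE node: map each run's code
    # to its truth value, merging adjacent runs that now carry the same value.
    out = RleNode([], [])
    for code, n in zip(runs.values, runs.lengths):
        t = truth[code]
        if out.values and out.values[-1] == t:
            out.lengths[-1] += n
        else:
            out.values.append(t)
            out.lengths.append(n)
    return out


# Assumed hierarchy: column -> dictionary node -> RLE node over the codes.
column = [5, 5, 5, 12, 12, 7, 7, 7, 7, 20]
dnode = dict_encode(column)
rnode = rle_encode(dnode.codes)

result = push_predicate(dnode.values, rnode, lambda x: x > 6)
matches = sum(n for t, n in zip(result.values, result.lengths) if t)
print(result, matches)  # RleNode(values=[False, True], lengths=[3, 7]) 7
```

Here the predicate runs against 4 dictionary entries rather than 10 rows, and its output stays run-length encoded; the more redundant the column, the larger the saving.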

First claim

Opening claim text (preview).

What is claimed is:

1. A system of processing data comprising: a compression hierarchy including a plurality of compression nodes performing different types of data compression, wherein data from table columns of a database is compressed by a first compression node and is subsequently compressed by one or more subsequent compression nodes in the compression hierarchy; and at least one hardware processor configured to: receive a requested operation for uncompressed data within a query predicate to evaluate the query predicate for a chunk of data from one or more of the table columns of the database, wherein the requested operation is a computation on data in the chunk for the query predicate and is applicable to compressed data produced by a corresponding type of compression, and wherein the corresponding type of compression generates the compressed data comprising a plurality of parameters including a value parameter specifying a subset of values of data in the chunk and an attribute parameter indicating an arrangement of the subset of values for generating the uncompressed data, and wherein the corresponding type of compression includes one or more from a group of: run length encoding and dictionary encoding; push evaluation of the query predicate through the compression hierarchy by determining an intermediate location in the compression hierarchy of compression nodes to apply the requested operation to compressed data of the chunk, wherein determining the intermediate location comprises: evaluating a type of compression and characteristics of compressed data produced by each of the plurality of compressions nodes; and determining the intermediate location in the compression hierarchy providing compressed data with the corresponding type of compression for the requested operation; apply the requested operation to the value parameter of compressed data produced by a compression node proximate the determined location to produce result values without decompressing the compressed data, wherein the result values include the attribute parameters from the compressed data of the compression node proximate the determined location and the value parameters from the compressed data of the compression node proximate the determined location modified by application of the requested operation; apply the result values to a subsequent compression node in the compression hierarchy to produce compressed data having the requested operation applied; and evaluate the query predicate on the chunk of data with the requested operation applied and retrieve and decompress data satisfying the query from the database.

2. The system of claim 1, wherein the at least one hardware processor is further configured to: perform the operation, in parallel, on streams of data values of compressed data of the compression node proximate the determined location to produce the result values; and apply the result values to the subsequent compression node to produce compressed data having the operation applied.

3. The system of claim 1, wherein the query predicate includes a plurality of different parameters, and applying the requested operation further comprises: transforming one or more parameters to provide the plurality of parameters in a common type as the compressed data of the compression node proximate the determined location; and wherein evaluating the query predicate is based on the transformed parameters.

4. The system of claim 1, wherein the computation of the requested operation includes a unary computation having a function with a single argument, and the at least one hardware processor is further configured to: determine that the compressed data of the compression node proximate the determined location includes a number (N) of the same values; and perform the unary computation on a single value of the compressed data of the compression node proximate the determined location to generate a data set containing N copies of a result of the unary computation.

5. The system of claim 1, wherein the computation of the requested operation includes a binary computation having a function with first and second variables, and the at least one hardware processor is further configured to: determine that the compressed data of the compression node proximate the determined location includes data for performing the binary computation, the compressed data of the compression node proximate the determined location comprising data for the first and second variables; determine whether each value of data for the first variable is the same; and perform the binary computation on the compressed data of the compression node proximate the determined location by treating the binary computation as a unary computation, wherein the first variable of the binary computation is bound to the determined same value of each value of the data for the first variable and the unary computation includes a function having a single argument pertaining to the second variable of the binary computation.

6. The system of claim 5, wherein the data for the first and second variables have been compressed according to a dictionary encoded compression scheme or a run length encoding compression scheme, and the at least one hardware processor is further configured to: perform the requested operation on the compressed data of the compression node proximate the determined location, wherein the compressed data of the compression node proximate the determined location has been further compressed to reduce an amount of computation.

7. The system of claim 1, wherein the at least one hardware processor is further configured to: determine that the compressed data of the compression node proximate the determined location comprises data for the computation, wherein the computation includes a function having at least two variables and the compressed data for at least one of the two variables is compressed via a compression scheme using unary encoding, and wherein unary encoding comprises encoding by a function having a single argument; determine a compatible encoding for the at least two variables, wherein the compatible encoding is performed by a compression node in the compression hierarchy subsequent the compression node proximate the determined location providing the compressed data using the unary encoding, and wherein the compatible encoding corresponds to one of: run length encoding and dictionary encoding; remove the unary encoding from the compressed data of the subsequent compression node performing the compatible encoding and perform the requested operation on the compressed data in the compatible encoding.

8. The system of claim 1, wherein the requested operation includes evaluation of data that is of a floating point type or of a character type, and applying the requested operation further comprises: transforming the data in the floating point type or the character type into an integer data type; and wherein evaluating the query predicate is based on the transformed data.

9. The system of claim 1, wherein the at least one hardware processor is further configured to: perform the requested operation on the compressed data of the compression node proximate the determined location, wherein the computation is performed for each distinct data value within the compressed data of the compression node proximate the determined location.

10. The system of claim 1, wherein determining the intermediate location further comprises: in response to the first compression node and one or more subsequent compression nodes lacking the corresponding type of compression for applying the requested operation to compressed data, compressing the data from the table columns of the database
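Claims 4, 5 and 9 describe three computational shortcuts on compressed data. The Python sketch below illustrates them under stated assumptions: run-length data is modeled as a list of (value, run length) pairs and dictionary data as a (dictionary, codes) pair, and the function names are hypothetical. `unary_on_runs` evaluates a single-argument function once per run (claim 4), `unary_on_dictionary` evaluates it once per distinct value (claim 9), and `binary_as_unary` binds the first argument when all of its values are equal and reuses the unary path (claim 5). This is an illustration of the claimed ideas, not the patentee's code.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical layout: run-length encoded data as (value, run length) pairs.
Runs = List[Tuple[object, int]]


def unary_on_runs(runs: Runs, f: Callable) -> Runs:
    # Claim 4 idea: a run of N copies of v becomes N copies of f(v),
    # so f is evaluated once per run instead of once per row.
    return [(f(v), n) for v, n in runs]


def unary_on_dictionary(dictionary: list, codes: List[int], f: Callable):
    # Claim 9 idea: evaluate f once per distinct dictionary value;
    # the per-row codes (the arrangement) are reused unchanged.
    return [f(v) for v in dictionary], codes


def binary_as_unary(runs_a: Runs, runs_b: Runs, f: Callable) -> Optional[Runs]:
    # Claim 5 idea: if every value of the first argument is the same
    # constant c, bind it and evaluate f(c, b) as a unary function of b.
    if sum(n for _, n in runs_a) != sum(n for _, n in runs_b):
        raise ValueError("both inputs must cover the same number of rows")
    distinct_a = {v for v, _ in runs_a}
    if len(distinct_a) != 1:
        return None  # shortcut not applicable; caller must use another strategy
    (c,) = distinct_a
    return unary_on_runs(runs_b, lambda b: f(c, b))


evaluated = []  # records every argument the unary function actually sees


def square(x):
    evaluated.append(x)
    return x * x


print(unary_on_runs([(3, 4), (5, 2)], square), evaluated)
# [(9, 4), (25, 2)] [3, 5]: six rows, but only two evaluations
print(unary_on_dictionary(["a", "bb"], [0, 1, 1, 0], len))
# ([1, 2], [0, 1, 1, 0])
print(binary_as_unary([(10, 6)], [(3, 4), (5, 2)], lambda a, b: a + b))
# [(13, 4), (15, 2)]
print(binary_as_unary([(10, 3), (11, 3)], [(3, 4), (5, 2)], lambda a, b: a + b))
# None
```

Returning `None` when the first argument is not constant keeps the sketch honest: the shortcut in claim 5 applies only under that condition, and a real engine would need some other evaluation path otherwise.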

Assignees

IBM

Classifications

  • G06F16/1744 · using compression, e.g. sparse files

Frequently asked questions

Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/1744. Mapped technology areas include Physics.
When was this patent published?
Published Feb 2, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.