Field level compression in parallel data flows

US9294122B2 · US · B2

Patent metadata
Publication number: US-9294122-B2
Application number: US-201514645048-A
Country: US
Kind code: B2
Filing date: Mar 11, 2015
Priority date: Apr 25, 2014
Publication date: Mar 22, 2016
Grant date: Mar 22, 2016

Abstract

According to one embodiment of the present invention, a system selectively compresses data fields in a parallel data flow. The system identifies within an execution plan for the parallel data flow a first instance of a data field within a stage of the parallel data flow. The system traces the identified data field through stages of the parallel data flow and determines a score value for the identified data field based on operations performed on the identified data field during traversal of the stages. The system compresses the identified data field based on the score value indicating a performance gain with respect to the compressed data field. Embodiments of the present invention further include a method and computer program product for selectively compressing data fields in a parallel data flow in substantially the same manners described above.
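The identify-and-trace steps in the abstract can be illustrated with a small model. This is a minimal sketch, assuming a toy execution plan represented as an ordered list of stages, each recording which fields it accesses and with what operation; the `Stage` class, the operation names, and `trace_field` are hypothetical illustrations, not IBM's implementation or any product API.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    """Hypothetical stage of a parallel data flow: a name plus a map of
    field name -> operations this stage performs on that field."""
    name: str
    ops: dict[str, list[str]] = field(default_factory=dict)


def trace_field(plan: list[Stage], field_name: str) -> list[tuple[str, str]]:
    """Locate the first stage that accesses field_name, then collect every
    (stage, operation) pair from that stage downstream where it is accessed."""
    # Identify the first instance of the field in the (toy) execution plan.
    first = next((i for i, s in enumerate(plan) if field_name in s.ops), None)
    if first is None:
        return []
    # Trace downstream; stages that never touch the field contribute nothing.
    return [
        (stage.name, op)
        for stage in plan[first:]
        for op in stage.ops.get(field_name, [])
    ]


# Hypothetical four-stage flow: read, sort on "id", repartition, write.
plan = [
    Stage("read", {"id": ["read"], "comment": ["read"]}),
    Stage("sort", {"id": ["compare"]}),
    Stage("repartition", {"id": ["move"], "comment": ["move"]}),
    Stage("write", {"id": ["write"], "comment": ["write"]}),
]

print(trace_field(plan, "comment"))
# [('read', 'read'), ('repartition', 'move'), ('write', 'write')]
```

In this toy flow the `comment` field is only read, moved, and written, never inspected, which is the kind of access pattern where carrying it compressed plausibly pays off; `id` is compared during the sort, so a scoring step would likely weigh it differently.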

Claims

Opening claim text (preview of claims 1 to 7).

What is claimed is:

1. A computer-implemented method of selectively compressing data fields in a parallel data flow comprising: identifying within an execution plan for the parallel data flow a first instance of a data field within a stage of the parallel data flow; tracing the identified data field through stages of the parallel data flow and determining a weighted compression score value for the identified data field based on operations performed on the identified data field during traversal of the stages; and compressing the identified data field based on the weighted compression score value indicating a performance gain with respect to the compressed data field.

2. The computer-implemented method of claim 1, wherein compressing the identified data field further comprises: compressing the identified data field based on a comparison of the weighted compression score value to a predetermined threshold.

3. The computer-implemented method of claim 1, wherein determining a weighted compression score value further comprises: assigning weights to the operations performed on the identified data field, wherein the weighted compression score value is determined based on sums of the weighted operations for the identified data field in compressed and uncompressed states.

4. The computer-implemented method of claim 3, further comprising: tuning the assigned weights to adapt data field compression for a plurality of data types to a system implementing the parallel data flow.

5. The computer-implemented method of claim 1, wherein tracing the identified data field further comprises: tracing the identified data field from a first stage where the identified data field is accessed to each downstream data stage at which the identified data field is subsequently accessed.

6. The computer-implemented method of claim 5, further comprising: selectively combining the identified data field in an uncompressed state with another uncompressed data field within the parallel data flow having common ones of the first and second stages in order to compress the combined data fields.

7. The computer-implemented method of claim 1, further comprising: providing access to uncompressed data of the compressed data field during traversal of the stages.
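Claims 2 and 3 make the scoring step more concrete: each operation on the field carries a weight, weighted sums are computed for the compressed and uncompressed states, and the resulting score is compared to a predetermined threshold. The following is one possible reading under stated assumptions. The operation names, the values in the hypothetical `WEIGHTS` table, and defining the score as "uncompressed sum minus compressed sum" are all illustrative choices; the claims do not say how the two sums are combined.

```python
# Hypothetical per-operation costs as (uncompressed, compressed).
# Placeholder values only; claim 4 describes tuning such weights per data
# type and per system. Values are exact binary fractions to keep sums exact.
WEIGHTS: dict[str, tuple[float, float]] = {
    "read": (1.0, 0.5),     # fewer bytes to read when compressed
    "move": (1.0, 0.5),     # fewer bytes to repartition / transfer
    "write": (1.0, 0.5),    # fewer bytes to write
    "compare": (1.0, 1.5),  # assumed to need decompression before comparing
    "compute": (1.0, 2.0),  # assumed to need plain values for arithmetic
}


def compression_score(ops: list[str],
                      weights: dict[str, tuple[float, float]] = WEIGHTS) -> float:
    """Weighted score: sum of weighted operations in the uncompressed state
    minus the same sum in the compressed state. Positive favors compression."""
    uncompressed = sum(weights[op][0] for op in ops)
    compressed = sum(weights[op][1] for op in ops)
    return uncompressed - compressed


def should_compress(ops: list[str], threshold: float = 0.0) -> bool:
    """Compress only if the score exceeds a predetermined threshold."""
    return compression_score(ops) > threshold


# A field that is only carried through the flow vs. one used in computation.
print(compression_score(["read", "move", "write"]))                  # 1.5
print(should_compress(["read", "compute", "compute", "write"]))      # False
```

Passing `weights` as a parameter leaves room for the tuning step of claim 4: a deployment could substitute a table calibrated for its own hardware and data types without changing the scoring logic.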

Assignees

IBM

Classifications

  • H03M7/30 (primary)

    Compression (speech analysis-synthesis for redundancy reduction G10L19/00; for image communication H04N); Expansion; Suppression of unnecessary data, e.g. redundancy reduction · CPC title

  • Electricity · mapped topic

  • in which an application is distributed across nodes in the network (software deployment G06F8/60; multiprogramming arrangements G06F9/46) · CPC title

  • Protocols for data compression, e.g. ROHC · CPC title

Frequently asked questions

What does patent US9294122B2 cover?
A method, system, and computer program product for selectively compressing data fields in a parallel data flow. A data field is located in the flow's execution plan, traced through the stages that access it, and scored according to the operations performed on it; the field is compressed when the score indicates a performance gain.
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification H03M7/30. Mapped technology areas include Electricity.
When was this patent published?
Publication date: Mar 22, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).