Interpolation search to find arbitrary offsets in a compressed stream

US10833702B1 · US · B1

Patent metadata
FieldValue
Publication numberUS-10833702-B1
Application numberUS-201916576259-A
CountryUS
Kind codeB1
Filing dateSep 19, 2019
Priority dateSep 19, 2019
Publication dateNov 10, 2020
Grant dateNov 10, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Interpolated search is provided for navigating a compressed file to identify a desired offset in an uncompressed version of the file by: determining a low point and a high point in an uncompressed version of a stream corresponding to a compressed version of the stream that is divided into a plurality of chunks; calculating an average compression ratio between the low point and the high point; interpolating a position in the compressed version of a desired offset in the uncompressed version to identify a bifurcation chunk of the plurality of chunks that includes the interpolated position; reading an offset of the bifurcation chunk; and in response to determining that the desired offset is within a threshold distance of the offset of the bifurcation chunk, decompressing the compressed version from the bifurcation chunk until the desired offset is output.

First claim

Opening claim text (preview).

What is claimed is: 1. A method, comprising: determining a low point and a high point in an uncompressed version of a stream corresponding to a compressed version of the stream that is divided into a plurality of chunks; calculating an average compression ratio between the low point and the high point; interpolating a position in the compressed version of a desired offset in the uncompressed version to identify a bifurcation chunk of the plurality of chunks that includes the interpolated position; reading an offset of the bifurcation chunk; and in response to determining that the desired offset is within a threshold distance of the offset of the bifurcation chunk, decompressing the compressed version from the bifurcation chunk until the desired offset is output. 2. The method of claim 1 , wherein each chunk of the plurality of chunks includes in a header an offset in the uncompressed stream to which that chunk includes correspondingly compressed data. 3. The method of claim 1 , further comprising, before determining the low point and the high point: determining an initial low point and an initial high point in the compressed version, wherein the initial low point is an beginning chunk of the plurality of chunks and the initial high point is a final chunk of the plurality of chunks; calculating an initial average compression ratio between the initial low point and the initial high point; interpolating an initial position in the compressed version of the desired offset in the uncompressed version to identify an initial bifurcation chunk that includes the initial interpolated position; reading an initial offset of the initial bifurcation chunk; and in response to determining that the desired offset is not within the threshold distance of the initial offset and the initial offset being closer to the initial low point than the desired offset is to the initial low point, updating the initial low point to the initial offset as a subsequent low point. 4. The method of claim 1 , further comprising, before determining the low point and the high point: determining an initial low point and an initial high point in the compressed version, wherein the initial low point is an beginning chunk of the plurality of chunks and the initial high point is a final chunk of the plurality of chunks; calculating an initial average compression ratio between the initial low point and the initial high point; interpolating an initial position in the compressed version of the desired offset in the uncompressed version to identify an initial bifurcation chunk that includes the initial interpolated position; reading an initial offset of the initial bifurcation chunk; and in response to determining that the desired offset is not within the threshold distance of the initial offset and the initial offset being closer to the initial high point than the desired offset is to the initial high point, updating the initial high point to the initial offset as a subsequent high point. 5. The method of claim 4 , wherein a final chunk of the plurality of chunks includes a padding section and each of the chunks of the plurality of chunks are the same size, wherein determining the initial average compression ignores a size of the padding section when calculating a compression ratio between the final chunk and a corresponding final portion of the uncompressed version of the stream. 6. The method of claim 1 , wherein the threshold is set to a size based on a predefined number of chunks. 7. The method of claim 1 , wherein the threshold is set based on a buffer size from which chunks of the plurality of chunks are read and a read speed from the buffer. 8. A system, comprising: a processor; and a memory storage device including instructions that when executed by the processor enable the system to: determine a low point and a high point in an uncompressed version of a stream corresponding to a compressed version of the stream that is divided into a plurality of chunks; calculate an average compression ratio between the low point and the high point; interpolate a position in the compressed version of a desired offset in the uncompressed version to identify a bifurcation chunk of the plurality of chunks that includes the interpolated position; read an offset of the bifurcation chunk; and in response to determining that the desired offset is within a threshold distance of the offset of the bifurcation chunk, decompress the compressed version from the bifurcation chunk until the desired offset is output. 9. The system of claim 8 , wherein each chunk of the plurality of chunks includes in a header an offset in the uncompressed stream to which that chunk includes correspondingly compressed data. 10. The system of claim 8 , before determining the low point and the high point the system is further enabled to: determine an initial low point and an initial high point in the compressed version, wherein the initial low point is an beginning chunk of the plurality of chunks and the initial high point is a final chunk of the plurality of chunks; calculate an initial average compression ratio between the initial low point and the initial high point; interpolate an initial position in the compressed version of the desired offset in the uncompressed version to identify an initial bifurcation chunk that includes the initial interpolated position; read an initial offset of the initial bifurcation chunk; and in response to determining that the desired offset is not within the threshold distance of the initial offset and the initial offset being closer to the initial low point than the desired offset is to the initial low point, update the initial low point to the initial offset as a subsequent low point. 11. The system of claim 8 , before determining the low point and the high point the system is further enabled to: determine an initial low point and an initial high point in the compressed version, wherein the initial low point is an beginning chunk of the plurality of chunks and the initial high point is a final chunk of the plurality of chunks; calculate an initial average compression ratio between the initial low point and the initial high point; interpolate an initial position in the compressed version of the desired offset in the uncompressed version to identify an initial bifurcation chunk that includes the initial interpolated position; read an initial offset of the initial bifurcation chunk; and in response to determining that the desired offset is not within the threshold distance of the initial offset and the initial offset being closer to the initial high point than the desired offset is to the initial high point, update the initial high point to the initial offset as a subsequent high point. 12. The system of claim 11 , wherein a final chunk of the plurality of chunks includes a padding section and each of the chunks of the plurality of chunks are the same size, wherein determining the initial average compression ignores a size of the padding section when calculating a compression ratio between the final chunk and a corresponding final portion of the uncompressed version of the stream. 13. The system of claim 8 , wherein the threshold is set to a size based on a predefined number of chunks. 14. The system of claim 8 , wherein the threshold is set based on a buffer size from which chunks of the plurality of chunks are read and a read speed from the buffer. 15. A computer-readable storage medium having computer-readable program code embodied therewith, the computer-readable program code executable by one or more computer processors to: determ

Assignees

Inventors

Classifications

  • General implementation details not specific to a particular type of compression · CPC title

  • H03M7/55Primary

    Compression Theory, e.g. compression of random number, repeated compression · CPC title

  • H03M7/3064Primary

    Segmenting · CPC title

  • Compression (speech analysis-synthesis for redundancy reduction G10L19/00; for image communication H04N); Expansion; Suppression of unnecessary data, e.g. redundancy reduction · CPC title

  • for evaluating statistical data {, e.g. average values, frequency distributions, probability functions, regression analysis (forecasting specially adapted for a specific administrative, business or logistic context G06Q10/04)} · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10833702B1 cover?
Interpolated search is provided for navigating a compressed file to identify a desired offset in an uncompressed version of the file by: determining a low point and a high point in an uncompressed version of a stream corresponding to a compressed version of the stream that is divided into a plurality of chunks; calculating an average compression ratio between the low point and the high point; i…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification H03M7/55. Mapped technology areas include Electricity.
When was this patent published?
Publication date Tue Nov 10 2020 00:00:00 GMT+0000 (Coordinated Universal Time) (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).