Sparse files aware rolling checksum

US12130781B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12130781-B2
Application number  US-202217731955-A
Country             US
Kind code           B2
Filing date         Apr 28, 2022
Priority date       Apr 28, 2022
Publication date    Oct 29, 2024
Grant date          Oct 29, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A sparse files aware rolling checksum is provided by passing, in sequence, each byte of an archival file to a hash function; and in response to: detecting that a sequence of bytes from the archival file produce outputs from the hash function of zero, wherein a number of bytes in the sequence of bytes satisfies a chunk-end threshold, and determining that the sequence of bytes is located in a hole in the archival file of a greater number of bytes than the chunk-end threshold: designating a hole-chunk of the archival file that includes metadata for a location and a length of the hole in the archival file.
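The chunking pass described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the windowed byte sum standing in for the hash function, the `Chunk` record, the `covering_hole` helper, the default threshold and window values, and the assumption that hole locations are supplied up front as `(start, length)` pairs are all hypothetical choices made for the example.

```python
import hashlib
from collections import deque
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Chunk:
    kind: str                     # "hole" or "data"
    offset: int                   # location in the archival file
    length: int                   # number of bytes covered
    digest: Optional[str] = None  # content hash, data-chunks only


def covering_hole(holes: List[Tuple[int, int]], pos: int, min_len: int):
    """Return the (start, length) hole containing pos if it is longer than min_len."""
    for start, length in holes:
        if start <= pos < start + length and length > min_len:
            return start, length
    return None


def chunk_archive(data: bytes, holes: List[Tuple[int, int]],
                  chunk_end_threshold: int = 8, window: int = 4) -> List[Chunk]:
    chunks: List[Chunk] = []

    def emit_data(start: int, end: int) -> None:
        if end > start:
            digest = hashlib.sha256(data[start:end]).hexdigest()
            chunks.append(Chunk("data", start, end - start, digest))

    chunk_start = pos = 0
    recent: deque = deque()
    rolling = zero_run = nonzero_run = 0
    while pos < len(data):
        # Stand-in rolling checksum: sum of the last `window` bytes.
        recent.append(data[pos])
        rolling += data[pos]
        if len(recent) > window:
            rolling -= recent.popleft()
        if rolling == 0:
            zero_run, nonzero_run = zero_run + 1, 0
        else:
            zero_run, nonzero_run = 0, nonzero_run + 1
        pos += 1
        if max(zero_run, nonzero_run) < chunk_end_threshold:
            continue
        # A run of outputs has reached the chunk-end threshold.
        hole = covering_hole(holes, pos - 1, chunk_end_threshold) if zero_run else None
        if hole:
            # One hole-chunk covers the rest of the hole, however long it is.
            hole_start = max(hole[0], chunk_start)
            hole_end = hole[0] + hole[1]
            emit_data(chunk_start, hole_start)
            chunks.append(Chunk("hole", hole_start, hole_end - hole_start))
            pos = hole_end
        else:
            # Non-zero data, or zeros that are not inside a large enough hole.
            emit_data(chunk_start, pos)
        chunk_start = pos
        recent.clear()
        rolling = zero_run = nonzero_run = 0
    emit_data(chunk_start, len(data))  # trailing bytes below the threshold
    return chunks
```

In this sketch a 64-byte hole is recorded as a single hole-chunk rather than eight threshold-sized chunks, which is the saving the abstract points at; a zero run that lies outside any qualifying hole is still emitted as an ordinary data-chunk.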

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: passing, in sequence, each byte of an archival file stored on a source system to a hash function; and detecting that a first sequence of bytes from the archival file produces outputs from the hash function of zero, wherein a first number of bytes in the first sequence of bytes satisfies a chunk-end threshold; determining that the first sequence of bytes is located in a hole in the archival file of a greater number of bytes than the chunk-end threshold; designating a hole-chunk of the archival file that includes first metadata for a first location and a first length of the hole in the archival file; detecting that a second sequence of bytes from the archival file produces outputs from the hash function of non-zero, wherein a second number of bytes in the second sequence of bytes satisfies the chunk-end threshold; designating a data-chunk that includes second metadata of a second location and a second length of the second sequence of bytes in the archival file and a hashed value of the second sequence of bytes, wherein the second metadata configures a destination system to ignore the hole-chunk and its additional metadata during synchronization of the archival file; receiving a request to transmit the archival file to the destination system; and synchronizing the archival file with respect to the destination system by transmitting the second metadata and the archival file to the destination system by transmitting, from the source system to the destination system, the data-chunk and not transmitting the hole-chunk and the additional metadata.

2. The method of claim 1, further comprising: detecting that a third sequence of bytes from the archival file produce outputs from the hash function of zero, wherein a third number of bytes in the third sequence of bytes satisfies the chunk-end threshold; determining that the third sequence of bytes is not located in a second hole in the archival file of a greater number of bytes than the chunk-end threshold: designating a second data-chunk that includes third metadata of a third location and a third length of the third sequence of bytes in the archival file and a second hashed value of the third sequence of bytes; and wherein transmitting the archival file to the destination system further comprises transmitting the second data-chunk despite the second data-chunk describing empty space in the archival file.

3. The method of claim 2, wherein the second data-chunk represents unused space in the archival file less than a hole-size threshold, wherein the third sequence of bytes includes a number of bytes equal to the chunk-end threshold.

4. The method of claim 1, wherein designating the hole-chunk further comprises compressing the first sequence of bytes and indicating in the first metadata that the hole-chunk represents empty space in the archival file; and wherein designating the data-chunk further comprises compressing the second sequence of bytes and any preceding bytes since an end of the hole and indicating in the second metadata that the data-chunk represents used space in the archival file.

5. The method of claim 1, wherein a number of bytes included in the hole is at least ten times more than a value of the chunk-end threshold.

6. The method of claim 1, wherein the first metadata indicates a start value of a first byte of the first sequence of bytes and an end value designated by a final byte in the hole.

7. The method of claim 1, wherein a location of the hole in the archival file is known before passing each byte of the archival file to the hash function.

8. The method of claim 1, wherein the destination system receives a plurality of data-chunks and a corresponding plurality of metadata for the plurality of data-chunks designated in the archival file and reproduces the archival file with identical bit spacing between each data-chunk of the plurality of data-chunks, thereby reproducing the hole in the first location having received neither the hole-chunk nor the first metadata.

9. A system, comprising: a processor; and a memory including instructions that, when executed by the processor, perform operations comprising: passing, in sequence, each byte of an archival file stored on a source system to a hash function; detecting that a first sequence of bytes from the archival file produces outputs from the hash function of zero, wherein a first number of bytes in the first sequence of bytes satisfies a chunk-end threshold, and determining that the first sequence of bytes is located in a hole in the archival file of a greater number of bytes than the chunk-end threshold; designating a hole-chunk of the archival file that includes first metadata for a first location and a first length of the hole in the archival file; compressing the first sequence of bytes and indicating in the first metadata that the hole-chunk represents empty space in the archival file; detecting that a second sequence of bytes from the archival file produces outputs from the hash function of non-zero, wherein a second number of bytes in the second sequence of bytes satisfies the chunk-end threshold; determining that the second sequence of bytes is not located in a second hole in the archival file of a greater number of bytes than the chunk-end threshold; designating a data-chunk that includes second metadata of a second location and a second length of the second sequence of bytes in the archival file and a hashed value of the second sequence of bytes, wherein the second metadata configures a destination system to ignore the hole-chunk and its additional metadata during synchronization of the archival file; and compressing the second sequence of bytes and any preceding bytes since an end of the hole and indicating in the second metadata that the data-chunk represents used space in the archival file despite representing empty space in the archival file.

10. The system of claim 9, wherein the operations further comprise: detecting that a third sequence of bytes from the archival file produce non-zero outputs from the hash function, wherein a third number of bytes in the third sequence of bytes satisfies the chunk-end threshold, and designating a second data-chunk that includes third metadata of a third location and a third length of the third sequence of bytes in the archival file and a second hashed value of the third sequence of bytes.

11. The system of claim 9, wherein the operations further comprise: receiving a request to transmit the archival file to the destination system; and transmitting the data-chunk and not transmitting the hole-chunk and the additional metadata, despite the data-chunk and the hole-chunk representing empty space in the archival file.

12. The system of claim 9, wherein the data-chunk represents unused space in the archival file less than a hole-size threshold, wherein the second sequence of bytes includes a number of bytes equal to the chunk-end threshold.

13. The system of claim 9, wherein the first metadata indicates of a start value of a first byte of the first sequence of bytes and an end value designated by a final byte in the hole.

14. The system of claim 9, wherein the hole-chunk represents a greater number of bytes than the data-chunk, wherein the first metadata and the second metadata are represented by an equivalent number of bytes, and wherein a difference between the chunk-end threshold and a hole-size threshold reduces a compressed size of the archival file compared to a second compressed version of the archival file in which the first sequence of bytes is represented by a plurality of chunks set accordi
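Claims 1 and 8 describe a transfer in which only data-chunks travel and the destination recovers the hole from the spacing between them. A minimal sketch of that idea follows, assuming chunks arrive as hypothetical `(kind, offset, length)` tuples; the record layout and the function names `build_transfer` and `rebuild` are illustrative and do not come from the patent.

```python
import hashlib
import io
from typing import BinaryIO, Dict, List, Tuple


def build_transfer(data: bytes, chunks: List[Tuple[str, int, int]]) -> List[Dict]:
    """Source side: keep only data-chunks; hole-chunks and their metadata stay behind."""
    records = []
    for kind, offset, length in chunks:
        if kind != "data":
            continue
        payload = data[offset:offset + length]
        records.append({
            "offset": offset,
            "length": length,
            "digest": hashlib.sha256(payload).hexdigest(),
            "payload": payload,
        })
    return records


def rebuild(records: List[Dict], out: BinaryIO) -> None:
    """Destination side: write each data-chunk at its recorded offset.

    Seeking past the current end before writing leaves a gap. On a file
    system with sparse-file support that gap can remain an unallocated hole;
    an in-memory stream simply zero-fills it.
    """
    for rec in records:
        if hashlib.sha256(rec["payload"]).hexdigest() != rec["digest"]:
            raise ValueError(f"corrupt data-chunk at offset {rec['offset']}")
        out.seek(rec["offset"])
        out.write(rec["payload"])
```

Passing an `io.BytesIO()` as `out` is enough to try the round trip in memory; a file opened in binary write mode works the same way. Note that this sketch cannot reproduce a hole at the very end of a file, since nothing is written after it.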

Assignees

Red Hat Inc

Inventors

Classifications

  • G06F16/113 · Primary

    Details of archiving (lifecycle management in storage systems G06F3/0649; point-in-time backing up or restoration of persistent data G06F11/1446) · CPC title

  • Hash-based (content-based indexing of textual data G06F16/31) · CPC title

  • G06F16/178 · Primary

    Techniques for file synchronisation in file systems · CPC title

  • using compression, e.g. sparse files · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12130781B2 cover?
A sparse files aware rolling checksum is provided by passing, in sequence, each byte of an archival file to a hash function; and in response to: detecting that a sequence of bytes from the archival file produce outputs from the hash function of zero, wherein a number of bytes in the sequence of bytes satisfies a chunk-end threshold, and determining that the sequence of bytes is located in a hol…
Who is the assignee on this patent?
Red Hat Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/113. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 29, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).