Multi-Threaded Dynamic Per-File Read-Ahead Cache for Deduplication System
US-2020333971-A1 · Oct 22, 2020 · US
US2025238375A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2025238375-A1 |
| Application number | US-202418418840-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jan 22, 2024 |
| Priority date | Jan 22, 2024 |
| Publication date | Jul 24, 2025 |
| Grant date | — |
Official abstract text for this publication.
Improving extent-based read performance using pre-fetches by utilizing synthesized sequential extent files in a deduplication storage system. Extent information is obtained for changes between two generations (e.g., Generation 0 and Generation 1) of backup files. In a client-server system, the client will instruct the filesystem to create a new synthesized file corresponding to the extents. Upon receiving this request, a filesystem server will create the new synthesized file. The new synthesized file can be read sequentially to leverage the benefits of prefetching. The extents can be patched into a target file that may be stored on different storage using the extent information.
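The flow in the abstract can be sketched in a few lines. This is a minimal illustration only, not the filesystem's implementation: the function names `synthesize` and `patch` are hypothetical, files are modeled as in-memory byte strings, and the changed extents are assumed to overwrite same-sized regions of the target (no insertions or truncation).

```python
from typing import List, Tuple

# Hypothetical representation: one (offset, length) pair per changed extent,
# with offsets measured in the newer (Generation 1) backup file.
Extent = Tuple[int, int]


def synthesize(gen1: bytes, extents: List[Extent]) -> bytes:
    """Pack the changed extents back to back into one contiguous buffer.

    The result stands in for the "synthesized file": it holds only the
    delta data and can be read front to back in a single sequential pass.
    """
    return b"".join(gen1[off:off + length] for off, length in extents)


def patch(target: bytearray, synthesized: bytes, extents: List[Extent]) -> None:
    """Write each extent from the synthesized buffer into the target.

    The synthesized buffer is consumed sequentially; the extent list says
    where each piece belongs in the target. Assumes in-place overwrites.
    """
    cursor = 0
    for off, length in extents:
        target[off:off + length] = synthesized[cursor:cursor + length]
        cursor += length
```

In this sketch the target starts as a copy of the older generation, possibly on different storage, and ends up identical to the newer generation after `patch` runs, while the only file read for the delta data is the contiguous synthesized one.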
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method for improving extent-based read performance of a file using data pre-fetches in a client-server network, comprising: obtaining extent information for delta changes between a first generation backup file and a second generation backup file; receiving, in a filesystem server and from a client, an instruction to create a new synthesized file corresponding to the extent information; and creating, upon receiving this instruction, a new synthesized file comprising extents of the extent information as contiguous data that can be read sequentially to leverage the benefits of the pre-fetches.

2. The method of claim 1 further comprising patching the extents into a target file that may be stored on different storage using the extent information.

3. The method of claim 1 wherein the extent information comprises a sequence of offsets and lengths, with each offset and length pair defining a corresponding extent of data added to the first generation backup file to synthesize the second generation backup file.

4. The method of claim 3 wherein the pre-fetches move data from the extents into a read-ahead cache to be sent to an application of the client in response to a read request, and further wherein a prefetch generated by a pre-fetch request comprises a hint that a read input/output (I/O) operation is imminent for purposes of filling the read-ahead cache and preventing a need to issue a blocking I/O operation for the read request.

5. The method of claim 4 wherein the benefits of the pre-fetches comprise at least one of: preventing wasted input/output operations created by attempting to pre-fetch data beyond an end of an extent, or failing to pre-fetch any data at a beginning of an extent.

6. The method of claim 3 wherein the sequence of offsets comprise an extent map, with each offset defining a corresponding extent.

7. The method of claim 3 wherein the filesystem includes a multi-streamed restore component providing multiple streams to issue read-ahead operations for the pre-fetches in parallel, and further wherein the pre-fetches move the data into the read-ahead cache using the multiple streams.

8. The method of claim 1 wherein the system comprises a Change Based Tracking (CBT) system, and wherein the delta changes are synthesized into a backup file stored by the application as part of a backup operation.

9. The method of claim 1 wherein the storage comprises part of a deduplication backup process executed by a data storage server running a Data Domain filesystem (DDFS), and wherein the client comprises a DDBoost client.

10. A computer-implemented method for improving read performance of a file using data pre-fetches in a client-server network, comprising: creating a first generation backup file of content for a first backup operation; creating a new file for changed data between the first generation backup file and a second generation backup file created by an incremental backup operation after the first backup operation, wherein the new file comprises contiguous extent data of the changed data; and synthesizing the second generation backup file by interleaving the extent data of the new file into the content of the first generation backup file, wherein the contiguous data of the new file can be read sequentially to leverage the benefits of the pre-fetches.

11. The method of claim 10 further comprising patching the extent data into a target file that may be stored on different storage using the extent information.

12. The method of claim 10 wherein the extent information comprises a sequence of offsets and lengths, with each offset and length pair defining a corresponding extent of data added to the first generation backup file to synthesize the second generation backup file.

13. The method of claim 12 wherein the pre-fetches move data from the extents into a read-ahead cache to be sent to an application of the client in response to a read request, and further wherein a prefetch generated by a pre-fetch request comprises a hint that a read input/output (I/O) operation is imminent for purposes of filling the read-ahead cache and preventing a need to issue a blocking I/O operation for the read request.

14. The method of claim 13 wherein the benefits of the pre-fetches comprise at least one of: preventing wasted input/output operations created by attempting to pre-fetch data beyond an end of an extent, or failing to pre-fetch any data at a beginning of an extent.

15. The method of claim 13 wherein the sequence of offsets comprise an extent map, with each offset defining a corresponding extent.

16. The method of claim 13 wherein the filesystem includes a multi-streamed restore (MSR) component providing multiple streams to issue read-ahead operations for the pre-fetches in parallel, and further wherein the pre-fetches move the data into the read-ahead cache using the multiple streams.

17. The method of claim 10 wherein the storage comprises part of a deduplication backup process executed by a data storage server running a Data Domain filesystem (DDFS), and wherein the client comprises a DDBoost client.

18. A system for improving read performance of a file using data pre-fetches in a client-server network, comprising: a server hosting a filesystem storing data in storage for an application executed in the network; a network client hosting the application; a backup processing component obtaining extent information for delta changes between a first generation backup file and a second generation backup file; and a server component receiving, from the network client, an instruction to create a new synthesized file corresponding to the extent information, and creating, upon receiving this instruction, a new synthesized file comprising extents of the extent information as contiguous data that can be read sequentially to leverage the benefits of the pre-fetches.

19. The system of claim 18 further comprising a read-ahead cache, wherein the pre-fetches move data from a prefetched extents into the read-ahead cache to be sent to the application in response to the read request, and further wherein the prefetch comprises a hint that a read input/output (I/O) operation is imminent for purposes of filling the read-ahead cache and preventing a need to issue a blocking I/O operation for the read request, and yet further wherein the benefits of the pre-fetches comprise at least one of: preventing wasted input/output operations created by attempting to pre-fetch data beyond an end of an extent, or failing to pre-fetch any data at a beginning of an extent.

20. The system of claim 19 wherein the storage comprises part of a deduplication backup process executed by a data storage server running a Data Domain filesystem (DDFS), and wherein the client comprises a DDBoost client.
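The wasted-I/O benefit named in the claims (pre-fetching beyond the end of an extent) can be illustrated with a toy model. Everything here is an assumption for illustration: the fixed read-ahead window, the rule that each read run is fetched in whole windows clipped at end of file, and the names `overread_bytes` and `as_synthesized`. It does not describe how DDFS or any real read-ahead cache is implemented, and it does not model the second named effect (no pre-fetch at the beginning of an extent).

```python
import math
from typing import List, Tuple

# Hypothetical representation: (offset, length) of one sequential read run.
Run = Tuple[int, int]


def overread_bytes(runs: List[Run], window: int, file_size: int) -> int:
    """Count bytes fetched past the end of each run under a toy model.

    Assumed model: every run is served by whole read-ahead windows that
    start at the run's offset, and a window is clipped at end of file.
    Bytes fetched beyond the run's last byte count as wasted.
    """
    wasted = 0
    for offset, length in runs:
        windows = math.ceil(length / window)
        fetched_end = min(offset + windows * window, file_size)
        wasted += fetched_end - (offset + length)
    return wasted


def as_synthesized(extents: List[Run]) -> Tuple[List[Run], int]:
    """Model the synthesized file: all extents packed into one run.

    Returns the single run covering the packed data and the file size,
    which equals the sum of the extent lengths.
    """
    total = sum(length for _, length in extents)
    return [(0, total)], total
```

Under this model, reading scattered extents directly from a large backup file over-fetches at the tail of every extent, while the single run over the synthesized file ends exactly at end of file, so the clipped final window fetches nothing extra.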
using de-duplication of the data · CPC title
Backup restoration techniques · CPC title
Prefetching based on hints or prefetch instructions · CPC title
by selection of backup contents · CPC title
with prefetch · CPC title