Method for maximum data reduction combining compression with deduplication in storage arrays
US-2020019329-A1 · Jan 16, 2020 · US
US2024330127A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2024330127-A1 |
| Application number | US-202318365192-A |
| Country | US |
| Kind code | A1 |
| Filing date | Aug 3, 2023 |
| Priority date | Mar 31, 2023 |
| Publication date | Oct 3, 2024 |
| Grant date | — |
Official abstract text for this publication.
Improving the performance of read operations in a restore path of an inline deduplication system utilizing a DDBOOST interface by providing an adaptive compression component for use with DDBOOST applications. A built-in compression mode transfers read data if there are sufficient CPU resources in the server and client to compress and decompress the read data without destabilizing the system. CPU usage is tracked to generate predicted respective client and server CPU usage. These respective predictions are compared to defined maximum threshold usage values. If the predicted values do not exceed the thresholds, compression is used, otherwise the data is transmitted over the network as non-compressed data. A pre-filter is used to first determine whether or not the data would benefit from the built-in compression mode.
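The decision described in the abstract can be sketched as follows. This is only an illustrative sketch, not the patented implementation. The names `Region`, `weighted_ratio`, `predict_cpu`, and `should_compress` and all threshold values are hypothetical. The abstract does not say how CPU usage is predicted, so a plain average of recent samples stands in for the predictor. The pre-filter uses the size-weighted average of per-region compression ratios that the claims describe.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Region:
    """One compression region touched by a read (hypothetical structure)."""
    bytes_read: int           # bytes of the read served from this region
    compression_ratio: float  # uncompressed size / compressed size


def weighted_ratio(regions: Sequence[Region]) -> float:
    """Average the per-region ratios, weighted by the bytes read from each."""
    total = sum(r.bytes_read for r in regions)
    if total == 0:
        return 1.0  # empty read: treat as not compressible
    return sum(r.compression_ratio * r.bytes_read for r in regions) / total


def predict_cpu(samples: Sequence[float]) -> float:
    """Predict next-epoch CPU usage (percent) from recent per-epoch samples.

    Assumption: a simple mean is used here as a stand-in predictor.
    """
    if not samples:
        return 100.0  # no history: assume the CPU is busy
    return sum(samples) / len(samples)


def should_compress(
    regions: Sequence[Region],
    server_cpu: Sequence[float],
    client_cpu: Sequence[float],
    min_ratio: float = 1.5,        # hypothetical pre-filter threshold
    max_server_cpu: float = 80.0,  # hypothetical usage ceiling, percent
    max_client_cpu: float = 80.0,  # hypothetical usage ceiling, percent
) -> bool:
    """Return True if the read data should be sent compressed."""
    # Pre-filter: skip compression when the data would not benefit from it.
    if weighted_ratio(regions) <= min_ratio:
        return False
    # Compress only if neither predicted usage exceeds its threshold.
    return (
        predict_cpu(server_cpu) <= max_server_cpu
        and predict_cpu(client_cpu) <= max_client_cpu
    )
```

For example, a read drawing 3000 bytes from a region with ratio 2.0 and 1000 bytes from a region with ratio 4.0 has a weighted ratio of 2.5. With the sample thresholds above, that read is compressed only while the predicted usage of both server and client stays at or below 80 percent.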
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method of optimizing compression for reads in a restore path of a client-side inline deduplication file system, comprising: calculating a compression ratio of a read request by the server based on an average of compression ratios for each compression region comprising the read request; determining if the calculated compression ratio exceeds a first threshold; tracking CPU usage for a defined period of time; predicting future CPU usage based on the tracked usage for comparison to a second defined threshold; and applying compression to data serving the read request if the calculated compression ratio and predicted future server usage both exceed their respective first and second thresholds, otherwise sending the data from the server to the client as non-compressed data.
2. The method of claim 1 wherein the compression ratio of the average of the compression ratios comprises a weighted average factored by a size of a read from each compression ratio.
3. The method of claim 1 wherein the CPU usage comprises both server CPU and client CPU usage, and the tracked usage comprises both tracked server CPU usage and tracked client CPU usage for comparison to respective usage thresholds.
4. The method of claim 3 wherein the read operation comprises a restore operation performed by the deduplication backup system executed by a data storage server running a Data Domain File System (DDFS), and wherein a predicted server CPU usage indicates whether or not the server CPU has sufficient resources to perform the compression, and a predicted client CPU usage indicates whether or not the client CPU has sufficient resources to decompress the compressed data sent from the server, both without causing system instability.
5. The method of claim 4 further comprising deploying a Data Domain (DD) Boost file system (FS) interface (API) to access a DDBOOST library on the client hosting one or more applications generating the backup data and to perform segmentation and the reference calculating steps of a deduplication process of the DDFS, wherein the DDBOOST library is extended to the server to allow the server to access one or more functions of the DDFS, and further wherein the DDBOOST FS API presents a standard file system mount point to an application residing on the client, and wherein the application issues a read request to access a buffer in backup storage on the server.
6. The method of claim 5 wherein the compression comprises a built-in lossless compression mechanism of the DDBOOST FS, and further comprises at least one of: Lempel-Ziv (LZ), Gzip (GZ) or Huffman encoding.
7. The method of claim 1 further comprising encoding the predicted client CPU usage as metadata appended to the read request, and further wherein the CPU usage comprises a percentage amount of time that the CPU is performing non-idle work, and wherein the defined period of time is divided into a plurality of epochs, each comprising on the order of one second.
8. The method of claim 1 further comprising comparing a size of the compressed data with a size of the non-compressed data, and not applying the compression if the compared size is not smaller for the compressed data in excess of a defined compressed size amount.
9. A system for optimizing compression for reads in a restore path of a client-side inline deduplication file system, comprising: a pre-filter component calculating a compression ratio of a read request by the server based on an average of compression ratios for each compression region comprising the read request, and determining if the calculated compression ratio exceeds a first threshold; an adaptive compression component tracking CPU usage for a defined period of time; predicting future CPU usage based on the tracked usage for comparison to a second defined threshold, and applying compression to data serving the read request if the calculated compression ratio and predicted future server usage both exceed their respective first and second thresholds, otherwise sending the data from the server to the client as non-compressed data.
10. The system of claim 9 wherein the compression ratio of the average of the compression ratios comprises a weighted average factored by a size of a read from each compression ratio.
11. The system of claim 9 wherein the CPU usage comprises both server CPU and client CPU usage, and the tracked usage comprises both tracked server CPU usage and tracked client CPU usage for comparison to respective usage thresholds.
12. The system of claim 11 wherein the read operation comprises a restore operation performed by the deduplication backup system executed by a data storage server running a Data Domain File System (DDFS), and wherein a predicted server CPU usage indicates whether or not the server CPU has sufficient resources to perform the compression, and a predicted client CPU usage indicates whether or not the client CPU has sufficient resources to decompress the compressed data sent from the server, both without causing system instability.
13. The system of claim 12 further comprising a Data Domain (DD) Boost file system (FS) interface (API) to access a DDBOOST library on the client hosting one or more applications generating the backup data and to perform segmentation and the reference calculating steps of a deduplication process of the DDFS, wherein the DDBOOST library is extended to the server to allow the server to access one or more functions of the DDFS, and further wherein the DDBOOST FS API presents a standard file system mount point to an application residing on the client, and wherein the application issues a read request to access a buffer in backup storage on the server.
14. The system of claim 13 wherein the compression comprises a built-in lossless compression mechanism of the DDBOOST FS, and further comprises at least one of: Lempel-Ziv (LZ), Gzip (GZ) or Huffman encoding.
15. The system of claim 9 further comprising an encoder encoding the predicted client CPU usage as metadata appended to the read request, and further wherein the CPU usage comprises a percentage amount of time that the CPU is performing non-idle work, and wherein the defined period of time is divided into a plurality of epochs, each comprising on the order of one second.
16. The system of claim 9 further comprising a comparator comparing a size of the compressed data with a size of the non-compressed data, and not applying the compression if the compared size is not smaller for the compressed data in excess of a defined compressed size amount.
17. A tangible computer program product having stored thereon program instructions that, when executed by a process, cause the processor to perform a method of optimizing compression for reads in a restore path of a client-side inline deduplication file system, comprising: calculating a compression ratio of a read request by the server based on an average of compression ratios for each compression region comprising the read request; determining if the calculated compression ratio exceeds a first threshold; tracking CPU usage for a defined period of time; predicting future CPU usage based on the tracked usage for comparison to a second defined threshold; and applying compression to data serving the read request if the calculated compression ratio and predicted future server usage both exceed their respective first and second thresholds, otherwise sending the data from the server to the client as non-compressed data.
18. The product of c
Performance evaluation by statistical analysis · CPC title
using de-duplication of the data · CPC title
for performance assessment · CPC title
Backup restoration techniques · CPC title
Data deduplication · CPC title