Adaptive compression with pre-filter check for compressibility to improve reads on a deduplication file system

US12153502B2 · US · B2

Patent metadata
Publication number: US-12153502-B2
Application number: US-202318365192-A
Country: US
Kind code: B2
Filing date: Aug 3, 2023
Priority date: Mar 31, 2023
Publication date: Nov 26, 2024
Grant date: Nov 26, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Improving the performance of read operations in a restore path of an inline deduplication system utilizing a DDBOOST interface by providing an adaptive compression component for use with DDBOOST applications. A built-in compression mode transfers read data if there are sufficient CPU resources in the server and client to compress and decompress the read data without destabilizing the system. CPU usage is tracked to generate predicted respective client and server CPU usage. These respective predictions are compared to defined maximum threshold usage values. If the predicted values do not exceed the thresholds, compression is used, otherwise the data is transmitted over the network as non-compressed data. A pre-filter is used to first determine whether or not the data would benefit from the built-in compression mode.
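The CPU check described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the class name, the moving-average predictor, the window size, and the 80% thresholds are all assumptions chosen for illustration, since the abstract does not say how usage is predicted or what the maximum thresholds are.

```python
from collections import deque
from statistics import mean


class CpuUsagePredictor:
    """Tracks per-epoch CPU usage (percent non-idle) and predicts the next value.

    Assumption: the prediction is a simple moving average over the last
    `window` epochs. The abstract does not specify the prediction method.
    """

    def __init__(self, window: int = 5) -> None:
        self._samples = deque(maxlen=window)

    def record(self, usage_percent: float) -> None:
        if not 0.0 <= usage_percent <= 100.0:
            raise ValueError("usage must be a percentage in [0, 100]")
        self._samples.append(usage_percent)

    def predict(self) -> float:
        # With no history, assume the worst case so that compression is
        # never enabled blindly (a conservative choice for this sketch).
        return mean(self._samples) if self._samples else 100.0


def should_compress(
    server: CpuUsagePredictor,
    client: CpuUsagePredictor,
    server_max: float = 80.0,  # hypothetical maximum server usage threshold
    client_max: float = 80.0,  # hypothetical maximum client usage threshold
) -> bool:
    """Compress only if neither predicted usage exceeds its maximum threshold."""
    return server.predict() <= server_max and client.predict() <= client_max
```

In this sketch the server must have headroom to compress and the client must have headroom to decompress; if either prediction exceeds its threshold, the data would be sent uncompressed.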

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method of optimizing compression for reads in a restore path of a client-side inline deduplication file system, comprising: calculating a compression ratio of a read request by the server based on an average of compression ratios for each compression region comprising the read request; determining if the calculated compression ratio exceeds a first threshold; tracking CPU usage for a defined period of time; predicting future CPU usage based on the tracked usage for comparison to a second defined threshold; and applying compression to data serving the read request if the calculated compression ratio and predicted future server usage both exceed their respective first and second thresholds, otherwise sending the data from the server to the client as non-compressed data.

2. The method of claim 1 wherein the compression ratio of the average of the compression ratios comprises a weighted average factored by a size of a read from each compression ratio.

3. The method of claim 1 wherein the CPU usage comprises both server CPU and client CPU usage, and the tracked usage comprises both tracked server CPU usage and tracked client CPU usage for comparison to respective usage thresholds.

4. The method of claim 3 wherein the read operation comprises a restore operation performed by the deduplication backup system executed by a data storage server running a Data Domain File System (DDFS), and wherein a predicted server CPU usage indicates whether or not the server CPU has sufficient resources to perform the compression, and a predicted client CPU usage indicates whether or not the client CPU has sufficient resources to decompress the compressed data sent from the server, both without causing system instability.

5. The method of claim 4 further comprising deploying a Data Domain (DD) Boost file system (FS) interface (API) to access a DDBOOST library on the client hosting one or more applications generating the backup data and to perform segmentation and the reference calculating steps of a deduplication process of the DDFS, wherein the DDBOOST library is extended to the server to allow the server to access one or more functions of the DDFS, and further wherein the DDBOOST FS API presents a standard file system mount point to an application residing on the client, and wherein the application issues a read request to access a buffer in backup storage on the server.

6. The method of claim 5 wherein the compression comprises a built-in lossless compression mechanism of the DDBOOST FS, and further comprises at least one of: Lempel-Ziv (LZ), Gzip (GZ) or Huffman encoding.

7. The method of claim 1 further comprising encoding the predicted client CPU usage as metadata appended to the read request, and further wherein the CPU usage comprises a percentage amount of time that the CPU is performing non-idle work, and wherein the defined period of time is divided into a plurality of epochs, each comprising on the order of one second.

8. The method of claim 1 further comprising comparing a size of the compressed data with a size of the non-compressed data, and not applying the compression if the compared size is not smaller for the compressed data in excess of a defined compressed size amount.

9. A system for optimizing compression for reads in a restore path of a client-side inline deduplication file system, comprising: a pre-filter component calculating a compression ratio of a read request by the server based on an average of compression ratios for each compression region comprising the read request, and determining if the calculated compression ratio exceeds a first threshold; an adaptive compression component tracking CPU usage for a defined period of time; predicting future CPU usage based on the tracked usage for comparison to a second defined threshold, and applying compression to data serving the read request if the calculated compression ratio and predicted future server usage both exceed their respective first and second thresholds, otherwise sending the data from the server to the client as non-compressed data.

10. The system of claim 9 wherein the compression ratio of the average of the compression ratios comprises a weighted average factored by a size of a read from each compression ratio.

11. The system of claim 9 wherein the CPU usage comprises both server CPU and client CPU usage, and the tracked usage comprises both tracked server CPU usage and tracked client CPU usage for comparison to respective usage thresholds.

12. The system of claim 11 wherein the read operation comprises a restore operation performed by the deduplication backup system executed by a data storage server running a Data Domain File System (DDFS), and wherein a predicted server CPU usage indicates whether or not the server CPU has sufficient resources to perform the compression, and a predicted client CPU usage indicates whether or not the client CPU has sufficient resources to decompress the compressed data sent from the server, both without causing system instability.

13. The system of claim 12 further comprising a Data Domain (DD) Boost file system (FS) interface (API) to access a DDBOOST library on the client hosting one or more applications generating the backup data and to perform segmentation and the reference calculating steps of a deduplication process of the DDFS, wherein the DDBOOST library is extended to the server to allow the server to access one or more functions of the DDFS, and further wherein the DDBOOST FS API presents a standard file system mount point to an application residing on the client, and wherein the application issues a read request to access a buffer in backup storage on the server.

14. The system of claim 13 wherein the compression comprises a built-in lossless compression mechanism of the DDBOOST FS, and further comprises at least one of: Lempel-Ziv (LZ), Gzip (GZ) or Huffman encoding.

15. The system of claim 9 further comprising an encoder encoding the predicted client CPU usage as metadata appended to the read request, and further wherein the CPU usage comprises a percentage amount of time that the CPU is performing non-idle work, and wherein the defined period of time is divided into a plurality of epochs, each comprising on the order of one second.

16. The system of claim 9 further comprising a comparator comparing a size of the compressed data with a size of the non-compressed data, and not applying the compression if the compared size is not smaller for the compressed data in excess of a defined compressed size amount.

17. A tangible computer program product having stored thereon program instructions that, when executed by a process, cause the processor to perform a method of optimizing compression for reads in a restore path of a client-side inline deduplication file system, comprising: calculating a compression ratio of a read request by the server based on an average of compression ratios for each compression region comprising the read request; determining if the calculated compression ratio exceeds a first threshold; tracking CPU usage for a defined period of time; predicting future CPU usage based on the tracked usage for comparison to a second defined threshold; and applying compression to data serving the read request if the calculated compression ratio and predicted future server usage both exceed their respective first and second thresholds, otherwise sending the data from the server to the client as non-compressed data.

18. The product of claim 17 wherein the compression ratio of the average
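The pre-filter of claims 1 and 2 and the size comparison of claim 8 can be sketched as follows. This is an illustrative reading only, not the patented implementation: the names RegionRead, read_compression_ratio, passes_prefilter and worth_sending_compressed, the definition of the ratio as uncompressed size divided by compressed size, and the default threshold values are all assumptions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RegionRead:
    """One compression region touched by a read request (hypothetical type)."""
    bytes_read: int           # bytes of this read served from the region
    compression_ratio: float  # assumed: uncompressed size / compressed size


def read_compression_ratio(regions: Sequence[RegionRead]) -> float:
    """Average of the per-region ratios, weighted by bytes read (claim 2)."""
    total = sum(r.bytes_read for r in regions)
    if total == 0:
        return 0.0  # an empty read gains nothing from compression
    weighted = sum(r.bytes_read * r.compression_ratio for r in regions)
    return weighted / total


def passes_prefilter(regions: Sequence[RegionRead], min_ratio: float = 1.5) -> bool:
    """True if the read looks compressible enough to be worth compressing.

    `min_ratio` stands in for the "first threshold"; 1.5 is an arbitrary value.
    """
    return read_compression_ratio(regions) > min_ratio


def worth_sending_compressed(
    raw_size: int, compressed_size: int, min_saving_bytes: int = 0
) -> bool:
    """One reading of claim 8: keep the compressed form only if it is smaller
    than the raw data by more than a defined amount."""
    return raw_size - compressed_size > min_saving_bytes
```

For example, a read that takes 3000 bytes from a region with ratio 2.0 and 1000 bytes from a region with ratio 1.0 has a weighted ratio of (3000 × 2.0 + 1000 × 1.0) / 4000 = 1.75, which passes the assumed 1.5 threshold.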

Assignees

Dell Products Lp

Inventors

Classifications

  • De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641) · CPC title

  • using de-duplication of the data · CPC title

  • Performance evaluation by statistical analysis · CPC title

  • for performance assessment · CPC title

  • Backup restoration techniques · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12153502B2 cover?
Improving the performance of read operations in a restore path of an inline deduplication system utilizing a DDBOOST interface by providing an adaptive compression component for use with DDBOOST applications. A built-in compression mode transfers read data if there are sufficient CPU resources in the server and client to compress and decompress the read data without destabilizing the system. CP…
Who is the assignee on this patent?
Dell Products Lp
What technology area does this patent fall under?
Primary CPC classification G06F11/1469. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 26, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).