Dynamic caching module selection for optimized data deduplication
US-9733843-B2 · Aug 15, 2017 · US
US10241682B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10241682-B2 |
| Application number | US-201715626057-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 16, 2017 |
| Priority date | Mar 13, 2013 |
| Publication date | Mar 26, 2019 |
| Grant date | Mar 26, 2019 |
Official abstract text for this publication.
Embodiments of the invention provide a method, system and computer program product for dynamic caching module selection for optimized data deduplication. In an embodiment of the invention, a method for dynamic caching module selection for optimized data deduplication is provided. The method includes receiving a request to retrieve data and classifying the request. The method also includes identifying from amongst multiple different caching modules each with a different configuration a particular caching module associated with the classification of the request. Finally, the method includes deduplicating the data in the identified caching module.
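The flow described in the abstract (classify the request, look up the associated caching module, deduplicate in that module) might be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the class names `ByteCachingModule` and `Scheduler`, the use of SHA-256, the fixed-size chunking, and the reading of "fingerprint size" as a chunk length in bytes are all assumptions made for the example. The lookup key (protocol plus network address) and the bypass for encrypted data follow the dependent claims.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ByteCachingModule:
    """Hypothetical byte cache; fingerprint_size is treated as a chunk length in bytes."""
    fingerprint_size: int
    seen: dict = field(default_factory=dict)

    def deduplicate(self, data: bytes) -> list:
        """Replace chunks already seen by this module with a reference to their digest."""
        out = []
        for i in range(0, len(data), self.fingerprint_size):
            chunk = data[i:i + self.fingerprint_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest in self.seen:
                out.append(("ref", digest))   # duplicate: send only the fingerprint
            else:
                self.seen[digest] = chunk
                out.append(("raw", chunk))    # first occurrence: send the bytes
        return out


class Scheduler:
    """Hypothetical scheduler: maps a request classification to a caching module."""

    def __init__(self, table: dict, default: ByteCachingModule):
        self.table = table        # {(protocol, address): module}
        self.default = default    # assumed fallback for unclassified requests

    def handle(self, protocol: str, address: str, data: bytes, encrypted: bool = False) -> list:
        if encrypted:
            # Deduplication is bypassed when the request indicates encrypted data.
            return [("raw", data)]
        module = self.table.get((protocol, address), self.default)
        return module.deduplicate(data)


small = ByteCachingModule(fingerprint_size=4)
large = ByteCachingModule(fingerprint_size=8)
scheduler = Scheduler({("http", "10.0.0.1"): small}, default=large)

routed = scheduler.handle("http", "10.0.0.1", b"abcdabcd")
fallback = scheduler.handle("ftp", "10.0.0.2", b"abcdabcd")
bypassed = scheduler.handle("http", "10.0.0.1", b"abcdabcd", encrypted=True)
```

In this sketch the same eight bytes deduplicate differently depending on which module the table selects: the 4-byte module finds a repeated chunk, while the 8-byte module sees a single new chunk. That difference is the kind of effect a per-request module choice is meant to exploit.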
Opening claim text (preview).
We claim:

1. A method for dynamic caching module selection for optimized data deduplication, the method comprising: receiving a request to retrieve data; classifying the request according to a table correlating different requests for different ones of the caching modules, wherein the table includes entries determined by processing training data for each of the different requests in each of the caching modules and correlating each of the different requests with an optimal one of the modules; identifying from amongst multiple different caching modules each with a different configuration a particular caching module associated with the classification of the request; and, deduplicating the data in the identified caching module.

2. The method of claim 1, wherein the caching modules include byte caching modules each configured with a different fingerprint size.

3. The method of claim 2, wherein the caching modules additionally include an object caching module.

4. The method of claim 1, wherein the table correlates a protocol and network address for each of the different requests with a corresponding one of the caching modules.

5. The method of claim 1, wherein deduplication of the data is bypassed when the request indicates that data is encrypted.

6. The method of claim 1, further comprising constructing the table by: submitting the training data to each of the modules from different servers according to different protocols, monitoring performance metrics of each of the modules and measuring the performance for throughput, processor and memory utilization and response time, submitting the measured metrics to a performance function that weights different metrics for utilization of different resources and then sums the weighted metrics into an aggregated metric, comparing the aggregate metric for a particular training data set against other aggregate metrics for the training data set in different ones of the modules, and selecting an optimal one of the modules corresponding to a classification for the training data.

7. A data deduplication data processing system configured for dynamic caching module selection for optimized data deduplication, the system comprising: a server communicatively coupled to a data store and plurality of client computers over a computer communications network; middleware disposed between the server and the client computers and executing in memory of a host computer, the middleware comprising a plurality of caching modules, each caching module having a different configuration; and, a scheduler comprising program code executing in memory of a host computer and enabled to classify a request according to a table correlating different requests for different ones of the caching modules, wherein the table includes entries determined by processing training data for each of the different requests in each of the caching modules and correlating each of the different requests with an optimal one of the modules, to retrieve data from the data store of the server, to identify from amongst the different caching modules a particular caching module associated with the classification of the request, and to route the data for deduplication in the identified caching module.

8. The system of claim 7, wherein the caching modules include byte caching modules each configured with a different fingerprint size.

9. The system of claim 8, wherein the caching modules additionally include an object caching module.

10. The system of claim 9, wherein the object caching module compresses objects.

11. The system of claim 7, wherein the table correlates a protocol and network address for each of the different requests with a corresponding one of the caching modules.

12. The system of claim 7, wherein the program code of the scheduler bypasses deduplication of the data responsive to an indication in the scheduler to bypass deduplication of the data referenced by the request.

13. The system of claim 7, wherein the table is constructed by: submitting the training data to each of the modules from different servers according to different protocols, monitoring performance metrics of each of the modules and measuring the performance for throughput, processor and memory utilization and response time, submitting the measured metrics to a performance function that weights different metrics for utilization of different resources and then sums the weighted metrics into an aggregated metric, comparing the aggregate metric for a particular training data set against other aggregate metrics for the training data set in different ones of the modules, and selecting an optimal one of the modules corresponding to a classification for the training data.

14. A computer program product for dynamic caching module selection for optimized data deduplication, the computer program product comprising: a non-transitory computer readable storage medium having computer readable program code embodied therewith, the computer readable program code comprising: computer readable program code for receiving a request to retrieve data; computer readable program code for classifying the request according to a table correlating different requests for different ones of the caching modules, wherein the table includes entries determined by processing training data for each of the different requests in each of the caching modules and correlating each of the different requests with an optimal one of the modules; computer readable program code for identifying from amongst multiple different caching modules each with a different configuration a particular caching module associated with the classification of the request; and, computer readable program code for deduplicating the data in the identified caching module.

15. The computer program product of claim 14, wherein the caching modules include byte caching modules each configured with a different fingerprint size.

16. The computer program product of claim 15, wherein the caching modules additionally include an object caching module.

17. The computer program product of claim 14, wherein the table correlates a protocol and network address for each of the different requests with a corresponding one of the caching modules.

18. The computer program product of claim 14, wherein deduplication of the data is bypassed when the request indicates that data is encrypted.

19. The computer program product of claim 14, further comprising computer readable program code for constructing the table by: submitting the training data to each of the modules from different servers according to different protocols, monitoring performance metrics of each of the modules and measuring the performance for throughput, processor and memory utilization and response time, submitting the measured metrics to a performance function that weights different metrics for utilization of different resources and then sums the weighted metrics into an aggregated metric, comparing the aggregate metric for a particular training data set against other aggregate metrics for the training data set in different ones of the modules, and selecting an optimal one of the modules corresponding to a classification for the training data.
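The table-construction steps recited in claims 6, 13 and 19 (measure each module on training data, weight and sum the metrics, compare the aggregates, select the best module per classification) could be sketched as below. Everything concrete here is an assumption for illustration: the `Metrics` fields, the weight values, the sign convention (throughput counted as a benefit, resource use and response time as costs), the rule that the highest aggregate is "optimal", and the module names. The claims specify only that a performance function weights the metrics and sums them into an aggregate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    """Measurements for one module on one training data set (units are illustrative)."""
    throughput: float
    cpu: float
    memory: float
    response_time: float


# Hypothetical weights: positive rewards a metric, negative penalizes it.
WEIGHTS = {"throughput": 1.0, "cpu": -0.5, "memory": -0.25, "response_time": -2.0}


def aggregate(m: Metrics, weights: dict = WEIGHTS) -> float:
    """Weight each metric and sum the results into a single aggregate score."""
    return sum(weights[name] * getattr(m, name) for name in weights)


def build_table(measurements: dict) -> dict:
    """Map each request classification to the module with the highest aggregate.

    measurements: {classification: {module_name: Metrics}}
    """
    table = {}
    for classification, per_module in measurements.items():
        table[classification] = max(
            per_module, key=lambda name: aggregate(per_module[name])
        )
    return table


# Made-up example figures, keyed by (protocol, network address) as in claims 4, 11 and 17.
measurements = {
    ("http", "10.0.0.1"): {
        "byte_small_fp": Metrics(100.0, 0.5, 0.4, 0.2),
        "byte_large_fp": Metrics(80.0, 0.2, 0.2, 0.1),
    },
    ("ftp", "10.0.0.2"): {
        "byte_small_fp": Metrics(50.0, 0.9, 0.9, 5.0),
        "object_cache": Metrics(45.0, 0.1, 0.1, 0.5),
    },
}

table = build_table(measurements)
```

With these invented numbers the second classification selects `object_cache` even though its throughput is lower, because the weighted response-time penalty dominates. A different choice of weights would shift that trade-off.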
Replication mechanisms · CPC title
in combination with broadcast means (e.g. for invalidation or updating) · CPC title
Plural cache memories · CPC title
De-duplication techniques · CPC title
Remote server · CPC title