Scheduler training for multi-module byte caching

US9690711B2 · US · B2

Patent metadata
Publication number: US-9690711-B2
Application number: US-201414477093-A
Country: US
Kind code: B2
Filing date: Sep 4, 2014
Priority date: Mar 13, 2013
Publication date: Jun 27, 2017
Grant date: Jun 27, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of the present invention provide a method, system and computer program product for dynamic caching module selection for optimized data deduplication. In an embodiment of the invention, a method for dynamic caching module selection for optimized data deduplication is provided. The method includes processing historically relevant byte streams in each of a multiplicity of byte caching modules to populate a table of associations between different classifications of the historically relevant byte streams and correspondingly optimal ones of the multiplicity of the byte caching modules. The method also includes receiving a request to retrieve data from a data source and classifying the request. The method yet further includes consulting the table to identify, from amongst the multiplicity of byte caching modules, a particular byte caching module associated with the classification of the request. Finally, the method includes deduplicating the data in the identified byte caching module.
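The method in the abstract and claim 1 can be sketched in Python in two parts. A trainer replays historical byte streams through every module and records the best module for each classification. A scheduler then classifies a request and looks up that table. This is a minimal illustration under stated assumptions, not IBM's implementation. The patent does not say how a module is configured or how "optimal" is measured. The fixed-size chunking, the `chunk_size` setting, the per-reference cost `REF_SIZE`, the net-bytes-saved metric and the content-type classifier are all hypothetical choices. The same applies to the helper names `trial_savings`, `train`, `classify` and `schedule`.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional

REF_SIZE = 2  # hypothetical cost (bytes) of sending a reference instead of a repeated chunk


@dataclass
class ByteCachingModule:
    """Hypothetical byte caching module: fixed-size chunks plus a digest cache.

    chunk_size stands in for the "different configuration" of each module.
    """
    name: str
    chunk_size: int
    seen: set = field(default_factory=set)

    def deduplicate(self, data: bytes) -> int:
        """Cache unseen chunks; return net bytes saved by referencing repeats."""
        saved = 0
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).digest()
            if digest in self.seen:
                # A reference only pays off when the chunk is longer than REF_SIZE.
                saved += max(len(chunk) - REF_SIZE, 0)
            else:
                self.seen.add(digest)
        return saved


def trial_savings(module: ByteCachingModule, streams: List[bytes]) -> int:
    """Replay historical streams through a fresh copy so live caches stay cold."""
    probe = ByteCachingModule(module.name, module.chunk_size)
    return sum(probe.deduplicate(s) for s in streams)


def train(modules: List[ByteCachingModule],
          history: Dict[str, List[bytes]]) -> Dict[str, str]:
    """Trainer: map each stream classification to the module that saves the most."""
    table = {}
    for cls, streams in history.items():
        best = max(modules, key=lambda m: trial_savings(m, streams))
        table[cls] = best.name
    return table


def classify(request: dict) -> str:
    """Hypothetical classifier: use the request's declared content type."""
    return request["content_type"]


def schedule(table: Dict[str, str], modules: Dict[str, ByteCachingModule],
             request: dict) -> Optional[ByteCachingModule]:
    """Scheduler: return the module trained for this classification, if any."""
    name = table.get(classify(request))
    return modules[name] if name is not None else None


if __name__ == "__main__":
    small, large = ByteCachingModule("small", 4), ByteCachingModule("large", 64)
    history = {"logs": [b"abcd" * 16], "images": [bytes(range(64))] * 2}
    table = train([small, large], history)
    print(table)  # {'logs': 'small', 'images': 'large'}
    module = schedule(table, {"small": small, "large": large}, {"content_type": "logs"})
    print(module.name, module.deduplicate(b"abcd" * 16))  # small 30
```

In this toy history, short repeated patterns favour the 4-byte module, and whole repeated blobs favour the 64-byte module. Replaying each classification through a fresh probe keeps training from warming the live module caches. Ties in `max` go to the module listed first.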

Claims

Full claim text: 18 claims, with independent claims 1 (method), 7 (system) and 13 (computer program product).

We claim:

1. A method for dynamic caching module selection for optimized data deduplication, the method comprising: processing historically relevant byte streams in each of a multiplicity of byte caching modules to populate a table of associations between different classifications of the historically relevant byte streams and correspondingly optimal ones of the multiplicity of the byte caching modules; receiving a request to retrieve data from a data source; classifying the request; consulting the table to identify, from amongst the multiplicity of byte caching modules, a particular byte caching module associated with the classification of the request; and, deduplicating the data in the identified byte caching module.

2. The method of claim 1, further comprising: determining that an entry does not exist in the table for the classified request; dynamically classifying the request according to a statistical classification; determining from the table a particular one of the byte caching modules associated with the dynamic classification; and, deduplicating the data in the particular one of the byte caching modules.

3. The method of claim 2, wherein the statistical classification is a naïve Bayesian classifier.

4. The method of claim 1, further comprising updating the table with an indication of the identification of the particular byte caching module associated with the classification of the request.

5. The method of claim 2, further comprising dynamically allocating additional memory to accommodate the table and adding an entry to the table associating the dynamically classified byte stream with the particular one of the byte caching modules.

6. The method of claim 5, further comprising: statically allocating a fixed amount of memory to accommodate the table in memory; determining that not enough memory remains in the fixed amount of memory to accommodate the entry to be added to the table; and, responsive to the determination, evicting an entry in the table to accommodate the entry to be added to the table.

7. A data deduplication data processing system configured for dynamic caching module selection for optimized data deduplication, the system comprising: a server communicatively coupled to a data store and plurality of client computers over a computer communications network; middleware disposed between the server and the client computers and executing in memory of a host computer, the middleware comprising a multiplicity of byte caching modules, each byte caching module having a different configuration; a scheduler comprising program code executing in memory of a host computer and enabled to classify a request to retrieve data from the data store of the server, to identify, in a table, from amongst the different byte caching modules a particular byte caching module associated with the classification of the request, and to route the data for deduplication in the identified byte caching module; and, a trainer comprising program code executing in the memory of the host computer and enabled to process historically relevant byte streams in each of the different byte caching modules to populate a table of associations between different classifications of the historically relevant byte streams and correspondingly optimal ones of the byte caching modules.

8. The system of claim 7, wherein the program code of the scheduler determines that an entry does not exist in the table for the classified request, dynamically classifies the request according to a statistical classification and determines from the table a particular one of the byte caching modules associated with the dynamic classification and deduplicates the data in the particular one of the byte caching modules.

9. The system of claim 8, wherein the statistical classification is a naïve Bayesian classifier.

10. The system of claim 7, wherein the program code of the scheduler updates the table with an indication of the identification of the particular byte caching module associated with the classification of the request.

11. The system of claim 8, wherein the program code of the scheduler dynamically allocates additional memory to accommodate the table and adds an entry to the table associating the dynamically classified byte stream with the particular one of the byte caching modules.

12. The system of claim 11, wherein the program code of the scheduler statically allocates a fixed amount of memory to accommodate the table in memory, determines that not enough memory remains in the fixed amount of memory to accommodate the entry to be added to the table and, in response to the determination, evicts an entry in the table to accommodate the entry to be added to the table.

13. A computer program product for dynamic caching module selection for optimized data deduplication, the computer program product comprising a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions executable by a device to cause the device to perform a method comprising: processing, by the device, historically relevant byte streams in each of a multiplicity of byte caching modules to populate a table of associations between different classifications of the historically relevant byte streams and correspondingly optimal ones of the multiplicity of the byte caching modules; receiving, by the device, a request to retrieve data from a data source; classifying, by the device, the request; consulting the table, by the device, to identify, from amongst the multiplicity of byte caching modules, a particular byte caching module associated with the classification of the request; and, deduplicating the data, by the device, in the identified byte caching module.

14. The computer program product of claim 13, further comprising: determining, by the device, that an entry does not exist in the table for the classified request; dynamically classifying the request, by the device, according to a statistical classification; determining, by the device, from the table a particular one of the byte caching modules associated with the dynamic classification; and, deduplicating, by the device, the data in the particular one of the byte caching modules.

15. The computer program product of claim 14, wherein the statistical classification is a naïve Bayesian classifier.

16. The computer program product of claim 13, further comprising updating, by the device, the table with an indication of the identification of the particular byte caching module associated with the classification of the request.

17. The computer program product of claim 14, further comprising adding, by the device, an entry to the table associating the dynamically classified byte stream with the particular one of the byte caching modules.

18. The computer program product of claim 17, further comprising: statically allocating a fixed amount of memory, by the device, to accommodate the table in memory; determining, by the device, that not enough memory remains in the fixed amount of memory to accommodate the entry to be added to the table; and, responsive to the determination, by the device, evicting an entry in the table to accommodate the entry to be added to the table.
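Claims 2, 3 and 6, together with their system and program-product counterparts, add a fallback for requests whose classification has no table entry. A naïve Bayesian classifier maps the request onto a known classification, and the table supplies that classification's module. The new association is then stored in a fixed-size table that evicts an entry when it is full. The sketch below is illustrative only. It assumes token features (for example, words taken from a content type or URL), multinomial naive Bayes with Laplace smoothing, a capacity counted in entries rather than bytes, and least-recently-used eviction. The claims specify none of these. `NaiveBayes`, `BoundedTable` and `select_module` are hypothetical names. The dynamic memory-growth variant of claim 5 is not shown.

```python
import math
from collections import Counter, OrderedDict, defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


class NaiveBayes:
    """Multinomial naive Bayes over token features with Laplace smoothing."""

    def __init__(self) -> None:
        self.class_counts: Counter = Counter()
        self.token_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab: set = set()

    def fit(self, examples: Iterable[Tuple[List[str], str]]) -> None:
        for tokens, label in examples:
            self.class_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)

    def predict(self, tokens: List[str]) -> Optional[str]:
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            denom = sum(self.token_counts[label].values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            for t in tokens:
                if t in self.vocab:  # ignore tokens never seen in training
                    score += math.log((self.token_counts[label][t] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best


class BoundedTable:
    """Fixed-capacity key -> module-name table; evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as recently used
        return self.entries[key]

    def put(self, key: str, module_name: str) -> None:
        if key in self.entries:
            self.entries.move_to_end(key)
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # table full: evict the oldest entry
        self.entries[key] = module_name


def select_module(table: BoundedTable, nb: NaiveBayes,
                  key: str, tokens: List[str]) -> Optional[str]:
    """Look up the request's key; on a miss, fall back to naive Bayes."""
    name = table.get(key)
    if name is not None:
        return name
    label = nb.predict(tokens)  # dynamic, statistical classification
    name = table.get(label) if label is not None else None
    if name is not None:
        table.put(key, name)  # store the new association
    return name


if __name__ == "__main__":
    table = BoundedTable(capacity=3)
    table.put("text/html", "small")
    table.put("image/png", "large")
    nb = NaiveBayes()
    nb.fit([(["text", "html", "markup"], "text/html"),
            (["text", "css", "markup"], "text/html"),
            (["image", "png", "binary"], "image/png"),
            (["image", "jpeg", "binary"], "image/png")])
    print(select_module(table, nb, "text/xml", ["text", "xml", "markup"]))    # small
    print(select_module(table, nb, "image/gif", ["image", "gif", "binary"]))  # large
    print(list(table.entries))  # ['text/xml', 'image/png', 'image/gif']
```

With a capacity of three entries, the second fallback evicts `text/html`, which is the least recently touched entry. Because trained classifications share the table with learned ones, they can be evicted too. Pinning them is one possible refinement that the claims neither require nor exclude.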

Assignees

IBM

Inventors

Classifications

  • Plural cache memories · CPC title

  • Allocation or management of cache space · CPC title

  • Audio in a user interface, e.g. using voice commands for navigating, audio feedback · CPC title

  • De-duplication techniques · CPC title

  • G06F3/0608 (primary)

    Saving storage space on storage systems · CPC title

Patent family

Related publications grouped by family.


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9690711B2 cover?
Embodiments of the present invention provide a method, system and computer program product for dynamic caching module selection for optimized data deduplication. In an embodiment of the invention, a method for dynamic caching module selection for optimized data deduplication is provided. The method includes processing historically relevant byte streams in each of a multiplicity of byte caching …
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F3/0608. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 27, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).