Using reinforcement learning to dynamically tune cache policy parameters

US11403525B2 · US · B2

Patent metadata
Publication number: US-11403525-B2
Application number: US-202016889104-A
Country: US
Kind code: B2
Filing date: Jun 1, 2020
Priority date: Jun 1, 2020
Publication date: Aug 2, 2022
Grant date: Aug 2, 2022

Abstract

Reinforcement learning is used to dynamically tune cache policy parameters. The current state of a workload on a cache is provided to a reinforcement learning process. The reinforcement learning process uses the cache workload characterization to select an action to be taken to adjust a value of one of multiple parameterized cache policies used to control operation of a cache. The adjusted value is applied to the cache for an upcoming time interval. At the end of the time interval, a reward associated with the action is determined, which may be computed by comparing the cache hit rate during the interval with a baseline hit rate. The process iterates until the end of an episode, at which point the parameters of the cache control policies are reset. The episode is used to train the reinforcement learning policy so that the reinforcement learning process converges to a trained state.
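The control loop described in the abstract (observe a state, pick an action that adjusts a policy parameter, apply it for one interval, score the interval against a baseline hit rate, reset at episode end) can be sketched as below. This is a minimal illustration under loudly stated assumptions, not the patented implementation: the toy LRU cache with sequential look-ahead prefetch, the synthetic trace, the three-action space, the use of a tabular Q-table in place of the deep neural network named in claim 9, and the reduction of the state to the current look-ahead value (instead of a workload characterization) are all hypothetical simplifications. Names such as `simulate_interval`, `make_trace`, and `run_episode` are invented for this example.

```python
import random
from collections import OrderedDict

ACTIONS = (-2, 0, 2)              # hypothetical: shrink, keep, or grow the look-ahead window
DEFAULT_LOOKAHEAD, MAX_LOOKAHEAD = 0, 16


def simulate_interval(trace, capacity, lookahead):
    """Replay one interval through a toy LRU cache that, on each miss,
    also prefetches the next `lookahead` sequential blocks. Returns the hit rate."""
    cache = OrderedDict()
    hits = 0

    def insert(block):
        cache[block] = True
        cache.move_to_end(block)
        if len(cache) > capacity:
            cache.popitem(last=False)      # evict the least recently used block

    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)
        else:
            insert(block)
            for nxt in range(block + 1, block + 1 + lookahead):
                if nxt not in cache:
                    insert(nxt)
    return hits / len(trace)


def make_trace(rng, length=300, max_block=1000):
    """Synthetic workload: mostly sequential reads with occasional random jumps."""
    trace, block = [], rng.randrange(max_block)
    for _ in range(length):
        block = block + 1 if rng.random() < 0.8 else rng.randrange(max_block)
        trace.append(block)
    return trace


def run_episode(q, rng, steps=20, capacity=32, eps=0.1, alpha=0.5, gamma=0.9):
    """One training episode of tabular Q-learning (a stand-in for the DNN of claim 9)."""
    lookahead = DEFAULT_LOOKAHEAD          # policy parameters are reset at the start of each episode
    total_reward = 0.0
    for _ in range(steps):
        state = lookahead                  # simplification: state = current parameter value
        if rng.random() < eps:             # epsilon-greedy exploration
            a = rng.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda i: q.get((state, i), 0.0))
        lookahead = min(MAX_LOOKAHEAD, max(0, lookahead + ACTIONS[a]))
        trace = make_trace(rng)            # accesses observed in the next time interval
        hit_rate = simulate_interval(trace, capacity, lookahead)
        baseline = simulate_interval(trace, capacity, DEFAULT_LOOKAHEAD)
        reward = hit_rate - baseline       # difference between hit rate and baseline
        best_next = max(q.get((lookahead, i), 0.0) for i in range(len(ACTIONS)))
        old = q.get((state, a), 0.0)
        q[(state, a)] = old + alpha * (reward + gamma * best_next - old)
        total_reward += reward
    return total_reward


if __name__ == "__main__":
    rng, q = random.Random(0), {}
    for _ in range(30):
        run_episode(q, rng)
    best = max(range(len(ACTIONS)), key=lambda i: q.get((0, i), 0.0))
    print("preferred first adjustment from a zero window:", ACTIONS[best])
```

The baseline here is the hit rate the same interval would have achieved with the default parameter, so a positive reward means the adjusted value beat the untuned cache on identical traffic; the abstract does not say how the baseline is obtained, so this choice is an assumption.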

First claim

What is claimed is:

1. A non-transitory tangible computer readable storage medium having stored thereon a computer program for using reinforcement learning to dynamically tune cache policy parameters, the computer program including a set of instructions which, when executed by a computer, cause the computer to perform a method comprising the steps of: creating a structured state index characterizing the current state of a workload on a cache; providing the structured state index to a reinforcement learning process; using, by the reinforcement learning process, the structured state index to select an action to be taken to adjust a value of a parameterized cache policy used to control operation of a cache; using the adjusted value of the parameterized cache policy for a time interval; determining a reward associated with the action; and iterating the steps of providing the structured state index to the reinforcement process, using the structured state index to select the action, using the adjusted value, and determining the reward to create a training episode for the reinforcement learning process.

2. The non-transitory tangible computer readable storage medium of claim 1, wherein the step of using, by the reinforcement learning process, the structured state index to select the action to be taken to adjust the value of the parameterized cache policy used to control operation of a cache comprises selecting one of at least two parameterized cache policies and selecting the action to be taken by adjusting the value of the selected one of the parameterized cache policies.

3. The non-transitory tangible computer readable storage medium of claim 2, wherein the step of using the structured state index to select the action comprises adjusting only one parameter of one of the parameterized cache policies during each iteration.

4. The non-transitory tangible computer readable storage medium of claim 2, wherein a first of the parameterized cache policies is a prefetch policy specifying a size of a look-ahead window for prefetching blocks of data to the cache, and a second of the parameterized cache policies is a segmentation policy specifying a ratio of the cache that is used for probatory and protected cache items.

5. The non-transitory tangible computer readable storage medium of claim 1, wherein the step of determining the reward associated with the action comprises comparing a cache hit rate with a baseline cache hit rate.

6. The non-transitory tangible computer readable storage medium of claim 5, wherein the reward is computed as a difference between the cache hit rate and the baseline cache hit rate.

7. The non-transitory tangible computer readable storage medium of claim 5, wherein the reward is computed as a ratio between the cache hit rate and the baseline cache hit rate.

8. The non-transitory tangible computer readable storage medium of claim 1, wherein the structured state index is formed as a vector, in which each element of the vector is formed from a number of accesses in a contiguous region of storage over a predetermined previous window of time.

9. The non-transitory tangible computer readable storage medium of claim 1, wherein the reinforcement learning process is implemented using a Deep Neural Network.

10. The non-transitory tangible computer readable storage medium of claim 9, wherein the reinforcement learning process is a Q-learning process.

11. The non-transitory tangible computer readable storage medium of claim 1, further comprising resetting parameters of the parameterized cache policies at the end of each episode.

12. A method of using reinforcement learning to dynamically tune cache policy parameters, the method comprising the steps of: creating a structured state index characterizing the current state of a workload on a cache; providing the structured state index to a reinforcement learning process; using, by the reinforcement learning process, the structured state index to select an action to be taken to adjust a value of a parameterized cache policy used to control operation of a cache; using the adjusted value of the parameterized cache policy for a time interval; determining a reward associated with the action; and iterating the steps of providing the structured state index to the reinforcement process, using the structured state index to select the action, using the adjusted value, and determining the reward to create a training episode for the reinforcement learning process.

13. The method of claim 12, wherein the step of using, by the reinforcement learning process, the structured state index to select the action to be taken to adjust the value of the parameterized cache policy used to control operation of a cache comprises selecting one of at least two parameterized cache policies and selecting the action to be taken by adjusting the value of the selected one of the parameterized cache policies.

14. The method of claim 13, wherein the step of using the structured state index to select the action comprises adjusting only one parameter of one of the parameterized cache policies during each iteration.

15. The method of claim 13, wherein a first of the parameterized cache policies is a prefetch policy specifying a size of a look-ahead window for prefetching blocks of data to the cache, and a second of the parameterized cache policies is a segmentation policy specifying a ratio of the cache that is used for probatory and protected cache items.

16. The method of claim 12, wherein the step of determining the reward associated with the action comprises comparing a cache hit rate with a baseline cache hit rate, and computing either a difference between the cache hit rate and the baseline cache hit rate or a ratio between the cache hit rate and the baseline cache hit rate.

17. The method of claim 12, wherein the structured state index is formed as a vector, in which each element of the vector is formed from a number of accesses in a contiguous region of storage over a predetermined previous window of time.

18. The method of claim 12, wherein the reinforcement learning process is implemented using a Deep Neural Network.

19. The method of claim 18, wherein the reinforcement learning process is a Q-learning process.

20. The method of claim 12, further comprising resetting parameters of the parameterized cache policies at the end of each episode.
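Three elements of the dependent claims lend themselves to a short sketch: the structured state index of claims 8 and 17 (a vector whose elements count accesses per contiguous storage region over a trailing time window), the action space of claims 2 to 4 (select one of two parameterized policies and adjust exactly one parameter per iteration), and the difference or ratio reward of claims 6, 7 and 16. The region size, window length, step sizes, parameter bounds, zero-baseline handling, and all names below (`state_index`, `CachePolicies`, `apply_action`, `reward`) are hypothetical choices for illustration; the claims do not specify them.

```python
from dataclasses import dataclass, replace


def state_index(accesses, now, window, region_size, num_regions):
    """Structured state index in the style of claims 8 and 17.

    `accesses` is an iterable of (timestamp, block) pairs. Element i of the
    returned vector counts accesses to the contiguous block range
    [i * region_size, (i + 1) * region_size) during [now - window, now).
    """
    vec = [0] * num_regions
    for timestamp, block in accesses:
        if now - window <= timestamp < now:
            region = block // region_size
            if 0 <= region < num_regions:   # ignore blocks outside the tracked range
                vec[region] += 1
    return vec


@dataclass(frozen=True)
class CachePolicies:
    """Hypothetical parameter set for the two policies named in claim 4."""
    lookahead: int = 4             # prefetch policy: look-ahead window in blocks
    protected_ratio: float = 0.5   # segmentation policy: protected share (rest is probationary)


# Each action adjusts exactly one parameter of one policy (claims 2 and 3).
ACTIONS = [
    ("lookahead", -1), ("lookahead", +1),
    ("protected_ratio", -0.1), ("protected_ratio", +0.1),
]


def apply_action(policies, action_id):
    """Return a new CachePolicies with a single parameter nudged and clamped."""
    name, delta = ACTIONS[action_id]
    if name == "lookahead":
        return replace(policies, lookahead=max(0, min(32, policies.lookahead + delta)))
    ratio = round(max(0.1, min(0.9, policies.protected_ratio + delta)), 2)
    return replace(policies, protected_ratio=ratio)


def reward(hit_rate, baseline_hit_rate, mode="difference"):
    """Compare an interval's hit rate with a baseline (claims 5 to 7)."""
    if mode == "difference":
        return hit_rate - baseline_hit_rate
    if mode == "ratio":
        # Assumption: a zero baseline is scored as neutral (1.0) to avoid division by zero.
        return hit_rate / baseline_hit_rate if baseline_hit_rate > 0 else 1.0
    raise ValueError(f"unknown reward mode: {mode!r}")


accesses = [(0, 3), (1, 5), (2, 17), (3, 40), (9, 2)]
print(state_index(accesses, now=10, window=8, region_size=16, num_regions=4))  # [1, 1, 1, 0]
print(apply_action(CachePolicies(), 3))  # CachePolicies(lookahead=4, protected_ratio=0.6)
```

Restricting each action to a single parameter keeps the action space small (here four discrete moves) and makes the reward for an interval attributable to one change, which is one plausible reading of why claims 3 and 14 limit each iteration to adjusting only one parameter.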

Assignees

Emc Ip Holding Co Llc, Dell Products Lp

Inventors

Classifications

  • Probabilistic graphical models, e.g. probabilistic networks

  • Combinations of networks

  • Feedforward networks

  • Reinforcement learning

  • based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Frequently asked questions

What does patent US11403525B2 cover?
It covers using reinforcement learning to dynamically tune cache policy parameters, such as a prefetch look-ahead window and a protected/probationary segmentation ratio, based on a structured index of the current cache workload. The Abstract above gives the full summary, and the First claim section gives the legal scope.
Who is the assignee on this patent?
Emc Ip Holding Co Llc, Dell Products Lp
What technology area does this patent fall under?
Primary CPC classification G06F16/172. Mapped technology areas include Physics.
When was this patent published?
Published Aug 2, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.