Auto-tuned write-optimized key-value store

US11093450B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-11093450-B2
Application numberUS-201715717704-A
CountryUS
Kind codeB2
Filing dateSep 27, 2017
Priority dateSep 27, 2017
Publication dateAug 17, 2021
Grant dateAug 17, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A B ε -tree associated with a file system on a storage volume includes a hierarchy of nodes. Each node includes a buffer portion to store key-value pairs as messages in the buffer. Each node can be characterized by having a maximum allowable size that is periodically updated at run time. The buffers in the nodes of the B ε -tree are therefore characterized by having a maximum allowed size that can vary over time.

First claim

Opening claim text (preview).

The invention claimed is: 1. A method for a file system on a storage volume, the method comprising: receiving modify operations and query operations on a tree data structure of the file system, the tree data structure comprising a plurality of nodes organized in a hierarchy, each node having a buffer comprising one or more disk blocks allocated from the storage volume for storing data, the size of each node varying depending on the amount of data stored in its buffer; counting received modify operations and received query operations for a predetermined period of time to generate a count of received modify operations and a count of received query operations; updating a value of an expected node size parameter (B exp ) to generate an updated value of B exp , including computing the value of B exp based on the count of received modify operations and the count of received query operations received during the predetermined period of time; processing a first modify operation by inserting write data associated with the first modify operation to the buffer in a root node of the tree data structure; triggering a cascade of data movement to one or more children nodes descending from the root node by flushing data in the buffer of the root node to a child node of the root node when an amount of data stored in the buffer of the root node exceeds a threshold value that is computed using the updated value of B exp ; and enforcing the updated value of B exp on any children nodes visited during the cascade of data movement including flushing at least some data in the buffer of a visited child node when the size of the visited child node exceeds the updated value of B exp . 2. The method of claim 1 , further comprising deallocating any disk blocks comprising the buffer in the visited child node that become unused as a result of flushing the data. 3. The method of claim 1 , wherein enforcing B exp on any children nodes visited during the cascade of data movement further includes growing the buffer of a visited child to store flushed data received from a parent node so long as the size of the visited child remains less than the updated value of B exp , including allocating one or more disk blocks from the storage volume to the buffer. 4. The method of claim 1 , further comprising aging the number of modify operations and the number of query operations using an aging factor. 5. The method of claim 1 , wherein updating the value of B exp is based on a cost metric that is computed as a function of a number of modify operations and query operations. 6. The method of claim 1 , further comprising computing a cost metric for a plurality of candidate node sizes based on the updated value of B exp and updating the value of B exp using the candidate node size associated with the lowest cost metric. 7. The method of claim 6 , wherein the plurality of candidate node sizes include B exp +Δ, B exp , and B exp −Δ. 8. The method of claim 1 , further comprising deallocating any disk blocks comprising the buffer of the root node that become unused as a result of flushing the data. 9. A non-transitory computer-readable storage medium having stored thereon computer executable instructions, which when executed by a computer device, cause the computer device to: receive modify operations and query operations on a tree data structure of the file system, the tree data structure comprising a plurality of nodes organized in a hierarchy, each node having a buffer comprising one or more disk blocks allocated from the storage volume for storing data, the size of each node varying depending on the amount of data stored in its buffer; count received modify operations and received query operations for a predetermined period of time to generate a count of received modify operations and a count of received query operations; update a value of an expected node size parameter (B exp ) to generate an updated value of B exp , including computing the value of B exp based on the count of received modify operations and the count of received query operations received during the predetermined period of time; process a first modify operation by inserting write data associated with the first modify operation to the buffer in a root node of the tree data structure; trigger a cascade of data movement to one or more children nodes descending from the root node by flushing data in the buffer of the root node to a child node of the root node when an amount of data stored in the buffer of the root node exceeds a threshold value that is computed using the updated value of B exp ; and enforce the updated value of B exp on any children nodes visited during the cascade of data movement including flushing at least some data in the buffer of a visited child node when the size of the visited child node exceeds the updated value of B exp . 10. The non-transitory computer-readable storage medium of claim 9 , wherein the computer executable instructions, which when executed by the computer device, further cause the computer device to deallocate any disk blocks comprising the buffer in the visited child node that become unused as a result of flushing the data. 11. The non-transitory computer-readable storage medium of claim 9 , wherein the computer executable instructions, which when executed by the computer device, further cause the computer device to age the number of modify operations and the number of query operations using an aging factor. 12. The non-transitory computer-readable storage medium of claim 9 , wherein updating the value of B exp is based on a cost metric that is computed as a function of a number of modify operations and query operations. 13. The non-transitory computer-readable storage medium of claim 9 , wherein the computer executable instructions, which when executed by the computer device, further cause the computer device to compute a cost metric for a plurality of candidate node sizes based on the updated value of B exp and updating the value of B exp using the candidate node size associated with the lowest cost metric. 14. An apparatus comprising: one or more computer processors; and a computer-readable storage medium comprising instructions for controlling the one or more computer processors to be operable to: receive modify operations and query operations on a tree data structure of the file system, the tree data structure comprising a plurality of nodes organized in a hierarchy, each node having a buffer comprising one or more disk blocks allocated from the storage volume for storing data, the size of each node varying depending on the amount of data stored in its buffer; count received modify operations and received query operations for a predetermined period of time to generate a count of received modify operations and a count of received query operations; update a value of an expected node size parameter (B exp ) to generate an updated value of B exp , including computing the value of B exp based on the count of received modify operations and the count of received query operations received during the predetermined period of time; process a first modify operation by inserting write data associated with the first modify operation to the buffer in a root node of the tree data structure; trigger a cascade of data movement to one or more children nodes descending from the root node by flushing data in the buffer of the root node to a child node of the root node when an amount of data stored in the buffer of the root node exceeds a threshold value that is computed using the updated value of B exp ; and enforce the updated value of B exp on any children nodes visited dur

Assignees

Inventors

Classifications

  • Migration mechanisms · CPC title

  • Data buffering arrangements · CPC title

  • Details of free space management performed by the file system (saving storage space on storage systems G06F3/0608; management of blocks in storage devices G06F3/064) · CPC title

  • Management of blocks · CPC title

  • Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11093450B2 cover?
A B ε -tree associated with a file system on a storage volume includes a hierarchy of nodes. Each node includes a buffer portion to store key-value pairs as messages in the buffer. Each node can be characterized by having a maximum allowable size that is periodically updated at run time. The buffers in the nodes of the B ε -tree are therefore characterized by having a maximum allowed size that …
Who is the assignee on this patent?
Vmware Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/1727. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Aug 17 2021 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).