Using an LSM tree file structure for the on-disk format of an object storage platform

US11093472B2 · US · B2

Patent metadata
Publication number: US-11093472-B2
Application number: US-201816213714-A
Country: US
Kind code: B2
Filing date: Dec 7, 2018
Priority date: Dec 7, 2018
Publication date: Aug 17, 2021
Grant date: Aug 17, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The disclosure herein describes providing and accessing data on an object storage platform using a log-structured merge (LSM) tree file system. The LSM tree file system on the object storage platform includes sorted data tables, each sorted data table including a payload portion and an index portion. Data is written to the LSM tree file system in at least one new sorted data table. Data is read by identifying a data location of the data based on index portions of the sorted data tables and reading the data from a sorted data table associated with the identified data location. The use of the LSM tree file system on the object storage platform provides an efficient means for interacting with the data stored thereon.
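The write and read paths summarized in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `ObjectStore` class, the JSON encoding, the `.payload` / `.index` object names, and the newest-table-first search order are all assumptions made for illustration. Storing the two portions as separate objects mirrors claim 2.

```python
import json


class ObjectStore:
    """Hypothetical in-memory stand-in for an object storage platform."""

    def __init__(self):
        self.objects = {}

    def put(self, name, data):
        self.objects[name] = data

    def get(self, name):
        return self.objects[name]

    def get_range(self, name, offset, length):
        # Ranged read: fetch only the bytes of one record.
        return self.objects[name][offset:offset + length]


def write_sorted_table(store, table_id, data):
    """Write one new sorted data table: a payload object plus an index object."""
    payload = bytearray()
    index = {}
    for key in sorted(data):  # key-value tuples are stored in key order
        record = json.dumps([key, data[key]]).encode("utf-8")
        index[key] = (len(payload), len(record))  # key -> location in payload
        payload += record
    store.put(f"{table_id}.payload", bytes(payload))
    store.put(f"{table_id}.index", json.dumps(index).encode("utf-8"))


def read_key(store, table_ids, key):
    """Locate a key via the index portions, then read it from the payload."""
    for table_id in reversed(table_ids):  # assumption: newest table wins
        index = json.loads(store.get(f"{table_id}.index"))
        if key in index:
            offset, length = index[key]
            record = store.get_range(f"{table_id}.payload", offset, length)
            return json.loads(record)[1]
    return None


store = ObjectStore()
write_sorted_table(store, "t1", {"b": 2, "a": 1})
write_sorted_table(store, "t2", {"a": 10})  # later write shadows "a" in t1
newest_a = read_key(store, ["t1", "t2"], "a")
older_b = read_key(store, ["t1", "t2"], "b")
missing = read_key(store, ["t1", "t2"], "z")
```

Because each write produces a new table rather than modifying an existing one, the sketch never overwrites a stored object, which suits object stores that favor whole-object writes.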

First claim

Opening claim text (preview).

What is claimed is:

1. A computerized method for accessing data on an object storage platform, the method comprising: accessing, by a processor, a log-structure merge (LSM) tree file system on the object storage platform, the LSM tree file system including a plurality of sorted data tables, each sorted data table including a payload portion configured for storing sorted key-value data tuples and an index portion for storing keys of the key-value data tuples mapped to locations of associated key-value data tuples in the payload portion, the LSM tree file system further including a catalog file configured for storing table identifiers mapped to key ranges of the plurality of sorted data tables; based on receiving a write instruction associated with a first data set, writing, by the processor, the first data set to the LSM tree file system in at least one new sorted data table and updating the catalog file to include a table identifier of the at least one new sorted data table mapped to a key range of the at least one new sorted data table; and based on receiving a read data instruction associated with a second data set, identifying, by the processor, a data location associated with the second data set based on a table identifier mapped to key ranges of the second data set in the catalog file and on index portions of the sorted data table with which the table identifier is associated and reading the second data set from a sorted data table associated with the identified data location.

2. The computerized method of claim 1, wherein, for each sorted data table, the index portion is stored on a first object of the object storage platform and the payload portion is stored as a second object of the object storage platform.

3. The computerized method of claim 1, wherein the object storage platform is installed on at least one server device connected to a network; the processor is installed on a client device connected to the network; and wherein the client device is configured to at least one of read data from and write data to the object storage platform using the LSM tree file system via the network.

4. The computerized method of claim 3, further comprising: storing, by the processor, index portion copies of the index portions of the plurality of sorted data tables and a catalog file copy of the catalog file on the client device.

5. The computerized method of claim 4, wherein identifying the data location includes identifying the data location associated with the second data set based on the stored index portion copies and the catalog file copy; and wherein reading the second data set includes requesting the second data set from the server device via the network based on the identified data location.

6. The computerized method of claim 3, wherein writing the first data set to the LSM tree file system includes: buffering the first data set in a write data cache on the client device; and based on at least one of a cached capacity threshold being exceeded and a cache time interval expiring, sending the data in the write data cache to the server device via the network.

7. The computerized method of claim 4, wherein the LSM tree file system is compacted on the server device based on writing the first data set to the LSM tree file system; and wherein the computerized method further comprises: receiving, by the processor from the server device, an updated catalog file copy and updated index portion copies of the LSM tree file system, wherein the updated catalog file copy and updated index portion copies are configured to reflect a structure of the LSM tree file system after being compacted; and storing, by the processor, the updated catalog file copy and updated index portion copies for use in identifying data locations in the LSM tree file system based on received read data instructions.

8. A system for storing data on an object storage platform, the system comprising: the object storage platform; a file system installed on the object storage platform, the file system configured as a log-structured merge (LSM) tree data structure; a processor; a non-transitory computer readable medium having stored thereon program code for accessing data on the object storage platform, the program code causing the processor to: access the LSM tree file system on the object storage platform, the LSM tree file system including a plurality of sorted data tables, each sorted data table including a payload portion configured for storing sorted key-value data tuples and an index portion for storing keys of the key-value data tuples mapped to locations of associated key-value data tuples in the payload portion, the LSM tree file system further including a catalog file configured for storing table identifiers mapped to key ranges of the plurality of sorted data tables; based on receiving a write instruction associated with a first data set, write the first data set to the LSM tree file system in at least one new sorted data table and update the catalog file to include a table identifier of the at least one new sorted data table mapped to a key range of the at least one new sorted data table; and based on receiving a read data instruction associated with a second data set, identify a data location associated with the second data set based on a table identifier mapped to key ranges of the second data set in the catalog file and on index portions of the sorted data table with which the table identifier is associated and read the second data set from a sorted data table associated with the identified data location.

9. The system of claim 8, wherein, for each sorted data table, the index portion is stored on a first object of the object storage platform and the payload portion is stored as a second object of the object storage platform.

10. The system of claim 8, wherein the object storage platform is installed on at least one server device connected to a network; the system further comprises at least one client device connected to the network; and wherein the client device is configured to at least one of read data from and write data to the object storage platform using the file system via the network.

11. The system of claim 10, wherein the at least one client device is configured to store index portion copies of the index portions of the plurality of sorted data tables and a catalog file copy of the catalog file.

12. The system of claim 11, wherein reading data, by the at least one client device, from the object storage platform using the file system includes: identifying a data location associated with the data to be read in the file system based on the stored index portion copies and the catalog file copy; and requesting the data to be read from the server device via the network based on the identified data location.

13. The system of claim 10, wherein writing the first data set to the LSM tree file system includes: buffering the first data set in a write data cache on the client device; and based on at least one of a cached capacity threshold being exceeded and a cache time interval expiring, sending the data in the write data cache to the server device via the network.

14. The system of claim 11, wherein the LSM tree file system is compacted on the server device based on writing the first data set to the LSM tree file system, and wherein the program code further causes the processor to: receive, by the client device from the server device, an updated catalog file copy and updated index portion copies of the LSM tree file system, wherein the updated catalog file copy and updated index portion copies are configured to reflect a struct
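Two mechanisms in the claims lend themselves to a short illustration: the catalog file that maps table identifiers to key ranges (claims 1 and 8) and the client-side write data cache that is sent to the server when a capacity threshold is exceeded or a time interval expires (claims 6 and 13). The sketch below is only one possible reading. The `Catalog` and `WriteCache` classes, the `send_to_server` callback, the `table-N` naming, and the choice to check the time interval only when new data arrives are all assumptions, not details taken from the patent.

```python
import time


class Catalog:
    """Hypothetical catalog: table identifier -> (lowest key, highest key)."""

    def __init__(self):
        self.ranges = {}

    def add_table(self, table_id, keys):
        self.ranges[table_id] = (min(keys), max(keys))

    def candidates(self, key):
        # Only tables whose key range covers the key need an index lookup.
        return [t for t, (lo, hi) in self.ranges.items() if lo <= key <= hi]


class WriteCache:
    """Buffers writes; flushes when too full or when the interval has expired."""

    def __init__(self, flush_fn, capacity, interval_s, clock=time.monotonic):
        self.flush_fn = flush_fn
        self.capacity = capacity
        self.interval_s = interval_s
        self.clock = clock
        self.buffer = {}
        self.last_flush = clock()

    def put(self, key, value):
        self.buffer[key] = value
        too_full = len(self.buffer) > self.capacity
        # Simplification: expiry is only checked when a write arrives.
        too_old = self.clock() - self.last_flush >= self.interval_s
        if too_full or too_old:
            self.flush()

    def flush(self):
        if self.buffer:
            self.flush_fn(dict(sorted(self.buffer.items())))
        self.buffer = {}
        self.last_flush = self.clock()


catalog = Catalog()
tables = {}  # stand-in for sorted data tables held by the server


def send_to_server(batch):
    # Each flushed batch becomes one new sorted data table in the catalog.
    table_id = f"table-{len(tables) + 1}"
    tables[table_id] = batch
    catalog.add_table(table_id, batch.keys())


cache = WriteCache(send_to_server, capacity=2, interval_s=3600.0)
for key, value in [("k1", 1), ("k5", 5), ("k3", 3)]:
    cache.put(key, value)  # the third put exceeds capacity and flushes

covering = catalog.candidates("k4")
outside = catalog.candidates("k9")
```

A read would first ask the catalog which tables could hold the key and only then consult those tables' index portions, which keeps the number of index lookups small when key ranges do not overlap.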

Assignees

Inventors

Classifications

  • File access structures, e.g. distributed indices (arrangements of input from, or output to, record carriers G06F3/06) · CPC title

  • Trees, e.g. B+trees · CPC title

  • Management thereof · CPC title

  • Large Object storage; Management thereof · CPC title

  • Mapping to a database · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11093472B2 cover?
The disclosure herein describes providing and accessing data on an object storage platform using a log-structured merge (LSM) tree file system. The LSM tree file system on the object storage platform includes sorted data tables, each sorted data table including a payload portion and an index portion. Data is written to the LSM tree file system in at least one new sorted data table. Data is read…
Who is the assignee on this patent?
VMware Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/2246. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 17, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).