Cache for efficient record lookups in an lsm data structure

US2020320081A1 · US · A1

Patent metadata
FieldValue
Publication numberUS-2020320081-A1
Application numberUS-202016908006-A
CountryUS
Kind codeA1
Filing dateJun 22, 2020
Priority dateJan 30, 2018
Publication dateOct 8, 2020
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques are disclosed relating to maintaining a cache usable to locate data stored in a data structure. A computer system, in various embodiments, maintains a data structure having a plurality of levels that store files for a database. The files may include one or more records that each have a key and corresponding data. The computer system may also maintain a cache for the database whose entries store, for a key, an indication of a location of a corresponding record in a file of the data structure. In some embodiments, the computer system receives a request to access a particular record stored in the data structure where the request specifies a key usable to locate the particular record. The computer system may retrieve, from the cache via the key, a particular indication of a location of the particular record and may use the particular indication to access the particular record.

First claim

Opening claim text (preview).

What is claimed is: 1 . A method, comprising: maintaining, by a computer system, a log-structured merge-tree (LSM tree) that includes a plurality of levels storing files, wherein the files include one or more records having corresponding keys and data; maintaining, by the computer system, a cache whose entries store, for corresponding keys, indications of locations of corresponding records included in the LSM tree, wherein a particular entry stores, for a first key, a first indication of a location of a first record in a first level of the LSM tree; performing, by the computer system, a merge operation in which the first record is copied into a second level of the LSM tree; and in response to performing the merge operation, the computer system updating the particular entry to store a second indication of a location of the first record in the second level of the LSM tree. 2 . The method of claim 1 , further comprising: performing, by the computer system, an atomic store operation to store, in an entry of the cache for a second key, an alias name for a file having a second record corresponding to the second key, wherein the alias name is smaller in memory size than a file name of the file. 3 . The method of claim 2 , further comprising: performing, by the computer system, a record lookup involving the second key, wherein the performing includes: retrieving, from the cache, the alias name stored in the entry that corresponds to the second key; using the alias name, identifying the file name based on a mapping between alias names and file names; and accessing the file using the file name. 4 . The method of claim 1 , further comprising: writing, by the computer system, an additional file to the LSM tree, wherein the additional file includes a set of keys and corresponding data; and subsequent to writing the additional file, the computer system invalidating those entries of the cache that correspond to a key included in the set of keys of the additional file. 5 . The method of claim 1 , further comprising: accessing, by the computer system, a second record included in the LSM tree updating, by the computer system based on a key that is associated with the second record mapping to the particular entry, the particular entry to store a third indication of a location of the second record in the LSM tree. 6 . The method of claim 1 , further comprising: invalidating, by the computer system, the particular entry in response to determining that the first record is inaccessible through reading the LSM tree. 7 . The method of claim 1 , further comprising: prior to the particular entry storing the first indication, the computer system: determining that the particular entry does not store an indication of a location of the first record: performing a search of the LSM tree for the first record; and in response to locating the first record, storing the first indication in the particular entry. 8 . A non-transitory computer readable medium having program instructions stored thereon that are capable of causing a computer system to perform operations comprising: maintaining a log-structured merge-tree (LSM tree) that includes a plurality of levels storing files, wherein the files include one or more records having corresponding keys and data; maintaining a cache whose entries store, for corresponding keys, indications of locations of corresponding records included in the LSM tree, wherein a particular entry stores, for a first key, a first indication of a location of a first record in a first level of the LSM tree; performing a merge operation in which the first record is copied into a second level of the LSM tree; and in response to performing the merge operation, updating the particular entry of the cache to store a second indication of a location of the first record in the second level of the LSM tree. 9 . The medium of claim 8 , wherein the operations further comprise: receiving a request for a second record stored within the LSM tree in response to determining that the cache does not store an indication of a location of the second record: searching the LSM tree for the second record; and in response to locating the second record within the LSM tree, storing an indication in the cache that indicates where the second record is stored within the LSM tree. 10 . The medium of claim 8 , wherein the updating includes: performing a hash derivation function using the first key to derive a hash value indicative of the particular entry to be used for storing the second indication of the location of the first record in the second level; and storing, based on the hash value, the second indication in the particular entry. 11 . The medium of claim 10 , wherein the operations further comprise: overwriting, in the particular entry of the cache, the second indication with an indication of a location of a second, different record in response to a key associated with the second record hashing to the same hash value associated with the first record. 12 . The medium of claim 11 , wherein the operations further comprise: receiving a request for the first record; accessing, based on the particular entry that corresponds to the first key associated with the first record, a record from the LSM tree; and determining whether the accessed record corresponds to the first record; and in response to determining that the accessed record corresponds to the second record, performing a search of the LSM tree for the first record. 13 . The medium of claim 8 , wherein the updating includes: performing an atomic store operation to store, in a second particular entry of the cache, an alias name of a second record of the LSM tree; and performing a lookup of the second record, including: accessing the alias name from the second particular entry of the cache; and using the alias name, identifying a file name of the second record based on a mapping between alias names and file names. 14 . The medium of claim 8 , wherein the first record is associated with a second key, wherein the operations further comprise: performing a lookup of the first record included in the LSM tree, wherein performing the lookup includes: scanning an index structure using the second key to obtain the first key; and accessing, based on the particular entry and the first key, the first record from the LSM tree. 15 . A method, comprising: in response to locating a record included in a log-structured merge-tree (LSM tree), storing, in an entry of a cache by a computer system, information that identifies a location of the record in a first level of the LSM tree; performing, by the computer system, a merge operation in which the record is copied from the first level of the LSM tree to a second, different level of the LSM tree; and in response to performing the merge operation, the computer system updating the entry of the cache to store information that identifies a location of the record within the second level of the LSM tree. 16 . The method of claim 15 , wherein the record is associated with a primary database key and a secondary database key, and wherein the storing includes: using the secondary database key, the computer system selecting the entry from a plurality of entries of the cache for storing information that identifies a location of the record. 17 . The method of claim 15 , wherein information is stored in entries of the cache using atomic store instructions. 18 . The method of claim 15 , further comprising: in response t

Assignees

Inventors

Classifications

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2020320081A1 cover?
Techniques are disclosed relating to maintaining a cache usable to locate data stored in a data structure. A computer system, in various embodiments, maintains a data structure having a plurality of levels that store files for a database. The files may include one or more records that each have a key and corresponding data. The computer system may also maintain a cache for the database whose en…
Who is the assignee on this patent?
Salesforce Com Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/2246. Mapped technology areas include Physics.
When was this patent published?
Publication date Thu Oct 08 2020 00:00:00 GMT+0000 (Coordinated Universal Time) (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).