LSM cache

US9846711B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9846711-B2
Application numberUS-201213730698-A
CountryUS
Kind codeB2
Filing dateDec 28, 2012
Priority dateDec 28, 2012
Publication dateDec 19, 2017
Grant dateDec 19, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A variety of methods for improving efficiency in a database system are provided. In one embodiment, a method may comprise: generating multiple levels of data according to how recently the data have been updated, whereby most recently updated data are assigned to the newest level; storing each level of data in a specific storage tier; splitting data stored in a particular storage tier into two or more groups according to access statistics of each specific data; during compaction, storing data from different groups in separate data blocks of the particular storage tier; and when a particular data in a specific data block is requested, reading the specific data block into a low-latency storage tier.

First claim

Opening claim text (preview).

What is claimed is: 1. A computer-implemented method for storing data in a database, comprising: generating multiple levels of data according to how recently the data have been updated, whereby most recently updated data are assigned to a newest level; storing at least some of the levels of data in a specified storage tier of multiple storage tiers according to at least the latency of the storage tiers; splitting data stored in a specified storage level of the specified storage tier into multiple data groups according to access statistics of data, wherein a first data group of the data groups includes data whose access statistics satisfy a criterion and a second data group of the data groups includes data whose access statistics do not satisfy the criterion, the specified storage level including multiple data blocks, wherein each data block of a set of the data blocks includes at least some of the data from the first data group and at least some of the data from the second data group, the data groups ranked into the levels according to access frequencies, the splitting further including: storing data having the highest access frequency among the data groups to a particular data block in one of the levels that contains data with the highest access frequency; during compaction, generating a first data by extracting, from each data block of the set of the data blocks, data that belongs to the first data group, and storing the first data in one or more data blocks of the specified storage level, wherein a specified data block of the one or more data blocks includes a set of data records whose access statistics satisfy the criterion; receiving a request for a specified data record of the set of data records; reading all of the set of data records into a low-latency storage tier in response to the request for the specified record; and generating a log update specifying the first data extracted from the set of data blocks during compaction and appending the log update to a log file located in the specified storage level. 2. The computer-implemented method as recited in claim 1 , wherein the specified storage tier is a low-latency storage tier, an intermediate-latency storage tier or a high-latency storage tier. 3. The computer-implemented method as recited in claim 1 , further comprising: storing separate data to a particular data block in at least one of the tiers if the separate data are more likely to be accessed together according to statistical analysis. 4. The computer-implemented method as recited in claim 1 , further comprising: assigning an m-bit Bloom filter with k hash functions for each of a subset of the storage tiers, with all bits set to 0; when a new data is added in a given storage tier, getting k array positions by feeding the new data to each of the k hash functions; and setting the k array positions of the m-bit Boom filter to 1. 5. The computer-implemented method as recited in claim 1 , further comprising: assigning an m-bit Bloom filter with one hash function for each of a subset of the storage tiers, with all bits set to 0; when a specific data is determined to be “hot,” getting a specific array position by feeding the specific data to the hash function; and setting the specific array position of the m-bit Boom filter to 1. 6. The computer-implemented method as recited in claim 1 , wherein the data stored in the specified storage tier are split into the multiple data groups during a minor compaction or a major compaction. 7. The computer-implemented method as recited in claim 1 , wherein storage media in the specified storage tier include one or more of a RAM, a DRAM, a SRAM, a flash memory, a ROM, a PROM, an EPROM, an EEPROM, a FPGA, a hard disk, an optical disc, a magneto-optical disc, a floppy disk, a magnetic tape, a holographic storage medium, a solid-state drive and a secure digital card. 8. An apparatus, comprising: a computer system; and an application program instantiated on the computer system, wherein the application provides computer-generated output; wherein the computer system is configured to: generate multiple levels of data according to how recently the data have been updated, whereby most recently updated data are assigned to a newest level; store at least some of the levels of data in a specified storage tier of multiple storage tiers according to at least the latency of the storage tiers; split data stored in a specified storage level of the specified storage tier into multiple data groups according to access statistics of data, wherein a first data group of the data groups includes data whose access statistics satisfy a criterion and a second data group of the data groups includes data whose access statistics do not satisfy the criterion, the specified storage level including multiple data blocks, wherein each data block of a set of the data blocks includes at least some of the data from the first data group and at least some of the data from the second data group, the data groups ranked into the levels according to access frequencies, the computer system further configured to store data having the highest access frequency among the data groups to a particular data block in one of the levels that contains data with the highest access frequency; during compaction, extract, from each data block of the set of the data blocks, data that belongs to the first data group to generate a first data, and store the first data in one or more data blocks of the specified storage level, wherein a specified data block of the one or more data blocks includes a set of data records whose access statistics satisfy the criterion; receive a request for a specified data record of the set of data records; and read all of the set of data records into a low-latency storage tier of the storage tiers in response to the request for the specified record; and generate a log update specifying the first data extracted from the set of data blocks during compaction and appending the log update to a log file located in the specified storage level. 9. The apparatus as recited in claim 8 , wherein the specified storage tier is a low-latency storage tier, an intermediate-latency storage tier or a high-latency storage tier. 10. The apparatus as recited in claim 8 , wherein the computer system is configured to store separate data to a particular data block in one of the tiers if the separate data are more likely to be accessed together according to statistical analysis. 11. The apparatus as recited in claim 8 , wherein the computer system is configured to: assign an m-bit Bloom filter with k hash functions for at least a subset of the storage tiers, with all bits set to 0; when a new data is added in a given storage tier, get k array positions by feeding the new data to each of the k hash functions; and set the k array positions of the m-bit Boom filter to 1. 12. The apparatus as recited in claim 8 , wherein the computer system is configured to: assign an m-bit Bloom filter with one hash function for at least a subset of the storage tiers, with all bits set to 0; when a specific data is determined to be “hot,” getting a specific array position by feeding the specific data to the hash function; and setting the specific array position of the m-bit Boom filter to 1. 13. The apparatus as recited in claim 8 , wherein the data stored in the specified storage tier are split into the two or more data groups during a minor compaction or a major compaction. 14. The apparatus as recited in claim 8 , wherein storage media in the specified storage tier include one or more of a RAM, a DRAM, a SRAM, a flash me

Assignees

Inventors

Classifications

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9846711B2 cover?
A variety of methods for improving efficiency in a database system are provided. In one embodiment, a method may comprise: generating multiple levels of data according to how recently the data have been updated, whereby most recently updated data are assigned to the newest level; storing each level of data in a specific storage tier; splitting data stored in a particular storage tier into two o…
Who is the assignee on this patent?
Facebook Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/22. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Dec 19 2017 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).