Unified data management for database systems

US9892150B2 · US · B2

Patent metadata
Publication number: US-9892150-B2
Application number: US-201514816805-A
Country: US
Kind code: B2
Filing date: Aug 3, 2015
Priority date: Aug 3, 2015
Publication date: Feb 13, 2018
Grant date: Feb 13, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A database architecture includes at least an in-memory database and a disk-based database (also referred to as “hot” and “warm” data stores). In the database architecture, data can be partitioned (and re-partitioned) and/or moved within and among the in-memory and disk-based databases, based on query access patterns derived from received database queries. The partitions and inter-database movements can be based at least in part on clustered, dynamic data units that are defined using shared individual attribute values of data records, and updated based on the received queries.
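To make the hot/warm idea concrete, here is a minimal Python sketch that assigns data units to a capacity-limited in-memory ("hot") store by query access frequency and lists the loads and evictions needed to reach the new placement. It is not SAP's implementation. The `DataUnit` type, the greedy ranking rule, the byte-size capacity model, and the `year=…` partition names are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DataUnit:
    """Hypothetical data unit: a name (e.g. a partition key) and its size in bytes."""
    name: str
    size: int


def place_hot_warm(units, access_counts, hot_capacity):
    """Greedily fill the in-memory ("hot") store with the most-accessed units.

    Units are ranked by access count, with ties broken by name so the result is
    deterministic. An accessed unit goes to the hot store if it still fits;
    everything else goes to the disk-based ("warm") store.
    """
    ranked = sorted(units, key=lambda u: (-access_counts.get(u.name, 0), u.name))
    hot, warm, used = set(), set(), 0
    for unit in ranked:
        if access_counts.get(unit.name, 0) > 0 and used + unit.size <= hot_capacity:
            hot.add(unit.name)
            used += unit.size
        else:
            warm.add(unit.name)
    return hot, warm


def plan_moves(old_hot, new_hot):
    """Return (units to load into memory, units to evict to disk), sorted by name."""
    return sorted(new_hot - old_hot), sorted(old_hot - new_hot)


# Usage: each query is the list of data units it touched (hypothetical workload).
units = [DataUnit("year=2015", 40), DataUnit("year=2016", 40), DataUnit("year=2017", 40)]
queries = [["year=2017"], ["year=2017", "year=2016"], ["year=2016"], ["year=2015"]]
access_counts = Counter(name for q in queries for name in q)

hot, warm = place_hot_warm(units, access_counts, hot_capacity=80)
to_load, to_evict = plan_moves(old_hot={"year=2015", "year=2016"}, new_hot=hot)
```

Greedy ranking by raw access count is the simplest possible policy; the claims below describe a richer process built on dynamic data units, clustering, and a least-cost data swap.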

First claim

Opening claim text (preview).

What is claimed is:

1. A computer program product, the computer program product being tangibly embodied on a non-transitory computer-readable storage medium and comprising instructions that, when executed, are configured to cause at least one computing device to: receive a stream of queries to be applied against data in a columnar-store database in which hot data is stored in an in-memory database for preferred access relative to warm data stored in a disk-based database; track usage counts for columns of the columnar-store database, reflecting usage thereof by individual queries of the stream of queries; select column sets, each column set including at least one column of the columnar-store database; calculate, for each column set, at least one record count of data records that include a column value, and at least one distinct count of distinct column values within each column set; generate at least one dynamic data unit from the column sets, based on the usage counts, the at least one record count, and the at least one distinct count, the at least one dynamic data unit including a set of data records sharing at least one common column value; analyze partitions of the data, based on the at least one dynamic data unit, to thereby obtain updated partitions; identify updated hot data and updated warm data within the updated partitions, based on the at least one dynamic data unit; and execute a data swap of data units from the partitions to the updated partitions, using the at least one dynamic data unit, to thereby have the updated hot data positioned within the in-memory database and the updated warm data within the disk-based database.

2. The computer program product of claim 1, wherein the instructions, when executed, are further configured to cause the at least one computing device to: process a current query of the stream of queries, including loading related columns from a disk memory of the in-memory database to a main memory of the in-memory database, and evicting under-used columns from the main memory of the in-memory database to the disk memory of the in-memory database, based on a fullness level of the main memory of the in-memory database, wherein the loading and the evicting are performed using the at least one dynamic data unit.

3. The computer program product of claim 1, wherein the at least one dynamic data unit is represented as a pair of vectors, including a first vector that is a Boolean vector $[b_0, b_1, \ldots, b_{|C|}]$, and a second vector of column values $[v_0, v_1, \ldots, v_{|C|}]$, wherein $C$ is the column sets, $|C|$ is the cardinality of $C$, $b_i$ are indicator functions indicating which columns are used, and $v_i$ are values of the columns used.

4. The computer program product of claim 1, wherein the generation of the at least one dynamic data unit is conducted in response to one or both of a workload count received from a workload counter indicating that a workload threshold of queries has been met, and a detection of a data insert or update of the data.

5. The computer program product of claim 1, wherein the instructions, when executed, are further configured to cause the at least one processor to monitor a window of queries of the stream of queries, thereby defining a sliding query window of fixed checkpoint size, and to slide the query window upon detection of a reaching of a maximum size by the query window.

6. The computer program product of claim 5, wherein the partition analysis is conducted in response to a detection that the query window maximum size has been reached.

7. The computer program product of claim 1, wherein the partition analysis is conducted in response to a detection that a performance of the in-memory database in processing a current query has degraded to a threshold.

8. The computer program product of claim 1, wherein the at least one dynamic data unit includes a plurality of dynamic data units, and wherein the instructions, when executed, are further configured to generate a cluster of a set of the plurality of dynamic data units, for movement of the cluster between the in-memory database and the disk-based database, or between a partition and updated partition.

9. The computer program product of claim 8, wherein the cluster is defined using an affinity matrix linking query usage of pairs of the at least one dynamic data unit, based on co-occurrence of each pair within a query window of the stream of queries.

10. The computer program product of claim 1, wherein the data swap is executed by finding partitions no longer present in the updated partitions, and providing a least cost model for swapping data units to minimize shuffling of data units during the data swap.

11. A computer-implemented method for executing instructions stored on a non-transitory computer readable storage medium, the method comprising: receiving a stream of queries to be applied against data in a columnar-store database in which hot data is stored in an in-memory database for preferred access relative to warm data stored in a disk-based database; identifying a dynamic data unit generation trigger from the stream of queries; identifying sets of columns of the columnar-store database; identifying, for each column set of the sets of columns, at least one record count of data records including a column value, and at least one distinct count of distinct column values; calculating a relative probability distribution distance for each column set of the sets of columns, based on the at least one record count and the at least one distinct count; generating at least one dynamic data unit from at least one column set of the sets of columns, based on the at least one record count, the at least one distinct count, and the relative probability distribution distances; and partitioning the data, and identifying updated hot data and updated warm data therein, based on the at least one dynamic data unit.

12. The method of claim 11, wherein the identifying sets of columns includes pruning available column sets based on one or more of: usage counts of columns used to satisfy the queries, the at least one record count, and the at least one distinct count.

13. The method of claim 11, wherein the dynamic data unit trigger includes one or more of: a workload count received from a workload counter indicating that a workload threshold of queries has been met, and a detection of a data insert or update of the data.

14. The method of claim 11, comprising: monitoring a window of queries of the stream of queries, thereby defining a sliding query window of fixed checkpoint size; and sliding the query window upon detection of a reaching of a maximum size by the query window.

15. The method of claim 14, wherein the partitioning is initiated in response to one or more of: a detection that the query window maximum size has been reached, and a detection that a performance of the in-memory database in processing a current query has degraded to a threshold.

16. A system including instructions recorded on a non-transitory computer-readable storage medium, and executable by at least one processor, the system comprising: a query window monitor configured to cause the at least one processor to count queries of a stream of queries to be applied against data in a columnar-store database in which hot data is stored in an in-memory database for preferred access relative to warm data stored in a disk-based database, wherein the in-memory database includes a main memory and a disk-based memory configured to store the hot data, and wherein the query window monitor is further configured to count the que…
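The dynamic-data-unit steps of claims 1, 3, and 12 (usage counts, pruning of column sets, record counts, distinct counts, and the Boolean/value vector pair) can be sketched in Python as follows. The claims do not say how column sets are enumerated or which thresholds qualify a unit, so `min_usage`, `min_records`, `max_distinct`, the two-column limit on column sets, and every function name here are hypothetical. Records are plain dicts, and the vectors are indexed over the table's columns, which is one reading of claim 3's notation. The relative probability distribution distance of claim 11 is left out because the claims do not define it.

```python
from collections import Counter
from itertools import combinations


def candidate_column_sets(usage_counts, min_usage, max_size=2):
    """Prune to frequently used columns, then enumerate column sets of up to max_size columns."""
    used_cols = sorted(c for c, n in usage_counts.items() if n >= min_usage)
    return [cs for k in range(1, max_size + 1) for cs in combinations(used_cols, k)]


def column_set_stats(records, column_set):
    """Record count per value combination, plus the distinct count, for one column set."""
    per_value = Counter(tuple(r[c] for c in column_set) for r in records)
    return per_value, len(per_value)


def make_ddu(all_columns, column_set, values):
    """Encode a dynamic data unit as (Boolean vector b, value vector v) over all columns."""
    chosen = dict(zip(column_set, values))
    b = [c in chosen for c in all_columns]
    v = [chosen.get(c) for c in all_columns]
    return b, v


def generate_ddus(records, all_columns, usage_counts, min_usage, min_records, max_distinct):
    """Emit one DDU per sufficiently common value combination of each kept column set."""
    ddus = []
    for cs in candidate_column_sets(usage_counts, min_usage):
        per_value, distinct = column_set_stats(records, cs)
        if distinct > max_distinct:  # assumed rule: too many distinct values is too fine-grained
            continue
        for values, count in sorted(per_value.items()):
            if count >= min_records:  # assumed rule: a unit must cover enough records
                ddus.append(make_ddu(all_columns, cs, values))
    return ddus


def ddu_members(records, all_columns, ddu):
    """The data records that share the DDU's column values."""
    b, v = ddu
    return [r for r in records
            if all(r[c] == val for c, used, val in zip(all_columns, b, v) if used)]


# Usage with a hypothetical sales table.
columns = ["year", "region", "amount"]
records = [
    {"year": 2017, "region": "EU", "amount": 10},
    {"year": 2017, "region": "EU", "amount": 20},
    {"year": 2017, "region": "US", "amount": 30},
    {"year": 2016, "region": "US", "amount": 40},
]
usage = Counter({"year": 5, "region": 3, "amount": 1})
ddus = generate_ddus(records, columns, usage, min_usage=2, min_records=2, max_distinct=3)
```

Here the rarely queried `amount` column is pruned, and one of the resulting units is the pair `([True, True, False], [2017, "EU", None])`, covering the two 2017 EU records.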
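Claims 5, 6, 8, and 9 describe a sliding query window and a clustering of dynamic data units (DDUs) based on an affinity matrix of co-occurrence. The Python sketch below is one possible reading, not the patented method. It assumes each query has already been reduced to the set of DDU ids it touched, counts a pair as co-occurring when a single query in the window uses both, and swaps in a simple threshold plus connected-components grouping as the clustering step. `QueryWindow`, `cluster`, and `min_affinity` are hypothetical names.

```python
from collections import Counter, deque
from itertools import combinations


class QueryWindow:
    """Fixed-size sliding window of queries; each query is the set of DDU ids it touched."""

    def __init__(self, max_size):
        # A deque with maxlen drops the oldest query once the window is full.
        self.queries = deque(maxlen=max_size)

    def add(self, ddu_ids):
        """Append a query; return True once the window has reached its maximum size."""
        self.queries.append(frozenset(ddu_ids))
        return len(self.queries) == self.queries.maxlen

    def affinity(self):
        """Sparse affinity matrix: how many queries in the window used each DDU pair."""
        counts = Counter()
        for q in self.queries:
            counts.update(combinations(sorted(q), 2))
        return counts


def cluster(affinity, ddu_ids, min_affinity):
    """Group DDUs linked by affinity >= min_affinity (connected components, union-find)."""
    parent = {d: d for d in ddu_ids}
    for a, b in affinity:
        parent.setdefault(a, a)
        parent.setdefault(b, b)

    def find(d):
        while parent[d] != d:
            parent[d] = parent[parent[d]]  # path halving
            d = parent[d]
        return d

    for (a, b), n in affinity.items():
        if n >= min_affinity:
            parent[find(a)] = find(b)

    groups = {}
    for d in list(parent):
        groups.setdefault(find(d), set()).add(d)
    return sorted(groups.values(), key=sorted)


window = QueryWindow(max_size=4)
stream = [{"A", "B"}, {"A", "B"}, {"C"}, {"C", "D"}, {"A", "B", "C"}]
full = [window.add(q) for q in stream]
aff = window.affinity()
clusters = cluster(aff, ["A", "B", "C", "D"], min_affinity=2)
```

After the fifth query the window holds only the last four, so `A` and `B` co-occur twice and form one cluster while `C` and `D` stay separate. A `True` return from `add()` marks the point where a caller could start partition analysis in the manner of claim 6.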
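Claim 10 executes the data swap with a least cost model that minimizes shuffling of data units. One simple way to model that, assumed here rather than taken from the patent, is as an assignment problem: map each updated partition onto an existing partition slot so that the fewest data units have to move. The brute-force search in the hypothetical `least_cost_swap` below is exact but only practical for a handful of partitions; a larger system would use a polynomial-time assignment solver such as the Hungarian algorithm.

```python
from itertools import permutations


def least_cost_swap(old_parts, new_parts):
    """Match each updated partition to an existing slot so the fewest data units move.

    Brute force over all matchings, so this sketch is only for small partition counts.
    """
    assert len(old_parts) == len(new_parts), "sketch assumes equal partition counts"
    best = None
    for perm in permutations(range(len(new_parts))):
        # Units of new partition perm[i] not already in old slot i must be moved in.
        cost = sum(len(new_parts[j] - old_parts[i]) for i, j in enumerate(perm))
        if best is None or cost < best[0]:
            best = (cost, perm)
    cost, perm = best
    moves = {i: sorted(new_parts[j] - old_parts[i]) for i, j in enumerate(perm)}
    return cost, moves


# Usage: partitions as sets of hypothetical data unit ids.
old_parts = [{"A", "B", "C"}, {"D", "E"}]
new_parts = [{"C", "D", "E"}, {"A", "B"}]
cost, moves = least_cost_swap(old_parts, new_parts)
```

Placing the updated partitions in their original order would move four units, whereas reusing the slots in swapped order moves only `C`.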

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9892150B2 cover?
A database architecture that combines an in-memory (“hot”) data store with a disk-based (“warm”) data store, and partitions, re-partitions, and moves data within and between the two based on query access patterns. The partitions and moves are driven by dynamic data units: clusters of data records defined by shared attribute values and updated as queries arrive.
Who is the assignee on this patent?
SAP SE
What technology area does this patent fall under?
Primary CPC classification G06F16/24568. Mapped technology areas include Physics.
When was this patent published?
Published Feb 13, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).