System and method for optimizing data migration in a partitioned database

US9740762B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9740762-B2
Application number: US-201113078104-A
Country: US
Kind code: B2
Filing date: Apr 1, 2011
Priority date: Apr 1, 2011
Publication date: Aug 22, 2017
Grant date: Aug 22, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

According to one aspect, provided is a horizontally scaled database architecture. Partitioning a database enables efficient distribution of data across a number of systems, reducing processing costs associated with multiple machines. According to some aspects, the partitioned database can be managed as a single source interface to handle client requests. Further, it is realized that by identifying and testing key properties, horizontal scaling architectures can be implemented and operated with minimal overhead. In one embodiment, databases can be partitioned in an order-preserving manner such that the overhead associated with moving the data for a given partition can be minimized during management of the data and/or database. In one embodiment, split and migration operations prioritize zero-cost partitions, thereby reducing the computational burden associated with managing a partitioned database.
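As an illustration of the order-preserving partitioning and single-interface routing that the abstract mentions, the following is a minimal sketch, not the patent's implementation. It assumes integer keys and one contiguous key range per partition; the `RangeRouter` class, its `route` method, and the server names are hypothetical names introduced here.

```python
from bisect import bisect_right
from typing import List, Sequence


class RangeRouter:
    """Hypothetical routing table for order-preserving (range) partitions."""

    def __init__(self, lower_bounds: Sequence[int], servers: Sequence[str]) -> None:
        # lower_bounds[i] is the smallest key owned by partition i (assumed layout).
        if list(lower_bounds) != sorted(lower_bounds):
            raise ValueError("lower bounds must be sorted ascending")
        if len(lower_bounds) != len(servers):
            raise ValueError("one server name per partition is required")
        self.lower_bounds: List[int] = list(lower_bounds)
        self.servers: List[str] = list(servers)

    def route(self, key: int) -> str:
        """Return the server hosting the partition whose range contains key."""
        # Find the rightmost partition whose lower bound is <= key.
        index = bisect_right(self.lower_bounds, key) - 1
        if index < 0:
            raise KeyError(f"key {key} is below the lowest partition bound")
        return self.servers[index]


router = RangeRouter([0, 100, 200], ["server-a", "server-b", "server-c"])
print(router.route(150))  # server-b
```

Because the ranges are contiguous and sorted, a client-facing router only needs the list of range boundaries to find the right partition, which is one reason an order-preserving layout keeps management overhead low.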

First claim

Opening claim text (preview).

What is claimed is:
1. A system for optimizing data distribution, the system comprising: at least one processor operatively connected to a memory for executing system components; a database comprising a plurality of database partitions, wherein at least one of the plurality of database partitions includes a contiguous range of data from the database and wherein new data added to the at least one of the plurality of database partitions is assigned a key value or key pattern greater than previously used key values or patterns; and a partition component configured to: detect a partition size for the at least one of the plurality of database partitions that exceeds a size threshold; split, automatically, the at least one of the plurality of database partitions into at least a first and a second partition; control, during splitting of an existing one of the at least one of the plurality of database partitions having the contiguous range of data into the at least the first and the second database partition, a distribution of data originating from the existing one of the at least one of the plurality of database partitions to the first and the second partition based on a value for a database key associated with the data in the at least one of the plurality of database partitions, wherein the partition component is further configured to: minimize any data distributed to the second partition originating from the at least one of the plurality of database partitions during splitting of the existing one of the at least one of the plurality of database partitions; and maximize any data distributed to the first partition from the data originating from the existing one of the at least one of the plurality of database partitions to the first partition up to a maximum size for the first partition during splitting of the existing one of the at least one of the plurality of database partitions.
2. The system according to claim 1, wherein the partition component is further configured to: assign at least any data in the at least one of the plurality of database partitions having associated database key values less than the maximum value to the first partition; and assign at least any data in the at least one of the plurality of database partitions having database key values greater than the maximum value to the second partition.
3. The system according to claim 1, wherein the partition component is further configured to: qualify the at least one of the plurality of database partitions for splitting by identifying a sequential database key organizing the at least one of the plurality of database partitions.
4. The system according to claim 1, wherein the system further comprises a plurality of servers, wherein the plurality of servers are configured to host the plurality of database partitions.
5. The system according to claim 4, further comprising a routing component configured to route database requests to identified partitions, wherein the routing component is further configured to identify partitions based, at least, on key values associated with the data request.
6. The system according to claim 5, further comprising a rebalancing component configured to determine a state of the database based on a distribution of the plurality of partitions across the plurality of servers, wherein the rebalancing component is further configured to migrate at least one partition in response to the state indicating an imbalanced distribution of partitions.
7. The system according to claim 5, wherein the configuration component is further configured to replicate the metadata across any routing component of the system.
8. The system according to claim 4, further comprising a configuration component configured to manage metadata information associated with each of the plurality of partitions, the metadata information including a defined range of key values associated with each partition.
9. The system according to claim 8, wherein the configuration component is further configured to update the metadata information in response to the partition component splitting the at least one of the plurality of database partitions into at least the first and the second partition.
10. The system according to claim 8, wherein the configuration component is further configured to update the metadata information in response to migration of database partitions between the plurality of servers.
11. The system according to claim 4, further comprising a reconciliation component configured to log database operations received on partitions during at least one of a migration operation and a splitting operation, wherein the reconciliation component is further configured to update at least one partition in response to the completion of a respective migration and splitting operation.
12. The system according to claim 1, further comprising a migration component configured to migrate database partitions between a plurality of servers configured to host the database partitions.
13. The system according to claim 1, wherein the partition component is configured to assign at least data keys associated with new data to the second partition.
14. The system according to claim 13, wherein the partition component is configured to assign any data from the at least one of the plurality of database partitions not assigned to the first partition to the second partition.
15. The system according to claim 1, wherein the partition component is configured to determine the maximum size for the first partition based on the size threshold and a growth threshold.
16. The system according to claim 1, further comprising a routing component configured to route database requests to identified partitions, wherein the routing component is configured to direct new data records that would have been directed to the at least one partition prior to the split operation, to the second partition.
17. A computer implemented method for optimizing data distribution, the method comprising acts of: monitoring, by a computer system, a distributed database including a plurality of database partitions, wherein at least one of the plurality of database partitions includes a contiguous range of data from the database and wherein new data added to the at least one of the plurality of database partitions is assigned a key value or key pattern greater than previously used key values or patterns; detecting, by the computer system, a partition size of the at least one of the plurality of database partitions exceeds a size threshold; splitting, by the computer system, the at least one of the plurality of database partitions into at least a first and a second partition; controlling, by the computer system during the act of splitting, a distribution of data originating from an existing one of the at least one of the plurality of database partitions to the first and the second partition based on a value for a database key associated with the data in the at least one of the plurality of database partitions, wherein controlling the distribution includes: minimizing any data distributed to the second partition originating from the existing one of the at least one of the plurality of database partitions during the act of splitting; and maximizing assignment of the data originating from the existing one of the at least one of the plurality of partitions to the first partition up to a maximum size for the first partition during the act of splitting.
18. The method according to claim 17, wherein the act of minimizing any data distributed to the second pa
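The split behavior recited in claims 1 and 17 (keep as much existing data as possible in the first partition, up to its maximum size, and send as little as possible to the second) can be sketched roughly as follows. This is an illustrative reading under stated assumptions, not the patented implementation: keys are assumed to be increasing integers, record sizes are given in bytes, and the names `Partition`, `split_sequential`, `size_threshold`, and `first_max_size` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Partition:
    min_key: int      # inclusive lower bound of the key range
    max_key: float    # exclusive upper bound (inf for an open-ended range)
    records: Dict[int, int] = field(default_factory=dict)  # key -> size in bytes

    def size(self) -> int:
        return sum(self.records.values())


def split_sequential(part: Partition, size_threshold: int,
                     first_max_size: int) -> Optional[Tuple[Partition, Partition]]:
    """Split an oversized partition while keeping most data in the first half."""
    if part.size() <= size_threshold:
        return None  # at or below the threshold: no split is needed

    kept: Dict[int, int] = {}
    moved: Dict[int, int] = {}
    used = 0
    for key in sorted(part.records):
        size = part.records[key]
        # Fill the first partition in key order until its maximum size is hit;
        # once one record spills over, all higher keys follow it.
        if not moved and used + size <= first_max_size:
            kept[key] = size
            used += size
        else:
            moved[key] = size

    # Split at the first key that did not fit, or just past the largest
    # existing key when everything fit (the second partition starts empty).
    split_key = min(moved) if moved else max(kept) + 1
    first = Partition(part.min_key, split_key, kept)
    second = Partition(split_key, part.max_key, moved)
    return first, second
```

With `first_max_size` set above the current partition size (for example, the size threshold plus a growth allowance, in the spirit of claim 15), the second partition starts empty and no existing records need to move; new records with larger keys then fall into the second partition's range.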

Assignees

Inventors

Classifications

  • Physics · mapped topic

  • G06F16/278 · Primary

    Data partitioning, e.g. horizontal or vertical partitioning · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9740762B2 cover?
According to one aspect, provided is a horizontally scaled database architecture. Partitioning a database enables efficient distribution of data across a number of systems, reducing processing costs associated with multiple machines. According to some aspects, the partitioned database can be managed as a single source interface to handle client requests. Further, it is realized that by identifying …
Who is the assignee on this patent?
Eliot Horowitz, Dwight Merriman, MongoDB Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/278 (formerly G06F17/30584). Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 22, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).