Adaptive data repartitioning and adaptive data replication

US10223437B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10223437-B2
Application number	US-201514634199-A
Country	US
Kind code	B2
Filing date	Feb 27, 2015
Priority date	Feb 27, 2015
Publication date	Mar 5, 2019
Grant date	Mar 5, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method and apparatus for adaptive data repartitioning and adaptive data replication is provided. A data set stored in a distributed data processing system is partitioned by a first partitioning key. A live workload comprising a plurality of data processing commands is processed. While processing the live workload, statistical properties of the live workload are maintained. Based on the statistical properties of the live workload with respect to the data set, it is determined to replicate and/or repartition the data set by a second partitioning key. The replicated and/or repartitioned data set is partitioned by the second partitioning key.
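The loop the abstract describes (maintain workload statistics, then decide whether a different partitioning key fits better) can be illustrated with a small sketch. This is not the patented method: the class `WorkloadStats`, the function `choose_partitioning_key`, the column names, and the 0.6 threshold are all hypothetical, chosen only to make the idea concrete.

```python
from collections import Counter


class WorkloadStats:
    """Running counts of which key each command uses to access a data set."""

    def __init__(self):
        self.key_hits = Counter()

    def record(self, access_key):
        # Called once per data processing command in the live workload.
        self.key_hits[access_key] += 1

    def frequency(self, key):
        # Share of recorded commands that accessed the data set by `key`.
        total = sum(self.key_hits.values())
        return self.key_hits[key] / total if total else 0.0


def choose_partitioning_key(stats, current_key, threshold=0.6):
    """Return the key to partition by. The threshold is an illustrative assumption."""
    if not stats.key_hits:
        return current_key
    best_key, _ = stats.key_hits.most_common(1)[0]
    if best_key != current_key and stats.frequency(best_key) >= threshold:
        return best_key
    return current_key


# Hypothetical workload: 5 of 7 commands access the data set by "region".
stats = WorkloadStats()
for key in ["customer_id", "region", "region", "region",
            "customer_id", "region", "region"]:
    stats.record(key)

new_key = choose_partitioning_key(stats, current_key="customer_id")
print(new_key)  # region
```

A real system would presumably weigh more than access frequency before moving data; the sketch only shows how statistics gathered during normal processing can drive the decision.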

First claim

Opening claim text (preview).

What is claimed is:
1. A method comprising: partitioning a data set stored in a distributed data processing system by a first partitioning key; processing a live workload comprising a plurality of data processing commands; while processing the live workload, maintaining statistical properties of the live workload and receiving a query; and while executing the query as part of the live workload: based on the statistical properties of the live workload with respect to the data set, determining to repartition the data set by a second partitioning key that is selected based on a percentage of the data set to transfer to repartition the data set by the second partitioning key; and repartitioning the data set in the distributed data processing system by the second partitioning key; wherein the method is performed by one or more computing devices.
2. The method of claim 1, further comprising: partitioning a second data set stored in the distributed data processing system by a third partitioning key; based on the statistical properties of the live workload with respect to the second data set, determining to replicate the second data set in the distributed data processing system based on a fourth partitioning key; storing an additional copy of the second data set in the distributed data processing system, wherein the additional copy of the second data set is partitioned by the fourth partitioning key.
3. The method of claim 1, wherein the first partitioning key is selected based on initial workload statistic values.
4. The method of claim 1, wherein the first partitioning key is selected based on statistical properties of a sample workload.
5. The method of claim 1, wherein determining to repartition the data set by a second partitioning key is further based on an association strength between the first partitioning key and the second partitioning key.
6. The method of claim 1, wherein the statistical properties comprise a selectivity metric for one or more data processing commands in the live workload, wherein the selectivity metric is based on an average amount of the data set required by the one or more data processing commands.
7. The method of claim 1, wherein the statistical properties comprise a projection metric for one or more data processing commands in the live workload, wherein the projection metric is based on an average amount of the data set required by the one or more data processing commands.
8. The method of claim 1, wherein the statistical properties comprise a key frequency metric for a particular partitioning key of the data set, wherein the key frequency metric is based on a frequency of access of the data set by the particular partitioning key in the live workload.
9. The method of claim 1, wherein the statistical properties comprise a table frequency metric for the data set, wherein the table frequency metric is based on a frequency of access of the data set in the live workload.
10. A method comprising: partitioning a data set stored in a distributed data processing system by a first partitioning key; processing a live workload comprising a plurality of data processing commands; while processing the live workload, maintaining statistical properties of the live workload and receiving a query; and while executing the query as part of the live workload: based on the statistical properties of the live workload with respect to the data set, determining to replicate the data set in the distributed data processing system based on a second partitioning key that is selected based on a percentage of the data set to transfer to repartition the data set by the second partitioning key; and storing an additional copy of the data set in the distributed data processing system, wherein the additional copy of the data set is partitioned by the second partitioning key; wherein the method is performed by one or more computing devices.
11. The method of claim 10, wherein said determining to replicate the data set is further based on an available amount of memory.
12. A non-transitory computer-readable medium storing instructions which, when executed by one or more processors, cause: partitioning a data set stored in a distributed data processing system by a first partitioning key; processing a live workload comprising a plurality of data processing commands; while processing the live workload, maintaining statistical properties of the live workload and receiving a query; and while executing the query as part of the live workload: based on a query plan of the query and the statistical properties of the live workload with respect to the data set, determining to repartition the data set by a second partitioning key that is selected based on a percentage of the data set to transfer to repartition the data set by the second partitioning key; and repartitioning the data set in the distributed data processing system by the second partitioning key.
13. The non-transitory computer-readable medium of claim 12, wherein the instructions further cause: partitioning a second data set stored in the distributed data processing system by a third partitioning key; based on the statistical properties of the live workload with respect to the second data set, determining to replicate the second data set in the distributed data processing system based on a fourth partitioning key; storing an additional copy of the second data set in the distributed data processing system, wherein the additional copy of the second data set is partitioned by the fourth partitioning key.
14. The non-transitory computer-readable medium of claim 12, wherein the first partitioning key is selected based on initial workload statistic values.
15. The non-transitory computer-readable medium of claim 12, wherein the first partitioning key is selected based on statistical properties of a sample workload.
16. The non-transitory computer-readable medium of claim 12, wherein determining to repartition the data set by a second partitioning key is further based on an association strength between the first partitioning key and the second partitioning key.
17. The non-transitory computer-readable medium of claim 12, wherein the statistical properties comprise a selectivity metric for one or more data processing commands in the live workload, wherein the selectivity metric is based on an average amount of the data set required by the one or more data processing commands.
18. The non-transitory computer-readable medium of claim 12, wherein the statistical properties comprise a projection metric for one or more data processing commands in the live workload, wherein the projection metric is based on an average amount of the data set required by the one or more data processing commands.
19. The non-transitory computer-readable medium of claim 12, wherein the statistical properties comprise a key frequency metric for a particular partitioning key of the data set, wherein the key frequency metric is based on a frequency of access of the data set by the particular partitioning key in the live workload.
20. The non-transitory computer-readable medium of claim 12, wherein the statistical properties comprise a table frequency metric for the data set, wherein the table frequency metric is based on a frequency of access of the data set in the live workload.
21. A non-transitory computer-readable medium storing instructions which, when executed by one or more processors, cause: partitioning a data set stored in a distributed
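The independent claims select the second partitioning key "based on a percentage of the data set to transfer". One plausible way to estimate such a percentage, sketched here under assumptions the claims do not state: rows are placed on nodes by hashing the key value modulo the node count, and the candidate with the smallest moved fraction is preferred. The helpers `node_for`, `transfer_fraction`, and `pick_candidate`, and the sample columns, are hypothetical.

```python
import zlib


def node_for(value, num_nodes):
    # Assumed placement rule: stable CRC32 hash of the key value, modulo node count.
    return zlib.crc32(str(value).encode("utf-8")) % num_nodes


def transfer_fraction(rows, current_key, candidate_key, num_nodes):
    """Fraction of rows that would land on a different node under candidate_key."""
    if not rows:
        return 0.0
    moved = sum(
        1
        for row in rows
        if node_for(row[current_key], num_nodes)
        != node_for(row[candidate_key], num_nodes)
    )
    return moved / len(rows)


def pick_candidate(rows, current_key, candidates, num_nodes):
    # Illustrative policy only: prefer the candidate that moves the least data.
    return min(
        candidates,
        key=lambda k: transfer_fraction(rows, current_key, k, num_nodes),
    )


# Toy data: "order_copy" mirrors "order_id" exactly, "region" does not.
rows = [{"order_id": i, "order_copy": i, "region": i % 3} for i in range(30)]

for candidate in ("order_copy", "region"):
    fraction = transfer_fraction(rows, "order_id", candidate, num_nodes=4)
    print(candidate, f"{fraction:.0%}")

print(pick_candidate(rows, "order_id", ["order_copy", "region"], num_nodes=4))
```

In this toy model a candidate column whose values track the current key moves no rows at all, while an unrelated column typically moves most of them, which is why the transfer percentage is a useful input when comparing keys.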

Assignees

Oracle Int Corp

Inventors

Classifications

G06F16/278 · Primary
Data partitioning, e.g. horizontal or vertical partitioning · CPC title
G06F17/30584 · Primary
Physics · mapped topic

Patent family

Related publications grouped by family.

View patent family 56799138

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10223437B2 cover?: A method and apparatus for adaptive data repartitioning and adaptive data replication is provided. A data set stored in a distributed data processing system is partitioned by a first partitioning key. A live workload comprising a plurality of data processing commands is processed. While processing the live workload, statistical properties of the live workload are maintained. Based on the statis…
Who is the assignee on this patent?: Oracle Int Corp
What technology area does this patent fall under?: Primary CPC classification G06F16/278. Mapped technology areas include Physics.
When was this patent published?: Publication date Mar 5, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Database partitioning scheme evaluation and comparison

Database system providing skew metrics across a key space

Database partitioning for data processing system
