What technology area does this patent fall under?

Primary CPC classification G06F16/24544. Mapped technology areas include Physics.

When was this patent published?

Publication date Thu Jun 14 2018 00:00:00 GMT+0000 (Coordinated Universal Time) (A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Dynamic computation node grouping with cost based optimization for massively parallel processing

US2018165331A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2018165331-A1
Application number	US-201615374158-A
Country	US
Kind code	A1
Filing date	Dec 9, 2016
Priority date	Dec 9, 2016
Publication date	Jun 14, 2018
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A massively parallel processing shared nothing relational database management system includes a plurality of storages assigned to a plurality of compute nodes. The system comprises a non-transitory memory having instructions and one or more processors in communication with the memory. The one or more processors execute the instructions to store a set of data in a first set of storages in the plurality of storages. The first set of data is hashed into a repartitioned set of data. The first set of storages is reassigned to a second set of compute nodes in the plurality of compute nodes. The repartitioned set of data is distributed to the second set of compute nodes and a database operation is performed on the repartitioned set of data by the second set of compute nodes.

First claim

Opening claim text (preview).

What is claimed is: 1 . A massively parallel processing shared nothing relational database management system comprising: a plurality of storages assigned to a plurality of compute nodes; a non-transitory memory storing instructions; and one or more processors in communication with the non-transitory memory, wherein the one or more processors execute the instructions to: store a set of data in a first set of storages in the plurality of storage, the first set of storages assigned to a first set of compute nodes in the plurality of compute nodes; repartition the set of data by hashing into a repartitioned set of data; reassign the first set of storages to a second set of compute nodes in the plurality of compute nodes; distribute the repartitioned set of data to the second set of compute nodes; and perform a database operation on the repartitioned set of data by the second set of compute nodes. 2 . The system of claim 1 , wherein the repartition the set of data includes forming smaller hash buckets of the set of data by hashing. 3 . The system of claim 1 , wherein the repartition is omitted when a repartition key is the same key used to partition the set of data. 4 . The system of claim 1 , wherein the reassign includes form a network connections between the first set of storages and the second set of compute nodes, and wherein the distribute includes distribute the repartitioned set of data to the second set of compute nodes by way of the network connections. 5 . The system of claim 4 , wherein the first set of storages and the first set of compute nodes form a shared nothing node in the system, and wherein the database operation includes at least one of an inner join, scan and redistribute. 6 . The system of claim 5 , wherein the first set of storages include at least an integrated circuit memory to store the set of data, and wherein the first set of compute nodes include at least an integrated circuit processor coupled to the integrated circuit memory by a signal path to transfer the set of data. 7 . The system of claim 1 , further comprising the one or more processors executing the instructions to: obtain a plurality of logic plans that include the database operation on the set of data stored in the first set of storages; determining a cost of redistributing the set of data to at least another compute node for each logic plan in the plurality of logic plans; determining a cost reduction from inter-partition parallelism for each logic plan in the plurality of logic plans; and selecting a logic plan from the plurality of logic plans based on the cost of redistributing the set of data and the cost reduction from inter-partition parallelism. 8 . A computer-implemented method for accessing data, the method comprising: obtaining a plurality of logic plans to respond to a query; determining a cost of redistributing a set of data stored in a storage assigned to a compute node to at least another compute node for each logic plan in the plurality of logic plans; determining a cost reduction from inter-partition parallelism for each logic plan in the plurality of logic plans; and selecting a logic plan from the plurality of logic plans based on the cost of redistributing the set of data and the cost reduction from inter-partition parallelism. 9 . The computer-implemented method of claim 8 , wherein the logic plan includes at least one database operation on the set of data. 10 . The computer-implemented method of claim 9 , wherein the at least one database operation on the set of data includes at least one of a join, hash aggregation and redistribution. 11 . The computer-implemented method of claim 8 , wherein the determining the cost of redistributing the set of data comprises: calculating a number of tuples to be processed in the set of data; calculating a width of a tuple in the set of data; calculating a hashing cost factor for the set of data; calculating an average data transfer speed through a network coupled between the storage and at least the another compute node; calculating a degree of inter-partition parallelism with a skew factor; and calculating the cost of redistributing the set of data in response to at least the number of tuples to be processed, the width of the tuple, the hashing cost factor, the average data transfer speed and the degree of inter-partition parallelism with the skew factor. 12 . The computer-implemented method of claim 11 , wherein the skew factor represents a data skew associated with the set of data. 13 . The computer-implemented method of claim 8 , wherein the determining the cost reduction from inter-partition parallelism comprises: calculating an operator cost of a hash join on the data set; calculating an operator cost of a hash aggregate on the data set; calculating a hashing cost factor for the set of data; calculating a degree of inter-partition parallelism with a skew factor; and calculating the cost reduction from inter-parallelism in response to the operator cost of the hash join or the operator cost of the hash aggregate and the degree of inter-partition parallelism with the skew factor. 14 . The computer-implemented method of claim 13 , wherein the skew factor is computed based on a percentage of a most common value in the data set. 15 . The computer-implemented method of claim 8 , wherein the computer-implemented method is performed at least partially by a massively parallel processing shared nothing relational database management system. 16 . A non-transitory computer-readable medium storing computer instructions, that when executed by one or more processors, cause one or more processors to perform the steps of: store a set of data in a first set of storages in a plurality of storages, the first set of storages assigned to a first set of compute nodes in a plurality of compute nodes; obtain a plurality of logic plans to respond a query that accesses the set of data; determine a cost of redistributing the set of data stored in the first set of storages to a second set of compute nodes for each logic plan in the plurality of logic plans; determine a cost reduction from inter-partition parallelism for each logic plan in the plurality of logic plans; select a logic plan from the plurality of logic plans based on the cost of redistributing the set of data and the cost reduction from inter-partition parallelism; repartition the set of data by hashing into a repartitioned set of data; reassign the first set of storages to the second set of compute nodes; distribute the repartitioned set of data to the second set of compute nodes; and perform a database operation on the repartitioned set of data by the second set of compute nodes to provide an answer to the query. 17 . The non-transitory computer-readable medium of claim 16 , wherein the plurality of storages and plurality of compute nodes are included in a massively parallel processing shared nothing relational database management system. 18 . The non-transitory computer-readable medium of claim 16 , wherein determine the cost of redistributing the set of data comprises: calculate a number of tuples to be processed in the set of data; calculate a width of a tuple in the set of data; calculate a hashing cost factor for the set of data; calculate an average data transfer speed through a network coupled between at the first set of storages and the second set of compute nodes; calculate a degree of inter-partition parallelism with a skew factor; and calculate the cost of redistribut

Assignees

Futurewei Technologies Inc

Inventors

Classifications

G06F16/24545
Selectivity estimation or determination · CPC title
G06F16/24532
of parallel queries · CPC title
G06F16/24544Primary
Join order optimisation · CPC title
G06F16/24554
Unary operations; Data partitioning operations · CPC title
G06F16/2255
Hash tables · CPC title

Patent family

Related publications grouped by family.

View patent family 62489142

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2018165331A1 cover?: A massively parallel processing shared nothing relational database management system includes a plurality of storages assigned to a plurality of compute nodes. The system comprises a non-transitory memory having instructions and one or more processors in communication with the memory. The one or more processors execute the instructions to store a set of data in a first set of storages in the pl…
Who is the assignee on this patent?: Futurewei Technologies Inc
What technology area does this patent fall under?: Primary CPC classification G06F16/24544. Mapped technology areas include Physics.
When was this patent published?: Publication date Thu Jun 14 2018 00:00:00 GMT+0000 (Coordinated Universal Time) (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

How to read this patent

Abstract

First claim

Assignees

Inventors

Classifications

Patent family

External sources

Related patents

Partitioned join with dense inner table representation

System and method for distributed SQL join processing in shared-nothing relational database clusters using self directed data streams

System and method for distributed SQL join processing in shared-nothing relational database clusters using stationary tables

Frequently asked questions