Per-node custom code engine for distributed query processing

US11487771B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-11487771-B2
Application numberUS-201615371245-A
CountryUS
Kind codeB2
Filing dateDec 7, 2016
Priority dateJun 25, 2014
Publication dateNov 1, 2022
Grant dateNov 1, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Distributed query processing is often performed by a set of nodes that apply MapReduce to a data set and materialize partial results to storage, which are then aggregated to produce the query result. However, this architecture requires a preconfigured set of database nodes; can only fulfill queries that utilize MapReduce processing; and may be slowed down by materializing partial results to storage. Instead, distributed query processing can be achieved by choosing a node for various portions of the query, and generating customized code for the node that only performs the query portion that is allocated to the node. The node executes the code to perform the query portion, and rather than materializing partial results to storage, streams intermediate query results to a next selected node in the distributed query. Nodes selection may be involve matching the details of the query portion with the characteristics and capabilities of the available nodes.

First claim

Opening claim text (preview).

What is claimed is: 1. A server that executes a query, the server comprising: a processor; and a memory storing instructions, wherein execution of the instructions by the processor causes the server to: partition the query into at least two query portions; for a selected query portion: choose, from a node set including two or more different nodes, a selected node to perform the selected query portion based at least on a query type of the selected query portion and the selected node having a different allocation of physical hardware resources from one or more unselected nodes of the node set, the different allocation of physical hardware resources of the selected node including a type of processing resource not included in the one or more unselected nodes of the node set, the type of processing resource capable of performing the selected query portion; generate a query instruction set for the selected node, wherein the query instruction set is executable to cause the selected node to: perform the selected query portion, and when the selected query portion generates an intermediate query result, transmit the intermediate query result to a next selected node of the node set; and deploy the query instruction set to the selected node. 2. The server of claim 1 , wherein: the selected query portion involves query processing of a selected query processing type; and wherein the instructions executable to cause the server to choose the selected node for the selected query portion further comprise instructions executable to: evaluate respective nodes of the node set to identify candidate nodes that are capable of performing query processing of the selected query processing type; and among the candidate nodes, choose the selected node for the selected query portion. 3. The server of claim 1 , wherein: the selected query portion involves a resource; and wherein the instructions executable to cause the server to choose the selected node for the selected query portion further comprise instructions executable to: evaluate respective nodes of the node set to identify candidate nodes that have access to the resource involved in the selected query portion; and among the candidate nodes, choose the selected node for the selected query portion. 4. The server of claim 1 , wherein: the selected query portion involves proprietary processing; the node set further comprises: at least one trusted node that is trusted to perform the proprietary processing, and at least one untrusted node that is not trusted to perform the proprietary processing; and the instructions executable to cause the server to choose the selected node for the selected query portion further comprise instructions executable to: choose the selected node only from among the at least one trusted node of the node set. 5. The server of claim 1 , wherein the instructions executable to cause the server to choose the selected node for the selected query portion further comprise instructions executable to: estimate processing costs of utilizing respective nodes of the node set to perform the selected query portion; and choose the selected node for the selected query portion according to the processing costs of the respective nodes. 6. The server of claim 5 , wherein: the selected query portion involves a data set that is not stored by a first node of the node set; and wherein the instructions executable to cause the server to estimate the processing cost for the first node of the node set further comprise instructions executable to: estimate the cost of delivering the data set to the first node of the node set. 7. The server of claim 1 , wherein the instructions executable to cause the server to choose the selected node for the selected query portion further comprise instructions executable to: estimate a processing performance of the selected node for the selected query portion; and wherein the instructions executable to choose the selected node for the selected query portion further comprise instructions executable to: choose the selected node according to the processing performance of the selected node. 8. A client device that participates in a query executed by a node set, the client comprising: a processor; and a memory storing instructions executable by the processor to cause the client device to: receive a query instruction set, generated for the client device, that expresses a query portion of the query and that specifies a next selected node of the node set, the query instruction set being received based at least upon a query type of the query portion and the client device having a different allocation of physical hardware resources from one or more unselected nodes of the node set, the different allocation of physical hardware resources of the client device including a type of processing resource not included in the one or more unselected nodes of the node set, the type of processing resource capable of performing the query portion; execute the query instruction set for the query portion to produce an intermediate query result; and transmit the intermediate query result to the next selected node of the node set. 9. The client device of claim 8 , wherein: the instructions executable to cause the client device to execute the query instruction set further comprise instructions executable to cause the client device to: receive, from a previous selected node of the node set, a first intermediate result produced by performing a previous query portion of the query; and execute the query instruction set over the first intermediate result to produce a second intermediate query result; and wherein the instructions executable to cause the client device to transmit the intermediate query result further comprise instructions executable to cause the client device to: transmit the second intermediate query result to the next selected node of the node set. 10. The client device of claim 8 , wherein the instructions are further executable to: cause the client device to process the query instruction set to generate a first intermediate query result portion and a second intermediate query result portion; cause the client device to initiate transmitting of the first intermediate query result portion to the next selected node before completing processing and generation of the second intermediate query result portion. 11. The client device of claim 8 , wherein: the client device is in direct communication with the next selected node; and the instructions are further executable to transmit the intermediate query result directly to the next selected node of the node set. 12. The client device of claim 8 , wherein: the instructions are further executable to cause the client device to receive a node map that identifies the nodes of the node set that process one or more query portions of the query; and the instructions are further executable to identify the next selected node according to the node map. 13. The client device of claim 8 , wherein: the query instruction set further comprises transmit instructions executable to transmit the intermediate query result to the next selected node of the node set. 14. A method of executing a query using a node set comprising a plurality of nodes, the method performed by a server and comprising: partitioning the query into at least two query portions; for a selected query portion: choosing, from the node set, a selected node to perform the selected query portion, wherein the selected node is chosen based at least on a query type of the selected query portion and the selected node having a di

Assignees

Inventors

Classifications

  • Iterative querying; Query formulation based on the results of a preceding query · CPC title

  • Internal representations for queries · CPC title

  • of parallel queries · CPC title

  • Distributed queries · CPC title

  • Data stream processing; Continuous queries · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11487771B2 cover?
Distributed query processing is often performed by a set of nodes that apply MapReduce to a data set and materialize partial results to storage, which are then aggregated to produce the query result. However, this architecture requires a preconfigured set of database nodes; can only fulfill queries that utilize MapReduce processing; and may be slowed down by materializing partial results to sto…
Who is the assignee on this patent?
Microsoft Technology Licensing Llc
What technology area does this patent fall under?
Primary CPC classification G06F16/2471. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Nov 01 2022 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).