Method for failure-resilient data placement in a distributed query processing system

US9842148B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9842148-B2
Application numberUS-201514704825-A
CountryUS
Kind codeB2
Filing dateMay 5, 2015
Priority dateMay 5, 2015
Publication dateDec 12, 2017
Grant dateDec 12, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Herein is described a data placement scheme for a distributed query processing systems that achieves load balance amongst the nodes of the system. To identify a node on which to place particular data, a supervisor node performs a placement algorithm over the particular data's identifier, where the placement algorithm utilizes two or more hash functions. The supervisor node runs the placement algorithm until a destination node is identified that is available to store the data, or the supervisor node has run the placement algorithm an established number of times. If no available node is identified using the placement algorithm, then an available destination node is identified for the particular data and information identifying the data and the selected destination node is included in an exception map. Most data may be located by any node in the system based on the node performing the placement algorithm for the required data.

First claim

Opening claim text (preview).

What is claimed is: 1. A computerized distributed query processing system comprising: a plurality of computing devices, each computing device being configured with a data store; and a supervisor computing device communicatively connected to the plurality of computing devices; wherein the supervisor computing device is configured to: identify a particular computing device of the plurality of computing devices as a destination computing device of a particular unit of data; wherein the particular unit of data is uniquely identified, among all units of data stored on the computerized distributed query processing system, by a particular data identifier; to identify said particular computing device, said supervisor computing device is configured to: perform a placement function, comprising two or more hash functions, based, at least in part, on the particular data identifier, wherein said supervisor computing device being configured to perform the placement function comprises said supervisor computing device being configured to: combine results of the two or more hash functions to produce combined results, and identify the particular computing device to be the destination computing device of the particular unit of data based on the combined results; and to cause the particular unit of data to be stored on the data store of the particular computing device as the destination computing device of the particular unit of data. 2. The computerized distributed query processing system of claim 1 , wherein the supervisor computing device is further configured to: prior to performance of the placement function, determine whether the placement function has been run, to identify the destination computing device for the particular unit of data, less than a particular number of times; and in response to a determination that the placement function has been run, to identify the destination computing device for the particular unit of data, less than the particular number of times, perform the placement function on the particular data identifier. 3. The computerized distributed query processing system of claim 1 , wherein the supervisor computing device is further configured to: identify a certain computing device of the plurality of computing devices as a destination computing device of a second unit of data; wherein the second unit of data is uniquely identified, among all units of data stored on the computerized distributed query processing system, by a second data identifier; to identify said certain computing device, said supervisor computing device is configured to: determine whether the placement function has been run, to identify the destination computing device for the second unit of data, less than or equal to a particular number of times; and in response to a determination that the placement function has been run, to identify the destination computing device for the second unit of data, the particular number of times: identify the certain computing device to be the destination computing device of the second unit of data based, at least in part, on information that indicates that the certain computing device is available to store the second unit of data, and include information mapping the second unit of data to the certain computing device in an exception map stored at the supervisor computing device; and based on identification of the certain computing device, cause the second unit of data to be stored on the certain computing device. 4. The computerized distributed query processing system of claim 1 , wherein the supervisor computing device is further configured to: identify a certain computing device of the plurality of computing devices as a destination computing device of a second unit of data; wherein the second unit of data is uniquely identified, among all units of data stored on the computerized distributed query processing system, by a second data identifier; to identify said certain computing device, said supervisor computing device is configured to: perform a first performance of the placement function that is based, at least in part, on the second data identifier, wherein the first performance of the placement function comprises combining results of the two or more hash functions to produce second combined results, identify a first unavailable computing device based on the second combined results, determine that the first unavailable computing device is not available to store the second unit of data, in response to a determination that the first unavailable computing device is not available to store the second unit of data, perform a second performance of the placement function that is based, at least in part, on the second data identifier, wherein the second performance of the placement function comprises combining results of the two or more hash functions to produce third combined results, wherein the third combined results are different than the second combined results, identify the certain computing device based on the third combined results, and determine that the certain computing device is available to store the second unit of data; and in response to a determination that the certain computing device is available to store the second unit of data, cause the second unit of data to be stored on the certain computing device. 5. The computerized distributed query processing system of claim 1 , wherein the supervisor computing device is further configured to retrieve a certain unit of data from a certain computing device of the plurality of computing devices by being configured to: retrieve information mapping an identifier of the certain unit of data to the certain computing device in an exception map stored at the supervisor computing device; and retrieve the certain unit of data from the certain computing device based on the retrieved information mapping the identifier of the certain unit of data to the certain computing device. 6. The computerized distributed query processing system of claim 1 , wherein the supervisor computing device is further configured to retrieve a certain unit of data from a certain computing device of the plurality of computing devices by being configured to: determine that an exception map stored at the supervisor computing device does not include information mapping a certain data identifier that identifies the certain unit of data to any of the plurality of computing devices; in response to a determination that the exception map does not include information mapping the certain data identifier to any of the plurality of computing devices: perform a second performance of the placement function based, at least in part, on the certain data identifier, wherein the second performance of the placement function comprises combining results of the two or more hash functions to produce second combined results, and identify the certain computing device to be the destination computing device of the certain unit of data based on the second combined results; and based on an identification of the certain computing device to be the destination computing device of the certain unit of data, retrieving the certain unit of data from the certain computing device. 7. The computerized distributed query processing system of claim 1 , wherein one or more of the plurality of computing devices are independently configured to retrieve a certain unit of data from a certain computing device of the plurality of computing devices by being configured to: perform a second performance of the placement function based, at least in part, on a certain data identifier that identifies the certain unit of data; wherein the second performance of the placement function comprises combining results of the two or more hash functi

Assignees

Inventors

Classifications

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9842148B2 cover?
Herein is described a data placement scheme for a distributed query processing systems that achieves load balance amongst the nodes of the system. To identify a node on which to place particular data, a supervisor node performs a placement algorithm over the particular data's identifier, where the placement algorithm utilizes two or more hash functions. The supervisor node runs the placement al…
Who is the assignee on this patent?
Oracle Int Corp
What technology area does this patent fall under?
Primary CPC classification G06F16/2471. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Dec 12 2017 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).