High availability and automated recovery in scale-out distributed database system

US11455219B2 · US · B2

Patent metadata
Publication number: US-11455219-B2
Application number: US-202017077028-A
Country: US
Kind code: B2
Filing date: Oct 22, 2020
Priority date: Oct 22, 2020
Publication date: Sep 27, 2022
Grant date: Sep 27, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Herein are acceleration techniques for resuming offloaded execution by replacing a failed computer with a hot spare computer. In an embodiment, a distributed system configures a DBMS, a set of participating computers, and a set of spare computers. The DBMS receives a query of a database. From the query, an offload query plan is generated for distributed execution. The DBMS sends the offload query plan and a respective portion of the database to each participating computer. The distributed system detects that a participating computer failed after the offload query plan was sent. Responsively, the DBMS sends the same offload query plan and same respective portion of the database of the failed computer to a replacement computer from the spare computers. Despite the computer failure, the DBMS receives results of successful distributed execution of the offload query plan that include a result from the replacement computer.
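
The recovery flow in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the patented implementation: the `Node` class, the `dispatch` and `collect` functions, the in-memory "partitions", and the toy plan (sum the local rows) are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical worker computer; `alive` simulates a crash."""
    name: str
    alive: bool = True
    plan: Optional[str] = None
    rows: List[int] = field(default_factory=list)

    def load(self, plan: str, rows: List[int]) -> None:
        # Receive the offload query plan and this node's portion of the data.
        self.plan, self.rows = plan, list(rows)

    def execute(self) -> int:
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        # Toy stand-in for executing the plan: sum the local portion.
        return sum(self.rows)


def dispatch(plan, partitions, participants):
    """Send the plan and a respective portion to each participant."""
    for node, rows in zip(participants, partitions):
        node.load(plan, rows)


def collect(plan, partitions, participants, spares):
    """Gather results; on failure, resend the same plan and the same
    portion to a spare and use the spare's result instead."""
    results = []
    for i, rows in enumerate(partitions):
        try:
            results.append(participants[i].execute())
        except ConnectionError:
            replacement = spares.pop()     # IndexError if no spare is left
            replacement.load(plan, rows)   # same plan, same portion
            participants[i] = replacement  # the spare now participates
            results.append(replacement.execute())
    return results


participants = [Node("p0"), Node("p1"), Node("p2")]
spares = [Node("s0")]
partitions = [[1, 2], [3, 4], [5, 6]]

dispatch("SUM(x)", partitions, participants)
participants[1].alive = False  # failure occurs after the plan was sent
results = collect("SUM(x)", partitions, participants, spares)
print(results)  # [3, 7, 11]
```

The second result comes from the spare `s0`, which received the failed node's portion unchanged, so the query completes without regenerating the plan.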

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: configuring a plurality of spare computers that does not contain a plurality of participating computers; receiving, by a database management system (DBMS), a query of a database; generating, from the query of the database, an offload query plan; sending, by the DBMS, said offload query plan and a respective portion of the database to each computer of the plurality of participating computers; detecting that a failed computer of the plurality of participating computers failed after said sending said offload query plan; in response to said detecting that the failed computer failed: reassigning a logical identifier of the failed computer to a replacement computer of the plurality of spare computers, and sending said offload query plan and said respective portion of the database of said failed computer to the replacement computer; receiving, by the DBMS, results of successful distributed execution of said offload query plan that include a result from the replacement computer.

2. The method of claim 1 further comprising: in response to said detecting that the failed computer failed, reassigning the replacement computer from the plurality of spare computers to the plurality of participating computers; receiving a second query of the database after said sending said offload query plan and before said reassigning the replacement computer; executing, based on said detecting that the failed computer failed, the second query by waiting until after said reassigning the replacement computer to execute the second query with the plurality of participating computers.

3. The method of claim 1 wherein: said offload query plan contains said logical identifier of the failed computer; the method further comprises one or more computers of the plurality of participating computers communicating, based on said logical identifier in said offload query plan, with the replacement computer.

4. The method of claim 3 wherein said communicating with the replacement computer is in response to the plurality of participating computers, excluding the failed computer, restarting execution of said offload query plan.

5. The method of claim 4 wherein said restarting execution of said offload query plan does not comprise resending, by the DBMS, said offload query plan.

6. The method of claim 1 wherein said sending said offload query plan to the replacement computer comprises waiting until after an adjustment of an epoch variable selected from the group consisting of: a counter, and a timestamp.

7. The method of claim 1 wherein: said sending said offload query plan comprises sending a particular value of an epoch variable; the method further comprises at least one computer of the plurality of participating computers detecting that said particular value of the epoch variable is older than a current value of the epoch variable.

8. The method of claim 1 further comprising reassigning the failed computer from the plurality of participating computers to the plurality of spare computers.

9. The method of claim 1 wherein the plurality of participating computers and the replacement computer do not store said respective portions of the database in nonvolatile storage.

10. The method of claim 1 further comprising populating the plurality of spare computers with similar respective amounts of computers that are directly connected to a respective network switch of a plurality of network switches.

11. The method of claim 1 further comprising at least one spare computer of the plurality of spare computers, without receiving said offload query plan: receiving an intermediate result from at least one computer of the plurality of participating computers, and sending the intermediate result to at least one computer of the plurality of participating computers for further processing.

12. The method of claim 1 wherein the plurality of participating computers contains at least two thousand computers.

13. The method of claim 1 further comprising: in response to said detecting that the failed computer failed, reassigning the replacement computer from the plurality of spare computers to the plurality of participating computers; receiving a second query of the database after said sending said offload query plan and before said reassigning the replacement computer; executing, based on said detecting that the failed computer failed, the second query without the plurality of participating computers.

14. A method comprising: configuring a plurality of spare computers that does not contain a plurality of participating computers; receiving, by a database management system (DBMS), a query of a database; generating, from said query of the database, a first offload query plan that contains: a first logical identifier of a first failed computer of the plurality of participating computers, and a second logical identifier of a second failed computer of the plurality of participating computers; first sending, by the DBMS, said first offload query plan and a respective first portion of the database to each computer of the plurality of participating computers; first detecting that said first failed computer failed after said first sending said first offload query plan; second sending, in response to said first detecting that the first failed computer failed, said first offload query plan and said respective first portion of the database of said first failed computer to a replacement computer of the plurality of spare computers; second detecting that said second failed computer failed after said second sending said first offload query plan; in response to said second detecting that the second failed computer failed: a) detecting said plurality of spare computers is empty; b) regenerating, from said query, a second offload query plan that does not contain said second logical identifier of said second failed computer; c) repartitioning the database into a plurality of second portions; and d) third sending said second offload query plan and a respective second portion of the plurality of second portions to each computer of the plurality of participating computers that has not failed; receiving, by the DBMS, results of successful distributed execution of said second offload query plan that include a result from the replacement computer.

15. One or more non-transitory computer-readable media storing instructions that, when executed by one or more processors, cause: configuring a plurality of spare computers that does not contain a plurality of participating computers; receiving, by a database management system (DBMS), a query of a database; generating, from the query of the database, an offload query plan; sending, by the DBMS, said offload query plan and a respective portion of the database to each computer of the plurality of participating computers; detecting that a failed computer of the plurality of participating computers failed after said sending said offload query plan; in response to said detecting that the failed computer failed: reassigning a logical identifier of the failed computer to a replacement computer of the plurality of spare computers, and sending said offload query plan and said respective portion of the database of said failed computer to the replacement computer; receiving, by the DBMS, results of successful distributed execution of said offload query plan that include a result from the replacement computer.

16. The one or more non-transitory computer-readable media of claim 15 wherein the instructions further cause: in response to sai
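
Several mechanisms in the claims above can be sketched together: reassigning a failed computer's logical identifier to a spare (claim 1), adjusting an epoch variable so that stale plans can be detected (claims 6 and 7), and falling back to repartitioning when no spare remains (claim 14). The following is an illustrative sketch only. The `Cluster` class, its method names, the host names, and the round-robin `repartition` scheme are assumptions made for this example and do not come from the patent text.

```python
class Cluster:
    """Hypothetical membership table mapping logical id -> physical host."""

    def __init__(self, participants, spares):
        self.hosts = dict(enumerate(participants))  # logical id -> host name
        self.spares = list(spares)
        self.epoch = 0  # a counter here; claim 6 also allows a timestamp

    def handle_failure(self, logical_id):
        """React to a failed participant; return 'replaced' or 'repartition'."""
        self.epoch += 1  # adjust the epoch before any plan is sent again
        if self.spares:
            # The spare inherits the failed computer's logical id, so a plan
            # that refers to computers by logical id remains usable.
            self.hosts[logical_id] = self.spares.pop(0)
            return "replaced"
        # No spare is left: drop the id. The caller must regenerate the plan
        # and repartition the data across the surviving participants.
        del self.hosts[logical_id]
        return "repartition"

    def is_stale(self, message_epoch):
        """True if a plan carries an epoch older than the current one."""
        return message_epoch < self.epoch


def repartition(rows, n):
    """Round-robin split of rows into n portions (an assumed scheme)."""
    return [rows[i::n] for i in range(n)]


cluster = Cluster(["hostA", "hostB", "hostC"], ["spare1"])
plan_epoch = cluster.epoch            # epoch value sent with the first plan

first = cluster.handle_failure(1)     # a spare is available
stale = cluster.is_stale(plan_epoch)  # the earlier plan is now outdated

second = cluster.handle_failure(2)    # the spare pool is empty
portions = repartition(list(range(6)), len(cluster.hosts))

print(first, cluster.hosts[1], stale)  # replaced spare1 True
print(second, portions)                # repartition [[0, 2, 4], [1, 3, 5]]
```

Keeping the logical identifier stable is what lets surviving participants reach the replacement without a new plan, while the epoch gives each participant a cheap way to reject work tied to an outdated membership.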

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11455219B2 cover?
Herein are acceleration techniques for resuming offloaded execution by replacing a failed computer with a hot spare computer. In an embodiment, a distributed system configures a DBMS, a set of participating computers, and a set of spare computers. The DBMS receives a query of a database. From the query, an offload query plan is generated for distributed execution. The DBMS sends the offload que…
Who is the assignee on this patent?
Oracle Int Corp
What technology area does this patent fall under?
Primary CPC classification G06F16/24549. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 27, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).