Failure recovery in a scaleout system using a matrix clock

US11704201B2 · US · B2

Patent metadata
Publication number: US-11704201-B2
Application number: US-202117456993-A
Country: US
Kind code: B2
Filing date: Nov 30, 2021
Priority date: Nov 30, 2021
Publication date: Jul 18, 2023
Grant date: Jul 18, 2023

Abstract

Official abstract text for this publication.

One example method includes performing failure recovery operations in a computing system using matrix clocks. Each node or process in a computing system is associated with a matrix clock. As events and transitions occur in the computing system, the matrix clocks are updated. The matrix clocks provide a chronological and causal view of the computing system and allow a recovery line to be determined in the event of system failure.
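
The abstract does not spell out how a matrix clock is stored or updated. The sketch below is a textbook-style matrix clock, not the patent's own procedure: each node keeps one row per node, treats its own row as the principal vector, and merges rows when it hears from a peer. The class name `MatrixClock` and the methods `local_event`, `send`, and `receive` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MatrixClock:
    """Textbook matrix clock for node `me` in a system of `n` nodes (illustrative)."""
    me: int
    n: int
    rows: list = field(default_factory=list)

    def __post_init__(self):
        # Start with an n x n matrix of zeros unless rows were supplied.
        if not self.rows:
            self.rows = [[0] * self.n for _ in range(self.n)]

    @property
    def principal(self):
        # The node's own row: what this node knows about every node.
        return self.rows[self.me]

    def local_event(self):
        # A local event (for example, a snapshot) advances the node's own counter.
        self.rows[self.me][self.me] += 1

    def send(self):
        # Sending counts as an event; the message carries a copy of the matrix.
        self.local_event()
        return self.me, [row[:] for row in self.rows]

    def receive(self, sender, sender_rows):
        # Merge every row element-wise, so supporting rows reflect the peer's view.
        for i in range(self.n):
            for j in range(self.n):
                self.rows[i][j] = max(self.rows[i][j], sender_rows[i][j])
        # The principal vector also absorbs the sender's principal vector.
        for j in range(self.n):
            self.rows[self.me][j] = max(self.rows[self.me][j], sender_rows[sender][j])
        # Receiving is itself an event at this node.
        self.local_event()
```

In this sketch, a node that has never communicated knows only its own counter; after one message, the receiver's principal vector records both its own event count and the sender's, which is the causal view the abstract refers to.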

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: detecting a first event at a first node in a distributed computing system; updating a first matrix clock associated with the first node; transitioning to a second matrix clock at a second node when the second node experiences an event; updating the second matrix clock associated with the second node; detecting a failure in the computing system; and performing a failure recovery in the distributed computing system based on primary vector of the second matrix clock, wherein the primary vector identifies a failure recovery line.

2. The method of claim 1, further comprising adding the second node to the distributed computing system and performing a snapshot of the second node, wherein the second event comprises the snapshot.

3. The method of claim 2, wherein updating the second matrix clock includes updating a principal vector of the second matrix clock to reflect the second event and updating a supporting vector of the second matrix clock to include a status of the first node.

4. The method of claim 1, wherein the first event comprises a snapshot of the first node.

5. The method of claim 1, wherein updating the first matrix clock includes updating a principal vector to reflect the first event.

6. The method of claim 5, wherein the principal vector is updated to include a generational number associated with the event.

7. The method of claim 1, further comprising detecting a failure at the second node and rolling back to a previous snapshot at the first node based on the principal vector of the second node included in the second matrix clock.

8. The method of claim 7, further comprising replaying a log from a first snapshot from the first node.

9. The method of claim 8, further comprising synchronizing the first node and the second node to the first snapshot.

10. The method of claim 9, further comprising performing a cascaded rollback.

11. A non-transitory storage medium having stored therein instructions that are executable by one or more hardware processors to perform operations comprising: detecting a first event at a first node in a distributed computing system; updating a first matrix clock associated with the first node; transitioning to a second matrix clock at a second node when the second node experiences an event; updating the second matrix clock associated with the second node; detecting a failure in the computing system; and performing a failure recovery in the distributed computing system based on primary vector of the second matrix clock, wherein the primary vector identifies a failure recovery line.

12. The non-transitory storage medium of claim 11, further comprising adding the second node to the distributed computing system and performing a snapshot of the second node, wherein the second event comprises the snapshot.

13. The non-transitory storage medium of claim 12, wherein updating the second matrix clock includes updating a principal vector of the second matrix clock to reflect the second event and updating a supporting vector of the second matrix clock to include a status of the first node.

14. The non-transitory storage medium of claim 11, wherein the first event comprises a snapshot of the first node.

15. The non-transitory storage medium of claim 11, wherein updating the first matrix clock includes updating a principal vector to reflect the first event.

16. The non-transitory storage medium of claim 15, wherein the principal vector is updated to include a generational number associated with the event.

17. The non-transitory storage medium of claim 11, further comprising detecting a failure at the second node and rolling back to a previous snapshot at the first node based on the principal vector of the second node included in the second matrix clock.

18. The non-transitory storage medium of claim 17, further comprising replaying a log from a first snapshot from the first node.

19. The non-transitory storage medium of claim 18, further comprising synchronizing the first node and the second node to the first snapshot.

20. The non-transitory storage medium of claim 19, further comprising performing a cascaded rollback.
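
The claims state that a principal vector identifies the failure recovery line and that nodes roll back to earlier snapshots, but this preview does not say how the vector is compared against stored snapshots. The sketch below is one assumed reading, not the claimed method: each entry of the principal vector is treated as a snapshot generation number, and each node rolls back to the newest snapshot it holds that is not newer than that entry. The function names `recovery_line` and `rollback_plan` and the data layout are hypothetical.

```python
def recovery_line(principal, snapshots):
    """Pick, per node, the newest stored snapshot the principal vector knows about.

    principal: list of generation numbers, one per node (assumed layout).
    snapshots: dict mapping node index -> list of generations that node has persisted.
    Returns a dict node index -> chosen generation, or None if no snapshot qualifies.
    """
    line = {}
    for node, known in enumerate(principal):
        # Only snapshots at or before the generation recorded in the clock qualify.
        candidates = [g for g in snapshots.get(node, []) if g <= known]
        line[node] = max(candidates) if candidates else None
    return line


def rollback_plan(line, current):
    """Return the nodes that are ahead of the recovery line and their rollback targets."""
    return {
        node: gen
        for node, gen in line.items()
        if gen is not None and current[node] > gen
    }
```

For example, with a principal vector of `[2, 1, 0]` and node 0 currently at generation 3, only node 0 is ahead of the line and would roll back to generation 2; log replay and cascaded rollback from claims 8 to 10 are outside this sketch.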

Assignees

Inventors

Classifications

  • Restarting or rejuvenating

  • with more than one idle spare processing component

  • with a single idle spare processing component

  • where the redundant components share neither address space nor persistent storage

  • Data re-synchronization of a redundant component, or initial sync of replacement, additional or spare unit

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11704201B2 cover?
One example method includes performing failure recovery operations in a computing system using matrix clocks. Each node or process in a computing system is associated with a matrix clock. As events and transitions occur in the computing system, the matrix clocks are updated. The matrix clocks provide a chronological and causal view of the computing system and allow a recovery line to be determ…
Who is the assignee on this patent?
Dell Products L.P.
What technology area does this patent fall under?
Primary CPC classification G06F11/1469. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 18, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).