Lightweight checkpoint technique for resilience against soft errors

US10997027B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10997027-B2
Application number: US-201816227514-A
Country: US
Kind code: B2
Filing date: Dec 20, 2018
Priority date: Dec 21, 2017
Publication date: May 4, 2021
Grant date: May 4, 2021

Abstract

Official abstract text for this publication.

Systems and methods for implementing a lightweight checkpoint technique for resilience against soft errors are disclosed. The technique provides effective, safe, and timely soft error detection and recovery using software. In an exemplary aspect, resilience against data flow errors and control flow errors is provided in critical or mixed-critical applications in each basic block or at critical basic blocks. Verified register preservation is provided at each basic block, along with memory preservation checkpoints. In this manner, soft errors are quickly detected and addressed. The register and memory preservation further allows for safe re-execution from recoverable soft errors. Control flow errors can also be detected at the beginning and/or end of each basic block.
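The flow described in the abstract (preserve registers, preserve memory, execute, detect, restore, re-execute) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the patented implementation: the names `run_block`, `add_block`, and `faulty_attempts`, the dictionary-based register file, and the single-bit fault model are all hypothetical and chosen only for illustration. Detection is assumed to work by comparing the stored result against a redundantly computed shadow result.

```python
from typing import Callable, Dict, Tuple

Registers = Dict[str, int]


def add_block(regs: Registers) -> int:
    """Hypothetical basic block: r1 = r1 + r2; r1 is the value to be stored."""
    regs["r1"] = regs["r1"] + regs["r2"]
    return regs["r1"]


def run_block(
    regs: Registers,
    shadow: Registers,
    mem: Dict[int, int],
    target: int,
    block: Callable[[Registers], int],
    faulty_attempts: Tuple[int, ...] = (),
    max_attempts: int = 3,
) -> int:
    """Run one basic block with checkpointing; return the attempt that succeeded."""
    # Register preservation: save the initial state of the registers the block uses.
    preserved = dict(regs)
    # Verified preservation (assumed check): the saved copy must match the shadow copies.
    if preserved != shadow:
        raise RuntimeError("register checkpoint does not match shadow registers")
    # Memory preservation: back up the old contents of the store target.
    backup = mem[target]

    for attempt in range(1, max_attempts + 1):
        result = block(regs)            # main instruction stream
        if attempt in faulty_attempts:
            result ^= 1 << 3            # simulated soft error: one flipped bit
        shadow_result = block(shadow)   # redundant shadow stream
        mem[target] = result            # the block's store instruction

        if mem[target] == shadow_result:
            return attempt              # no error detected

        # Recovery: undo the store, restore the registers, then re-execute.
        mem[target] = backup
        regs.update(preserved)
        shadow.update(preserved)

    raise RuntimeError("unrecoverable error: retries exhausted")
```

With `regs = {"r1": 5, "r2": 7}`, an identical shadow copy, and a simulated fault on the first attempt, the call returns 2: the first store is detected as wrong and rolled back, and the second attempt stores 12. Restoring both the registers and the memory word is what makes re-execution safe, because the block overwrites one of its own inputs (`r1`).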

First claim

Opening claim text (preview).

What is claimed is:

1. A method of detecting and recovering from soft errors of a processing device, comprising: storing an initial state of a first register at a preservation memory address; executing an instruction block comprising a store instruction at a target memory address; before executing the instruction block, determining whether a control flow error occurred in a previous instruction; if the control flow error occurred, executing a control flow diagnosis routine; prior to executing the store instruction, preserving backup data stored in the target memory address in a memory backup register; after preserving the backup data, storing result data from the instruction block at the target memory address; detecting an error in the result data; and in response to detecting the error, recovering from the error.

2. The method of claim 1, further comprising: identifying a set of registers used in the instruction block; and storing an initial state of each of the set of registers at a corresponding preservation memory address of a set of preservation memory addresses.

3. The method of claim 2, wherein recovering from the error comprises: restoring the backup data to the target memory address; and restoring the initial state of each of the set of registers.

4. The method of claim 3, wherein recovering from the error further comprises re-executing the instruction block.

5. The method of claim 1, further comprising: determining whether the error is recoverable; and if the error is recoverable, restoring the initial state of the first register and restoring the backup data to the target memory address.

6. The method of claim 5, wherein determining whether the error is recoverable comprises determining the error is recoverable if the result data was stored in the target memory address.

7. The method of claim 5, further comprising if the error is not recoverable, flagging an unrecoverable error.

8. The method of claim 1, further comprising verifying the initial state of the first register has been accurately preserved before executing the instruction block.

9. The method of claim 1, wherein executing the instruction block comprises performing a main instruction thread using a main set of registers and a redundant shadow instruction thread using a shadow set of registers.

10. The method of claim 9, further comprising comparing the result data stored at the target memory address with shadow result data in a corresponding register of the shadow set of registers.

11. The method of claim 9, further comprising verifying the initial state of the first register has been accurately preserved before executing the instruction block by: loading from the preservation memory address into the first register; and comparing the first register with a corresponding register of the shadow set of registers.

12. The method of claim 1, wherein the control flow diagnosis routine comprises: determining whether the control flow error is a wrong direction error; and if the control flow error is the wrong direction error, restoring previously stored program counter data to a program counter register.
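The control flow steps in claims 1 and 12 can be sketched in the same spirit. The claim text shown here does not say how a control flow error is detected or how a "wrong direction" error is defined, so this sketch makes two labeled assumptions: each block records its intended successor before branching (standing in for the stored program counter data), and a wrong direction error means the branch landed on a different but legal successor of the previous block. The names `FlowState`, `SUCCESSORS`, and `check_entry`, and the four-block graph, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FlowState:
    prev_block: str      # block that just finished executing
    intended_next: str   # successor its branch chose (stands in for the saved PC)


# Hypothetical control flow graph: block -> set of legal successor blocks.
SUCCESSORS = {"A": {"B", "C"}, "B": {"D"}, "C": {"D"}, "D": set()}


def check_entry(state: FlowState, entered: str) -> str:
    """Check control flow on block entry; return the block that should run next."""
    if entered == state.intended_next:
        return entered  # no control flow error occurred

    # Control flow diagnosis routine (assumed logic).
    if entered in SUCCESSORS[state.prev_block]:
        # Wrong direction: the branch went to the other legal target,
        # so restore the previously stored program counter value.
        return state.intended_next

    # Any other jump is treated as unrecoverable in this sketch.
    raise RuntimeError("unrecoverable control flow error: entered " + entered)
```

For example, if block A intended to branch to B but execution arrives at C, `check_entry(FlowState("A", "B"), "C")` returns "B", redirecting execution to the intended block. Arriving at D, which is not a successor of A, raises an error instead.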

Assignees

Inventors

Classifications

  • Remedial or corrective actions (recovery from an exception in an instruction pipeline G06F9/3861; by retry G06F11/1402; for recovering from a failure of a protocol instance or entity H04L69/40) · CPC title

  • Using snapshots, i.e. a logical point-in-time copy of the data · CPC title

  • G06F9/3863 · Primary

    using multiple copies of the architectural state, e.g. shadow registers · CPC title

  • Restarting or rejuvenating · CPC title

  • using recovery blocks · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10997027B2 cover?
Systems and methods for implementing a lightweight checkpoint technique for resilience against soft errors are disclosed. The technique provides effective, safe, and timely soft error detection and recovery using software. In an exemplary aspect, resilience against data flow errors and control flow errors is provided in critical or mixed-critical applications in each basic block or at critical …
Who is the assignee on this patent?
Didehban Moslem, Lokam Sai Ram Dheeraj, Shrivastava Aviral, and 1 more
What technology area does this patent fall under?
Primary CPC classification G06F9/3863. Mapped technology areas include Physics.
When was this patent published?
Publication date May 4, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).