Method for detecting and recovery from soft errors in a computing device

US11449380B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11449380-B2
Application number  US-201916420364-A
Country             US
Kind code           B2
Filing date         May 23, 2019
Priority date       Jun 6, 2018
Publication date    Sep 20, 2022
Grant date          Sep 20, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for detecting and recovery from a soft error in a computing device is provided. In examples discussed herein, the method can be performed to detect soft errors that may occur during execution of a predefined critical instruction(s) and/or has been propagated in the computing device prior to the execution of the predefined critical instruction(s). Specifically, a software compiler may be used to embed an error detector block(s) after the predefined critical instruction(s). In this regard, the error detector block(s) can be executed after the predefined critical instruction(s) to detect the soft error. Accordingly, it may be possible to invoke a diagnosis routine to determine severity of the detected soft error and take appropriate action against the detected soft error. As such, it may be possible to protect the execution of the predefined critical instruction(s) concurrent to eliminating vulnerable voting intervals and reducing soft error detection overhead.
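The flow described in the abstract (run redundant work after the critical instruction, compare, and call a diagnosis routine on a mismatch) can be illustrated with a small simulation. This is a minimal sketch, not the patented compiler transformation: registers are modeled as a Python dictionary, soft errors are injected as explicit bit flips, and the names `critical_with_detector`, `diagnose`, `majority_vote`, and `UnrecoverableSoftError` are hypothetical. The majority vote follows the idea in claims 5 and 6 of the claim preview on this page.

```python
from typing import Callable, Dict, Optional


class UnrecoverableSoftError(Exception):
    """Raised when the master, detection, and recovery copies all disagree."""


def majority_vote(master: int, detection: int, recovery: int) -> Optional[int]:
    """Return the value held by at least two copies, or None if all differ."""
    if master == detection or master == recovery:
        return master
    if detection == recovery:
        return detection
    return None


def diagnose(regs: Dict[str, int]) -> int:
    """Hypothetical diagnosis routine: recover by voting, otherwise alert."""
    voted = majority_vote(regs["master"], regs["detection"], regs["recovery"])
    if voted is None:
        raise UnrecoverableSoftError(f"no two copies agree: {regs}")
    return voted


def critical_with_detector(
    compute: Callable[[int], int],
    operand: int,
    bit_flips: Optional[Dict[str, int]] = None,
) -> int:
    """Run one computation three times and check it with a detector step."""
    # Three redundant sequences, each writing its own simulated register.
    regs = {name: compute(operand) for name in ("master", "detection", "recovery")}

    # Simulated soft errors (assumption): flip one bit in the named registers.
    for name, bit in (bit_flips or {}).items():
        regs[name] ^= 1 << bit

    # Detector step: a master/detection mismatch signals a soft error.
    if regs["master"] != regs["detection"]:
        return diagnose(regs)
    return regs["master"]
```

With a single flipped bit in any one register the sketch still returns the correct result; when two registers are corrupted differently, no majority exists and the error is reported instead of silently repaired.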

First claim

Opening claim text (preview).

What is claimed is:

1. A method for detecting and recovery from soft errors in a computing device comprising: executing a predefined critical instruction; executing an error detector block subsequent to executing the predefined critical instruction to detect a soft error in the computing device, wherein executing the error detector block comprises: portioning a plurality of programmer-accessible registers in the computing device into at least one master register, at least one detection register, and at least one recovery register; generating the error detector block comprising: a master instruction sequence configured to operate on the at least one master register; a detection instruction sequence configured to operate on the at least one detection register; and a recovery instruction sequence configured to operate on the at least one recovery register; and executing sequentially the master instruction sequence, the detection instruction sequence, and the recovery instruction sequence after executing the predefined critical instruction to detect the soft error associated with executing the predefined critical instruction; and invoking a diagnosis routine and a recovery routine in response to detecting the soft error.

2. The method of claim 1 further comprising executing the error detector block to detect the soft error occurring during or prior to execution of the predefined critical instruction.

3. The method of claim 1 further comprising executing a software compiler to: determine the predefined critical instruction; generate the error detector block corresponding to the predefined critical instruction; generate the diagnosis routine corresponding to the error detector block; and generate an executable program comprising the predefined critical instruction, the error detector block, and the diagnosis routine.

4. The method of claim 1 further comprising: generating the master instruction sequence comprising one or more master instructions selected from a group consisting of: an arithmetic instruction, a memory read instruction, a flow control instruction, a memory write instruction, and a functional call instruction; generating the detection instruction sequence comprising one or more detection instructions selected from a group consisting of: the arithmetic instruction, the memory read instruction, and the flow control instruction; and generating the recovery instruction sequence comprising one or more recovery instructions selected from a group consisting of: the arithmetic instruction and the memory read instruction.

5. The method of claim 1 further comprising invoking the diagnosis routine to: determine whether the detected soft error is recoverable; recover the detected soft error in response to determining that the detected soft error is a recoverable soft error; and alert the detected soft error in response to determining that the detected soft error is a non-recoverable soft error.

6. The method of claim 5 further comprising performing majority-voting among the at least one master register, the at least one detection register, and the at least one recovery register to determine whether the soft error is recoverable.

7. The method of claim 1 further comprising detecting and reacting to a silent-store error occurring during execution of a memory write instruction.

8. The method of claim 7 further comprising: loading a value stored at a destination address associated with the memory write instruction into a silent check register; comparing the silent check register with a master value computed by the master instruction sequence and stored in the at least one master register to determine whether the silent-store error exists; in response to determining that the silent-store error does not exist: writing the master value stored in the at least one master register to the destination address stored in the silent check register; and loading a value stored at a detection destination address into the silent check register; comparing the silent check register with a detection value computed by the detection instruction sequence and stored in the at least one detection register to detect the soft error associated with executing the memory write instruction; and invoking the diagnosis routine in response to detecting the soft error.

9. The method of claim 7 further comprising: loading a value stored at a destination address associated with the memory write instruction into a value check register and a silent check register, respectively; comparing the silent check register with a master value computed by the master instruction sequence and stored in the at least one master register to determine whether the silent-store error exists; copying the silent check register to the value check register; in response to determining that the silent-store error does not exist: writing the master value stored in the at least one master register to the destination address stored in the value check register; and loading a value stored at a detection destination address into the value check register; comparing the silent check register with a detection value computed by the detection instruction sequence and stored in the at least one detection register to detect the soft error associated with executing the memory write instruction; and invoking the diagnosis routine and a recovery routine in response to detecting the soft error.

10. The method of claim 1 further comprising detecting and reacting to a wrong-direction control flow error occurring during execution of a flow control instruction.

11. The method of claim 10 further comprising: executing the flow control instruction based on a predefined branching condition to determine a true-condition branch and a false-condition branch; in the true-condition branch, comparing a first detection register and a second detection register among the at least one detection register based on an opposite of the predefined branching condition to detect the wrong-direction control flow error; in the false-condition branch, comparing the first detection register and the second detection register among the at least one detection register based on the predefined branching condition to detect the wrong-direction control flow error; and invoking the diagnosis routine in response to detecting the wrong-direction control flow error.

12. A non-transitory computer-readable medium (CRM) comprising software with instructions configured to: execute a predefined critical instruction; execute an error detector block subsequent to executing the predefined critical instruction to detect a soft error in a computing device, wherein executing the error detector block comprises: portioning a plurality of programmer-accessible registers in the computing device into at least one master register, at least one detection register, and at least one recovery register; generating the error detector block comprising: a master instruction sequence configured to operate on the at least one master register; a detection instruction sequence configured to operate on the at least one detection register; and a recovery instruction sequence configured to operate on the at least one recovery register; and executing sequentially the master instruction sequence, the detection instruction sequence, and the recovery instruction sequence after executing the predefined critical instruction to detect the soft error associated with executing the predefined critical instruction; and invoke a diagnosis routine in response to detecting the soft error.

13. The non-transitory CRM of claim 12 wherein the so
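The memory-write check of claim 8 and the branch check of claim 11 can be sketched in a few lines. This is an illustrative reading of the claim preview, not the patented implementation: memory is modeled as a Python dictionary, the diagnosis routine is replaced by raising a hypothetical `SoftErrorDetected` exception, and the function names `checked_store` and `checked_branch_lt` are invented for the example. The preview does not spell out what happens when the store would be silent; the sketch assumes the write is skipped and the load-back comparison still runs.

```python
from typing import Dict


class SoftErrorDetected(Exception):
    """Stand-in for invoking the diagnosis routine (assumption)."""


def checked_store(
    memory: Dict[int, int],
    master_addr: int,
    master_val: int,
    det_addr: int,
    det_val: int,
) -> bool:
    """Store master_val and verify it; return True if the store was silent."""
    # Load the current content of the destination into a "silent check" value.
    silent_check = memory[master_addr]
    # A store is silent when memory already holds the value to be written.
    silent = silent_check == master_val
    if not silent:
        memory[master_addr] = master_val
    # Load back through the redundantly computed detection address and
    # compare against the redundantly computed detection value.
    silent_check = memory[det_addr]
    if silent_check != det_val:
        raise SoftErrorDetected("store value or address mismatch")
    return silent


def checked_branch_lt(m1: int, m2: int, d1: int, d2: int) -> bool:
    """Branch on m1 < m2 and re-check the direction with detection copies."""
    if m1 < m2:
        # True branch: the opposite condition must not hold on the copies.
        if d1 >= d2:
            raise SoftErrorDetected("branch taken in the wrong direction")
        return True
    # False branch: the original condition must not hold on the copies.
    if d1 < d2:
        raise SoftErrorDetected("branch not taken in the wrong direction")
    return False
```

In this sketch a corrupted master value or a corrupted master address both surface at the load-back comparison, because the detection address and detection value come from an independent copy of the computation.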

Classifications

  • G06F11/079 (Primary)

    Root cause analysis, i.e. error or fault diagnosis (in a hardware test environment G06F11/22; in a software test environment G06F11/36)

  • the processing taking place on a specific hardware platform or in a specific software environment

  • Remedial or corrective actions (recovery from an exception in an instruction pipeline G06F9/3861; by retry G06F11/1402; for recovering from a failure of a protocol instance or entity H04L69/40)

  • to protect a block of data words, e.g. CRC or checksum (G06F11/1076 takes precedence; security arrangements for protecting computers or computer systems against unauthorized activity G06F21/00)

  • using recovery blocks

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11449380B2 cover?
A method for detecting and recovery from a soft error in a computing device is provided. In examples discussed herein, the method can be performed to detect soft errors that may occur during execution of a predefined critical instruction(s) and/or has been propagated in the computing device prior to the execution of the predefined critical instruction(s). Specifically, a software compiler may b…
Who is the assignee on this patent?
Didehban Moslem, Shrivastava Aviral, Lokam Sai Ram Dheeraj, and 1 more
What technology area does this patent fall under?
Primary CPC classification G06F11/079. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 20, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).