Methods, apparatuses, and systems for zero silent data corruption (ZDC) compiler technique
US-10296312-B2 · May 21, 2019 · US
US11449380B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11449380-B2 |
| Application number | US-201916420364-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 23, 2019 |
| Priority date | Jun 6, 2018 |
| Publication date | Sep 20, 2022 |
| Grant date | Sep 20, 2022 |
Official abstract text for this publication.
A method for detecting and recovery from a soft error in a computing device is provided. In examples discussed herein, the method can be performed to detect soft errors that may occur during execution of a predefined critical instruction(s) and/or has been propagated in the computing device prior to the execution of the predefined critical instruction(s). Specifically, a software compiler may be used to embed an error detector block(s) after the predefined critical instruction(s). In this regard, the error detector block(s) can be executed after the predefined critical instruction(s) to detect the soft error. Accordingly, it may be possible to invoke a diagnosis routine to determine severity of the detected soft error and take appropriate action against the detected soft error. As such, it may be possible to protect the execution of the predefined critical instruction(s) concurrent to eliminating vulnerable voting intervals and reducing soft error detection overhead.
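The flow described in the abstract (run a critical instruction, then an error detector block, then a diagnosis routine if a mismatch is found) can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not the patented compiler transformation. Registers are modeled as a Python dict, the critical instruction is assumed to be a memory store, and a soft error is simulated by flipping one bit in a chosen register. The names `protected_store`, `diagnose`, and the `fault` parameter are hypothetical. The majority vote follows the wording of claim 6.

```python
from collections import Counter


def diagnose(master, detection, recovery):
    """Majority-vote among the three copies.

    Returns (recoverable, value). The error is treated as recoverable
    when at least two of the three copies agree (an assumption of this sketch).
    """
    value, votes = Counter([master, detection, recovery]).most_common(1)[0]
    if votes >= 2:
        return True, value
    return False, None


def protected_store(memory, addr, a, b, fault=None):
    """Store a + b at memory[addr] with a detector block after the store.

    fault: optional register name ("master", "detection", "recovery")
    in which a single bit flip is injected to simulate a soft error.
    """
    # Redundant computation on three disjoint register groups.
    regs = {"master": a + b, "detection": a + b, "recovery": a + b}
    if fault is not None:
        regs[fault] ^= 0x4  # simulated bit flip (hypothetical fault model)

    # Critical instruction: the store uses the master copy.
    memory[addr] = regs["master"]

    # Error detector block: runs after the critical instruction and
    # compares what was stored against the detection copy.
    loaded = memory[addr]
    if loaded != regs["detection"]:
        recoverable, value = diagnose(loaded, regs["detection"], regs["recovery"])
        if not recoverable:
            raise RuntimeError("non-recoverable soft error")
        memory[addr] = value  # recovery: rewrite the voted value
    return memory[addr]
```

In this sketch a flip in the master or detection copy is caught by the comparison and repaired by the vote, while a flip in the recovery copy alone goes unnoticed because the stored value is still correct.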
Opening claim text (preview).
What is claimed is:
1. A method for detecting and recovery from soft errors in a computing device comprising: executing a predefined critical instruction; executing an error detector block subsequent to executing the predefined critical instruction to detect a soft error in the computing device, wherein executing the error detector block comprises: portioning a plurality of programmer-accessible registers in the computing device into at least one master register, at least one detection register, and at least one recovery register; generating the error detector block comprising: a master instruction sequence configured to operate on the at least one master register; a detection instruction sequence configured to operate on the at least one detection register; and a recovery instruction sequence configured to operate on the at least one recovery register; and executing sequentially the master instruction sequence, the detection instruction sequence, and the recovery instruction sequence after executing the predefined critical instruction to detect the soft error associated with executing the predefined critical instruction; and invoking a diagnosis routine and a recovery routine in response to detecting the soft error.
2. The method of claim 1 further comprising executing the error detector block to detect the soft error occurring during or prior to execution of the predefined critical instruction.
3. The method of claim 1 further comprising executing a software compiler to: determine the predefined critical instruction; generate the error detector block corresponding to the predefined critical instruction; generate the diagnosis routine corresponding to the error detector block; and generate an executable program comprising the predefined critical instruction, the error detector block, and the diagnosis routine.
4. The method of claim 1 further comprising: generating the master instruction sequence comprising one or more master instructions selected from a group consisting of: an arithmetic instruction, a memory read instruction, a flow control instruction, a memory write instruction, and a functional call instruction; generating the detection instruction sequence comprising one or more detection instructions selected from a group consisting of: the arithmetic instruction, the memory read instruction, and the flow control instruction; and generating the recovery instruction sequence comprising one or more recovery instructions selected from a group consisting of: the arithmetic instruction and the memory read instruction.
5. The method of claim 1 further comprising invoking the diagnosis routine to: determine whether the detected soft error is recoverable; recover the detected soft error in response to determining that the detected soft error is a recoverable soft error; and alert the detected soft error in response to determining that the detected soft error is a non-recoverable soft error.
6. The method of claim 5 further comprising performing majority-voting among the at least one master register, the at least one detection register, and the at least one recovery register to determine whether the soft error is recoverable.
7. The method of claim 1 further comprising detecting and reacting to a silent-store error occurring during execution of a memory write instruction.
8. The method of claim 7 further comprising: loading a value stored at a destination address associated with the memory write instruction into a silent check register; comparing the silent check register with a master value computed by the master instruction sequence and stored in the at least one master register to determine whether the silent-store error exists; in response to determining that the silent-store error does not exist: writing the master value stored in the at least one master register to the destination address stored in the silent check register; and loading a value stored at a detection destination address into the silent check register; comparing the silent check register with a detection value computed by the detection instruction sequence and stored in the at least one detection register to detect the soft error associated with executing the memory write instruction; and invoking the diagnosis routine in response to detecting the soft error.
9. The method of claim 7 further comprising: loading a value stored at a destination address associated with the memory write instruction into a value check register and a silent check register, respectively; comparing the silent check register with a master value computed by the master instruction sequence and stored in the at least one master register to determine whether the silent-store error exists; copying the silent check register to the value check register; in response to determining that the silent-store error does not exist: writing the master value stored in the at least one master register to the destination address stored in the value check register; and loading a value stored at a detection destination address into the value check register; comparing the silent check register with a detection value computed by the detection instruction sequence and stored in the at least one detection register to detect the soft error associated with executing the memory write instruction; and invoking the diagnosis routine and a recovery routine in response to detecting the soft error.
10. The method of claim 1 further comprising detecting and reacting to a wrong-direction control flow error occurring during execution of a flow control instruction.
11. The method of claim 10 further comprising: executing the flow control instruction based on a predefined branching condition to determine a true-condition branch and a false-condition branch; in the true-condition branch, comparing a first detection register and a second detection register among the at least one detection register based on an opposite of the predefined branching condition to detect the wrong-direction control flow error; in the false-condition branch, comparing the first detection register and the second detection register among the at least one detection register based on the predefined branching condition to detect the wrong-direction control flow error; and invoking the diagnosis routine in response to detecting the wrong-direction control flow error.
12. A non-transitory computer-readable medium (CRM) comprising software with instructions configured to: execute a predefined critical instruction; execute an error detector block subsequent to executing the predefined critical instruction to detect a soft error in a computing device, wherein executing the error detector block comprises: portioning a plurality of programmer-accessible registers in the computing device into at least one master register, at least one detection register, and at least one recovery register; generating the error detector block comprising: a master instruction sequence configured to operate on the at least one master register; a detection instruction sequence configured to operate on the at least one detection register; and a recovery instruction sequence configured to operate on the at least one recovery register; and executing sequentially the master instruction sequence, the detection instruction sequence, and the recovery instruction sequence after executing the predefined critical instruction to detect the soft error associated with executing the predefined critical instruction; and invoke a diagnosis routine in response to detecting the soft error.
13. The non-transitory CRM of claim 12 wherein the so
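The wrong-direction control flow check recited in claim 11 can be sketched in a few lines. This is an illustrative simulation only, assuming the branching condition is a less-than comparison and that the branch itself is decided on master copies of the operands. The function name `checked_less_than` and the exception `WrongDirectionError` (standing in for invoking the diagnosis routine) are hypothetical.

```python
class WrongDirectionError(Exception):
    """Stand-in for invoking the diagnosis routine (hypothetical name)."""


def checked_less_than(m1, m2, d1, d2):
    """Branch on master copies (m1 < m2), then re-check on detection copies.

    m1, m2: master copies of the operands, used by the branch itself.
    d1, d2: detection copies of the same operands.
    Returns True when the true-condition branch is taken, False otherwise.
    """
    if m1 < m2:
        # True-condition branch: test the OPPOSITE condition on the
        # detection copies. If it holds, the branch went the wrong way.
        if d1 >= d2:
            raise WrongDirectionError("true branch taken, detection copies disagree")
        return True
    # False-condition branch: test the ORIGINAL condition on the
    # detection copies. If it holds, the branch went the wrong way.
    if d1 < d2:
        raise WrongDirectionError("false branch taken, detection copies disagree")
    return False
```

With matching copies the function simply reports which branch was taken. When a master operand is corrupted, for example `checked_less_than(5, 2, 1, 2)`, the false-condition branch is taken but the detection copies satisfy the original condition, so the error is raised.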
Root cause analysis, i.e. error or fault diagnosis (in a hardware test environment G06F11/22; in a software test environment G06F11/36) · CPC title
the processing taking place on a specific hardware platform or in a specific software environment · CPC title
Remedial or corrective actions (recovery from an exception in an instruction pipeline G06F9/3861; by retry G06F11/1402; for recovering from a failure of a protocol instance or entity H04L69/40) · CPC title
to protect a block of data words, e.g. CRC or checksum (G06F11/1076 takes precedence; security arrangements for protecting computers or computer systems against unauthorized activity G06F21/00) · CPC title
using recovery blocks · CPC title