Technologies for efficient reliable compute operations for mission critical applications

US11157374B2 · US · B2

Patent metadata
Publication number: US-11157374-B2
Application number: US-201816234671-A
Country: US
Kind code: B2
Filing date: Dec 28, 2018
Priority date: Dec 28, 2018
Publication date: Oct 26, 2021
Grant date: Oct 26, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Technologies for efficiently providing reliable compute operations for mission critical applications include a reliability management system. The reliability management system includes circuitry configured to obtain conclusion data indicative of a conclusion made by each of two or fewer compute devices of a host system. The conclusion data from each compute device pertains to the same operation. Additionally, the circuitry is configured to identify whether an error has occurred in the operation of each compute device, determine, in response to a determination that an error has occurred, a severity of the error, and cause the host system to perform a responsive action as a function of the determined severity of the error.
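
The flow in the abstract (obtain conclusions, identify an error, determine its severity, respond) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the patent: the `Conclusion` shape, the weight and threshold values, and the action strings are assumptions chosen for illustration. The comparison of two conclusions and the use of filter weights follow dependent claims 4 and 5. Mapping a large disagreement to a hard error is also an assumption; the claims tie hard errors to self-test results.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    NONE = 0  # conclusions agree closely enough
    SOFT = 1  # recoverable disagreement
    HARD = 2  # disagreement too large to tolerate (assumed mapping)


@dataclass(frozen=True)
class Conclusion:
    """One device's conclusion about the same operation (hypothetical shape)."""
    label: str         # e.g. the identified object
    confidence: float  # 0.0 .. 1.0


def weighted_difference(a: Conclusion, b: Conclusion,
                        label_weight: float = 1.0,
                        confidence_weight: float = 0.5) -> float:
    """Score the disagreement between two conclusions using filter weights."""
    label_term = label_weight if a.label != b.label else 0.0
    confidence_term = confidence_weight * abs(a.confidence - b.confidence)
    return label_term + confidence_term


def classify(score: float, soft_threshold: float = 0.1,
             hard_threshold: float = 1.0) -> Severity:
    """Turn a weighted difference into a severity level (assumed thresholds)."""
    if score >= hard_threshold:
        return Severity.HARD
    if score >= soft_threshold:
        return Severity.SOFT
    return Severity.NONE


def responsive_action(severity: Severity) -> str:
    """Pick a host action as a function of severity (illustrative names)."""
    return {
        Severity.NONE: "continue",
        Severity.SOFT: "disable_features",
        Severity.HARD: "stop_and_send_debug_data",
    }[severity]


def handle(a: Conclusion, b: Conclusion) -> str:
    """Run the full detect, classify, respond chain for one operation."""
    return responsive_action(classify(weighted_difference(a, b)))
```

For example, `handle(Conclusion("pedestrian", 0.9), Conclusion("cyclist", 0.9))` selects the most severe action in this sketch because the two devices disagree on the object label.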

First claim

Opening claim text (preview).

The invention claimed is:

1. A reliability management system comprising: circuitry to: obtain conclusion data indicative of a conclusion made by each of two or fewer compute devices of a host system, wherein the conclusion data from each compute device pertains to the same operation; identify whether an error has occurred in the operation of each compute device, wherein to identify whether an error has occurred comprises to perform a self test of logic or memory of the compute device, and wherein to perform the self test comprises to interleave the self test with conclusion determination operations on the compute device, and wherein the circuitry is to interleave the self test with the conclusion determination operations by saving and restoring states and alternating between the self test and conclusion determination operations; determine, in response to a determination that an error has occurred, a severity of the error at least partially based on results of the self test; and cause the host system to perform a responsive action as a function of the determined severity of the error.

2. The reliability management system of claim 1, wherein the host system is a vehicle and wherein to obtain the conclusion data comprises to obtain conclusion data indicative of an identification of an object.

3. The reliability management system of claim 1, wherein the two or fewer compute devices comprises a single compute device.

4. The reliability management system of claim 1, wherein to identify whether an error has occurred comprises to compare the conclusion data from the two compute devices to identify a difference between the conclusions, wherein the difference is indicative of an error.

5. The reliability management system of claim 4, wherein to determine the severity of the error comprises to apply one or more filter weights to the identified difference.

6. The reliability management system of claim 5, wherein the circuitry is further to utilize machine learning to select or adjust the filter weights applied to the identified difference.

7. The reliability management system of claim 1, wherein to determine the severity of the error comprises to determine that a memory fault or a logic fault identified from the self test is a hard error.

8. The reliability management system of claim 1, wherein to determine the severity of the error comprises to send data indicative of a result of the self test to a remote compute device for analysis and receive responsive data from the remote compute device indicative of the severity of the error.

9. The reliability management system of claim 1, wherein to cause the host device to perform a responsive action as a function of the severity of the error comprises to disable, in response to a determination that the error is a soft error that can be recovered from, one or more features of the host device.

10. The reliability management system of claim 9, wherein to cause the host device to perform a responsive action as a function of the severity of the error comprises to stop movement of the host system and send, to a remote compute device, debug data indicative of a source of the error to a remote compute device for analysis.

11. The reliability management system of claim 10, wherein to send the debug data comprises to send a tag indicative of the severity of the error and a timestamp indicative of a time when the error occurred.

12. One or more non-transitory machine-readable storage media comprising a plurality of instructions stored thereon that, in response to being executed, cause a reliability management system to: obtain conclusion data indicative of a conclusion made by each of two or fewer compute devices of a host system, wherein the conclusion data from each compute device pertains to the same operation; identify whether an error has occurred in the operation of each compute device, wherein to identify whether an error has occurred comprises to perform a self test of logic or memory of the compute device, and wherein to perform the self test comprises to interleave the self test with conclusion determination operations on the compute device, and wherein the instructions are to interleave the self test with the conclusion determination operations by saving and restoring states and alternating between the self test and conclusion determination operations; determine, in response to a determination that an error has occurred, a severity of the error at least partially based on results of the self test; and cause the host system to perform a responsive action as a function of the determined severity of the error.

13. The one or more non-transitory machine-readable storage media of claim 12, wherein the host system is a vehicle and wherein to obtain the conclusion data comprises to obtain conclusion data indicative of an identification of an object.

14. The one or more non-transitory machine-readable storage media of claim 12, wherein the two or fewer compute devices comprises a single compute device.

15. The one or more non-transitory machine-readable storage media of claim 12, wherein to identify whether an error has occurred comprises to compare the conclusion data from the two compute devices to identify a difference between the conclusions, wherein the difference is indicative of an error.

16. The one or more non-transitory machine-readable storage media of claim 15, wherein to determine the severity of the error comprises to apply one or more filter weights to the identified difference.

17. A method comprising: obtaining, by a reliability management system, conclusion data indicative of a conclusion made by each of two or fewer compute devices of a host system, wherein the conclusion data from each compute device pertains to the same operation; identifying, by the reliability management system, whether an error has occurred in the operation of each compute device, wherein identifying whether an error has occurred comprises performing a self test of logic or memory of the compute device, and wherein performing the self test comprises interleaving the self test with conclusion determination operations on the compute device, and wherein interleaving the self test with the conclusion determination operations comprises to saving and restoring states and alternating between the self test and conclusion determination operations; determining, by the reliability management system and in response to a determination that an error has occurred, a severity of the error at least partially based on results of the self test; and causing, by the reliability management system, the host system to perform a responsive action as a function of the determined severity of the error.
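
The interleaving recited in claims 1, 12, and 17 (alternating between self test and conclusion determination while saving and restoring states) can be pictured with a small Python simulation. This is a sketch under stated assumptions, not the patented implementation: the `Device` class, its `scratch` memory, the simulated `stuck_cells`, the 0x55/0xAA test patterns, and the running-sum "conclusion" are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Device:
    """Hypothetical compute device whose scratch memory is shared by both tasks."""
    scratch: List[int] = field(default_factory=lambda: [0] * 8)
    stuck_cells: frozenset = frozenset()  # simulated faulty cells (always read 0)

    def write(self, i: int, value: int) -> None:
        # A stuck cell ignores the written value and keeps reading 0.
        self.scratch[i] = 0 if i in self.stuck_cells else value


def self_test_slice(device: Device, start: int, count: int) -> List[int]:
    """Pattern-test up to `count` cells from `start`; return failing indexes."""
    failures = []
    for i in range(start, min(start + count, len(device.scratch))):
        for pattern in (0x55, 0xAA):
            device.write(i, pattern)
            if device.scratch[i] != pattern:
                failures.append(i)
                break
    return failures


def run_interleaved(device: Device, inputs: List[int],
                    cells_per_slice: int = 4) -> Tuple[int, List[int]]:
    """Alternate conclusion steps with self-test slices, saving/restoring state."""
    failures: List[int] = []
    next_cell = 0
    for x in inputs:
        # Conclusion determination step: accumulate the input in cell 0.
        device.write(0, device.scratch[0] + x)
        # Save the working state before the self test overwrites it.
        saved = list(device.scratch)
        failures += self_test_slice(device, next_cell, cells_per_slice)
        next_cell = (next_cell + cells_per_slice) % len(device.scratch)
        # Restore the working state so the next conclusion step resumes cleanly.
        device.scratch[:] = saved
    return device.scratch[0], failures
```

In this sketch, `run_interleaved(Device(stuck_cells=frozenset({5})), [1, 2, 3])` still produces the running sum in cell 0 while the second self-test slice reports cell 5 as faulty. A fault list like this is the kind of self-test result the claims use when determining severity.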

Assignees

Intel Corp

Inventors

Classifications

  • Means for detecting failure or malfunction

  • Transfer function weighting factor

  • Drive control systems specially adapted for autonomous road vehicles

  • Fixing failures by repairing failed parts, e.g. loosening a sticking valve

  • Diagnosing or detecting failures; Failure detection models

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11157374B2 cover?
Technologies for efficiently providing reliable compute operations for mission critical applications include a reliability management system. The reliability management system includes circuitry configured to obtain conclusion data indicative of a conclusion made by each of two or fewer compute devices of a host system. The conclusion data from each compute device pertains to the same operation…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification B60W50/0205. Mapped technology areas include Operations & Transport.
When was this patent published?
Publication date: Oct 26, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).