Technologies for efficient reliable compute operations for mission critical applications
US-2019138408-A1 · May 9, 2019 · US
US11157374B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11157374-B2 |
| Application number | US-201816234671-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 28, 2018 |
| Priority date | Dec 28, 2018 |
| Publication date | Oct 26, 2021 |
| Grant date | Oct 26, 2021 |
Abstract
Technologies for efficiently providing reliable compute operations for mission critical applications include a reliability management system. The reliability management system includes circuitry configured to obtain conclusion data indicative of a conclusion made by each of two or fewer compute devices of a host system. The conclusion data from each compute device pertains to the same operation. Additionally, the circuitry is configured to identify whether an error has occurred in the operation of each compute device, determine, in response to a determination that an error has occurred, a severity of the error, and cause the host system to perform a responsive action as a function of the determined severity of the error.
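The abstract describes a pipeline: obtain conclusions from compute devices, detect an error, grade its severity, then act. The following is a minimal Python sketch of that flow. It assumes two compute devices that each report an object label and a confidence score for the same frame. The comparison-plus-filter-weights step follows claims 4 and 5, and the severity tag and timestamp in the debug data follow claim 11. The field names, weight values, thresholds, feature name, and the mapping from severity to action are hypothetical, not taken from the patent.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Severity(Enum):
    NONE = 0  # conclusions agree closely enough
    SOFT = 1  # recoverable error
    HARD = 2  # non-recoverable error


@dataclass
class Conclusion:
    label: str         # identified object class (claim 2: object identification)
    confidence: float  # device's confidence in the label, 0.0 to 1.0


# Hypothetical filter weights and thresholds; claim 6 notes weights could be tuned by ML.
WEIGHTS = {"label": 1.0, "confidence": 0.8}
SOFT_THRESHOLD = 0.2
HARD_THRESHOLD = 1.0


def weighted_difference(a: Conclusion, b: Conclusion) -> float:
    """Apply filter weights to the per-field difference between two conclusions."""
    label_diff = 0.0 if a.label == b.label else 1.0
    conf_diff = abs(a.confidence - b.confidence)
    return WEIGHTS["label"] * label_diff + WEIGHTS["confidence"] * conf_diff


def classify(score: float) -> Severity:
    """Map a weighted difference onto a severity level."""
    if score < SOFT_THRESHOLD:
        return Severity.NONE
    if score < HARD_THRESHOLD:
        return Severity.SOFT
    return Severity.HARD


def respond(severity: Severity, now: Optional[float] = None) -> dict:
    """Choose a responsive action (this severity-to-action mapping is an assumption)."""
    if severity is Severity.NONE:
        return {"action": "continue"}
    if severity is Severity.SOFT:
        # Recoverable: disable a feature (hypothetical feature name).
        return {"action": "disable_features", "features": ["auto_lane_change"]}
    # Non-recoverable: stop and report debug data tagged with severity and time.
    return {
        "action": "stop_and_report",
        "debug": {"tag": severity.name, "timestamp": time.time() if now is None else now},
    }


if __name__ == "__main__":
    a = Conclusion("pedestrian", 0.90)
    b = Conclusion("cyclist", 0.88)
    sev = classify(weighted_difference(a, b))
    print(sev, respond(sev))
```

With a single compute device (claim 3), there is no second conclusion to compare against, so error detection would rest on the self test alone.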
Claims (preview)
The invention claimed is:
1. A reliability management system comprising: circuitry to: obtain conclusion data indicative of a conclusion made by each of two or fewer compute devices of a host system, wherein the conclusion data from each compute device pertains to the same operation; identify whether an error has occurred in the operation of each compute device, wherein to identify whether an error has occurred comprises to perform a self test of logic or memory of the compute device, and wherein to perform the self test comprises to interleave the self test with conclusion determination operations on the compute device, and wherein the circuitry is to interleave the self test with the conclusion determination operations by saving and restoring states and alternating between the self test and conclusion determination operations; determine, in response to a determination that an error has occurred, a severity of the error at least partially based on results of the self test; and cause the host system to perform a responsive action as a function of the determined severity of the error.
2. The reliability management system of claim 1, wherein the host system is a vehicle and wherein to obtain the conclusion data comprises to obtain conclusion data indicative of an identification of an object.
3. The reliability management system of claim 1, wherein the two or fewer compute devices comprises a single compute device.
4. The reliability management system of claim 1, wherein to identify whether an error has occurred comprises to compare the conclusion data from the two compute devices to identify a difference between the conclusions, wherein the difference is indicative of an error.
5. The reliability management system of claim 4, wherein to determine the severity of the error comprises to apply one or more filter weights to the identified difference.
6. The reliability management system of claim 5, wherein the circuitry is further to utilize machine learning to select or adjust the filter weights applied to the identified difference.
7. The reliability management system of claim 1, wherein to determine the severity of the error comprises to determine that a memory fault or a logic fault identified from the self test is a hard error.
8. The reliability management system of claim 1, wherein to determine the severity of the error comprises to send data indicative of a result of the self test to a remote compute device for analysis and receive responsive data from the remote compute device indicative of the severity of the error.
9. The reliability management system of claim 1, wherein to cause the host device to perform a responsive action as a function of the severity of the error comprises to disable, in response to a determination that the error is a soft error that can be recovered from, one or more features of the host device.
10. The reliability management system of claim 9, wherein to cause the host device to perform a responsive action as a function of the severity of the error comprises to stop movement of the host system and send, to a remote compute device, debug data indicative of a source of the error to a remote compute device for analysis.
11. The reliability management system of claim 10, wherein to send the debug data comprises to send a tag indicative of the severity of the error and a timestamp indicative of a time when the error occurred.
12. One or more non-transitory machine-readable storage media comprising a plurality of instructions stored thereon that, in response to being executed, cause a reliability management system to: obtain conclusion data indicative of a conclusion made by each of two or fewer compute devices of a host system, wherein the conclusion data from each compute device pertains to the same operation; identify whether an error has occurred in the operation of each compute device, wherein to identify whether an error has occurred comprises to perform a self test of logic or memory of the compute device, and wherein to perform the self test comprises to interleave the self test with conclusion determination operations on the compute device, and wherein the instructions are to interleave the self test with the conclusion determination operations by saving and restoring states and alternating between the self test and conclusion determination operations; determine, in response to a determination that an error has occurred, a severity of the error at least partially based on results of the self test; and cause the host system to perform a responsive action as a function of the determined severity of the error.
13. The one or more non-transitory machine-readable storage media of claim 12, wherein the host system is a vehicle and wherein to obtain the conclusion data comprises to obtain conclusion data indicative of an identification of an object.
14. The one or more non-transitory machine-readable storage media of claim 12, wherein the two or fewer compute devices comprises a single compute device.
15. The one or more non-transitory machine-readable storage media of claim 12, wherein to identify whether an error has occurred comprises to compare the conclusion data from the two compute devices to identify a difference between the conclusions, wherein the difference is indicative of an error.
16. The one or more non-transitory machine-readable storage media of claim 15, wherein to determine the severity of the error comprises to apply one or more filter weights to the identified difference.
17. A method comprising: obtaining, by a reliability management system, conclusion data indicative of a conclusion made by each of two or fewer compute devices of a host system, wherein the conclusion data from each compute device pertains to the same operation; identifying, by the reliability management system, whether an error has occurred in the operation of each compute device, wherein identifying whether an error has occurred comprises performing a self test of logic or memory of the compute device, and wherein performing the self test comprises interleaving the self test with conclusion determination operations on the compute device, and wherein interleaving the self test with the conclusion determination operations comprises to saving and restoring states and alternating between the self test and conclusion determination operations; determining, by the reliability management system and in response to a determination that an error has occurred, a severity of the error at least partially based on results of the self test; and causing, by the reliability management system, the host system to perform a responsive action as a function of the determined severity of the error.
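Claims 1, 12, and 17 require interleaving the self test with conclusion determination by saving and restoring states and alternating between the two. The following is a minimal Python sketch of that idea. It assumes a toy device whose small register file is shared by inference and by a write-and-read-back pattern test. The class, the test patterns, the round-robin schedule, and the stuck-register fault model are all hypothetical and are not taken from the patent. Classifying any detected fault as a hard error follows claim 7.

```python
from typing import Dict, List, Optional, Tuple

PATTERNS = (0x00, 0xFF, 0x55, 0xAA)  # hypothetical register test patterns


class ComputeDevice:
    """Toy device whose register file is shared by inference and self test."""

    def __init__(self, n_registers: int = 4, stuck: Optional[Dict[int, int]] = None):
        self.registers = [0] * n_registers
        # Simulated hard faults: register index -> value the register is stuck at.
        self.stuck = dict(stuck or {})

    def write(self, idx: int, value) -> None:
        self.registers[idx] = self.stuck.get(idx, value)

    def read(self, idx: int):
        return self.registers[idx]

    def infer_step(self, frame: float) -> float:
        """One slice of conclusion determination: running mean in registers 0 and 1."""
        self.write(0, self.read(0) + 1)
        self.write(1, self.read(1) + frame)
        return self.read(1) / self.read(0)

    def self_test_slice(self, idx: int) -> bool:
        """Write and read back each pattern; False means the register is faulty."""
        for pattern in PATTERNS:
            self.write(idx, pattern)
            if self.read(idx) != pattern:
                return False
        return True


def run_interleaved(device: ComputeDevice, frames: List[float]) -> Tuple[List[float], List[int]]:
    """Alternate inference and self-test slices, saving/restoring state around each test."""
    conclusions, faults = [], []
    for i, frame in enumerate(frames):
        conclusions.append(device.infer_step(frame))
        idx = i % len(device.registers)  # test one register per slice, round robin
        saved = list(device.registers)   # save inference state
        if not device.self_test_slice(idx):
            faults.append(idx)
        device.registers = saved         # restore the state the test overwrote
    return conclusions, faults


def severity(faults: List[int]) -> str:
    """A logic or memory fault found by the self test is treated as a hard error."""
    return "hard" if faults else "none"


if __name__ == "__main__":
    dev = ComputeDevice(stuck={3: 0x00})
    conclusions, faults = run_interleaved(dev, [2, 4, 6, 8, 10, 12])
    print(conclusions, faults, severity(faults))
```

The save and restore step is what allows the test to share hardware with live inference. Without it, the last test pattern (0xAA) would overwrite the running mean held in registers 0 and 1.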
Means for detecting failure or malfunction · CPC title
Transfer function weighting factor · CPC title
Drive control systems specially adapted for autonomous road vehicles · CPC title
Fixing failures by repairing failed parts, e.g. loosening a sticking valve · CPC title
Diagnosing or detecting failures; Failure detection models · CPC title