Method and device for managing hardware errors in a multi-core environment

US9658930B2 · US · B2

Patent metadata
Publication number: US-9658930-B2
Application number: US-201113976030-A
Country: US
Kind code: B2
Filing date: Dec 30, 2011
Priority date: Dec 30, 2011
Publication date: May 23, 2017
Grant date: May 23, 2017

Abstract

Official abstract text for this publication.

A method and device for managing hardware errors in a multi-core environment includes allocating processor cores to a main set and a spare set of processor cores. The main set of processor cores are used by an operating system, and the spare set of processor cores are dedicated to software applications. Should a processor core error occur, a processor core swap may be performed to swap a spare processor core for a failing main processor core without interrupting the execution of the operating system.
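The allocation and swap described in the abstract can be sketched as follows. This is a minimal illustration, not the patented firmware: the names `Core`, `CoreTables`, `allocate`, and `swap_in_spare` are hypothetical, the "last N cores become spares" policy is an assumption, and the detail of handing the failed core's virtual ID to the replacement is taken from claims 8 to 10 rather than from the abstract itself.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    physical_id: int
    virtual_id: int
    spare: bool = False   # models the "allocated to the spare set" register flag
    failed: bool = False  # models the "has failed" register flag


@dataclass
class CoreTables:
    main: list = field(default_factory=list)   # cores exposed to the operating system
    spare: list = field(default_factory=list)  # cores reserved for applications


def allocate(num_cores: int, num_spares: int) -> CoreTables:
    """Split cores into a main set and a spare set (assumed policy: last N are spares)."""
    if not 0 < num_spares < num_cores:
        raise ValueError("need at least one main core and one spare core")
    tables = CoreTables()
    for i in range(num_cores):
        core = Core(physical_id=i, virtual_id=i)
        if i >= num_cores - num_spares:
            core.spare = True
            tables.spare.append(core)
        else:
            tables.main.append(core)
    return tables


def swap_in_spare(tables: CoreTables, failing: Core) -> Core:
    """Replace a failing main core with a spare, keeping the virtual ID the OS sees."""
    # Locate the failing core by identity so look-alike entries cannot match.
    idx = next((i for i, c in enumerate(tables.main) if c is failing), None)
    if idx is None:
        raise ValueError("core is not in the main set")
    if not tables.spare:
        raise RuntimeError("no spare cores left")
    replacement = tables.spare.pop(0)
    replacement.virtual_id = failing.virtual_id  # OS keeps addressing the same virtual core
    replacement.spare = False                    # no longer part of the spare set
    failing.failed = True                        # mark the old core as failed
    tables.main[idx] = replacement               # update the main description table
    return replacement


tables = allocate(num_cores=4, num_spares=1)
bad = tables.main[1]
new = swap_in_spare(tables, bad)
```

Because the replacement inherits the failed core's virtual ID, software that addresses cores by virtual ID would not need to be told that the physical core changed, which is one way to read "without interrupting the execution of the operating system".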

First claim

Opening claim text (preview).

The invention claimed is:

1. A computing device comprising: a plurality of processor cores; a memory; firmware logic to (i) allocate at least one of the plurality of processor cores to a spare set of processor cores, (ii) establish a spare processor core hardware description table in the memory that includes each of the allocated processor cores, (iii) establish a main processor core hardware description table in the memory that includes each of the unallocated processor cores, (iv) determine whether a processor core error was caused by an allocated processor core, (v) instruct a software application executed on the computing device using the spare set of processor cores to restart in response to a determination that the processor core error was caused by the allocated processor core, and (vi) assign a new processor core of the spare set of processor cores to the software application in response to the software application being restarted; and processor control logic to control a processor register to indicate whether each processor core of the plurality of processor cores is an allocated or unallocated processor core.

2. The computing device of claim 1, wherein the processor register includes a register flag usable to denote whether the associated processor core has failed.

3. The computing device of claim 1, wherein the processor register includes an interrupt register flag useable to cause any hardware interrupt originated from the associated processor core to be broadcast only to other processor cores of the spare set of processor cores and not to any unallocated processor core.

4. The computing device of claim 1, wherein the processor control logic to broadcast a system management interrupt, generated in response to the processor core error, only to the allocated processor cores of the spare set of processor cores in response to another processor core error being caused by an allocated processor core of the spare set of processor cores.

5. The computing device of claim 1, wherein the firmware logic to: determine whether the processor core error was caused by an unallocated processor core, determine whether the processor core error is recoverable in response to a determination that the processor core error was caused by an unallocated processor core, increment an error counter associated with the unallocated processor core that caused the processor core error, compare the error counter to a reference threshold, and return control to an operating system executed on the computing device, in response to the error counter being less than the reference threshold, to allow the operating system to reattempt execution of the last software instruction prior to the generation of the processor core error.

6. The computing device of claim 1, wherein the firmware logic to: determine whether the processor core error was caused by an unallocated processor core, determine whether the processor core error is recoverable in response to a determination that the processor core error was caused by an unallocated processor core, increment an error counter associated with the unallocated processor core that caused the processor core error, compare the error counter to a reference threshold, and perform a processor core swap between the unallocated processor cores and the allocated processor cores in response to the error counter equaling the reference threshold.

7. The computing device of claim 1, wherein the firmware logic is to: determine whether the processor core error was caused by an unallocated processor core, and replace the unallocated processor core that caused the processor core error with a replacement processor core from the spare set of processor cores without interrupting the execution of an operating system on the computing device in response to a determination that the processor core error was caused by the unallocated processor core.

8. The computing device of claim 7, wherein to replace the unallocated processor core that caused the processor core error with a replacement processor core comprises to set a virtual core identification number associated with the replacement processor core to a virtual core identification number associated with the unallocated processor core that caused the processor core error.

9. The computing device of claim 7, wherein to replace the unallocated processor core that caused the processor core error with a replacement processor core comprises to update a processor core register to indicate that the unallocated processor core that caused the processor core error has failed and that the replacement processor core is no longer allocated to the spare set of processor cores.

10. The computing device of claim 7, wherein to replace the unallocated processor core that caused the processor core error with a replacement processor core comprises to update the main hardware description table to (i) remove the unallocated processor core that caused the processor core error and (ii) add the replacement processor core.

11. One or more non-transitory machine readable media comprising a plurality of instructions that, in response to being executed, cause a computing device to: allocate at least one processor core of a plurality of processor cores of a computing device to a spare set of processor cores; execute an operating system on the computing device using only unallocated processor cores of the plurality of processor cores; execute a software application on the computing device using the spare set of processor cores; detect a processor core error caused by a processor core of the plurality of processor cores; determine whether a processor core error was caused by an allocated processor core; instruct the software application to restart in response to a determination that the processor core error was caused by the allocated processor core; and assign a new processor core of the spare set of processor cores to the software application in response to the software application being restarted.

12. The one or more non-transitory machine readable media of claim 11, wherein to allocate at least one processor core comprises to establish a register flag for each processor core of the plurality of processor cores usable to denote whether the associated processor core has been allocated to the spare set of processor cores.

13. The one or more non-transitory machine readable media of claim 11, wherein the plurality of instructions further cause the computing device to: establish a main processor core description table for the unallocated cores and establish a spare processor core description table for the spare set of processor cores, the main processor core description table including only the unallocated cores, wherein to execute the operating system comprises to expose the operating system only to the processor cores listed in the main processor core description table, and wherein to execute the software application comprises to establish a message passing interface between the software application and at least one of the processor cores listed in the spare processor core description table.

14. The one or more non-transitory machine readable media of claim 11, wherein the plurality of instructions further cause the computing device to: generate a system management interrupt in response a determination that the processor core error was caused by the allocated processor core; and broadcast the system management interrupt only to the allocated processor cores of the spare set of processor cores.

15. The one or more non-transitory machine readable media of claim 11, whe
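The decision flow recited in claims 1, 5, and 6 can be sketched as a small dispatch function. This is an illustrative model only: `handle_core_error`, the `Action` names, and the threshold value of 3 are assumptions. The claims shown here do not say what happens when an error on an unallocated core is not recoverable, so the sketch reports that case as unspecified rather than guessing.

```python
from collections import defaultdict
from enum import Enum, auto


class Action(Enum):
    RESTART_APP_ON_NEW_SPARE = auto()  # claim 1, steps (v) and (vi)
    RETURN_TO_OS_FOR_RETRY = auto()    # claim 5
    SWAP_IN_SPARE_CORE = auto()        # claim 6
    UNSPECIFIED = auto()               # not covered by the claims shown here


ERROR_THRESHOLD = 3  # assumed value; the claims only say "reference threshold"


def handle_core_error(core_id, is_spare, recoverable, counters,
                      threshold=ERROR_THRESHOLD):
    """Pick a response to a core error; `counters` maps core_id to an error count."""
    if is_spare:
        # Error on an allocated (spare-set) core: the application restarts
        # and is given a different spare core.
        return Action.RESTART_APP_ON_NEW_SPARE
    if not recoverable:
        return Action.UNSPECIFIED
    # Recoverable error on an unallocated (main-set) core: count it.
    counters[core_id] += 1
    if counters[core_id] < threshold:
        # Below the threshold: let the OS retry the last instruction.
        return Action.RETURN_TO_OS_FOR_RETRY
    # Counter has reached the threshold: replace the core with a spare.
    return Action.SWAP_IN_SPARE_CORE


counters = defaultdict(int)
results = [handle_core_error(0, False, True, counters) for _ in range(3)]
```

With the assumed threshold of 3, the first two recoverable errors on main core 0 hand control back to the operating system, and the third triggers a core swap.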

Classifications

  • using centralised failover control functionality

  • with more than one idle spare processing component

  • where the redundant components share a common memory address space

  • without idle spare hardware

  • eliminating a faulty processor or activating a spare

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9658930B2 cover?
A method and device for managing hardware errors in a multi-core environment includes allocating processor cores to a main set and a spare set of processor cores. The main set of processor cores are used by an operating system, and the spare set of processor cores are dedicated to software applications. Should a processor core error occur, a processor core swap may be performed to swap a spare …
Who is the assignee on this patent?
Swanson Robert, Bulusu Mallik, Bahnsen Robert B, and 3 more
What technology area does this patent fall under?
Primary CPC classification G06F11/2035. Mapped technology areas include Physics.
When was this patent published?
Publication date May 23, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).