System fault detection and processing method, device, and computer readable storage medium

US9720761B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9720761-B2
Application number  US-201414781403-A
Country             US
Kind code           B2
Filing date         Jan 6, 2014
Priority date       Apr 1, 2013
Publication date    Aug 1, 2017
Grant date          Aug 1, 2017

Abstract

Official abstract text for this publication.

Disclosed are a method, a device, and a computer readable storage medium for detecting and processing a system fault. The method includes: an interrupt service routine sending a first stage kicking dog signal, and receiving a second stage kicking dog signal for a system detection task (S101); and when task dead loop or task abnormity is detected, performing system abnormity processing according to a preset processing policy, wherein when the interrupt service routine fails to receive the second stage kicking dog signal within a set period of time, the interrupt service routine stops sending the first stage kicking dog signal, and the system reboots (S102).
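
The two-stage arrangement in the abstract can be illustrated with a small tick-driven simulation. This is only a sketch under stated assumptions, not the patented implementation: the class name TwoStageWatchdog, its fields, and the idea of counting timeouts in timer ticks are all hypothetical, and the hardware watchdog is modeled as a simple counter.

```python
from dataclasses import dataclass


@dataclass
class TwoStageWatchdog:
    """Tick-driven model of a two-stage watchdog (all names are hypothetical)."""

    second_stage_timeout: int  # ticks the ISR tolerates without a task kick
    hardware_timeout: int      # ticks the hardware watchdog tolerates without an ISR kick
    ticks_since_task_kick: int = 0
    ticks_since_isr_kick: int = 0
    rebooted: bool = False

    def task_kick(self) -> None:
        """Second-stage kick: the system detection task reports that it is alive."""
        self.ticks_since_task_kick = 0

    def isr_tick(self) -> None:
        """One timer interrupt: send the first-stage kick only while the task is alive."""
        if self.rebooted:
            return
        self.ticks_since_task_kick += 1
        if self.ticks_since_task_kick <= self.second_stage_timeout:
            # Task kicked recently enough: ISR sends the first-stage kick.
            self.ticks_since_isr_kick = 0
        else:
            # No second-stage kick within the set period: ISR stops kicking.
            self.ticks_since_isr_kick += 1
        if self.ticks_since_isr_kick >= self.hardware_timeout:
            # Modeled hardware watchdog expires and the system reboots.
            self.rebooted = True
```

Calling task_kick() before each isr_tick() keeps rebooted false. Once the task stops kicking, the ISR withholds its own kick after second_stage_timeout ticks, and the modeled hardware watchdog expires hardware_timeout ticks after that.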

First claim

Opening claim text (preview).

What is claimed is:

1. A method for detecting and processing a system fault, comprising: sending, by an interrupt service routine, a first stage kicking dog signal, and receiving a second stage kicking dog signal for a system detection task; and when task dead loop or task abnormity is detected, performing system abnormity processing according to a preset processing policy, wherein when the interrupt service routine fails to receive the second stage kicking dog signal within a set period of time, the interrupt service routine stops sending the first stage kicking dog signal, and the system reboots.

2. The method according to claim 1, further comprising: when operating system breakdown or hardware abnormity occurs in the system, automatically rebooting, by the system, for recovery.

3. The method according to claim 1, wherein when an interrupt exceeds a set threshold, a task with a higher priority over the system detection task is busy, system abnormity occurs during startup of the system or the system detection task suspends due to abnormity in itself, the interrupt service routine fails to receive the second stage kicking dog signal.

4. The method according to claim 3, wherein the detection of the task dead loop comprises: timing, by the system detection task, a second stage software kicking dog, and timing, by a low priority dead loop auxiliary task, a keep-alive maintaining for the dead loop; calculating CPU occupation ratio periodically; determining whether the calculated CPU occupation ratio is higher than a CPU dead loop determining threshold; in the case that the calculated CPU occupation ratio is not higher than the CPU dead loop determining threshold, determining that no task dead loop occurs; in the case that the calculated CPU occupation ratio is higher than the CPU dead loop determining threshold, determining whether the low priority dead loop auxiliary task is set to be keep-alive; in the case that the low priority dead loop auxiliary task is set to be keep-alive, determining that no task dead loop occurs; otherwise, alarming and notifying maintenance personnel for analysis; and determining whether the system detection task handled only one message during a sample detection period; in the case that the system detection task handled more than one message during the sample detection period, alarming and notifying maintenance personnel for analysis; otherwise, determining that the task dead loop occurs.

5. The method according to claim 1, wherein the detection of the task abnormity comprises: detecting working state of all tasks periodically; and performing the task abnormity detection according to the detected working state in conjunction with a pre-configured task abnormity determination policy.

6. The method according to claim 2, wherein when an interrupt exceeds a set threshold, a task with a higher priority over the system detection task is busy, system abnormity occurs during startup of the system or the system detection task suspends due to abnormity in itself, the interrupt service routine fails to receive the second stage kicking dog signal.

7. The method according to claim 2, wherein detecting the task abnormity comprises: detecting working state of all tasks periodically; and performing the task abnormity detection according to the detected working state in conjunction with a pre-configured task abnormity determination policy.

8. The method according to claim 4, wherein detecting the task abnormity comprises: detecting working state of all tasks periodically; and performing the task abnormity detection according to the detected working state in conjunction with a pre-configured task abnormity determination policy.

9. A device for detecting and processing a system fault, comprising: a signal processing module configured to cause an interrupt service routine to send a first stage kicking dog signal, and receive a second stage kicking dog signal for a system detection task; and an abnormity processing module configured to, when task dead loop or task abnormity is detected, perform system abnormity processing according to a preset processing policy, wherein, when the interrupt service routine fails to receive the second stage kicking dog signal within a set period of time, the abnormity processing module causes the interrupt service routine to stop sending the first stage kicking dog signal and causes the system to reboot.

10. The device according to claim 9, further comprising: an automatic reboot module configured to, when operating system breakdown or hardware abnormity occurs in the system, reboot the system automatically for recovery.

11. The device according to claim 9, wherein when an interrupt exceeds a set threshold, a task with a higher priority over the system detection task is busy, system abnormity occurs during startup of the system or the system detection task suspends due to abnormity in itself, the interrupt service routine fails to receive the second stage kicking dog signal.

12. The device according to claim 11, further comprising: a CPU occupation ratio calculation module configured to, when the system detection task times a second stage software kicking dog, and a low priority dead loop auxiliary task times keep-alive maintaining for the dead loop, calculate CPU occupation ratio periodically; and a task dead loop detection module configured to, in the case that the system detection task determines that the calculated CPU occupation ratio is not higher than a CPU dead loop determining threshold, determine that no task dead loop occurs; in the case that the system detection task determines that the calculated CPU occupation ratio is higher than the CPU dead loop determining threshold, determine whether the low priority dead loop auxiliary task is set to be keep-alive; in the case that the low priority dead loop auxiliary task is set to be keep-alive, determine that no task dead loop occurs; otherwise, alarm and notify maintenance personnel for analysis; and wherein the task dead loop detection module is further configured to determine whether the system detection task handled only one message during a sample detection period; in the case that the system detection task handled more than one message during the sample detection period, alarm and notify maintenance personnel for analysis; otherwise, determine that the task dead loop occurs.

13. The device according to claim 9, further comprising: a task working state detection module configured to detect working state of all tasks periodically; and a task abnormity detection module configured to perform the task abnormity detection according to the detected working state in conjunction with a pre-configured task abnormity determination policy.

14. The device according to claim 10, wherein when an interrupt exceeds a set threshold, a task with a higher priority over the system detection task is busy, system abnormity occurs during startup of the system or the system detection task suspends due to abnormity in itself, the interrupt service routine fails to receive the second stage kicking dog signal.

15. The device according to claim 10, further comprising: a task working state detection module configured to detect working state of all tasks periodically; and a task abnormity detection module configured to perform the task abnormity detection according to the detected working state in conjunction with a pre-configured task abnormity determination policy.

16. The device according to claim 12, further comprising: a task working state detection module configured to detect working s
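
The dead-loop decision recited in claims 4 and 12 can be read as a short decision chain. The sketch below is one possible reading, not the patentee's code: the names Verdict and classify_sample and all parameter names are hypothetical. The claim wording mentions an alarm in the branch where the auxiliary task is not kept alive and then a further message-count check; this sketch assumes those collapse into a single verdict per sample period.

```python
from enum import Enum


class Verdict(Enum):
    NO_DEAD_LOOP = "no dead loop"
    ALARM = "alarm and notify maintenance personnel"
    DEAD_LOOP = "dead loop"


def classify_sample(cpu_ratio: float,
                    threshold: float,
                    aux_task_alive: bool,
                    messages_handled: int) -> Verdict:
    """Classify one sample detection period (hypothetical helper).

    cpu_ratio        -- CPU occupation ratio calculated for the period
    threshold        -- CPU dead loop determining threshold
    aux_task_alive   -- whether the low priority auxiliary task set its keep-alive
    messages_handled -- messages handled by the task during the period
    """
    if cpu_ratio <= threshold:
        # Ratio not higher than the threshold: no dead loop.
        return Verdict.NO_DEAD_LOOP
    if aux_task_alive:
        # The low priority task still got CPU time, so nothing is spinning forever.
        return Verdict.NO_DEAD_LOOP
    if messages_handled > 1:
        # Busy but still making progress: alarm for analysis, not a dead loop.
        return Verdict.ALARM
    # High CPU, starved low priority task, at most one message: treat as dead loop.
    return Verdict.DEAD_LOOP
```

For example, classify_sample(0.98, 0.90, False, 1) falls through every early return and yields Verdict.DEAD_LOOP, while the same sample with aux_task_alive=True yields Verdict.NO_DEAD_LOOP.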

Assignees

Inventors

Classifications

  • in a system implementing multitasking (multitasking per se G06F9/46)

  • Error or fault detection not based on redundancy (power supply failures G06F1/30; network fault management H04L41/06)

  • by exceeding a time limit, i.e. time-out, e.g. watchdogs

  • Active fault masking without idle spares

  • G06F11/079 · Primary

    Root cause analysis, i.e. error or fault diagnosis (in a hardware test environment G06F11/22; in a software test environment G06F11/36)

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
ZTE Corp
What technology area does this patent fall under?
Primary CPC classification G06F11/0715. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 1, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).