Proactive failure handling in data processing systems

US9594620B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9594620-B2
Application number  US-201615088377-A
Country             US
Kind code           B2
Filing date         Apr 1, 2016
Priority date       Apr 4, 2011
Publication date    Mar 14, 2017
Grant date          Mar 14, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments are directed to predicting the health of a computer node using health report data and to proactively handling failures in computer network nodes. In an embodiment, a computer system monitors various health indicators for multiple nodes in a computer network. The computer system accesses stored health indicators that provide a health history for the computer network nodes. The computer system then generates a health status based on the monitored health factors and the health history. The generated health status indicates the likelihood that the node will be healthy within a specified future time period. The computer system then leverages the generated health status to handle current or predicted failures. The computer system also presents the generated health status to a user or other entity.
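The abstract does not say how the health status is computed from the monitored indicators and the stored history. As a minimal sketch, assuming each health report is a set of pass/fail indicators and the two sources are combined by a simple weighted average, the idea could look like the following. The names `HealthReport`, `pass_rate`, `predict_healthy_probability`, and the `history_weight` parameter are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HealthReport:
    """One monitoring sample for a node (hypothetical structure)."""
    node_id: str
    indicators: Dict[str, bool]  # indicator name -> True if the check passed


def pass_rate(report: HealthReport) -> float:
    """Fraction of indicators that passed in a single report."""
    if not report.indicators:
        return 1.0  # assumption: no indicators means no evidence of trouble
    return sum(report.indicators.values()) / len(report.indicators)


def predict_healthy_probability(
    current: HealthReport,
    history: List[HealthReport],
    history_weight: float = 0.5,
) -> float:
    """Blend the current pass rate with the average historical pass rate.

    This weighted average is an illustrative stand-in for whatever
    prediction the patented system actually uses.
    """
    now = pass_rate(current)
    if not history:
        return now  # nothing stored yet, so rely on the current sample alone
    past = sum(pass_rate(r) for r in history) / len(history)
    return history_weight * past + (1.0 - history_weight) * now


if __name__ == "__main__":
    history = [
        HealthReport("node-1", {"disk": True, "network": True}),
        HealthReport("node-1", {"disk": True, "network": True}),
    ]
    current = HealthReport("node-1", {"disk": True, "network": False})
    print(predict_healthy_probability(current, history))
```

A node with a clean history and one failing indicator out of two in the current report scores between its historical and current pass rates; a real system would likely weight indicators by severity and recency.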

First claim

Opening claim text (preview).

We claim:

1. A computer system for predicting the health of a plurality of data processing systems by using health report data, the computer system comprising one or more processors executing computer executable instructions which cause the computer system to perform the following: monitors one or more health indicators for a plurality of data processing systems; accesses one or more stored health indicators that provide a health history for one or more of the monitored plurality of data processing systems; based on both the monitored health indicators and the stored health history, predicts a health status, wherein the predicted health status indicates for at least one of the monitored plurality of data processing systems a probability that the at least one monitored data processing system will be healthy or unhealthy in the future; and presents the predicted health status to a specified entity.

2. The computer system of claim 1, wherein the computer system further performs the following: makes a determination that the probability that the at least one monitored data processing system will be healthy is below a threshold level; and transfers one or more portions of data stored on the at least one monitored data processing system to one or more other data processing systems in the database system.

3. The computer system of claim 2, wherein the computer system further performs the following: prevents the at least one monitored data processing system from storing new data.

4. The computer system of claim 2, wherein making the determination that the probability that the at least one monitored data processing system will be healthy is below a threshold level comprises determining that the at least one monitored data processing system is in a critical state.

5. The computer system of claim 2, wherein making the determination that the probability that the at least one monitored data processing system will be healthy is below a threshold level comprises determining that the at least one monitored data processing system has experienced one or more failures within a specified time period.

6. The computer system of claim 5, wherein the computer system further performs the following: assigns an error level to each of the failures.

7. The computer system of claim 6, wherein the computer system further performs the following: determines that a threshold number of the failures are beyond a specified error level, such that the at least one monitored data processing system is blacklisted.

8. The computer system of claim 7, wherein monitored data processing systems that have a threshold number of the failures beyond a specified error level are blacklisted, regardless of a monitored data processing system's health history.

9. The computer system of claim 7, wherein blacklisted data processing systems are put on probation for a specified amount of time to determine whether errors occur during probation.

10. The computer system of claim 9, wherein the computer system further performs the following: upon determining for a monitored data processing system that was blacklisted that the probationary period is complete and that no further errors have occurred, allowing the monitored data processing system that was blacklisted to continue storing new data and removing the monitored data processing system from the blacklist.

11. The computer system of claim 9, wherein the computer system further performs the following: upon determining for a monitored data processing system that was blacklisted that the probationary period is complete and that one or more further errors have occurred, preventing the monitored data processing system that was blacklisted from storing new data.

12. The computer system of claim 11, wherein the computer system further performs the following: relocates data portions that are hosted on the monitored data processing system that is prevented from storing new data.

13. A computer-implemented method for proactively handling failures in a database system comprising a plurality of data processing systems each of which corresponds to a node of the database system, the computer-implemented method being performed by one or more processors executing computer executable instructions for the computer-implemented method, and the computer-implemented method comprising: monitoring one or more health indicators for a plurality of data processing systems each of which corresponds to a node of a database system; accessing at the first data processing system one or more stored health indicators that provide a health history for the one or more of the monitored data processing systems; predicting a health status based on the monitored factors indicators and the health history, wherein the predicted health status indicates the probability that the one or more monitored data processing systems will be healthy or unhealthy in the future; determining, for at least one of the monitored data processing systems, that a threshold number of failures have occurred that are beyond a specified error level; based on the determination, blacklisting the monitored data processing system for which the determination was made; transferring one or more portions of data stored from the monitored data processing system that is blacklisted to one or more of other data processing systems; and preventing the data processing system that was blacklisted from storing new data.

14. The computer-implemented method of claim 13, wherein the data processing system that is blacklisted is categorized as up and blacklisted, such that the blacklisted data processing system remains used for storing data while the data is transferred to other data processing systems, and no new data is stored on the data processing system that is up and blacklisted.

15. The computer-implemented method of claim 13, wherein the data processing system that is blacklisted is categorized as down and blacklisted, such that the blacklisted data processing system is no longer used for storing data, data is transferred from the blacklisted data processing system to other data processing systems, and no new data is stored on the data processing system this is down and blacklisted.

16. The computer-implemented method of claim 15, wherein the data is transferred without waiting for a probationary period.

17. A database system comprising: a plurality of data processing systems; one or more computer-readable storage hardware media, excluding transmission media, and having stored thereon computer-executable instructions that, when executed by one or more processors, cause the database system to be configured with an architecture that proactively handles failures in the plurality of data processing systems by using health report data, and wherein the architecture is configured to perform the following: monitor one or more health indicators for a plurality of data processing systems of the database system; access one or more stored health indicators that provide a health history for one or more of the monitored plurality of data processing systems; based on both the monitored health indicators and the stored health history, predict a health status, wherein the predicted health status indicates for at least one of the monitored plurality of data processing systems a probability that the at least one monitored data processing system will be healthy or unhealthy in the future; and present the predicted health status to a specified entity.

18. The database system of claim 17, wherein the architecture is fur
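The blacklisting and probation behavior described in the claims (error levels assigned to failures, a threshold number of failures beyond a specified level, "up and blacklisted" versus "down and blacklisted", and reinstatement after an error-free probation) can be sketched as a small state machine. This is an illustrative reading under stated assumptions, not the patented implementation: the names `NodeState`, `Node`, `evaluate`, and `end_probation`, the use of numeric timestamps and integer error levels, the interpretation of "beyond" as strictly greater than, and the choice to clear failure records on reinstatement are all hypothetical. Data relocation is left out.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class NodeState(Enum):
    ACTIVE = "active"
    UP_BLACKLISTED = "up and blacklisted"      # still serving while data moves off
    DOWN_BLACKLISTED = "down and blacklisted"  # no longer used for storing data


@dataclass
class Node:
    node_id: str
    state: NodeState = NodeState.ACTIVE
    # Each failure is (timestamp, error_level); both are assumed numeric.
    failures: List[Tuple[float, int]] = field(default_factory=list)
    blacklisted_at: Optional[float] = None

    @property
    def accepts_new_data(self) -> bool:
        """Blacklisted nodes never receive new data."""
        return self.state is NodeState.ACTIVE


def record_failure(node: Node, timestamp: float, error_level: int) -> None:
    """Store a failure together with its assigned error level."""
    node.failures.append((timestamp, error_level))


def evaluate(node: Node, now: float, *, window: float, error_level: int,
             threshold: int, reachable: bool = True) -> NodeState:
    """Blacklist an active node once enough severe failures fall in the window."""
    severe = [t for t, lvl in node.failures
              if now - t <= window and lvl > error_level]
    if node.state is NodeState.ACTIVE and len(severe) >= threshold:
        node.state = (NodeState.UP_BLACKLISTED if reachable
                      else NodeState.DOWN_BLACKLISTED)
        node.blacklisted_at = now
    return node.state


def end_probation(node: Node, now: float, probation: float) -> bool:
    """Reinstate the node if probation is complete and error-free.

    Returns True when the node is removed from the blacklist.
    """
    if node.blacklisted_at is None or now - node.blacklisted_at < probation:
        return False  # not blacklisted, or probation still running
    if any(t > node.blacklisted_at for t, _ in node.failures):
        return False  # further errors occurred: keep blocking new data
    node.state = NodeState.ACTIVE
    node.blacklisted_at = None
    node.failures.clear()  # assumption: start fresh after reinstatement
    return True
```

For example, a node that records two level-3 failures within a 60-second window, evaluated with `error_level=2` and `threshold=2`, moves to `UP_BLACKLISTED` and stops accepting new data; if no further failures arrive before the probation period ends, `end_probation` returns it to `ACTIVE`.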

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

  • G06F11/004Primary

    Error avoidance (G06F11/07 and subgroups take precedence) · CPC title

  • Monitoring arrangements determined by the means or processing involved in reporting the monitored data (error or fault reporting or logging G06F11/0766) · CPC title

  • where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems (multiprogramming arrangements G06F9/46; allocation of resources G06F9/50) · CPC title

  • related to network devices · CPC title

  • Display of status information · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9594620B2 cover?
Embodiments are directed to predicting the health of a computer node using health report data and to proactively handling failures in computer network nodes. In an embodiment, a computer system monitors various health indicators for multiple nodes in a computer network. The computer system accesses stored health indicators that provide a health history for the computer network nodes. The comput…
Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G06F11/004. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 14, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).