Second failure data capture in co-operating multi-image systems

US9852051B2 · US · B2

Patent metadata
Publication number: US-9852051-B2
Application number: US-201615068832-A
Country: US
Kind code: B2
Filing date: Mar 14, 2016
Priority date: Aug 8, 2012
Publication date: Dec 26, 2017
Grant date: Dec 26, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer system and computer program captures diagnostic trace information in a computer system having a plurality of software images. Information is received that is associated with a first failure in a first one of the plurality of software images. The received information is distributed to others of the plurality of software images. Further information is captured that is associated with a second failure in another one of the plurality of software images. The information associated with a first failure in a first one of said plurality of software images is combined with the information associated with a second failure in another of said plurality of software images, and the combined information is analyzed in order to determine a cause of the first failure.
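The abstract describes a five-step flow: receive information about a first failure, distribute it to the other images, capture information about a second failure in another image, combine the two, and analyze the combination. The Python sketch below models that flow in memory under loudly stated assumptions. `Image`, `FailureRecord`, `distribute`, `combine`, and `analyze` are hypothetical names, a failure record is assumed to be a map from component name to trace lines, and "analysis" is reduced to flagging components that show up in failures on more than one image. It illustrates the idea only and is not the patented implementation.

```python
from dataclasses import dataclass, field


@dataclass
class FailureRecord:
    """Hypothetical failure record: the image that failed and trace lines per component."""
    image: str
    traces: dict[str, list[str]]


@dataclass
class Image:
    """Hypothetical stand-in for one software image (for example, one server instance)."""
    name: str
    received: list[FailureRecord] = field(default_factory=list)

    def receive(self, record: FailureRecord) -> None:
        # Keep first-failure information sent by a peer so later capture can use it.
        self.received.append(record)


def distribute(first: FailureRecord, images: list[Image]) -> list[Image]:
    """Send the first-failure record to every image other than the one that failed."""
    peers = [img for img in images if img.name != first.image]
    for img in peers:
        img.receive(first)
    return peers


def combine(first: FailureRecord, second: FailureRecord) -> dict[str, list[tuple[str, str]]]:
    """Merge both records per component, tagging each trace line with its source image."""
    merged: dict[str, list[tuple[str, str]]] = {}
    for rec in (first, second):
        for component, lines in rec.traces.items():
            merged.setdefault(component, []).extend((rec.image, ln) for ln in lines)
    return merged


def analyze(combined: dict[str, list[tuple[str, str]]]) -> list[str]:
    """Toy analysis (an assumption): components traced on more than one image are suspects."""
    suspects = []
    for component, entries in combined.items():
        sources = {image for image, _ in entries}
        if len(sources) > 1:
            suspects.append(component)
    return sorted(suspects)


# Usage: image "a" fails first, image "b" fails later in a shared component.
images = [Image("a"), Image("b"), Image("c")]
first = FailureRecord("a", {"db_pool": ["timeout acquiring connection"]})
peers = distribute(first, images)  # images "b" and "c" now hold the record
second = FailureRecord("b", {"db_pool": ["pool exhausted"], "http": ["503 returned"]})
print(analyze(combine(first, second)))  # ['db_pool']
```

Tagging each line with its source image is what lets the combined view distinguish "the same component failed in two places" from "two lines came from one image", which is the kind of correlation the abstract relies on.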

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer system comprising: a processor; and logic executing on the processor that enables the processor to: check whether one or more of a plurality of software images is executing a same software as a first software image of the plurality of software images; capture a first trace diagnostic information associated with a first failure in the first software image within a log file; distribute the first trace diagnostic information from the log file to others of the plurality of software images; configure, based on the first trace diagnostic information, the others of the plurality of software images to capture a second trace diagnostic information associated with a second failure in another image of the plurality of software images; determine whether a same software component has failed in the first software image and the another one of the plurality of software images; in response to determining the same software component has failed in the first software image and the another one of the plurality of software images, capture a detailed trace diagnostic information for the software component in the another one of the plurality of software images; combine the first trace diagnostic information associated with the first failure with the second trace diagnostic information associated with the second failure; analyze the combined trace diagnostic information determine a cause of the first failure; and identify one or more actions to prevent further failures based on the cause of the first failure.

2. The computer system of claim 1, wherein: each of the software images further comprises a plurality of processes or threads; the first failure is associated with a first one of the plurality of processes or threads; the distributed information is distributed to others of the plurality of processes or threads; and the trace diagnostic information associated with the second failure is associated with another one of the plurality of processes or threads.

3. The computer system of claim 1, further comprising at least one of a load balancer, a hypervisor, an operating system, monitoring software, and a peer-to-peer communication mechanism, which distributes the first trace diagnostic information from the log file; wherein the logic for distributing the first trace diagnostic information from the log file to others of the plurality of software images further comprises logic that enables the processor to: distribute a first portion of the first trace diagnostic information from the log file to a first at least one software image of the plurality of software images and distribute a second portion of the first trace diagnostic information from the log file to a second at least one software image of the plurality of software images.

4. The computer system of claim 1, the logic further comprising logic that when executed by the processor enables the processor to: configure, based on the first information associated with the first failure, the others of the plurality of software images to capture an increased level of trace diagnostic information responsive to a failure.

5. The computer system of claim 4, wherein the increased level of trace diagnostic information is captured by the others of the plurality of software images for failures occurring within a predetermined time period, the logic further comprising logic that when executed by the processor that enables the processor to: in response to the predetermined time period expiring, revert a level of trace diagnostic information that is captured by the others of the plurality of software images to a second predetermined level.

6. The computer system of claim 4, wherein the increased level of trace diagnostic information is captured by the others of the plurality of software images for failures occurring within a predetermined time period, the logic further comprising logic that when executed by the processor that enables the processor to: in response to the predetermined time period expiring, revert the increased level of trace diagnostic information that is captured by the others of the plurality of software images to a level of trace diagnostic information established prior to the first failure.

7. The computer system of claim 4, the logic further comprising logic that, when executed by the processor, enables the processor to: determining whether a predetermined amount of trace diagnostic information has been captured; and in response to determining the predetermined amount of trace diagnostic information has been captured, revert the level of trace diagnostic information that is captured by others of the plurality of software images responsive to a failure to a level of trace diagnostic information established prior to the first failure.

8. The computer system of claim 1, the logic further comprising logic that, when executed by the processor, enables the processor to: in response to starting at least one of the plurality of software images after a failure, increase a level of trace diagnostic information that is captured for the at least one of the plurality of software images responsive to a subsequent failure.

9. The computer system of claim 1, the logic further comprising logic that when executed by the processor enables the processor to: load balance the capturing of the second trace diagnostic information across the plurality of software images, wherein each one of the plurality of software images captures at least one of: trace diagnostic information for a particular one or more parts of a software stack and a particular one or more parts of a particular subset of the detailed trace diagnostic information.

10. A non-transitory computer-readable storage device encoded with a computer-readable program for capturing trace diagnostic information, the computer-readable program having code that when executed by a processor in a computer system, enables the processor to: check whether one or more of a plurality of software images is executing a same software as a first software image of the plurality of software images; capture a first trace diagnostic information associated with a first failure in the first software image within a log file; distribute the first trace diagnostic information from the log file to others of the plurality of software images; configure the others of the plurality of software images to capture a second trace diagnostic information associated with a second failure in another image of the plurality of software images; determine whether a same software component has failed in the first software image and the another one of the plurality of software images; in response to determining the same software component has failed in the first software image and the another one of the plurality of software images, capture a detailed trace diagnostic information for the software component in the another one of the plurality of software images; combine the first trace diagnostic information associated with the first failure with the second trace diagnostic information associated with the second failure; analyze the combined trace diagnostic information determine a cause of the first failure; and identify one or more actions to prevent further failures based on the cause of the first failure.

11. The non-transitory computer-readable storage device of claim 10, wherein: each of the software images further comprises a plurality of processes or threads; the first failure is associated with a first one of the plurality of processes or threads; the distributed trace diagnostic information is distributed to others of the plurality of processes or threads; and t…
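Claims 4 to 7 describe a temporary escalation: once first-failure information arrives, the other images capture more trace detail, then revert to the level in force before the first failure when a predetermined time period expires (claim 6) or a predetermined amount of trace data has been captured (claim 7). The Python sketch below models that policy for a single peer image. `TraceLevel`, `TracePolicy`, `window_s`, and `budget_bytes` are hypothetical names, and the time and byte limits are assumed stand-ins for the claims' unspecified "predetermined" values; timestamps are passed in explicitly so the behaviour is easy to follow.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Optional


class TraceLevel(IntEnum):
    """Hypothetical trace levels; a higher value captures more detail."""
    NORMAL = 1
    DETAILED = 2


@dataclass
class TracePolicy:
    """Trace level for one peer image, boosted after another image fails.

    The boost ends when the time window expires (claims 5-6) or when the
    capture budget is used up (claim 7), whichever comes first. window_s and
    budget_bytes are assumed stand-ins for the claims' 'predetermined' limits.
    """
    baseline: TraceLevel   # level in force before the first failure
    window_s: float        # assumed length of the boost window, in seconds
    budget_bytes: int      # assumed cap on trace data captured while boosted
    _boost_until: Optional[float] = field(default=None, init=False)
    _captured: int = field(default=0, init=False)

    def escalate(self, now: float) -> None:
        # Called when first-failure information arrives from a peer (claim 4).
        self._boost_until = now + self.window_s
        self._captured = 0

    def record_capture(self, nbytes: int) -> None:
        # Count trace data captured while the boost is active; ignored otherwise.
        if self._boost_until is not None:
            self._captured += nbytes

    def level(self, now: float) -> TraceLevel:
        # Revert to the pre-failure baseline once either limit is reached.
        if self._boost_until is None:
            return self.baseline
        if now >= self._boost_until or self._captured >= self.budget_bytes:
            self._boost_until = None
            return self.baseline
        return max(self.baseline, TraceLevel.DETAILED)


# Usage: a peer fails at t=0; this image traces in detail for up to 300 s or 1 MB.
policy = TracePolicy(TraceLevel.NORMAL, window_s=300.0, budget_bytes=1_000_000)
policy.escalate(now=0.0)
print(policy.level(now=10.0).name)   # DETAILED
print(policy.level(now=301.0).name)  # NORMAL (window expired)
```

Claim 5 instead reverts to a "second predetermined level"; giving the policy a separate revert level in place of `baseline` would model that variant.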

Assignees

IBM

Classifications

  • Root cause analysis, i.e. error or fault diagnosis (in a hardware test environment G06F11/22; in a software test environment G06F11/36) · CPC title

  • Performance evaluation by tracing or monitoring · CPC title

  • by tracing the execution of the program · CPC title

  • Error or fault detection not based on redundancy (power supply failures G06F1/30; network fault management H04L41/06) · CPC title

  • in a system implementing multitasking (multitasking per se G06F9/46) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9852051B2 cover?
A computer system and computer program captures diagnostic trace information in a computer system having a plurality of software images. Information is received that is associated with a first failure in a first one of the plurality of software images. The received information is distributed to others of the plurality of software images. Further information is captured that is associated with a…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F11/3466. Mapped technology areas include Physics.
When was this patent published?
Published Dec 26, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).