Recording activity of software threads in a concurrent software environment

US9448895B2 · US · B2

Patent metadata
Publication number: US-9448895-B2
Application number: US-201214110733-A
Country: US
Kind code: B2
Filing date: Apr 16, 2012
Priority date: Apr 21, 2011
Publication date: Sep 20, 2016
Grant date: Sep 20, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A technique for failure monitoring and recovery of a first application executing on a first virtual machine includes storing machine state information during execution of the first virtual machine at predetermined checkpoints. An error message that includes an application error state at a failure point of the first application is received, by a hypervisor, from the first application. The first virtual machine is stopped in response to the error message. The hypervisor creates a second virtual machine and a second application from the stored machine state information that are copies of the first virtual machine and the first application. The second virtual machine and the second application are configured to execute from a checkpoint preceding the failure point. In response to receipt of a failure interrupt by the second application, one or more recovery processes are initiated in an attempt to avert the failure point.
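The sequence in the abstract (checkpoint, fail, restore a copy, interrupt, recover) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the patented implementation: `ToyVM`, `ToyHypervisor`, `AppFailure`, the cache-overflow failure condition, and the checkpoint interval are all hypothetical names and values chosen for illustration, and a real hypervisor would snapshot registers and memory rather than deep-copy a Python object.

```python
import copy
from dataclasses import dataclass, field


class AppFailure(Exception):
    """Error message carrying the application error state at the failure point."""

    def __init__(self, step, error_state):
        super().__init__(f"failure at step {step}")
        self.step = step
        self.error_state = error_state


@dataclass
class ToyVM:
    """Hypothetical stand-in for a virtual machine and the application on it."""
    step: int = 0
    cache: list = field(default_factory=list)
    interrupted: bool = False

    def on_failure_interrupt(self):
        # Recovery process (assumed): flush the cache before the failure point.
        self.cache.clear()
        self.interrupted = True

    def run_step(self):
        self.step += 1
        self.cache.append(self.step)
        if len(self.cache) > 5:  # toy failure condition, purely illustrative
            raise AppFailure(self.step, {"cache_len": len(self.cache)})


class ToyHypervisor:
    def __init__(self, checkpoint_every=2):
        self.checkpoint_every = checkpoint_every
        self.checkpoints = []  # (step, saved machine state)

    def run(self, vm, total_steps, interrupt_at=None):
        while vm.step < total_steps:
            # Store machine state at predetermined checkpoints.
            if vm.step % self.checkpoint_every == 0:
                self.checkpoints.append((vm.step, copy.deepcopy(vm)))
            # Send the failure interrupt just before the known failure point.
            if interrupt_at is not None and vm.step + 1 == interrupt_at:
                vm.on_failure_interrupt()
            vm.run_step()
        return vm

    def restore_before(self, failure_step):
        # Create the second VM from the latest checkpoint preceding the failure.
        candidates = [c for c in self.checkpoints if c[0] < failure_step]
        step, saved = max(candidates, key=lambda c: c[0])
        return step, copy.deepcopy(saved)


def demo(total_steps=8):
    hypervisor = ToyHypervisor(checkpoint_every=2)
    first_vm = ToyVM()
    try:
        hypervisor.run(first_vm, total_steps)
        return None  # no failure occurred
    except AppFailure as failure:
        # The first VM is abandoned (stopped) and a copy is replayed.
        checkpoint_step, second_vm = hypervisor.restore_before(failure.step)
        hypervisor.run(second_vm, total_steps, interrupt_at=failure.step)
        return failure.step, checkpoint_step, second_vm
```

Calling `demo()` runs the first machine until it fails, restores the copy from the preceding checkpoint, and replays it with the interrupt delivered one step before the recorded failure point, so the copy runs to completion.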

First claim

Opening claim text (preview).

What is claimed is:
1. A data processing system, comprising: one or more storage devices; and at least one processor coupled to the one or more storage devices, wherein the processor is configured to: execute a first application on a first virtual machine; store machine state information during execution of the first virtual machine at predetermined checkpoints; receive an error message that includes an application error state at a failure point of the first application; stop the first virtual machine in response to the error message; create a copy of the first virtual machine and the first application from the stored machine state information, wherein the virtual machine copy corresponds to a second virtual machine and the application copy corresponds to a second application, and wherein the second virtual machine and the second application are configured to execute from a checkpoint preceding the failure point; send a failure interrupt to the second application before the failure point is reached; and initiate, in response to receipt of the failure interrupt by the second application, one or more recovery processes in an attempt to avert the failure point during execution of the second application.
2. The data processing system of claim 1, wherein the recovery processes include one or more of cancelling one or more transactions, cancelling one or more threads, flushing one or more caches, discarding one or more data structures, and selecting a safe operating mode.
3. The data processing system of claim 1, wherein, in response to a failure of the second application coinciding with that of the first application, subsequent recovery processes for the failure are adapted from the one or more recovery processes to increase a probability of avoiding the failure point.
4. The data processing system of claim 3, wherein the adapted recovery processes include a recovery process with increased impact.
5. The data processing system of claim 3, wherein the adapted recovery processes include a new recovery process and a new failure causes recovery processes to be selected based on a type of the new failure.
6. The data processing system of claim 1, wherein the state information includes all virtual machine registers and memory for the first virtual machine.
7. A computer program product, comprising: a non-transitory computer-readable storage memory; and code stored on the computer-readable storage memory, wherein the code, when executed by a data processing system, causes the data processing system to: execute a first application on a first virtual machine; store machine state information during execution of the first virtual machine at predetermined checkpoints; receive an error message that includes an application error state at a failure point of the first application; stop the first virtual machine in response to the error message; create a copy of the first virtual machine and the first application from the stored machine state information, wherein the virtual machine copy corresponds to a second virtual machine and the application copy corresponds to a second application, and wherein the second virtual machine and the second application are configured to execute from a checkpoint preceding the failure point; send a failure interrupt to the second application before the failure point is reached; and initiate, in response to receipt of the failure interrupt by the second application, one or more recovery processes in an attempt to avert the failure point during execution of the second application.
8. The computer program product of claim 7, wherein the recovery processes include one or more of cancelling one or more transactions, cancelling one or more threads, flushing one or more caches, discarding one or more data structures, and selecting a safe operating mode.
9. The computer program product of claim 7, wherein, in response to a failure of the second application coinciding with that of the first application, subsequent recovery processes for the failure are adapted from the one or more recovery processes to increase a probability of avoiding the failure point.
10. The computer program product of claim 9, wherein the adapted recovery processes include a new recovery process and a new failure causes recovery processes to be selected based on a type of the new failure.
11. The computer program product of claim 7, wherein the state information includes all virtual machine registers and memory for the first virtual machine.
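The dependent claims describe adapting the recovery processes when a replayed copy fails at the same point as the original, including moving to a process with increased impact. A minimal sketch of that escalation loop follows. The ordering in `RECOVERY_LADDER` is an assumption (the claims name these processes but do not rank them), and `replay` is a hypothetical callback standing in for "restore from the checkpoint, apply this recovery process, and report whether the failure point was averted".

```python
# Assumed ordering from lowest to highest impact; not specified by the claims.
RECOVERY_LADDER = [
    "flush caches",
    "discard data structures",
    "cancel transactions",
    "cancel threads",
    "select safe operating mode",
]


def next_level(level):
    """Move to a recovery process with increased impact, capped at the top."""
    return min(level + 1, len(RECOVERY_LADDER) - 1)


def replay_with_escalation(replay, max_attempts=len(RECOVERY_LADDER)):
    """Replay from the checkpoint, escalating after each coinciding failure.

    `replay(process)` is a hypothetical callable that returns True when the
    replayed copy gets past the failure point using the given process.
    Returns (succeeded, list of processes tried in order).
    """
    level = 0
    tried = []
    for _ in range(max_attempts):
        process = RECOVERY_LADDER[level]
        tried.append(process)
        if replay(process):
            return True, tried
        # The copy failed at the same point: adapt by increasing impact.
        level = next_level(level)
    return False, tried
```

For example, `replay_with_escalation(lambda p: p == "cancel transactions")` tries the two lower-impact processes first and succeeds on the third attempt. Selecting processes by failure type, as in the claims covering a new failure, could be layered on top by keeping one ladder per failure type.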

Assignees

Inventors

Classifications

  • Restarting or rejuvenating

  • involving virtual machines

  • using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements

  • Monitoring or debugging support

  • Using snapshots, i.e. a logical point-in-time copy of the data

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9448895B2 cover?
A technique for failure monitoring and recovery of a first application executing on a first virtual machine includes storing machine state information during execution of the first virtual machine at predetermined checkpoints. An error message that includes an application error state at a failure point of the first application is received, by a hypervisor, from the first application. The first …
Who is the assignee on this patent?
North Geraint, IBM
What technology area does this patent fall under?
Primary CPC classification G06F11/1438. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 20, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).