Implementing enhanced error handling of a shared adapter in a virtualized system

US9304849B2 · US · B2

Patent metadata
Publication number: US-9304849-B2
Application number: US-201313915943-A
Country: US
Kind code: B2
Filing date: Jun 12, 2013
Priority date: Jun 12, 2013
Publication date: Apr 5, 2016
Grant date: Apr 5, 2016

Abstract

A method, system and computer program product are provided for implementing enhanced error handling for a hardware I/O adapter, such as a Single Root Input/Output Virtualization (SRIOV) adapter, in a virtualized system. The hardware I/O adapter is partitioned into multiple endpoints, with each Partitionable Endpoint (PE) corresponding to a function, and there is an adapter PE associated with the entire adapter. The endpoints are managed both independently for actions limited in scope to a single function, and as a group for actions with the scope of the adapter. An error or failure of the adapter PE freezes the adapter PE and propagates to the VF PEs associated with the adapter, causing the VF PEs to be frozen. An adapter driver and VF device drivers are informed of the error, and start recovery. The hypervisor locks out the VF device drivers at key points enabling adapter recovery to successfully complete.
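The two scopes described in the abstract can be pictured with a small state model. The following is a minimal sketch, not the patented implementation: the names `PE`, `Adapter`, `fail_vf`, and `fail_adapter` are hypothetical, and the assumption that a single-function error freezes only that function's PE is an illustration of "actions limited in scope to a single function", not something the abstract states in those words.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class PEState(Enum):
    NORMAL = "normal"
    FROZEN = "frozen"


@dataclass
class PE:
    """One Partitionable Endpoint (hypothetical model)."""
    name: str
    state: PEState = PEState.NORMAL


@dataclass
class Adapter:
    """An adapter PE plus the VF PEs associated with it."""
    adapter_pe: PE
    vf_pes: list[PE] = field(default_factory=list)

    def fail_vf(self, index: int) -> None:
        # Assumed single-function scope: only the affected VF PE freezes.
        self.vf_pes[index].state = PEState.FROZEN

    def fail_adapter(self) -> None:
        # Adapter scope: freezing the adapter PE propagates to every VF PE.
        self.adapter_pe.state = PEState.FROZEN
        for pe in self.vf_pes:
            pe.state = PEState.FROZEN

    def frozen_names(self) -> list[str]:
        all_pes = [self.adapter_pe, *self.vf_pes]
        return [pe.name for pe in all_pes if pe.state is PEState.FROZEN]


def make_adapter(num_vfs: int) -> Adapter:
    """Build an adapter with `num_vfs` VF PEs, all in the normal state."""
    return Adapter(PE("adapter"), [PE(f"vf{i}") for i in range(num_vfs)])
```

Calling `fail_vf(1)` on a fresh three-VF adapter leaves only `vf1` frozen, while `fail_adapter()` freezes the adapter PE and every VF PE together.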

Claims

What is claimed is:

1. A method for implementing enhanced error collection for an input/output (I/O) adapter in a computer system, the I/O adapter being partitioned into multiple Partitionable Endpoints, with each Partitionable Endpoint (PE) corresponding to a function, and including an adapter PE associated with the I/O adapter, and multiple virtual function (VF) PEs, said method comprising: responsive to an error of the I/O adapter, freezing the adapter PE; freezing each of the multiple VF PEs associated with the adapter, responsive to freezing the adapter PE; informing an adapter driver and each of a plurality of VF device drivers of the error, and said adapter driver and each of said plurality of VF device drivers starting recovery; each of said plurality of VF device drivers loops attempting to unfreeze respective VF PEs, and locking out said plurality of VF device drivers, enabling adapter recovery to successfully complete; responsive to completed adapter recovery, said plurality of VF device drivers unfreeze respective VF PEs and said plurality of VF device drivers commences recovery.

2. The method as recited in claim 1, wherein the I/O adapter includes a Single Root Input/Output Virtualization (SRIOV) adapter.

3. The method as recited in claim 1, includes said adapter driver unfreezing the adapter PE, collecting error data, and starting recovery and reinitialization, and each of the multiple VF PEs remaining frozen.

4. The method as recited in claim 3, includes said adapter driver recovering the I/O adapter, and restoring a configuration of the I/O adapter.

5. The method as recited in claim 4, includes the adapter driver providing permission to unfreeze of each of the multiple VF PEs, and each of said plurality of VF device drivers commence recovery.

6. The method as recited in claim 1, includes said multiple VF device drivers unfreezing each of the multiple VF PE responsive to receiving permission to unfreeze of the VF PEs, and said plurality of VF device drivers commence recovery.

7. The method as recited in claim 6, includes each said VF device driver collecting error data.

8. The method as recited in claim 6, includes each said VF device driver completing recovery, and logging error data.

9. The method as recited in claim 8, includes each of said plurality of VF device drivers resuming normal VF and I/O operations.

10. The method as recited in claim 1, includes a system hypervisor being notified of the adapter PE and the multiple VF PEs being frozen, and said system hypervisor informing said adapter driver and said plurality of VF device drivers of the error.

11. The method as recited in claim 10, includes said adapter driver and each of said plurality of VF device drivers asynchronously starting recovery responsive to being informed of the error by said system hypervisor.

12. A system for implementing enhanced error collection for an input/output (I/O) adapter in a computer system, the I/O adapter being partitioned into multiple Partitionable Endpoints, with each Partitionable Endpoint (PE) corresponding to a function, and including an adapter PE associated with the I/O adapter, and multiple virtual function (VF) PEs, said system comprising: a processor; a hypervisor managing functions associated with the hardware I/O adapter; said processor using said hypervisor to perform the steps of: responsive to an error of the I/O adapter, freezing the adapter PE; freezing each of the multiple VF PEs associated with the adapter, responsive to freezing the adapter PE; informing an adapter driver and each of a plurality of VF device drivers of the error, and said adapter driver and each of said plurality of VF device drivers starting recovery; each of said plurality of VF device drivers loops attempting to unfreeze respective VF PEs, and locking out each of said plurality of VF device drivers, enabling adapter recovery to successfully complete; responsive to completed adapter recovery, said plurality of VF device drivers unfreeze respective VF PEs and said plurality of VF device drivers commences recovery.

13. The system as recited in claim 12, wherein the I/O adapter includes a Single Root Input/Output Virtualization (SRIOV) adapter.

14. The system as recited in claim 12, includes said adapter driver unfreezing the adapter PE, collecting error data, and starting recovery and reinitialization, and each of the multiple VF PEs remaining frozen.

15. The system as recited in claim 14, includes said adapter driver recovering the I/O adapter, and restoring a configuration of the I/O adapter.

16. The system as recited in claim 15, includes the adapter driver providing permission to unfreeze each of the multiple VF PEs, and each of said plurality of VF device drivers commence recovery.

17. The system as recited in claim 12, includes said multiple VF device drivers unfreezing the VF PEs responsive to receiving permission to unfreeze of the multiple VF PEs, and each of said plurality of VF device drivers commence recovery.

18. The system as recited in claim 17, includes each said VF device driver completing recovery, and logging error data.

19. The system as recited in claim 18, includes each of said plurality of VF device drivers resuming normal VF and I/O operations, responsive to completing recovery.

20. The system as recited in claim 12, includes said adapter driver and each of said plurality of VF device drivers asynchronously starting recovery responsive to being informed of the error.
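The ordering that the independent claims impose on recovery (VF device drivers loop on unfreeze attempts and are locked out until adapter recovery completes) may be easier to follow as a simulation. This is a hedged sketch only: `Hypervisor`, `try_unfreeze_vf`, `run_recovery`, and the step counter are hypothetical names introduced for illustration, and the single-threaded loop stands in for drivers that the claims describe as starting recovery asynchronously.

```python
class Hypervisor:
    """Toy model of the freeze state the hypervisor tracks (hypothetical)."""

    def __init__(self, num_vfs):
        self.adapter_frozen = False
        self.vf_frozen = [False] * num_vfs
        self.vf_unfreeze_permitted = True
        self.log = []

    def adapter_error(self):
        # An adapter error freezes the adapter PE, which freezes every VF PE.
        self.adapter_frozen = True
        self.vf_frozen = [True] * len(self.vf_frozen)
        self.vf_unfreeze_permitted = False  # VF drivers are locked out
        self.log.append("error: adapter PE and VF PEs frozen")

    def adapter_recover(self):
        # Adapter driver: unfreeze the adapter PE, collect error data,
        # recover and restore configuration, then permit VF unfreeze.
        self.adapter_frozen = False
        self.log.append("adapter: unfrozen, error data collected")
        self.log.append("adapter: recovered, configuration restored")
        self.vf_unfreeze_permitted = True
        self.log.append("adapter: VF unfreeze permitted")

    def try_unfreeze_vf(self, i):
        # Refused while adapter recovery is still in progress.
        if not self.vf_unfreeze_permitted:
            return False
        self.vf_frozen[i] = False
        return True


def run_recovery(num_vfs=2, adapter_recovery_steps=3):
    """Each step, every frozen VF driver retries its unfreeze once."""
    hv = Hypervisor(num_vfs)
    hv.adapter_error()
    attempts = [0] * num_vfs
    step = 0
    while any(hv.vf_frozen):
        step += 1
        if step >= adapter_recovery_steps and not hv.vf_unfreeze_permitted:
            hv.adapter_recover()
        for i in range(num_vfs):
            if hv.vf_frozen[i]:
                attempts[i] += 1
                if hv.try_unfreeze_vf(i):
                    hv.log.append(f"vf{i}: unfrozen, recovery started")
    return hv, attempts
```

With the defaults, each VF driver's first two attempts are refused and the third succeeds, so every VF entry in `hv.log` appears after the "VF unfreeze permitted" entry.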

Assignees

IBM

Inventors

Classifications

  • Remedial or corrective actions (recovery from an exception in an instruction pipeline G06F9/3861; by retry G06F11/1402; for recovering from a failure of a protocol instance or entity H04L69/40)

  • Error detection or correction of the data by redundancy in operations (error detection or correction of the data by redundancy in hardware G06F11/16)

  • Dumping, i.e. gathering error/state information after a fault for later diagnosis

  • Monitoring or debugging support

  • in a virtual computing platform, e.g. logically partitioned systems

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9304849B2 cover?
A method, system and computer program product are provided for implementing enhanced error handling for a hardware I/O adapter, such as a Single Root Input/Output Virtualization (SRIOV) adapter, in a virtualized system. The hardware I/O adapter is partitioned into multiple endpoints, with each Partitionable Endpoint (PE) corresponding to a function, and there is an adapter PE associated with th…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F11/0793. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 5, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).