Correctable error filtering for input/output subsystem

US10078543B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10078543-B2
Application number: US-201615167601-A
Country: US
Kind code: B2
Filing date: May 27, 2016
Priority date: May 27, 2016
Publication date: Sep 18, 2018
Grant date: Sep 18, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A switched fabric hierarchy (e.g., a PCIe hierarchy) may utilize hardware, firmware, and/or software for filtering duplicative or otherwise undesirable correctable error messages from reaching a root complex. An operating system of the root complex may detect a persistent stream or storm of correctable errors from a particular endpoint and activate filtering of correctable errors from that endpoint. A filtering device may receive filtering commands and parameters from the operating system, implement the filtering, and monitor further correctable errors from the offending device. While an offending device is being filtered, correctable error messages from the offending device may be masked from the operating system, while correctable error messages from other devices in the switched fabric hierarchy may be transmitted. At such time as the filtering device may detect that conditions for ending filtering of a device are met, the filtering device may cease filtering of the offending device and return monitoring responsibilities to the operating system.
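
The flow the abstract describes (detect a storm, mask the offending endpoint, keep passing everyone else) can be illustrated with a minimal Python sketch. This is not the patented implementation: `CEMessage`, `detect_storm`, `CEFilter`, the string form of the routing identifier, and the simple count-based storm test are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CEMessage:
    """Hypothetical correctable error message; field names are illustrative."""
    rid: str     # routing identifier of the reporting device, e.g. "01:00.0"
    detail: str  # free-form description of the error


def detect_storm(messages, threshold):
    """Return the RIDs whose CE message count meets or exceeds `threshold`.

    A count over one batch is an assumed stand-in for whatever storm
    criterion the operating system actually applies.
    """
    counts = Counter(m.rid for m in messages)
    return {rid for rid, n in counts.items() if n >= threshold}


@dataclass
class CEFilter:
    """Masks CE messages from filtered RIDs; all other messages pass through."""
    filtered_rids: set = field(default_factory=set)
    masked_count: Counter = field(default_factory=Counter)

    def begin(self, rid):
        self.filtered_rids.add(rid)

    def cease(self, rid):
        self.filtered_rids.discard(rid)

    def forward(self, messages):
        """Return only the messages that would reach the root complex."""
        passed = []
        for m in messages:
            if m.rid in self.filtered_rids:
                # Masked from the operating system, but still counted so the
                # filtering device can monitor the offending endpoint.
                self.masked_count[m.rid] += 1
            else:
                passed.append(m)
        return passed
```

In this sketch the operating system would call `detect_storm` on incoming messages and then `begin(rid)` for each offender. While the filter is active, `forward` drops that endpoint's messages and counts them, and messages from other endpoints are returned unchanged.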

First claim

Opening claim text (preview).

What is claimed is:

1. A system, comprising: a computing node configured as a root complex in a switched fabric hierarchy; one or more endpoint nodes configured as endpoints in the switched fabric hierarchy; a correctable error (“CE”) management module configured, at least in part, to: receive a plurality of error messages associated with the one or more endpoint nodes; detect, among the plurality of error messages, a CE storm associated with an offending device, the offending device associated with a first endpoint node of the one or more endpoint nodes; and identify the offending device as a target for CE filtering; and a CE filtering module configured, at least in part, to: at least in part in response to the error management module's identification of the offending device as a target for CE filtering, prevent transmission to the root complex of at least a portion of the plurality of error messages that are associated with the offending device.

2. The system of claim 1, wherein detecting a CE storm comprises detecting an error event threshold.

3. The system of claim 1, further comprising a CE containment module configured, at least in part, to: receive a filtering activation command from the CE management module; receive at least one CE containment instruction from the CE management module; and transmit at least one CE filtering command to the CE filtering module.

4. The system of claim 3, wherein the at least one CE containment instruction comprises at least one of: a routing identifier (“RID”) of the offending device; an indication of how often the CE containment module should query CE events associated with the offending device; and at least one CE management threshold.

5. The system of claim 3, wherein the at least one CE filtering command comprises at least one of: an instruction to begin filtering CE messages associated with the offending device; an instruction to cease filtering CE messages associated with the offending device.

6. The system of claim 3, wherein the CE containment module is further configured to: monitor a prevalence of CE events associated with the offending device; and when the prevalence of CE events associated with the offending device is lower than a CE event threshold provided by the CE management module, instruct the CE filtering module to cease filtering CE messages associated with the offending device.

7. The system of claim 3, wherein the CE management module is implemented within the root complex.

8. The system of claim 3, wherein the CE containment module is implemented as firmware within the root complex.

9. The system of claim 3, wherein the CE containment module is implemented as firmware within a filtering device separate from the root complex.

10. The system of claim 3, wherein the CE filtering module is implemented as hardware within a filtering device separate from the root complex.

11. The system of claim 3, wherein the CE filtering module is implemented within the offending device.

12. The system of claim 1, wherein detecting a CE storm comprises detecting repetitive error messages from a particular one of the one or more endpoint nodes.

13. A method, comprising: receiving, by a CE management module, a plurality of error messages associated with the one or more endpoint nodes, wherein the CE management module is associated with a root complex in a switched fabric hierarchy; detecting, by the CE management module, a CE storm associated with an offending device, wherein the offending device is associated with a first endpoint node of one or more endpoint nodes in a switched fabric hierarchy; identifying, by the CE management module, the offending device as a target for CE filtering; filtering, by a CE filtering module, correctable error messages associated with the offending device, the filtering at least in part in response to the error management module's identification of the offending device as a target for CE filtering, wherein the filtering comprises preventing transmission to the root complex of at least a portion of the plurality of error messages that are associated with the offending device.

14. The method of claim 13, further comprising: by the CE containment module: receiving a filtering activation command from the CE management module; receiving at least one CE containment instruction from the CE management module, the CE containment instruction comprising: an RID of the offending device; a CE event query frequency; and at least one CE management threshold; and transmitting a begin filtering command to the CE filtering module, wherein the begin filtering command instructs the CE filtering module to begin the filtering of CE messages associated with the offending device.

15. The method of claim 14, further comprising: by the CE containment module: determining a number N, where N represents a number of desired iterations for querying a filter catch signal; querying the filter catch signal associated with the offending device according to the CE event query frequency, the querying occurring at least N times, wherein a CE event counter is incremented in response to receiving an indication, during an iteration of querying of the filter catch signal, that a CE error event has occurred since an immediately previous iteration of querying the filter catch signal; comparing a value of the CE event counter with the value of a CE event threshold; when the value of the CE event counter is lower than the value of the CE event threshold, transmitting a stop filtering command to the CE filtering module, wherein the stop filtering command instructs the CE filtering module to cease the filtering CE messages associated with the offending device.

16. The method of claim 15, further comprising: transmitting, by the CE containment module to the CE management module, a summary of CE events associated with the offending device.

17. The method of claim 15, wherein the CE event threshold is one of the at least one CE management thresholds.

18. An apparatus, comprising: one or more endpoint devices configured as endpoints in a switched fabric hierarchy; a computing device configured as a root complex in the switched fabric hierarchy, the computing node comprising: a root complex processor; a root complex memory, the root complex memory comprising program instructions that when executed by the root complex processor cause the processor to: implement a CE management module configured, at least in part, to: receive a plurality of error messages associated with the one or more endpoint nodes; detect, among the plurality of error messages, a CE storm associated with an offending device, the offending device being one of the one or more endpoint devices; identify the offending device as a target for CE filtering; and a filtering device comprising: a filtering processor; a filtering memory, the filtering memory comprising: a plurality of filtering registers associated with the one or more endpoint devices; firmware instructions that when executed by the filtering processor cause the filtering processor to: receive a filtering activation command from the CE management module; begin filtering CE messages from the offending device by manipulating a filtering value of an offending device register to begin filtering CE messages from the offending device, wherein the offending device register is one of the plurality of filtering registers that is associated with the offending device.

19. The apparatus of claim 18, w
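
The monitoring loop recited in claim 15 can be sketched in Python as follows. This is a hedged illustration only: `monitor_offender`, its parameter names, and the callable `query_filter_catch` (standing in for however the filter catch signal is actually read) are assumptions, and the injectable `sleep` exists purely so the loop can be exercised without real delays.

```python
import time


def monitor_offender(query_filter_catch, n_iterations, query_interval_s,
                     ce_event_threshold, sleep=time.sleep):
    """Poll a filter catch signal N times and decide whether to stop filtering.

    query_filter_catch: hypothetical callable returning True if a CE event
        occurred since the previous query.
    n_iterations: the number N of query iterations.
    query_interval_s: seconds between queries (the CE event query frequency).
    ce_event_threshold: filtering stops when the counter is below this value.

    Returns (stop_filtering, ce_event_counter).
    """
    ce_event_counter = 0
    for _ in range(n_iterations):
        sleep(query_interval_s)
        if query_filter_catch():
            # A CE event was caught since the immediately previous query.
            ce_event_counter += 1
    # Counter lower than threshold: a stop filtering command would be sent.
    stop_filtering = ce_event_counter < ce_event_threshold
    return stop_filtering, ce_event_counter
```

The returned counter could also serve as the basis for the summary of CE events that claim 16 has the containment module send back to the management module. The sketch leaves out how that summary is formatted or transmitted.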

Assignees

Oracle Int Corp

Inventors

Classifications

  • Error filtering or prioritizing based on a policy defined by the user or on a policy defined by a hardware/software module, e.g. according to a severity level · CPC title

  • by exceeding a count or rate limit, e.g. word- or bit count limit · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10078543B2 cover?
A switched fabric hierarchy (e.g., a PCIe hierarchy) may utilize hardware, firmware, and/or software for filtering duplicative or otherwise undesirable correctable error messages from reaching a root complex. An operating system of the root complex may detect a persistent stream or storm of correctable errors from a particular endpoint and activate filtering of correctable errors from that endp…
Who is the assignee on this patent?
Oracle Int Corp
What technology area does this patent fall under?
Primary CPC classification G06F11/0781. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 18, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).