Monitoring correctable errors on a bus interface to determine whether to redirect input/output request (I/O) traffic to another bus interface
US-10528437-B2 · Jan 7, 2020 · US
US10949277B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10949277-B2 |
| Application number | US-201916507017-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 9, 2019 |
| Priority date | Jun 2, 2017 |
| Publication date | Mar 16, 2021 |
| Grant date | Mar 16, 2021 |
Official abstract text for this publication.
Provided is a computer program product for managing bus interface errors in a storage system coupled to a host and storage. A determination is made as to whether a first number of correctable errors on a first bus interface, connecting a first processing unit to the storage, exceeds a second number of correctable errors on a second bus interface, connecting a second processing unit to the storage, by a difference threshold. The correctable errors in the first and second bus interfaces are detected and corrected in the first and second bus interfaces by first hardware and second hardware, respectively. In response to determining that the first number of correctable errors exceeds the second number of correctable errors by the difference threshold, at least a portion of Input/Output (I/O) requests are redirected to the second processing unit using the second bus interface to connect to the storage.
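The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the names `BusInterface`, `should_redirect`, `route_requests`, and the `redirect_fraction` parameter are hypothetical, and two points are assumptions, namely that "exceeds by a difference threshold" means a strict greater-than comparison and that the redirected portion is the first half of the pending requests.

```python
import math
from dataclasses import dataclass


@dataclass
class BusInterface:
    """Hypothetical record of one bus interface and its correctable-error count."""
    name: str
    correctable_errors: int


def should_redirect(first, second, difference_threshold):
    """True when the first interface's correctable-error count exceeds the
    second's by more than the threshold (strict comparison is an assumption)."""
    return first.correctable_errors - second.correctable_errors > difference_threshold


def route_requests(requests, first, second, difference_threshold, redirect_fraction=0.5):
    """Split requests between the two interfaces.

    If the threshold test passes, a portion of the requests (a hypothetical
    fraction, rounded up) is moved to the second interface; otherwise every
    request stays on the first interface.
    """
    routes = {first.name: list(requests), second.name: []}
    if should_redirect(first, second, difference_threshold):
        count = math.ceil(len(requests) * redirect_fraction)
        routes[second.name] = list(requests[:count])
        routes[first.name] = list(requests[count:])
    return routes
```

With counts of 120 and 10 and a threshold of 50, the difference of 110 passes the test and half of the requests move to the second interface; with a threshold of 110 or higher, nothing is redirected.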
Opening claim text (preview).
What is claimed is:
1. A computer program product implemented in a host adaptor to manage I/O requests from a host to a storage, the computer program product comprising a computer readable storage medium having computer readable program code embodied therein that is executable to perform operations, the operations comprising: determining a first number of errors on a first bus interface that connects the host adaptor and a first device adaptor to a first processing unit and a second processing unit, wherein the first device adaptor comprises a default device adaptor the first processing unit uses to send Input/Output (I/O) requests to the storage; determining a second number of errors on a second bus interface that connects a second device adaptor to the first processing unit and the second processing unit, wherein the second device adaptor comprises a default device adaptor the second processing unit uses to send I/O requests to the storage; determining whether to redirect a portion of the I/O requests, received from the host at the host adaptor, from the first processing unit to the second processing unit based on the first number of errors and the second number of errors; and redirecting the portion of the I/O requests to the second processing unit in response to determining to redirect the portion of the I/O requests, wherein the second processing unit sends the portion of the I/O requests redirected to the second device adaptor to send to the storage.
2. The computer program product of claim 1, wherein the operations further comprise: determining whether the first number of errors exceed an error threshold indicating to perform a failover, wherein the determining whether to redirect the portion of the I/O requests and redirecting the portion of the I/O requests are performed in response to detecting that the first number of errors exceeds the error threshold.
3. The computer program product of claim 1, wherein the determining whether to redirect the portion of the I/O requests based on the first number of errors and the second number of errors comprises determining whether a difference of the first number of errors and the second number of errors exceed a difference threshold.
4. The computer program product of claim 3, wherein the operations further comprise: determining whether the second bus interface has not yet been considered for redirecting in response to determining that the first number of errors exceeds the difference threshold, wherein the determining whether to redirect the portion of the I/O requests is made in response to determining that the difference exceeds the difference threshold.
5. The computer program product of claim 1, wherein the first number and the second number of errors comprise correctable errors not affecting reliability of the first and the second bus interfaces and data integrity of data transmitted over the first and the second bus interfaces.
6. The computer program product of claim 1, wherein the operations further comprise: initiating load balance operations to redirect the portion of the I/O requests from the first processing unit to the second processing unit, wherein the determining the first number of errors, the second number of errors, whether to redirect the portion of the I/O requests, and the redirecting the portion of the I/O requests are performed in response to initiating the load balancing operations.
7. The computer program product of claim 1, wherein the first bus interface, the host adaptor and the first device adaptor are included in a first I/O bay and wherein the second bus interface and the second device adaptor are included in a second I/O bay.
8. A system for managing bus interface errors in a storage system coupled to a host and a storage, comprising: a first processing unit; a second processing unit; a first device adaptor comprising a default device adaptor for the first processing unit to connect to the storage; a second device adaptor comprising a default device adaptor for the second processing unit to connect to the storage; a host adaptor; a first bus interface connecting the first device adaptor and the host adaptor to the first and the second processing units; a second bus interface connecting the second device adaptor to the first and the second processing units; wherein the host adaptor performs operations, the operations comprising: determining a first number of errors on the first bus interface; determining a second number of errors on the second bus interface; determining whether to redirect a portion of Input/Output (I/O) requests, received from the host at the host adaptor, from the first processing unit to the second processing unit based on the first number of errors and the second number of errors; and redirecting the portion of the I/O requests to the second processing unit in response to determining to redirect the portion of the I/O requests, wherein the second processing unit sends the portion of the I/O requests redirected to the second device adaptor to send to the storage.
9. The system of claim 8, wherein the operations further comprise: determining whether the first number of errors exceed an error threshold indicating to perform a failover, wherein the determining whether to redirect the portion of the I/O requests and redirecting the portion of the I/O requests are performed in response to detecting that the first number of errors exceeds the error threshold.
10. The system of claim 8, wherein the determining whether to redirect the portion of the I/O requests based on the first number of errors and the second number of errors comprises determining whether a difference of the first number of errors and the second number of errors exceed a difference threshold.
11. The system of claim 10, wherein the operations further comprise: determining whether the second bus interface has not yet been considered for redirecting in response to determining that the first number of errors exceeds the difference threshold, wherein the determining whether to redirect the portion of the I/O requests is made in response to determining that that difference exceeds the difference threshold.
12. The system of claim 8, wherein the first number and the second number of errors comprise correctable errors not affecting reliability of the first and the second bus interfaces and data integrity of data transmitted over the first and the second bus interfaces.
13. The system of claim 8, wherein the operations further comprise: initiating load balance operations to redirect the portion of the I/O requests from the first processing unit to the second processing unit, wherein the determining the first number of errors, the second number of errors, whether to redirect the portion of the I/O requests, and the redirecting the portion of the I/O requests are performed in response to initiating the load balancing operations.
14. The system of claim 8, wherein the first bus interface, the host adaptor and the first device adaptor are included in a first I/O bay and wherein the second bus interface and the second device adaptor are included in a second I/O bay.
15. A method implemented in a host adaptor to manage I/O requests from a host to a storage, comprising: determining a first number of errors on a first bus interface that connects the host adaptor and a first device adaptor to a first processing unit and a second processing unit, wherein the first device adaptor comprises a default device adaptor the first processing unit uses to send Input/Output (I/O) requests to the storage; determining a second number of errors on a
in an input/output transactions management context (input/output processing in general G06F13/00) · CPC title
by exceeding a count or rate limit, e.g. word- or bit count limit · CPC title
Remedial or corrective actions (recovery from an exception in an instruction pipeline G06F9/3861; by retry G06F11/1402; for recovering from a failure of a protocol instance or entity H04L69/40) · CPC title
Threshold · CPC title
in a storage system, e.g. in a DASD or network based storage system (drivers for digital recording or reproducing units G06F3/06; circuits for error detection or correction within digital recording or reproducing units G11B20/18; for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS], H04L67/1097) · CPC title