US10402261B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10402261-B2 |
| Application number | US-201515500080-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 31, 2015 |
| Priority date | Mar 31, 2015 |
| Publication date | Sep 3, 2019 |
| Grant date | Sep 3, 2019 |
Official abstract text for this publication.
An example device in accordance with an aspect of the present disclosure includes a redundancy controller and/or memory module to prevent data corruption and single point of failure in a fault-tolerant memory fabric. Devices include engines to issue and/or respond to primitive requests, identify failures and/or fault conditions, and receive and/or issue containment mode indications.
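The interaction the abstract describes can be sketched in a few lines: a memory module that has detected a fault answers every primitive request with a containment mode indication, and each redundancy controller marks the module as failed on seeing that indication or on a timeout. This is a minimal illustration, not the patented implementation. All names (`MemoryModule`, `RedundancyController`, `handle_primitive`, `issue`) are hypothetical, and a missing response is modelled as `None` rather than with a real timer.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional, Set


class Response(Enum):
    OK = auto()
    CONTAINMENT = auto()  # module reports that it is in containment mode


@dataclass
class MemoryModule:
    """Hypothetical module: answers primitives normally until it faults."""
    name: str
    faulted: bool = False       # assumed to be set by a fault condition engine
    unresponsive: bool = False  # models a module that never answers

    def handle_primitive(self) -> Optional[Response]:
        if self.unresponsive:
            return None  # the caller treats a missing response as a timeout
        if self.faulted:
            # Containment mode: every requester gets the same indication,
            # so all controllers learn about the failure consistently.
            return Response.CONTAINMENT
        return Response.OK


@dataclass
class RedundancyController:
    """Hypothetical controller tracking which modules it considers failed."""
    failed: Set[str] = field(default_factory=set)

    @property
    def degraded(self) -> bool:
        return bool(self.failed)

    def issue(self, module: MemoryModule) -> bool:
        """Issue one primitive; return True if it completed normally."""
        if module.name in self.failed:
            return False  # degraded mode: never address a failed module
        response = module.handle_primitive()
        if response is None or response is Response.CONTAINMENT:
            self.failed.add(module.name)  # timeout or containment indication
            return False
        return True
```

In this sketch, two independent controllers that each touch the same faulted module both end up with it in their `failed` set, which is the coordination effect that containment mode indications are meant to provide.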
Opening claim text (preview).
What is claimed is:

1. A redundancy controller to prevent data corruption and single point of failure in a fault-tolerant memory fabric with a plurality of memory modules, the redundancy controller comprising: a normal mode engine to issue a primitive request to a memory module; a request timeout engine to identify the memory module as failed in response to at least one of i) receiving, from the memory module, a containment mode indication responsive to the primitive request, and ii) expiration of a timeout associated with not receiving a response to the primitive request; and a degraded mode engine to issue primitive requests to remaining memory modules not identified as failed, according to a degraded mode, wherein reads of data located on a failed memory module by the degraded mode engine use parity-reconstruction to reconstruct data on the failed memory module from surviving memory modules serving a stripe and wherein writes to data located on the failed memory module use parity reconstruction to reconstruct lost pre-write data, followed by using the reconstructed lost pre-write data for a new parity value to be written to a healthy memory model that holds parity of the stripe.

2. The redundancy controller of claim 1, further comprising a journaling engine to, in response to the given memory module being identified as failed, record the given memory module as failed in at least one journal.

3. The redundancy controller of claim 2, wherein prior to mounting a redundant array of independent disks (RAID) grouping of the plurality of memory modules, the redundancy controller is to examine the at least one journal associated with the RAID grouping to identify whether one or more redundancy controllers associated with the RAID grouping had previously entered degraded mode, and if so, mount the RAID grouping directly in degraded mode.

4. The redundancy controller of claim 2, wherein the journaling engine is to store the at least one journal on a reserved persistent portion of a corresponding at least one memory module.

5. The redundancy controller of claim 4, wherein the at least one journal includes metadata pertaining to memory of the corresponding at least one memory module, such that the at least one journal remains with its corresponding at least one memory even if mechanically removed from the at least one memory module.

6. The redundancy controller of claim 4, wherein the journaling engine is to update a plurality of journals across a plurality of memory modules.

7. The redundancy controller of claim 6, wherein the journaling engine is to synchronously update the plurality of journals subsequent to identifying the given memory module as failed in a RAID group, but prior to performing degraded-mode access to the plurality of memory modules.

8. The redundancy controller of claim 1, wherein the degraded mode engine is to issue primitive requests to remaining not-failed memory modules according to the degraded mode, for data associated with a RAID stripe spanning across a plurality of memory modules.

9. A memory module to prevent data corruption and single point of failure in a fault-tolerant memory fabric with a plurality of redundancy controllers, the memory module comprising: a normal mode engine to respond to primitive requests from redundancy controllers; a fault condition engine to identify a fault condition with the memory module; and a containment mode engine to issue, subsequent to the fault condition having been identified by the fault condition engine, containment mode indications in response to primitive requests received from redundancy controllers, wherein the containment mode indications are transmitted to the plurality of redundancy controllers so as to coordinate entry of redundancy controllers into degraded mode.

10. The memory module of claim 9, wherein the fault condition engine is to identify the fault condition based on actively detecting a failure mode associated with increased risk of at least one of intermittent failure, transient failure, and address-dependent failure of the memory module.

11. The memory module of claim 9, wherein the memory module includes a media controller and a memory, and the fault condition is associated with a fault in at least one of i) the memory and ii) a connection between the memory and the media controller.

12. The memory module of claim 11, further comprising a journal stored on a reserved portion of the memory, indicative of a status of at least one memory module.

13. The memory module of claim 12, wherein the journal is replicated across a plurality of memory modules, and the journal of a given memory module contains information regarding a status of the plurality of memory modules.

14. A redundancy controller to prevent data corruption and single point of failure in a fault-tolerant memory fabric with a plurality of memory modules, the redundancy controller comprising: a normal mode engine to issue a primitive request to a memory module; a request timeout engine to identify the memory module as failed in response to at least one of i) receiving, from the memory module, a containment mode indication responsive to the primitive request, and ii) expiration of a timeout associated with not receiving a response to the primitive request; and a degraded mode engine to issue primitive requests to remaining memory modules not identified as failed, according to a degraded mode; a journaling engine to, in response to the given memory module being identified as failed, record the given memory module as failed in at least one journal, wherein prior to mounting a redundant array of independent disks (RAID) grouping of the plurality of memory modules, the redundancy controller is to examine the at least one journal associated with the RAID grouping to identify whether one or more redundancy controllers associated with the RAID grouping had previously entered degraded mode, and if so, mount the RAID grouping directly in degraded mode.
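The degraded-mode reads and writes recited in claim 1 can be illustrated with a small worked example. The claims say only "parity reconstruction"; the sketch below assumes single XOR parity across the stripe (RAID-5 style), which is one common scheme and not necessarily the one the patent uses. The helper names (`xor_blocks`, `degraded_read`, `degraded_write`) and the dictionary model of a stripe are hypothetical.

```python
from functools import reduce
from typing import Dict, List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def degraded_read(stripe: Dict[str, bytes], failed: str) -> bytes:
    """Reconstruct the failed module's block from all surviving blocks.

    `stripe` maps module name -> block and includes the parity block.
    With XOR parity, the XOR of every survivor equals the lost block.
    """
    survivors = [block for name, block in stripe.items() if name != failed]
    return xor_blocks(survivors)


def degraded_write(stripe: Dict[str, bytes], failed: str,
                   parity_module: str, new_data: bytes) -> None:
    """Write data that lives on the failed module by updating parity only."""
    # Step 1: reconstruct the lost pre-write data from the survivors.
    old_data = degraded_read(stripe, failed)
    # Step 2: fold the change into parity: new = old_parity ^ old ^ new_data.
    stripe[parity_module] = xor_blocks(
        [stripe[parity_module], old_data, new_data]
    )
```

A short usage example under the same assumptions: build a stripe of two data blocks and one parity block, drop one data module to simulate its failure, and check that reads still return the lost data and that a write is visible to later degraded reads.

```python
d0, d1 = b"\x01\x02\x03\x04", b"\x10\x20\x30\x40"
stripe = {"m0": d0, "m1": d1, "mp": xor_blocks([d0, d1])}
del stripe["m1"]                      # module m1 has failed
assert degraded_read(stripe, "m1") == d1
degraded_write(stripe, "m1", "mp", b"\xaa\xbb\xcc\xdd")
assert degraded_read(stripe, "m1") == b"\xaa\xbb\xcc\xdd"
```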
Techniques of failing over between control units · CPC title
where memory access, memory control or I/O control functionality is redundant (redundant communication control functionality G06F11/2005; redundant storage control functionality G06F11/2089) · CPC title
Parity data used in redundant arrays of independent storages, e.g. in RAID systems · CPC title
Error or fault detection not based on redundancy (power supply failures G06F1/30; network fault management H04L41/06) · CPC title
in a storage system, e.g. in a DASD or network based storage system (drivers for digital recording or reproducing units G06F3/06; circuits for error detection or correction within digital recording or reproducing units G11B20/18; for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS], H04L67/1097) · CPC title