Preventing data corruption and single point of failure in fault-tolerant memory fabrics

US10402261B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10402261-B2
Application number  US-201515500080-A
Country             US
Kind code           B2
Filing date         Mar 31, 2015
Priority date       Mar 31, 2015
Publication date    Sep 3, 2019
Grant date          Sep 3, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An example device in accordance with an aspect of the present disclosure includes a redundancy controller and/or memory module to prevent data corruption and single point of failure in a fault-tolerant memory fabric. Devices include engines to issue and/or respond to primitive requests, identify failures and/or fault conditions, and receive and/or issue containment mode indications.
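
The failure-detection behaviour summarized in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the class names, the `Reply` values, and the `responsive` flag (which stands in for a real timeout) are all hypothetical, chosen only to show how a containment mode indication or a missing response could lead a controller to mark a module as failed.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Reply(Enum):
    OK = auto()
    CONTAINMENT = auto()  # module reports that it is in containment mode
    NONE = auto()         # stand-in for "timeout expired, no response"


@dataclass
class MemoryModule:
    name: str
    faulted: bool = False     # set once a fault condition has been identified
    responsive: bool = True   # False models a module that never answers

    def handle(self, request: str) -> Reply:
        if not self.responsive:
            return Reply.NONE
        # After a fault, every primitive request gets a containment indication.
        return Reply.CONTAINMENT if self.faulted else Reply.OK


@dataclass
class RedundancyController:
    modules: list
    failed: set = field(default_factory=set)

    @property
    def degraded(self) -> bool:
        # Assumption: the controller is in degraded mode once any module failed.
        return bool(self.failed)

    def issue(self, module: MemoryModule, request: str) -> Reply:
        reply = module.handle(request)
        # Either trigger marks the module as failed.
        if reply in (Reply.CONTAINMENT, Reply.NONE):
            self.failed.add(module.name)
        return reply

    def healthy_modules(self) -> list:
        # Degraded-mode requests go only to modules not identified as failed.
        return [m for m in self.modules if m.name not in self.failed]
```

Because a faulted module answers every request with the same indication, each controller that touches it reaches the same conclusion independently, which is one plausible reading of how the indications help coordinate entry into degraded mode.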

First claim

Opening claim text (preview).

What is claimed is:

1. A redundancy controller to prevent data corruption and single point of failure in a fault-tolerant memory fabric with a plurality of memory modules, the redundancy controller comprising: a normal mode engine to issue a primitive request to a memory module; a request timeout engine to identify the memory module as failed in response to at least one of i) receiving, from the memory module, a containment mode indication responsive to the primitive request, and ii) expiration of a timeout associated with not receiving a response to the primitive request; and a degraded mode engine to issue primitive requests to remaining memory modules not identified as failed, according to a degraded mode, wherein reads of data located on a failed memory module by the degraded mode engine use parity-reconstruction to reconstruct data on the failed memory module from surviving memory modules serving a stripe and wherein writes to data located on the failed memory module use parity reconstruction to reconstruct lost pre-write data, followed by using the reconstructed lost pre-write data for a new parity value to be written to a healthy memory model that holds parity of the stripe.

2. The redundancy controller of claim 1, further comprising a journaling engine to, in response to the given memory module being identified as failed, record the given memory module as failed in at least one journal.

3. The redundancy controller of claim 2, wherein prior to mounting a redundant array of independent disks (RAID) grouping of the plurality of memory modules, the redundancy controller is to examine the at least one journal associated with the RAID grouping to identify whether one or more redundancy controllers associated with the RAID grouping had previously entered degraded mode, and if so, mount the RAID grouping directly in degraded mode.

4. The redundancy controller of claim 2, wherein the journaling engine is to store the at least one journal on a reserved persistent portion of a corresponding at least one memory module.

5. The redundancy controller of claim 4, wherein the at least one journal includes metadata pertaining to memory of the corresponding at least one memory module, such that the at least one journal remains with its corresponding at least one memory even if mechanically removed from the at least one memory module.

6. The redundancy controller of claim 4, wherein the journaling engine is to update a plurality of journals across a plurality of memory modules.

7. The redundancy controller of claim 6, wherein the journaling engine is to synchronously update the plurality of journals subsequent to identifying the given memory module as failed in a RAID group, but prior to performing degraded-mode access to the plurality of memory modules.

8. The redundancy controller of claim 1, wherein the degraded mode engine is to issue primitive requests to remaining not-failed memory modules according to the degraded mode, for data associated with a RAID stripe spanning across a plurality of memory modules.

9. A memory module to prevent data corruption and single point of failure in a fault-tolerant memory fabric with a plurality of redundancy controllers, the memory module comprising: a normal mode engine to respond to primitive requests from redundancy controllers; a fault condition engine to identify a fault condition with the memory module; and a containment mode engine to issue, subsequent to the fault condition having been identified by the fault condition engine, containment mode indications in response to primitive requests received from redundancy controllers, wherein the containment mode indications are transmitted to the plurality of redundancy controllers so as to coordinate entry of redundancy controllers into degraded mode.

10. The memory module of claim 9, wherein the fault condition engine is to identify the fault condition based on actively detecting a failure mode associated with increased risk of at least one of intermittent failure, transient failure, and address-dependent failure of the memory module.

11. The memory module of claim 9, wherein the memory module includes a media controller and a memory, and the fault condition is associated with a fault in at least one of i) the memory and ii) a connection between the memory and the media controller.

12. The memory module of claim 11, further comprising a journal stored on a reserved portion of the memory, indicative of a status of at least one memory module.

13. The memory module of claim 12, wherein the journal is replicated across a plurality of memory modules, and the journal of a given memory module contains information regarding a status of the plurality of memory modules.

14. A redundancy controller to prevent data corruption and single point of failure in a fault-tolerant memory fabric with a plurality of memory modules, the redundancy controller comprising: a normal mode engine to issue a primitive request to a memory module; a request timeout engine to identify the memory module as failed in response to at least one of i) receiving, from the memory module, a containment mode indication responsive to the primitive request, and ii) expiration of a timeout associated with not receiving a response to the primitive request; and a degraded mode engine to issue primitive requests to remaining memory modules not identified as failed, according to a degraded mode; a journaling engine to, in response to the given memory module being identified as failed, record the given memory module as failed in at least one journal, wherein prior to mounting a redundant array of independent disks (RAID) grouping of the plurality of memory modules, the redundancy controller is to examine the at least one journal associated with the RAID grouping to identify whether one or more redundancy controllers associated with the RAID grouping had previously entered degraded mode, and if so, mount the RAID grouping directly in degraded mode.
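
The degraded-mode reads and writes described in claim 1 can be illustrated with a short Python sketch. It assumes single-parity XOR over equal-length blocks, as in RAID 5; the claim says only "parity reconstruction" and does not name the parity function. The function names are hypothetical.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks (assumed parity function)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def degraded_read(surviving_data: list, parity: bytes) -> bytes:
    """Rebuild the block on the failed module from the rest of the stripe."""
    return xor_blocks(parity, *surviving_data)


def degraded_write(surviving_data: list, old_parity: bytes, new_data: bytes) -> bytes:
    """Return the parity to store on the healthy parity module.

    The lost pre-write data is reconstructed first, then folded out of the
    old parity and replaced by the new data. Nothing is written to the
    failed module; the new data is represented only through parity.
    """
    old_data = degraded_read(surviving_data, old_parity)
    return xor_blocks(old_parity, old_data, new_data)
```

For example, with a stripe of three data blocks and one parity block, losing the second data block still allows it to be read back, and a later write to it can be recovered from the updated parity:

```python
d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\xaa\xbb"
parity = xor_blocks(d0, d1, d2)
assert degraded_read([d0, d2], parity) == d1

new_parity = degraded_write([d0, d2], parity, b"\xff\x00")
assert degraded_read([d0, d2], new_parity) == b"\xff\x00"
```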

Assignees

Inventors

Classifications

  • Techniques of failing over between control units · CPC title

  • where memory access, memory control or I/O control functionality is redundant (redundant communication control functionality G06F11/2005; redundant storage control functionality G06F11/2089) · CPC title

  • Parity data used in redundant arrays of independent storages, e.g. in RAID systems · CPC title

  • Error or fault detection not based on redundancy (power supply failures G06F1/30; network fault management H04L41/06) · CPC title

  • in a storage system, e.g. in a DASD or network based storage system (drivers for digital recording or reproducing units G06F3/06; circuits for error detection or correction within digital recording or reproducing units G11B20/18; for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS], H04L67/1097) · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10402261B2 cover?
An example device in accordance with an aspect of the present disclosure includes a redundancy controller and/or memory module to prevent data corruption and single point of failure in a fault-tolerant memory fabric. Devices include engines to issue and/or respond to primitive requests, identify failures and/or fault conditions, and receive and/or issue containment mode indications.
Who is the assignee on this patent?
Hewlett Packard Enterprise Development LP
What technology area does this patent fall under?
Primary CPC classification G06F11/2017. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 3, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).