Methods, apparatus and system for notification of predictable memory failure

US9535774B2 · US · B2

Patent metadata
Publication number: US-9535774-B2
Application number: US-201314022180-A
Country: US
Kind code: B2
Filing date: Sep 9, 2013
Priority date: Sep 9, 2013
Publication date: Jan 3, 2017
Grant date: Jan 3, 2017

Abstract

Official abstract text for this publication.

A method for providing notification of a predictable memory failure includes the steps of: obtaining information regarding at least one condition associated with a memory; calculating a memory failure probability as a function of the obtained information; calculating a failure probability threshold; and generating a signal when the memory failure probability exceeds the failure probability threshold, the signal being indicative of a predicted future memory failure.
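The four steps in the abstract (obtain conditions, compute a failure probability, compute a threshold, signal when the probability exceeds it) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: the abstract gives no formula, so the Weibull wear-out model, the temperature and access-rate acceleration rule, every constant, and all names (`Conditions`, `failure_probability`, `failure_threshold`, `check_memory`) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Conditions:
    """Step 1: information obtained about the memory (hypothetical fields)."""
    temperature_c: float
    accesses_per_s: float
    age_hours: float


def failure_probability(c: Conditions, window_hours: float) -> float:
    """Step 2: probability of failing within the next window, given survival so far.

    Assumed model: Weibull wear-out (shape 2) whose scale shrinks with heat
    and access rate. All constants are illustrative only.
    """
    accel = 2.0 ** ((c.temperature_c - 55.0) / 10.0) * (1.0 + c.accesses_per_s / 1e6)
    scale_hours = 50_000.0 / accel
    shape = 2.0
    # Cumulative hazard H(t) = (t / scale)^shape.
    # P(fail in window | alive at t) = 1 - exp(-(H(t + w) - H(t))).
    delta = ((c.age_hours + window_hours) / scale_hours) ** shape \
        - (c.age_hours / scale_hours) ** shape
    return -math.expm1(-delta)


def failure_threshold(required_confidence: float) -> float:
    """Step 3: placeholder rule (assumption): the threshold is the required confidence."""
    if not 0.0 < required_confidence < 1.0:
        raise ValueError("required_confidence must be strictly between 0 and 1")
    return required_confidence


def check_memory(c: Conditions, window_hours: float,
                 required_confidence: float) -> Optional[str]:
    """Step 4: return a notification string when the probability exceeds the threshold."""
    p = failure_probability(c, window_hours)
    if p > failure_threshold(required_confidence):
        return f"predicted memory failure within {window_hours:g} h (p={p:.3f})"
    return None
```

With these assumed constants, a cool, lightly used, young device such as `Conditions(45.0, 1e5, 1_000.0)` yields no notification for a 48-hour window, while a hot, heavily accessed, old device such as `Conditions(95.0, 2e6, 40_000.0)` does.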

First claim

Opening claim text (preview).

What is claimed is:

1. A method for providing notification of a predictable memory failure, the method comprising: obtaining information regarding at least one condition associated with a memory; calculating a memory failure probability as a function of the obtained information, the memory failure probability being defined as a probability of an occurrence of an uncorrectable error in the memory; calculating a statistically-defined failure probability threshold as a function of at least one memory failure model and user-controllable notification settings, the at least one memory failure model being matched to at least one of a plurality of memory failure types; generating a signal when the memory failure probability exceeds the failure threshold, the signal being indicative of a predicted future memory failure.

2. The method of claim 1, wherein obtaining the information regarding at least one condition associated with the memory further comprises obtaining at least one of access pattern, power, thermal and aging data corresponding to the memory.

3. The method of claim 1, further comprising obtaining the information regarding at least one condition associated with the memory in real-time.

4. The method of claim 1, further comprising indicating a failure type, wherein the failure type is at least one of electromigration, negative bias temperature instability, positive bias temperature instability, temperature-dependent dielectric breakdown, and hot carrier injection.

5. The method of claim 1, further comprising defining the user-controllable notification settings by one of: (i) setting a prediction accuracy given a prescribed action time window before an unrecoverable error; and (ii) setting an action time window before an unrecoverable error given a prescribed prediction accuracy.

6. The method of claim 1, wherein calculating the memory failure probability further comprises evaluating a failure model using the user-controllable notification settings.

7. The method of claim 1, wherein the generated signal is hardware-independent.

8. The method of claim 1, wherein calculation of the memory failure probability further comprises applying a failure model to at least one of a prescribed prediction accuracy and an action time window.

9. The method of claim 1, wherein calculating the memory failure probability further comprises determining a fault prediction function relating at least a subset of the obtained data to a probability of a memory failure occurring in a prescribed memory area as a function of time.

10. The method of claim 9, wherein the fault prediction function is a hardware-specific function that correlates a number of memory access read and write operations and the at least one condition associated with the memory as a function of time.

11. The method of claim 1, further comprising monitoring a corrected error rate, the memory failure probability being calculated in response to a variation in the corrected error rate.

12. The method of claim 1, wherein the information regarding at least one condition associated with the memory is obtained for one or more segments of the memory, and the memory failure probability is calculated for each of the one or more segments of the memory.

13. The method of claim 1, wherein the signal indicative of a predicted future memory failure comprises one or more parameters indicating at least one of a memory portion that is about to fail, an expected action time window before failure, and an expected uncorrectable memory error rate after failure occurs.

14. The method of claim 1, further comprising providing a system, wherein the system comprises distinct software modules, each of the distinct software modules being embodied on a computer-readable storage medium, and wherein the distinct software modules comprise a health tracking module and a notification settings module, and wherein the signal indicative of a predicted future memory failure is generated, at least in part, by said health tracking module executing on at least one hardware processor, and the failure probability threshold is calculated as a function of at least one of a prediction accuracy and an action time window supplied by said notification settings module executing on the at least one hardware processor.

15. An apparatus, comprising: a memory; at least one sensor coupled with the memory and operative to obtain information regarding at least one condition associated with the memory; and at least one processor coupled with the memory and operative: to calculate a memory failure probability as a function of the obtained information, the memory failure probability being defined as a probability of an occurrence of an uncorrectable error in the memory; to calculate a statistically-defined memory failure probability threshold as a function of at least one memory failure model and user-controllable notification settings, the at least one memory failure model being matched to at least one of a plurality of memory failure types; and to generate a signal when the memory failure probability exceeds the failure probability threshold, the signal being indicative of a predicted future memory failure.

16. The apparatus of claim 15, wherein the at least one sensor monitors at least one of memory performance patterns, memory power, thermal variations of the memory and aging variations of the memory.

17. The apparatus of claim 15, wherein the at least one processor is further operative to calculate a fault prediction function for correlating the obtained information regarding at least one condition associated with the memory to the memory failure probability.

18. The apparatus of claim 15, further comprising a plurality of distinct software modules, each of the software modules being embodied on a computer-readable storage medium, the distinct software modules comprising a memory health tracking module and a notification settings module, wherein the at least one processor is operative: to generate notification of a deterioration in health of the memory by executing the memory health tracking module; and to set at least one of a prediction accuracy and an action time window by executing the notification settings module, the memory failure probability being a function of at least one of the prediction accuracy and the action time window.

19. A computer program product, comprising a non-transitory computer readable storage medium having computer readable program code embodied therewith, said computer readable program code comprising: computer readable program code configured: to obtain information regarding at least one condition associated with a memory; to calculate a memory failure probability as a function of the obtained information, the memory failure probability being defined as a probability of an occurrence of an uncorrectable error in the memory; to calculate a statistically-defined failure probability threshold as a function of at least one memory failure model and user-controllable notification settings, the at least one memory failure model being matched to at least one of a plurality of memory failure types; to generate a signal when the memory failure probability exceeds the failure probability threshold, the signal being indicative of a predicted future memory failure.

20. The method of claim 1, further comprising calibrating the at least one memory failure model with in-field empirical measurements to thereby adjust for at least one of process variation in and dynamic usage of hardware components in t…
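Claims 11 to 13 describe recalculating the failure probability when the corrected error rate varies, doing so per memory segment, and emitting a signal that carries parameters such as the failing portion, the action time window, and the expected uncorrectable error rate. The sketch below shows one way such a tracker might be organized. It is illustrative only: the claims give no formulas, so `toy_probability`, the 50% rate-change trigger, the `ue_fraction` estimate, and all class and field names are assumptions.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class FailureSignal:
    """Parameters carried by the signal (claim 13); field names are hypothetical."""
    segment: str                 # memory portion predicted to fail
    action_window_hours: float   # expected time available before failure
    expected_ue_rate: float      # expected uncorrectable errors per hour after failure


def toy_probability(corrected_errors_per_hour: float) -> float:
    """Assumed stand-in model: probability rises with the corrected error rate."""
    return 1.0 - math.exp(-0.05 * corrected_errors_per_hour)


class SegmentTracker:
    """Tracks the corrected error rate per segment and re-evaluates on large changes."""

    def __init__(self, probability_fn: Callable[[float], float], threshold: float,
                 action_window_hours: float, rate_change_trigger: float = 0.5,
                 ue_fraction: float = 0.01) -> None:
        self.probability_fn = probability_fn
        self.threshold = threshold
        self.action_window_hours = action_window_hours
        self.rate_change_trigger = rate_change_trigger  # relative change that triggers a recalculation
        self.ue_fraction = ue_fraction                  # assumed share of errors that become uncorrectable
        self._last_rate: Dict[str, float] = {}

    def observe(self, segment: str, corrected_errors_per_hour: float) -> Optional[FailureSignal]:
        previous = self._last_rate.get(segment)
        self._last_rate[segment] = corrected_errors_per_hour
        if previous is None:
            return None  # first sample only establishes a baseline
        change = abs(corrected_errors_per_hour - previous) / max(previous, 1e-9)
        if change < self.rate_change_trigger:
            return None  # rate is stable, so the probability is not recalculated
        if self.probability_fn(corrected_errors_per_hour) > self.threshold:
            return FailureSignal(segment, self.action_window_hours,
                                 corrected_errors_per_hour * self.ue_fraction)
        return None


tracker = SegmentTracker(toy_probability, threshold=0.5, action_window_hours=24.0)
tracker.observe("bank0", 1.0)           # baseline for bank0
print(tracker.observe("bank0", 50.0))   # large jump in corrected errors
```

Keeping the last observed rate per segment means a jump in one bank does not affect the baseline of another, which mirrors the per-segment calculation in claim 12.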

Assignees

IBM

Classifications

  • Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations (thermal management in cooling arrangements of a computing system G06F1/206) · CPC title

  • where the computing system component is a memory, e.g. virtual memory, cache (accessing, addressing or allocating within memory systems or architectures G06F12/00; checking stores for correct operation G11C29/00) · CPC title

  • G06F11/008 (Primary)

    Reliability or availability analysis · CPC title

  • Indication or identification of errors, e.g. for repair · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9535774B2 cover?
A method for providing notification of a predictable memory failure includes the steps of: obtaining information regarding at least one condition associated with a memory; calculating a memory failure probability as a function of the obtained information; calculating a failure probability threshold; and generating a signal when the memory failure probability exceeds the failure probability thre…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F11/008. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 3, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).