US9977075B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-9977075-B1 |
| Application number | US-201615360899-A |
| Country | US |
| Kind code | B1 |
| Filing date | Nov 23, 2016 |
| Priority date | Nov 23, 2016 |
| Publication date | May 22, 2018 |
| Grant date | May 22, 2018 |
Official abstract text for this publication.
Embodiments detailed herein include an apparatus that includes a reliability assessment engine (RAE) stored in non-volatile memory and processing circuitry to execute the RAE to: receive data of at least one physical condition from a plurality of intra-die variation monitoring circuits, apply the received data to at least one reliability physics model, and calculate at least one of an estimated amount of lifetime consumed and an estimated amount of lifetime remaining.
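The flow in the abstract (sensed conditions in, reliability model applied, lifetime consumed or remaining out) can be illustrated with a minimal sketch. The abstract does not give any formula, so everything here is an assumption: the `Sample` record, the choice of an Arrhenius temperature-acceleration factor as a stand-in reliability physics model, linear damage accumulation, and the default reference temperature and activation energy are all hypothetical and are not taken from the patent.

```python
import math
from dataclasses import dataclass

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


@dataclass
class Sample:
    """One monitoring interval from an on-die sensor (hypothetical record shape)."""
    temperature_k: float  # sensed temperature in kelvin
    hours: float          # length of the interval in hours


def arrhenius_acceleration(temp_k, ref_temp_k, activation_energy_ev):
    """Wear-out acceleration at temp_k relative to ref_temp_k (Arrhenius form)."""
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / ref_temp_k - 1.0 / temp_k
    )
    return math.exp(exponent)


def lifetime_consumed(samples, rated_hours, ref_temp_k=358.15,
                      activation_energy_ev=0.7):
    """Fraction of rated lifetime consumed, assuming damage adds up linearly.

    Each interval is converted to equivalent hours at the reference
    temperature; the defaults are illustrative values, not patent data.
    """
    equivalent_hours = sum(
        s.hours * arrhenius_acceleration(
            s.temperature_k, ref_temp_k, activation_energy_ev
        )
        for s in samples
    )
    return equivalent_hours / rated_hours


def lifetime_remaining(samples, rated_hours, **model_params):
    """Fraction of rated lifetime remaining, clamped at zero."""
    return max(0.0, 1.0 - lifetime_consumed(samples, rated_hours, **model_params))


if __name__ == "__main__":
    history = [Sample(temperature_k=358.15, hours=1000.0),
               Sample(temperature_k=378.15, hours=200.0)]
    print("consumed:", lifetime_consumed(history, rated_hours=10000.0))
    print("remaining:", lifetime_remaining(history, rated_hours=10000.0))
```

An interval spent at the reference temperature counts one-for-one against the rated hours, while hotter intervals count for more. A real engine would presumably combine several stress inputs (voltage, workload) and several models, as the claims list.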
Opening claim text (preview).
What is claimed is:
1. An apparatus comprising: a reliability assessment engine (RAE) stored in non-volatile memory; and processing circuitry to execute the RAE to: receive data of at least one physical condition from a plurality of intra-die variation monitoring circuits; apply the received data to at least one reliability physics model, wherein the reliability physics model includes a time dependent dielectric breakdown model, a bias temperature stability model, an electromigration model, a negative/positive bias temperature model, an integrated reliability model, an edge damage monitor, a statistical model, and a fin-self-heat model; calculate at least one of an estimated amount of lifetime consumed and an estimated amount of lifetime remaining based on the reliability physics model output; and adjust an operation parameter based at least in part on the calculated amount of lifetime remaining.
2. The apparatus of claim 1, wherein the statistical model comprises a Markov failure prediction model.
3. The apparatus of claim 1, wherein the data of at least one physical condition sensed during a period of operation includes one or more sensed voltages, average of the one or more sensed voltages, one or more sensed temperatures, average of the one or more sensed temperatures, one or more workload measures, or average of the one or more workload measures.
4. The apparatus of claim 1, wherein the RAE is part of power control circuitry.
5. The apparatus of claim 1, wherein the RAE is to receive an indication of a desired performance state and adjust an operation parameter of an integrated circuit based at least in part on the received indication such that at least one of an average voltage, average temperature, or average workload metric is within a desired range.
6. The apparatus of claim 1, further comprising: a network interface communicatively coupled to the RAE to transmit and receive information to other RAEs.
7. The apparatus of claim 1, wherein the RAE is to select a reliability physics model from the plurality of reliability physics models.
8. A method comprising: receiving data representing at least one physical condition of an integrated circuit; calculating an estimated amount of lifetime remaining of the integrated circuit using at least one on-die reliability physics model, wherein the reliability physics model includes a time dependent dielectric breakdown model, a bias temperature stability model, an electromigration model, a negative/positive bias temperature model, an integrated reliability model, an edge damage monitor, a statistical model, and a fin-self-heat model; and adjusting an operating parameter of the integrated circuit based on the calculated estimated amount of lifetime remaining of the integrated circuit.
9. The method of claim 8, wherein the statistical model comprises a Markov failure prediction model.
10. The method of claim 8, wherein the data representing at least one physical condition includes one or more sensed voltages, average of the one or more sensed voltages, one or more sensed temperatures, average of the one or more sense temperatures, one or more workload measures, or average of the one or more workload measures.
11. The method of claim 8, further comprising: updating at least one of the on-die reliability physics models.
12. The method of claim 8, wherein the operating parameter is adjusted such that at least one of an average voltage, average temperature, or average workload metric is within a desired range.
13. The method of claim 8, further comprising: calculating an estimated amount of lifetime consumed for the integrated circuit.
14. A method comprising: receiving data representing at least one physical condition of an integrated circuit and a calculation related to a lifetime of the integrated circuit, wherein the calculation related to a lifetime of the integrated circuit is based on a reliability physics model and the reliability physics model is one or more of an edge damage monitor and a fin-self-heat model; determining a desired performance state for the integrated circuit; and transmitting information about the desired performance state to a die having the integrated circuit, whereby an operating parameter of the integrated circuit is adjusted to fit the desired performance state.
15. The method of claim 14, the reliability physics model is one or more of a time dependent dielectric breakdown model, a bias temperature stability model, an electromigration model, a negative/positive bias temperature model, an integrated reliability model, a statistical model.
16. The method of claim 14, wherein the data representing at least one physical condition includes one or more sensed voltages, average of the one or more sensed voltages, one or more sensed temperatures, average of the one or more sense temperatures, one or more workload measures, or average of the one or more workload measures.
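Claims 2 and 9 name a Markov failure prediction model without describing it. As a rough illustration of what such a statistical model could look like, the sketch below propagates a state distribution through a discrete-time Markov chain and reads off the probability of having failed after a number of steps. The three states, the transition probabilities, and the function name `failure_probability` are all hypothetical and do not come from the patent.

```python
def failure_probability(transition, start, steps, failed_state):
    """Probability of occupying failed_state after `steps` transitions.

    `transition[i][j]` is the probability of moving from state i to state j
    in one step; each row is expected to sum to 1.
    """
    n = len(transition)
    dist = [0.0] * n
    dist[start] = 1.0  # the chain starts in a known state
    for _ in range(steps):
        # One step of the chain: new_dist[j] = sum_i dist[i] * P[i][j]
        dist = [
            sum(dist[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ]
    return dist[failed_state]


# Hypothetical chain: 0 = healthy, 1 = degraded, 2 = failed (absorbing).
# The probabilities are made up for illustration only.
TRANSITION = [
    [0.98, 0.02, 0.00],
    [0.00, 0.95, 0.05],
    [0.00, 0.00, 1.00],
]

if __name__ == "__main__":
    for horizon in (1, 10, 100):
        p = failure_probability(TRANSITION, start=0, steps=horizon, failed_state=2)
        print(f"P(failed within {horizon} steps) = {p:.4f}")
```

Because the failed state is absorbing, the returned value is the probability of having failed at any point within the horizon, which is the kind of quantity an engine could compare against a threshold before adjusting an operating parameter.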
Design verification, e.g. using simulation, simulation program with integrated circuit emphasis [SPICE], direct methods or relaxation methods · CPC title
Measuring of material aspects, e.g. electro-migration [EM], hot carrier injection · CPC title
Environmental, reliability or burn-in testing · CPC title
Procedures; Software aspects · CPC title
Testing of integrated circuits [IC] (G01R31/317 takes precedence; testing individual devices G01R31/26; testing printed circuits G01R31/2801) · CPC title