Technologies for monitoring health of a process on a compute device

US10592383B2 · US · B2

Patent metadata
Publication number: US-10592383-B2
Application number: US-201715637706-A
Country: US
Kind code: B2
Filing date: Jun 29, 2017
Priority date: Jun 29, 2017
Publication date: Mar 17, 2020
Grant date: Mar 17, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for monitoring health of processes includes a compute device having a performance monitoring parameter manager and an analytics engine. The compute device accesses performance monitoring parameters associated with a monitored process of the compute device. The compute device samples one or more hardware counters associated with the monitored process and applies a performance monitor filter to the sampled one or more hardware counters to generate hardware counter values. The compute device performs a process fault check on the monitored process based on the hardware counter values and the performance monitoring parameters.
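
The flow described in the abstract (sample counters, filter the samples, check the result against per-process parameters) might be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the counter samples are a plain list of numbers, the filter is assumed to be a simple moving average (the abstract does not say what the performance monitor filter computes), and the names `moving_average`, `out_of_range`, and `fault_check` are hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator, List


def moving_average(samples: Iterable[float], window: int = 3) -> Iterator[float]:
    """Assumed filter: smooth raw counter samples with a sliding mean."""
    recent: deque = deque(maxlen=window)
    for sample in samples:
        recent.append(sample)
        yield sum(recent) / len(recent)


def out_of_range(value: float, low: float, high: float) -> bool:
    """True when a filtered counter value falls outside [low, high]."""
    return not (low <= value <= high)


def fault_check(samples: Iterable[float], low: float, high: float,
                window: int = 3) -> List[int]:
    """Return the indices of filtered samples that leave the expected range.

    `low` and `high` stand in for the performance monitoring parameters
    associated with the monitored process.
    """
    return [
        index
        for index, value in enumerate(moving_average(samples, window))
        if out_of_range(value, low, high)
    ]


if __name__ == "__main__":
    raw = [10, 10, 10, 40, 40, 40]  # made-up counter readings
    print(fault_check(raw, low=5, high=20))
```

Filtering before the check means a single noisy sample does not trigger a fault on its own; only a sustained shift moves the smoothed value out of range.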

First claim

Opening claim text (preview).

The invention claimed is:

1. A compute device for monitoring health of processes, the compute device comprising: one or more hardware counters; a processor; a memory having stored thereon a plurality of instructions that, when executed, causes the compute device to: determine performance monitoring parameters prior to execution of a process of the compute device that is to be monitored, wherein each of the performance monitoring parameters is usable to monitor the process of the compute device and each of the performance monitoring parameters is associated with the particular process to be monitored; sample the one or more hardware counters to generate hardware counter values associated with the monitored process, wherein the hardware counter values indicate a cache misses/access ratio of the monitored process and uncore activity of the monitored process; and perform a process fault check on the monitored process based on the hardware counter values and the performance monitoring parameters, wherein to perform the process fault check comprises to perform an infinite loop fault check on the monitored process, wherein to perform the infinite loop fault check comprises to: determine whether the cache misses/accesses ratio of the monitored process is out of a first range indicated by the performance monitoring parameters; and determine whether the uncore activity of the monitored process is out of a second range indicated by the performance monitoring parameters.

2. The compute device of claim 1, wherein to determine the performance monitoring parameters comprises to: execute the process during a training period; sample the one or more hardware counters to generate training hardware counter values associated with the training period of the monitored process, wherein the training hardware counter values are indicative of the normal operation of the monitored process; and automatically determine values of the performance monitoring parameters based on the training hardware counter values, wherein to perform the process fault check comprises to perform the process fault check during an execution period different from the training period.

3. The compute device of claim 1, wherein the hardware counter values indicate a memory bandwidth of the monitored process and cache misses of the monitored process, wherein to perform the process fault check comprises to perform a negative interplay fault check on the monitored process, wherein to perform the negative interplay fault check comprises to perform the negative interplay fault check based on the memory bandwidth of the monitored process and the cache misses of the monitored process.

4. The compute device of claim 1, wherein the hardware counter values indicate an instructions per clock cycle of the monitored process, wherein to perform the process fault check comprises to perform a performance impact check on the monitored process, wherein to perform the performance impact check comprises to determine whether the instructions per clock cycle of the monitored process is out of a third range indicated by the performance monitoring parameters.

5. The compute device of claim 1, wherein the plurality of instructions further causes the compute device to generate a report in response to a detection of a process fault based on the hardware counter values and the performance monitoring parameters.

6. The compute device of claim 5, wherein to generate a report in response to the detection of the process fault comprises to generate a report in response to detection of a number of process faults exceeding a threshold number of process faults.

7. A method for monitoring health of processes, the method comprising: determining, by a compute device, performance monitoring parameters prior to execution of a process of the compute device that is to be monitored, wherein each of the performance monitoring parameters is usable to monitor the process of the compute device and each of the performance monitoring parameters is associated with indicative of a normal operation of the particular process to be monitored; sampling, by the compute device, one or more hardware counters of the compute device to generate hardware counter values associated with the monitored process, wherein the hardware counter values indicate a cache misses/access ratio of the monitored process and uncore activity of the monitored process; and performing, by the compute device, a process fault check on the monitored process based on the hardware counter values and the performance monitoring parameters, wherein performing the process fault check comprises performing an infinite loop fault check on the monitored process, wherein performing the infinite loop fault check further comprises determining whether the cache misses/accesses ratio of the monitored process is out of a first range indicated by the performance monitoring parameters and determining whether the uncore activity of the monitored process is out of a second range indicated by the performance monitoring parameters.

8. The method of claim 7, wherein determining the performance monitoring parameters comprises: executing the process during a training period; sampling the one or more hardware counters to generate training hardware counter values associated with the training period of the monitored process, wherein the training hardware counter values are indicative of the normal operation of the monitored process; and automatically determining values of the performance monitoring parameters based on the training hardware counter values, wherein performing the process fault check comprises performing the process fault check during an execution period different from the training period.

9. The method of claim 7, wherein the hardware counter values indicate a memory bandwidth of the monitored process and cache misses of the monitored process, wherein performing the process fault check comprises performing a negative interplay fault check on the monitored process, wherein performing the negative interplay fault check comprises performing the negative interplay fault check based on the memory bandwidth of the monitored process and the cache misses of the monitored process.

10. The method of claim 7, further comprising generating a report in response to a detection of a process fault based on the hardware counter values and the performance monitoring parameters.

11. The method of claim 10, wherein generating a report in response to the detection of the process fault comprises to generate a report in response to detection of a number of process faults exceeding a threshold number of process faults.

12. One or more non-transitory computer-readable media comprising a plurality of instructions stored thereon that, when executed, causes a compute device to: determine performance monitoring parameters prior to execution of a process of the compute device that is to be monitored, wherein each of the performance monitoring parameters is usable to monitor the process of the compute device and each of the performance monitoring parameters is associated with indicative of a normal operation of the particular process to be monitored; sample one or more hardware counters of the compute device to generate hardware counter values associated with the monitored process, wherein the hardware counter values indicate a cache misses/access ratio of the monitored process and uncore activity of the monitored process; and perform a process fault check on the monitored process based on the hardware counter values and the performance monitoring parameters, wherein to perform the process fault check comprises to perform an infinite
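
The infinite loop fault check in claim 1, the training period in claim 2, and the fault-count threshold in claim 6 can be illustrated together in a short sketch. Several details here are assumptions, since the claims do not specify them: ranges are learned as the min/max of the training values widened by a margin, a fault is flagged only when both the cache miss ratio and the uncore activity are out of range, and every name (`MonitoringParams`, `learn_range`, `infinite_loop_suspected`, `should_report`) is hypothetical. Real hardware counter access is replaced by plain numbers.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence, Tuple

Range = Tuple[float, float]


@dataclass
class MonitoringParams:
    miss_ratio_range: Range  # the "first range" of claim 1
    uncore_range: Range      # the "second range" of claim 1


def learn_range(values: Sequence[float], margin: float = 0.1) -> Range:
    """Assumed training rule: min/max of normal-operation values, padded."""
    low, high = min(values), max(values)
    pad = (high - low) * margin
    return (low - pad, high + pad)


def train(miss_ratios: Sequence[float], uncore: Sequence[float]) -> MonitoringParams:
    """Derive parameters from counter values sampled during a training period."""
    return MonitoringParams(learn_range(miss_ratios), learn_range(uncore))


def in_range(value: float, bounds: Range) -> bool:
    low, high = bounds
    return low <= value <= high


def infinite_loop_suspected(miss_ratio: float, uncore: float,
                            params: MonitoringParams) -> bool:
    """Assumption: flag a fault only when BOTH values leave their ranges."""
    return (not in_range(miss_ratio, params.miss_ratio_range)
            and not in_range(uncore, params.uncore_range))


def should_report(samples: Iterable[Tuple[float, float]],
                  params: MonitoringParams, fault_threshold: int) -> bool:
    """Report once the number of detected faults exceeds the threshold."""
    faults = sum(infinite_loop_suspected(m, u, params) for m, u in samples)
    return faults > fault_threshold


if __name__ == "__main__":
    # Made-up training data standing in for sampled hardware counters.
    params = train(miss_ratios=[0.10, 0.20], uncore=[100.0, 200.0])
    observed = [(0.0, 0.0), (0.0, 0.0), (0.15, 150.0)]
    print(should_report(observed, params, fault_threshold=1))
```

The intuition behind pairing the two signals is that a process stuck in a tight loop tends to keep hitting the same cached data and generate little traffic outside the core, so both values drift away from what was seen during normal operation.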

Assignees

Intel Corp

Inventors

Classifications

  • for performance assessment · CPC title

  • where the computing system is implementing multitasking (multiprogramming arrangements G06F9/46; allocation of resources G06F9/50) · CPC title

  • Circuit details, i.e. tracer hardware · CPC title

  • Remedial or corrective actions (recovery from an exception in an instruction pipeline G06F9/3861; by retry G06F11/1402; for recovering from a failure of a protocol instance or entity H04L69/40) · CPC title

  • for systems · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10592383B2 cover?
A method for monitoring health of processes includes a compute device having a performance monitoring parameter manager and an analytics engine. The compute device accesses performance monitoring parameters associated with a monitored process of the compute device. The compute device samples one or more hardware counters associated with the monitored process and applies a performance monitor fi…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F11/3495. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 17, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).