Cloud compute scheduling using a heuristic contention model
US-9871742-B2 · Jan 16, 2018 · US
US10592383B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10592383-B2 |
| Application number | US-201715637706-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 29, 2017 |
| Priority date | Jun 29, 2017 |
| Publication date | Mar 17, 2020 |
| Grant date | Mar 17, 2020 |
Official abstract text for this publication.
A method for monitoring health of processes includes a compute device having a performance monitoring parameter manager and an analytics engine. The compute device accesses performance monitoring parameters associated with a monitored process of the compute device. The compute device samples one or more hardware counters associated with the monitored process and applies a performance monitor filter to the sampled one or more hardware counters to generate hardware counter values. The compute device performs a process fault check on the monitored process based on the hardware counter values and the performance monitoring parameters.
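The flow in the abstract (sample hardware counters, filter the samples, check the result against stored parameters) can be sketched roughly as below. This is only an illustration under stated assumptions, not the patented implementation: the abstract does not say what the performance monitor filter computes, so a moving average is assumed here, and the function names, sample values, and the range limits are all hypothetical.

```python
from collections import deque
from statistics import mean


def moving_average_filter(samples, window=3):
    """Hypothetical performance monitor filter: smooth raw counter
    samples with a moving average (an assumption, not from the patent)."""
    recent = deque(maxlen=window)
    filtered = []
    for sample in samples:
        recent.append(sample)
        filtered.append(mean(recent))
    return filtered


def fault_check(values, low, high):
    """Return True when the latest filtered value lies outside [low, high]."""
    return not (low <= values[-1] <= high)


# Made-up instructions-per-cycle samples for one monitored process.
raw_ipc = [1.0, 1.0, 1.0, 0.1, 0.1, 0.1]
filtered_ipc = moving_average_filter(raw_ipc, window=3)

# The range [0.5, 2.0] stands in for a performance monitoring parameter.
ipc_fault = fault_check(filtered_ipc, low=0.5, high=2.0)
```

Filtering before the check means a single noisy sample does not trigger a fault on its own; only a sustained shift moves the smoothed value out of range.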
Opening claim text (preview).
The invention claimed is:
1. A compute device for monitoring health of processes, the compute device comprising: one or more hardware counters; a processor; a memory having stored thereon a plurality of instructions that, when executed, causes the compute device to: determine performance monitoring parameters prior to execution of a process of the compute device that is to be monitored, wherein each of the performance monitoring parameters is usable to monitor the process of the compute device and each of the performance monitoring parameters is associated with the particular process to be monitored; sample the one or more hardware counters to generate hardware counter values associated with the monitored process, wherein the hardware counter values indicate a cache misses/access ratio of the monitored process and uncore activity of the monitored process; and perform a process fault check on the monitored process based on the hardware counter values and the performance monitoring parameters, wherein to perform the process fault check comprises to perform an infinite loop fault check on the monitored process, wherein to perform the infinite loop fault check comprises to: determine whether the cache misses/accesses ratio of the monitored process is out of a first range indicated by the performance monitoring parameters; and determine whether the uncore activity of the monitored process is out of a second range indicated by the performance monitoring parameters.
2. The compute device of claim 1, wherein to determine the performance monitoring parameters comprises to: execute the process during a training period; sample the one or more hardware counters to generate training hardware counter values associated with the training period of the monitored process, wherein the training hardware counter values are indicative of the normal operation of the monitored process; and automatically determine values of the performance monitoring parameters based on the training hardware counter values, wherein to perform the process fault check comprises to perform the process fault check during an execution period different from the training period.
3. The compute device of claim 1, wherein the hardware counter values indicate a memory bandwidth of the monitored process and cache misses of the monitored process, wherein to perform the process fault check comprises to perform a negative interplay fault check on the monitored process, wherein to perform the negative interplay fault check comprises to perform the negative interplay fault check based on the memory bandwidth of the monitored process and the cache misses of the monitored process.
4. The compute device of claim 1, wherein the hardware counter values indicate an instructions per clock cycle of the monitored process, wherein to perform the process fault check comprises to perform a performance impact check on the monitored process, wherein to perform the performance impact check comprises to determine whether the instructions per clock cycle of the monitored process is out of a third range indicated by the performance monitoring parameters.
5. The compute device of claim 1, wherein the plurality of instructions further causes the compute device to generate a report in response to a detection of a process fault based on the hardware counter values and the performance monitoring parameters.
6. The compute device of claim 5, wherein to generate a report in response to the detection of the process fault comprises to generate a report in response to detection of a number of process faults exceeding a threshold number of process faults.
7. A method for monitoring health of processes, the method comprising: determining, by a compute device, performance monitoring parameters prior to execution of a process of the compute device that is to be monitored, wherein each of the performance monitoring parameters is usable to monitor the process of the compute device and each of the performance monitoring parameters is associated with indicative of a normal operation of the particular process to be monitored; sampling, by the compute device, one or more hardware counters of the compute device to generate hardware counter values associated with the monitored process, wherein the hardware counter values indicate a cache misses/access ratio of the monitored process and uncore activity of the monitored process; and performing, by the compute device, a process fault check on the monitored process based on the hardware counter values and the performance monitoring parameters, wherein performing the process fault check comprises performing an infinite loop fault check on the monitored process, wherein performing the infinite loop fault check further comprises determining whether the cache misses/accesses ratio of the monitored process is out of a first range indicated by the performance monitoring parameters and determining whether the uncore activity of the monitored process is out of a second range indicated by the performance monitoring parameters.
8. The method of claim 7, wherein determining the performance monitoring parameters comprises: executing the process during a training period; sampling the one or more hardware counters to generate training hardware counter values associated with the training period of the monitored process, wherein the training hardware counter values are indicative of the normal operation of the monitored process; and automatically determining values of the performance monitoring parameters based on the training hardware counter values, wherein performing the process fault check comprises performing the process fault check during an execution period different from the training period.
9. The method of claim 7, wherein the hardware counter values indicate a memory bandwidth of the monitored process and cache misses of the monitored process, wherein performing the process fault check comprises performing a negative interplay fault check on the monitored process, wherein performing the negative interplay fault check comprises performing the negative interplay fault check based on the memory bandwidth of the monitored process and the cache misses of the monitored process.
10. The method of claim 7, further comprising generating a report in response to a detection of a process fault based on the hardware counter values and the performance monitoring parameters.
11. The method of claim 10, wherein generating a report in response to the detection of the process fault comprises to generate a report in response to detection of a number of process faults exceeding a threshold number of process faults.
12. One or more non-transitory computer-readable media comprising a plurality of instructions stored thereon that, when executed, causes a compute device to: determine performance monitoring parameters prior to execution of a process of the compute device that is to be monitored, wherein each of the performance monitoring parameters is usable to monitor the process of the compute device and each of the performance monitoring parameters is associated with indicative of a normal operation of the particular process to be monitored; sample one or more hardware counters of the compute device to generate hardware counter values associated with the monitored process, wherein the hardware counter values indicate a cache misses/access ratio of the monitored process and uncore activity of the monitored process; and perform a process fault check on the monitored process based on the hardware counter values and the performance monitoring parameters, wherein to perform the process fault check comprises to perform an infinite
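As a reading aid for the claims, the sketch below strings together three claimed steps: deriving ranges from a training period (claims 2 and 8), the two range determinations of the infinite loop fault check (claims 1 and 7), and reporting once the fault count exceeds a threshold (claims 6 and 11). It is a minimal sketch under assumptions, not the claimed implementation: the min/max-plus-margin rule for deriving ranges, the requirement that both metrics be out of range, and every name and number here are hypothetical, since the claims do not specify them.

```python
from dataclasses import dataclass


@dataclass
class Range:
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


def train_range(training_values, margin=0.25):
    """Derive a 'normal operation' range from training samples.
    Assumed rule: observed min/max widened by a fractional margin."""
    lo, hi = min(training_values), max(training_values)
    pad = (hi - lo) * margin
    return Range(lo - pad, hi + pad)


def infinite_loop_fault(miss_ratio, uncore, ratio_range, uncore_range):
    """Make both range determinations from the claims. Assumed combination:
    flag a fault only when both metrics are out of range."""
    ratio_out = not ratio_range.contains(miss_ratio)
    uncore_out = not uncore_range.contains(uncore)
    return ratio_out and uncore_out


def should_report(fault_flags, threshold):
    """Report only when the number of detected faults exceeds the threshold."""
    return sum(fault_flags) > threshold


# Training period: made-up counter values for normal operation.
ratio_range = train_range([0.10, 0.12, 0.14])
uncore_range = train_range([400.0, 500.0, 600.0])

# Execution period: (cache miss/access ratio, uncore activity) samples.
samples = [(0.11, 450.0), (0.001, 5.0), (0.001, 4.0), (0.002, 6.0)]
flags = [infinite_loop_fault(r, u, ratio_range, uncore_range) for r, u in samples]
report = should_report(flags, threshold=2)
```

In this example the first sample sits inside both trained ranges, while the last three fall far below them, which is the kind of pattern a tight loop running out of cache might produce; three faults exceed the threshold of two, so a report is generated.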
for performance assessment · CPC title
where the computing system is implementing multitasking (multiprogramming arrangements G06F9/46; allocation of resources G06F9/50) · CPC title
Circuit details, i.e. tracer hardware · CPC title
Remedial or corrective actions (recovery from an exception in an instruction pipeline G06F9/3861; by retry G06F11/1402; for recovering from a failure of a protocol instance or entity H04L69/40) · CPC title
for systems · CPC title