Determining abnormal conditions of host state from log files through Markov modeling

US10255124B1 · US · B1

Patent metadata
Publication number: US-10255124-B1
Application number: US-201313924013-A
Country: US
Kind code: B1
Filing date: Jun 21, 2013
Priority date: Jun 21, 2013
Publication date: Apr 9, 2019
Grant date: Apr 9, 2019

Abstract

Official abstract text for this publication.

Embodiments are disclosed for determining whether a computing node is in a normal or an abnormal condition based on its characteristics relative to those of other computing nodes. In embodiments, log files for the computing node are used to develop a state model of the computing node, and where the state model differs between two similar computing nodes, an abnormality is identified. In other embodiments, characteristics about computing nodes (e.g., CPU resources used) are used to cluster those computing nodes, and those computing nodes that lie outside of a cluster are identified as abnormal.
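
The clustering embodiment mentioned at the end of the abstract can be illustrated with a minimal sketch. The abstract does not name a clustering algorithm, so the approach below (a single cluster centered on the component-wise median, with a cutoff based on the median absolute deviation) is an assumption chosen for brevity. The function name `find_outlier_nodes`, the multiplier `k`, and the sample metrics are all hypothetical.

```python
from math import dist
from statistics import median


def find_outlier_nodes(metrics, k=3.0):
    """Return names of nodes whose metrics lie far from the group's center.

    metrics: dict mapping node name -> tuple of numeric characteristics
             (for example CPU and memory utilization).
    k: hypothetical cutoff multiplier; not taken from the patent.
    """
    names = list(metrics)
    dims = len(next(iter(metrics.values())))
    # Component-wise median serves as a robust stand-in for the cluster center.
    center = tuple(median(metrics[n][d] for n in names) for d in range(dims))
    distances = {n: dist(metrics[n], center) for n in names}
    typical = median(distances.values())
    # Median absolute deviation of the distances: a robust estimate of spread.
    mad = median(abs(v - typical) for v in distances.values())
    cutoff = typical + k * mad
    return sorted(n for n, v in distances.items() if v > cutoff)


# Made-up sample data: (CPU fraction, memory fraction) per node.
nodes = {
    "node-a": (0.31, 0.42),
    "node-b": (0.29, 0.40),
    "node-c": (0.33, 0.45),
    "node-d": (0.30, 0.41),
    "node-e": (0.95, 0.90),
}
outliers = find_outlier_nodes(nodes)
print(outliers)
```

With this sample data only `node-e` falls beyond the cutoff. A production system would more likely use a multi-cluster method and tune the cutoff empirically.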

First claim

Opening claim text (preview).

What is claimed is:

1. A system for determining whether a first computing node is in an abnormal condition, comprising: a plurality of computing nodes comprising the first computing node; at least one memory bearing instructions that, upon execution, cause the system at least to: receive information about at least a subset of the plurality of computing nodes that is stored in at least one log file, the information identifying a first action and a second action of the subset of the plurality of computing nodes, and an order in which the first and second actions occur; determine a first state for a state model based on the first action identified in the at least one log file; determine a second state for the state model based on the second action identified in the at least one log file; determine a transition from the first state to the second state in the state model based on the order in which the first and second actions occur in the at least one log file; determine a second state model for a second computing node of the plurality of computing nodes based on information about the second computing node that is stored in a second log file; determine that the second state model and the state model differ by at least a predetermined amount; and determine that the second computing node is in an abnormal condition in response to determining that the second state model and the state model differ by at least the predetermined amount.

2. The system of claim 1, wherein the at least one memory further bears instructions that, upon execution, cause the system at least to: receive additional information about the first computing node that is stored in the log file or another log file; determine a new state model for the first computing node; determine that the new state model and the state model differ by at least a predetermined amount; and determine to replace the state model with the new state model as representing the first computing node in response to determining that the new state model and the state model differ by at least the predetermined amount.

3. A computer-implemented method for determining whether a computing service is in an abnormal condition, comprising: logging information relating to operations of a computing service in a database; in response to sending a request to the database, receiving an identification of a first action and an identification of a second action of the computing service from the database, and an identification of an order in which the first and second actions occur from the database; determining a first state for a state model based on the first action; adding a determined second state for the state model based on the second action; computing a transition from the first state to the second state in the state model based on the order in which the first and second actions occur; determining a second state model for the computing service based on an identification of additional actions of the computing service or based on another computing service; determining that the second state model and the state model differ by at least a predetermined amount; and wherein determining that the computing service or the another computing service is in the abnormal condition is in response to determining that the second state model and the state model differ by at least the predetermined amount.

4. The method of claim 3, further comprising: determining to use the second state model rather than the state model as representing the computing service in response to determining that the new state model and the state model differ by at least the predetermined amount.

5. The method of claim 3, further comprising: receiving an indication of a predetermined state of the state model; and determining a third state of the state model based on the indication of the predetermined state.

6. The method of claim 3, further comprising: receiving an indication of a predetermined state of the state model; determining that the predetermined state and the first state each represent the first action of the computing service; and determining to use one of the predetermined state and the first state in response to determining that the predetermined state and the first state each represent the first action of the computing service.

7. The method of claim 3, further comprising: receiving an indication of a first time at which the computing service performed the first action, and an indication of a second time at which the computing service performed the second action; and determining an expected time spent in the first state for the state model based on a difference between the first time and the second time.

8. The method of claim 3, wherein the identification of the first action is stored on the computing service and the identification of the second action is stored on a second computing service.

9. The method of claim 8, wherein the identification of the first action is stored in a first log file on the computing service, wherein the identification of the second action is stored in a second log file on the second computing service, and wherein determining the first state for the state model based on the first action further comprises: joining the first log file and the second log file into a table.

10. The method of claim 3, wherein the transition identifies a probability that the computing service will perform the second action immediately following performing the first action.

11. The method of claim 3, wherein the state model comprises a hidden Markov model.

12. A non-transitory computer-readable medium, bearing computer-readable instructions that, when executed on a computing service, cause the at least one computing service to perform operations comprising: receiving an identification of first action and an identification of a second action of the computing service, and an identification of an order in which the first and second actions occur; determining a first state for a state model based on the first action; determining a second state for the state model based on the second action; and determining a transition from the first state to the second state in the state model based on the order in which the first and second actions occur; determining a second state model for the computing service based on an identification of additional actions of the computing service or based on another computing service; determining that the second state model and the state model differ by at least a predetermined amount; and determining that the computing service or the another computing service is in an abnormal condition in response to determining that the second state model and the state model differ by at least the predetermined amount.

13. The non-transitory computer-readable medium of claim 12, further bearing computer-readable instructions that, upon execution on the computing service, cause the computing service to perform operations comprising: determining to use the second state model rather than the state model as representing the computing service in response to determining that the new state model and the state model differ by at least the predetermined amount.

14. The non-transitory computer-readable medium of claim 12, further bearing computer-readable instructions that, upon execution on the computing service, cause the computing service to perform operations comprising: receiving an indication of a predetermined state of the state model; and determining a third state of the state model based on the indication of the predetermined state.

15. The non-trans
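
The steps recited in the independent claims (derive states from logged actions, derive transitions from their order, then compare two state models against a predetermined amount) can be sketched as a plain first-order Markov chain. This is an illustrative reading, not the patented implementation: the claims do not specify how model difference is measured, so the total-variation metric in `model_difference`, the `threshold` default, and the sample action names are all assumptions. Claim 11 mentions a hidden Markov model, which this sketch does not attempt.

```python
from collections import Counter, defaultdict


def build_state_model(actions):
    """Estimate transition probabilities from an ordered list of logged actions."""
    counts = defaultdict(Counter)
    # Each consecutive pair (first action, second action) is one observed transition.
    for current, following in zip(actions, actions[1:]):
        counts[current][following] += 1
    model = {}
    for state, nexts in counts.items():
        total = sum(nexts.values())
        # Probability that `nxt` immediately follows `state` (compare claim 10).
        model[state] = {nxt: n / total for nxt, n in nexts.items()}
    return model


def model_difference(model_a, model_b):
    """Largest per-state total-variation distance (a hypothetical metric).

    A state with outgoing transitions in only one model contributes 0.5,
    because the missing row is treated as an empty distribution.
    """
    worst = 0.0
    for state in set(model_a) | set(model_b):
        row_a = model_a.get(state, {})
        row_b = model_b.get(state, {})
        targets = set(row_a) | set(row_b)
        tv = 0.5 * sum(abs(row_a.get(t, 0.0) - row_b.get(t, 0.0)) for t in targets)
        worst = max(worst, tv)
    return worst


def is_abnormal(reference, candidate, threshold=0.25):
    """Flag the candidate when the models differ by at least the threshold."""
    return model_difference(reference, candidate) >= threshold


# Made-up action sequences standing in for parsed log files.
healthy = ["recv", "auth", "read", "send", "recv", "auth", "read", "send"]
suspect = ["recv", "auth", "fail", "recv", "auth", "fail",
           "recv", "auth", "read", "send"]

healthy_model = build_state_model(healthy)
suspect_model = build_state_model(suspect)
print(is_abnormal(healthy_model, suspect_model))
```

In this example the `auth` state transitions to `fail` two times out of three on the suspect node but never on the healthy one, which dominates the difference and trips the threshold. Comparing a model with itself yields a difference of zero.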

Assignees

Amazon Tech Inc

Inventors

Classifications

  • Error or fault detection not based on redundancy (power supply failures G06F1/30; network fault management H04L41/06)

  • in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems

  • Error filtering or prioritizing based on a policy defined by the user or on a policy defined by a hardware/software module, e.g. according to a severity level

  • Root cause analysis, i.e. error or fault diagnosis (in a hardware test environment G06F11/22; in a software test environment G06F11/36)

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10255124B1 cover?
Embodiments are disclosed for determining whether a computing node is in a normal or an abnormal condition based on its characteristics relative to those of other computing nodes. In embodiments, log files for the computing node are used to develop a state model of the computing node, and where the state model differs between two similar computing nodes, an abnormality is identified. In other e…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06F11/0751. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 9, 2019 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).