Identifying alarms for a root cause of a problem in a data processing system
US-2015280968-A1 · Oct 1, 2015 · US
US9542255B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9542255-B2 |
| Application number | US-201414489004-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 17, 2014 |
| Priority date | Sep 23, 2013 |
| Publication date | Jan 10, 2017 |
| Grant date | Jan 10, 2017 |
Official abstract text for this publication.
The present disclosure relates to a method and apparatus for troubleshooting based on log similarity. In one embodiment, there is provided a method for troubleshooting based on log similarity, comprising: extracting log patterns from multiple log files in response to having collected the multiple log files from at least one system with troubles, the log pattern describing a regular expression to which a log message in a log file among the multiple log files conforms; building a pattern repository using the log patterns; mapping each of the multiple log files to an n-dimensional vector based on the pattern repository; and clustering multiple n-dimensional vectors to which each of the multiple log files is mapped into at least one group, wherein each of the at least one group indicates one trouble type of the at least one system. In another embodiment, there is provided a corresponding apparatus.
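The pipeline in the abstract (match log messages to regular-expression patterns, map each log file to a vector, cluster the vectors) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the three patterns in `PATTERNS`, the sample log lines, the use of raw occurrence counts as vector components, cosine similarity as the measure, and the greedy threshold-based grouping in `cluster`. The abstract does not name a clustering algorithm.

```python
import math
import re

# Hypothetical pattern repository: each entry is a regular expression
# that one class of log messages conforms to (illustrative only).
PATTERNS = [
    re.compile(r"^Connection to \S+ timed out$"),
    re.compile(r"^Disk \S+ is \d+% full$"),
    re.compile(r"^Service \S+ started$"),
]

def to_sequence(lines):
    """Replace each log line with the index of the first pattern it matches.

    Lines that match no pattern are skipped in this sketch.
    """
    seq = []
    for line in lines:
        for i, pat in enumerate(PATTERNS):
            if pat.match(line):
                seq.append(i)
                break
    return seq

def to_vector(seq):
    """Map a pattern sequence to an n-dimensional count vector, n = len(PATTERNS)."""
    vec = [0.0] * len(PATTERNS)
    for i in seq:
        vec[i] += 1.0
    return vec

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def cluster(vectors, threshold=0.8):
    """Greedy grouping: a vector joins the first group whose seed is similar enough."""
    groups = []  # each group is a list of indices into `vectors`
    for idx, vec in enumerate(vectors):
        for group in groups:
            if cosine(vectors[group[0]], vec) >= threshold:
                group.append(idx)
                break
        else:
            groups.append([idx])  # no existing group matched: start a new one
    return groups

logs = [
    ["Service db started", "Connection to 10.0.0.1 timed out",
     "Connection to 10.0.0.2 timed out"],
    ["Disk sda1 is 97% full", "Disk sdb1 is 99% full"],
    ["Connection to 10.0.0.9 timed out", "Service web started"],
]
vectors = [to_vector(to_sequence(lines)) for lines in logs]
groups = cluster(vectors)
print(groups)
```

In this toy input the first and third log files share the timeout and startup patterns and end up in one group, while the disk-full log forms its own group. Each group stands in for one trouble type.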
Opening claim text (preview).
What is claimed is:

1. A method for trouble shooting based on log similarity, the method comprising: extracting log patterns from a plurality of log files in response to having collected the plurality of log files from at least one system with troubles, each log pattern describing a regular expression to which a log message in a log file among the plurality of log files conforms; building a pattern repository using the log patterns extracted; mapping each of the plurality of log files to an n-dimensional vector based on the pattern repository; and clustering a plurality of n-dimensional vectors to which each of the plurality of log files is mapped into at least one group, wherein each of the at least one group indicates one trouble type of the at least one system; wherein the mapping each of the plurality of log files to an n-dimensional vector based on the pattern repository comprises: with respect to a log file $j$ among the plurality of log files, matching each line of log message $k$ in the log file $j$ to a corresponding log pattern $p_k$ in the pattern repository; transforming the log file $j$ into a sequence $f_j$ of the corresponding log pattern $p_k$; and mapping the sequence $f_j$ to an n-dimensional vector; wherein dimension $n$ of the n-dimensional vector is proportional to the amount of log patterns in the pattern repository; wherein mapping the sequence $f_j$ to an n-dimensional vector comprises: with respect to a log pattern $p_i$ in the pattern repository, calculating an eigenvalue $\mathrm{tfidf}_{i,j}$, wherein $\mathrm{tfidf}_{i,j} = \mathrm{tf}_{i,j} \times \mathrm{idf}_i$; the eigenvalue $\mathrm{tfidf}_{i,j}$ associated with the sequence $f_j$ and the log pattern $p_i$, and the eigenvalue $\mathrm{tfidf}_{i,j}$ associated with an occurrence frequency of the log pattern $p_i$ in a plurality of sequences corresponding to the plurality of log files; and treating the eigenvalue $\mathrm{tfidf}_{i,j}$ as the $i$th component in the n-dimensional vector to build the n-dimensional vector.

2. The method according to claim 1, wherein the extracting log patterns from plurality of log files in response to having collected the plurality log files from the at least one system with troubles comprises: with respect to a current log file among the plurality of log files, extracting the log patterns from the log messages in the current log file.

3. The method according to claim 2, wherein the extracting the log patterns from the log messages in the current log file comprises: calculating the longest common subsequence of the log messages to extract the log patterns.

4. The method according to claim 1, wherein the eigenvalue $\mathrm{tfidf}_{i,j}$ is associated with the term frequency $\mathrm{tf}_{i,j}$ of the log pattern $p_i$ and with the inverse document frequency $\mathrm{idf}_i$ of the log pattern $p_i$.

5. The method according to claim 4, wherein

$$\mathrm{tf}_{i,j} = \frac{t_{i,j}}{\sum_u t_{u,j}},$$

where $t_{i,j}$ represents an occurrence number of the log pattern $p_i$ in the sequence $f_j$; $\sum_u t_{u,j}$ represents a sum of occurrence numbers of all log patterns in the pattern repository in the sequence $f_j$; and

$$\mathrm{idf}_i = \log \frac{|F|}{1 + |\{v : p_i \in f_v\}|},$$

where $|F|$ represents the amount of the plurality of log files, and $|\{v : p_i \in f_v\}|$ represents the amount of sequences comprising the log pattern $p_i$.

6. The method according to claim 1, wherein the building a pattern repository using the log patterns comprises: in response to an occurrence frequency of a log pattern $p$ among the log patterns exceeding a predefined threshold, adding the log pattern $p$ into the pattern repository.

7. The method according to claim 6, further comprising calculating similarity between a new log file and the plurality of vectors in at least one group in response to having received the new log file from a system; treating a failure type indicated by a group to which a vector with the highest similarity belongs as a failure type of the system; mapping the new log file to the n-dimensional vector; and calculating the similarity between the n-dimensional vector and the plurality of vectors.

8. An apparatus for trouble shooting based on log similarity, comprising: a processor; memory in communication with the processor; a log module, via the processor and memory, extracting log patterns from a plurality of log files in response to having collected the plurality of log files from at least one system with troubles, each log pattern describing a regular expression to which a log message in a log file among the plurality of log files conforms; a building module, via the processor and memory, building a pattern repository using the log patterns extracted; a mapping module, via the processor and memory, mapping each of the plurality of log files to an n-dimensional vector based on the pattern repository; and a clustering module, via the processor and memory, clustering a plurality of n-dimensional vectors to which each of the plurality of log files is mapped into at least one group, wherein each of the at least one group indicates one trouble type of the at least one system; wherein the mapping module is configured to, with respect to a log file $j$ among the plurality of log files, match each line of log message $k$ in the log file $j$ to a corresponding log pattern $p_k$ in the pattern repository; transform the log file $j$ into a sequence $f_j$ of the corresponding log pattern $p_k$; and map the sequence $f_j$ to an n-dimensional vector; wherein dimension $n$ of the n-dimensional vector is proportional to the amount of log patterns in the pattern repository; wherein the mapping module is configured to, with respect to a log pa
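The tf-idf weighting in claims 1, 4 and 5 and the similarity lookup in claim 7 can be worked through numerically. The sketch below follows the claim 5 formulas for the term frequency and inverse document frequency. The rest is assumed for illustration: the claims do not name a similarity measure, so cosine similarity is used here, and the sample sequences, the `labels` list and the function names are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(sequences, n_patterns):
    """Build one n-dimensional tf-idf vector per pattern sequence f_j."""
    num_files = len(sequences)  # |F|
    # Number of sequences that contain each pattern p_i: |{v : p_i in f_v}|
    doc_freq = Counter(i for seq in sequences for i in set(seq))
    idf = [math.log(num_files / (1 + doc_freq[i])) for i in range(n_patterns)]
    vectors = [weight(seq, idf) for seq in sequences]
    return vectors, idf

def weight(seq, idf):
    """tfidf_{i,j} = tf_{i,j} * idf_i, with tf_{i,j} = t_{i,j} / sum_u t_{u,j}."""
    counts = Counter(seq)  # t_{i,j}: occurrences of pattern i in this sequence
    total = len(seq)       # sum_u t_{u,j}
    return [(counts[i] / total if total else 0.0) * idf[i]
            for i in range(len(idf))]

def cosine(a, b):
    """Cosine similarity (an assumed measure); 0.0 for an all-zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def classify(new_seq, vectors, idf, labels):
    """Return the label of the stored vector most similar to the new log."""
    new_vec = weight(new_seq, idf)
    best = max(range(len(vectors)), key=lambda j: cosine(new_vec, vectors[j]))
    return labels[best]

# Hypothetical pattern sequences for four log files over four patterns (0..3),
# with a group label per file as if clustering had already been done.
sequences = [[0, 0, 1], [0, 1, 0, 0], [2, 2, 3], [2, 3, 3]]
labels = ["network", "network", "disk", "disk"]

vectors, idf = tfidf_vectors(sequences, n_patterns=4)
print(classify([2, 2, 2, 3], vectors, idf, labels))
```

One consequence of the $1 +$ term in the denominator of the claim 5 formula: a pattern that occurs in every log file gets a negative $\mathrm{idf}_i$, and one that occurs in all but one file gets zero, so very common patterns contribute little or negatively to the vector.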
in a multiprocessor or a multi-core unit (multiprocessors per se G06F15/80) · CPC title
Root cause analysis, i.e. error or fault diagnosis (in a hardware test environment G06F11/22; in a software test environment G06F11/36) · CPC title
Sequencing of tasks or work · CPC title
Providing customer assistance, e.g. assisting a customer within a business location or via helpdesk · CPC title