Troubleshooting based on log similarity

US9542255B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9542255-B2
Application number  US-201414489004-A
Country             US
Kind code           B2
Filing date         Sep 17, 2014
Priority date       Sep 23, 2013
Publication date    Jan 10, 2017
Grant date          Jan 10, 2017

Abstract

Official abstract text for this publication.

The present disclosure relates to a method and apparatus for troubleshooting based on log similarity. In one embodiment, there is provided a method for troubleshooting based on log similarity, comprising: extracting log patterns from multiple log files in response to having collected the multiple log files from at least one system with troubles, the log pattern describing a regular expression to which a log message in a log file among the multiple log files conforms; building a pattern repository using the log patterns; mapping each of the multiple log files to an n-dimensional vector based on the pattern repository; and clustering multiple n-dimensional vectors to which each of the multiple log files is mapped into at least one group, wherein each of the at least one group indicates one trouble type of the at least one system. In another embodiment, there is provided a corresponding apparatus.
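The pattern-extraction step can be illustrated with a minimal sketch. It assumes a token-level longest common subsequence (the technique recited in claim 3) over two whitespace-split messages; the helper names `lcs_tokens` and `pattern_from`, the sample messages, and the choice to turn the differing tokens into `.*?` wildcards are illustrative assumptions, not the patent's own implementation.

```python
import re

def lcs_tokens(a, b):
    """Longest common subsequence of two token lists (dynamic programming)."""
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of a[i:] and b[j:].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table forward to recover one longest common subsequence.
    out, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out

def pattern_from(msg_a, msg_b):
    """Regex whose fixed parts are the shared tokens; the gaps become wildcards.

    Assumption: tokens are whitespace-separated and variable fields differ
    between the two messages.
    """
    common = lcs_tokens(msg_a.split(), msg_b.split())
    return r"^.*?" + r".*?".join(re.escape(t) for t in common) + r".*$"

# Hypothetical log messages, used only for illustration.
pattern = pattern_from("Disk sda1 failed after 3 retries",
                       "Disk sdb2 failed after 7 retries")
print(bool(re.match(pattern, "Disk sdc9 failed after 12 retries")))  # True
print(bool(re.match(pattern, "Network link down")))                  # False
```

A pattern built this way is deliberately loose. A production system would likely merge more than two messages and anchor the wildcards more tightly.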

First claim

Opening claim text (preview).

What is claimed is:

1. A method for trouble shooting based on log similarity, the method comprising: extracting log patterns from a plurality of log files in response to having collected the plurality of log files from at least one system with troubles, each log pattern describing a regular expression to which a log message in a log file among the plurality of log files conforms; building a pattern repository using the log patterns extracted; mapping each of the plurality of log files to an n-dimensional vector based on the pattern repository; and clustering a plurality of n-dimensional vectors to which each of the plurality of log files is mapped into at least one group, wherein each of the at least one group indicates one trouble type of the at least one system; wherein the mapping each of the plurality of log files to an n-dimensional vector based on the pattern repository comprises: with respect to a log file j among the plurality of log files, matching each line of log message k in the log file j to a corresponding log pattern $p_k$ in the pattern repository; transforming the log file j into a sequence $f_j$ of the corresponding log pattern $p_k$; and mapping the sequence $f_j$ to an n-dimensional vector; wherein dimension n of the n-dimensional vector is proportional to the amount of log patterns in the pattern repository; wherein mapping the sequence $f_j$ to an n-dimensional vector comprises: with respect to a log pattern $p_i$ in the pattern repository, calculating an eigenvalue $\mathrm{tfidf}_{i,j}$, wherein $\mathrm{tfidf}_{i,j} = \mathrm{tf}_{i,j} \times \mathrm{idf}_i$; the eigenvalue $\mathrm{tfidf}_{i,j}$ associated with the sequence $f_j$ and the log pattern $p_i$, and the eigenvalue $\mathrm{tfidf}_{i,j}$ associated with an occurrence frequency of the log pattern $p_i$ in a plurality of sequences corresponding to the plurality of log files; and treating the eigenvalue $\mathrm{tfidf}_{i,j}$ as the i-th component in the n-dimensional vector to build the n-dimensional vector.

2. The method according to claim 1, wherein the extracting log patterns from plurality of log files in response to having collected the plurality log files from the at least one system with troubles comprises: with respect to a current log file among the plurality of log files, extracting the log patterns from the log messages in the current log file.

3. The method according to claim 2, wherein the extracting the log patterns from the log messages in the current log file comprises: calculating the longest common subsequence of the log messages to extract the log patterns.

4. The method according to claim 1, wherein the eigenvalue $\mathrm{tfidf}_{i,j}$ is associated with the term frequency $\mathrm{tf}_{i,j}$ of the log pattern $p_i$ and with the inverse document frequency $\mathrm{idf}_i$ of the log pattern $p_i$.

5. The method according to claim 4, wherein

$$\mathrm{tf}_{i,j} = \frac{t_{i,j}}{\sum_u t_{u,j}},$$

where $t_{i,j}$ represents an occurrence number of the log pattern $p_i$ in the sequence $f_j$; $\sum_u t_{u,j}$ represents a sum of occurrence numbers of all log patterns in the pattern repository in the sequence $f_j$; and

$$\mathrm{idf}_i = \log \frac{|F|}{1 + |\{v : p_i \in f_v\}|},$$

where $|F|$ represents the amount of the plurality of log files, and $|\{v : p_i \in f_v\}|$ represents the amount of sequences comprising the log pattern $p_i$.

6. The method according to claim 1, wherein the building a pattern repository using the log patterns comprises: in response to an occurrence frequency of a log pattern p among the log patterns exceeding a predefined threshold, adding the log pattern p into the pattern repository.

7. The method according to claim 6, further comprising calculating similarity between a new log file and the plurality of vectors in at least one group in response to having received the new log file from a system; treating a failure type indicated by a group to which a vector with the highest similarity belongs as a failure type of the system; mapping the new log file to the n-dimensional vector; and calculating the similarity between the n-dimensional vector and the plurality of vectors.

8. An apparatus for trouble shooting based on log similarity, comprising: a processor; memory in communication with the processor; a log module, via the processor and memory, extracting log patterns from a plurality of log files in response to having collected the plurality of log files from at least one system with troubles, each log pattern describing a regular expression to which a log message in a log file among the plurality of log files conforms; a building module, via the processor and memory, building a pattern repository using the log patterns extracted; a mapping module, via the processor and memory, mapping each of the plurality of log files to an n-dimensional vector based on the pattern repository; and a clustering module, via the processor and memory, clustering a plurality of n-dimensional vectors to which each of the plurality of log files is mapped into at least one group, wherein each of the at least one group indicates one trouble type of the at least one system; wherein the mapping module is configured to, with respect to a log file j among the plurality of log files, match each line of log message k in the log file j to a corresponding log pattern $p_k$ in the pattern repository; transform the log file j into a sequence $f_j$ of the corresponding log pattern $p_k$; and map the sequence $f_j$ to an n-dimensional vector; wherein dimension n of the n-dimensional vector is proportional to the amount of log patterns in the pattern repository; wherein the mapping module is configured to, with respect to a log pa
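The vector mapping of claims 1 and 5 and the new-file lookup of claim 7 can be sketched as follows. This is a minimal illustration under stated assumptions: each log file has already been converted to a sequence of pattern indices, cosine similarity stands in for the unspecified similarity measure, and the function names, sample sequences, and group labels are hypothetical.

```python
import math
from collections import Counter

def fit_idf(sequences, n):
    """idf_i = log(|F| / (1 + number of sequences containing pattern i))."""
    containing = Counter(i for seq in sequences for i in set(seq))
    return [math.log(len(sequences) / (1 + containing[i])) for i in range(n)]

def vectorize(seq, idf):
    """Map one pattern sequence f_j to its n-dimensional tf-idf vector."""
    counts = Counter(seq)
    total = len(seq)  # sum over u of t_{u,j}
    return [(counts[i] / total if total else 0.0) * w for i, w in enumerate(idf)]

def cosine(u, v):
    """Cosine similarity (an assumption; the claims do not name a measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def classify(new_seq, idf, vectors, groups):
    """Return the group of the stored vector most similar to the new log file."""
    x = vectorize(new_seq, idf)
    best = max(range(len(vectors)), key=lambda k: cosine(x, vectors[k]))
    return groups[best]

# Hypothetical data: four log files over a repository of n = 4 patterns.
sequences = [[0, 0, 1], [0, 1, 1], [2, 2, 3], [2, 3, 3]]
groups = ["disk", "disk", "network", "network"]  # assumed output of clustering
idf = fit_idf(sequences, 4)
vectors = [vectorize(s, idf) for s in sequences]
print(classify([2, 2, 2, 3], idf, vectors, groups))  # network
```

Because the denominator of the idf formula carries the extra 1, a pattern that appears in every log file receives a slightly negative weight, and a pattern that appears in all but one file receives a weight of zero.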

Assignees

EMC Corp, EMC IP Holding Co LLC

Inventors

Classifications

  • in a multiprocessor or a multi-core unit (multiprocessors per se G06F15/80) · CPC title

  • G06F11/079 · Primary

    Root cause analysis, i.e. error or fault diagnosis (in a hardware test environment G06F11/22; in a software test environment G06F11/36) · CPC title

  • Sequencing of tasks or work · CPC title

  • Providing customer assistance, e.g. assisting a customer within a business location or via helpdesk · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9542255B2 cover?
The present disclosure relates to a method and apparatus for troubleshooting based on log similarity. In one embodiment, there is provided a method for troubleshooting based on log similarity, comprising: extracting log patterns from multiple log files in response to having collected the multiple log files from at least one system with troubles, the log pattern describing a regular expression t…
Who is the assignee on this patent?
EMC Corp, EMC IP Holding Co LLC
What technology area does this patent fall under?
Primary CPC classification G06F11/079. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 10, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).