Root cause identification of a problem in a distributed computing system using log files

US2021191798A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2021191798-A1
Application number: US-201916718707-A
Country: US
Kind code: A1
Filing date: Dec 18, 2019
Priority date: Dec 18, 2019
Publication date: Jun 24, 2021
Grant date:

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Automated methods and systems described herein are directed to determining a root cause of a problem with a system executing in a distributed computing system. Methods and systems train a normal-state model that characterizes a normal state of the system based on normal log files generated by event sources of the system executed under normal or test conditions. Methods and systems use the normal-state model and a log file containing log messages recorded about the time when a problem with the system has been detected to identify log messages that describe a root cause of the problem.
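The abstract's last step starts from log messages "recorded about the time when a problem with the system has been detected". One way to picture that selection is a simple time-window filter. The following is a minimal sketch under stated assumptions, not the method of the patent: the function name `select_problem_window`, the ISO-8601 timestamp prefix on each line, and the 5-minute window are all hypothetical.

```python
from datetime import datetime, timedelta


def select_problem_window(lines, detected_at, window=timedelta(minutes=5)):
    """Return log lines whose leading timestamp lies within `window` of `detected_at`.

    Assumes each line starts with an ISO-8601 timestamp followed by a space
    (a hypothetical format; real event sources vary).
    """
    selected = []
    for line in lines:
        stamp, _, _ = line.partition(" ")
        try:
            when = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # skip lines without a parseable timestamp
        if abs(when - detected_at) <= window:
            selected.append(line)
    return selected


# Illustrative data only; these log lines are invented for the example.
log = [
    "2019-12-18T10:00:00 INFO service started",
    "2019-12-18T10:58:30 WARN datastore latency high",
    "2019-12-18T11:00:10 ERROR connection to datastore lost",
    "2019-12-18T12:30:00 INFO heartbeat ok",
]
problem_lines = select_problem_window(log, datetime(2019, 12, 18, 11, 0, 0))
```

With the sample data, only the two messages logged within five minutes of 11:00:00 are kept. A real deployment would likely tune the window to how quickly the problem was detected.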

First claim

Opening claim text (preview).

1. A method stored in one or more data-storage devices and executed using one or more processors of a computer system for determining a root cause of a problem with execution of a tenant's system in a distributed computing system, the method comprising: determining a normal-state model based on relevant tokens recorded in log messages of normal log files associated with the tenant's system; determining relevant term frequencies of relevant tokens of problem-related log messages of a problem log file associated with the tenant's system; determining a message score for each problem-related log message of the problem log file based on the normal-state model and the relevant term frequencies; and rank ordering the problem-related log messages based on the message scores, wherein highest ranked problem-related log messages potentially describe the root cause of the problem with execution of the tenant's system.

2. The method of claim 1 wherein determining the normal-state model based on the relevant tokens recorded in the log messages of the normal log files comprises: extracting relevant tokens from log messages of each normal log file; for each relevant token computing a number of normal log files that contain the relevant token, and computing an inverse document frequency value of the relevant token based on the number of normal log files that contain the relevant token; and computing a normalized inverse document frequency value for each relevant token based on the inverse document frequency values.

3. The method of claim 1 wherein determining the relevant term frequencies of the relevant tokens of the problem-related log messages of the problem log file comprises: identifying problem-related log messages of the problem log file; extracting relevant tokens from the problem-related log messages of the problem log file; computing a frequency for each relevant token; determining a maximum frequency of the frequencies; and computing a relevant term frequency value for each relevant token.

4. The method of claim 1 wherein determining the message score for each problem-related log message of the problem log file comprises: assigning consecutive line numbers to each log message of the problem log file beginning with the log message having an oldest time stamp and ending with a most recent log message recorded in the problem log file; and for each problem-related log message of the problem log file aggregating relevant term frequency-inverse domain frequency values of relevant tokens, and computing a message score based on the aggregated relevant term frequency-inverse domain frequency values.

5. The method of claim 1 wherein determining the message score for each problem-related log message of the problem log file comprises: assigning consecutive line numbers to each log message of the problem log file beginning with the log message having an oldest time stamp and ending with a most recent log message recorded in the problem log file; determining a time stamp of a problem-related log message of the problem log file located closest to a suspected time when the problem occurred; and for each problem-related log message of the problem log file aggregating relevant term frequency-inverse domain frequency values of relevant tokens, computing a message weight based on the line number of the problem-related log message and the time stamp, and computing a message score based on the aggregated relevant term frequency-inverse domain frequency value.

6. The method of claim 1 further comprising displaying the problem-related log messages of the problem log file in a graphical-user interface with highest ranked problem-related log messages identified as describing a potential root cause of the problem.

7. A computer system determining a root cause of a problem with execution of a tenant's system in a distributed computing system, the system comprising: one or more processors; one or more data-storage devices; and machine-readable instructions stored in the one or more data-storage devices that when executed using the one or more processors controls the system to perform the operations comprising: determining a normal-state model based on relevant tokens recorded in log messages of normal log files associated with the tenant's system; determining relevant term frequencies of relevant tokens of problem-related log messages of a problem log file associated with the tenant's system; determining a message score for each problem-related log message of the problem log file based on the normal-state model and the relevant term frequencies; and rank ordering the problem-related log messages based on the message scores, wherein highest ranked problem-related log messages potentially describe the root cause of the problem with execution of the tenant's system.

8. The computer system of claim 7 wherein determining the normal-state model based on the relevant tokens recorded in the log messages of the normal log files comprises: extracting relevant tokens from log messages of each normal log file; for each relevant token computing a number of normal log files that contain the relevant token, and computing an inverse document frequency value of the relevant token based on the number of normal log files that contain the relevant token; and computing normalized inverse document frequency values for each relevant token based on the inverse document frequency values.

9. The computer system of claim 7 wherein determining the relevant term frequencies of the relevant tokens of the problem-related log messages of the problem log file comprises: identifying problem-related log messages of the problem log file; extracting relevant tokens from the problem-related log messages of the problem log file; computing a frequency for each relevant token; determining a maximum frequency of the frequencies; and computing a relevant term frequency value for each relevant token.

10. The computer system of claim 7 wherein determining the message score for each problem-related log message of the problem log file comprises: assigning consecutive line numbers to each log message of the problem log file beginning with the log message having an oldest time stamp and ending with a most recent log message recorded in the problem log file; and for each problem-related log message of the problem log file aggregating relevant term frequency-inverse domain frequency values of relevant tokens, and computing a message score based on the aggregated relevant term frequency-inverse domain frequency values.

11. The computer system of claim 7 wherein determining the message score for each problem-related log message of the problem log file comprises: assigning consecutive line numbers to each log message of the problem log file beginning with the log message having an oldest time stamp and ending with a most recent log message recorded in the problem log file; determining a time stamp of a problem-related log message of the problem log file located closest to a suspected time when the problem occurred; and for each problem-related log message of the problem log file aggregating relevant term frequency-inverse domain frequency values of relevant tokens, computing a message weight based on the line number of the problem-related log message and the time stamp, and computing a message score based on the aggregated relevant term frequency-inverse domain frequency value.

12. The computer system of claim 7 further comprising displaying the problem-related log messages of the problem log file in a graphical-user interface with highest ranked problem rela
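The scoring steps recited in claims 1 through 4 can be pictured with a short sketch. Everything below is an illustration under assumptions, not the patent's implementation: the claims do not give formulas, so the sketch assumes an inverse document frequency of log(N / n_t) normalized by its maximum, a relevant term frequency equal to a token's count divided by the maximum count, and a message score equal to the sum of their products. The tokenizer, the function names, and the choice to give tokens never seen in normal logs the maximum weight are also hypothetical. The time-based message weight of claims 5 and 11 is not modeled.

```python
import math
import re
from collections import Counter

# Hypothetical relevance filter: alphabetic tokens of three or more characters.
TOKEN_RE = re.compile(r"[A-Za-z_]{3,}")


def relevant_tokens(message):
    """Return the lowercase tokens of a message that pass the assumed filter."""
    return [t.lower() for t in TOKEN_RE.findall(message)]


def train_normal_state_model(normal_log_files):
    """Map each token to a normalized IDF over the normal log files (cf. claim 2).

    Assumes idf = log(N / n_t) divided by the largest idf value.
    """
    n_files = len(normal_log_files)
    doc_freq = Counter()
    for messages in normal_log_files:
        seen = set()
        for message in messages:
            seen.update(relevant_tokens(message))
        doc_freq.update(seen)  # count each token once per file
    idf = {t: math.log(n_files / n) for t, n in doc_freq.items()}
    max_idf = max(idf.values(), default=0.0) or 1.0  # avoid dividing by zero
    return {t: v / max_idf for t, v in idf.items()}


def relevant_term_frequencies(problem_messages):
    """Token count divided by the maximum token count (cf. claim 3)."""
    counts = Counter(t for m in problem_messages for t in relevant_tokens(m))
    max_count = max(counts.values(), default=1)
    return {t: c / max_count for t, c in counts.items()}


def rank_messages(problem_messages, model, unseen_idf=1.0):
    """Score messages by summed rtf * idf and sort highest first (cf. claims 1, 4)."""
    rtf = relevant_term_frequencies(problem_messages)
    scored = []
    for line_no, message in enumerate(problem_messages, start=1):
        tokens = set(relevant_tokens(message))
        score = sum(rtf[t] * model.get(t, unseen_idf) for t in tokens)
        scored.append((score, line_no, message))
    return sorted(scored, key=lambda item: item[0], reverse=True)


# Illustrative data only; these log files are invented for the example.
normal_logs = [
    ["INFO service started", "INFO heartbeat ok"],
    ["INFO service started", "INFO heartbeat ok", "INFO cache warmed"],
]
problem_log = [
    "INFO service started",
    "ERROR datastore connection refused",
    "INFO heartbeat ok",
]
model = train_normal_state_model(normal_logs)
ranked = rank_messages(problem_log, model)
```

In this toy run, tokens that occur in every normal log file get a weight of zero, so the routine messages score zero and the "ERROR datastore connection refused" line ranks first. Summing over tokens favors longer messages; averaging would be an equally plausible aggregation under the claim language.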

Assignees

VMware Inc

Inventors

Classifications

  • in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems · CPC title

  • G06F11/079 (Primary)

    Root cause analysis, i.e. error or fault diagnosis (in a hardware test environment G06F11/22; in a software test environment G06F11/36) · CPC title

  • Dumping, i.e. gathering error/state information after a fault for later diagnosis · CPC title

  • Data logging (G06F11/14, G06F11/2205 take precedence) · CPC title

  • Content or structure details of the error report, e.g. specific table structure, specific error fields · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2021191798A1 cover?
Automated methods and systems described herein are directed to determining a root cause of a problem with a system executing in a distributed computing system. Methods and systems train a normal-state model that characterizes a normal state of the system based on normal log files generated by event sources of the system executed under normal or test conditions. Methods and systems use the normal-state model…
Who is the assignee on this patent?
VMware Inc
What technology area does this patent fall under?
Primary CPC classification G06F11/0709. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 24, 2021 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).