Integrated statistical log data mining for mean time auto-resolution

US10528407B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10528407-B2
Application number  US-201715701468-A
Country             US
Kind code           B2
Filing date         Sep 12, 2017
Priority date       Jul 20, 2017
Publication date    Jan 7, 2020
Grant date          Jan 7, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method may include generating, by a diagnosis manager, a plurality of pre-processed files based on a plurality of log files containing operational information related to one or more of the plurality of modules operating in the cloud environment. The method may include generating a set of weightage matrices based on a plurality of tokens extracted from the plurality of pre-processed files, and identifying a plurality of clusters based on the set of weightage matrices. The method may further include determining, by a resolution manager coupled with the diagnosis manager, an operational issue for a specific module selected from the plurality of modules and associated with a specific cluster selected from the plurality of clusters, based on the subset of tokens associated with the specific cluster; and performing a predefined action on the specific module based on the operational issue.
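The pre-processing and weighting steps summarized in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the stop-word list, the suffix-stripping rule, the function names, and the standard TF-IDF formula (token frequency times the log of inverse document frequency) are all assumptions, since the text shown on this page names token-frequency and inverse-document-frequency but does not fix a formula.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list and suffix rules; the patent text shown here
# does not specify either.
STOP_WORDS = {"the", "a", "an", "to", "of", "in", "on", "for", "is", "was", "and"}
SUFFIXES = ("ing", "ed", "s")


def stem(word):
    """Crude suffix stripping, standing in for a real stemmer."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(log_text):
    """Turn raw log text into a token list (one 'pre-processed file')."""
    words = re.findall(r"[a-z]+", log_text.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]


def weightage_matrix(token_lists):
    """Return (vocabulary, rows); rows[i][j] is the assumed TF-IDF weight
    of vocabulary[j] in log file i."""
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))  # count each token once per log file
    vocabulary = sorted(doc_freq)
    rows = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty logs
        rows.append([
            (counts[t] / total) * math.log(n_docs / doc_freq[t])
            for t in vocabulary
        ])
    return vocabulary, rows


# Made-up log lines, for illustration only.
logs = [
    "Connection to the database timed out",
    "Database connection timed out again",
    "Disk is full on the storage node",
]
vocabulary, matrix = weightage_matrix([preprocess(text) for text in logs])
```

Under these assumptions, a token that appears in only one log file (such as "disk") receives a higher weight there than a token shared by several files (such as "database"), which is what lets the later clustering step separate distinct kinds of failures.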

First claim

Opening claim text (preview).

What is claimed is:

1. A method for automatically diagnosing and resolving operational issues in a cloud environment, the method comprising: collecting, by a diagnosis manager, a plurality of log files generated by a plurality of modules operating in the cloud environment, wherein each of the plurality of log files contains operational information related to one or more of the plurality of modules; generating, by the diagnosis manager, a set of weightage matrices based on a plurality of tokens extracted from the plurality of log files; generating, by the diagnosis manager, a plurality of nodes corresponding to the plurality of modules, wherein each of the plurality of nodes is associated with one or more tokens selected from the plurality of tokens; identifying, by the diagnosis manager, a plurality of clusters from the plurality of nodes based on the set of weightage matrices, wherein each of the plurality of clusters includes a subset of nodes selected from the plurality of nodes and is associated with a representative keyword including one or more tokens that represent contents of the subset of nodes; and determining, by a resolution manager coupled with the diagnosis manager, an operational issue for a specific module selected from the plurality of modules and associated with a specific cluster selected from the plurality of clusters, based on the corresponding representative keyword associated with the specific cluster.

2. The method as recited in the claim 1, wherein the method further comprises: performing, by the resolution manager, a predefined action on the specific module based on the operational issue.

3. The method as recited in the claim 1, wherein the generating of the set of weightage matrices comprises: for a log file selected from the plurality of log files, identifying a plurality of words in the log file; extracting one or more tokens from the plurality of words after removing stop-words from and performing stemming on the plurality of words; and including the one or more tokens in the plurality of tokens.

4. The method as recited in the claim 1, wherein the generating of the set of weightage matrices comprises: generating a corresponding token-frequency for each of the plurality of tokens; generating a corresponding inverse-document-frequency for each of the plurality of unique tokens; and generating a corresponding token-weightage for each of the plurality of tokens based on the corresponding token-frequency and the corresponding inverse-document-frequency.

5. The method as recited in the claim 4, wherein the generating of the set of weightage matrices further comprises: selecting a subset of tokens from the plurality of tokens based on their corresponding token-weightages; constructing the set of weightage matrices based on the subset of tokens, the corresponding frequency scores associated with the subset of tokens, and the plurality of log files that contain the subset of tokens.

6. The method as recited in the claim 1, wherein the generating of the plurality of nodes comprises: generating a specific node for the plurality of nodes based on the one or more tokens selected from the plurality of tokens and corresponding to one of the plurality of modules.

7. The method as recited in the claim 1, wherein the generating of the plurality of nodes comprises: when a first token associated with a first node selected from the plurality of nodes has a similarity-distance that is closer to a second node selected from the plurality of nodes, associating the first token from the first node to the second node.

8. The method as recited in the claim 1, wherein the identifying of the plurality of clusters from the plurality of nodes comprises: selecting an initial number of nodes from the plurality of nodes as a first set of cluster centroids associated with the plurality of clusters; for a first node selected from the plurality of nodes that are not in the first set of cluster centroids, categorizing the first node into one of the plurality of clusters by evaluating corresponding similarity-distances between the first node and the first set of cluster centroids.

9. The method as recited in the claim 8, further comprising: after the categorizing of the first node into one of the plurality of clusters, calculating a second set of cluster centroids associated with the plurality of clusters; and for a second node selected from the plurality of nodes that are not in the second set of cluster centroids, categorizing the second node into one of the plurality of clusters by evaluating corresponding similarity-distances between the second node and the second set of cluster centroids.

10. A non-transitory computer-readable storage medium, containing a set of instructions which, when executed by a processor, cause the processor to perform a method for automatically diagnosing and resolving operational issues in a cloud environment, the method comprising: generating, by a diagnosis manager, a plurality of pre-processed files based on a plurality of log files, wherein each of the plurality of log files contains operational information related to one or more of the plurality of modules operating in the cloud environment; generating, by the diagnosis manager, a set of weightage matrices based on a plurality of tokens extracted from the plurality of pre-processed files; identifying, by the diagnosis manager, a plurality of clusters by generating a plurality of nodes corresponding to the plurality of modules based on the set of weightage matrices, and identifying the plurality of clusters from the plurality of nodes based on the set of weightage matrices, wherein each of the plurality of clusters includes a subset of tokens selected from the plurality of tokens; determining, by a resolution manager coupled with the diagnosis manager, an operational issue for a specific module selected from the plurality of modules and associated with a specific cluster selected from the plurality of clusters, based on the subset of tokens associated with the specific cluster; and performing, by the resolution manager, a predefined action on the specific module based on the operational issue.

11. The non-transitory computer-readable storage medium of the claim 10, wherein the generating of the plurality of pre-processed files based on a plurality of log files comprises: identifying a plurality of words from a log file selected from the plurality of log files; extracting one or more tokens from the plurality of words after removing stop-words from and performing stemming on the plurality of words; and storing the one or more tokens in one of the plurality of pre-processed files associated with the log file.

12. The non-transitory computer-readable storage medium of the claim 10, wherein the generating of the set of weightage matrices based on a plurality of tokens comprises: generating a corresponding token-frequency for each of the plurality of tokens; generating a corresponding inverse-document-frequency for each of the plurality of unique tokens; and generating a corresponding token-weightage for each of the plurality of tokens based on the corresponding token-frequency and the corresponding inverse-document-frequency.

13. The non-transitory computer-readable storage medium of the claim 12, wherein the generating of the set of weightage matrices based on a plurality of tokens further comprises: constructing the set of weightage matrices based on the plurality of tokens, the corresponding frequency scores associated with the plurality of tokens, and the plurality of log files that contain the plurality of tokens.
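The centroid-based clustering of claims 8 and 9, followed by the keyword-driven resolution of claims 1 and 2, can be illustrated with a short sketch. Everything concrete here is an assumption made for illustration: Euclidean distance stands in for the "similarity-distance", the first k nodes serve as the initial centroids, the representative keyword is taken as the highest-weighted tokens of a cluster centroid, and the module names, weights, and the `ACTIONS` lookup table are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance, assumed here as the 'similarity-distance'."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cluster(nodes, k, max_rounds=10):
    """Assign each node vector to one of k clusters.

    Returns (labels, centroids). The first k nodes are the first set of
    centroids; centroids are recalculated after each assignment pass.
    """
    centroids = [list(v) for v in nodes[:k]]
    labels = None
    for _ in range(max_rounds):
        new_labels = [
            min(range(k), key=lambda c: distance(v, centroids[c]))
            for v in nodes
        ]
        if new_labels == labels:  # assignments stable, so centroids are too
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(nodes, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels, centroids


def representative_keyword(centroid, vocabulary, top_n=2):
    """Pick the top_n highest-weighted tokens of a cluster centroid."""
    ranked = sorted(range(len(vocabulary)), key=lambda j: centroid[j], reverse=True)
    return [vocabulary[j] for j in ranked[:top_n]]


# Hypothetical mapping from keyword tokens to predefined actions.
ACTIONS = {"timeout": "restart_service", "disk": "free_disk_space"}


def resolve(keyword_tokens):
    """Return the first matching predefined action, else escalate."""
    for token in keyword_tokens:
        if token in ACTIONS:
            return ACTIONS[token]
    return "escalate_to_operator"


# Made-up token weights per module (one node per module).
vocabulary = ["disk", "full", "timeout", "database"]
modules = ["db-proxy", "storage", "api", "backup"]
nodes = [
    [0.0, 0.0, 0.9, 0.5],
    [0.8, 0.7, 0.0, 0.0],
    [0.0, 0.0, 0.8, 0.6],
    [0.9, 0.6, 0.0, 0.1],
]
labels, centroids = cluster(nodes, k=2)
for module, label in zip(modules, labels):
    keyword = representative_keyword(centroids[label], vocabulary)
    print(module, keyword, resolve(keyword))
```

With this toy data the two timeout-heavy modules fall into one cluster and the two disk-heavy modules into the other, and each cluster's keyword selects a different predefined action. A production system would need a more careful choice of initial centroids and of the number of clusters, neither of which is specified in the claim text shown here.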

Assignees

Vmware Inc

Inventors

Classifications

  • Remedial or corrective actions (recovery from an exception in an instruction pipeline G06F9/3861; by retry G06F11/1402; for recovering from a failure of a protocol instance or entity H04L69/40)

  • Root cause analysis, i.e. error or fault diagnosis (in a hardware test environment G06F11/22; in a software test environment G06F11/36)

  • Using logs of notifications; post-processing of notifications

  • Dumping, i.e. gathering error/state information after a fault for later diagnosis

  • In a virtual computing platform, e.g. logically partitioned systems

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10528407B2 cover?
A method may include generating, by a diagnosis manager, a plurality of pre-processed files based on a plurality of log files containing operational information related to one or more of the plurality of modules operating in the cloud environment. The method may include generating a set of weightage matrices based on a plurality of tokens extracted from the plurality of pre-processed files, and…
Who is the assignee on this patent?
Vmware Inc
What technology area does this patent fall under?
Primary CPC classification G06F11/0712. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 7, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).