Analyzing log streams based on correlations between data structures of defined node types

US9779005B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9779005-B2
Application number: US-201414313075-A
Country: US
Kind code: B2
Filing date: Jun 24, 2014
Priority date: Jun 24, 2014
Publication date: Oct 3, 2017
Grant date: Oct 3, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method by a log stream analysis computer includes identifying records of log streams within a log repository containing a defined term. The log streams are generated by respective software sources executed by the host nodes. Similarity values are determined to indicate similarity between content of the records containing the defined term. A term node is generated to contain a data structure that identifies the defined term and lists identities of the records and corresponding ones of the similarity values. Related log stream analysis computers are disclosed.
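
The term-node step can be illustrated with a minimal Python sketch. The abstract does not say how the similarity values are computed; claim 6 names MinHash as one option and claim 7 adds a per-record occurrence count, so the sketch uses both. Everything else is an assumption made for illustration: the function names, the whitespace tokenizer, the signature length `NUM_HASHES`, the use of salted BLAKE2b as the hash family, and the dictionary layout of the node.

```python
from __future__ import annotations

import hashlib

NUM_HASHES = 64  # assumed signature length; the source does not specify one


def _token_hash(seed: int, token: str) -> int:
    """One member of an assumed hash family: BLAKE2b salted with the seed."""
    digest = hashlib.blake2b(
        token.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(text: str, num_hashes: int = NUM_HASHES) -> tuple[int, ...]:
    """For each seed, keep the smallest hash over the record's token set."""
    tokens = set(text.lower().split())
    return tuple(
        min((_token_hash(seed, t) for t in tokens), default=0)
        for seed in range(num_hashes)
    )


def estimated_similarity(sig_a: tuple[int, ...], sig_b: tuple[int, ...]) -> float:
    """Fraction of matching positions, which estimates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def build_term_node(term: str, records: dict[str, str]) -> dict:
    """Hypothetical term node: the term plus, per matching record identity,
    its MinHash signature and the number of occurrences of the term."""
    node = {"term": term, "records": {}}
    for record_id, text in records.items():
        occurrences = text.lower().split().count(term.lower())
        if occurrences:
            node["records"][record_id] = {
                "minhash": minhash_signature(text),
                "occurrences": occurrences,
            }
    return node
```

Storing signatures rather than pairwise scores keeps the node compact: any two listed records can later be compared with `estimated_similarity` without rereading the log repository.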

First claim

Opening claim text (preview).

The invention claimed is:

1. A method by a log stream analysis computer comprising: identifying a set of records within log streams within a log repository containing a defined term and which are all constrained to being within a time period, wherein the log streams are generated by respective software sources executed by the host nodes characterizing performance data captured by the respective software sources over the time period; determining similarity values that indicate similarity between content of the records containing the defined term in the time period; generating a term node comprising a data structure that identifies the defined term and lists identities of the records containing the defined term in the time period and corresponding ones of the similarity values in the time period; identifying hosts within the log streams; for each of a plurality of the hosts not corresponding to an existing host node, generating a host node comprising a data structure that identifies the host and lists an identity of a hardware configuration of the host, an identity of the software source of the log stream, and a defined type of the software source; and determining correlation between records of the log streams within the log repository based on content of the data structures of the host nodes.

2. The method of claim 1 further comprising: determining correlation between the records of the log streams within the log repository based on content of the data structure of the term node and a defined correlation rule.

3. The method of claim 1 further comprising: repeating for each of a plurality of defined terms, the identifying records, the determining similarity values, and the generating a term node, wherein different ones of the term nodes correspond to different ones of the defined terms; and determining correlation between content of records of log streams within the log repository based on content of the data structure of the term nodes and a defined correlation rule.

4. The method of claim 3, further comprising: providing information based on content of the data structures of a plurality of term nodes for display on a display device; receiving a selection of one of the term nodes displayed on the display device; and determining correlation between records of the log streams within the log repository based on comparison of content of the data structure of the selected one of the term nodes to content of the data structures of other term nodes.

5. The method of claim 4, further comprising: selecting another term node based on the comparison of content of the data structure of the selected one of the term nodes to content of the data structure of the other term node satisfying a defined correlation rule; and providing information based on content of the data structure of the other term node for display on the display device.

6. The method of claim 1 wherein: determining similarity values comprises calculating min-wise independent permutation locality sensitive hashing (MinHash) values of content of the records containing the defined term; and generating the term node comprises storing the MinHash values associated with corresponding ones of the records in the data structure of the term node.

7. The method of claim 6 further comprising: counting a number of occurrences of the defined term in each of the records containing the defined term; and generating the term node comprises storing the number of occurrences associated with corresponding ones of the records in the data structure of the term node.

8. The method of claim 1, further comprising based on the defined term not being present in a data dictionary containing lists of terms and corresponding term value identifiers, adding the defined term and a corresponding term value identifier to the data dictionary; and wherein the defined term is identified in the data structure by the corresponding term value identifier from the data dictionary.

9. The method of claim 1, further comprising: providing information based on content of the data structures of a plurality of host nodes for display on a display device; receiving a selection of one of the host nodes displayed on the display device; and determining correlation between records of the log streams within the log repository based on comparison of content of the data structure of the selected one of the host nodes to content of the data structures of other host nodes.

10. The method of claim 9, further comprising: selecting another host node based on the comparison of content of the data structure of the selected one of the host nodes to content of the data structure of the other host node satisfying a defined correlation rule; and providing information based on content of the data structure of the other host node for display on the display device.

11. A method by a log stream analysis computer comprising: identifying a set of records within log streams within a log repository containing a defined term and which are all constrained to being within a time period, wherein the log streams are generated by respective software sources executed by the host nodes characterizing performance data captured by the respective software sources over the time period; determining similarity values that indicate similarity between content of the records containing the defined term in the time period; generating a term node comprising a data structure that identifies the defined term and lists identities of the records containing the defined term in the time period and corresponding ones of the similarity values in the time period; identifying software sources within the log streams; for each of a plurality of the software sources having a defined type not corresponding to an existing source type node, generating a source type node containing a data structure that identifies the defined type of the software source and lists identifiers of records of one of the log streams generated by the software source, identifies the software source, and identifies one of the host nodes executing the software source; and determining correlation between records of the log streams within the log repository based on content of the data structures of the source type nodes.

12. The method of claim 11, further comprising: providing information based on content of the data structures of a plurality of source type nodes for display on a display device; receiving a selection of one of the source type nodes displayed on the display device; and determining correlation between records of the log streams within the log repository based on comparison of content of the data structure of the selected one of the source type nodes to content of the data structures of other source type nodes.

13. The method of claim 12, further comprising: selecting another source type node based on the comparison of content of the data structure of the selected one of the source type nodes to content of the data structure of the other source type node satisfying a defined correlation rule; and providing information based on content of the data structure of the other source type node for display on the display device.

14. A log stream analysis computer comprising: a processor; and a memory coupled to the processor, the memory comprising a non-transitory computer readable storage medium having computer readable program code embodied in the medium that when executed by the processor causes the processor to perform operations comprising: identifying a set of records within log streams within a log repository containing a defined term and which
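
The host-node step of claim 1 can be sketched in Python as follows. The claim says what a host node lists (the host, its hardware configuration, the software source, and the source type) but does not define the correlation itself, so the rule used here is purely an assumption for illustration: two hosts are treated as correlated when they share a hardware configuration. The class name, field names, input dictionary keys, and function names are likewise hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class HostNode:
    host: str
    hardware_config: str  # identity of the host's hardware configuration
    software_source: str  # identity of the software source of the log stream
    source_type: str      # defined type of the software source


def build_host_nodes(
    streams: list[dict], existing: dict[str, HostNode] | None = None
) -> dict[str, HostNode]:
    """Create a node only for hosts that have no existing host node."""
    nodes = dict(existing or {})
    for stream in streams:
        if stream["host"] not in nodes:
            nodes[stream["host"]] = HostNode(
                host=stream["host"],
                hardware_config=stream["hardware_config"],
                software_source=stream["software_source"],
                source_type=stream["source_type"],
            )
    return nodes


def correlated_hosts(nodes: dict[str, HostNode]) -> list[tuple[str, str]]:
    """Assumed correlation rule: pair hosts with equal hardware_config."""
    return [
        (a, b)
        for a, b in combinations(sorted(nodes), 2)
        if nodes[a].hardware_config == nodes[b].hardware_config
    ]
```

A caller would pass one dictionary per log stream and then use the returned pairs to decide which hosts' records to compare. A different rule, such as matching on `source_type`, would only change the comparison in `correlated_hosts`.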

Assignees

Inventors

Classifications

  • Data logging (G06F11/14, G06F11/2205 take precedence) · CPC title

  • Physics · mapped topic

  • G06F11/321 · Primary

    Display for diagnostics, e.g. diagnostic result display, self-test user interface · CPC title

  • Event-based monitoring · CPC title

  • Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs (verification or detection of system hardware configuration G06F11/2247) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9779005B2 cover?
A method by a log stream analysis computer includes identifying records of log streams within a log repository containing a defined term. The log streams are generated by respective software sources executed by the host nodes. Similarity values are determined to indicate similarity between content of the records containing the defined term. A term node is generated to contain a data structure t…
Who is the assignee on this patent?
Ca Inc
What technology area does this patent fall under?
Primary CPC classification G06F11/321. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 3, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).