Batch processed data structures in a log repository referencing a template repository and an attribute repository

US10049171B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10049171-B2
Application number: US-201414482400-A
Country: US
Kind code: B2
Filing date: Sep 10, 2014
Priority date: Sep 10, 2014
Publication date: Aug 14, 2018
Grant date: Aug 14, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method by a computer includes, for each of a plurality of log records received as part of a log stream from a host machine node, identifying a template identifier within a template repository for a template string matching an invariant string of the log record, and identifying an attribute identifier in an attribute repository for an attribute string matching a variant string of the log record. The log records are partitioned into batches. Each of the batches are defined by a data structure that includes the template identifier and the attribute identifier for each of the log records within the batch. The data structures for each of the batches are stored into a log repository.
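
The encoding and batching steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Repository` and `Batch` classes, the `encode` and `partition` functions, the rule that numbers form the variant part of a record, and the fixed 60-second window are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field

# Assumption: numeric tokens are the variant part of a log record.
VARIANT = re.compile(r"\d+(?:\.\d+)*")


class Repository:
    """Maps strings to integer identifiers, assigning a new one on first sight."""

    def __init__(self):
        self.ids = {}
        self.strings = []

    def identify(self, s):
        if s not in self.ids:
            self.ids[s] = len(self.strings)
            self.strings.append(s)
        return self.ids[s]


@dataclass
class Batch:
    batch_id: int
    template_ids: list = field(default_factory=list)
    attribute_ids: list = field(default_factory=list)
    timestamps: list = field(default_factory=list)


def encode(record, templates, attributes):
    """Split a record into invariant and variant strings and look up their ids."""
    variant = " ".join(VARIANT.findall(record))
    invariant = VARIANT.sub("{}", record)
    return templates.identify(invariant), attributes.identify(variant)


def partition(records, templates, attributes, period=60):
    """Group (timestamp, text) records into batches by fixed time window."""
    batches = {}
    for ts, text in records:
        key = ts // period  # hypothetical batch identifier: the window index
        batch = batches.setdefault(key, Batch(batch_id=key))
        t_id, a_id = encode(text, templates, attributes)
        batch.template_ids.append(t_id)
        batch.attribute_ids.append(a_id)
        batch.timestamps.append(ts)
    return list(batches.values())
```

In this sketch two records such as "user 42 logged in" and "user 7 logged in" share one template identifier and differ only in their attribute identifiers, which is what lets a batch hold short identifier lists instead of full log text.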

First claim

Opening claim text (preview).

The invention claimed is:

1. A method by a computer comprising: for each of a plurality of log records received from a host machine node as part of a log stream being output from a software source via the host machine node: identifying a template identifier within a template repository for a template string matching an invariant string of the log record; and identifying an attribute identifier in an attribute repository for an attribute string matching a variant string of the log record; partitioning the log records into a plurality of batches, each of the plurality of batches comprising a plurality of log records, each of the plurality of batches defined by a data structure comprising: a batched log record identifier that uniquely identifies the batch relative to the other batches; the template identifier for each of the plurality of log records within the batch; the attribute identifier for each of the plurality of log records within the batch; and a list of timestamps of the plurality of log records within the batch; and for each batch of the plurality of batches, storing the data structure for the batch into a log repository in a computer readable medium, the storing comprising: for each batch of the plurality of batches, performing data compression on the data structure for the batch to generate a compressed data structure; and for each batch of the plurality of batches, separately communicating, through a data network to a data server, an instruction to write the compressed data structure for the batch into memory of a log repository.

2. The method of claim 1, wherein the identifying a template identifier within the template repository for a template string matching an invariant string of the log record, comprises: parsing content of the log record to generate strings; comparing the strings to template strings within the template repository; identifying one of the strings of the log record as the invariant string based on a match between the one of the strings and the template string; and identifying the attribute identifier associated with the template string.

3. The method of claim 1, wherein the identifying a template identifier within the template repository for a template string matching an invariant string of the log record, comprises: parsing content of the log records to generate strings; comparing the strings of the log records to template strings within the template repository; identifying one of the strings of selected ones of the log records as the invariant string of the selected ones of the log records based on at least a threshold number of matches occurring between the one of the strings of the selected ones of the log records to a same one of the template strings within the template repository; and identifying the attribute identifier associated with the one of the template strings.

4. The method of claim 1, wherein the identifying a template identifier within the template repository for a template string matching an invariant string of the log record, comprises: parsing content of a sequence of the log record to generate strings; comparing the strings to template strings within the template repository that are ordered in a defined sequence corresponding to an output sequence from a software source on the host machine node; identifying one of the strings of the log record as the invariant string based on a match between the one of the strings and one of the template strings and further based on a previous match identified between one of the strings of a previous one of the log records and a previous one of the template strings in the defined sequence; and identifying the attribute identifier associated with the one of the template strings.

5. The method of claim 1, further comprising: generating a new template identifier for the invariant string of the log record based on identifying that no template string in the template repository matches the invariant string of the log record; and storing the new template identifier and the invariant string of the log record in the template repository with a logical association between the new template identifier and the invariant string of the log record.

6. The method of claim 1, further comprising: generating a new attribute identifier for the variant string of the log record based on identifying that no attribute string in the attribute repository matches the variant string of the log record; and storing the new attribute identifier associated with the variant string of the log record in the attribute repository with a logical association between the new attribute identifier and the variant string of the log record.

7. The method of claim 1, wherein: for each of the plurality of batches, the log records within the batch have timestamps within a defined time period.

8. The method of claim 1, further comprising: receiving a search query defining a search term and a time period to be searched; determining a range of log identifiers to search based on the time period; selecting among the plurality of batches of the log records based on the range of log identifiers; retrieving at least one compressed data structure corresponding to at least one batch of the plurality of batches of the log records from the log repository based on the selecting; for each batch of the at least one batch of the plurality of batches retrieved from the log repository, performing decompression on the compressed data structure; for each of the plurality of log records of the batch, identifying the template identifier and the attribute identifier of the log record; retrieving the template string corresponding to the template identifier from the template repository; retrieving the attribute string corresponding to the attribute identifier from the attribute repository; and generating the log record based on the template string and the attribute string; searching for the search term defined by the search query among the log records; and returning the log records containing the search term as a response to the search query.

9. The method of claim 1, further comprising: decompressing the plurality of batches of the log records retrieved from the log repository before the identifying the template identifier and the attribute identifier of the log record.

10. The method of claim 1, wherein each log record corresponds to a routine, wherein identifying a template identifier within a template repository for a template string matching an invariant string of the log record comprises matching the invariant string with the invariant string of at least one other log record corresponding to the routine; and wherein identifying an attribute identifier in an attribute repository for an attribute string matching a variant string of the log record comprises identifying an attribute identifier in an attribute repository for an attribute string matching a variant string of the log record that is different from the variant string of at least one other corresponding log record.

11. A computer program product comprising: a non-transitory computer readable storage medium having computer readable program code embodied therewith, the computer readable program code comprising: computer readable program code to, for each of a plurality of log records received from a host machine node as part of a log stream from a software source via the host machine node, perform: identifying a template identifier within a template repository for a template string matching an invariant string of the log record; and identifying an attribute identifier in an attribute repository for an attribute string matching a variant string
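
The storing step of claim 1 and the search flow of claim 8 can be sketched roughly as follows. This is an illustration under stated assumptions, not the claimed implementation: the log repository is a plain dictionary standing in for the networked data server, `zlib` and JSON stand in for whatever compression and serialization a real system would use, batches are keyed by a fixed time-window index, and the function names `store_batch` and `search` are hypothetical. Template strings are assumed to use `{}` placeholders and to contain no other braces.

```python
import json
import zlib


def store_batch(log_repo, batch):
    """Compress one batch's data structure and write it under its batch id."""
    payload = json.dumps(batch).encode("utf-8")
    log_repo[batch["batch_id"]] = zlib.compress(payload)


def search(log_repo, templates, attributes, term, start, end, period=60):
    """Return (timestamp, text) for records in [start, end] that contain term.

    templates:  list of template strings, indexed by template identifier
    attributes: list of value lists, indexed by attribute identifier
    """
    hits = []
    # Assumption: batch ids are time-window indexes, so the queried time
    # period maps directly onto a range of batch ids to retrieve.
    for batch_id in range(start // period, end // period + 1):
        blob = log_repo.get(batch_id)
        if blob is None:
            continue
        batch = json.loads(zlib.decompress(blob).decode("utf-8"))
        rows = zip(batch["timestamps"], batch["template_ids"], batch["attribute_ids"])
        for ts, t_id, a_id in rows:
            if not start <= ts <= end:
                continue
            # Regenerate the log record from its template and attribute strings.
            text = templates[t_id].format(*attributes[a_id])
            if term in text:
                hits.append((ts, text))
    return hits
```

Only the batches whose window overlaps the queried time period are decompressed, so the cost of a search scales with the time range rather than with the size of the whole repository.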

Assignees

Ca Inc

Inventors

Classifications

  • G06F17/40 · Primary

    Data acquisition and logging (for input to computer G06F3/00) · CPC title

  • Indexing scheme relating to error detection, to error correction, and to monitoring · CPC title

  • Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation {; Recording or statistical evaluation of user activity, e.g. usability assessment} · CPC title

  • by using string matching techniques · CPC title

  • Physics · mapped topic

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10049171B2 cover?
A method by a computer includes, for each of a plurality of log records received as part of a log stream from a host machine node, identifying a template identifier within a template repository for a template string matching an invariant string of the log record, and identifying an attribute identifier in an attribute repository for an attribute string matching a variant string of the log recor…
Who is the assignee on this patent?
Ca Inc
What technology area does this patent fall under?
Primary CPC classification G06F17/40. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 14, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).