Method and system for compressing logs
US-9619478-B1 · Apr 11, 2017 · US
US10049171B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10049171-B2 |
| Application number | US-201414482400-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 10, 2014 |
| Priority date | Sep 10, 2014 |
| Publication date | Aug 14, 2018 |
| Grant date | Aug 14, 2018 |
Official abstract text for this publication.
A method by a computer includes, for each of a plurality of log records received as part of a log stream from a host machine node, identifying a template identifier within a template repository for a template string matching an invariant string of the log record, and identifying an attribute identifier in an attribute repository for an attribute string matching a variant string of the log record. The log records are partitioned into batches. Each of the batches is defined by a data structure that includes the template identifier and the attribute identifier for each of the log records within the batch. The data structures for each of the batches are stored into a log repository.
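The encoding described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `Repository`, `Batch`, `split_record`, and `encode` are hypothetical, as are the fixed batch size and the rule that any token containing a digit counts as the variant part of a record; the abstract does not say how invariant and variant strings are told apart.

```python
import re
from dataclasses import dataclass, field

# Assumption: tokens containing a digit form the variant part of a record.
VARIANT = re.compile(r"\S*\d\S*")


class Repository:
    """Maps strings to small integer identifiers and back."""

    def __init__(self):
        self._ids = {}
        self._strings = []

    def identify(self, s):
        # Reuse the identifier of a known string, otherwise assign a new one.
        if s not in self._ids:
            self._ids[s] = len(self._strings)
            self._strings.append(s)
        return self._ids[s]

    def lookup(self, ident):
        return self._strings[ident]


@dataclass
class Batch:
    batch_id: int
    template_ids: list = field(default_factory=list)
    attribute_ids: list = field(default_factory=list)


def split_record(record):
    """Return (invariant template string, variant attribute string)."""
    variants = VARIANT.findall(record)
    template = VARIANT.sub("{}", record)
    # Join all variant tokens so each record maps to one attribute string.
    return template, "\x1f".join(variants)


def encode(records, templates, attributes, batch_size=2):
    """Replace each record by two identifiers and group them into batches."""
    batches = []
    for i, record in enumerate(records):
        if i % batch_size == 0:
            batches.append(Batch(batch_id=len(batches)))
        template, variant = split_record(record)
        batches[-1].template_ids.append(templates.identify(template))
        batches[-1].attribute_ids.append(attributes.identify(variant))
    return batches
```

Records that share a message shape, such as "user 17 logged in" and "user 42 logged in", receive the same template identifier and differ only in their attribute identifier, which is where the space saving in this sketch comes from.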
Opening claim text (preview).
The invention claimed is:
1. A method by a computer comprising: for each of a plurality of log records received from a host machine node as part of a log stream being output from a software source via the host machine node: identifying a template identifier within a template repository for a template string matching an invariant string of the log record; and identifying an attribute identifier in an attribute repository for an attribute string matching a variant string of the log record; partitioning the log records into a plurality of batches, each of the plurality of batches comprising a plurality of log records, each of the plurality of batches defined by a data structure comprising: a batched log record identifier that uniquely identifies the batch relative to the other batches; the template identifier for each of the plurality of log records within the batch; the attribute identifier for each of the plurality of log records within the batch; and a list of timestamps of the plurality of log records within the batch; and for each batch of the plurality of batches, storing the data structure for the batch into a log repository in a computer readable medium, the storing comprising: for each batch of the plurality of batches, performing data compression on the data structure for the batch to generate a compressed data structure; and for each batch of the plurality of batches, separately communicating, through a data network to a data server, an instruction to write the compressed data structure for the batch into memory of a log repository.
2. The method of claim 1, wherein the identifying a template identifier within the template repository for a template string matching an invariant string of the log record, comprises: parsing content of the log record to generate strings; comparing the strings to template strings within the template repository; identifying one of the strings of the log record as the invariant string based on a match between the one of the strings and the template string; and identifying the attribute identifier associated with the template string.
3. The method of claim 1, wherein the identifying a template identifier within the template repository for a template string matching an invariant string of the log record, comprises: parsing content of the log records to generate strings; comparing the strings of the log records to template strings within the template repository; identifying one of the strings of selected ones of the log records as the invariant string of the selected ones of the log records based on at least a threshold number of matches occurring between the one of the strings of the selected ones of the log records to a same one of the template strings within the template repository; and identifying the attribute identifier associated with the one of the template strings.
4. The method of claim 1, wherein the identifying a template identifier within the template repository for a template string matching an invariant string of the log record, comprises: parsing content of a sequence of the log record to generate strings; comparing the strings to template strings within the template repository that are ordered in a defined sequence corresponding to an output sequence from a software source on the host machine node; identifying one of the strings of the log record as the invariant string based on a match between the one of the strings and one of the template strings and further based on a previous match identified between one of the strings of a previous one of the log records and a previous one of the template strings in the defined sequence; and identifying the attribute identifier associated with the one of the template strings.
5. The method of claim 1, further comprising: generating a new template identifier for the invariant string of the log record based on identifying that no template string in the template repository matches the invariant string of the log record; and storing the new template identifier and the invariant string of the log record in the template repository with a logical association between the new template identifier and the invariant string of the log record.
6. The method of claim 1, further comprising: generating a new attribute identifier for the variant string of the log record based on identifying that no attribute string in the attribute repository matches the variant string of the log record; and storing the new attribute identifier associated with the variant string of the log record in the attribute repository with a logical association between the new attribute identifier and the variant string of the log record.
7. The method of claim 1, wherein: for each of the plurality of batches, the log records within the batch have timestamps within a defined time period.
8. The method of claim 1, further comprising: receiving a search query defining a search term and a time period to be searched; determining a range of log identifiers to search based on the time period; selecting among the plurality of batches of the log records based on the range of log identifiers; retrieving at least one compressed data structure corresponding to at least one batch of the plurality of batches of the log records from the log repository based on the selecting; for each batch of the at least one batch of the plurality of batches retrieved from the log repository, performing decompression on the compressed data structure; for each of the plurality of log records of the batch, identifying the template identifier and the attribute identifier of the log record; retrieving the template string corresponding to the template identifier from the template repository; retrieving the attribute string corresponding to the attribute identifier from the attribute repository; and generating the log record based on the template string and the attribute string; searching for the search term defined by the search query among the log records; and returning the log records containing the search term as a response to the search query.
9. The method of claim 1, further comprising: decompressing the plurality of batches of the log records retrieved from the log repository before the identifying the template identifier and the attribute identifier of the log record.
10. The method of claim 1, wherein each log record corresponds to a routine, wherein identifying a template identifier within a template repository for a template string matching an invariant string of the log record comprises matching the invariant string with the invariant string of at least one other log record corresponding to the routine; and wherein identifying an attribute identifier in an attribute repository for an attribute string matching a variant string of the log record comprises identifying an attribute identifier in an attribute repository for an attribute string matching a variant string of the log record that is different from the variant string of at least one other corresponding log record.
11. A computer program product comprising: a non-transitory computer readable storage medium having computer readable program code embodied therewith, the computer readable program code comprising: computer readable program code to, for each of a plurality of log records received from a host machine node as part of a log stream from a software source via the host machine node, perform: identifying a template identifier within a template repository for a template string matching an invariant string of the log record; and identifying an attribute identifier in an attribute repository for an attribute string matching a variant string
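The storing step of claim 1 and the search steps of claim 8 can be sketched together in Python under several simplifying assumptions. The claims do not name a compression algorithm or serialization format, so `zlib` and JSON stand in here. A plain dictionary stands in for the log repository on a data server, two small lists stand in for the template and attribute repositories, and the search scans every stored batch and filters by timestamp instead of first selecting batches by a range of log identifiers. The names `store_batch`, `rebuild`, and `search` are hypothetical.

```python
import json
import zlib

# Hypothetical repositories: the list index serves as the identifier.
TEMPLATES = ["user {} logged in", "disk {} full"]
ATTRIBUTES = ["17", "42", "3"]


def store_batch(log_repository, batch):
    """Compress one batch data structure and write it under its batch id."""
    payload = json.dumps(batch).encode("utf-8")
    log_repository[batch["batch_id"]] = zlib.compress(payload)


def rebuild(template_id, attribute_id):
    """Regenerate a log record from its template and attribute strings."""
    return TEMPLATES[template_id].format(ATTRIBUTES[attribute_id])


def search(log_repository, term, start, end):
    """Return (timestamp, record) pairs in [start, end] that contain term."""
    hits = []
    for blob in log_repository.values():
        # Simplification: every batch is decompressed, none is skipped.
        batch = json.loads(zlib.decompress(blob))
        rows = zip(batch["timestamps"],
                   batch["template_ids"],
                   batch["attribute_ids"])
        for ts, t_id, a_id in rows:
            if start <= ts <= end:
                record = rebuild(t_id, a_id)
                if term in record:
                    hits.append((ts, record))
    return hits
```

Each batch is compressed and written on its own, so in this sketch a query only has to decompress whole batches, never a single monolithic log file. A closer reading of claim 8 would add an index from time periods to batch identifiers so that batches outside the queried period are never retrieved.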
Data acquisition and logging (for input to computer G06F3/00) · CPC title
Indexing scheme relating to error detection, to error correction, and to monitoring · CPC title
Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation {; Recording or statistical evaluation of user activity, e.g. usability assessment} · CPC title
by using string matching techniques · CPC title
Physics · mapped topic