Categorizing log records at run-time

US10839308B2 · US · B2

Patent metadata
Field · Value
Publication number · US-10839308-B2
Application number · US-201514980963-A
Country · US
Kind code · B2
Filing date · Dec 28, 2015
Priority date · Dec 28, 2015
Publication date · Nov 17, 2020
Grant date · Nov 17, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and computer program products for categorizing log records at run-time are provided herein. A computer-implemented method includes generating one or more template signatures to be associated with each of multiple templates, wherein each of the multiple templates comprises a concatenation of one or more words; processing each of multiple log records derived from a data stream to determine a composition of each of the multiple log records; matching one or more of the generated template signatures to each of the multiple log records based on the determined composition of each of the multiple log records; and outputting an identification of (i) each of the multiple log records and (ii) the one or more generated template signatures matched thereto.
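The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `<*>` parameter placeholder, whitespace tokenization, the signature layout (token count plus word-at-position pairs), and the names `make_signature`, `build_index`, and `match` are all assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

PARAM = "<*>"  # hypothetical placeholder marking a variable field in a template

# A signature here is (token count, ((position, fixed word), ...)).
Signature = Tuple[int, Tuple[Tuple[int, str], ...]]


def make_signature(template: str) -> Signature:
    """Record the token count and which fixed word sits at which position."""
    tokens = template.split()
    fixed = tuple((i, tok) for i, tok in enumerate(tokens) if tok != PARAM)
    return (len(tokens), fixed)


def build_index(templates: Dict[int, str]) -> Dict[Signature, int]:
    """Map each signature to its template ID, rejecting duplicate signatures."""
    index: Dict[Signature, int] = {}
    for template_id, template in templates.items():
        sig = make_signature(template)
        if sig in index:
            raise ValueError(f"signature collision for template {template_id}")
        index[sig] = template_id
    return index


def match(record: str, index: Dict[Signature, int]) -> Optional[int]:
    """Return the ID of the first template whose signature fits the record."""
    tokens = record.split()
    for (length, fixed), template_id in index.items():
        if len(tokens) == length and all(tokens[i] == w for i, w in fixed):
            return template_id
    return None


# Illustrative templates; these are made up, not taken from the patent.
TEMPLATES = {
    1: "Connection from <*> closed",
    2: "Failed login for user <*>",
}
INDEX = build_index(TEMPLATES)
```

Calling `match("Connection from 10.0.0.7 closed", INDEX)` returns the template ID for the first template, and a record that fits no signature returns `None`. A linear scan over signatures is used only to keep the sketch short.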

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method, comprising: generating, based on log data records associated with at least one given source, one or more template signatures to be associated with each of multiple templates, wherein each of the multiple templates comprises a concatenation of one or more words and one or more parameters, and wherein said generating the one or more template signatures comprises (i) identifying which particular word appears at which particular location of each of the multiple templates and (ii) ensuring that the one or more template signatures to be associated with each respective one of the multiple templates are unique; processing each of multiple additional log records, from the same at least one given source, derived from a data stream to determine a composition of each of the multiple additional log records; matching one or more of the generated template signatures to each of the multiple additional log records based on the determined composition of each of the multiple additional log records; assigning unique template identifier values to the multiple additional log records, each unique template identifier value corresponding to one of the multiple templates, wherein the unique template identifier values categorize said multiple additional log records at ingestion time; determining a frequency with which the one or more generated template signatures are matched to at least one of the multiple additional log records; and outputting an identification of (i) each of the multiple additional log records, (ii) the one or more generated template signatures matched thereto, and (iii) the determined frequency for each of the one or more generated template signatures; wherein the method is carried out by at least one computing device.

2. The computer-implemented method of claim 1, wherein each of the one or more template signatures comprises a predetermined length.

3. The computer-implemented method of claim 1, wherein said multiple templates are derived from a database.

4. The computer-implemented method of claim 1, wherein each of the multiple templates comprises a concatenation of one or more words that pertain to system and/or application records.

5. The computer-implemented method of claim 1, wherein the data stream is obtained via a data center.

6. The computer-implemented method of claim 1, wherein the composition of each of the multiple additional log records comprises an arrangement of one or more words.

7. The computer-implemented method of claim 1, wherein said outputting comprises outputting the identification to a database.

8. The computer-implemented method of claim 1, wherein said outputting comprises outputting the identification to a user.

9. The computer-implemented method of claim 1, wherein said determining the frequency comprises learning a workload distribution associated with the data stream.

10. The computer-implemented method of claim 1, wherein said determining the frequency further comprises maintaining a cache comprising each of one or more of the generated template signatures having a determined frequency above a predetermined threshold.

11. A computer program product comprising a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions executable by a device to cause the device to: generate, based on log data records associated with at least one given source, one or more template signatures to be associated with each of multiple templates, wherein each of the multiple templates comprises a concatenation of one or more words and one or more parameters, and wherein said generating the one or more template signatures comprises (i) identifying which particular word appears at which particular location of each of the multiple templates and (ii) ensuring that the one or more template signatures to be associated with each respective one of the multiple templates are unique; process each of multiple additional log records, from the same at least one given source, derived from a data stream to determine a composition of each of the multiple additional log records; match one or more of the generated template signatures to each of the multiple additional log records based on the determined composition of each of the additional multiple log records; assign unique template identifier values to the multiple additional log records, each unique template identifier value corresponding to one of the multiple templates, wherein the unique template identifier values categorize said multiple additional log records at ingestion time; determine a frequency with which the one or more generated template signatures are matched to at least one of the multiple additional log records; and output an identification of (i) each of the multiple additional log records, (ii) the one or more generated template signatures matched thereto, and (iii) the determined frequency for each of the one or more generated template signatures.

12. The computer program product of claim 11, wherein each of the one or more template signatures comprises a predetermined length.

13. The computer program product of claim 11, wherein each of the multiple templates comprises a concatenation of one or more words that pertain to system and/or application records.

14. The computer program product of claim 11, wherein said outputting comprises outputting to a database and/or a user.

15. The computer program product of claim 11, wherein said determining the frequency comprises learning a workload distribution associated with the data stream.

16. The computer program product of claim 11, wherein said determining the frequency further comprises maintaining a cache comprising each of one or more of the generated template signatures having a determined frequency above a predetermined threshold.

17. A system comprising: a memory; and at least one processor coupled to the memory and configured for: generating, based on log data records associated with at least one given source, one or more template signatures to be associated with each of multiple templates, wherein each of the multiple templates comprises a concatenation of one or more words and one or more parameters, and wherein said generating the one or more template signatures comprises (i) identifying which particular word appears at which particular location of each of the multiple templates and (ii) ensuring that the one or more template signatures to be associated with each respective one of the multiple templates are unique; processing each of multiple additional log records, from the same at least one given source, derived from a data stream to determine a composition of each of the multiple additional log records; matching one or more of the generated template signatures to each of the multiple additional log records based on the determined composition of each of the multiple additional log records; assigning unique template identifier values to the multiple additional log records, each unique template identifier value corresponding to one of the multiple templates, wherein the unique template identifier values categorize said multiple additional log records at ingestion time; determining a frequency with which the one or more generated template signatures are matched to at least one of the multiple additional log records; and outputting an identification of (i) each of the multiple additional log records, (ii) the one or more generated template signatures matched thereto, and (iii) the determined frequency for each of the one or more
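
The frequency and cache language in the claims (a cache of signatures whose determined frequency exceeds a predetermined threshold) can be sketched as follows. This is an illustrative reading, not the claimed implementation: the class name `FrequencyTracker`, the use of integer template IDs as stand-ins for signatures, and the interpretation of "frequency" as a fraction of all matches seen so far are assumptions.

```python
from collections import Counter
from typing import Set


class FrequencyTracker:
    """Count matches per template and expose the frequently matched ones."""

    def __init__(self, threshold: float) -> None:
        # Assumed meaning: a template is "hot" when its share of all
        # recorded matches is strictly above this fraction.
        self.threshold = threshold
        self.counts: Counter = Counter()
        self.total = 0

    def record(self, template_id: int) -> None:
        """Register one match of the given template."""
        self.counts[template_id] += 1
        self.total += 1

    def frequency(self, template_id: int) -> float:
        """Share of all recorded matches that went to this template."""
        return self.counts[template_id] / self.total if self.total else 0.0

    def hot_set(self) -> Set[int]:
        """Template IDs whose frequency is above the threshold."""
        return {t for t in self.counts if self.frequency(t) > self.threshold}


tracker = FrequencyTracker(threshold=0.5)
for matched_id in [1, 1, 1, 2]:  # made-up stream of matched template IDs
    tracker.record(matched_id)
```

A matcher could consult `tracker.hot_set()` first and fall back to the full signature set only on a miss, which is one plausible way to exploit a skewed workload distribution.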

Assignees

IBM

Inventors

Classifications

  • using natural language analysis · CPC title

  • G06N20/00 · Primary

    Machine learning · CPC title

  • Querying · CPC title

  • Data stream processing; Continuous queries · CPC title

  • Creation or modification of classes or clusters · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10839308B2 cover?
Methods, systems, and computer program products for categorizing log records at run-time are provided herein. A computer-implemented method includes generating one or more template signatures to be associated with each of multiple templates, wherein each of the multiple templates comprises a concatenation of one or more words; processing each of multiple log records derived from a data stream t…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/3344. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 17, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).