Recommender system for heterogeneous log pattern editing operation

US10929763B2 · US · B2

Patent metadata
Field | Value
Publication number | US-10929763-B2
Application number | US-201715684293-A
Country | US
Kind code | B2
Filing date | Aug 23, 2017
Priority date | Aug 26, 2016
Publication date | Feb 23, 2021
Grant date | Feb 23, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A heterogeneous log pattern editing recommendation system and computer-implemented method are provided. The system has a processor configured to identify, from heterogeneous logs, patterns including variable fields and constant fields. The processor is also configured to extract a category feature, a cardinality feature, and a before-after n-gram feature by tokenizing the variable fields in the identified patterns. The processor is additionally configured to generate target similarity scores between target fields to be potentially edited and other fields from among the variable fields in the heterogeneous logs using pattern editing operations based on the extracted category feature, the extracted cardinality feature, and the extracted before-after n-gram feature. The processor is further configured to recommend, to a user, log pattern edits for at least one of the target fields based on the target similarity scores between the target fields in the heterogeneous logs.
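The three per-field features named in the abstract can be illustrated with a minimal sketch. The patent text on this page does not give an implementation, so everything below is an assumption: the function names, the category labels, the regular expressions, and the `<f1>`-style placeholders for variable fields are all hypothetical.

```python
import re

# Hypothetical category rules, loosely following the groups listed in claim 3
# (numbers, letters, IP address, date and time, non-space characters).
_CATEGORY_RULES = [
    ("ip", re.compile(r"\d{1,3}(\.\d{1,3}){3}")),
    ("number", re.compile(r"\d+")),
    ("letters", re.compile(r"[A-Za-z]+")),
    ("datetime", re.compile(r"\d{4}-\d{2}-\d{2}([T ]\d{2}:\d{2}:\d{2})?")),
]


def category(values):
    """Return the first label whose regex fully matches every observed value."""
    for label, rule in _CATEGORY_RULES:
        if values and all(rule.fullmatch(v) for v in values):
            return label
    return "notspace"  # fallback: treat the field as generic non-space text


def cardinality(values):
    """Number of unique values seen for one variable field across the logs."""
    return len(set(values))


def before_after_ngram(pattern_tokens, field_index, n=2):
    """Join up to n tokens before and n tokens after the field into one string."""
    before = pattern_tokens[max(0, field_index - n):field_index]
    after = pattern_tokens[field_index + 1:field_index + 1 + n]
    return " ".join(before + after)


# Example pattern, tokenized on whitespace; <f1> and <f2> mark variable fields.
tokens = ["user", "<f1>", "logged", "in", "from", "<f2>"]
context_f1 = before_after_ngram(tokens, 1)   # tokens around <f1>
context_f2 = before_after_ngram(tokens, 5)   # tokens around <f2>
```

The whitespace delimiter and the window size `n=2` are illustrative choices only; claim 2 says merely that tokenization is based on a delimiter.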

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented heterogeneous log pattern editing recommendation method performed in a network having network devices that generate heterogeneous logs, the method comprising:

identifying, by a processor from the heterogeneous logs, patterns comprising variable fields and constant fields;

extracting, by the processor, a category feature, a cardinality feature, and a before-after n-gram feature by tokenizing the variable fields in the identified patterns;

generating, by the processor, target similarity scores between target fields to be potentially edited and other fields from among the variable fields in the heterogeneous logs using pattern editing operations based on the extracted category feature, the extracted cardinality feature, and the extracted before-after n-gram feature using a combined field similarity matrix Θ_comb generated by fusing a plurality of similarity matrices by:

Θ_comb = Θ_category ⊙ (α * Θ_cardinality + (1 − α) * Θ_before-after-n-grams),

where Θ_category is a category similarity matrix, Θ_cardinality is a cardinality similarity matrix, and Θ_before-after-n-grams is a before-after n-grams similarity matrix for fields in the patterns, α is a contribution parameter using to balance the weights of similarity matrices generated from cardinality and before-after n-grams, and ⊙ is the element-wise matrix multiplication;

wherein the category similarity matrix includes a category similarity score determined for groupings of the target fields, and wherein the category similarity score has a first value responsive to the target fields in a particular one of the groupings belonging to a same category and a second value responsive to the target fields in the particular one of the groupings belonging to different categories, and wherein the category similarity score is used in the category similarity matrix to generate the combined field similarity matrix; and

recommending, by the processor to a user, log pattern edits for at least one of the target fields based on the target similarity scores between the target fields in the heterogeneous logs; and

auto-implementing recommended log pattern edits; and

controlling one or more systems, machines, or devices in the network using the auto-implemented recommended log pattern edits.

2. The computer-implemented method of claim 1, wherein the tokenized variable fields are based on a delimiter.

3. The computer-implemented method of claim 1, wherein the category feature is selected from the group consisting of only numbers, only non-space characters, an internet protocol address, only letters, and date and time information.

4. The computer-implemented method of claim 1, wherein the cardinality feature is a total number of unique values in one of the variable fields across the heterogeneous logs.

5. The computer-implemented method of claim 1, wherein the before-after n-gram feature is determined by: locating one of the target fields; extracting before n-grams tokens and after n-grams tokens for fields adjacent to the one of the target fields; and concatenating the extracted before n-grams tokens and the extracted after n-grams tokens into a string.

6. The computer-implemented method of claim 1, further comprises performing, by the processor, the recommended log pattern edits on the identified patterns after confirmation by the user.

7. The computer-implemented method of claim 1, wherein the pattern editing operations include variable-level operations, wherein the variable-level operations generate the combined field similarity matrix by fusing the category similarity matrix, the cardinality similarity matrix, and the before-after n-gram similarity matrix, wherein the combined field similarity matrix is used to generate the target similarity scores.

8. The computer-implemented method of claim 7, wherein the cardinality similarity matrix includes a cardinality similarity score determined for groupings of the target fields, wherein the cardinality similarity score for a particular on of the groupings is determined by a quantity subtracted from one, wherein the quantity is a normalized difference of cardinalities of the target fields in the particular one of the groupings, and wherein the cardinality similarity score is used in the cardinality similarity matrix to generate the combined field similarity matrix.

9. The computer-implemented method of claim 7, wherein the before-after n-gram similarity matrix includes a respective before-after similarity score determined for each of groupings of the target fields, wherein the before-after similarity score for a given one of the groupings of the target fields is determined by a quantity subtracted from one, wherein the quantity is an edit difference between the before-after n-gram features of the target fields in the given one of the groupings, and wherein the respective before-after similarity score is used in the before-after n-gram similarity matrix to generate the combined field similarity matrix.

10. The computer-implemented method of claim 1, wherein the pattern editing operations include constant-level operations, wherein the constant-level operations include a merge operation, and wherein the merge operation calculates a merge similarity score between various ones of the constant fields the user designates to merge and various other ones of the constant fields in the patterns, and wherein the merge similarity score is determined by an edit distance between the various ones of the constant fields the user designates to merge and the various other ones of the constant fields in the patterns, and wherein the merge similarity score is used to generate the target similarity scores.

11. The computer-implemented method of claim 1, wherein the pattern editing operations include constant-level operations, wherein the constant-level operations include a generalization operation, and wherein the generalization operation calculates a generalization similarity score between various ones of the constant fields the user designates and various other ones of the constant fields in the patterns, and wherein the generalization similarity score is determined by a quantity subtracted from one, wherein the quantity is an edit distance normalized by dividing the before-after n-gram features of the various other ones of the constant fields in the patterns by a maximum number of characters between the before-after n-gram features of the various ones of the constant fields the user designates, wherein the generalization similarity score is used to generate the target similarity scores.

12. The computer-implemented method of claim 1, wherein the pattern editing operations include pattern-level operations, wherein the pattern-level operations calculate a pattern similarity matrix between patterns, wherein the pattern similarity matrix being a total number of pairs of tokens with a same type in a same position in each of the patterns, and wherein the pattern similarity matrix is used to generate the target similarity scores.

13. A non-transitory article of manufacture tangibly embodying a computer readable program for heterogeneous log pattern editing recommendation performed in a network having network devices that generate heterogeneous logs, which when executed causes a computer to: identify, by a processor from the heterogeneous logs, patterns comprising variable fields and constant fields; extract, by the processor, a category feature, a cardinality feature, and a before-after n-gram feature by tokenizing the variable fields in the identified patterns; generate, by the processor, target similarity scores between target fields to be potenti
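The fusion formula in claim 1, together with the score definitions in claims 8 and 9, can be sketched in a few lines. This is an illustration under stated assumptions, not the patented implementation: the claims say only "a first value" and "a second value" for the category score (1 and 0 are assumed here), "a normalized difference" for cardinality (division by the larger cardinality is assumed), and "an edit difference" for n-grams (Levenshtein distance divided by the longer string length is assumed). The dictionary keys and function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def combined_similarity(fields, alpha=0.5):
    """Return Theta_comb as a nested list, computed element by element.

    Each field is a dict with hypothetical keys "category", "cardinality",
    and "ngram" (the concatenated before-after n-gram string).
    """
    n = len(fields)
    theta = [[0.0] * n for _ in range(n)]
    for i, f in enumerate(fields):
        for j, g in enumerate(fields):
            # Assumed category score: 1 for the same category, 0 otherwise.
            cat = 1.0 if f["category"] == g["category"] else 0.0
            # Assumed normalization: divide by the larger cardinality.
            hi = max(f["cardinality"], g["cardinality"])
            card = 1.0 - abs(f["cardinality"] - g["cardinality"]) / hi if hi else 1.0
            # Assumed normalization: divide by the longer n-gram string.
            longest = max(len(f["ngram"]), len(g["ngram"]))
            gram = 1.0 - edit_distance(f["ngram"], g["ngram"]) / longest if longest else 1.0
            # Element-wise form of the claim 1 formula.
            theta[i][j] = cat * (alpha * card + (1 - alpha) * gram)
    return theta


fields = [
    {"category": "ip", "cardinality": 10, "ngram": "from port"},
    {"category": "ip", "cardinality": 5, "ngram": "from port"},
    {"category": "number", "cardinality": 10, "ngram": "from port"},
]
theta = combined_similarity(fields, alpha=0.5)
```

Because the category matrix multiplies the weighted sum element-wise, two fields in different categories score zero under these assumptions no matter how close their cardinalities or contexts are. Candidate edits for a target field could then be ranked by sorting its row of `theta`.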

Assignees

Inventors

Classifications

  • G06F40/16 (Primary)

    Automatic learning of transformation rules, e.g. from examples · CPC title

  • Handling natural language data (speech analysis or synthesis, speech recognition G10L) · CPC title

  • Lexical analysis, e.g. tokenisation or collocates · CPC title

  • G06N5/047 (Primary)

    Pattern matching networks; Rete networks · CPC title

  • Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10929763B2 cover?
A heterogeneous log pattern editing recommendation system and computer-implemented method are provided. The system has a processor configured to identify, from heterogeneous logs, patterns including variable fields and constant fields. The processor is also configured to extract a category feature, a cardinality feature, and a before-after n-gram feature by tokenizing the variable fields in the…
Who is the assignee on this patent?
NEC Lab America Inc, NEC Corp
What technology area does this patent fall under?
Primary CPC classification G06F40/16. Mapped technology areas include Physics.
When was this patent published?
Publication date: Feb 23, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).