Intelligent data mining and processing of machine generated logs

US9607059B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9607059-B2
Application number: US-201414169389-A
Country: US
Kind code: B2
Filing date: Jan 31, 2014
Priority date: Jan 31, 2014
Publication date: Mar 28, 2017
Grant date: Mar 28, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

According to some embodiments, a method and an apparatus of analyzing log files comprises sampling a log and determining a structure associated with the log file based on the sampling and a pattern within the structure. If the structure and the pattern are stored in a repository, data from the log file will be exported into a database based on the determined pattern.
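
The flow in the abstract (sample the log, derive a structure and pattern, check a repository) can be illustrated with a minimal Python sketch. The patent does not specify how a structure is represented, so everything here is an assumption for illustration: the names `sample_lines`, `line_pattern`, `structure_signature`, and `lookup_or_register` are hypothetical, the repository is a plain dictionary, and the "pattern" is a coarse character-class signature.

```python
import re
from itertools import islice

def sample_lines(lines, n=5):
    """Return up to n non-empty lines from the start of the log (the sample)."""
    non_empty = (ln.rstrip("\n") for ln in lines if ln.strip())
    return list(islice(non_empty, n))

def line_pattern(line):
    """Collapse digit runs to '9' and letter runs to 'a' (assumed pattern form)."""
    collapsed = re.sub(r"[0-9]+", "9", line)
    return re.sub(r"[A-Za-z]+", "a", collapsed)

def structure_signature(sample):
    """Sorted tuple of the distinct line patterns seen in the sample."""
    return tuple(sorted({line_pattern(ln) for ln in sample}))

def lookup_or_register(repository, signature, export_rule):
    """Return (rule, was_known); store the rule if the signature is new."""
    if signature in repository:
        return repository[signature], True
    repository[signature] = export_rule
    return export_rule, False

# Hypothetical repository and log lines, for illustration only.
repository = {}
log_a = ["2014-01-31 12:00:01 INFO start", "2014-01-31 12:00:02 WARN retry"]
log_b = ["2014-02-01 08:30:00 ERROR disk"]

sig_a = structure_signature(sample_lines(log_a))
rule_a, known_a = lookup_or_register(repository, sig_a, "timestamp_level_message")

sig_b = structure_signature(sample_lines(log_b))
rule_b, known_b = lookup_or_register(repository, sig_b, "unused")
```

In this sketch the second log collapses to the same signature as the first, so it is recognized from the repository and reuses the stored export rule.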

First claim

Opening claim text (preview).

What is claimed is:

1. A method of analyzing log files, the method comprising: sampling a log file comprising a plurality of structures; determining, via a processor, one of the plurality of structures associated with the log file based on the sampling and a pattern within the one of the plurality of structures; determining a type of delimiter associated with the log file; determining if the one of the plurality of structures and the pattern are stored in a repository; parsing the log fields based on the type of delimiter; discovering log field content types based on the log file's data, patterns, distinct values and regular expressions; assigning log field content types to the parsed log fields; determining that a variety of field names are possible based on content from previously stored log file patterns within the repository; presenting field name options to a user to select a field name based on the determined variety of field names; standardizing the parsed log fields based on a selected field name from the previously stored log file patterns within the repository; and exporting data from the log file into a database.

2. The method of claim 1, wherein sampling comprises analyzing the log file line by line and exporting is based on the schema embedded within a start of the log file.

3. The method of claim 1, further comprising determining a format of the log file based on a location of the log file.

4. The method of claim 1, wherein the method further comprises: standardizing log fields further based on receiving a selection of field names from a variety of possible field names for the log fields that are stored within the repository; and saving the pattern in the repository.

5. The method of claim 4, wherein the method further comprises: proposing enhancements.

6. The method of claim 5, wherein proposing enhancements comprises: presenting related fields to a user, the related fields associated with data contained within the log file.

7. The method of claim 4, wherein the method further comprises: presenting related fields to a user, the related fields associated with one or more other log files.

8. A non-transitory computer-readable medium comprising instructions that when executed by a processor perform a method of analyzing log files, the method comprising: sampling a log file; determining, via a processor, a structure associated with the log file based on the sampling and a pattern within the structure; determining if the structure and the pattern are stored in a repository; determining a type of delimiter associated with the log file; parsing the log fields based on the type of delimiter; discovering log field content types based on the log file's data, patterns, distinct values and regular expressions; assigning log field content types to the parsed log fields; determining that a variety of field names are possible based on content from previously stored log file patterns within the repository; presenting field name options to a user to select a field name based on the determined variety of field names; standardizing the parsed log fields based on a selected field name from the previously stored log file patterns within the repository; and exporting data from the log file into a database.

9. The medium of claim 8, wherein sampling comprises analyzing the log file line by line and exporting is based on the schema embedded within a start of the log file.

10. The medium of claim 8 further comprising determining a format of the log file based on a location of the log file.

11. The medium of claim 8, wherein the method further comprises: standardizing log fields further based on receiving a selection of field names from a variety of possible field names for the log fields that are stored within the repository; and saving the pattern in the repository.

12. The medium of claim 11, wherein the method further comprises: proposing enhancements.

13. The medium of claim 12, wherein proposing enhancements comprises: presenting related fields to a user, the related fields associated with data contained within the log file.

14. The medium of claim 11, wherein the method further comprises: presenting related fields to a user, the related fields associated with one or more other log files.

15. A system comprising: a processor; and a non-transitory computer-readable medium comprising instructions that when executed by a processor perform a method of analyzing log files, the method comprising: sampling a log file; determining a structure associated with the log file based on the sampling and a pattern within the structure; determining if the structure and the pattern are stored in a repository; determining a type of delimiter associated with the log file; parsing the log fields based on the type of delimiter; discovering log field content types based on the log file's data, patterns, distinct values and regular expressions; assigning log field content types to the parsed log fields; determining that a variety of field names are possible based on content from previously stored log file patterns within the repository; presenting field name options to a user to select a field name based on the determined variety of field names; standardizing the parsed log fields based on a selected field name from the previously stored log file patterns within the repository; and exporting data from the log file into a database.

16. The system of claim 15, wherein sampling comprises analyzing the log file line by line and exporting is based on the schema embedded within a start of the log file.

17. The system of claim 15 further comprising determining a format of the log file based on a location of the log file.

18. The system of claim 15, wherein the method further comprises: standardizing log fields further based on receiving a selection of field names from a variety of possible field names for the log fields that are stored within the repository; and saving the pattern in the repository.

19. The system of claim 15, wherein the schema is determined by analyzing a nested structure within the log file.

20. The system of claim 15, wherein the method further comprises presenting related fields to a user, the related fields associated with data contained within the log file.
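
The later steps of claim 1 (delimiter detection, parsing, content-type discovery, field-name selection, and export) can be sketched as follows. This is not the patented implementation. The candidate delimiters, the regular expressions, the `NAME_OPTIONS` table standing in for the repository, the `choose` callback standing in for the user, and the `log_data` table name are all assumptions made for illustration.

```python
import re
import sqlite3

CANDIDATE_DELIMITERS = [",", "\t", "|", ";"]  # assumed candidate set

def detect_delimiter(sample):
    """Pick the candidate that splits every line into the same number (>1) of fields."""
    best, best_count = None, 1
    for delim in CANDIDATE_DELIMITERS:
        counts = {ln.count(delim) + 1 for ln in sample}
        if len(counts) == 1:
            n = counts.pop()
            if n > best_count:
                best, best_count = delim, n
    return best

# Assumed regular expressions for a few content types.
CONTENT_TYPES = [
    ("ip_address", re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")),
    ("timestamp", re.compile(r"^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}$")),
    ("integer", re.compile(r"^-?\d+$")),
]

def discover_type(values):
    """Use regex matches first, then distinct-value counts, to label a column."""
    for name, rx in CONTENT_TYPES:
        if all(rx.match(v) for v in values):
            return name
    return "category" if len(set(values)) < len(values) else "text"

# Hypothetical field names previously stored in the repository, per content type.
NAME_OPTIONS = {
    "timestamp": ["event_time", "ts"],
    "ip_address": ["client_ip", "src_ip"],
    "integer": ["status", "count"],
    "category": ["method", "level"],
}

def standardize_names(types, choose=lambda options: options[0]):
    """Present name options per field; `choose` stands in for the user's selection."""
    names = []
    for i, t in enumerate(types):
        name = choose(NAME_OPTIONS.get(t, ["field"]))
        if name in names:  # keep column names unique
            name = f"{name}_{i}"
        names.append(name)
    return names

def export(rows, names, conn):
    """Create a table from the chosen names and insert the parsed rows."""
    cols = ", ".join(f'"{n}" TEXT' for n in names)
    conn.execute(f"CREATE TABLE log_data ({cols})")
    placeholders = ", ".join("?" for _ in names)
    conn.executemany(f"INSERT INTO log_data VALUES ({placeholders})", rows)

# Made-up sample lines, for illustration only.
SAMPLE = [
    "2014-01-31 12:00:01|10.0.0.1|200|GET",
    "2014-01-31 12:00:02|10.0.0.2|404|GET",
    "2014-01-31 12:00:03|10.0.0.3|200|POST",
]

delimiter = detect_delimiter(SAMPLE)
rows = [ln.split(delimiter) for ln in SAMPLE]
types = [discover_type(list(col)) for col in zip(*rows)]
names = standardize_names(types)
conn = sqlite3.connect(":memory:")
export(rows, names, conn)
```

Passing a different `choose` function (for example, one that prompts interactively) would model the user selecting among the presented field-name options; the default here simply takes the first stored name.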

Assignees

Inventors

Classifications

  • G06F16/254 · Primary

    Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses · CPC title

  • Data acquisition and logging (for input to computer G06F3/00) · CPC title

  • Physics · mapped topic

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9607059B2 cover?
According to some embodiments, a method and an apparatus of analyzing log files comprises sampling a log and determining a structure associated with the log file based on the sampling and a pattern within the structure. If the structure and the pattern are stored in a repository, data from the log file will be exported into a database based on the determined pattern.
Who is the assignee on this patent?
Syed Awez, Yan Nancy, Puranik Hermant, and 4 more
What technology area does this patent fall under?
Primary CPC classification G06F16/254. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 28, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).