Computer log retrieval based on multivariate log time series
US-2019354524-A1 · Nov 21, 2019 · US
US11797411B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11797411-B2 |
| Application number | US-202217696337-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 16, 2022 |
| Priority date | Oct 3, 2019 |
| Publication date | Oct 24, 2023 |
| Grant date | Oct 24, 2023 |
Abstract
An anomaly service receives log data from nodes in a computing environment, which includes a sequence of information indicative of log messages produced by the nodes. The anomaly service identifies dominant patterns in the sequence of information that are representative of non-anomalous blocks of the log messages. Having identified the dominant patterns, the service is able to extract the non-anomalous blocks from the log data to reveal anomalous blocks that do not fit the dominant patterns. The service may then generate anomaly vectors based on the anomalous blocks, which can be distributed to the nodes to detect anomalies.
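The abstract and claims 2 to 4 describe the server-side step in terms of candidate patterns, a scoring function that favours frequent patterns, and a description-length criterion. The document does not disclose an implementation. The Python sketch below is one illustrative reading under stated assumptions: each log message is already reduced to a template string, candidate patterns are contiguous runs of 2 to 4 messages, and the score is a crude two-part minimum-description-length (MDL) gain. The helper names (`candidate_patterns`, `replace_pattern`, `description_length`, `dominant_patterns`, `residual_blocks`) and the parameters `top_k` and `min_count` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical sketch; assumes log messages are already reduced to template strings.
_PATTERN = object()  # stand-in symbol for one replaced pattern occurrence


def candidate_patterns(seq, min_len=2, max_len=4):
    """Count every contiguous n-gram of messages as a potential pattern."""
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(seq) - n + 1):
            counts[tuple(seq[i:i + n])] += 1
    return counts


def replace_pattern(seq, pattern, symbol=_PATTERN):
    """Replace non-overlapping, left-to-right occurrences of pattern with symbol."""
    out, i, n = [], 0, len(pattern)
    while i < len(seq):
        if tuple(seq[i:i + n]) == pattern:
            out.append(symbol)
            i += n
        else:
            out.append(seq[i])
            i += 1
    return out


def description_length(seq, pattern=None):
    """Crude MDL cost in bits: encoded tokens plus the cost of storing the pattern."""
    if pattern is None:
        encoded, model_bits = list(seq), 0.0
    else:
        encoded = replace_pattern(seq, pattern)
        model_bits = len(pattern) * math.log2(max(len(set(seq)), 2))
    return len(encoded) * math.log2(max(len(set(encoded)), 2)) + model_bits


def dominant_patterns(seq, top_k=2, min_count=2):
    """Keep the top_k frequent candidates whose encoding shortens the sequence most."""
    baseline = description_length(seq)
    scored = []
    for pattern, count in candidate_patterns(seq).items():
        if count < min_count:
            continue
        gain = baseline - description_length(seq, pattern)
        if gain > 0:
            scored.append((gain, pattern))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [pattern for _, pattern in scored[:top_k]]


def residual_blocks(seq, patterns):
    """Extract dominant-pattern occurrences; return the uncovered runs of messages."""
    covered = [False] * len(seq)
    for pattern in patterns:
        n, i = len(pattern), 0
        while i <= len(seq) - n:
            span = range(i, i + n)
            if tuple(seq[i:i + n]) == pattern and not any(covered[j] for j in span):
                for j in span:
                    covered[j] = True
                i += n
            else:
                i += 1
    blocks, current = [], []
    for token, is_covered in zip(seq, covered):
        if is_covered:
            if current:
                blocks.append(current)
                current = []
        else:
            current.append(token)
    if current:
        blocks.append(current)
    return blocks


seq = ["open", "read", "close"] * 5 + ["open", "error", "retry"] + ["open", "read", "close"] * 3
patterns = dominant_patterns(seq)
print(patterns[0])                     # ('open', 'read', 'close')
print(residual_blocks(seq, patterns))  # [['open', 'error', 'retry']]
```

The returned blocks correspond to what the claims call low-probability blocks, which would then be turned into low-probability vectors. Selecting the top candidates in a single scoring pass is a simplification; a fuller MDL approach might re-score the remaining candidates after each selection.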
Claims (preview)
The invention claimed is:
1. A computing apparatus comprising: one or more computer readable storage media; one or more processors operatively coupled with the one or more computer readable storage media; and program instructions stored on the one or more computer readable storage media that, when executed by the one or more processors, direct the computing apparatus to at least: receive log data from a plurality of nodes in a computing environment, wherein the log data comprises a sequence of information indicative of log messages produced by the plurality of nodes; identify, in the sequence of information, dominant patterns representative of high-probability blocks of the log messages, wherein the dominant patterns are identified based on a frequency of their associated high-probability blocks in the log data; reveal low-probability blocks that do not fit the dominant patterns by extracting the high-probability blocks from the log data based on the dominant patterns; generate low-probability vectors based at least on the low-probability blocks; and distribute the low-probability vectors to at least one node of the plurality of nodes, wherein the at least one node detects low-probability events using the low-probability vectors.
2. The computing apparatus of claim 1 wherein to identify the dominant patterns within the sequence of information, the program instructions direct the computing apparatus to identify potential patterns within the sequence of information and select the dominant patterns from the potential patterns based at least on a scoring function applied to one or more of the potential patterns.
3. The computing apparatus of claim 2 wherein the scoring function promotes a subset of the potential patterns that occur frequently within the sequence of information relative to a different subset of the potential patterns that occur less frequently within the sequence of information.
4. The computing apparatus of claim 3 wherein the scoring function determines, for each potential pattern of the potential patterns, a relative dominance of the potential pattern based on a description length of the sequence of information when encoded with a compressed representation of the potential pattern.
5. The computing apparatus of claim 1 wherein the at least one node, to detect low-probability events using the low-probability vectors: generates hash values based on log messages produced at the at least one node; generates a sequence vector based on the hash values; compares the sequence vector to the low-probability vectors; and determines whether the sequence vector matches one or more of the low-probability vectors indicating a low-probability event.
6. The computing apparatus of claim 5 wherein the at least one node, to determine whether the sequence vector matches one or more of the low-probability vectors, employs a similarity function that determines whether the sequence vector is a sufficient match to one or more of the low-probability vectors.
7. The computing apparatus of claim 1 wherein the at least one node predicts low-probability events using the low-probability vectors.
8. One or more computer-readable storage media having program instructions stored thereon, wherein the program instructions, when read and executed by a processing system, direct the processing system to at least: receive log data from a plurality of nodes in a computing environment, wherein the log data comprises a sequence of information indicative of log messages produced by the plurality of nodes; identify, in the sequence of information, dominant patterns representative of high-probability blocks of the log messages, wherein the dominant patterns are identified based on a frequency of their associated high-probability blocks in the log data; reveal low-probability blocks that do not fit the dominant patterns by extracting the high-probability blocks from the log data based on the dominant patterns; generate low-probability vectors based at least on the low-probability blocks; and distribute the low-probability vectors to at least one node of the plurality of nodes wherein the at least one node detects low-probability events using the low-probability vectors.
9. The one or more computer-readable storage media of claim 8 wherein to identify the dominant patterns within the sequence of information, the program instructions, when executed by the processing system, direct the processing system to identify potential patterns within the sequence of information and select the dominant patterns from the potential patterns based at least on a scoring function applied to one or more of the potential patterns.
10. The one or more computer-readable storage media of claim 9 wherein the scoring function promotes a subset of the potential patterns that occur frequently within the sequence of information relative to a different subset of the potential patterns that occur less frequently within the sequence of information.
11. The one or more computer-readable storage media of claim 10 wherein the scoring function determines, for each potential pattern of the potential patterns, a relative dominance of the potential pattern based on a description length of the sequence of information when encoded with a compressed representation of the potential pattern.
12. The one or more computer-readable storage media of claim 8 wherein the at least one node, to detect low-probability events using the low-probability vectors: generates hash values based on log messages produced at the at least one node; generates a sequence vector based on the hash values; compares the sequence vector to the low-probability vectors; and determines whether the sequence vector matches one or more of the low-probability vectors indicating a low-probability event.
13. The one or more computer-readable storage media of claim 12 wherein the at least one node, to determine whether the sequence vector matches one or more of the low-probability vectors, employs a similarity function that determines whether the sequence vector is a sufficient match to one or more of the low-probability vectors.
14. The one or more computer-readable storage media of claim 8 wherein the at least one node predicts low-probability events using the low-probability vectors.
15. A method comprising: receiving log data from a plurality of nodes in a computing environment, wherein the log data comprises a sequence of information indicative of log messages produced by the plurality of nodes; identifying, in the sequence of information, dominant patterns representative of high-probability blocks of the log messages, wherein the dominant patterns are identified based on a frequency of their associated high-probability blocks in the log data; revealing low-probability blocks that do not fit the dominant patterns by extracting the high-probability blocks from the log data based on the dominant patterns; generating low-probability vectors based at least on the low-probability blocks; and distributing the low-probability vectors to at least one node of the plurality of nodes, wherein the at least one node detects low-probability events using the low-probability vectors.
16. The method of claim 15 wherein identifying the dominant patterns within the sequence of information comprises identifying potential patterns within the sequence of information and selecting the dominant patterns from the potential patterns based at least on a scoring function applied to one or more of the potential patterns.
17. The method of claim 16 wherein the sco
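Claims 5 and 6 (and their counterparts, claims 12 and 13) describe the node-side check: hash the node's own log messages, build a sequence vector from the hash values, and test it against the distributed low-probability vectors with a similarity function. The claims name none of these components concretely. This Python sketch assumes a 64-bit BLAKE2b hash per message, an order-preserving list of hashes as the sequence vector, and the standard library's `difflib.SequenceMatcher.ratio()` as the similarity function with a hypothetical threshold of 0.8. The names `message_hash`, `sequence_vector`, `similarity`, and `detect_low_probability` are illustrative, not taken from the patent.

```python
import hashlib
from difflib import SequenceMatcher


def message_hash(message):
    """Stable 64-bit hash of one log message.

    Assumption: BLAKE2b is used because Python's built-in hash() is salted per
    process, so different nodes would otherwise disagree on the values.
    """
    digest = hashlib.blake2b(message.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sequence_vector(messages):
    """Order-preserving vector of hash values for a window of log messages."""
    return [message_hash(m) for m in messages]


def similarity(a, b):
    """Hypothetical similarity function: difflib's 2*M/T ratio over hash sequences."""
    return SequenceMatcher(None, a, b).ratio()


def detect_low_probability(window, low_prob_vectors, threshold=0.8):
    """Indices of low-probability vectors that the window matches sufficiently."""
    vec = sequence_vector(window)
    return [i for i, ref in enumerate(low_prob_vectors)
            if similarity(vec, ref) >= threshold]


# On a real node these vectors would come from the anomaly service; here one is
# built locally from a hypothetical low-probability block.
low_prob_vectors = [sequence_vector(["open", "error", "retry"])]
print(detect_low_probability(["open", "error", "retry"], low_prob_vectors))           # [0]
print(detect_low_probability(["open", "error", "retry", "retry"], low_prob_vectors))  # [0] (ratio 6/7)
print(detect_low_probability(["open", "read", "close"], low_prob_vectors))            # [] (ratio 1/3)
```

An order-preserving comparison is used here because the claims speak of a sequence vector; a bag-of-hashes count vector compared by cosine similarity would be another plausible reading.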
the data filtering being achieved in order to maintain consistency among the monitored data, e.g. ensuring that the monitored data belong to the same timeframe, to the same system or component · CPC title
in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems · CPC title
Error or fault detection not based on redundancy (power supply failures G06F1/30; network fault management H04L41/06) · CPC title
Remedial or corrective actions (recovery from an exception in an instruction pipeline G06F9/3861; by retry G06F11/1402; for recovering from a failure of a protocol instance or entity H04L69/40) · CPC title
Data logging (G06F11/14, G06F11/2205 take precedence) · CPC title