Cluster-based processing of unstructured log messages
US-2018102938-A1 · Apr 12, 2018 · US
US2018285397A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2018285397-A1 |
| Application number | US-201715478304-A |
| Country | US |
| Kind code | A1 |
| Filing date | Apr 4, 2017 |
| Priority date | Apr 4, 2017 |
| Publication date | Oct 4, 2018 |
| Grant date | — |
Official abstract text for this publication.
In one embodiment, a device in a network tokenizes a plurality of strings from unstructured log data into entity tokens and non-entity tokens. The entity tokens identify entities in the network. The device identifies patterns of tokens in the tokenized strings. The device determines entity-centric contexts from the identified patterns. A particular entity-centric context comprises a sequence of tokens that precede or follow an entity token in the tokenized strings. The device associates similar ones of the entity-centric contexts. The device generates a lookup index based in part on the entities and the similar entity-centric contexts.
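The steps in the abstract (tokenize, extract the tokens around each entity, group entities by context, build a lookup index) can be illustrated with a minimal Python sketch. It is not the patented implementation. Several things here are assumptions made for illustration: entities are detected with a simple IPv4 regular expression, a context is a fixed window of two tokens on each side, and "similar" contexts are reduced to exactly matching contexts. All function names (`tokenize`, `entity_contexts`, `build_index`, `similar_entities`) are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: only IPv4 addresses count as entities in this sketch.
ENTITY_RE = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$")


def tokenize(line):
    """Split a log line into (token, is_entity) pairs."""
    return [(tok, bool(ENTITY_RE.match(tok))) for tok in line.split()]


def entity_contexts(tokens, window=2):
    """Yield (entity, context) pairs.

    A context is the pair (tokens before, tokens after) the entity,
    limited to `window` tokens per side. Other entities inside the
    window are replaced by a placeholder so the context does not
    depend on specific entity values.
    """
    for i, (tok, is_entity) in enumerate(tokens):
        if not is_entity:
            continue
        before = tuple("<ENT>" if e else t for t, e in tokens[max(0, i - window):i])
        after = tuple("<ENT>" if e else t for t, e in tokens[i + 1:i + 1 + window])
        yield tok, (before, after)


def build_index(lines, window=2):
    """Map each context to the set of entities observed in it."""
    context_to_entities = defaultdict(set)
    for line in lines:
        for entity, ctx in entity_contexts(tokenize(line), window):
            context_to_entities[ctx].add(entity)
    return context_to_entities


def similar_entities(index, entity):
    """Return the other entities that share at least one context with `entity`."""
    result = set()
    for entities in index.values():
        if entity in entities:
            result |= entities
    result.discard(entity)
    return result


logs = [
    "connection from 10.0.0.1 port 22 accepted",
    "connection from 10.0.0.2 port 22 accepted",
    "dns query to 10.0.0.3 failed",
]
index = build_index(logs)
print(sorted(similar_entities(index, "10.0.0.1")))  # ['10.0.0.2']
```

Exact matching is the simplest possible notion of similarity. The claims also describe comparing contexts by distance in a vector space, which would let near-identical contexts be grouped as well.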
Opening claim text (preview).
What is claimed is:
1. A method comprising: tokenizing, by a device in a network, a plurality of strings from unstructured log data into entity tokens and non-entity tokens, wherein the entity tokens identify entities in the network; identifying, by the device, patterns of tokens in the tokenized strings; determining, by the device, entity-centric contexts from the identified patterns, wherein a particular entity-centric context comprises a sequence of tokens that precede or follow an entity token in the tokenized strings; associating, by the device, similar ones of the entity-centric contexts; and generating, by the device, a lookup index based in part on the entities and the similar entity-centric contexts.
2. The method as in claim 1, wherein the entities comprise one or more of: network addresses, network services, or virtual processes.
3. The method as in claim 1, further comprising: receiving, at the device, a lookup request for a particular entity; and providing, by the device, a lookup response indicative of the entities in the lookup index that have similar entity-centric contexts as that of the particular entity.
4. The method as in claim 1, wherein identifying the patterns of tokens in the tokenized strings comprises: treating, by the device, the entity tokens that appear in the strings as wildcards.
5. The method as in claim 1, wherein associating similar ones of the entity-centric contexts comprises: mapping, by the device, the entity-centric contexts to vectors in a vector space, wherein two similar entity-centric contexts are deemed similar to one another based on the distance between their respective vectors in the vector space.
6. The method as in claim 5, wherein mapping the entity-centric contexts to vectors in the vector space comprises: using, by the device, a trained neural network to map the entity-centric contexts to vectors in the vector space.
7. The method as in claim 1, wherein the entity tokens comprise unique identifiers for the entities.
8. An apparatus, comprising: one or more network interfaces to communicate with a network; a processor coupled to the one or more network interfaces and configured to execute a process; and a memory configured to store the process executable by the processor, the process when executed configured to: tokenize a plurality of strings from unstructured log data into entity tokens and non-entity tokens, wherein the entity tokens identify entities in the network; identify patterns of tokens in the tokenized strings; determine entity-centric contexts from the identified patterns, wherein a particular entity-centric context comprises a sequence of tokens that precede or follow an entity token in the tokenized strings; associate similar ones of the entity-centric contexts; and generate a lookup index based in part on the entities and the similar entity-centric contexts.
9. The apparatus as in claim 8, wherein the entities comprise one or more of: network addresses, network services, or virtual processes.
10. The apparatus as in claim 8, wherein the process when executed is further configured to: receive a lookup request for a particular entity; and provide a lookup response indicative of the entities in the lookup index that have similar entity-centric contexts as that of the particular entity.
11. The apparatus as in claim 8, wherein the apparatus identifies the patterns of tokens in the tokenized strings by: treating the entity tokens that appear in the strings as wildcards.
12. The apparatus as in claim 8, wherein the apparatus associates similar ones of the entity-centric contexts by: mapping the entity-centric contexts to vectors in a vector space, wherein two similar entity-centric contexts are deemed similar to one another based on the distance between their respective vectors in the vector space.
13. The apparatus as in claim 12, wherein the apparatus maps the entity-centric contexts to vectors in the vector space using a trained neural network.
14. The apparatus as in claim 8, wherein the entity tokens comprise unique identifiers for the entities.
15. A tangible, non-transitory, computer-readable medium storing program instructions that cause a device in a network to execute a process comprising: tokenizing, by the device, a plurality of strings from unstructured log data into entity tokens and non-entity tokens, wherein the entity tokens identify entities in the network; identifying, by the device, patterns of tokens in the tokenized strings; determining, by the device, entity-centric contexts from the identified patterns, wherein a particular entity-centric context comprises a sequence of tokens that precede or follow an entity token in the tokenized strings; associating, by the device, similar ones of the entity-centric contexts; and generating, by the device, a lookup index based in part on the entities and the similar entity-centric contexts.
16. The computer-readable medium as in claim 15, wherein the entities comprise one or more of: network addresses, network services, or virtual processes.
17. The computer-readable medium as in claim 15, wherein the process further comprises: receiving, at the device, a lookup request for a particular entity; and providing, by the device, a lookup response indicative of the entities in the lookup index that have similar entity-centric contexts as that of the particular entity.
18. The computer-readable medium as in claim 15, wherein identifying the patterns of tokens in the tokenized strings comprises: treating, by the device, the entity tokens that appear in the strings as wildcards.
19. The computer-readable medium as in claim 15, wherein associating similar ones of the entity-centric contexts comprises: mapping, by the device, the entity-centric contexts to vectors in a vector space, wherein two similar entity-centric contexts are deemed similar to one another based on the distance between their respective vectors in the vector space.
20. The computer-readable medium as in claim 19, wherein mapping the entity-centric contexts to vectors in the vector space comprises: using, by the device, a trained neural network to map the entity-centric contexts to vectors in the vector space.
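Two of the dependent claims lend themselves to a short illustration: treating entity tokens as wildcards when identifying patterns (claims 4, 11, 18) and comparing contexts by distance in a vector space (claims 5, 12, 19). The Python sketch below is a rough illustration only. The claims name a trained neural network for the mapping; this sketch substitutes a plain bag-of-words count vector with cosine distance, which is a deliberate simplification and not what the claims describe. The helper names (`to_pattern`, `embed`, `cosine_distance`) and the sample host names are hypothetical.

```python
import math
from collections import Counter


def to_pattern(tokens, entities):
    """Replace every entity token with '*' so that lines differing only
    in their entities collapse to the same pattern."""
    return tuple("*" if t in entities else t for t in tokens)


def embed(context, vocab):
    """Map a context (list of tokens) to a count vector over `vocab`.

    Assumption: a bag-of-words vector stands in for the trained
    neural network mentioned in the claims.
    """
    counts = Counter(context)
    return [counts[word] for word in vocab]


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 if norm == 0 else 1.0 - dot / norm


# Wildcard patterns: both lines reduce to the same token pattern.
entities = {"web01", "web02"}
p1 = to_pattern("service web01 restarted".split(), entities)
p2 = to_pattern("service web02 restarted".split(), entities)
print(p1 == p2)  # True

# Vector-space similarity: contexts sharing more tokens are closer.
ctx_a = ["login", "failed", "for"]
ctx_b = ["login", "failed", "on"]
ctx_c = ["disk", "usage", "high"]
vocab = sorted(set(ctx_a + ctx_b + ctx_c))
d_ab = cosine_distance(embed(ctx_a, vocab), embed(ctx_b, vocab))
d_ac = cosine_distance(embed(ctx_a, vocab), embed(ctx_c, vocab))
print(d_ab < d_ac)  # True
```

A learned embedding would presumably also place contexts with different but related wording near each other, which count vectors cannot do.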
Learning methods · CPC title
using logs of notifications; Post-processing of notifications · CPC title
Knowledge engineering; Knowledge acquisition · CPC title
using context · CPC title
Indexing structures · CPC title