Entity-centric log indexing with context embedding

US2018285397A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2018285397-A1
Application number: US-201715478304-A
Country: US
Kind code: A1
Filing date: Apr 4, 2017
Priority date: Apr 4, 2017
Publication date: Oct 4, 2018
Grant date:

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In one embodiment, a device in a network tokenizes a plurality of strings from unstructured log data into entity tokens and non-entity tokens. The entity tokens identify entities in the network. The device identifies patterns of tokens in the tokenized strings. The device determines entity-centric contexts from the identified patterns. A particular entity-centric context comprises a sequence of tokens that precede or follow an entity token in the tokenized strings. The device associates similar ones of the entity-centric contexts. The device generates a lookup index based in part on the entities and the similar entity-centric contexts.
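The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes entities are IPv4 addresses recognized by a simple regular expression, uses a fixed window of two tokens on each side as the context, and treats only identical contexts as "similar". The names `tokenize`, `pattern_of`, `contexts`, `build_index`, and the `<ENTITY>` placeholder are hypothetical, as are the sample log lines.

```python
import re
from collections import defaultdict

# Assumption: entities are IPv4 addresses. The publication also names
# network services and virtual processes as possible entities.
IPV4 = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$")
ENTITY = "<ENTITY>"  # hypothetical wildcard placeholder for entity tokens


def tokenize(line):
    """Split a log line into (token, is_entity) pairs."""
    return [(tok, bool(IPV4.match(tok))) for tok in line.split()]


def pattern_of(tokens):
    """Replace every entity token with a wildcard to get the token pattern."""
    return tuple(ENTITY if is_ent else tok for tok, is_ent in tokens)


def contexts(tokens, window=2):
    """Yield (entity, context) pairs; a context is the tokens before and after."""
    pat = pattern_of(tokens)
    for i, (tok, is_ent) in enumerate(tokens):
        if is_ent:
            before = pat[max(0, i - window):i]
            after = pat[i + 1:i + 1 + window]
            yield tok, (before, after)


def build_index(lines, window=2):
    """Build two lookup tables: entity -> contexts and context -> entities."""
    entity_to_ctx = defaultdict(set)
    ctx_to_entities = defaultdict(set)
    for line in lines:
        for entity, ctx in contexts(tokenize(line), window):
            entity_to_ctx[entity].add(ctx)
            ctx_to_entities[ctx].add(entity)
    return entity_to_ctx, ctx_to_entities


# Made-up log lines for illustration only.
LOGS = [
    "connection from 10.0.0.1 accepted on port 22",
    "connection from 10.0.0.2 accepted on port 22",
    "dropped packet to 10.0.0.3 by firewall",
]
entity_to_ctx, ctx_to_entities = build_index(LOGS)
```

In this sketch the first two addresses land in the same context bucket because the tokens around them are identical, while the third address gets a bucket of its own. Exact matching is the simplest possible stand-in for the "associating similar contexts" step.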

First claim

Opening claim text (preview).

What is claimed is:
1. A method comprising: tokenizing, by a device in a network, a plurality of strings from unstructured log data into entity tokens and non-entity tokens, wherein the entity tokens identify entities in the network; identifying, by the device, patterns of tokens in the tokenized strings; determining, by the device, entity-centric contexts from the identified patterns, wherein a particular entity-centric context comprises a sequence of tokens that precede or follow an entity token in the tokenized strings; associating, by the device, similar ones of the entity-centric contexts; and generating, by the device, a lookup index based in part on the entities and the similar entity-centric contexts.
2. The method as in claim 1, wherein the entities comprise one or more of: network addresses, network services, or virtual processes.
3. The method as in claim 1, further comprising: receiving, at the device, a lookup request for a particular entity; and providing, by the device, a lookup response indicative of the entities in the lookup index that have similar entity-centric contexts as that of the particular entity.
4. The method as in claim 1, wherein identifying the patterns of tokens in the tokenized strings comprises: treating, by the device, the entity tokens that appear in the strings as wildcards.
5. The method as in claim 1, wherein associating similar ones of the entity-centric contexts comprises: mapping, by the device, the entity-centric contexts to vectors in a vector space, wherein two similar entity-centric contexts are deemed similar to one another based on the distance between their respective vectors in the vector space.
6. The method as in claim 5, wherein mapping the entity-centric contexts to vectors in the vector space comprises: using, by the device, a trained neural network to map the entity-centric contexts to vectors in the vector space.
7. The method as in claim 1, wherein the entity tokens comprise unique identifiers for the entities.
8. An apparatus, comprising: one or more network interfaces to communicate with a network; a processor coupled to the one or more network interfaces and configured to execute a process; and a memory configured to store the process executable by the processor, the process when executed configured to: tokenize a plurality of strings from unstructured log data into entity tokens and non-entity tokens, wherein the entity tokens identify entities in the network; identify patterns of tokens in the tokenized strings; determine entity-centric contexts from the identified patterns, wherein a particular entity-centric context comprises a sequence of tokens that precede or follow an entity token in the tokenized strings; associate similar ones of the entity-centric contexts; and generate a lookup index based in part on the entities and the similar entity-centric contexts.
9. The apparatus as in claim 8, wherein the entities comprise one or more of: network addresses, network services, or virtual processes.
10. The apparatus as in claim 8, wherein the process when executed is further configured to: receive a lookup request for a particular entity; and provide a lookup response indicative of the entities in the lookup index that have similar entity-centric contexts as that of the particular entity.
11. The apparatus as in claim 8, wherein the apparatus identifies the patterns of tokens in the tokenized strings by: treating the entity tokens that appear in the strings as wildcards.
12. The apparatus as in claim 8, wherein the apparatus associates similar ones of the entity-centric contexts by: mapping the entity-centric contexts to vectors in a vector space, wherein two similar entity-centric contexts are deemed similar to one another based on the distance between their respective vectors in the vector space.
13. The apparatus as in claim 12, wherein the apparatus maps the entity-centric contexts to vectors in the vector space using a trained neural network.
14. The apparatus as in claim 8, wherein the entity tokens comprise unique identifiers for the entities.
15. A tangible, non-transitory, computer-readable medium storing program instructions that cause a device in a network to execute a process comprising: tokenizing, by the device, a plurality of strings from unstructured log data into entity tokens and non-entity tokens, wherein the entity tokens identify entities in the network; identifying, by the device, patterns of tokens in the tokenized strings; determining, by the device, entity-centric contexts from the identified patterns, wherein a particular entity-centric context comprises a sequence of tokens that precede or follow an entity token in the tokenized strings; associating, by the device, similar ones of the entity-centric contexts; and generating, by the device, a lookup index based in part on the entities and the similar entity-centric contexts.
16. The computer-readable medium as in claim 15, wherein the entities comprise one or more of: network addresses, network services, or virtual processes.
17. The computer-readable medium as in claim 15, wherein the process further comprises: receiving, at the device, a lookup request for a particular entity; and providing, by the device, a lookup response indicative of the entities in the lookup index that have similar entity-centric contexts as that of the particular entity.
18. The computer-readable medium as in claim 15, wherein identifying the patterns of tokens in the tokenized strings comprises: treating, by the device, the entity tokens that appear in the strings as wildcards.
19. The computer-readable medium as in claim 15, wherein associating similar ones of the entity-centric contexts comprises: mapping, by the device, the entity-centric contexts to vectors in a vector space, wherein two similar entity-centric contexts are deemed similar to one another based on the distance between their respective vectors in the vector space.
20. The computer-readable medium as in claim 19, wherein mapping the entity-centric contexts to vectors in the vector space comprises: using, by the device, a trained neural network to map the entity-centric contexts to vectors in the vector space.
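The vector-space comparison and the lookup request/response recited in the claims can be sketched as follows. This is only an illustration under stated assumptions: the claims mention a trained neural network for the mapping, whereas this sketch swaps in a plain bag-of-words count vector with cosine distance. The `embed`, `cosine_distance`, and `lookup` names, the `INDEX` contents, and the `max_distance` threshold are all hypothetical.

```python
import math
from collections import Counter


def embed(context_tokens):
    """Map a context to a sparse count vector (stand-in for a learned embedding)."""
    return Counter(context_tokens)


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 1.0 if either vector is empty."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return 1.0 - dot / norm if norm else 1.0


# Hypothetical lookup index: entity -> tokens of its entity-centric context.
INDEX = {
    "10.0.0.1": ["connection", "from", "accepted", "on"],
    "10.0.0.2": ["connection", "from", "rejected", "on"],
    "10.0.0.3": ["packet", "to", "by", "firewall"],
}


def lookup(entity, index=INDEX, max_distance=0.5):
    """Return other entities whose context vectors lie near the query entity's."""
    query = embed(index[entity])
    hits = []
    for other, ctx in index.items():
        if other != entity:
            d = cosine_distance(query, embed(ctx))
            if d <= max_distance:
                hits.append((other, d))
    return sorted(hits, key=lambda hit: hit[1])


result = lookup("10.0.0.1")
```

With this made-up index, a lookup for 10.0.0.1 returns 10.0.0.2, whose context shares three of four tokens, and excludes 10.0.0.3, whose context shares none. A learned embedding would replace `embed` without changing the shape of `lookup`.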

Assignees

Cisco Tech Inc

Inventors

Classifications

  • Learning methods · CPC title

  • using logs of notifications; Post-processing of notifications · CPC title

  • Knowledge engineering; Knowledge acquisition · CPC title

  • using context · CPC title

  • Indexing structures · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2018285397A1 cover?
In one embodiment, a device in a network tokenizes a plurality of strings from unstructured log data into entity tokens and non-entity tokens. The entity tokens identify entities in the network. The device identifies patterns of tokens in the tokenized strings. The device determines entity-centric contexts from the identified patterns. A particular entity-centric context comprises a sequence of…
Who is the assignee on this patent?
Cisco Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/2228. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 4, 2018 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).