Generating summaries of messages associated with assets in an enterprise system

US11025658B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11025658-B2
Application number  US-201916402313-A
Country             US
Kind code           B2
Filing date         May 3, 2019
Priority date       May 3, 2019
Publication date    Jun 1, 2021
Grant date          Jun 1, 2021

Abstract

Official abstract text for this publication.

A method includes obtaining messages associated with assets in an enterprise system, splitting each of the messages into a set of tokens, determining a count of a number of occurrences of each of the tokens, and assigning weights to the tokens based at least in part on the counts of the number of occurrences of the tokens. The method also includes determining a score for each of the messages based at least in part on a combined sum of the weights for the set of tokens of that message, and generating a summary of the messages by selecting a subset of the messages for inclusion in the summary based at least in part on the scores. The method further includes identifying remedial actions to be applied to assets in the enterprise system based at least in part on the summary of the messages, and implementing at least one of the identified remedial actions.
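The tokenize, count, weight, score, and select steps in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the function names, the tokenizer (split on non-alphanumeric characters), and the weight formula log(1 + count). The abstract says only that weights are based at least in part on the counts, so other formulas, including ones that favor rare tokens, would fit the same outline.

```python
import math
import re
from collections import Counter


def tokenize(message):
    # Assumption: lowercase and split on runs of non-alphanumeric characters.
    return [t for t in re.split(r"[^A-Za-z0-9]+", message.lower()) if t]


def token_weights(messages):
    # Count every occurrence of each token across all messages.
    counts = Counter(tok for msg in messages for tok in tokenize(msg))
    # Assumed weighting: log(1 + count). The source does not give a formula.
    return {tok: math.log(1 + c) for tok, c in counts.items()}


def score(message, weights):
    # Sum the weights over the set of distinct tokens in the message.
    return sum(weights[t] for t in sorted(set(tokenize(message))))


def summarize(messages, budget):
    # Keep the `budget` highest-scoring messages as the summary.
    weights = token_weights(messages)
    ranked = sorted(messages, key=lambda m: score(m, weights), reverse=True)
    return ranked[:budget]


if __name__ == "__main__":
    logs = ["disk full on host1", "disk full on host2", "login ok"]
    print(summarize(logs, 2))
```

With this assumed weighting, messages built from frequently seen tokens rise to the top. The remedial-action steps of the method are outside the scope of this sketch.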

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: obtaining a plurality of messages associated with one or more assets in an enterprise system, a given one of the assets comprising at least one of a physical computing resource and a virtual computing resource in the enterprise system; splitting each of the plurality of messages into a set of tokens; determining a count of a number of occurrences of each of the tokens in the plurality of messages; assigning weights to each of the tokens, the weight assigned to a given one of the tokens being based at least in part on the count of the number of occurrences of the given token in the plurality of messages; determining a score for each of the plurality of messages, the score for a given one of the plurality of messages being based at least in part on a combined sum of the weights for the set of tokens of the given message; generating a summary of the plurality of messages by selecting a subset of the plurality of messages for inclusion in the summary based at least in part on the scores for the plurality of messages; identifying one or more remedial actions to be applied to at least one of the assets in the enterprise system based at least in part on the summary of the plurality of messages; and implementing at least one of the identified remedial actions for the at least one asset in the enterprise system; wherein the method is performed by at least one processing device comprising a processor coupled to a memory.

2. The method of claim 1 wherein the plurality of messages comprises log messages obtained from the one or more assets in the enterprise system.

3. The method of claim 1 wherein the plurality of messages comprise representations of network sessions between pairs of assets in the enterprise system.

4. The method of claim 1 wherein splitting a given one of the plurality of messages comprises splitting the given message into a sequence of strings using one or more natural language processing delimiters, each string sequence corresponding to one of the set of tokens.

5. The method of claim 4 wherein splitting a given one of the plurality of messages comprises recognizing one or more designated special string sequences.

6. The method of claim 5 wherein the one or more designated special string sequences comprise at least one of: names of entities in the enterprise system; Internet Protocol (IP) addresses; uniform resource identifiers (URIs); and dates and times.

7. The method of claim 5 wherein determining the count of the number of occurrences of each of the tokens in the plurality of messages comprises, for a given one of the designated special string sequences, determining a count of the number of occurrences of all string sequences recognized as the given designated special string sequence.

8. The method of claim 4 further comprising defining semantic equivalence between two or more distinct string sequences, wherein determining the count of the number of occurrences of each of the tokens comprises maintaining a single count of the number of occurrences of each of the two or more distinct string sequences with defined semantic equivalence.

9. The method of claim 1 further comprising removing one or more of the set of tokens having a length less than a first designated threshold or a length greater than a second designated threshold from the set of tokens for a given one of the plurality of messages.

10. The method of claim 1 wherein assigning weights to each of the tokens utilizes at least one of log weight and entropy weight.

11. The method of claim 1 wherein assigning weights to each of the tokens comprises utilizing one or more user-defined weights to increase or decrease the weight assigned to one or more designated tokens.

12. The method of claim 1 wherein generating the summary of the plurality of messages comprises specifying a budget comprising a designated number of messages to include in the summary.

13. The method of claim 12 wherein selecting the subset of the plurality of messages for inclusion in the summary comprises: selecting, from a collection comprising at least a subset of the plurality of messages, a message with a highest score; removing the selected message from the collection; and repeating the selecting and removing until the specified budget is reached.

14. The method of claim 12 wherein selecting the subset of the plurality of messages for inclusion in the summary comprises: selecting, from a collection comprising at least a subset of the plurality of messages, a message with a highest score for tokens not yet present in messages selected for the summary; removing the selected message from the collection; and repeating the selecting and removing until the specified budget is reached.

15. A computer program product comprising a non-transitory processor-readable storage medium having stored therein program code of one or more software programs, wherein the program code when executed by at least one processing device causes the at least one processing device: to obtain a plurality of messages associated with one or more assets in an enterprise system, a given one of the assets comprising at least one of a physical computing resource and a virtual computing resource in the enterprise system; to split each of the plurality of messages into a set of tokens; to determine a count of a number of occurrences of each of the tokens in the plurality of messages; to assign weights to each of the tokens, the weight assigned to a given one of the tokens being based at least in part on the count of the number of occurrences of the given token in the plurality of messages; to determine a score for each of the plurality of messages, the score for a given one of the plurality of messages being based at least in part on a combined sum of the weights for the set of tokens of the given message; to generate a summary of the plurality of messages by selecting a subset of the plurality of messages for inclusion in the summary based at least in part on the scores for the plurality of messages; to identify one or more remedial actions to be applied to at least one of the assets in the enterprise system based at least in part on the summary of the plurality of messages; and to implement at least one of the identified remedial actions for the at least one asset in the enterprise system.

16. The computer program product of claim 15 wherein generating the summary of the plurality of messages comprises specifying a budget comprising a designated number of messages to include in the summary, and wherein selecting the subset of the plurality of messages for inclusion in the summary comprises: selecting, from a collection comprising at least a subset of the plurality of messages, a message with a highest score; removing the selected message from the collection; and repeating the selecting and removing until the specified budget is reached.

17. The computer program product of claim 15 wherein generating the summary of the plurality of messages comprises specifying a budget comprising a designated number of messages to include in the summary, and wherein selecting the subset of the plurality of messages for inclusion in the summary comprises: selecting, from a collection comprising at least a subset of the plurality of messages, a message with a highest score for tokens not yet present in messages selected for the summary; removing the selected message from the collection; and repeating the selecting and removing until the specified budget is reached.

18. An app
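The two budgeted selection loops in claims 13 and 14 differ only in how a candidate message is scored on each pass, which a short Python sketch can make concrete. The function name `select_messages`, the `novel_only` flag, whitespace tokenization, and the example weights are all hypothetical choices for illustration. The claims do not specify any of them.

```python
def select_messages(messages, weights, budget, novel_only=False):
    """Greedily pick up to `budget` messages by summed token weight.

    novel_only=False follows the outline of claim 13 (plain highest score).
    novel_only=True follows the outline of claim 14 (score counts only
    tokens not yet present in messages already selected).
    """
    collection = list(messages)   # working copy; selected items are removed
    covered = set()               # tokens already present in the summary
    summary = []

    def current_score(msg):
        tokens = set(msg.split())  # assumption: whitespace tokenization
        if novel_only:
            tokens -= covered
        return sum(weights.get(t, 0.0) for t in tokens)

    while collection and len(summary) < budget:
        best = max(collection, key=current_score)  # first maximum wins ties
        collection.remove(best)
        summary.append(best)
        covered |= set(best.split())
    return summary


if __name__ == "__main__":
    # Hypothetical weights, chosen only to show the difference.
    weights = {"disk": 3.0, "full": 3.0, "host1": 1.0, "host2": 1.0,
               "login": 2.0, "failed": 2.0}
    msgs = ["disk full host1", "disk full host2", "login failed"]
    print(select_messages(msgs, weights, 2))
    print(select_messages(msgs, weights, 2, novel_only=True))
```

In this example the plain variant returns two near-duplicate disk messages, while the novelty-aware variant replaces the second one with "login failed", because "disk" and "full" no longer contribute to the score once they are covered.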

Assignees

EMC IP Holding Co LLC

Classifications

  • Countermeasures against malicious traffic (countermeasures against attacks on cryptographic mechanisms H04L9/002)

  • involving event detection and direct action

  • G06F40/295 (Primary): Named entity recognition

  • Lexical analysis, e.g. tokenisation or collocates

  • Traffic logging, e.g. anomaly detection

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11025658B2 cover?
A method includes obtaining messages associated with assets in an enterprise system, splitting each of the messages into a set of tokens, determining a count of a number of occurrences of each of the tokens, and assigning weights to the tokens based at least in part on the counts of the number of occurrences of the tokens. The method also includes determining a score for each of the messages ba…
Who is the assignee on this patent?
EMC IP Holding Co LLC
What technology area does this patent fall under?
Primary CPC classification G06F40/295. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 1, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).