Identification of relevant data events by use of clustering

US11314733B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11314733-B2
Application number  US-201916263572-A
Country             US
Kind code           B2
Filing date         Jan 31, 2019
Priority date       Jul 31, 2014
Publication date    Apr 26, 2022
Grant date          Apr 26, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A processing device performs a preliminary grouping of data items in a dataset to define one or more clusters and for each cluster, identifies a set of search terms for a search query that would retrieve data items in the cluster upon execution of the search query against the dataset.
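The two steps in the abstract (group first, then derive a query per group) can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the helper names `keywords`, `cluster_events`, and `search_terms`, the greedy single-pass grouping, the use of `difflib.SequenceMatcher` as the similarity measure, the 0.7 threshold, and the sample log lines are all hypothetical choices made for this example.

```python
from difflib import SequenceMatcher


def keywords(event):
    """Ordered list of purely alphabetic tokens, in order of appearance."""
    return [tok for tok in event.lower().split() if tok.isalpha()]


def cluster_events(events, threshold=0.7):
    """Greedy grouping (an assumption): an event joins the first cluster whose
    first member has a similar enough keyword list, else starts a new cluster."""
    clusters = []  # each cluster is a list of (event, keyword_list) pairs
    for event in events:
        kws = keywords(event)
        for cluster in clusters:
            representative = cluster[0][1]
            if SequenceMatcher(None, representative, kws).ratio() >= threshold:
                cluster.append((event, kws))
                break
        else:
            clusters.append([(event, kws)])
    return [[event for event, _ in cluster] for cluster in clusters]


def search_terms(cluster):
    """Keywords found in every event of the cluster, in first-event order."""
    common = set.intersection(*(set(keywords(e)) for e in cluster))
    ordered = []
    for kw in keywords(cluster[0]):
        if kw in common and kw not in ordered:
            ordered.append(kw)
    return ordered


# Made-up sample events, for illustration only.
events = [
    "2019-01-31 10:00:01 ERROR disk full on host web01",
    "2019-01-31 10:00:07 ERROR disk full on host web02",
    "2019-01-31 10:01:12 INFO user alice logged in",
    "2019-01-31 10:02:30 INFO user bob logged in",
]
clusters = cluster_events(events)
queries = [" AND ".join(search_terms(c)) for c in clusters]
for query, cluster in zip(queries, clusters):
    print(f"{len(cluster)} events -> {query}")
```

Because every term is taken from the intersection of the cluster's keywords, each generated query matches all events in its cluster. It may also match events outside the cluster, which is consistent with the claim language about retrieving "one or more additional events that are not part of the cluster".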

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising:
    (a) receiving, by a computer system, first user input that specifies criteria for a first search query; and
    (b) in response to the first user input that specifies criteria for the first search query, by the computer system,
        (b)(1) executing the first search query by accessing events in a data store to obtain a search result dataset, the search result dataset including a plurality of events, each event in the search result dataset being associated with a time stamp and containing raw machine-generated data indicative of performance or operation of a component in an information-technology environment, wherein the raw machine-generated data contained in each event includes a plurality of strings associated with a corresponding time stamp, and wherein each string includes text, numbers, or a combination of text and numbers;
        (b)(2) applying a clustering algorithm to the accessed events to form a cluster of events, wherein the cluster includes fewer than all of the events in the search result dataset;
        (b)(3) after formation of the cluster, creating, based on contents of the cluster, a second search query including a set of one or more search terms, wherein the second search query is not specified by user input, and wherein the second search query is designed to retrieve at least one of the events of the cluster, and associating the second search query with the cluster; and
        (b)(4) causing a display of information about the cluster, including an identification of the cluster and the second search query, the information about the cluster being selectable by a user to cause execution of the second search query to identify events of the cluster and one or more additional events that are not part of the cluster.

2. The method of claim 1, wherein the clustering algorithm comprises generating, for each event, an ordered list of keywords contained in the event.

3. The method of claim 1, wherein defining the set of search terms for the second search query for the cluster comprises determining search terms that, when applied to the data store, produce a set of events that includes each of the events in the cluster.

4. The method of claim 1, wherein the clustering algorithm comprises generating, for each event, an ordered list of keywords contained in the event, and wherein an ordering of the keywords in the ordered list of keywords for any particular event is based on positions of the keywords within the particular event.

5. The method of claim 1, wherein the clustering algorithm comprises generating, for each event, an ordered list of keywords contained in the event, the method further comprising: grouping events into the cluster when their respective ordered lists of keywords meet a similarity threshold.

6. The method of claim 1, wherein the clustering algorithm comprises generating, for each event, an ordered list of keywords contained in the event, the method further comprising: grouping events into the cluster when their respective ordered lists of keywords meet a similarity threshold, wherein an ordering of the keywords in the ordered list of keywords for any particular event is based on positions of the keywords within the particular event.

7. The method of claim 1, further comprising:
    (c) receiving second user input for selecting the cluster, the second user input being responsive to display of the identification of the cluster; and
    (d) in response to the second user input for selecting the cluster,
        (d)(1) executing the second search query against the data store to retrieve stored events that satisfy a criterion for similarity to the cluster; and
        (d)(2) causing display, to the user, of a result of the second search query, including causing display of an event that satisfies the second search query.

8. The method of claim 1, wherein execution of the second search query against the accessed events includes evaluation of the search terms against the raw machine-generated data in the accessed events.

9. The method of claim 1, wherein the data store is a field-searchable data store.

10. The method of claim 1, wherein each of the search terms requires at least one of: a presence of a particular keyword in the events, an absence of a particular keyword in the events, or meeting a criterion pertaining to a field in the events.

11. The method of claim 1, wherein creating the second search query comprises: testing alternative combinations of search terms to discover one combination that better reproduces the events in the cluster than another combination when applied to the field-searchable data store.

12. The method of claim 1, further comprising saving the second search query as an event type corresponding to the cluster.

13. The method of claim 1, further comprising: saving the second search query as an event type that includes a reference name for the event type; executing the second search query defining the event type; and tagging events retrieved by the search query with a tag corresponding to the reference name.

14. The method of claim 1, further comprising: saving the second search query as an event type that includes a reference name for the event type; determining that a particular event that has been displayed to a user satisfies criteria of the second search query; and displaying the reference name for the event type in association with information about the particular event.

15. The method of claim 1, wherein applying the clustering algorithm to the events includes identifying one or more tokens in each event, the tokens comprising keywords, and wherein the clustering algorithm includes generating a token vector for each of the events, each token vector including tokens for an event; and grouping events having token vectors that have a similarity within a similarity threshold into the cluster.

16. The method of claim 1, wherein identifying the set of search terms comprises identifying one or more tokens included in the events in the cluster, the tokens comprising keywords.

17. The method of claim 1, wherein identifying the set of search terms comprises identifying each of the events that contains a particular token.

18. The method of claim 1, wherein identifying the set of search terms comprises determining a percentage of events that include a given token in each of the one or more clusters.

19. The method of claim 1, wherein identifying the set of search terms comprises: determining a percentage of events that include a given token in each of the one or more clusters; and averaging the determined percentages for each of the one or more clusters.

20. The method of claim 1, wherein identifying the set of search terms comprises determining a variance, across each of the one or more clusters, in a percentage of events in the one or more clusters that include a given token.

21. The method of claim 1, wherein applying the clustering algorithm to the events includes identifying one or more tokens in each event, the tokens comprising keywords, and wherein identifying the set of search terms comprises calculating a relevance score for each token in each of the one or more clusters, wherein the relevance score is based at least in part on a percentage of events in each of the one or more clusters that include a given token, an average of the percentages for each of the one or more clusters, and a variance in the percentages.

22. The method of claim 1, wherein applying the clustering algorithm
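Claims 18 to 21 name three inputs to a per-token relevance score: the percentage of events in each cluster that contain the token, the average of those percentages, and their variance. The claim text shown here does not give a formula, so the Python sketch below combines them in one arbitrary way, `(p - average) + variance`, purely for illustration. The function names and the sample clusters are hypothetical.

```python
from statistics import mean, pvariance


def token_percentages(clusters):
    """Map each token to a list of fractions, one per cluster, giving the
    share of that cluster's events that contain the token."""
    token_sets = [[set(e.lower().split()) for e in c] for c in clusters]
    vocab = set().union(*(s for c in token_sets for s in c))
    return {
        tok: [sum(tok in s for s in c) / len(c) for c in token_sets]
        for tok in vocab
    }


def relevance_scores(clusters):
    """Illustrative score per (cluster index, token). The combination below
    is an assumption made for this sketch, not the patent's formula."""
    scores = {}
    for tok, pcts in token_percentages(clusters).items():
        avg = mean(pcts)        # average of the per-cluster percentages
        var = pvariance(pcts)   # variance of the per-cluster percentages
        for i, p in enumerate(pcts):
            scores[(i, tok)] = (p - avg) + var
    return scores


# Made-up clusters, for illustration only.
clusters = [
    ["ERROR disk full on host web01", "ERROR disk full on host web02"],
    ["INFO user alice logged in", "INFO user bob logged in"],
]
scores = relevance_scores(clusters)
best = sorted(
    (tok for (i, tok) in scores if i == 0),
    key=lambda tok: scores[(0, tok)],
    reverse=True,
)
print(best[:5])
```

Under this scoring, a token that appears in every event of one cluster and nowhere else (such as `error` here) ranks above a token that appears in only some of the cluster's events (such as `web01`), which is the behavior one would want when choosing search terms for that cluster.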

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11314733B2 cover?
A processing device performs a preliminary grouping of data items in a dataset to define one or more clusters and for each cluster, identifies a set of search terms for a search query that would retrieve data items in the cluster upon execution of the search query against the dataset.
Who is the assignee on this patent?
Splunk Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/242. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 26, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).