Identification of relevant data events by use of clustering

US11314733B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11314733-B2
Application number	US-201916263572-A
Country	US
Kind code	B2
Filing date	Jan 31, 2019
Priority date	Jul 31, 2014
Publication date	Apr 26, 2022
Grant date	Apr 26, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A processing device performs a preliminary grouping of data items in a dataset to define one or more clusters and for each cluster, identifies a set of search terms for a search query that would retrieve data items in the cluster upon execution of the search query against the dataset.
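The flow in the abstract (group data items first, then derive search terms per cluster) can be illustrated with a minimal sketch. It is not the patented implementation: the tokenizer, the greedy single-pass clustering, the use of `difflib.SequenceMatcher` as the similarity measure, the 0.7 threshold, and the function names `keywords`, `cluster_events`, `search_terms`, and `run_query` are all assumptions made for illustration. The ordered keyword lists and similarity threshold loosely follow the wording of the dependent claims shown further down the page.

```python
import re
from difflib import SequenceMatcher


def keywords(event):
    """Ordered list of alphabetic keywords, in the order they appear in the event."""
    return re.findall(r"[A-Za-z_]+", event.lower())


def cluster_events(events, threshold=0.7):
    """Greedy grouping (an assumed strategy): an event joins the first cluster
    whose representative keyword list is similar enough, else starts a new one."""
    clusters = []  # each cluster is a list of event indices
    for i, event in enumerate(events):
        kw = keywords(event)
        for cluster in clusters:
            representative = keywords(events[cluster[0]])
            if SequenceMatcher(None, representative, kw).ratio() >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def search_terms(events, cluster):
    """Keywords present in every event of the cluster; requiring all of them
    in a query retrieves at least every event in the cluster."""
    common = set(keywords(events[cluster[0]]))
    for i in cluster[1:]:
        common &= set(keywords(events[i]))
    return sorted(common)


def run_query(events, terms):
    """Indices of events that contain every search term."""
    return [i for i, e in enumerate(events) if set(terms) <= set(keywords(e))]


events = [
    "2019-01-31 10:00:01 ERROR disk full on host web01",
    "2019-01-31 10:00:05 ERROR disk full on host web02",
    "2019-01-31 10:01:00 INFO user alice logged in",
    "2019-01-31 10:02:00 INFO user bob logged in",
]
for cluster in cluster_events(events):
    terms = search_terms(events, cluster)
    print(cluster, terms, run_query(events, terms))
```

Because the generated query is run against the whole dataset, it can also match events outside the original cluster, which is the behavior the first claim describes for the second search query.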

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising:
(a) receiving, by a computer system, first user input that specifies criteria for a first search query; and
(b) in response to the first user input that specifies criteria for the first search query, by the computer system,
(b)(1) executing the first search query by accessing events in a data store to obtain a search result dataset, the search result dataset including a plurality of events, each event in the search result dataset being associated with a time stamp and containing raw machine-generated data indicative of performance or operation of a component in an information-technology environment, wherein the raw machine-generated data contained in each event includes a plurality of strings associated with a corresponding time stamp, and wherein each string includes text, numbers, or a combination of text and numbers;
(b)(2) applying a clustering algorithm to the accessed events to form a cluster of events, wherein the cluster includes fewer than all of the events in the search result dataset;
(b)(3) after formation of the cluster, creating, based on contents of the cluster, a second search query including a set of one or more search terms, wherein the second search query is not specified by user input, and wherein the second search query is designed to retrieve at least one of the events of the cluster, and associating the second search query with the cluster; and
(b)(4) causing a display of information about the cluster, including an identification of the cluster and the second search query, the information about the cluster being selectable by a user to cause execution of the second search query to identify events of the cluster and one or more additional events that are not part of the cluster.

2. The method of claim 1, wherein the clustering algorithm comprises generating, for each event, an ordered list of keywords contained in the event.

3. The method of claim 1, wherein defining the set of search terms for the second search query for the cluster comprises determining search terms that, when applied to the data store, produce a set of events that includes each of the events in the cluster.

4. The method of claim 1, wherein the clustering algorithm comprises generating, for each event, an ordered list of keywords contained in the event, and wherein an ordering of the keywords in the ordered list of keywords for any particular event is based on positions of the keywords within the particular event.

5. The method of claim 1, wherein the clustering algorithm comprises generating, for each event, an ordered list of keywords contained in the event, the method further comprising: grouping events into the cluster when their respective ordered lists of keywords meet a similarity threshold.

6. The method of claim 1, wherein the clustering algorithm comprises generating, for each event, an ordered list of keywords contained in the event, the method further comprising: grouping events into the cluster when their respective ordered lists of keywords meet a similarity threshold, wherein an ordering of the keywords in the ordered list of keywords for any particular event is based on positions of the keywords within the particular event.

7. The method of claim 1, further comprising:
(c) receiving second user input for selecting the cluster, the second user input being responsive to display of the identification of the cluster; and
(d) in response to the second user input for selecting the cluster,
(d)(1) executing the second search query against the data store to retrieve stored events that satisfy a criterion for similarity to the cluster; and
(d)(2) causing display, to the user, of a result of the second search query, including causing display of an event that satisfies the second search query.

8. The method of claim 1, wherein execution of the second search query against the accessed events includes evaluation of the search terms against the raw machine-generated data in the accessed events.

9. The method of claim 1, wherein the data store is a field-searchable data store.

10. The method of claim 1, wherein each of the search terms requires at least one of: a presence of a particular keyword in the events, an absence of a particular keyword in the events, or meeting a criterion pertaining to a field in the events.

11. The method of claim 1, wherein creating the second search query comprises: testing alternative combinations of search terms to discover one combination that better reproduces the events in the cluster than another combination when applied to the field-searchable data store.

12. The method of claim 1, further comprising saving the second search query as an event type corresponding to the cluster.

13. The method of claim 1, further comprising: saving the second search query as an event type that includes a reference name for the event type; executing the second search query defining the event type; and tagging events retrieved by the search query with a tag corresponding to the reference name.

14. The method of claim 1, further comprising: saving the second search query as an event type that includes a reference name for the event type; determining that a particular event that has been displayed to a user satisfies criteria of the second search query; and displaying the reference name for the event type in association with information about the particular event.

15. The method of claim 1, wherein applying the clustering algorithm to the events includes identifying one or more tokens in each event, the tokens comprising keywords, and wherein the clustering algorithm includes generating a token vector for each of the events, each token vector including tokens for an event; and grouping events having token vectors that have a similarity within a similarity threshold into the cluster.

16. The method of claim 1, wherein identifying the set of search terms comprises identifying one or more tokens included in the events in the cluster, the tokens comprising keywords.

17. The method of claim 1, wherein identifying the set of search terms comprises identifying each of the events that contains a particular token.

18. The method of claim 1, wherein identifying the set of search terms comprises determining a percentage of events that include a given token in each of the one or more clusters.

19. The method of claim 1, wherein identifying the set of search terms comprises: determining a percentage of events that include a given token in each of the one or more clusters; and averaging the determined percentages for each of the one or more clusters.

20. The method of claim 1, wherein identifying the set of search terms comprises determining a variance, across each of the one or more clusters, in a percentage of events in the one or more clusters that include a given token.

21. The method of claim 1, wherein applying the clustering algorithm to the events includes identifying one or more tokens in each event, the tokens comprising keywords, and wherein identifying the set of search terms comprises calculating a relevance score for each token in each of the one or more clusters, wherein the relevance score is based at least in part on a percentage of events in each of the one or more clusters that include a given token, an average of the percentages for each of the one or more clusters, and a variance in the percentages.

22. The method of claim 1, wherein applying the clustering algorithm
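Claims 18 to 21 name three ingredients for a token relevance score: the percentage of events in each cluster that contain the token, the average of those percentages, and their variance. The preview does not give a formula, so the sketch below combines them in one plausible way, purely as an assumption: a token scores high for a cluster when its share there is above the cross-cluster average and its shares vary strongly between clusters. The function names and the input shape (each cluster as a list of per-event token sets) are hypothetical.

```python
from math import sqrt
from statistics import mean, pvariance


def token_percentages(clusters):
    """Map each token to its per-cluster share of events containing it.

    `clusters` is a list of clusters; each cluster is a non-empty list of
    token sets, one set per event.
    """
    vocab = set().union(*(tokens for cluster in clusters for tokens in cluster))
    return {
        tok: [sum(tok in ev for ev in cluster) / len(cluster) for cluster in clusters]
        for tok in vocab
    }


def relevance_scores(clusters):
    """Assumed scoring rule (not taken from the patent text):
    score = (share in cluster - average share) * sqrt(variance of shares)."""
    scores = {}
    for tok, shares in token_percentages(clusters).items():
        avg = mean(shares)
        var = pvariance(shares)
        scores[tok] = [(share - avg) * sqrt(var) for share in shares]
    return scores


clusters = [
    [{"error", "disk"}, {"error", "disk"}],   # cluster 0
    [{"info", "user"}, {"info", "error"}],    # cluster 1
]
scores = relevance_scores(clusters)
for tok in sorted(scores):
    print(tok, scores[tok])
```

In this toy data, "disk" appears only in cluster 0, while "error" also appears in half of cluster 1, so "disk" receives the higher score for cluster 0 and would be the stronger candidate search term under this assumed rule.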

Assignees

Splunk Inc

Inventors

Classifications

G06F16/285
Clustering or classification · CPC title
G06F16/242 · Primary
Query formulation · CPC title

Patent family

Related publications grouped by family.

View patent family 55180243

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11314733B2 cover? A processing device performs a preliminary grouping of data items in a dataset to define one or more clusters and for each cluster, identifies a set of search terms for a search query that would retrieve data items in the cluster upon execution of the search query against the dataset.
Who is the assignee on this patent? Splunk Inc
What technology area does this patent fall under? The primary CPC classification is G06F16/242. Mapped technology areas include Physics.
When was this patent published? Publication date: Apr 26, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb? We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Methods and systems for searching logical patterns

Asynchronous processing of messages from multiple search peers

External malware data item clustering and analysis

Content-oriented federated object store

Distributed log collector and report generation
