Machine-learning data analysis tool

US10607150B2 · US · B2

Patent metadata
Publication number: US-10607150-B2
Application number: US-201615050785-A
Country: US
Kind code: B2
Filing date: Feb 23, 2016
Priority date: Feb 23, 2016
Publication date: Mar 31, 2020
Grant date: Mar 31, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed herein is a computer-implemented tool that facilitates data analysis by use of machine learning (ML) techniques. The tool cooperates with a data intake and query system and provides a graphical user interface (GUI) that enables a user to train and apply a variety of different ML models on user-selected datasets of stored machine data. The tool can provide active guidance to the user, to help the user choose data analysis paths that are likely to produce useful results and to avoid data analysis paths that are less likely to produce useful results.
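The train-then-apply workflow summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the OutlierModel class, the MODEL_REGISTRY dictionary, the train_and_apply function, and the response_ms field are all hypothetical names chosen for illustration, and a simple standard-deviation rule stands in for whatever ML models the actual tool offers.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable, Dict, List


@dataclass
class OutlierModel:
    """Hypothetical stand-in model: flags values far from the training mean."""
    center: float = 0.0
    spread: float = 0.0
    k: float = 3.0

    def fit(self, values: List[float]) -> "OutlierModel":
        # "Training" here just records the mean and population std deviation.
        self.center = mean(values)
        self.spread = pstdev(values)
        return self

    def predict(self, values: List[float]) -> List[bool]:
        # A value is flagged when it lies more than k deviations from the mean.
        limit = self.k * self.spread
        return [abs(v - self.center) > limit for v in values]


# Hypothetical registry standing in for the set of selectable ML models.
MODEL_REGISTRY: Dict[str, Callable[[], OutlierModel]] = {
    "outlier_3sigma": lambda: OutlierModel(k=3.0),
    "outlier_2sigma": lambda: OutlierModel(k=2.0),
}


def train_and_apply(model_name: str,
                    train_events: List[dict],
                    apply_events: List[dict],
                    field: str) -> List[bool]:
    """Train the selected model on one dataset, then apply it to another."""
    model = MODEL_REGISTRY[model_name]()
    model.fit([event[field] for event in train_events])
    return model.predict([event[field] for event in apply_events])


# Assumed example data: the first dataset trains, the second is scored.
train = [{"response_ms": v} for v in (100, 102, 98, 101, 99)]
new = [{"response_ms": v} for v in (100, 250)]
flags = train_and_apply("outlier_3sigma", train, new, "response_ms")
print(flags)
```

In this sketch the user's choices (model, datasets, field) are plain function arguments; in the tool described by the abstract, those choices would instead be made through the GUI.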

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: generating, by a computer system, a graphical user interface that enables a user of a processing device to: select a machine learning (ML) model for training, from a plurality of selectable ML models, specify a first dataset of timestamped machine data events based on which the selected ML model is to be trained; invoke training of the selected ML model based on the first dataset and view a trained ML model as a result of the training, and invoke application of the trained ML model to a second dataset of timestamped machine data events and view a result of the application of the trained ML model to the second dataset; dynamically generating user guidance on potential analysis paths for the user to take, based on the first dataset, wherein the user guidance on potential ML analysis paths for the user to take comprises a suggested data field of the first dataset upon which to base training of the ML model; and a suggested type of ML analysis to apply; and causing the user guidance to be output to the user via the graphical user interface.

2. The method of claim 1, further comprising: receiving user inputs from the user via the graphical user interface; and in response to the user inputs, executing a process that includes at least one of: training the selected ML model on the first dataset based on the user inputs, displaying the result of the training, or applying the selected ML model to the second user-identified set of data based on the user inputs; and outputting to the user a result of the process, via the graphical user interface.

3. The method of claim 1, wherein generating the graphical user interface comprises providing a plurality of user input fields to receive user inputs that specify particular data fields of the first dataset, for training the selected ML model.

4. The method of claim 1, wherein generating the graphical user interface comprises providing a plurality of user input fields to receive user inputs that specify particular data fields for applying the selected ML model.

5. The method of claim 1, wherein generating the graphical user interface comprises: providing a user input field for receiving user input that specifies a particular data field of the first dataset, based on which the selected ML model is to be trained; identifying a subset of data fields in the first dataset as satisfying a predetermined criterion for relevance to the selected ML model; and suggesting to the user the identified subset of the data fields of the first dataset, for possible selection by the user in the user input field.

6. The method of claim 1, wherein generating the graphical user interface comprises: providing a user input field for receiving user input that specifies at least one of: a particular data field of the first dataset, based on which the selected ML model is to be trained; or a particular data field of a second dataset, for applying the selected ML model; and suggesting to the user a subset of data fields of the first dataset or a subset of the data fields of the second dataset, for possible selection by the user in the user input fields.

7. The method of claim 1, wherein the first dataset comprises a plurality of data fields, the method further comprising: identifying a subset of the data fields in the first dataset as satisfying a predetermined criterion for relevance to the selected ML model.

8. The method of claim 1, wherein the first dataset comprises a plurality of data fields, the method further comprising: identifying a subset of the data fields in the first dataset as satisfying a predetermined criterion for relevance to the selected ML model; wherein generating the graphical user interface comprises: providing a plurality of user input fields to receive user input specifying fields to use in training the ML model; and indicating to the user the subset of the data fields in the first dataset identified as satisfying the predetermined criterion for relevance, in relation to at least one of the user input fields.

9. The method of claim 1, wherein the first dataset comprises a plurality of data fields, the method further comprising: identifying a subset of the data fields in the first dataset as satisfying a predetermined criterion for relevance to the selected ML model; receiving user inputs at the plurality of user input fields, the user inputs indicating user selections of data fields in the first dataset that are to be used to train the ML model; and training the ML on the first dataset based on the user selections.

10. The method of claim 1, wherein the first dataset comprises a plurality of data fields, the method further comprising: identifying a subset of the data fields in the first dataset as satisfying a predetermined criterion for relevance to the selected ML model; and wherein identifying the subset of the data fields in the first dataset as satisfying the predetermined criterion for relevance to the selected ML model comprises executing a heuristic or a statistical method on the data fields in the first dataset to identify the subset of data fields.

11. The method of claim 1, wherein the first dataset comprises a plurality of data fields, the method further comprising: identifying a subset of the data fields in the first dataset as satisfying a predetermined criterion for relevance to the selected ML model, by: identifying a type of data upon which the selected ML model is designed to operate; and identifying fields in the first dataset that are of said type, as the subset of the data fields in the first dataset.

12. The method of claim 1, wherein the first dataset includes a plurality of existing data fields, each having one or more corresponding data values; the method further comprising: applying a transformation to data values of a plurality of data fields of the first dataset, to produce new data values; training the selected ML model based on the transformed data values; and identifying a data field to suggest to the user, for possible selection by the user, based on a result of the training the selected ML model in relation to a predetermined quality criterion.

13. The method of claim 1, wherein the first dataset includes a plurality of existing data fields, each having one or more corresponding data values; the method further comprising: applying a transformation to the data values of at least one of the existing data fields of the first dataset, to produce a plurality of new data values associated with at least one new data field; training the selected ML model based on the new data values; and identifying one of the existing data fields, for possible selection by the user, based on a result of the training the selected ML model; wherein providing the user guidance comprises indicating the identified existing data field to the user as the suggested data field.

14. The method of claim 1, wherein the first dataset includes a plurality of existing data fields, each having one or more corresponding data values; the method further comprising: applying a transformation to the data values of at least one of the existing data fields of the first dataset, to produce a plurality of new data values associated with at least one new data field; training the selected ML model based on the new data values; and identifying one of the new data fields to suggest to the user, for possible selection by the user, based on a result of the training the selected ML model; wherein providing the user guidance comprises indicating the identified new data field to the user as the s
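The type-based field suggestion recited in claim 11 (identify the type of data the selected model operates on, then pick the dataset fields of that type) might look roughly like the Python sketch below. This is an illustrative assumption, not the patent's implementation: the MODEL_INPUT_TYPE mapping, the model names, the two-way numeric/categorical split, and the "_time" timestamp field name are all hypothetical.

```python
from typing import Any, Dict, List

Event = Dict[str, Any]

# Hypothetical mapping from a selectable model to the data type it expects.
MODEL_INPUT_TYPE = {
    "linear_regression": "numeric",
    "text_clustering": "categorical",
}

# Assumed name of the timestamp field; excluded from suggestions.
TIMESTAMP_FIELD = "_time"


def field_type(events: List[Event], field: str) -> str:
    """Classify a field as numeric only if every observed value is a number."""
    values = [e[field] for e in events if field in e]
    is_numeric = bool(values) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "numeric" if is_numeric else "categorical"


def suggest_fields(events: List[Event], model_name: str) -> List[str]:
    """Return the fields whose type matches what the selected model expects."""
    wanted = MODEL_INPUT_TYPE[model_name]
    fields = sorted(
        {name for e in events for name in e if name != TIMESTAMP_FIELD}
    )
    return [f for f in fields if field_type(events, f) == wanted]


# Assumed example events standing in for the first dataset.
events = [
    {"_time": 1, "bytes": 512, "status": "ok", "latency_ms": 12.5},
    {"_time": 2, "bytes": 2048, "status": "error", "latency_ms": 40.0},
]
numeric_suggestions = suggest_fields(events, "linear_regression")
categorical_suggestions = suggest_fields(events, "text_clustering")
print(numeric_suggestions, categorical_suggestions)
```

A GUI built along these lines could present the returned list next to the relevant input field as the suggested subset. The heuristic or statistical relevance test of claim 10 would replace the simple type check in field_type.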

Assignees

Splunk Inc

Inventors

Classifications

  • G06N20/00 (Primary): Machine learning

  • using machine learning or artificial intelligence

  • Extracting rules from data

  • comprising specially adapted graphical user interfaces [GUI]

  • Network management software packages

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10607150B2 cover?
Disclosed herein is a computer-implemented tool that facilitates data analysis by use of machine learning (ML) techniques. The tool cooperates with a data intake and query system and provides a graphical user interface (GUI) that enables a user to train and apply a variety of different ML models on user-selected datasets of stored machine data. The tool can provide active guidance to the user, …
Who is the assignee on this patent?
Splunk Inc
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 31, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).