Advanced analytical infrastructure for machine learning

US2016358099A1 · US · A1

Patent metadata
Publication number: US-2016358099-A1
Application number: US-201514730655-A
Country: US
Kind code: A1
Filing date: Jun 4, 2015
Priority date: Jun 4, 2015
Publication date: Dec 8, 2016
Grant date: not listed

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Machine learning systems and computerized methods to compare candidate machine learning algorithms are disclosed. The machine learning system comprises a machine learning algorithm library, a data input module to receive a dataset and a selection of machine learning models derived from the machine learning algorithm library, an experiment module, and an aggregation module. The experiment module is configured to train and evaluate each machine learning model to produce a performance result for each machine learning model. The aggregation module is configured to aggregate the performance results for all of the machine learning models to form performance comparison statistics. Computerized methods include receiving a dataset, receiving a selection of machine learning models, training and evaluating each machine learning model to produce a performance result for each machine learning model, aggregating the performance results to form performance comparison statistics, and presenting the performance comparison statistics.
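The four components named in the abstract (algorithm library, data input, experiment module, aggregation module) can be pictured with a small sketch. Everything below is illustrative rather than taken from the patent: the function names, the two toy algorithms, the fixed 50/50 split, and the choice of accuracy as the performance result are all assumptions.

```python
from collections import Counter
from statistics import mean

# Hypothetical "common interface": every algorithm takes training rows,
# training labels, and keyword parameters, and returns a predictor function.


def majority_class(X, y, **params):
    """Toy algorithm: always predict the most frequent training label."""
    most_common = Counter(y).most_common(1)[0][0]
    return lambda row: most_common


def threshold_rule(X, y, feature=0, threshold=0.5):
    """Toy algorithm: predict 1 when one feature exceeds a threshold."""
    return lambda row: 1 if row[feature] > threshold else 0


# Stand-in for the machine learning algorithm library.
ALGORITHM_LIBRARY = {
    "majority_class": majority_class,
    "threshold_rule": threshold_rule,
}


def run_experiment(dataset, labels, models, train_fraction=0.5):
    """Experiment module sketch: train and evaluate each selected model.

    `models` maps a model name to (algorithm name, parameter values).
    Accuracy on the held-out rows is the assumed performance result.
    """
    split = int(len(dataset) * train_fraction)
    X_train, y_train = dataset[:split], labels[:split]
    X_eval, y_eval = dataset[split:], labels[split:]
    results = {}
    for name, (algorithm_name, params) in models.items():
        predictor = ALGORITHM_LIBRARY[algorithm_name](X_train, y_train, **params)
        correct = sum(predictor(row) == label for row, label in zip(X_eval, y_eval))
        results[name] = correct / len(y_eval)
    return results


def aggregate(results):
    """Aggregation module sketch: combine per-model results for comparison."""
    ranking = sorted(results, key=results.get, reverse=True)
    return {
        "best": ranking[0],
        "ranking": ranking,
        "mean_accuracy": mean(results.values()),
    }
```

A caller would pass a dataset, its labels, and a dictionary of model selections to `run_experiment`, then hand the returned results to `aggregate` to get a ranking that can be presented side by side.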

First claim

Opening claim text (preview).

1. A machine learning system to compare candidate machine learning algorithms for a particular data analysis problem, the machine learning system comprising: a machine learning algorithm library that includes a plurality of machine learning algorithms configured to be tested with a common interface; a data input module configured to receive a dataset and a selection of machine learning models, wherein each machine learning model includes a machine learning algorithm from the machine learning algorithm library and one or more associated parameter values; an experiment module configured to train and evaluate each machine learning model to produce a performance result for each machine learning model; and an aggregation module configured to aggregate the performance results for all of the machine learning models to form performance comparison statistics.

2. The machine learning system of claim 1, wherein the common interface defines at least one of a common input, a common output, a common method for inputting data, a common method for outputting data, and a common procedure call for each machine learning algorithm of the machine learning algorithm library.

3. The machine learning system of claim 1, further comprising a data preprocessor configured to prepare the dataset for processing by the experiment module, wherein the data preprocessor is configured to at least one of discretize, apply independent component analysis to, apply principal component analysis to, eliminate missing data from, select features from, and extract features from the dataset.

4. The machine learning system of claim 3, wherein the data preprocessor is configured to extract a feature by at least determining a statistic of feature data during a time window, wherein the statistic includes at least one of a minimum, a maximum, an average, a variance, a deviation, a cumulative value, a rate of change, and an average rate of change.

5. The machine learning system of claim 1, further comprising a preprocessing algorithm library that includes a plurality of preprocessing algorithms and wherein the preprocessing algorithms conform to a common preprocessing interface.

6. The machine learning system of claim 1, wherein at least one machine learning model is a macro-procedure that combines outcomes of an ensemble of micro-procedures, wherein each micro-procedure includes a machine learning algorithm and one or more associated parameter values, wherein the macro-procedure is configured to combine the outcomes of the ensemble of micro-procedures by at least one of cumulative value, maximum value, minimum value, median value, average value, mode value, most common value, and majority vote.

7. The machine learning system of claim 6, wherein, for each macro-procedure, the experiment module is configured to generate a trained macro-procedure by independently training each micro-procedure to produce an ensemble of trained micro-procedures, and the experiment module is configured to evaluate the trained macro-procedure.

8. The machine learning system of claim 1, wherein the experiment module is configured to divide the dataset into a training dataset and an evaluation dataset, and wherein the training dataset and the evaluation dataset are complementary subsets of the dataset.

9. The machine learning system of claim 8, wherein the experiment module is configured to preprocess the training dataset to result in a preprocessing scheme and wherein the experiment module is configured to preprocess the evaluation dataset with the preprocessing scheme.

10. The machine learning system of claim 1, wherein the experiment module is configured to train each machine learning model with a training dataset that is a subset of the dataset to produce a trained model for each machine learning model, and wherein the experiment module is configured to evaluate each trained model with an evaluation dataset that is a subset of the dataset to produce the performance result for each machine learning model.

11. The machine learning system of claim 1, wherein the experiment module is configured to cross validate each machine learning model using at least one of leave-one-out cross validation and k-fold cross validation.

12. The machine learning system of claim 1, further comprising a presentation module configured to present the performance comparison statistics, wherein the presentation module is configured to present the performance results for all of the machine learning models in a unified format to facilitate comparison of the machine learning models.

13. A computerized method for testing machine learning algorithms, the method comprising: receiving a dataset; receiving a selection of machine learning models, wherein each machine learning model includes a machine learning algorithm and one or more associated parameter values; training and evaluating each machine learning model to produce a performance result for each machine learning model; aggregating the performance results for all of the machine learning models to form performance comparison statistics; and presenting the performance comparison statistics.

14. The method of claim 13, wherein the dataset is a time-series dataset that includes a series of values of an observable measured in successive periods of time.

15. The method of claim 13, further comprising, before the training and evaluating, global preprocessing the dataset, and wherein the global preprocessing includes at least one of discretization, independent component analysis, principal component analysis, elimination of missing data, feature selection, and feature extraction.

16. The method of claim 15, wherein the global preprocessing includes extracting a feature by at least determining a statistic of feature data during a time window, and wherein the statistic includes at least one of a minimum, a maximum, an average, a variance, a deviation, a cumulative value, a rate of change, and an average rate of change.

17. The method of claim 13, wherein at least one machine learning model is a macro-procedure that combines outcomes of an ensemble of micro-procedures, wherein each micro-procedure includes a machine learning algorithm and one or more associated parameter values, and wherein the macro-procedure is configured to combine the outcomes of the ensemble of micro-procedures by at least one of cumulative value, maximum value, minimum value, median value, average value, mode value, most common value, and majority vote.

18. The method of claim 13, wherein the training and evaluating includes dividing the dataset into a training dataset and an evaluation dataset, and wherein the training dataset and the evaluation dataset are complementary subsets of the dataset, wherein the training and evaluating includes preprocessing the training dataset to generate a preprocessing scheme and wherein the training and evaluating includes preprocessing the evaluation dataset with the preprocessing scheme.

19. The method of claim 13, wherein the training and evaluating includes training each machine learning model with a training dataset that is a subset of the dataset to produce a trained model for each machine learning model, wherein the training and evaluating includes evaluating each trained model with an evaluation dataset that is a subset of the dataset to produce the performance result for each machine learning model, and wherein the evaluation dataset and the training dataset are complementary subsets of the dataset.

20. The met
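Three mechanics named in the dependent claims lend themselves to a short illustration: window statistics for feature extraction (claims 4 and 16), a preprocessing scheme derived from the training data and reused on the evaluation data (claims 9 and 18), and a macro-procedure that combines micro-procedure outcomes by majority vote (claims 6 and 17). The sketch below is an assumption-laden reading, not the patent's implementation: the function names are hypothetical, standardization is just one possible preprocessing scheme, and the average rate of change assumes evenly spaced samples one time unit apart.

```python
from collections import Counter
from statistics import mean, pstdev


def window_statistics(values, start, end):
    """Summarize feature data inside the window values[start:end].

    Assumes the window holds at least two evenly spaced samples.
    """
    window = values[start:end]
    return {
        "minimum": min(window),
        "maximum": max(window),
        "average": mean(window),
        "cumulative": sum(window),
        # Net change over the window divided by the number of steps.
        "average_rate_of_change": (window[-1] - window[0]) / (len(window) - 1),
    }


def fit_scaling_scheme(train_values):
    """Derive a preprocessing scheme from the training data only.

    Standardization (mean and standard deviation) is an assumed example.
    """
    return {"mean": mean(train_values), "std": pstdev(train_values) or 1.0}


def apply_scaling_scheme(values, scheme):
    """Apply a previously fitted scheme, for example to evaluation data."""
    return [(v - scheme["mean"]) / scheme["std"] for v in values]


def majority_vote(micro_outcomes):
    """Macro-procedure sketch: return the most common micro-procedure outcome."""
    return Counter(micro_outcomes).most_common(1)[0][0]
```

Fitting the scheme on the training subset and only applying it to the evaluation subset keeps information from the evaluation data out of training, which is one plausible motivation for the wording of claims 9 and 18.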

Assignees

Boeing Co

Inventors

Classifications

  • G06N99/005 · Primary

    Physics · mapped topic

  • G06N20/00 · Primary

    Machine learning · CPC title

  • Distributed expert systems; Blackboards · CPC title

  • for performance assessment · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016358099A1 cover?
Machine learning systems and computerized methods to compare candidate machine learning algorithms are disclosed. The machine learning system comprises a machine learning algorithm library, a data input module to receive a dataset and a selection of machine learning models derived from the machine learning algorithm library, an experiment module, and an aggregation module. The experiment module…
Who is the assignee on this patent?
Boeing Co
What technology area does this patent fall under?
Primary CPC classification G06N99/005. Mapped technology areas include Physics.
When was this patent published?
Published Dec 8, 2016 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).