Pattern-recognition enabled autonomous configuration optimization for data centers
US-2021263828-A1 · Aug 26, 2021 · US
US11941541B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11941541-B2 |
| Application number | US-202016988809-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 10, 2020 |
| Priority date | Aug 10, 2020 |
| Publication date | Mar 26, 2024 |
| Grant date | Mar 26, 2024 |
Official abstract text for this publication.
Methods, computer program products and/or systems are provided that perform the following operations: obtaining a performance matrix representing accuracies obtained by executing a plurality of pipelines on a plurality of training data sets, wherein a pipeline comprises a series of operations performed on a data set; selecting a defined number of top pipelines as potential pipelines for a testing data set based, at least in part, on a similarity between the testing data set and each of the plurality of training data sets represented in the performance matrix; storing results from executing each of the potential pipelines as a new data set; determining a pipeline accuracy for each of the potential pipelines when executed against the testing data set; and providing a recommended pipeline for use with the testing data set based, at least in part, on the pipeline accuracy for each potential pipeline.
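The selection step in the abstract can be illustrated with a minimal sketch. It assumes the testing data set is represented by the accuracies of the pipelines already executed on it, and that similarity is the negative mean absolute difference over entries present in both vectors; the source does not specify the measure, so this is a stand-in. The names `perf`, `similarity`, `ranks`, and `recommend_candidates`, and all sample values, are hypothetical.

```python
from statistics import mean

# Hypothetical performance matrix: perf[i][j] is the accuracy of pipeline i
# on training data set j. None marks an entry that was never measured.
perf = [
    [0.90, 0.60, 0.88, 0.55],  # pipeline 0
    [0.70, 0.85, 0.72, 0.80],  # pipeline 1
    [0.80, 0.75, 0.79, 0.70],  # pipeline 2
    [0.60, 0.90, 0.65, None],  # pipeline 3
]

def column(matrix, j):
    """Accuracies of every pipeline on training data set j."""
    return [row[j] for row in matrix]

def similarity(test_col, train_col):
    """Assumed measure: negative mean absolute difference, using only
    entries present in both vectors."""
    shared = [(t, r) for t, r in zip(test_col, train_col)
              if t is not None and r is not None]
    if not shared:
        return float("-inf")
    return -mean(abs(t - r) for t, r in shared)

def ranks(col):
    """Map pipeline index to rank within one column (1 = highest accuracy).
    Pipelines with a missing entry receive no rank."""
    present = sorted((i for i, v in enumerate(col) if v is not None),
                     key=lambda i: -col[i])
    return {i: r for r, i in enumerate(present, start=1)}

def recommend_candidates(matrix, test_col, n_columns, n_pipelines):
    """Return (candidate pipelines, selected columns)."""
    # Select the columns of the most similar training data sets.
    cols = sorted(range(len(matrix[0])),
                  key=lambda j: similarity(test_col, column(matrix, j)),
                  reverse=True)[:n_columns]
    col_ranks = [ranks(column(matrix, j)) for j in cols]
    # Average each pipeline's rank over the selected columns.
    avg_rank = {}
    for i in range(len(matrix)):
        rs = [cr[i] for cr in col_ranks if i in cr]
        if rs:
            avg_rank[i] = mean(rs)
    # Keep only pipelines not yet executed on the testing data set.
    unexecuted = [i for i in avg_rank if test_col[i] is None]
    best = sorted(unexecuted, key=lambda i: (avg_rank[i], i))[:n_pipelines]
    return best, cols

# Pipelines 0 and 2 have already been run on the testing data set.
candidates, selected = recommend_candidates(perf, [0.88, None, 0.78, None], 2, 1)
```

In this sketch the candidates would then be executed on the testing data set, and their measured accuracies would decide the recommendation.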
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method comprising: obtaining a performance matrix representing accuracies obtained by executing a plurality of machine-learning pipelines on a plurality of training data sets, wherein a machine-learning pipeline comprises a series of operations performed on a data set; selecting a defined number of top machine-learning pipelines as potential machine-learning pipelines for a testing data set based, at least in part, on computing a similarity between the testing data set and each of the plurality of training data sets represented in the performance matrix; determining a pipeline accuracy for each of the potential machine-learning pipelines when executed against the testing data set; providing a recommended machine-learning pipeline for use with the testing data set based, at least in part, on the pipeline accuracy for each potential machine-learning pipeline; and performing initialization for pipeline recommendation prior to the computing of similarities between the testing data set and each of the plurality of training data sets represented in the performance matrix, the initialization comprising: obtaining data set metafeatures for each of the plurality of training data sets in the performance matrix; obtaining data set metafeatures for the testing data set; determining a plurality of similar training data sets that are similar to the testing data set based in part on the data set metafeatures for each of the plurality of training data sets and the data set metafeatures for the testing data set; selecting a defined number of most similar training data sets; selecting one or more columns from the performance matrix that correspond to the selected most similar training data sets; determining an initialization stage average rank for each machine-learning pipeline based on the selected columns of the performance matrix; selecting a plurality of top machine-learning pipelines as initialization stage machine-learning pipelines based on the initialization stage average rank for each machine-learning pipeline; storing initialization stage results from executing each of the initialization stage machine-learning pipelines as an initialization data set; and providing the initialization data set as the testing data set; storing results from executing each of the potential machine-learning pipelines as a new data set.
2. The computer-implemented method of claim 1 wherein selecting the defined number of top machine-learning pipelines as potential machine-learning pipelines for the testing data set further comprises: selecting a defined number of columns of the performance matrix based on the similarity between the testing data set and each of the plurality of training data sets; and selecting the defined number of top machine-learning pipelines as potential machine-learning pipelines for the testing data set based on the selected columns of the performance matrix.
3. The computer-implemented method of claim 2 further comprising: determining an average rank for each machine-learning pipeline based on the selected columns of the performance matrix and the accuracies represented in the performance matrix; and selecting the defined number of top machine-learning pipelines based on the average rank for each machine-learning pipeline.
4. The computer-implemented method of claim 2, wherein when computing the similarity between the testing data set and each of the plurality of training data sets, only entries that are present in both the training data set and the testing data set being compared for similarity are used.
5. The computer-implemented method of claim 1 further comprising: determining a similarity-weighted mean and variance of pipeline accuracy for each machine-learning pipeline; and selecting the defined number of top machine-learning pipelines based on an expected improvement criteria associated with the mean and variance.
6. The computer-implemented method of claim 1, wherein the data set metafeatures for each of the plurality of training data sets and the metafeatures for the testing data set comprise: a number of missing values in a data set; a number of categorical features; a number of real-valued features; and quantile distributions of the data set or individual features.
7. The computer-implemented method of claim 1, wherein each (i, j) entry in the performance matrix represents an accuracy obtained by executing a pipeline i and a training data set j.
8. The computer-implemented method of claim 1 further comprising performing an iterative series of pipeline accuracy determinations, for a defined number of iterations, in response to the determining of the pipeline accuracy for each of the potential machine-learning pipelines when executed against the testing data set and prior to providing the recommended machine-learning pipeline, the performing of the iterative series of pipeline accuracy determinations comprising: computing a similarity between the stored results from executing each of the potential machine-learning pipelines and each of the plurality of training data sets represented in the performance matrix; selecting a defined number of columns of the performance matrix based on the similarity between the stored results and each of the plurality of training data sets; determining an average rank for each machine-learning pipeline based on the selected columns of the performance matrix; selecting a defined number of top-ranked machine-learning pipelines that are unexecuted as potential machine-learning pipelines; storing iteration result sets from executing each of the potential machine-learning pipelines; determining the pipeline accuracy for each of the potential machine-learning pipelines when executed against the stored iteration result sets; and determining a new recommended machine-learning pipeline based on the pipeline accuracy for each potential machine-learning pipeline.
9. A computer program product comprising a computer readable storage medium having stored thereon: program instructions programmed to obtain a performance matrix representing accuracies obtained by executing a plurality of machine-learning pipelines on a plurality of training data sets, wherein a machine-learning pipeline comprises a series of operations performed on a data set; program instructions programmed to select a defined number of top machine-learning pipelines as potential machine-learning pipelines for a testing data set based on computing a similarity between the testing data set and each of the plurality of training data sets represented in the performance matrix; program instructions programmed to store results from executing each of the potential machine-learning pipelines as a new data set; program instructions programmed to determine a pipeline accuracy for each of the potential machine-learning pipelines when executed against the testing data set; program instructions programmed to provide a recommended machine-learning pipeline for use with the testing data set based, at least in part, on the pipeline accuracy for each potential machine-learning pipeline; and program instructions programmed to perform initialization for pipeline recommendation prior to the computing of similarities between the testing data set and each of the plurality of training data sets represented in the performance matrix, the initialization comprising: obtaining data set metafeatures for each of the plurality of training data sets in the performance matrix; obtaining data set metafeatures for the testing data set; determining a plurality of similar training data sets that are similar to the testing data set based in part on the data set metafeatures for each of
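The metafeature-based initialization recited in the claims can be sketched as follows. This is a minimal illustration and not the patented implementation. It assumes Euclidean distance between metafeature vectors, which the claims do not specify, and it uses only three of the metafeatures listed in claim 6, omitting quantile distributions. The function names and sample data are hypothetical.

```python
import math

def metafeatures(rows):
    """rows: list of equal-length records; None marks a missing value.
    Returns (missing values, categorical features, real-valued features)."""
    n_missing = sum(v is None for row in rows for v in row)
    n_cat = n_real = 0
    for col in zip(*rows):
        present = [v for v in col if v is not None]
        if present and all(isinstance(v, (int, float)) for v in present):
            n_real += 1
        elif present:
            n_cat += 1
    return (n_missing, n_cat, n_real)

def most_similar(train_meta, test_meta, k):
    """Indices of the k training data sets nearest to the testing data set
    in metafeature space (Euclidean distance is an assumption)."""
    order = sorted(range(len(train_meta)),
                   key=lambda j: math.dist(train_meta[j], test_meta))
    return order[:k]

def init_pipelines(perf, columns, n):
    """Top-n pipelines by average rank over the selected columns.
    Rank in a column = 1 + number of pipelines with strictly higher accuracy.
    Assumes every entry in the selected columns is present."""
    scored = []
    for i, row in enumerate(perf):
        rs = [1 + sum(other[j] > row[j] for other in perf) for j in columns]
        scored.append((sum(rs) / len(rs), i))
    return [i for _, i in sorted(scored)[:n]]

# Hypothetical training data sets and testing data set.
train_sets = [
    [[1.0, "x"], [2.0, "y"], [None, "x"]],
    [[1.0, 2.0], [3.0, 4.0]],
    [["a", "b"], ["c", None]],
]
test_set = [[5.0, "z"], [None, "z"]]

# Hypothetical performance matrix: perf[i][j] = accuracy of pipeline i on set j.
perf = [
    [0.9, 0.5, 0.6],
    [0.7, 0.8, 0.8],
    [0.6, 0.9, 0.7],
]

train_meta = [metafeatures(s) for s in train_sets]
nearest = most_similar(train_meta, metafeatures(test_set), k=2)
initial = init_pipelines(perf, nearest, n=2)
```

Per the claims, the results of executing the `initial` pipelines would be stored as the initialization data set and then supplied as the testing data set for the similarity-based selection stage.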
Inference or reasoning models · CPC title
Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title
by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation · CPC title
based on specific statistical tests · CPC title
Machine learning · CPC title