Pipeline ranking with model-based dynamic data allocation

US12555029B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12555029-B2
Application number: US-202117237379-A
Country: US
Kind code: B2
Filing date: Apr 22, 2021
Priority date: Apr 22, 2021
Publication date: Feb 17, 2026
Grant date: Feb 17, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In a method for ranking machine learning (ML) pipelines for a dataset, a processor receives first performance curves predicted by a meta learner model for a plurality of ML pipelines. A processor allocates a first subset of data points from the dataset to each of the plurality of ML pipelines. A processor receives first performance scores for each of the ML pipelines for the first subset of data points. A processor updates the meta learner model using the first performance scores. A processor receives second performance curves from the meta learner model updated with the first performance scores. A processor ranks the plurality of ML pipelines based on the second performance curves.
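The sequence in the abstract (predict curves, score a subset, update the model, re-predict, rank) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the two-parameter curve `a - b / sqrt(n)` stands in for the meta learner model, plain gradient descent on a squared error stands in for the update step, and the pipeline names, prior parameters, and observed scores are hypothetical. The patent does not specify any of these.

```python
import math

def predict(params, n):
    """Toy performance curve (assumed form): score approaches `a` as n grows."""
    a, b = params
    return a - b / math.sqrt(n)

def update(params, n, observed, lr=0.1, steps=200):
    """Fit the toy curve to one observed score by gradient descent on squared error."""
    a, b = params
    x = 1.0 / math.sqrt(n)
    for _ in range(steps):
        err = (a - b * x) - observed   # predicted minus observed score
        a -= lr * 2 * err              # d(err^2)/da = 2 * err
        b -= lr * 2 * err * (-x)       # d(err^2)/db = 2 * err * (-x)
    return (a, b)

def rank(curves, full_size):
    """Order pipeline names by predicted score at the full dataset size, best first."""
    return sorted(curves, key=lambda name: predict(curves[name], full_size), reverse=True)

# Hypothetical first performance curves (curve parameters per pipeline).
prior = {"pipeline_a": (0.90, 2.0), "pipeline_b": (0.85, 1.0)}
# Hypothetical first performance scores measured on a 100-point subset.
observed = {"pipeline_a": 0.60, "pipeline_b": 0.80}
subset_size, full_size = 100, 10_000

before = rank(prior, full_size)
posterior = {name: update(prior[name], subset_size, observed[name]) for name in prior}
after = rank(posterior, full_size)

print("ranking before update:", before)
print("ranking after update: ", after)
```

In this toy setup the measured scores contradict the prior curves, so the ranking at full dataset size changes once the curves are refitted. That is the effect the method relies on: a small allocation of data corrects the predicted curves before any pipeline is trained on the whole dataset.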

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for ranking machine learning (ML) pipelines for a dataset, the method comprising: receiving, by one or more processors, first performance curves made up of performance values predicted for dependent variables of data allocation size, wherein the data allocation size comprises a number of data points fed to each of a plurality of machine learning (ML) pipelines by a meta learner model, wherein the first performance curves are the predicted performance values with respect to the dependent variables for each of the ML pipelines; allocating a first subset of data points from the dataset to each of the plurality of ML pipelines selected based on changing points where each changing point is a point where each of the first performance curves cross another of the first performance curves; receiving first performance scores for each of the ML pipelines for the first subset of data points; generating, by one or more processors, a new meta learner model by backpropagating a difference in the performance value between the first performance curves at a particular dependent variable value and the first performance scores for the first subset of data points at the particular dependent variable value; receiving second performance curves from the new meta learner model; and ranking the plurality of ML pipelines based on the second performance curves.

2. The method of claim 1, comprising training the meta learner model using meta features of a training dataset.

3. The method of claim 1, comprising allocating a second subset of data points from the dataset to each of the plurality of ML pipelines.

4. The method of claim 3, wherein the second subset of data points are selected based on the changing points.

5. The method of claim 1, wherein the meta learner model is updated via backpropagation of a difference between first performance curves and the first performance scores at each point in the first subset of data points.

6. The method of claim 1, wherein the first performance curves comprise scores for a selection from the group consisting of accuracy, error, recall, memory consumption, CPU usage, and running time, at a range of data points.

7. A computer program product for ranking machine learning (ML) pipelines for a dataset, comprising: one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media, the program instructions comprising: program instructions to receive first performance curves made up of performance values predicted for dependent variables of data allocation size, wherein the data allocation size comprises a number of data points fed to each of a plurality of machine learning (ML) pipelines by a meta learner model, wherein the first performance curves are the predicted performance values with respect to the dependent variables for each of the ML pipelines; program instructions to allocate a first subset of data points from a dataset to each of the plurality of ML pipelines selected based on changing points where the each changing point is a point where each of the first performance curves cross another of the first performance curves; program instructions to receive first performance scores for each of the ML pipelines for the first subset of data points; program instructions to generate a new meta learner model by backpropagating a difference in the performance value between the first performance curves at a particular dependent variable value and the first performance scores for the data first subset of data points at the particular dependent variable value; program instructions to receive second performance curves from the new meta learner model; and program instructions to rank the plurality of ML pipelines based on the second performance curves.

8. The computer program product of claim 7, comprising program instructions to train the meta learner model using meta features of a training dataset.

9. The computer program product of claim 7, comprising program instructions to allocate a second subset of data points from the dataset to each of the plurality of ML pipelines.

10. The computer program product of claim 9, wherein the second subset of data points are selected based on changing points where the first performance curves cross.

11. The computer program product of claim 7, wherein the meta learner model is updated via backpropagation of a difference between first performance curves and the first performance scores at each point in the first subset of data points.

12. The computer program product of claim 7, wherein the first performance curves comprise scores for a selection from the group consisting of accuracy, error, recall, memory consumption, CPU usage, and running time.

13. A computer system comprising: one or more computer processors, one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media for execution by at least one of the one or more computer processors, the program instructions comprising: program instructions to receive first performance curves made up of performance values predicted for dependent variables of data allocation size, wherein the data allocation size comprises a number of data points fed to each of a plurality of machine learning (ML) pipelines by a meta learner model, wherein the first performance curves are the predicted performance values with respect to the dependent variables for each of the ML pipelines; program instructions to allocate a first subset of data points from a dataset to each of the plurality of ML pipelines selected based on changing points where each changing point is a point where each of the first performance curves cross another of the first performance curves; program instructions to receive first performance scores for each of the ML pipelines for the first subset of data points; program instructions to generate a new meta learner model by backpropagating a difference in the performance value between the first performance curves at a particular dependent variable value and the first performance scores for the first subset of data points at the particular dependent variable value; program instructions to receive second performance curves from the new meta learner model; and program instructions to rank the plurality of ML pipelines based on the second performance curves.

14. The system of claim 13, comprising program instructions to train the meta learner model using meta features of a training dataset.

15. The system of claim 13, comprising program instructions to allocate a second subset of data points from the dataset to each of the plurality of ML pipelines.

16. The system of claim 15, wherein the second subset of data points are selected based on the changing points where the first performance curves cross.

17. The system of claim 13, wherein the meta learner model is updated via backpropagation of a difference between first performance curves and the first performance scores at each point in the first subset of data points.

18. The system of claim 13, wherein the first performance curves comprise scores for performance values selected from the group consisting of accuracy, error, recall, memory consumption, CPU usage, and running time.
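The independent claims select the data allocation based on "changing points", defined as points where one predicted performance curve crosses another. One possible way to locate such points, assuming the curves are available as scores sampled at a shared list of allocation sizes, is sketched below. The function name `changing_points`, the sampling grid, and the example scores are all hypothetical, and detecting a crossing by a sign change between adjacent samples is an illustrative choice rather than the patented procedure.

```python
from itertools import combinations

def changing_points(sizes, curves):
    """Return the sampled allocation sizes at which some pair of curves has swapped order.

    `sizes` is an increasing list of allocation sizes; `curves` maps a pipeline
    name to its predicted scores at those sizes. A crossing is reported at the
    first sampled size after the sign of the pairwise difference flips.
    """
    points = set()
    for (_, scores_a), (_, scores_b) in combinations(curves.items(), 2):
        diffs = [a - b for a, b in zip(scores_a, scores_b)]
        for i in range(1, len(diffs)):
            # A strictly negative product means the two curves crossed between
            # sizes[i - 1] and sizes[i]; exact ties (zero difference) are ignored.
            if diffs[i - 1] * diffs[i] < 0:
                points.add(sizes[i])
    return sorted(points)

# Hypothetical predicted curves for three pipelines.
sizes = [100, 200, 400, 800]
curves = {
    "pipeline_a": [0.60, 0.70, 0.80, 0.85],
    "pipeline_b": [0.70, 0.72, 0.74, 0.75],
    "pipeline_c": [0.50, 0.55, 0.78, 0.90],
}

points = changing_points(sizes, curves)
print(points)
```

Allocation sizes near these points are where the predicted ordering of pipelines is least certain, which is a plausible reason to measure actual scores there first.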

Assignees

IBM

Inventors

Classifications

  • Performance evaluation by modeling · CPC title

  • where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems (multiprogramming arrangements G06F9/46; allocation of resources G06F9/50) · CPC title

  • Inference or reasoning models · CPC title

  • Backpropagation, e.g. using gradient descent · CPC title

  • G06N20/00 · Primary

    Machine learning · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12555029B2 cover?
In a method for ranking machine learning (ML) pipelines for a dataset, a processor receives first performance curves predicted by a meta learner model for a plurality of ML pipelines. A processor allocates a first subset of data points from the dataset to each of the plurality of ML pipelines. A processor receives first performance scores for each of the ML pipelines for the first subset of dat…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 17, 2026 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).