Systems and methods for improved active learning method for model development

US2025053860A1 · US · A1

Patent metadata
Publication number: US-2025053860-A1
Application number: US-202318446460-A
Country: US
Kind code: A1
Filing date: Aug 8, 2023
Priority date: Aug 8, 2023
Publication date: Feb 13, 2025
Grant date: (none listed)

Abstract

Official abstract text for this publication.

Methods and systems are described herein for minimizing resource expenditure during model training using user-defined constraints in sample selection. A system may obtain user-defined target parameter values for data labeling, a user input indicative of a value added per unit of model performance improvement, and a dataset (e.g., unlabeled samples). The system may select a first subset of the dataset and may transmit a request for labeling the samples. The system may receive a first training dataset comprising label data and the samples of the first subset. The system may train a machine learning model using the first training dataset and generate a margin curve. Based on the margin curve, the system may determine whether an amount of resource usage exceeds value added and responsive to determining that it does not exceed the amount of resource usage, select a second subset of the dataset.
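The loop in the abstract can be sketched as follows. This is a minimal, hypothetical illustration and not the applicant's implementation. Several choices are assumptions: `label_oracle` stands in for the remote labeling device, the model is a toy 1-D nearest-centroid classifier, and "value added" is taken as F1 points gained over an untrained baseline multiplied by the user's value per point (claim 10 names F1-score as one possible unit). The batch-size rule is also invented. The stopping test follows the comparison direction as the claims literally word it. Another subset is selected while cumulative value added does not exceed cumulative resource usage, and training is treated as complete once it does (claim 4).

```python
import random


# Hypothetical stand-in for the "remote device" that returns labels.
# Ground truth here is a simple rule: class 1 if x > 0.5, else class 0.
def label_oracle(samples):
    return [(x, 1 if x > 0.5 else 0) for x in samples]


def train_centroids(labeled):
    """Fit a 1-D nearest-centroid classifier (an illustrative model choice)."""
    sums, counts = {0: 0.0, 1: 0.0}, {0: 0, 1: 0}
    for x, y in labeled:
        sums[y] += x
        counts[y] += 1
    # If a class has no labels yet, fall back to a fixed placeholder centroid.
    return {c: sums[c] / counts[c] if counts[c] else float(c) for c in (0, 1)}


def predict(centroids, x):
    return min(centroids, key=lambda c: abs(x - centroids[c]))


def f1_score(centroids, validation):
    tp = fp = fn = 0
    for x, y in validation:
        p = predict(centroids, x)
        if p == 1 and y == 1:
            tp += 1
        elif p == 1:
            fp += 1
        elif y == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def active_learning_loop(pool, validation, cost_per_label, value_per_point,
                         first_batch=5, min_batch=2, max_batch=20,
                         max_rounds=10, seed=0):
    if cost_per_label <= 0:
        raise ValueError("cost_per_label must be positive")
    rng = random.Random(seed)
    unlabeled = list(pool)
    rng.shuffle(unlabeled)
    labeled, curve, batch, model = [], [], first_batch, None
    for _ in range(max_rounds):
        if not unlabeled:
            break
        # Select a subset and "transmit" it for labeling.
        subset, unlabeled = unlabeled[:batch], unlabeled[batch:]
        labeled += label_oracle(subset)
        model = train_centroids(labeled)
        # One margin-curve point: cumulative labeling cost vs. the value of the
        # F1 gain over an untrained baseline (assumed F1 = 0), in F1 points.
        resource_usage = len(labeled) * cost_per_label
        value_added = f1_score(model, validation) * 100 * value_per_point
        curve.append((resource_usage, value_added))
        # Literal reading of claims 1 and 4: stop once value added exceeds usage.
        if value_added > resource_usage:
            break
        # Hypothetical sizing rule: double the batch while the latest slope of
        # the curve is >= 1, otherwise halve it, clipped to [min_batch, max_batch].
        r_prev, v_prev = curve[-2] if len(curve) > 1 else (0, 0)
        slope = (value_added - v_prev) / (resource_usage - r_prev)
        batch = max(min_batch, min(max_batch, batch * 2 if slope >= 1 else batch // 2))
    return model, curve


# Usage: with labels this expensive, value never catches up with cost,
# so the loop runs for all max_rounds rounds.
validation = label_oracle([i / 20 + 0.025 for i in range(20)])
model, curve = active_learning_loop([i / 100 for i in range(100)], validation,
                                    cost_per_label=1000, value_per_point=1.0)
```

The `max_rounds` guard is there because, under the literal comparison, a model whose value never overtakes its labeling cost would otherwise consume the whole pool.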

Claims

Opening claim text (preview; the text is cut off partway through claim 13).

What is claimed is:

1. A system for minimizing resource usage during model training using user-defined constraints in sample selection, the system comprising: a mobile device comprising one or more processors; and a non-transitory computer readable medium comprising instructions recorded thereon that when executed by the one or more processors causes operations comprising: obtaining (1) one or more user-defined target parameter values for data labeling, (2) a user input indicative of a value added per unit of model performance improvement, and (3) a dataset comprising a plurality of unlabeled samples; selecting, based on the one or more user-defined target parameter values, a first subset of the dataset, wherein the first subset comprises samples of the plurality of unlabeled samples; transmitting, to a remote device, a request for labeling the samples of the first subset, wherein the request comprises the samples of the first subset; receiving, from the remote device, a first training dataset based on the first subset, wherein the first training dataset comprises label data and the samples of the first subset, and wherein the label data indicates a classification for each sample; training, using the first training dataset, a machine learning model; generating, using the user input indicative of the value added per unit of model performance improvement, a margin curve of a relationship between resource usage and value added per unit of model performance improvement; determining, based on the margin curve, whether an amount of resource usage exceeds an amount of value added; responsive to determining that the amount of value added does not exceed the amount of resource usage, selecting a second subset of the dataset, wherein a number of samples of the second subset is determined based on the margin curve; transmitting, to the remote device, a second request for labeling samples of the second subset, wherein the second request comprises samples of the second subset; receiving, from the remote device, a second training dataset comprising label data and the samples of the second subset; and updating, using the second training dataset, the machine learning model.

2. A method for minimizing resource expenditure during model training using user-defined constraints in sample selection, the method comprising: obtaining (1) one or more user-defined target parameter values for data labeling, (2) a user input indicative of a value added per unit of model performance improvement, and (3) a dataset comprising a plurality of unlabeled samples; selecting a first subset of the dataset, wherein the first subset comprises samples of the plurality of unlabeled samples; transmitting, to a remote device, a request for labeling the samples of the first subset, wherein the request comprises the samples of the first subset; receiving, from the remote device, a first training dataset based on the first subset, wherein the first training dataset comprises label data and the samples of the first subset, and wherein the label data indicates a classification for each sample; training, using the first training dataset, a machine learning model; generating, using the user input indicative of the value added per unit of model performance improvement, a margin curve of a relationship between resource usage and value added per unit of model performance improvement; determining, based on the margin curve, whether an amount of resource usage exceeds an amount of value added; and responsive to determining that the amount of value added does not exceed the amount of resource usage, selecting a second subset of the dataset, wherein a number of samples of the second subset is determined based on the margin curve.

3. The method of claim 2, further comprising: transmitting, to the remote device, a second request for labeling samples of the second subset, wherein the second request comprises samples of the second subset; receiving, from the remote device, a second training dataset based on the second subset, wherein the second training dataset comprises label data and the samples of the second subset, and wherein the label data indicates a classification for each sample; and updating, using the second training dataset, the machine learning model.

4. The method of claim 3, further comprising: responsive to determining that the amount of value added exceeds the amount of resource usage, determining completion of training for the machine learning model; and generating, for display on a user interface, a notification of completion of training of the machine learning model.

5. The method of claim 3, further comprising: responsive to determining that the amount of value added exceeds the amount of resource usage, determining completion of training for the machine learning model; generating one or more data files comprising parameters of the machine learning model in a standardized format; and transmitting, to a remote device, the one or more data files.

6. The method of claim 3, further comprising: responsive to determining that the amount of value added exceeds the amount of resource usage, determining completion of training for the machine learning model; receiving, from a remote device, one or more unseen samples; generating one or more classifications for the one or more unseen samples using the machine learning model; and transmitting, to a remote device, the one or more classifications.

7. The method of claim 3, wherein selecting the second subset of the dataset comprises: determining, based on unlabeled samples of the dataset, a measure of uncertainty corresponding to each sample, wherein the measure is indicative of a confidence of the machine learning model in classifying each sample; and identifying the samples of the plurality of unlabeled samples having a threshold measure of uncertainty.

8. The method of claim 2, wherein selecting a first subset of the dataset comprises selecting samples based on the one or more user-defined target parameter values.

9. The method of claim 2, wherein the one or more user-defined target parameter values correspond to target parameters indicative of a threshold for bias and variance for the machine learning model.

10. The method of claim 2, wherein the unit of model performance improvement comprises a unit of improvement in precision, recall, or F1-score, and wherein the user input indicative of a value added per unit of model performance improvement comprises a cost associated with labeling a sample of the plurality of unlabeled samples.

11. The method of claim 2, wherein obtaining the user input indicative of the value added per unit of model performance improvement further comprises: receiving a user selection of a user-defined valuation of the model performance improvement; and determining, based on the user-defined valuation of the model performance improvement, the value added per unit of model performance improvement.

12. The method of claim 2, wherein obtaining one or more user-defined target parameter values further comprises: receiving a user selection of a target number of samples to be labeled from the plurality of unlabeled samples; and determining, based on the target number of samples to be labeled from the plurality of unlabeled samples, a threshold number of samples for labeling.

13. The method of claim 2, wherein obtaining one or more user-defined target parameter values further comprises: receiving a user selection of an upper limit of total cost associated with labeling the plurality of unlabeled samples; and determining, based on the u
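Claim 7's uncertainty-based selection, combined with the user caps in claims 12 and 13, might look like the sketch below. It is hypothetical throughout. It assumes a binary classifier that exposes a probability for class 1 (the `predict_proba` callable). It uses least-confidence (one minus the probability of the predicted class) as the measure of uncertainty, which is one common choice and not necessarily the one the applicant uses. It caps the subset by both the remaining target sample count and the remaining cost budget. Because claim 13 is cut off in this preview, applying the cost limit this way is an assumption, as are all function and parameter names.

```python
import math
from typing import Callable, Sequence


def least_confidence(p1: float) -> float:
    """Uncertainty of a binary prediction: 1 minus the probability of the
    predicted class. 0.0 means fully confident, 0.5 means a coin flip."""
    return 1.0 - max(p1, 1.0 - p1)


def select_second_subset(
    unlabeled: Sequence[float],
    predict_proba: Callable[[float], float],  # hypothetical: P(class 1 | x)
    threshold: float,        # minimum uncertainty for a sample to qualify
    target_total: int,       # user-selected target number of labels (claim 12)
    already_labeled: int,
    cost_cap: float,         # user-selected upper limit on labeling cost (claim 13)
    spent: float,
    cost_per_label: float,   # assumed positive
) -> list[float]:
    # Cap the subset size by whichever user constraint is tighter.
    by_count = max(0, target_total - already_labeled)
    by_cost = max(0, math.floor((cost_cap - spent) / cost_per_label))
    limit = min(by_count, by_cost)

    # Score every unlabeled sample, keep those at or above the threshold,
    # and take the most uncertain first.
    scored = [(least_confidence(predict_proba(x)), x) for x in unlabeled]
    qualifying = [pair for pair in scored if pair[0] >= threshold]
    qualifying.sort(key=lambda pair: pair[0], reverse=True)
    return [x for _, x in qualifying[:limit]]


def toy_model(x: float) -> float:
    """Hypothetical classifier: a logistic curve centred on x = 0.5."""
    return 1.0 / (1.0 + math.exp(-10.0 * (x - 0.5)))


pool = [0.1, 0.42, 0.5, 0.6, 0.9, 0.3]
print(select_second_subset(pool, toy_model, threshold=0.1, target_total=10,
                           already_labeled=7, cost_cap=100, spent=0,
                           cost_per_label=1))
# → [0.5, 0.42, 0.6]
```

Samples near the decision boundary (0.5, 0.42, 0.6) are picked first. The confident extremes (0.1, 0.9) fall below the threshold, and 0.3 qualifies but is dropped because only three more labels fit under the target count.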

Assignees

Capital One Services, LLC

Classifications

  • G06N20/00 (primary) · Machine learning (CPC title)

Frequently asked questions

What does patent US2025053860A1 cover?
It describes systems and methods that select unlabeled samples for labeling in batches under user-defined constraints, train a machine learning model on the labeled results, and use a margin curve relating resource usage to the value added per unit of model performance improvement to decide whether to select and label another batch.
Who is the assignee on this patent?
Capital One Services, LLC
What technology area does this patent fall under?
The primary CPC classification is G06N20/00 (machine learning), which falls under CPC section G (Physics).
When was this patent published?
The A1 publication is dated Feb 13, 2025. Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).