Debugging and profiling of machine learning model training
US-2021097431-A1 · Apr 1, 2021 · US
US12293262B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-12293262-B1 |
| Application number | US-201916585266-A |
| Country | US |
| Kind code | B1 |
| Filing date | Sep 27, 2019 |
| Priority date | Sep 27, 2019 |
| Publication date | May 6, 2025 |
| Grant date | May 6, 2025 |
Official abstract text for this publication.
Techniques for adaptive machine learning training via in-flight feature modification are described. A training monitor captures training data during the training of a machine learning model, and a metric generator creates metrics such as feature importance metrics based on the data. A rule evaluation engine determines whether any modification conditions are met for any of the features based on the metrics, and based on such a determination can cause the in-flight training job to be modified.
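The abstract's pipeline (captured training data, then per-feature metrics, then rule evaluation) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: the `Rule` class, the `feature_importance` and `evaluate_rules` functions, the threshold values, and the choice to measure importance as each feature's share of total absolute weight. The patent does not specify how importance is computed.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Rule:
    """Hypothetical modification condition expressed as two thresholds."""
    low: float   # importance below this suggests suppressing the feature
    high: float  # importance above this suggests reducing its importance


def feature_importance(weights: Dict[str, List[float]]) -> Dict[str, float]:
    """Metric generator (assumed form): share of total absolute weight.

    `weights` maps a feature name to the current model weights attached to
    that feature, as a training monitor might capture them mid-training.
    """
    totals = {name: sum(abs(w) for w in ws) for name, ws in weights.items()}
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {name: 0.0 for name in totals}
    return {name: total / grand_total for name, total in totals.items()}


def evaluate_rules(metrics: Dict[str, float], rule: Rule) -> Dict[str, str]:
    """Rule evaluation engine: return an action for each feature whose
    metric satisfies a modification condition; other features are omitted."""
    actions: Dict[str, str] = {}
    for name, value in metrics.items():
        if value < rule.low:
            actions[name] = "suppress"
        elif value > rule.high:
            actions[name] = "reduce_importance"
    return actions


# Illustrative weight snapshot from one training iteration (made-up values).
snapshot = {
    "age": [0.9, -0.7],
    "zip_code": [0.01, -0.02],
    "income": [0.2, 0.17],
}
metrics = feature_importance(snapshot)
actions = evaluate_rules(metrics, Rule(low=0.05, high=0.6))
print(actions)
```

In this sketch the returned actions are only decisions. A real system would still need a channel for delivering them to the running training job.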
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method comprising: receiving, at an endpoint of a multi-tenant provider network, one or more messages indicating a request to train a machine learning (ML) model; initiating, by a model training system including memory storing model training system instructions and one or more processors for executing the model training system instructions, at least one training instance to train a machine learning (ML) model within a provider network using an iterative training process; obtaining, by a training monitor of the model training system during the iterative training process of the ML model by the at least one training instance, training data including resource utilization data associated with the iterative training process and model state data including current weights associated with portions of the ML model at a current iteration in the iterative training process; generating, by a ML analysis system separate from the model training system and based on the training data, including the resource utilization data associated with the iterative training process, and including the model state data including the current weights associated with portions of the ML model at the current iteration in the iterative training process, a plurality of feature importance metric values, each indicating a relative importance of a corresponding feature within the iterative training process, wherein the ML analysis system includes memory storing ML analysis system instructions and one or more processors for executing the ML analysis system instructions; determining, by the ML analysis system based at least in part on a first of the plurality of feature importance metric values that corresponds to a first feature of the plurality of features, that a modification condition is satisfied; causing the at least one training instance to modify a utilization or importance of the first feature in at least a subsequent iteration of the iterative training process, the modification affecting a numeric convergence of the training; and storing, at a conclusion of the iterative training process, one or more model artifacts for the ML model at a location of a storage service of the provider network.

2. The computer-implemented method of claim 1, wherein the causing the training instance to modify the utilization or importance of the first feature comprises eliminating the utilization of the first feature by setting all values of the first feature to a same value.

3. The computer-implemented method of claim 1, wherein the causing the training instance to modify the utilization or importance of the first feature comprises decreasing the importance of the first feature by decreasing a sampling rate corresponding to the first feature.

4. A computer-implemented method comprising: initiating, by a model training system including memory storing model training system instructions and one or more processors for executing the model training system instructions, at least one training instance to train a machine learning (ML) model within a provider network using an iterative training process; obtaining, by a training monitor of the model training system during the iterative training process of the ML model by the at least one training instance, training data including resource utilization data associated with the iterative training process and model state data including current weights associated with portions of the ML model at a current iteration in the iterative training process; generating, by a ML analysis system separate from the model training system and based on the training data, including the resource utilization data associated with the iterative training process, and including the model state data including the current weights associated with portions of the ML model at the current iteration in the iterative training process, a plurality of metric values corresponding to a plurality of features used in the iterative training process, wherein the ML analysis system includes memory storing ML analysis system instructions and one or more processors for executing the ML analysis system instructions; determining, by the ML analysis system based at least in part on a first of the plurality of metric values that corresponds to a first feature of the plurality of features, that a modification condition is satisfied; and causing the at least one training instance to modify a utilization or importance of the first feature in a subsequent iteration in the iterative training process.

5. The computer-implemented method of claim 4, wherein: determining that the modification condition is satisfied comprises determining that the first metric value is less than a first threshold amount; and causing the training instance to modify the utilization or importance of the first feature includes causing the training instance to no longer utilize original values of the first feature in the iterative training process.

6. The computer-implemented method of claim 5, wherein causing the training instance to no longer utilize the first feature in the iterative training process includes setting all values of the first feature in a training dataset to be a same value.

7. The computer-implemented method of claim 4, wherein: determining that the modification condition is satisfied comprises determining that the first metric value is greater than a second threshold amount; and causing the training instance to modify the utilization or importance of the first feature includes causing the training instance to reduce the importance of the first feature in the iterative training process.

8. The computer-implemented method of claim 7, wherein causing the training instance to reduce the importance of the first feature includes modifying a sampling rate associated with values of the first feature.

9. The computer-implemented method of claim 4, further comprising: receiving a message originated by a computing device of a user that indicates a value; and setting the modification condition based at least in part on the value.

10. The computer-implemented method of claim 4, wherein each of the plurality of metric values indicates a relative importance of the corresponding feature within the iterative training process.

11. The computer-implemented method of claim 4, further comprising: transmitting a message to a computing device of a user, the message indicating a set of features to be used as input for the ML model, wherein the set of features does not include the first feature.

12. The computer-implemented method of claim 4, wherein causing the training instance to modify the utilization or importance of the first feature comprises one or more of: suppressing the first feature in the iterative training process, reducing or increasing an importance of the first feature in the iterative training process, modifying values in a training dataset of the first feature for the iterative training process, or modifying one or more training parameters associated with the iterative training process.

13. The computer-implemented method of claim 12, further comprising: deploying the ML model within the provider network behind an endpoint; receiving a request to perform inference using the ML model that was originated by a client that was directed to the endpoint; generating an inference using the ML model; and transmitting the inference to the client.

14. The computer-implemented method of claim 4, wherein causing the training instance to modify the utilization or importance of the first feature in the iterative training process comprises: modifying at
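Two of the modifications recited in the claims can be sketched on a toy dataset: setting every value of a feature to the same value (claims 2 and 6), and lowering a sampling rate tied to a feature (claims 3 and 8). The function names, the `fill` constant, and the row format are hypothetical. The claims do not define "sampling rate", so `downsample_feature` shows just one assumed reading, in which each original value is kept with probability `keep_rate` and otherwise replaced by a constant.

```python
import random
from typing import Dict, List

Row = Dict[str, float]


def suppress_feature(rows: List[Row], feature: str, fill: float = 0.0) -> List[Row]:
    """Set every value of `feature` to the same constant, so the feature
    carries no information in later iterations. Returns new rows; the
    input is left unmodified."""
    return [{**row, feature: fill} for row in rows]


def downsample_feature(
    rows: List[Row],
    feature: str,
    keep_rate: float,
    fill: float = 0.0,
    seed: int = 0,
) -> List[Row]:
    """Assumed interpretation of a reduced sampling rate: keep each original
    value of `feature` with probability `keep_rate`, else use `fill`."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [
        {**row, feature: row[feature] if rng.random() < keep_rate else fill}
        for row in rows
    ]


# Made-up training rows for illustration only.
dataset = [
    {"age": 31.0, "zip_code": 98101.0},
    {"age": 45.0, "zip_code": 10001.0},
    {"age": 27.0, "zip_code": 60601.0},
]

suppressed = suppress_feature(dataset, "zip_code")
thinned = downsample_feature(dataset, "age", keep_rate=0.5)
```

Both helpers return modified copies, which lets a caller swap the dataset between iterations without touching the rows an in-progress iteration is still reading.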
Machine learning · CPC title
File meta data generation · CPC title