Automated dataset drift detection
US-2023139718-A1 · May 4, 2023 · US
US12282384B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12282384-B2 |
| Application number | US-202318028495-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 19, 2023 |
| Priority date | Jan 19, 2022 |
| Publication date | Apr 22, 2025 |
| Grant date | Apr 22, 2025 |
Official abstract text for this publication.
Present disclosure relates to management of artificial intelligence systems by identifying root cause of reduced performance and/or failure in computing systems, and particularly relates to systems and methods for detecting a drift in supervised and unsupervised machine learning (ML) models. The system retrieves current dataset corresponding to output of supervised ML models and unsupervised ML models. Further, the system segregates the current dataset based on requirement of a drift detection model and applies a plurality of drift detection models to the segregated dataset to generate predictive results corresponding to the current dataset. Furthermore, the system determines errors in predictive results by comparing predictive results to reference values associated with current dataset. Additionally, the system detects the drift in supervised ML models and unsupervised ML models based on determined errors being above a threshold value. The supervised ML models and unsupervised ML models are corrected based on detected drift.
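The final steps of the abstract (compare predictive results with reference values, then flag drift when the errors exceed a threshold) can be illustrated with a minimal sketch. The function names `error_rate` and `drift_detected`, the use of a simple mismatch fraction as the error measure, and the default threshold of 0.2 are assumptions made for illustration; the publication does not fix any of them.

```python
from typing import Sequence


def error_rate(predictions: Sequence, references: Sequence) -> float:
    """Fraction of predictive results that differ from their reference values."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if len(predictions) == 0:
        raise ValueError("at least one prediction is required")
    mismatches = sum(1 for p, r in zip(predictions, references) if p != r)
    return mismatches / len(predictions)


def drift_detected(predictions: Sequence, references: Sequence,
                   threshold: float = 0.2) -> bool:
    """Flag drift when the error rate is above the (hypothetical) threshold."""
    return error_rate(predictions, references) > threshold
```

For example, `drift_detected([1, 0, 1, 1], [1, 1, 1, 0])` compares four predictions with four reference values, finds two mismatches, and reports drift because the resulting error rate of 0.5 is above the assumed 0.2 threshold.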
Opening claim text (preview).
We claim:
1. A system for detecting a drift in supervised Machine Learning (ML) models and unsupervised ML model, the system comprising: a processor; and a memory coupled to the processor, wherein the memory comprises processor-executable instruction, which on execution, cause the processor to: retrieve current dataset corresponding to an output of one or more supervised ML models and one or more unsupervised ML models; segregate the current dataset based on a requirement of at least one drift detection model of a plurality of drift detection models; apply the at least one drift detection model of the plurality of drift detection models to the segregated dataset to generate one or more predictive results corresponding to the current dataset; calculate a sliding window probability of the current dataset; track maximum probability values in the current dataset based on the calculated sliding window probability; determine one or more correct prediction results from the maximum probability values; detect the drift in the one or more supervised ML models and the one or more unsupervised ML models based on the one or more correct prediction results being below a pre-defined maximum probability value and a pre-defined probability threshold value; determine one or more errors in the one or more predictive results by comparing the one or more predictive results to one or more reference values associated with the current dataset; and detect the drift in the one or more supervised ML models and the one or more unsupervised ML models based on the determined one or more errors being greater than a threshold value, wherein the one or more supervised ML models and the one or more unsupervised ML models are corrected based on the detected drift.
2. The system as claimed in claim 1, wherein, to apply the at least one drift detection model of the plurality of drift detection models, the processor is configured to: train the plurality of drift detection models using a historical dataset and the current dataset; increment a counter when the one or more errors are determined in the one or more predictive results from the plurality of drift detection models trained using the historical dataset; decrement the counter when the one or more errors are determined in the one or more predictive results from the plurality of drift detection models trained using the current dataset; and detect the drift in the one or more supervised ML models and the one or more unsupervised ML models based on a count in the counter being greater than a predefined count threshold value.
3. The system as claimed in claim 1, wherein, to apply the at least one drift detection model of the plurality of drift detection models, the processor is configured to: segregate the current dataset into a training dataset and a test dataset; train the plurality of drift detection models using the training dataset; generate the one or more predictive results using the test dataset; calculate a first error rate from one or more errors in the generated one or more predictive results corresponding to the test dataset; shuffle the current dataset and segregate the shuffled dataset into training dataset and test dataset in response to calculating the first error rate; train the plurality of drift detection models using the training dataset associated with the shuffled dataset; generate the one or more predictive results using the test dataset associated with the shuffled dataset; calculate a second error rate from one or more errors in the generated one or more predictive results corresponding to the test dataset associated with the shuffled dataset; and detect the drift in the one or more supervised ML models and the one or more unsupervised ML models based on a difference between the first error rate and the second error rate being greater than a pre-defined error rate threshold value.
4. The system as claimed in claim 1, wherein, to apply the at least one drift detection model of the plurality of drift detection models, the processor (202) is configured to: segregate the current dataset into a first dataset and a second dataset, and segregate each of the first dataset and the second dataset into first partition data and second partition data; calculate a kernel-based distribution from the first partition data; calculate a log probability between the second partition data of the first dataset and the second dataset; and determine a difference in the kernel-based distribution from the calculated log probability.
5. The system as claimed in claim 1, wherein, to apply the at least one drift detection model of the plurality of drift detection models, the processor is configured to: create an artificial label; assign a ‘1’ value to a first dataset of the current dataset and ‘−1’ value to a second dataset of the current dataset; classify the first dataset and the second dataset using a binary classifier built on the current dataset with k fold cross validation, based on the assigned value; determine an accuracy score for the classification of the first dataset and the second dataset using the binary classifier, and detect the drift in the one or more supervised ML models and the one or more unsupervised ML models based on the accuracy score being greater than a pre-defined accuracy threshold value.
6. The system as claimed in claim 1, wherein the current dataset is segregated to determine a data point in the current dataset based on detecting the drift in the one or more supervised ML models and the one or more unsupervised ML models.
7. The system as claimed in claim 1, wherein the detected drift is a statistic indicative of the drift in the one or more supervised ML models and the one or more unsupervised ML models.
8. The system as claimed in claim 1, wherein the plurality of drift detection models for detecting the drift in the one or more supervised ML models comprises at least one of a Fast Hoeffding Drift Detection Method (FHDDM), a Paired Learner (PL), and a Shuffling and Resampling (SR).
9. The system as claimed in claim 1, wherein the plurality of drift detection models for detecting the drift in the one or more unsupervised ML models comprises at least one of a Kullback Leibler (KL) Divergence, a Kolmogorov Smirnov Test (KS), a Cramer Von Mises Test (CVM), an Anderson Darling Test (AD), a Kernel Based Distribution Discrepancy Test (KBDD), and a Virtual Classifier (VC).
10. A method for detecting a drift in supervised and unsupervised Machine Learning (ML) models, the method comprising: retrieving, by a processor associated with a system, current dataset corresponding to an output of one or more supervised ML models and one or more unsupervised ML models; segregating, by the processor, the current dataset based on a requirement of at least one drift detection model of a plurality of drift detection models; applying, by the processor, the at least one drift detection model of the plurality of drift detection models to the segregated dataset to generate one or more predictive results corresponding to the current dataset; calculating, by the processor, a sliding window probability of the current dataset; tracking, by the processor, maximum probability values in the current dataset based on the calculated sliding window probability; determining, by the processor, one or more correct prediction results from the maximum probability values: detecting, by the processor, the drift in the one or more supervised ML models and the one or more unsupervised ML models based on the one or more correct prediction results being below a pre-defined maximum probability value and a pre-defined probability threshold value; determining, by t
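The sliding-window steps recited in the independent claims (calculate a sliding window probability, track its maximum, and detect drift when the current value falls too far below that maximum) resemble the Fast Hoeffding Drift Detection Method named in claim 8. The following is a minimal sketch under that reading, not the patented implementation. The function name `fhddm_drift_index`, the default `window` and `delta` values, and the use of the Hoeffding bound to set the allowed drop are assumptions taken from the published FHDDM method rather than from the claim text.

```python
import math
from collections import deque
from typing import Iterable, Optional


def fhddm_drift_index(correct: Iterable[bool], window: int = 25,
                      delta: float = 1e-6) -> Optional[int]:
    """Return the index at which drift is first flagged, or None.

    `correct` is a stream of booleans: True where the monitored model's
    prediction matched its reference value, False otherwise.
    """
    # Allowed drop below the running maximum, from the Hoeffding bound
    # (assumption: the claim itself does not state this formula).
    epsilon = math.sqrt(math.log(1.0 / delta) / (2.0 * window))
    recent = deque(maxlen=window)  # sliding window of 0/1 outcomes
    p_max = 0.0                    # maximum windowed probability seen so far

    for index, ok in enumerate(correct):
        recent.append(1 if ok else 0)
        if len(recent) < window:
            continue               # wait until the window is full
        p = sum(recent) / window   # sliding window probability of a correct result
        p_max = max(p_max, p)
        if p_max - p >= epsilon:
            return index           # probability fell too far below its maximum
    return None
```

A call such as `fhddm_drift_index([True] * 50 + [False] * 50, window=10, delta=0.01)` feeds a stream whose accuracy collapses halfway through; the detector reports an index shortly after the change, once enough incorrect results have entered the window.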
Error or fault detection not based on redundancy (power supply failures G06F1/30; network fault management H04L41/06) · CPC title
Root cause analysis, i.e. error or fault diagnosis (in a hardware test environment G06F11/22; in a software test environment G06F11/36) · CPC title
Machine learning · CPC title