Post-modeling visualization

US2025053858A1 · US · A1

Patent metadata
Publication number: US-2025053858-A1
Application number: US-202318446149-A
Country: US
Kind code: A1
Filing date: Aug 8, 2023
Priority date: Aug 8, 2023
Publication date: Feb 13, 2025
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In an approach, a processor selects a top N features for a machine learning (ML) model; discretizes values of each continuous feature of the top N features; generates a set of combination values that each represent a unique combination of feature values for a data record; predicts, using the ML model, a target value for each record generating predicted target values; groups the predicted target values based on the combination value for each respective record; fits a distribution for each grouping of the predicted target values associated with a respective combination value generating a set of distributions; clusters and refits the set of distributions using a clustering algorithm resulting in a set of clusters and a refitted distribution for each cluster of the set of clusters; and outputs a visualization of the refitted distribution for each cluster as a distribution curve on a graph along with the associated records.
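The middle steps of the abstract (discretize, build combination values, predict, group, fit) can be illustrated with a minimal standard-library sketch. It is not the patent's implementation: the records, the `predict` stand-in for the trained ML model, the choice of two bins, and the rule that numeric features count as continuous are all assumptions made for illustration. Fitting is reduced to a mean and a variance per group.

```python
from collections import defaultdict
from statistics import mean, pvariance


def equal_frequency_bins(values, n_bins):
    """Assign each value a bin index so that bins hold roughly equal counts.

    Rank-based, so tied values may land in different bins (fine for a sketch).
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


def combination_values(records, features, n_bins=2):
    """Return one combination key (a tuple) per record.

    Assumption: a feature whose values are all numeric is continuous and
    gets discretized; anything else is treated as already categorical.
    """
    columns = []
    for name in features:
        col = [r[name] for r in records]
        if all(isinstance(v, (int, float)) for v in col):
            col = equal_frequency_bins(col, n_bins)
        columns.append(col)
    return [tuple(vals) for vals in zip(*columns)]


def fit_by_combination(combos, predictions):
    """Group predictions by combination key and fit (mean, variance) per group."""
    groups = defaultdict(list)
    for combo, pred in zip(combos, predictions):
        groups[combo].append(pred)
    return {c: (mean(p), pvariance(p)) for c, p in groups.items()}


def predict(record):
    """Hypothetical stand-in for the trained ML model."""
    return 0.5 * record["age"] + (10 if record["plan"] == "pro" else 0)


# Hypothetical data: "age" is continuous, "plan" is categorical.
records = [
    {"age": 22, "plan": "basic"},
    {"age": 25, "plan": "basic"},
    {"age": 30, "plan": "pro"},
    {"age": 47, "plan": "basic"},
    {"age": 51, "plan": "pro"},
    {"age": 60, "plan": "pro"},
]

combos = combination_values(records, ["age", "plan"])
predictions = [predict(r) for r in records]
fits = fit_by_combination(combos, predictions)

for combo, (mu, var) in sorted(fits.items()):
    print(combo, "mean:", mu, "variance:", var)
```

Each key in `fits` is one combination value, so the number of fitted distributions grows with the number of distinct feature-value combinations. That growth is what the later clustering step is meant to compress.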

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: selecting, by one or more processors, a top N features for a machine learning (ML) model trained on training data; discretizing, by the one or more processors, values of each continuous feature of the top N features into a set of categories; generating, by the one or more processors, a set of combination values that each represent a unique combination of feature values in a row representing a record within the training data; predicting, by the one or more processors, using the ML model, a target value for each record within the training data generating predicted target values; grouping, by the one or more processors, the predicted target values based on the combination value for each respective record of the training data; fitting, by the one or more processors, a distribution for each grouping of the predicted target values associated with a respective combination value generating a set of distributions and associated distribution curves; clustering and refitting, by the one or more processors, the set of distributions using a clustering algorithm to compress a number of distributions resulting in a set of clusters and a refitted distribution for each cluster of the set of clusters, wherein each refitted distribution is based on records associated with each distribution of the associated cluster; assigning, by the one or more processors, a different color to each feature of the top N features and a different shade of the respective different color for each category of the set of categories for a respective feature of the top N features; and outputting, by the one or more processors, a visualization of (1) the refitted distribution for each cluster as a distribution curve on a graph and (2) the associated records of the top N features as a table.

2. The computer-implemented method of claim 1, wherein selecting the top N features for the ML model comprises: computing, by the one or more processors, a feature importance for each feature of the ML model based on an association between changes in feature values and changes in an accuracy of the ML model; and determining, by the one or more processors, the top N features that contribute to a pre-set threshold accuracy percentage for the ML model based on the feature importance for each feature.

3. The computer-implemented method of claim 1, wherein discretizing values of each continuous feature comprises: applying, by the one or more processors, equal frequency binning to a set of values for a continuous feature generating a set of categorical values for the continuous feature.

4. The computer-implemented method of claim 1, further comprising: adding, by the one or more processors, a new column to the training data with respective combination values for each record.

5. The computer-implemented method of claim 1, further comprising: responsive to a user selecting a portion of data on one of the distribution curves, highlighting, by the one or more processors, corresponding records associated with the portion of data.

6. The computer-implemented method of claim 1, further comprising: calculating, by the one or more processors, a residual value for each record based on the corresponding predicted target value and an actual target value.

7. The computer-implemented method of claim 1, wherein fitting the distribution comprises: computing, by the one or more processors, a mean and a variance of the predicted target values for each combination value.

8. A computer program product comprising: one or more computer readable storage media and program instructions stored on the one or more computer readable storage media, the program instructions comprising: program instructions to select a top N features for a machine learning (ML) model trained on training data; program instructions to discretize values of each continuous feature of the top N features into a set of categories; program instructions to generate a set of combination values that each represent a unique combination of feature values in a row representing a record within the training data; program instructions to predict, using the ML model, a target value for each record within the training data generating predicted target values; program instructions to group the predicted target values based on the combination value for each respective record of the training data; program instructions to fit a distribution for each grouping of the predicted target values associated with a respective combination value generating a set of distributions and associated distribution curves; program instructions to cluster and refit the set of distributions using a clustering algorithm to compress a number of distributions resulting in a set of clusters and a refitted distribution for each cluster of the set of clusters, wherein each refitted distribution is based on records associated with each distribution of the associated cluster; program instructions to assign a different color to each feature of the top N features and a different shade of the respective different color for each category of the set of categories for each respective feature; and program instructions to output a visualization of (1) the refitted distribution for each cluster as a distribution curve on a graph and (2) the associated records of the top N features as a table.

9. The computer program product of claim 8, wherein the program instructions to select the top N features for the ML model comprise: program instructions to compute a feature importance for each feature of the ML model based on an association between changes in feature values and changes in an accuracy of the ML model; and program instructions to determine the top N features that contribute to a pre-set threshold accuracy percentage for the ML model based on the feature importance for each feature.

10. The computer program product of claim 8, wherein the program instructions to discretize values of each continuous feature comprise: program instructions to apply equal frequency binning to a set of values for a continuous feature generating a set of categorical values for the continuous feature.

11. The computer program product of claim 8, further comprising: program instructions to add a new column to the training data with respective combination values for each record.

12. The computer program product of claim 8, further comprising: responsive to a user selecting a portion of data on one of the distribution curves, program instructions to highlight corresponding records associated with the portion of data.

13. The computer program product of claim 8, further comprising: program instructions to calculate a residual value for each record based on the corresponding predicted target value and an actual target value.

14. The computer program product of claim 8, wherein the program instructions to fit the distribution comprise: program instructions to compute a mean and a variance of the predicted target values for each combination value.

15. A computer system comprising: one or more computer processors; one or more computer readable storage media; program instructions collectively stored on the one or more computer readable storage media for execution by at least one of the one or more computer processors, the stored program instructions comprising: program instructions to select a top N features for a machine learning (ML) model trained on training data; program instructions to discretize values of each continuous feature of the t
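The "clustering and refitting" element of the independent claims can be sketched as follows. The claims do not name a clustering algorithm, so this example swaps in a deliberately simple one-dimensional rule: groups are sorted by mean and merged while consecutive means stay within `max_gap`. That rule, the `max_gap` parameter, and the sample `groups` are assumptions for illustration only. The refit step pools the records of every distribution in a cluster and recomputes the mean and variance, which follows the claim wording that each refitted distribution is based on the records of the distributions in its cluster.

```python
from statistics import mean, pvariance


def cluster_and_refit(groups, max_gap):
    """Merge groups with nearby means, then refit each cluster on pooled values.

    groups:  dict mapping a combination value to its predicted target values.
    max_gap: hypothetical threshold; a group joins the current cluster when
             its mean is within max_gap of the previous group's mean.
    """
    ordered = sorted(groups.items(), key=lambda kv: mean(kv[1]))
    clusters = []
    prev_mean = None
    for combo, preds in ordered:
        m = mean(preds)
        if prev_mean is not None and m - prev_mean <= max_gap:
            # Close enough to the previous group: extend the current cluster.
            clusters[-1]["combos"].append(combo)
            clusters[-1]["values"].extend(preds)
        else:
            # Start a new cluster with this group's records.
            clusters.append({"combos": [combo], "values": list(preds)})
        prev_mean = m
    # Refit: one mean and variance per cluster, computed from pooled records.
    for cluster in clusters:
        cluster["mean"] = mean(cluster["values"])
        cluster["variance"] = pvariance(cluster["values"])
    return clusters


# Hypothetical per-combination predictions (combination value -> predictions).
groups = {
    (0, "basic"): [11.0, 12.5],
    (0, "pro"): [25.0],
    (1, "basic"): [23.5],
    (1, "pro"): [35.5, 40.0],
}

clusters = cluster_and_refit(groups, max_gap=2.0)
for cluster in clusters:
    print(cluster["combos"], cluster["mean"], cluster["variance"])
```

With these inputs, four per-combination distributions compress to three clusters, because the two single-record groups have means 1.5 apart and are pooled. A production system would more likely compare whole distributions rather than means alone; the merge rule here is only the smallest thing that demonstrates pooling and refitting.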

Assignees

IBM

Inventors

Classifications

  • G06N20/00 (Primary)

    Machine learning · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2025053858A1 cover?
In an approach, a processor selects a top N features for a machine learning (ML) model; discretizes values of each continuous feature of the top N features; generates a set of combination values that each represent a unique combination of feature values for a data record; predicts, using the ML model, a target value for each record generating predicted target values; groups the predicted tar…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date: Feb 13, 2025 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).