Key category identification and visualization

US12314290B2 · US · B2

Patent metadata
Publication number: US-12314290-B2
Application number: US-202318333510-A
Country: US
Kind code: B2
Filing date: Jun 12, 2023
Priority date: Jun 12, 2023
Publication date: May 27, 2025
Grant date: May 27, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented method for treating post-modeling data includes computing, sequentially for each category of a feature, a category importance (CI) value. The CI value is based on a model accuracy change when records of a category being examined are reassigned to a remaining set of categories of the feature according to a cumulative distribution of records among the remaining set of categories of the feature, wherein the remaining set of categories includes all categories of the feature, except for the category being examined. A post-modeling category merge is performed for each category having a CI value less than a CI value threshold.
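The loop described in the abstract can be sketched roughly as follows. This is an illustrative reading, not the patented implementation: the function names, the toy lookup model, the sample data, and the 0.05 threshold are all assumptions, and CI is taken here as baseline accuracy minus accuracy after reassignment, which is only one plausible interpretation of "model accuracy change".

```python
import random
from collections import Counter


def reassign(values, selected, rng):
    """Move every record of `selected` to one of the other categories,
    chosen in proportion to how many records each of them already holds.
    Assumes at least one other category is present."""
    counts = Counter(v for v in values if v != selected)
    cats = sorted(counts)                    # remaining categories
    weights = [counts[c] for c in cats]      # their record counts
    return [rng.choices(cats, weights)[0] if v == selected else v
            for v in values]


def accuracy(model, values, labels):
    """Fraction of records the model labels correctly."""
    hits = sum(model(v) == y for v, y in zip(values, labels))
    return hits / len(labels)


def category_importance(model, values, labels, seed=0):
    """CI per category, taken here as baseline accuracy minus the accuracy
    after that category's records are reassigned (an assumed definition)."""
    rng = random.Random(seed)
    baseline = accuracy(model, values, labels)
    ci = {}
    for cat in sorted(set(values)):          # examine one category at a time
        perturbed = reassign(values, cat, rng)
        ci[cat] = baseline - accuracy(model, perturbed, labels)
    return ci


def weak_categories(ci, threshold):
    """Categories whose CI falls below the threshold (merge candidates)."""
    return [cat for cat, value in ci.items() if value < threshold]


# Toy data: the stand-in model predicts 1 for "a" and 0 for anything else.
values = ["a"] * 4 + ["b"] * 3 + ["c"] * 3
labels = [1] * 4 + [0] * 6
model = lambda v: 1 if v == "a" else 0

ci = category_importance(model, values, labels)
print(ci)
print(weak_categories(ci, threshold=0.05))  # threshold value is illustrative
```

In this toy setup, reassigning the "a" records always breaks the model's predictions for them, so "a" gets a high CI, while "b" and "c" are partly interchangeable and score lower. Which categories fall under the threshold depends on the random draws.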

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for treating post-modeling data, the method comprising: selecting a category of a feature; computing a first category importance (CI) value for the selected category of the feature, wherein the first CI value is based on a model accuracy change after reassigning records of the selected category to a remaining set of categories of the feature according to a cumulative distribution of records among the remaining set of categories of the feature, wherein the remaining set of categories include each category of the feature, except for the selected category; and computing additional CI values for each category in the remaining set of categories by assigning each category as the selected category and determining the model accuracy change when records of the selected category are reassigned to the remaining set of categories of the feature according to the cumulative distribution of records among the remaining set of categories of the feature, wherein the remaining set of categories include all categories of the feature, except for the selected category.

2. The computer-implemented method of claim 1, further comprising providing an illustration of strong CI categories with a CI value above a predetermined CI value threshold, wherein the illustration includes both the CI values of the strong CI categories of the feature and a number of records provided for the strong CI categories of the feature.

3. The computer-implemented method of claim 2, wherein the illustration further includes weak CI categories with the CI value less than the predetermined CI value threshold.

4. The computer-implemented method of claim 1, wherein the reassigning of the remaining set of categories of the feature includes obtaining the cumulative distribution of records among the remaining set of categories of the feature by denoting the selected category as an r-th category, assuming a total number of records is N, among which the total number of records of the selected category, r, is n_r, and for a k-th category, where the k-th category is selected from the remaining set of categories, the cumulative distribution for the k-th category is a summation of [n_1/(N−n_r)] to [n_k/(N−n_r)] if k<r or a summation of [n_1/(N−n_r)] to [n_k/(N−n_r)] minus [n_r/(N−n_r)] if k>r, wherein n_k is a number of records of the k-th category and n_1 is the number of records of a first one of the categories.

5. The computer-implemented method of claim 1, wherein the reassigning of the remaining set of categories of the feature includes drawing a random number b from [0,1] for each record in the selected category and reassigning each record to one category of the set of remaining categories based on a comparison of the random number to the cumulative distribution for each category.

6. A computer-implemented method for treating post-modeling data, the method comprising: selecting a category of a feature; computing a first category importance (CI) value for the selected category of the feature, wherein the first CI value is based on a model accuracy change after reassigning records of the selected category to a remaining set of categories of the feature according to a cumulative distribution of records among the remaining set of categories of the feature, wherein the remaining set of categories include each category of the feature, except for the selected category; and performing a post-modeling category merge of each category having the CI value less than the CI value threshold.

7. The computer-implemented method of claim 6, further comprising reducing data collection computational overhead by relying on a reduced number of categories based on the post-modeling category merge for new data.

8. The computer-implemented method of claim 6, wherein, when a first category and a second category each have respective CI value less than the CI value threshold, the method further comprises: determining a first model accuracy change based on a first merge process of merging the first category into the second category; determining a second model accuracy change based on a second merge process of merging the second category into the first category; and performing the first merge process when the first model accuracy change is less than the second model accuracy change; and performing the second merge process when the second model accuracy change is less than the first model accuracy change.

9. A computer-implemented method for treating post-modeling data, the method comprising: computing, sequentially for each category of a feature, a category importance (CI) value, wherein the CI value is based on a model accuracy change after reassigning records of a category being examined to a remaining set of categories of the feature according to a cumulative distribution of records among the remaining set of categories of the feature; and the remaining set of categories include all categories of the feature, except for the category being examined; and performing a post-modeling category merge of each category having the CI value less than a CI value threshold.

10. The computer-implemented method of claim 9, further comprising reducing data collection computational overhead by relying on a reduced number of categories based on the post-modeling category merge for new data.

11. The computer-implemented method of claim 9, wherein, when a first category and a second category each have respective CI value less than the CI value threshold, the method further comprises: determining a first model accuracy change based on a first merge process of merging the first category into the second category; determining a second model accuracy change based on a second merge process of merging the second category into the first category; performing the first merge process when the first model accuracy change is less than the second model accuracy change; and performing the second merge process when the second model accuracy change is less than the first model accuracy change.

12. The computer-implemented method of claim 9, further comprising providing an illustration of strong CI categories with a CI value above a predetermined CI value threshold, wherein the illustration includes both the CI values of the strong CI categories of the feature and a number of records provided for the strong CI categories of the feature.

13. The computer-implemented method of claim 12, wherein the illustration further includes weak CI categories with the CI value being less than the predetermined CI value threshold.

14. The computer-implemented method of claim 9, wherein the reassigning of the remaining set of categories of the feature includes obtaining the cumulative distribution of records among the remaining set of categories of the feature by denoting a selected category as an r-th category, assuming a total number of records is N, among which the total number of records of the selected category, r, is n_r, and for a k-th category, where the k-th category is selected from the remaining set of categories, the cumulative distribution for the k-th category is a summation of [n_1/(N−n_r)] to [n_k/(N−n_r)] if k<r or a summation of [n_1/(N−n_r)] to [n_k/(N−n_r)] minus [n_r/(N−n_r)] if k>r, wherein n_k is a number of records of the k-th category and n_1 is the number of records of a first one of the categories.

15. The computer-implemented method of claim 14, wherein the reassigning of the remaining set of categories of the feature includes drawing a random number b from [0,1] for each record in the selected category and reassigning each reco…
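The cumulative distribution and random-draw reassignment recited in the dependent claims can be illustrated with a small sketch. The function names, the 0-based category index, and the example counts are assumptions made for illustration. The claims say only that each record is reassigned "based on a comparison" of the random number to the cumulative distribution; picking the first category whose cumulative value exceeds b is one common reading, not a rule confirmed by the text.

```python
import random


def cumulative_distribution(counts, r):
    """counts[k] is the number of records n_k in category k (0-based here);
    r is the index of the selected category. Returns (k, F_k) pairs for
    every remaining category, with F_k = (sum of n_i for i <= k, i != r)
    divided by (N - n_r). Skipping n_r in the running sum is equivalent to
    subtracting n_r/(N - n_r) for k > r, as the claim words it."""
    total = sum(counts) - counts[r]          # N - n_r
    cdf, running = [], 0
    for k, n_k in enumerate(counts):
        if k == r:
            continue
        running += n_k
        cdf.append((k, running / total))
    return cdf


def pick_category(cdf, b):
    """Return the first remaining category whose cumulative value exceeds b
    (an assumed comparison rule)."""
    for k, f_k in cdf:
        if b < f_k:
            return k
    return cdf[-1][0]                        # guard for b == 1.0 or rounding


def reassign_selected(counts, r, rng):
    """Redistribute the n_r records of category r, one random draw each."""
    cdf = cumulative_distribution(counts, r)
    new_counts = list(counts)
    new_counts[r] = 0
    for _ in range(counts[r]):
        new_counts[pick_category(cdf, rng.random())] += 1
    return new_counts


counts = [2, 5, 3]                           # hypothetical n_1, n_2, n_3
print(cumulative_distribution(counts, r=1))
print(reassign_selected(counts, r=1, rng=random.Random(0)))
```

Because the cumulative values are built from the remaining categories' record counts, larger categories receive proportionally more of the reassigned records, which keeps the relative distribution among the remaining categories roughly unchanged.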

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12314290B2 cover?
A computer-implemented method for treating post-modeling data includes computing, sequentially for each category of a feature, a category importance (CI) value. The CI value is based on a model accuracy change when records of a category being examined are reassigned to a remaining set of categories of the feature according to a cumulative distribution of records among the remaining set of categ…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/287. Mapped technology areas include Physics.
When was this patent published?
Publication date: May 27, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).