Combining User Feedback With an Automated Entity-Resolution Process Executed on a Computer System
US-2024126732-A1 · Apr 18, 2024 · US
US12314290B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12314290-B2 |
| Application number | US-202318333510-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 12, 2023 |
| Priority date | Jun 12, 2023 |
| Publication date | May 27, 2025 |
| Grant date | May 27, 2025 |
Official abstract text for this publication.
A computer-implemented method for treating post-modeling data includes computing, sequentially for each category of a feature, a category importance (CI) value. The CI value is based on a model accuracy change when records of a category being examined are reassigned to a remaining set of categories of the feature according to a cumulative distribution of records among the remaining set of categories of the feature, wherein the remaining set of categories includes all categories of the feature except for the category being examined. A post-modeling category merge is performed for each category having a CI value less than a CI value threshold.
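The procedure in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names (`reassign_category`, `category_importance`, `weak_categories`), the `accuracy_fn` callback standing in for a trained model, and the sign convention (CI taken as baseline accuracy minus accuracy after reassignment) are all assumptions made for illustration. The abstract says only that CI is "based on" the accuracy change.

```python
import bisect
import random
from collections import Counter
from itertools import accumulate


def reassign_category(values, selected, rng):
    """Return a copy of `values` with every `selected` record moved elsewhere.

    Target categories are drawn in proportion to the record counts of the
    remaining categories, via their cumulative distribution.
    """
    counts = Counter(values)
    remaining = [c for c in counts if c != selected]
    if not remaining:
        raise ValueError("need at least two categories to reassign records")
    total_remaining = len(values) - counts[selected]  # N - n_r
    cdf = [s / total_remaining for s in accumulate(counts[c] for c in remaining)]
    out = []
    for v in values:
        if v != selected:
            out.append(v)
            continue
        b = rng.random()  # uniform draw in [0, 1)
        # Assumed comparison rule: first category whose cumulative value exceeds b.
        k = min(bisect.bisect_right(cdf, b), len(remaining) - 1)
        out.append(remaining[k])
    return out


def category_importance(values, accuracy_fn, seed=0):
    """CI per category, assumed to be baseline accuracy minus perturbed accuracy."""
    rng = random.Random(seed)
    baseline = accuracy_fn(values)
    return {
        c: baseline - accuracy_fn(reassign_category(values, c, rng))
        for c in sorted(set(values))
    }


def weak_categories(ci, threshold):
    """Categories whose CI is below the threshold, i.e. merge candidates."""
    return [c for c, v in ci.items() if v < threshold]
```

Here `accuracy_fn` takes the (possibly perturbed) list of category values for one feature and returns the model's accuracy. How the weak categories are then merged is left open in this sketch.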
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method for treating post-modeling data, the method comprising: selecting a category of a feature; computing a first category importance (CI) value for the selected category of the feature, wherein the first CI value is based on a model accuracy change after reassigning records of the selected category to a remaining set of categories of the feature according to a cumulative distribution of records among the remaining set of categories of the feature, wherein the remaining set of categories include each category of the feature, except for the selected category; and computing additional CI values for each category in the remaining set of categories by assigning each category as the selected category and determining the model accuracy change when records of the selected category are reassigned to the remaining set of categories of the feature according to the cumulative distribution of records among the remaining set of categories of the feature, wherein the remaining set of categories include all categories of the feature, except for the selected category.
2. The computer-implemented method of claim 1, further comprising providing an illustration of strong CI categories with a CI value above a predetermined CI value threshold, wherein the illustration includes both the CI values of the strong CI categories of the feature and a number of records provided for the strong CI categories of the feature.
3. The computer-implemented method of claim 2, wherein the illustration further includes weak CI categories with the CI value less than the predetermined CI value threshold.
4. The computer-implemented method of claim 1, wherein the reassigning of the remaining set of categories of the feature includes obtaining the cumulative distribution of records among the remaining set of categories of the feature by denoting the selected category as an r-th category, assuming a total number of records is N, among which the total number of records of the selected category, r, is n_r, and for a k-th category, where the k-th category is selected from the remaining set of categories, the cumulative distribution for the k-th category is a summation of [n_1/(N−n_r)] to [n_k/(N−n_r)] if k<r or a summation of [n_1/(N−n_r)] to [n_k/(N−n_r)] minus [n_r/(N−n_r)] if k>r, wherein n_k is a number or records of the k-th category and n_1 is the number of records of a first one of the categories.
5. The computer-implemented method of claim 1, wherein the reassigning of the remaining set of categories of the feature includes drawing a random number b from [0,1] for each record in the selected category and reassigning each record to one category of the set of remaining categories based on a comparison of the random number to the cumulative distribution for each category.
6. A computer-implemented method for treating post-modeling data, the method comprising: selecting a category of a feature; computing a first category importance (CI) value for the selected category of the feature, wherein the first CI value is based on a model accuracy change after reassigning records of the selected category to a remaining set of categories of the feature according to a cumulative distribution of records among the remaining set of categories of the feature, wherein the remaining set of categories include each category of the feature, except for the selected category; and performing a post-modeling category merge of each category having the CI value less than the CI value threshold.
7. The computer-implemented method of claim 6, further comprising reducing data collection computational overhead by relying on a reduced number of categories based on the post-modeling category merge for new data.
8. The computer-implemented method of claim 6, wherein, when a first category and a second category each have respective CI value less than the CI value threshold, the method further comprises: determining a first model accuracy change based on a first merge process of merging the first category into the second category; determining a second model accuracy change based on a second merge process of merging the second category into the first category; and performing the first merge process when the first model accuracy change is less than the second model accuracy change; and performing the second merge process with the second model accuracy change is less than the first model accuracy change.
9. A computer-implemented method for treating post-modeling data, the method comprising: computing, sequentially for each category of a feature, a category importance (CI) value, wherein the CI value is based on a model accuracy change after reassigning records of a category being examined to a remaining set of categories of the feature according to a cumulative distribution of records among the remaining set of categories of the feature; and the remaining set of categories include all categories of the feature, except for the category being examined; and performing a post-modeling category merge of each category having the CI value less than a CI value threshold.
10. The computer-implemented method of claim 9, further comprising reducing data collection computational overhead by relying on a reduced number of categories based on the post-modeling category merge for new data.
11. The computer-implemented method of claim 9, wherein, when a first category and a second category each have respective CI value less than the CI value threshold, the method further comprises: determining a first model accuracy change based on a first merge process of merging the first category into the second category; determining a second model accuracy change based on a second merge process of merging the second category into the first category; performing the first merge process when the first model accuracy change is less than the second model accuracy change; and performing the second merge process with the second model accuracy change is less than the first model accuracy change.
12. The computer-implemented method of claim 9, further comprising providing an illustration of strong CI categories with a CI value above a predetermined CI value threshold, wherein the illustration includes both the CI values of the strong CI categories of the feature and a number of records provided for the strong CI categories of the feature.
13. The computer-implemented method of claim 12, wherein the illustration further includes weak CI categories with the CI value being less than the predetermined CI value threshold.
14. The computer-implemented method of claim 9, wherein the reassigning of the remaining set of categories of the feature includes obtaining the cumulative distribution of records among the remaining set of categories of the feature by denoting a selected category as an r-th category, assuming a total number of records is N, among which the total number of records of the selected category, r, is n_r, and for a k-th category, where the k-th category is selected from the remaining set of categories, the cumulative distribution for the k-th category is a summation of [n_1/(N−n_r)] to [n_k/(N−n_r)] if k<r or a summation of [n_1/(N−n_r)] to [n_k/(N−n_r)] minus [n_r/(N−n_r)] if k>r, wherein n_k is a number or records of the k-th category and n_1 is the number of records of a first one of the categories.
15. The computer-implemented method of claim 14, wherein the reassigning of the remaining set of categories of the feature includes drawing a random number b from [0,1] for each record in the selected category and reassigning each reco
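Claims 4 and 14 state the cumulative distribution in words. One reading of that wording is given as a formula below. The symbol F_k and the summation index j are notation introduced here, and the summation is assumed to run over j = 1, ..., k, including the selected category r when k > r, which is why n_r is subtracted in that case.

```latex
F_k =
\begin{cases}
\displaystyle \sum_{j=1}^{k} \frac{n_j}{N - n_r}, & k < r, \\[2ex]
\displaystyle \sum_{j=1}^{k} \frac{n_j}{N - n_r} \;-\; \frac{n_r}{N - n_r}, & k > r.
\end{cases}
```

As a check on this reading, take three categories with n_1 = 50, n_2 = 30, n_3 = 20 and r = 2, so N = 100 and N − n_r = 70. Then F_1 = 50/70 ≈ 0.714 and F_3 = (50 + 30 + 20)/70 − 30/70 = 1, so the distribution over the remaining categories reaches 1 at the last category, as a cumulative distribution should.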
for performance assessment · CPC title
Clustering or classification · CPC title
Visualization; Browsing · CPC title