Systems and methods for identifying influential training data points
US-2021103829-A1 · Apr 8, 2021 · US
US11651276B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11651276-B2 |
| Application number | US-201916669685-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 31, 2019 |
| Priority date | Oct 31, 2019 |
| Publication date | May 16, 2023 |
| Grant date | May 16, 2023 |
Official abstract text for this publication.
A computer-implemented method for generating a group of representative model cases for a trained machine learning model may be provided. The method comprising determining an input space, determining an initial plurality of model cases, and expanding the initial plurality of model cases by stepwise modifying field values of the records representing the initial plurality of model cases resulting in an exploration set of model cases. Additionally, the method comprises obtaining a model score value for each record of the exploration set of model cases, continuing the expansion of the exploration set of model cases thereby generating a refined model case set, and selecting the records in the refined model case set based on relative record distance values and related model score values between pairs of records, thereby generating the group of representative model cases.
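The expand, score, and select loop described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the patent: `toy_model` is a hypothetical stand-in for the trained model, the input space is taken to be the unit square, the expansion follows only the high-score direction, and the greedy distance filter is one possible way to combine record distances with model scores.

```python
import random


def toy_model(record):
    # Hypothetical stand-in for a trained model: score peaks near (0.8, 0.2).
    x, y = record
    return max(0.0, 1.0 - ((x - 0.8) ** 2 + (y - 0.2) ** 2))


def perturb(record, step, rng):
    # Stepwise modification: change one randomly chosen field value,
    # clamped to the assumed input space [0, 1].
    fields = list(record)
    i = rng.randrange(len(fields))
    fields[i] = min(1.0, max(0.0, fields[i] + rng.uniform(-step, step)))
    return tuple(fields)


def explore(model, seeds, cycles=30, keep=10, step=0.1, seed=0):
    # Build the exploration set, then keep expanding from the records
    # that currently have the highest model scores.
    rng = random.Random(seed)
    scored = {record: model(record) for record in seeds}
    for _ in range(cycles):
        frontier = sorted(scored, key=scored.get, reverse=True)[:keep]
        for record in frontier:
            candidate = perturb(record, step, rng)
            scored[candidate] = model(candidate)
    return scored


def euclidean(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5


def select_representatives(scored, min_distance=0.15, limit=5):
    # Greedy selection (an assumption): visit records by descending score
    # and skip any record too close to one that was already chosen.
    chosen = []
    for record in sorted(scored, key=scored.get, reverse=True):
        if all(euclidean(record, c) >= min_distance for c in chosen):
            chosen.append(record)
        if len(chosen) == limit:
            break
    return chosen
```

Calling `explore(toy_model, seeds)` with a handful of seed records and passing the result to `select_representatives` yields a small set of high-scoring records that are spread apart from each other. Exploring the low-score direction would only require reversing the sort order in `explore`.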
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method for generating a group of representative model cases for a trained machine learning model, the method comprising: determining an input space for the trained machine learning model; determining, based on the determined input space, an initial plurality of model cases, wherein each model case is represented by a record comprising a plurality of record fields with respective initial input values; expanding the initial plurality of model cases by stepwise modifying field values of records representing the initial plurality of model cases resulting in an exploration set of model cases; obtaining a model score value for each record of the exploration set of model cases; continuing the expansion of the exploration set of model cases in direction of those modified model cases that provide one of a relatively high and a relatively low model score value compared to records of the exploration set, thereby generating a refined model case set; selecting records in the refined model case set based on relative record distance values and related model score values between pairs of records, thereby generating the group of representative model cases; and reducing the number of records in the refined model case set by selecting those records using a distance measure representing the most relevant records for each of a plurality of clusters using a clustering model.
2. The method according to claim 1, wherein determining the input space comprises: using predefined values for the fields of records, or using a group of training data of the trained machine learning model as seed records, or using outlier of records of the training data of the trained machine learning model as seed records.
3. The method according to claim 1, wherein the stepwise modification of the field values of records comprises: using as basis for the modification records of a group comprising a confidence level above a predefined high confidence level value, or using as basis for the modification records of a group comprising a confidence level above a predefined low confidence level value, or using as basis for the modification records of a group with a predefined high range of range of confidence level values and/or a group with a predefined low range of range of confidence level values which are at the border of the input space.
4. The method according to claim 1, wherein the initial plurality of the model cases is generated randomly or using a subset of training or validation data of the trained machine learning model.
5. The method according to claim 1, wherein the stepwise modification of the field values of records representing the initial plurality of model cases comprises modifying randomly one field value of a record at a time or a plurality of field values of a related plurality of fields at a time.
6. The method according to claim 1, wherein the stepwise modification of the field values of records representing the initial plurality of model cases comprises modifying one or more field values of a record at a time or a plurality of field values of a related plurality of fields under an influence of a constraint supporting a given target.
7. The method according to claim 6, further comprising interrupting the continuation of the expansion of the exploration set of model cases if: newly generated records for expanding the exploration set of model cases do not show a better support for the target than already available records in the exploration set of model cases, a predefined number of records is reached within the exploration set, a predefined number of expansion cycles has been performed, or a preset time period ended.
8. The method according to claim 1, wherein the distance measure is a cosine similarity.
9. A representative case generation system for generating a group of representative model cases for a trained machine learning model, the system comprising at least one processor and a memory storing program instructions thereon, the program instructions executable by the at least processor to cause the system to perform a method comprising: determining an input space for the trained machine learning model; determining, based on the determined input space, an initial plurality of model cases, wherein each model case is represented by a record comprising a plurality of record fields with respective initial input values; expanding the initial plurality of model cases by stepwise modifying field values of records representing the initial plurality of model cases resulting in an exploration set of model cases; obtaining a model score value for each record of the exploration set of model cases; continuing the expansion of the exploration set of model cases in direction of those modified model cases that provide one of a relatively high and a relatively low model score value compared to records of the exploration set, thereby generating a refined model case set; selecting records in the refined model case set based on relative record distance values and related model score values between pairs of records, thereby generating the group of representative model cases; and reducing the number of records in the refined model case set by selecting those records during using a distance measure representing the most relevant records for each of a plurality of clusters using a clustering model.
10. The system according to claim 9, wherein determining the input space comprises: using predefined values for the fields of records, or using a group of training data of the trained machine learning model as seed records, or using outlier of records of the training data of the trained machine learning model as seed records.
11. The system according to claim 9, wherein the stepwise modification of the field values of records comprises: using as basis for the modification records of a group comprising a confidence level above a predefined high confidence level value, or using as basis for the modification records of a group comprising a confidence level above a predefined low confidence level value, or using as basis for the modification records of a group with a predefined high range of range of confidence level values and/or a group with a predefined low range of range of confidence level values which are at the border of the input space.
12. The system according to claim 9, wherein the initial plurality of the model cases is generated randomly or using a subset of training or validation data of the trained machine learning model.
13. The system according to claim 9, wherein the stepwise modification of the field values of records representing the initial plurality of model cases comprises modifying randomly one field value of a record at a time or a plurality of field values of a related plurality of fields at a time.
14. The system according to claim 9, wherein the stepwise modification of the field values of records representing the initial plurality of model cases comprises modifying one or more field values of a record at a time or a plurality of field values of a related plurality of fields under an influence of a constraint supporting a given target.
15. The system according to claim 14, wherein the method further comprises interrupting the continuation of the expansion of the exploration set of model cases if: newly generated records for expanding the exploration set of model cases do not show a better support for the target than already available records in the exploration set of model cases, a predefined number of records is reached within the exploration set, a p
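Two pieces of the claims lend themselves to a short illustration: the interruption conditions listed in claim 7, and the cluster-based reduction of claim 1 with the cosine similarity of claim 8. The sketch below is one possible reading, not the patented implementation. The function names, the default limits, the simple k-means-style loop with first-`k` initialization, and the rule of keeping the member most similar to each centroid are all assumptions made for illustration.

```python
import math
import time


def should_stop(best_new, best_existing, n_records, cycle, started_at,
                max_records=1000, max_cycles=50, max_seconds=5.0):
    # Interruption conditions in the spirit of claim 7; the limits are
    # hypothetical defaults.
    return (
        best_new <= best_existing            # no better support for the target
        or n_records >= max_records          # record budget reached
        or cycle >= max_cycles               # expansion cycle budget reached
        or time.monotonic() - started_at >= max_seconds  # time budget ended
    )


def cosine_similarity(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0


def mean_vector(records):
    n = len(records)
    return tuple(sum(column) / n for column in zip(*records))


def reduce_by_clusters(records, k, iterations=10):
    # Assumed clustering model: k-means-style loop that assigns each record
    # to the centroid with the highest cosine similarity.
    centroids = list(records[:k])  # deterministic initialization (assumption)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for record in records:
            best = max(range(len(centroids)),
                       key=lambda i: cosine_similarity(record, centroids[i]))
            clusters[best].append(record)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [mean_vector(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    # One representative per non-empty cluster: the member closest to its centroid.
    return [max(c, key=lambda r: cosine_similarity(r, centroids[i]))
            for i, c in enumerate(clusters) if c]
```

Because cosine similarity compares only the direction of two records, records that differ mainly in magnitude land in the same cluster; a different distance measure would group them differently.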