Data Classification Using Ensemble Models

US2024256637A1 · US · A1

Patent metadata
Publication number: US-2024256637-A1
Application number: US-202318160332-A
Country: US
Kind code: A1
Filing date: Jan 27, 2023
Priority date: Jan 27, 2023
Publication date: Aug 1, 2024
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection; read this to see what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer implemented method manages an ensemble model system to classify records. A number of processor units cluster records into groups of records based on classification predictions generated by base models in the ensemble model system for the records. The number of processor units determines sets of weights for the base models that increase a probability that the base models in the ensemble model system correctly predict the groups of records. Each set of weights in the sets of weights is associated with a group of records in the groups of records.
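The abstract describes two training-time steps: cluster records by the base models' predictions, then determine one set of weights per cluster. The Python sketch below illustrates that flow under explicit assumptions, because the patent text shown here names neither a clustering algorithm nor a weight-optimisation method. It assumes binary classification and represents each record by the vector of positive-class probabilities from the base models. It uses k-means with a deterministic farthest-point start as a stand-in for the clustering step, and it uses each model's in-group accuracy, normalised to sum to 1, as a simple heuristic for the per-group weights. The names `kmeans` and `group_weights` and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two prediction vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Matrix, k: int, iterations: int = 20) -> Tuple[List[int], Matrix]:
    """Hypothetical clustering step: Lloyd's k-means over prediction vectors."""
    # Deterministic farthest-point start: the first record, then repeatedly the
    # record farthest from every centroid chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(farthest))
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each record to the centroid with the most similar predictions.
        assignment = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        # Move each centroid to the mean prediction vector of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


def group_weights(probs: Matrix, labels: List[int], assignment: List[int], k: int,
                  threshold: float = 0.5) -> Matrix:
    """Hypothetical weighting heuristic: in-group accuracy of each base model,
    normalised so each group's weights sum to 1."""
    n_models = len(probs[0])
    weights = []
    for c in range(k):
        members = [i for i, a in enumerate(assignment) if a == c]
        accuracy = []
        for m in range(n_models):
            correct = sum(1 for i in members if (probs[i][m] >= threshold) == (labels[i] == 1))
            accuracy.append(correct / len(members) if members else 1.0)
        total = sum(accuracy)
        # Uniform weights if no base model is ever correct in this group.
        weights.append([a / total for a in accuracy] if total else [1.0 / n_models] * n_models)
    return weights


# Toy data (illustrative only): rows are records, columns are two base models'
# positive-class probabilities; every record's true label is 1.
probs = [[0.9, 0.1], [0.85, 0.2], [0.95, 0.15],
         [0.1, 0.9], [0.2, 0.85], [0.15, 0.95]]
labels = [1, 1, 1, 1, 1, 1]
assignment, centroids = kmeans(probs, k=2)
weights = group_weights(probs, labels, assignment, k=2)
print(assignment, weights)
```

Because each group gets its own weight vector, a base model that is reliable on only one kind of record can dominate within that group without dominating everywhere. In the toy data, model 0 receives all the weight in the first group and model 1 in the second.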

First claim

Opening claim text (preview).

1. A computer implemented method, the computer implemented method comprising: clustering, by a number of processor units, records into groups of records based on classification predictions generated by base models in an ensemble model system for the records; and determining, by the number of processor units, sets of weights for the base models that increase a probability that the base models in the ensemble model system correctly predict the groups of records, wherein each set of weights in the sets of weights is associated with a group of records in the groups of records.

2. The computer implemented method of claim 1 further comprising: determining, by the number of processor units, the classification predictions for the records using the base models in the ensemble model system; and determining, by the number of processor units, thresholds for the base models that meets a set of criteria for the base models in the ensemble model system, wherein each base model in the base models in the ensemble model system has a threshold in the thresholds that meets a set of criteria.

3. The computer implemented method of claim 1 further comprising: determining, by the number of processor units, the classification predictions for the records using the base models in the ensemble model system; determining, by the number of processor units, whether a set of redundant base models is present in the base models in the ensemble model system, wherein a given redundant model in the set of redundant base models has a prediction similarity and model type similarity to another base model of the base models; and removing, by the number of processor units, the set of redundant base models from the base models in the ensemble model system in response to the set of redundant base models being present.

4. The computer implemented method of claim 1, further comprising: determining, by the number of processor units, the classification predictions for the records using the base models in the ensemble model system; wherein clustering, by the number of processor units, records into groups of records based on the classification predictions generated by base models in the ensemble model system for the records comprises: determining, by the number of processor units, the classification predictions for the records using the base models in the ensemble model system; and placing, by the number of processor units, the records into the groups of records based on similarities between the classification predictions.

5. The computer implemented method of claim 1 further comprising: determining, by the number of processor units, the classification predictions for the records using the base models in the ensemble model system; and selecting, by the number of processor units, a selection policy that uses the classification predictions to classify the records.

6. The computer implemented method of claim 1 further comprising: determining, by the number of processor units, the classification predictions for the records using the base models in the ensemble model system; using, by the number processor units, the base models to determine classification predictions for a new record using the base models in the ensemble model system; identifying, by the number processor units, a particular group of records in the groups of records most like the new record using based on the classification predictions made by the base models in the ensemble model system; selecting, by the number processor units, a set of weights in the sets of weights corresponding to the particular group of records; and classifying the new record using the set of weights in the sets of weights and the classification predictions.

7. The computer implemented method of claim 6, wherein classifying the new record using the base models in the ensemble model system using the set of weights in the sets of weights comprises: applying, by the number processor units, the set of weights to the probabilities for the prediction results to form modified probabilities for the prediction results; and classifying, by the number processor units, the new record using the classification predictions with the modified probabilities for the prediction results.

8. A computer system comprising: a number of processor units, wherein the number of processor units executes program instructions to: cluster records into groups of records based on classification predictions generated by base models in the ensemble model system for the records; and determine sets of weights for the base models that increase a probability that the base models in the ensemble model system correctly predict the groups of records, wherein each set of weights in the sets of weights is associated with a group of records in the groups of records.

9. The computer system of claim 8, wherein the number of processor units executes the program instructions to: determine the classification predictions for the records using the base models in the ensemble model system; and determine thresholds for the base models that meets a set of criteria for the base models in the ensemble model system, wherein each base model in the base models in the ensemble model system has a threshold in the thresholds that meets a set of criteria.

10. The computer system of claim 8, wherein the number of processor units executes the program instructions to: determine the classification predictions for the records using the base models in the ensemble model system; determine whether a set of redundant base models is present in the base models in the ensemble model system, wherein a given redundant model in the set of redundant base models has a prediction similarity and model type similarity to another base model of the base models; and remove the set of redundant base models from the base models in the ensemble model system in response to the set of redundant base models being present.

11. The computer system of claim 8, further comprising: determine the classification predictions for the records using the base models in the ensemble model system; wherein in clustering records into groups of records based on the classification predictions generated by base models in the ensemble model system for the records, the number of processor units executes the program instructions to: determine the classification predictions for the records using the base models in the ensemble model system; and place the records into the groups of records based on similarities between the classification predictions.

12. The computer system of claim 8, wherein the number of processor units executes the program instructions to: determine the classification predictions for the records using the base models in the ensemble model system; and select a selection policy that uses the classification predictions to classify the records.

13. The computer system of claim 8, wherein the number of processor units executes the program instructions to: determine the classification predictions for the records using the base models in the ensemble model system; use the base models to determine a classification prediction for a new record using the base models in the ensemble model system; identify a particular group of records in the groups of records most like the new record based on the classification predictions made by the base models in the ensemble model system; select a set of weights in the sets of weights corresponding to the particular group of records; and classify the new record using the set of weights in the sets of weights and the classification predictions.
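Claims 3, 6, and 7 add a redundancy check and an inference path for new records. The Python sketch below is hypothetical. It treats two base models as redundant when they share a model type (a plain string label here) and predict the same class on at least 95% of records. It matches a new record to the group whose centroid of prediction vectors is nearest, and it forms the claimed "modified probabilities" as a weighted average of the base-model probabilities. None of these choices, nor the names `redundant_models` and `classify_new_record`, comes from the claims, and the centroids and weights are made-up illustrative values.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two prediction vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def redundant_models(predictions: List[List[int]], model_types: List[str],
                     min_agreement: float = 0.95) -> List[int]:
    """Claim 3 sketch (hypothetical criteria): model j is redundant if an earlier,
    kept model has the same type and the same predicted class on at least
    `min_agreement` of the records."""
    redundant = set()
    for j in range(len(model_types)):
        for i in range(j):
            if i in redundant or model_types[i] != model_types[j]:
                continue
            same = sum(a == b for a, b in zip(predictions[i], predictions[j]))
            if same / len(predictions[j]) >= min_agreement:
                redundant.add(j)
                break
    return sorted(redundant)


def classify_new_record(new_probs: Sequence[float], centroids: List[List[float]],
                        weights_per_group: List[List[float]],
                        threshold: float = 0.5) -> Tuple[int, float, int]:
    """Claims 6-7 sketch: returns (matched group, modified probability, class)."""
    # Claim 6: the group "most like" the new record, assumed here to be the
    # group with the nearest centroid.
    group = min(range(len(centroids)),
                key=lambda c: squared_distance(new_probs, centroids[c]))
    weights = weights_per_group[group]
    # Claim 7: apply the group's weights to the base-model probabilities; a
    # weighted average is assumed as the "modified probability".
    modified = sum(w * p for w, p in zip(weights, new_probs)) / sum(weights)
    return group, modified, int(modified >= threshold)


# Illustrative values only (not from the patent).
centroids = [[0.9, 0.15], [0.15, 0.9]]
weights_per_group = [[0.8, 0.2], [0.2, 0.8]]
print(classify_new_record([0.9, 0.2], centroids, weights_per_group))
print(redundant_models([[1, 0, 1, 1], [1, 0, 1, 1], [1, 0, 1, 1]],
                       ["tree", "tree", "logistic"]))
```

In the last call, model 1 is flagged because it duplicates model 0 in both type and predictions, while model 2 is kept despite identical predictions because its type differs. This mirrors claim 3's requirement of both prediction similarity and model type similarity.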

Assignees

IBM

Inventors

Classifications

  • Clustering or classification · CPC title

  • using statistics or function optimisation, e.g. modelling of probability density functions · CPC title

  • G06F18/241 (primary)

    relating to the classification model, e.g. parametric or non-parametric approaches · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2024256637A1 cover?
A method for managing an ensemble model system that classifies records. Records are clustered into groups based on the classification predictions that the base models make for them, and a separate set of base-model weights is determined for each group to increase the probability that the base models correctly predict that group's records. The full official text is in the Abstract section above.
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F18/2321. Mapped technology areas include Physics.
When was this patent published?
Published Aug 1, 2024 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).