Dataset classification quantification

US9424530B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9424530-B2
Application number  US-201514604765-A
Country             US
Kind code           B2
Filing date         Jan 26, 2015
Priority date       Jan 26, 2015
Publication date    Aug 23, 2016
Grant date          Aug 23, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, computing systems and computer program products implement embodiments of the present invention that include selecting a training dataset including training instances having respective training features, and applying a classifier to the training dataset, thereby generating a training classification that assigns, to each of the training instances, one of a plurality of categories, the classifier having an expected classification. A classification bias is detected in the training classification relative to the expected classification, and in response to the classification bias, a calibration matrix is defined based on the training features, and the classification bias. A production dataset including production instances is selected, and the classifier and the calibration matrix are applied to the production dataset, thereby generating a production classification quantification that assigns, to each of the production instances, one of the plurality of categories.
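
The flow described in the abstract (classify a training set, detect bias against the expected classification, derive a calibration, then classify and calibrate a production set) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the threshold classifier, the `score` feature, the category names, and in particular the calibration itself, which here is a plain table of P(expected category | predicted category) estimated on the training set. That table is a simple stand-in, not the feature-based calibration matrix the patent claims, and the output is an estimate of per-category counts rather than a per-instance assignment.

```python
from collections import Counter

CATEGORIES = ["neg", "pos"]  # hypothetical category names


def classify(features):
    """Hypothetical stand-in classifier: thresholds a single score feature."""
    return "pos" if features["score"] >= 0.5 else "neg"


def proportions(labels):
    """Share of each category in a list of labels."""
    counts = Counter(labels)
    return {c: counts[c] / len(labels) for c in CATEGORIES}


def detect_bias(predicted, expected):
    """Per-category difference between predicted and expected proportions."""
    p, e = proportions(predicted), proportions(expected)
    return {c: p[c] - e[c] for c in CATEGORIES}


def build_calibration(predicted, expected):
    """Rows are predicted categories; each row holds P(expected | predicted).

    Assumption: this simple table replaces the patent's calibration matrix.
    A predicted category never seen in training falls back to an identity row.
    """
    matrix = {}
    for pred in CATEGORIES:
        hits = [e for p, e in zip(predicted, expected) if p == pred]
        matrix[pred] = {
            c: (hits.count(c) / len(hits) if hits else float(c == pred))
            for c in CATEGORIES
        }
    return matrix


def quantify(production, matrix):
    """Classify production instances, then calibrate the category counts."""
    intermediate = [classify(f) for f in production]
    raw = Counter(intermediate)
    return {
        c: sum(raw[p] * matrix[p][c] for p in CATEGORIES) for c in CATEGORIES
    }


# Annotated training set: (features, expected category).
train = [
    ({"score": 0.9}, "pos"), ({"score": 0.7}, "pos"),
    ({"score": 0.6}, "neg"), ({"score": 0.55}, "neg"),
    ({"score": 0.2}, "neg"), ({"score": 0.1}, "neg"),
]
predicted = [classify(f) for f, _ in train]
expected = [label for _, label in train]

bias = detect_bias(predicted, expected)       # classifier over-predicts "pos"
calibration = build_calibration(predicted, expected)

production = [{"score": 0.8}, {"score": 0.6}, {"score": 0.3}, {"score": 0.65}]
estimate = quantify(production, calibration)
print(bias)
print(estimate)  # {'neg': 2.5, 'pos': 1.5}
```

In this toy data the classifier labels four of six training instances "pos" while only two are annotated "pos", so the raw production count of three "pos" is pulled down to an estimated 1.5. The correction assumes the production set resembles the training set, which a real system would need to check.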

First claim

Opening claim text (preview).

The invention claimed is:

1. A method, comprising: selecting a training dataset comprising training instances having respective training features; applying a classifier to the training dataset, thereby generating a training classification that assigns, to each of the training instances, one of a plurality of categories, the classifier having an expected classification; detecting a classification bias in the training classification relative to the expected classification; defining, in response to the classification bias, a calibration matrix based on conditional probabilities of features of a production dataset given occurrences of corresponding training features; selecting a production dataset comprising production instances; and applying the classifier and the calibration matrix to the production dataset, thereby generating a production classification quantification that assigns, to each of the production instances, one of the plurality of categories.

2. The method according to claim 1, wherein the training set comprises an annotated training set and the production dataset comprises an unseen dataset.

3. The method according to claim 1, wherein the calibration matrix comprises multiple matrix entries, each of the matrix entries comprising a given feature and an adjustment factor.

4. The method according to claim 3, wherein the classification bias comprises a respective individual bias for each of the features.

5. The method according to claim 4, wherein defining the calibration matrix comprises calculating the respective adjustment factor for each of the matrix entries based on the respective individual bias.

6. The method according to claim 1, wherein applying the classifier and the calibration matrix comprises applying the classifier to the production dataset, thereby generating an intermediate classification, and applying the calibration matrix to the intermediate classification, thereby generating the production classification quantification.

7. The method according to claim 6, wherein the classification bias comprises a training classification bias, and wherein the intermediate classification has an intermediate classification bias relative to the expected classification, and wherein the production classification quantification has a production classification bias that is less than the intermediate classification bias.

8. An apparatus, comprising: a memory configured to store a classifier, a training dataset comprising training instances having respective training features, and a production dataset comprising production instances; and a processor configured: to apply a classifier to the training dataset, thereby generating a training classification that assigns, to each of the training instances, one of a plurality of categories, the classifier having an expected classification, to detect a classification bias in the training classification relative to the expected classification, to define, in response to the classification bias, a calibration matrix based on conditional probabilities of features of the production dataset given occurrences of corresponding training features; to apply the classifier and the calibration matrix to the production dataset, thereby generating a production classification quantification that assigns, to each of the production instances, one of the plurality of categories.

9. The apparatus according to claim 8, wherein the training set comprises an annotated training set and the production dataset comprises an unseen dataset.

10. The apparatus according to claim 8, wherein the calibration matrix comprises multiple matrix entries, each of the matrix entries comprising a given feature and an adjustment factor.

11. The apparatus according to claim 10, wherein the classification bias comprises a respective individual bias for each of the features.

12. The apparatus according to claim 11, wherein the processor is configured to define the calibration matrix by calculating the respective adjustment factor for each of the matrix entries based on the respective individual bias.

13. The apparatus according to claim 8, wherein the processor is configured to apply the classifier and the calibration matrix by applying the classifier to the production dataset, thereby generating an intermediate classification, and applying the calibration matrix to the intermediate classification, thereby generating the production classification quantification.

14. The apparatus according to claim 13, wherein the classification bias comprises a training classification bias, and wherein the intermediate classification has an intermediate classification bias relative to the expected classification, and wherein the production classification quantification has a production classification bias that is less than the intermediate classification bias.

15. A computer program product, the computer program product comprising: a non-transitory computer readable storage medium having computer readable program code embodied therewith, the computer readable program code comprising: computer readable program code configured to select a training dataset comprising training instances having respective training features; computer readable program code configured to apply a classifier to the training dataset, thereby generating a training classification that assigns, to each of the training instances, one of a plurality of categories, the classifier having an expected classification; computer readable program code configured to detect a classification bias in the training classification relative to the expected classification; computer readable program code configured to define, in response to the classification bias, a calibration matrix based on conditional probabilities of features of a production dataset given occurrences of corresponding the training features; computer readable program code configured to select a production dataset comprising production instances; and computer readable program code configured to apply the classifier and the calibration matrix to the production dataset, thereby generating a production classification quantification that assigns, to each of the production instances, one of the plurality of categories.

16. The computer program product according to claim 15, wherein the training set comprises an annotated training set and the production dataset comprises an unseen dataset.

17. The computer program product according to claim 15, wherein the calibration matrix comprises multiple matrix entries, each of the matrix entries comprising a given feature and an adjustment factor.

18. The computer program product according to claim 17, wherein the classification bias comprises a respective individual bias for each of the features, and wherein the computer readable program code configured to define the calibration matrix by calculating the respective adjustment factor for each of the matrix entries based on the respective individual bias.

19. The computer program product according to claim 15, wherein the computer readable program code is configured to apply the classifier and the calibration matrix by applying the classifier to the production dataset, thereby generating an intermediate classification, and applying the calibration matrix to the intermediate classification, thereby generating the production classification quantification.

20. The computer program product according to claim 19, wherein the classification bias comprises a training classification bias, and wherein the inter
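
The dependent claims describe a calibration matrix whose entries each pair a feature with an adjustment factor, where the factor is calculated from that feature's individual bias, and a two-stage application in which the classifier first produces an intermediate classification that the matrix then corrects. A rough sketch of one way such a structure could look is given below. The claims do not state a formula, so the specifics here are hypothetical: the names `MatrixEntry`, `individual_bias`, `define_calibration_matrix` and `calibrated_count`, the choice of factor = 1 - bias, the averaging of factors across an instance's features, and the representation of an instance as a set of feature strings.

```python
from dataclasses import dataclass


@dataclass
class MatrixEntry:
    """One calibration matrix entry: a feature and its adjustment factor."""
    feature: str
    adjustment_factor: float


def individual_bias(feature, instances, predicted, expected, category):
    """Predicted minus expected rate of `category` among instances
    that contain `feature` (hypothetical definition of per-feature bias)."""
    idx = [i for i, inst in enumerate(instances) if feature in inst]
    if not idx:
        return 0.0
    pred_rate = sum(predicted[i] == category for i in idx) / len(idx)
    exp_rate = sum(expected[i] == category for i in idx) / len(idx)
    return pred_rate - exp_rate


def define_calibration_matrix(instances, predicted, expected, category):
    """Build one entry per training feature; factor = 1 - bias (assumption)."""
    features = sorted(set().union(*instances))
    return [
        MatrixEntry(
            f, 1.0 - individual_bias(f, instances, predicted, expected, category)
        )
        for f in features
    ]


def calibrated_count(production, intermediate, matrix, category):
    """Second stage: weight each instance labelled `category` in the
    intermediate classification by the mean factor of its known features."""
    factors = {e.feature: e.adjustment_factor for e in matrix}
    total = 0.0
    for inst, label in zip(production, intermediate):
        if label != category:
            continue
        known = [factors[f] for f in inst if f in factors]
        total += sum(known) / len(known) if known else 1.0
    return total


# Toy annotated training set: instances are sets of feature strings.
train = [{"free", "offer"}, {"free"}, {"meeting"}, {"offer"}]
train_predicted = ["spam", "spam", "ham", "spam"]  # training classification
train_expected = ["spam", "ham", "ham", "spam"]    # expected classification

matrix = define_calibration_matrix(
    train, train_predicted, train_expected, "spam"
)

production = [{"free"}, {"offer"}, {"meeting"}]
intermediate = ["spam", "spam", "ham"]             # first-stage output
print(calibrated_count(production, intermediate, matrix, "spam"))  # 1.5
```

In this toy data the feature "free" is labelled spam twice as often as the annotations warrant, so its factor is 0.5 and the raw spam count of 2 drops to 1.5. This sketch only shrinks over-counted categories; handling under-counted ones would need a different weighting.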

Assignees

Inventors

Classifications

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • by evaluating different subsets according to an optimisation criterion, e.g. class separability, forward selection or backward elimination · CPC title

  • G06N99/005 · Primary

    Physics · mapped topic

  • Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title

  • Machine learning · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9424530B2 cover?
Methods, computing systems and computer program products implement embodiments of the present invention that include selecting a training dataset including training instances having respective training features, and applying a classifier to the training dataset, thereby generating a training classification that assigns, to each of the training instances, one of a plurality of categories, the cl…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06N99/005. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 23, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).