System and method of tuning item classification
US-9436919-B2 · Sep 6, 2016 · US
US2016188711A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016188711-A1 |
| Application number | US-201414582204-A |
| Country | US |
| Kind code | A1 |
| Filing date | Dec 24, 2014 |
| Priority date | Dec 24, 2014 |
| Publication date | Jun 30, 2016 |
| Grant date | — |
Official abstract text for this publication.
Methods, computing systems and computer program products implement embodiments of the present invention that include selecting a dataset that includes instances, each of the instances having respective features, and determining an expected distribution of the instances among multiple categories. A first classification is generated that includes, for each of the instances based on their respective features, one or more first categories and a corresponding confidence score for each of the one or more categories. One or more of the instances classified into the given category are allocated to each given category, based on their respective confidence scores. Using the allocated one or more instances, a second classification is generated.
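The allocation step described in the abstract can be illustrated with a minimal sketch. It assumes the first classification is available as `(instance_id, category, confidence)` rows and that the expected distribution has already been turned into a per-category count. The names `allocate_by_confidence`, `first_classification` and `expected_counts` are hypothetical, and the tie-breaking rule is an assumption; none of them comes from the patent text.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One row per (instance, category) pair produced by a first classification.
Prediction = Tuple[str, str, float]  # (instance_id, category, confidence)


def allocate_by_confidence(
    predictions: List[Prediction],
    target_counts: Dict[str, int],
) -> Dict[str, List[str]]:
    """Keep, per category, the highest-confidence instances up to its target count."""
    by_category: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for instance_id, category, confidence in predictions:
        by_category[category].append((confidence, instance_id))

    allocated: Dict[str, List[str]] = {}
    for category, target in target_counts.items():
        # Descending confidence; ties broken by instance id (an assumption,
        # chosen only to make the result deterministic).
        ranked = sorted(
            by_category.get(category, []),
            key=lambda row: (-row[0], row[1]),
        )
        allocated[category] = [instance_id for _, instance_id in ranked[:target]]
    return allocated


# Hypothetical toy data, not taken from the patent.
first_classification = [
    ("doc1", "sports", 0.95),
    ("doc2", "sports", 0.60),
    ("doc3", "sports", 0.80),
    ("doc4", "politics", 0.70),
    ("doc5", "politics", 0.90),
]
expected_counts = {"sports": 2, "politics": 1}
allocated = allocate_by_confidence(first_classification, expected_counts)
```

In this sketch the allocated instances would then be passed to whatever step generates the second classification; the abstract does not say how that step works, so it is left out here.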
Opening claim text (preview).
1. A method, comprising: selecting a dataset comprising instances, each of the instances having respective features; determining an expected distribution of the instances among multiple categories; generating a first classification comprising, for each of the instances based on their respective features, one or more first categories and a corresponding confidence score for each of the one or more categories; allocating, to each given category, one or more of the instances classified into the given category based on their respective confidence scores; and generating, using the allocated one or more instances, a second classification.
2. The method according to claim 1, wherein generating the classification comprises applying a classification application to the dataset.
3. The method according to claim 1, wherein generating the expected distribution comprises applying a quantifier to the dataset.
4. The method according to claim 2, wherein quantifier is selected from a list consisting of a calibration matrix based quantifier, a scaling based quantifier and a statistical quantifier.
5. The method according to claim 2, wherein the expected distribution comprises a respective target number of instances for each of the categories, and wherein selecting the one or more instances comprises assigning, to each given category, the respective target number of instances classified into the given category and whose respective confidence scores are highest among all of the instances assigned to the given category.
6. The method according to claim 1, wherein the second distribution has a classification distribution in accordance with the expected distribution.
7. An apparatus, comprising: a memory configured to store a dataset comprising instances, each of the instances having respective features; and a processor configured: to select the dataset to determine an expected distribution of the instances among multiple categories, to generate a first classification comprising, for each of the instances based on their respective features, that classifies, to each of the instances based on their respective features, one or more first categories and a respective corresponding confidence score for each of the one or more categories, to allocate, to each given category, one or more of the instances classified into the given category based on their respective confidence scores, and to generate, using the allocated one or more instances, a second classification having a classification distribution in accordance with the expected distribution.
8. The apparatus according to claim 7, wherein the processor is configured to generate the classification by applying a classification application to the dataset.
9. The apparatus according to claim 7, wherein the processor is configured to generate the expected distribution by applying a quantifier to the dataset.
10. The apparatus according to claim 8, wherein quantifier is selected from a list consisting of a calibration matrix based quantifier, a scaling based quantifier and a statistical quantifier.
11. The apparatus according to claim 8, wherein the expected distribution comprises a respective target number of instances for each of the categories, and wherein the processor is configured to select the one or more instances by assigning, to each given category, the respective target number of instances classified into the given category and whose respective confidence scores are highest among all of the instances assigned to the given category.
12. The apparatus according to claim 7, wherein the second distribution has a classification distribution in accordance with the expected distribution.
13. A computer program product, the computer program product comprising: a non-transitory computer readable storage medium having computer readable program code embodied therewith, the computer readable program code comprising: computer readable program code configured to select a dataset comprising instances, each of the instances having respective features; computer readable program code configured to determine an expected distribution of the instances among multiple categories; computer readable program code configured to generate a first classification comprising, for each of the instances based on their respective features, one or more first categories and a corresponding confidence score for each of the one or more categories; computer readable program code configured to allocate, to each given category, one or more of the instances classified into the given category based on their respective confidence scores; and computer readable program code configured to generate, using the one or more allocated instances, a second classification having a classification distribution in accordance with the expected distribution.
14. The computer program product according to claim 13, wherein the computer readable program code is configured to generate the classification by applying a classification application to the dataset.
15. The computer program product according to claim 13, wherein the computer readable program code is configured to generate the expected distribution by applying a quantifier to the dataset.
16. The computer program product according to claim 14, wherein quantifier is selected from a list consisting of a calibration matrix based quantifier, a scaling based quantifier and a statistical quantifier.
17. The computer program product according to claim 14, wherein the expected distribution comprises a respective target number of instances for each of the categories, and wherein the computer readable program code is configured to select the one or more instances by assigning, to each given category, the respective target number of instances classified into the given category and whose respective confidence scores are highest among all of the instances assigned to the given category.
18. The computer program product according to claim 13, wherein the second distribution has a classification distribution in accordance with the expected distribution.
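Claims 5, 11 and 17 describe the expected distribution as a target number of instances per category. If a quantifier instead yields proportions, one plausible way to obtain integer targets is largest-remainder rounding, sketched below. This conversion is an assumption made for illustration, not something the claims specify, and the function name `target_counts_from_proportions` and the sample proportions are hypothetical.

```python
import math
from typing import Dict


def target_counts_from_proportions(
    proportions: Dict[str, float],
    n_instances: int,
) -> Dict[str, int]:
    """Convert category proportions into integer counts summing to n_instances."""
    total = sum(proportions.values())
    if total <= 0:
        raise ValueError("proportions must sum to a positive value")

    # Ideal (fractional) share of instances for each category.
    quotas = {c: n_instances * p / total for c, p in proportions.items()}
    counts = {c: math.floor(q) for c, q in quotas.items()}

    # Give the slots lost to rounding down to the largest fractional remainders;
    # ties are broken by category name (an assumption, for determinism).
    leftover = n_instances - sum(counts.values())
    by_remainder = sorted(quotas, key=lambda c: (-(quotas[c] - counts[c]), c))
    for category in by_remainder[:leftover]:
        counts[category] += 1
    return counts


# Hypothetical proportions for a dataset of 7 instances.
counts = target_counts_from_proportions(
    {"sports": 0.5, "politics": 0.3, "tech": 0.2}, 7
)
```

The resulting counts always sum to the dataset size, so they can serve directly as the per-category targets when selecting the highest-confidence instances.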
into predefined classes · CPC title
Physics · mapped topic