Facilitating identification of error image label

US12536779B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12536779-B2
Application number: US-202318358274-A
Country: US
Kind code: B2
Filing date: Jul 25, 2023
Priority date: Jul 25, 2023
Publication date: Jan 27, 2026
Grant date: Jan 27, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method, computer system, and program product facilitate identification of error image labels in training data. The method comprises: evenly dividing a training dataset into N subsets, where the training dataset includes M data items each comprising a pair of image and its original image label; training a prediction model to label images by respectively using each of the N subsets as training data to generate N respective trained prediction models; respectively using each of the N trained prediction models trained by using one of the N subsets as training data to label the images in other N−1 subsets of the N subsets to generate N−1 prediction labels for each of the M images in the training dataset. For each image in the M data items, whether the original image label of the image is a potential error image label is based on the N−1 prediction labels of the image.
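The steps in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented implementation: each "image" is reduced to a short feature vector, the "prediction model" is a toy nearest-centroid classifier, the even division is done round-robin, and an original label is flagged when any of its N−1 predictions differs from it. All function names and the toy dataset are hypothetical. Note that, unlike conventional K-fold cross-validation, each model is trained on one subset and applied to the other N−1 subsets.

```python
from collections import Counter

def split_evenly(items, n):
    """Divide items into n subsets whose sizes differ by at most one."""
    return [items[i::n] for i in range(n)]

def train_centroids(subset):
    """Toy stand-in for a prediction model: one mean feature vector per label."""
    sums, counts = {}, Counter()
    for features, label in subset:
        acc = sums.setdefault(label, [0.0] * len(features))
        for k, x in enumerate(features):
            acc[k] += x
        counts[label] += 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def predict(model, features):
    """Return the label whose centroid is closest to the feature vector."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda lab: dist(model[lab]))

def cross_predict(dataset, n):
    """Map each item index to the N-1 labels predicted by models
    trained on the subsets that do not contain that item."""
    indexed = list(enumerate(dataset))
    subsets = split_evenly(indexed, n)
    models = [train_centroids([item for _, item in s]) for s in subsets]
    predictions = {i: [] for i, _ in indexed}
    for j, model in enumerate(models):
        for k, subset in enumerate(subsets):
            if k == j:
                continue  # a model does not label its own training subset
            for i, (features, _) in subset:
                predictions[i].append(predict(model, features))
    return predictions

def flag_potential_errors(dataset, n):
    """Indices whose original label differs from any of the N-1 predictions."""
    predictions = cross_predict(dataset, n)
    return [i for i, (_, label) in enumerate(dataset)
            if any(p != label for p in predictions[i])]

# Hypothetical toy data: "cat" features near 0, "dog" features near 10.
dataset = [
    ([0.0], "cat"), ([10.0], "dog"), ([1.0], "cat"), ([11.0], "dog"),
    ([0.5], "cat"), ([9.0], "dog"), ([0.2], "cat"), ([10.5], "dog"),
    ([9.5], "cat"),  # deliberately mislabeled: looks like a dog
    ([0.8], "cat"), ([9.8], "dog"), ([10.2], "dog"),
]
print(flag_potential_errors(dataset, 3))  # [8]
```

With N = 3, every item receives exactly two prediction labels, and only the deliberately mislabeled item at index 8 is flagged for review.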

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising, using one or more processing units: evenly dividing a training dataset into N subsets, wherein the training dataset includes M data items, each comprising a pair comprising an image and its original image label, wherein N is an integer greater than 1 and M is an integer greater than 0; training a prediction model to label images by respectively using each of the N subsets as training data to generate N trained prediction models; respectively using each of the N trained prediction models trained by using one of the N subsets as training data to label the images in other N−1 subsets of the N subsets to generate N−1 prediction labels for each of the M images in the training dataset; and for each image in the M data items, determining whether the original image label of the image is a potential error image label is based on the N−1 prediction labels of the image.

2. The computer-implemented method of claim 1, wherein: the data items in the training dataset are categorized according to the original image labels in the data items; and the category distribution of data items in each of the N subsets follows the category distribution of data items in the training dataset.

3. The computer-implemented method of claim 1, further comprising: adjusting the sizes of the subsets to improve prediction accuracy of the trained prediction models.

4. The computer-implemented method of claim 3, further comprising: selecting the sizes of the subsets such that the variance of the prediction accuracy of the trained prediction models is minimized.

5. The computer-implemented method of claim 1, wherein determining whether the original image label of the image is a potential error image label based on the N−1 prediction labels of the image comprises: in response to the original image label of the image being not consistent with one of the N−1 prediction labels of the image, determining the original image label to be a potential error image label.

6. The computer-implemented method of claim 1, wherein determining whether the original image label of the image is a potential error image label based on the N−1 prediction labels of the image comprises: in response to one of the N−1 prediction labels being not consistent with another of the N−1 prediction labels, determining the original image label to be a potential error image label.

7. A computer system comprising: one or more computer processors; one or more computer readable media; and program instructions, stored on the one or more computer readable media for execution by at least one of the one or more processors, wherein the program instructions are configured to performing the following operations: evenly dividing a training dataset into N subsets, wherein the training dataset includes M data items each comprising a pair comprising an image and its original image label, wherein N is an integer greater than 1 and M is an integer greater than 0; training a prediction model to label images by respectively using each of the N subsets as training data to generate N respective trained prediction models; respectively using each of the N trained prediction models trained by using one of the N subsets as training data to label the images in other N−1 subsets of the N subsets to generate N−1 prediction labels for each of the M images in the training dataset; and for each image in the M data items, whether the original image label of the image is a potential error image label is based on the N−1 prediction labels of the image.

8. The computer system of the claim 7, wherein: the data items in the training dataset are categorized according to original image labels in the data items; and the category distribution of data items in each of the N subsets follows the category distribution of data items in the training dataset.

9. The computer system of the claim 7, wherein the operations further comprise: adjusting the sizes of the subsets to improve prediction accuracy of the trained prediction models.

10. The computer system of the claim 7, wherein the operations further comprise: selecting the sizes of the subsets such that the variance of the prediction accuracy of the trained prediction models is minimized.

11. The computer system of the claim 7, wherein determining whether the original image label of the image is a potential error image label based on the N−1 prediction labels of the image comprises: in response to the original image label of the image being not consistent with one of the N−1 prediction labels of the image, determining the original image label to be a potential error image label.

12. The computer system of the claim 7, wherein determining whether the original image label of the image is a potential error image label based on the N−1 prediction labels of the image comprises: in response to one of the N−1 prediction labels being not consistent with another of the N−1 prediction labels, determining the original image label to be a potential error image label.

13. A computer program product comprising: one or more computer readable storage media; and program instructions, stored on the one or more computer readable storage media for execution by at least one of the one or more processors, wherein the program instructions are configured to performing the following operations: evenly dividing a training dataset into N subsets, wherein the training dataset includes M data items each comprising a pair comprising an image and its original image label, wherein N is an integer greater than 1 and M is an integer greater than 0; training a prediction model to label images by respectively using each of the N subsets as training data to generate N respective trained prediction models; respectively using each of the N trained prediction models trained by using one of the N subsets as training data to label the images in other N−1 subsets of the N subsets to generate N−1 prediction labels for each of the M images in the training dataset; and for each image in the M data items, determining whether the original image label of the image is a potential error image label based on the N−1 prediction labels of the image.

14. The computer program product of the claim 13, wherein: the data items in the training dataset are categorized according to original image labels in the data items; and the category distribution of data items in each of the N subsets follows the category distribution of data items in the training dataset.

15. The computer program product of the claim 13, wherein the operations further comprise: adjusting the sizes of the subsets to improve prediction accuracy of the trained prediction models.

16. The computer program product of the claim 15, wherein the operations further comprise: selecting the sizes of the subsets may be chosen such that the variance of the prediction accuracy of the trained prediction models is minimized.

17. The computer program product of the claim 13, wherein determining whether the original image label of the image is a potential error image label based on the N−1 prediction labels of the image comprises: in response to the original image label of the image being not consistent with one of the N−1 prediction labels of the image, determining the original image label to be a potential error image label.

18. The computer program product of the claim 13, wherein determining whether the original image label of the image is
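Two ideas from the dependent claims lend themselves to a short illustration: the category-preserving split (claims 2, 8, 14) and the two flagging rules (claims 5 and 6 and their system and program-product counterparts). The following is a hedged sketch, not the patent's implementation. The function names are hypothetical, data items are assumed to be (image, label) pairs with sortable labels, and "not consistent" is interpreted here as simple inequality.

```python
from collections import Counter, defaultdict

def stratified_split(dataset, n):
    """Deal the items of each label round-robin into n subsets so that
    every subset roughly follows the label distribution of the whole."""
    by_label = defaultdict(list)
    for item in dataset:
        by_label[item[1]].append(item)
    subsets = [[] for _ in range(n)]
    slot = 0  # carried across labels so subset sizes stay even
    for label in sorted(by_label):
        for item in by_label[label]:
            subsets[slot % n].append(item)
            slot += 1
    return subsets

def disagrees_with_original(original, predicted):
    """Rule in the style of claim 5: at least one of the N-1 prediction
    labels differs from the original label."""
    return any(p != original for p in predicted)

def predictions_disagree(predicted):
    """Rule in the style of claim 6: at least two of the N-1 prediction
    labels differ from each other."""
    return len(set(predicted)) > 1

# Hypothetical data: six "cat" items and three "dog" items.
items = [(f"img{i}", "cat") for i in range(6)] + \
        [(f"img{i}", "dog") for i in range(6, 9)]
for subset in stratified_split(items, 3):
    print(Counter(label for _, label in subset))
```

In this example each of the three subsets receives two "cat" items and one "dog" item, mirroring the 2:1 ratio of the full dataset. The two rules can diverge: an original label "cat" with predictions ["dog", "dog"] is flagged by the first rule but not by the second, because the predictions agree with each other.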

Assignees

IBM

Inventors

Classifications

  • G06V10/98Primary

    Detection or correction of errors, e.g. by rescanning the pattern or by human intervention; Evaluation of the quality of the acquired patterns · CPC title

  • G06V10/774Primary

    Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12536779B2 cover?
A method, computer system, and program product facilitate identification of error image labels in training data. The method comprises: evenly dividing a training dataset into N subsets, where the training dataset includes M data items each comprising a pair of image and its original image label; training a prediction model to label images by respectively using each of the N subsets as training …
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06V10/98. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 27, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).