Automated labeling of images to train machine learning

US11322256B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11322256-B2
Application number  US-201816205224-A
Country             US
Kind code           B2
Filing date         Nov 30, 2018
Priority date       Nov 30, 2018
Publication date    May 3, 2022
Grant date          May 3, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method, computer system, and a computer program product for automatic labeling to train a machine learning algorithm is provided. The present invention may include labeling a medical image with at least one finding from a corresponding medical report. The present invention may include determining a localization information from the labeled medical image. The present invention may include training the machine learning algorithm with the determined localization information. The present invention may include detecting at least one candidate in a test medical image. The present invention may include generating a discrepancy list between the at least one detected candidate in the test medical image and at least one human-reported finding in a corresponding test medical report. The present invention may include, in response to determining that the generated discrepancy list is above a threshold, retraining the trained machine learning algorithm until the generated discrepancy list is below the threshold.
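The evaluate-and-retrain loop in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the names `discrepancy_list`, `evaluate`, `retrain_until_below`, `toy_detect`, and `toy_retrain` are hypothetical, findings are reduced to plain strings, the "size" of the discrepancy list is assumed to be its item count, and a `max_rounds` cap is added so the loop always terminates.

```python
from typing import Any, Callable, List, Sequence, Set, Tuple

Discrepancy = Tuple[str, str, str]  # (case id, kind of mismatch, finding)


def discrepancy_list(case_id: str, detected: Set[str], reported: Set[str]) -> List[Discrepancy]:
    """List findings that appear on only one side (model output vs. human report)."""
    missed = [(case_id, "reported_not_detected", f) for f in sorted(reported - detected)]
    extra = [(case_id, "detected_not_reported", f) for f in sorted(detected - reported)]
    return missed + extra


def evaluate(model: Any, detect: Callable, test_cases: Sequence[Tuple[str, Any, Set[str]]]) -> List[Discrepancy]:
    """Run detection on every test case and collect all discrepancies."""
    out: List[Discrepancy] = []
    for case_id, image, reported in test_cases:
        out.extend(discrepancy_list(case_id, detect(model, image), reported))
    return out


def retrain_until_below(model, detect, retrain, test_cases, threshold, max_rounds=10):
    """Retrain while the discrepancy count is above the threshold (assumed reading)."""
    discrepancies = evaluate(model, detect, test_cases)
    rounds = 0
    while len(discrepancies) > threshold and rounds < max_rounds:
        model = retrain(model, discrepancies)
        discrepancies = evaluate(model, detect, test_cases)
        rounds += 1
    return model, discrepancies, rounds


# Toy stand-ins (assumptions): the "model" is the set of findings it can recognize,
# and an "image" is the set of findings actually present in it.
def toy_detect(model: Set[str], image: Set[str]) -> Set[str]:
    return image & model


def toy_retrain(model: Set[str], discrepancies: List[Discrepancy]) -> Set[str]:
    learned = {f for _, kind, f in discrepancies if kind == "reported_not_detected"}
    return model | learned


cases = [
    ("case-1", {"nodule", "effusion"}, {"nodule", "effusion"}),
    ("case-2", {"fracture"}, {"fracture"}),
]
model, remaining, rounds = retrain_until_below({"nodule"}, toy_detect, toy_retrain, cases, threshold=1)
```

In this toy run the initial model misses two reported findings, which exceeds the threshold of 1, so one retraining round is triggered. A real system would replace the set operations with an image detector and a report parser.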

First claim

Opening claim text (preview).

What is claimed is:

1. A method for automatic labeling to train a machine learning algorithm, the method comprising:
    detecting at least one first finding in a medical report and at least one first candidate in a corresponding medical image;
    interpreting a geometric description of an anatomical location of the detected at least one first finding in the medical report;
    identifying, using an association algorithm, at least one true finding from the detected at least one first candidate in the corresponding medical image based on the interpreted geometric description of the anatomical location of the detected at least one finding in the medical report;
    locating, in a sub-region of the corresponding medical image, the detected at least one first candidate based on the interpreted geometric description of the anatomical location of the detected at least one first finding in the medical report;
    electronically marking the located at least one first candidate in the sub-region of the corresponding medical image;
    generating a ground truth label by labeling, in a natural language, the electronically marked at least one first candidate in the sub-region of the corresponding medical image with the detected at least one first finding in the medical report;
    training the machine learning algorithm with the generated ground truth label;
    detecting, using the trained machine learning algorithm, at least one second candidate in a test medical image, wherein the at least one detected second candidate in the test medical image is associated with predicting at least one second finding from a corresponding test medical report;
    generating, using the trained machine learning algorithm, a discrepancy list between the at least one detected second candidate in the test medical image and the at least one second finding in the corresponding test medical report; and
    in response to determining that the generated discrepancy list is above a threshold, retraining the trained machine learning algorithm until the generated discrepancy list is below the threshold.

2. The method of claim 1, further comprising:
    determining at least one true finding from the detected at least one first candidate in the corresponding medical image, by association with the detected at least one first finding from the medical report.

3. The method of claim 1, further comprising:
    determining a ground truth in the detected at least one first finding from the medical report; and
    generating the ground truth label for the identified at least one true finding in the corresponding medical image based on the determined ground truth from the medical report.

4. The method of claim 3, wherein generating the ground truth label for the identified at least one true finding in the corresponding medical image further comprises:
    electronically marking, using a labeling component, at least one pixel indicating a contour of the identified at least one true finding in the corresponding medical image.

5. The method of claim 3, wherein generating the ground truth label for the identified at least one true finding in the corresponding medical image further comprises:
    providing an electronic label associated with at least one medical report data, wherein the at least one medical report data is selected from the group consisting of: at least one radiology report data and at least one pathology report data.

6. A computer system for automatic labeling to train a machine learning algorithm, comprising:
    one or more processors, one or more computer-readable memories, one or more computer-readable tangible storage media, and program instructions stored on at least one of the one or more computer-readable tangible storage media for execution by at least one of the one or more processors via at least one of the one or more memories, wherein the computer system is capable of performing a method comprising:
    detecting at least one first finding in a medical report and at least one first candidate in a corresponding medical image;
    interpreting a geometric description of an anatomical location of the detected at least one first finding in the medical report;
    identifying, using an association algorithm, at least one true finding from the detected at least one first candidate in the corresponding medical image based on the interpreted geometric description of the anatomical location of the detected at least one finding in the medical report;
    locating, in a sub-region of the corresponding medical image, the detected at least one first candidate based on the interpreted geometric description of the anatomical location of the detected at least one first finding in the medical report;
    electronically marking the located at least one first candidate in the sub-region of the corresponding medical image;
    generating a ground truth label by labeling, in a natural language, the electronically marked at least one first candidate in the sub-region of the corresponding medical image with the detected at least one first finding in the medical report;
    training the machine learning algorithm with the generated ground truth label;
    detecting, using the trained machine learning algorithm, at least one second candidate in a test medical image, wherein the at least one detected second candidate in the test medical image is associated with predicting at least one second finding from a corresponding test medical report;
    generating, using the trained machine learning algorithm, a discrepancy list between the at least one detected second candidate in the test medical image and the at least one second finding in the corresponding test medical report; and
    in response to determining that the generated discrepancy list is above a threshold, retraining the trained machine learning algorithm until the generated discrepancy list is below the threshold.

7. The computer system of claim 6, further comprising:
    determining at least one true finding from the detected at least one first candidate in the corresponding medical image, by association with the detected at least one first finding from the medical report.

8. The computer system of claim 6, further comprising:
    determining a ground truth in the detected at least one first finding from the medical report; and
    generating the ground truth label for the identified at least one true finding in the corresponding medical image based on the determined ground truth from the medical report.

9. The computer system of claim 8, wherein generating the ground truth label for the identified at least one true finding in the corresponding medical image further comprises:
    electronically marking, using a labeling component, at least one pixel indicating a contour of the identified at least one true finding in the corresponding medical image.

10. The computer system of claim 8, wherein generating the ground truth label for the identified at least one true finding in the corresponding medical image further comprises:
    providing an electronic label associated with at least one medical report data, wherein the at least one medical report data is selected from the group consisting of: at least one radiology report data and at least one pathology report data.

11. A computer program product for automatic labeling to train a machine learning algorithm, comprising:
    one or more computer-readable tangible storage media and program instructions stored on at least one of the one or more computer-readable tangible storage media, the program instructions executable by a processor to cause the processor to perform a method comprising:
    detecting at least one first finding in a medical report and at least one first candidate in a corresponding medical image;
    interpr
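
The interpreting, locating, marking, and labeling steps recited in claim 1 can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `REGIONS` lookup, the `Candidate` class, and the functions `interpret_location` and `label_candidates` are hypothetical, location phrases are simplified to image quadrants in normalized coordinates (y increasing downward), and the association rule is plain point-in-box containment rather than whatever association algorithm the patent describes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), normalized to [0, 1]

# Hypothetical phrase-to-sub-region lookup; real anatomy would need an atlas or segmentation.
REGIONS: Dict[str, Box] = {
    "upper left": (0.0, 0.0, 0.5, 0.5),
    "upper right": (0.5, 0.0, 1.0, 0.5),
    "lower left": (0.0, 0.5, 0.5, 1.0),
    "lower right": (0.5, 0.5, 1.0, 1.0),
}


@dataclass
class Candidate:
    x: float                      # candidate center, normalized image coordinates
    y: float
    label: Optional[str] = None   # natural-language label, set once associated


def interpret_location(report_sentence: str) -> Optional[Box]:
    """Map a location phrase in the report text to an image sub-region, if one is known."""
    text = report_sentence.lower()
    for phrase, box in REGIONS.items():
        if phrase in text:
            return box
    return None


def label_candidates(finding: str, report_sentence: str, candidates: List[Candidate]) -> List[Candidate]:
    """Label candidates whose center lies inside the sub-region named in the report."""
    box = interpret_location(report_sentence)
    if box is None:
        return []
    x0, y0, x1, y1 = box
    matched = [c for c in candidates if x0 <= c.x < x1 and y0 <= c.y < y1]
    for c in matched:
        c.label = finding  # the report finding becomes the ground truth label
    return matched


candidates = [Candidate(0.2, 0.3), Candidate(0.8, 0.7)]
matched = label_candidates("nodule", "6 mm nodule in the upper left region", candidates)
```

Only the first candidate falls inside the "upper left" quadrant, so it alone receives the label "nodule"; the second candidate stays unlabeled and would not contribute a ground truth label.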

Assignees

Inventors

Classifications

  • Parsing · CPC title

  • G06F40/30 (Primary)

    Semantic analysis · CPC title

  • ICT specially adapted for medical reports, e.g. generation or transmission thereof · CPC title

  • Lexical analysis, e.g. tokenisation or collocates · CPC title

  • for processing medical images, e.g. editing · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11322256B2 cover?
A method, computer system, and a computer program product for automatic labeling to train a machine learning algorithm is provided. The present invention may include labeling a medical image with at least one finding from a corresponding medical report. The present invention may include determining a localization information from the labeled medical image. The present invention may include trai…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F40/30. Mapped technology areas include Physics.
When was this patent published?
Publication date May 3, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).