Automated extraction of structured labels from medical text using deep convolutional networks and use thereof to train a computer vision model
US-2021065859-A1 · Mar 4, 2021 · US
US11322256B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11322256-B2 |
| Application number | US-201816205224-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 30, 2018 |
| Priority date | Nov 30, 2018 |
| Publication date | May 3, 2022 |
| Grant date | May 3, 2022 |
Official abstract text for this publication.
A method, computer system, and a computer program product for automatic labeling to train a machine learning algorithm is provided. The present invention may include labeling a medical image with at least one finding from a corresponding medical report. The present invention may include determining a localization information from the labeled medical image. The present invention may include training the machine learning algorithm with the determined localization information. The present invention may include detecting at least one candidate in a test medical image. The present invention may include generating a discrepancy list between the at least one detected candidate in the test medical image and at least one human-reported finding in a corresponding test medical report. The present invention may include, in response to determining that the generated discrepancy list is above a threshold, retraining the trained machine learning algorithm until the generated discrepancy list is below the threshold.
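The retraining loop at the end of the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the patent's implementation: the `ToyDetector` class, the `discrepancy_list` and `retrain_until_below` helpers, the use of the list's length as the quantity compared with the threshold, and the `max_rounds` safety cap are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Case = Tuple[str, Set[str]]          # (case id, findings a human reported)
Discrepancy = Tuple[str, str, str]   # (case id, finding, kind of mismatch)


class ToyDetector:
    """Hypothetical stand-in for a trained model: it only 'detects' what it has memorised."""

    def __init__(self) -> None:
        self.known: Dict[str, Set[str]] = {}

    def detect(self, case_id: str) -> Set[str]:
        return set(self.known.get(case_id, set()))

    def retrain(self, discrepancies: List[Discrepancy]) -> None:
        # Toy update rule (assumption): resolve only the first discrepancy per round.
        for case_id, finding, kind in discrepancies[:1]:
            findings = self.known.setdefault(case_id, set())
            if kind == "missed_by_model":
                findings.add(finding)
            else:
                findings.discard(finding)


def discrepancy_list(model: ToyDetector, cases: List[Case]) -> List[Discrepancy]:
    """List every finding on which the model and the human report disagree."""
    gaps: List[Discrepancy] = []
    for case_id, reported in cases:
        detected = model.detect(case_id)
        gaps += [(case_id, f, "missed_by_model") for f in sorted(reported - detected)]
        gaps += [(case_id, f, "not_in_report") for f in sorted(detected - reported)]
    return gaps


def retrain_until_below(model: ToyDetector, cases: List[Case],
                        threshold: int, max_rounds: int = 10):
    """Retrain while the discrepancy count exceeds the threshold (capped at max_rounds)."""
    rounds = 0
    gaps = discrepancy_list(model, cases)
    while len(gaps) > threshold and rounds < max_rounds:
        model.retrain(gaps)
        rounds += 1
        gaps = discrepancy_list(model, cases)
    return rounds, gaps


CASES: List[Case] = [("c1", {"nodule", "effusion"}), ("c2", {"mass"})]
```

Calling `retrain_until_below(ToyDetector(), CASES, threshold=1)` retrains this toy model until at most one disagreement with the reports remains. A real system would replace the memorising detector with an image model and would need its own definition of when a discrepancy list counts as "above" the threshold.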
Opening claim text (preview).
What is claimed is:

1. A method for automatic labeling to train a machine learning algorithm, the method comprising:
detecting at least one first finding in a medical report and at least one first candidate in a corresponding medical image;
interpreting a geometric description of an anatomical location of the detected at least one first finding in the medical report;
identifying, using an association algorithm, at least one true finding from the detected at least one first candidate in the corresponding medical image based on the interpreted geometric description of the anatomical location of the detected at least one finding in the medical report;
locating, in a sub-region of the corresponding medical image, the detected at least one first candidate based on the interpreted geometric description of the anatomical location of the detected at least one first finding in the medical report;
electronically marking the located at least one first candidate in the sub-region of the corresponding medical image;
generating a ground truth label by labeling, in a natural language, the electronically marked at least one first candidate in the sub-region of the corresponding medical image with the detected at least one first finding in the medical report;
training the machine learning algorithm with the generated ground truth label;
detecting, using the trained machine learning algorithm, at least one second candidate in a test medical image, wherein the at least one detected second candidate in the test medical image is associated with predicting at least one second finding from a corresponding test medical report;
generating, using the trained machine learning algorithm, a discrepancy list between the at least one detected second candidate in the test medical image and the at least one second finding in the corresponding test medical report; and
in response to determining that the generated discrepancy list is above a threshold, retraining the trained machine learning algorithm until the generated discrepancy list is below the threshold.

2. The method of claim 1, further comprising: determining at least one true finding from the detected at least one first candidate in the corresponding medical image, by association with the detected at least one first finding from the medical report.

3. The method of claim 1, further comprising: determining a ground truth in the detected at least one first finding from the medical report; and generating the ground truth label for the identified at least one true finding in the corresponding medical image based on the determined ground truth from the medical report.

4. The method of claim 3, wherein generating the ground truth label for the identified at least one true finding in the corresponding medical image further comprises: electronically marking, using a labeling component, at least one pixel indicating a contour of the identified at least one true finding in the corresponding medical image.

5. The method of claim 3, wherein generating the ground truth label for the identified at least one true finding in the corresponding medical image further comprises: providing an electronic label associated with at least one medical report data, wherein the at least one medical report data is selected from the group consisting of: at least one radiology report data and at least one pathology report data.

6. A computer system for automatic labeling to train a machine learning algorithm, comprising: one or more processors, one or more computer-readable memories, one or more computer-readable tangible storage media, and program instructions stored on at least one of the one or more computer-readable tangible storage media for execution by at least one of the one or more processors via at least one of the one or more memories, wherein the computer system is capable of performing a method comprising:
detecting at least one first finding in a medical report and at least one first candidate in a corresponding medical image;
interpreting a geometric description of an anatomical location of the detected at least one first finding in the medical report;
identifying, using an association algorithm, at least one true finding from the detected at least one first candidate in the corresponding medical image based on the interpreted geometric description of the anatomical location of the detected at least one finding in the medical report;
locating, in a sub-region of the corresponding medical image, the detected at least one first candidate based on the interpreted geometric description of the anatomical location of the detected at least one first finding in the medical report;
electronically marking the located at least one first candidate in the sub-region of the corresponding medical image;
generating a ground truth label by labeling, in a natural language, the electronically marked at least one first candidate in the sub-region of the corresponding medical image with the detected at least one first finding in the medical report;
training the machine learning algorithm with the generated ground truth label;
detecting, using the trained machine learning algorithm, at least one second candidate in a test medical image, wherein the at least one detected second candidate in the test medical image is associated with predicting at least one second finding from a corresponding test medical report;
generating, using the trained machine learning algorithm, a discrepancy list between the at least one detected second candidate in the test medical image and the at least one second finding in the corresponding test medical report; and
in response to determining that the generated discrepancy list is above a threshold, retraining the trained machine learning algorithm until the generated discrepancy list is below the threshold.

7. The computer system of claim 6, further comprising: determining at least one true finding from the detected at least one first candidate in the corresponding medical image, by association with the detected at least one first finding from the medical report.

8. The computer system of claim 6, further comprising: determining a ground truth in the detected at least one first finding from the medical report; and generating the ground truth label for the identified at least one true finding in the corresponding medical image based on the determined ground truth from the medical report.

9. The computer system of claim 8, wherein generating the ground truth label for the identified at least one true finding in the corresponding medical image further comprises: electronically marking, using a labeling component, at least one pixel indicating a contour of the identified at least one true finding in the corresponding medical image.

10. The computer system of claim 8, wherein generating the ground truth label for the identified at least one true finding in the corresponding medical image further comprises: providing an electronic label associated with at least one medical report data, wherein the at least one medical report data is selected from the group consisting of: at least one radiology report data and at least one pathology report data.

11. A computer program product for automatic labeling to train a machine learning algorithm, comprising: one or more computer-readable tangible storage media and program instructions stored on at least one of the one or more computer-readable tangible storage media, the program instructions executable by a processor to cause the processor to perform a method comprising:
detecting at least one first finding in a medical report and at least one first candidate in a corresponding medical image;
interpr
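The "interpreting a geometric description", "locating, in a sub-region" and labeling steps recited in the independent claims can be pictured with a small sketch. It is only an illustration under stated assumptions: the keyword-to-region mapping, the normalised coordinate system, and the names `Candidate`, `region_from_text` and `associate` are hypothetical and do not come from the patent. In particular, "left" and "right" are treated here as image-left and image-right, which may need flipping under real radiological viewing conventions.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), normalised to 0..1


@dataclass
class Candidate:
    """A detected image candidate at normalised (x, y), optionally labelled."""
    x: float
    y: float
    label: Optional[str] = None


def region_from_text(description: str) -> Box:
    """Map simple location words in report text to an image sub-region.

    Assumption: y grows downward, so "upper" means the top half of the image.
    """
    words = set(re.findall(r"[a-z]+", description.lower()))
    x0, y0, x1, y1 = 0.0, 0.0, 1.0, 1.0
    if "left" in words:
        x1 = 0.5
    if "right" in words:
        x0 = 0.5
    if "upper" in words:
        y1 = 0.5
    if "lower" in words:
        y0 = 0.5
    return (x0, y0, x1, y1)


def associate(finding: str, description: str,
              candidates: List[Candidate]) -> List[Candidate]:
    """Keep candidates inside the described sub-region and label them with the finding."""
    x0, y0, x1, y1 = region_from_text(description)
    return [
        Candidate(c.x, c.y, label=finding)
        for c in candidates
        if x0 <= c.x <= x1 and y0 <= c.y <= y1
    ]


CANDIDATES = [Candidate(0.2, 0.3), Candidate(0.8, 0.2), Candidate(0.3, 0.9)]
LABELLED = associate("nodule", "Left upper lobe.", CANDIDATES)
```

With these toy inputs, only the candidate at (0.2, 0.3) falls inside the "left upper" quadrant and receives the natural-language label "nodule". The labelled candidates would then play the role of ground truth labels for training.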
Parsing · CPC title
Semantic analysis · CPC title
ICT specially adapted for medical reports, e.g. generation or transmission thereof · CPC title
Lexical analysis, e.g. tokenisation or collocates · CPC title
for processing medical images, e.g. editing · CPC title