Generating high quality training data collections for training artificial intelligence models

US12333716B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12333716-B2
Application number  US-202217660717-A
Country             US
Kind code           B2
Filing date         Apr 26, 2022
Priority date       Apr 26, 2022
Publication date    Jun 17, 2025
Grant date          Jun 17, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques are described for generating high quality training data collections for training artificial intelligence (AI) models in the medical imaging domain. A method embodiment comprises receiving, by a system comprising a processor, input indicating a clinical context associated with usage of a medical image dataset, and selecting, by the system, one or more data scrutiny metrics for filtering the medical image dataset based on the clinical context. The method further comprises applying, by the system, one or more image processing functions to the medical image dataset to generate metric values of the one or more data scrutiny metrics for respective medical images included in the medical image dataset, and filtering, by the system, the medical image dataset into one or more subsets based on one or more acceptability criteria for the metric values.
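The flow in the abstract (clinical context, metric selection, metric values per image, filtering by acceptability criteria) can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the context names, the `CONTEXT_METRICS` mapping, the `contrast` stand-in metric, the threshold values, and the representation of an image as a flat list of pixel intensities are all assumptions made for the example.

```python
from statistics import mean, pstdev

# Hypothetical mapping from a clinical context to the scrutiny metrics to apply.
CONTEXT_METRICS = {
    "lung_nodule_detection": ["snr"],
    "brain_segmentation": ["snr", "contrast"],
}


def snr(pixels):
    """Signal-to-noise ratio, taken here as mean intensity / standard deviation."""
    sigma = pstdev(pixels)
    return float("inf") if sigma == 0 else mean(pixels) / sigma


def contrast(pixels):
    """Intensity range; a simple stand-in metric chosen for this example."""
    return max(pixels) - min(pixels)


METRIC_FUNCTIONS = {"snr": snr, "contrast": contrast}


def select_metrics(clinical_context):
    """Pick the metrics associated with the given clinical context."""
    return CONTEXT_METRICS[clinical_context]


def compute_metric_values(dataset, metrics):
    """Apply each selected metric function to every image in the dataset."""
    return {
        image_id: {name: METRIC_FUNCTIONS[name](pixels) for name in metrics}
        for image_id, pixels in dataset.items()
    }


def filter_dataset(metric_values, criteria):
    """Split image ids into those meeting every (low, high) range and the rest."""
    accepted, outliers = [], []
    for image_id, values in metric_values.items():
        ok = all(low <= values[name] <= high for name, (low, high) in criteria.items())
        (accepted if ok else outliers).append(image_id)
    return accepted, outliers


# Toy dataset: each "image" is a flat list of pixel intensities.
dataset = {
    "img_a": [100, 102, 98, 101, 99],   # uniform, low-noise image
    "img_b": [10, 200, 50, 5, 180],     # highly variable image
}
metrics = select_metrics("lung_nodule_detection")
values = compute_metric_values(dataset, metrics)
accepted, outliers = filter_dataset(values, {"snr": (5.0, float("inf"))})
print(accepted, outliers)
```

With the assumed threshold of an SNR of at least 5, the low-noise image lands in the accepted subset and the highly variable one is treated as an outlier. In practice the metric functions and acceptable ranges would depend on the modality and the clinical task.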

First claim

Opening claim text (preview).

What is claimed is:

1. A system, comprising: a memory that stores computer executable components; and a processor that executes the computer executable components stored in the memory, wherein the computer executable components comprise: a clinical criteria selection component that receives first input indicating a clinical context associated with usage of a medical image dataset comprising a plurality of medical images; a scrutiny criteria selection component that selects one or more image quality metrics for filtering the medical image dataset based on the clinical context, the one or more images quality metrics respectively related to a measure of medical image quality, wherein the one or more image quality metrics are selected from the group consisting of: signal to noise ratio, peak signal to noise ratio, mean square error, structural similarity index, feature similarity index, variance inflation factor and Laplacian loss; an image processing component that applies one or more image processing functions to the medical image dataset to generate metric values of the one or more image quality metrics for respective medical images included in the medical image dataset; a filtering component that filters the medical image dataset into one or more subsets based on one or more acceptability criteria for the metric values, the one or more subsets respectively comprising a portion of the medical images; a visualization component that generates one or more graphical visualizations representative of the metric values for the respective medical images; and a rendering component that renders the one or more graphical visualizations via an interactive graphical user interface, wherein the one or more acceptability criteria comprises acceptable values for the one or more metric values and wherein the one or more graphical visualizations distinguish the one or more subsets associated with the acceptable values from outlier images of the medical image dataset associated with unacceptable values.

2. The system of claim 1, wherein the first input indicates one or more clinical inferencing tasks for training one or more machine learning models to perform on the one or more subsets, and wherein the computer executable component further comprise: a training data curation component that stores the one or more subsets in corresponding training data collections for training the one or more machine learning models to perform the one or more clinical inferencing tasks.

3. The system of claim 2, wherein the computer executable components further comprise: a training component that trains the one or more machine learning models using the one or more subsets.

4. The system of claim 1, wherein the first input indicates one or more clinical inferencing tasks for training one or more machine learning models to perform on the one or more subsets, wherein the clinical criteria selection component further receives second input identifying one or more anatomical regions of interest relevant to the one or more clinical inferencing tasks, and wherein the filtering component further filters the medical image dataset into the one or more subsets based on whether the respective medical images depict the one or more anatomical regions of interest.

5. The system of claim 1, wherein the interactive graphical user interface provides for receiving the first input and receiving additional input manually defining the one or more image quality metrics and the one or more acceptability criteria.

6. The system of claim 5, wherein the one or more image quality-metrics comprise two or more image quality metrics and wherein the interactive graphical user interface further provides for defining the acceptability criteria based on individual image quality metrics of the two or more image quality metrics and combinations of the two or more image quality metrics and generating the one or more subsets based on individual image quality metrics of the two or more image quality metrics and combinations of the two or more image quality metrics.

7. A method comprising: receiving, by a system comprising a processor, first input indicating a clinical context associated with usage of a medical image dataset comprising a plurality of medical images; selecting, by the system, one or more image quality metrics for filtering the medical image dataset based on the clinical context, the one or more images quality metrics respectively related to a measure of medical image quality, wherein the one or more image quality metrics are selected from the group consisting of: signal to noise ratio, peak signal to noise ratio, mean square error, structural similarity index, feature similarity index, variance inflation factor and Laplacian loss; applying, by the system, one or more image processing functions to the medical image dataset to generate metric values of the one or more image quality metrics for respective medical images included in the medical image dataset; filtering, by the system, the medical image dataset into one or more subsets based on one or more acceptability criteria for the metric values; generating, by the system, one or more graphical visualizations representative of the metric values for the respective medical images; and rendering, by the system, the one or more graphical visualizations via an interactive graphical user interface, wherein the one or more acceptability criteria comprises acceptable values for the one or more metric values and wherein the one or more graphical visualizations distinguish the one or more subsets associated with the acceptable values from outlier images of the medical image dataset associated with unacceptable values.

8. The method of claim 7, wherein the first input indicates one or more clinical inferencing tasks for training one or more machine learning models to perform on the one or more subsets, and wherein the method further comprises: storing, by the system, the one or more subsets in corresponding training data collections for training the one or more machine learning models to perform the one or more clinical inferencing tasks.

9. The method of claim 8, wherein the computer executable components further comprise: training, by the system, the one or more machine learning models using the one or more subsets.

10. The method of claim 7, wherein the first input indicates one or more clinical inferencing tasks for training one or more machine learning models to perform on the one or more subsets, and wherein the method further comprises: receiving, by the system, second input identifying one or more anatomical regions of interest relevant to the one or more clinical inferencing tasks, and wherein the filtering comprises filtering the medical image dataset into the one or more subsets based on whether the respective medical images depict the one or more anatomical regions of interest.

11. The method of claim 7, wherein the interactive graphical user interface provides for receiving the first input and receiving additional input manually defining the one or more image quality metrics and the one or more acceptability criteria.

12. The system of claim 11, wherein the one or more image quality metrics comprise two or more image quality metrics and wherein the interactive graphical user interface further provides for defining the acceptability criteria based on individual image quality metrics of the two or more image quality metrics and combinations of the two or more image quality metrics and generating the one or more subsets based on individual image quality metrics of the two or more data image quality and combinations of the two or more image quality metrics.
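Two of the image quality metrics named in the claims, mean square error and peak signal to noise ratio, have standard textbook definitions, and the claims also describe building subsets from individual metrics and from combinations of metrics. The sketch below illustrates both ideas under stated assumptions: images are flat lists of 8-bit intensities, a reference image is available for comparison, and the helper `subsets_by_criteria`, the scan names, and the thresholds are hypothetical choices for the example, not details taken from the patent.

```python
import math


def mean_square_error(reference, image):
    """Average squared pixel difference between an image and a reference."""
    if len(reference) != len(image):
        raise ValueError("images must have the same number of pixels")
    return sum((r - x) ** 2 for r, x in zip(reference, image)) / len(reference)


def peak_signal_to_noise_ratio(reference, image, max_value=255):
    """PSNR in decibels: 10 * log10(max_value^2 / MSE); infinite for identical images."""
    mse = mean_square_error(reference, image)
    return math.inf if mse == 0 else 10 * math.log10(max_value ** 2 / mse)


def subsets_by_criteria(metric_values, criteria):
    """Return one subset per individual metric plus the subset meeting all criteria.

    `criteria` maps a metric name to a predicate over that metric's value and
    is assumed to contain at least one entry.
    """
    per_metric = {
        name: sorted(i for i, vals in metric_values.items() if is_acceptable(vals[name]))
        for name, is_acceptable in criteria.items()
    }
    combined = sorted(set.intersection(*(set(ids) for ids in per_metric.values())))
    return per_metric, combined


# Toy data: a reference image and three candidate scans (assumed values).
reference = [50, 100, 150, 200]
images = {
    "scan_1": [50, 100, 150, 200],   # identical to the reference
    "scan_2": [52, 98, 151, 199],    # small deviations
    "scan_3": [90, 60, 190, 160],    # large deviations
}
metric_values = {
    image_id: {
        "mse": mean_square_error(reference, pixels),
        "psnr": peak_signal_to_noise_ratio(reference, pixels),
    }
    for image_id, pixels in images.items()
}
# Hypothetical acceptability criteria, one predicate per metric.
criteria = {"mse": lambda v: v <= 5, "psnr": lambda v: v >= 15}
per_metric, combined = subsets_by_criteria(metric_values, criteria)
outliers = sorted(set(images) - set(combined))
print(per_metric, combined, outliers)
```

Here the combined subset is the intersection of the per-metric subsets, which is one simple way to read "combinations" of metrics; other combinations (unions, weighted scores) would be equally plausible. The `outliers` list corresponds to the images a visualization would need to distinguish from the accepted subset.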

Assignees

Inventors

Classifications

  • Determination of region of interest [ROI] or a volume of interest [VOI] · CPC title

  • Machine learning · CPC title

  • Recognition of patterns in medical or anatomical images · CPC title

  • Training; Learning · CPC title

  • Brain · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12333716B2 cover?
Techniques are described for generating high quality training data collections for training artificial intelligence (AI) models in the medical imaging domain. A method embodiment comprises receiving, by a system comprising a processor, input indicating a clinical context associated with usage of a medical image dataset, and selecting, by the system, one or more data scrutiny metrics for filtering…
Who is the assignee on this patent?
GE Precision Healthcare LLC
What technology area does this patent fall under?
Primary CPC classification G06T7/0012. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 17, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).