Clinically relevant medical concept clustering

US10839947B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10839947-B2
Application number  US-201614988787-A
Country             US
Kind code           B2
Filing date         Jan 6, 2016
Priority date       Jan 6, 2016
Publication date    Nov 17, 2020
Grant date          Nov 17, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present invention embodiments are directed to methods, systems, and computer programs for identifying relations, within at least one taxonomy, between taxonomy categories and concepts extracted from electronic content. The relations represent semantic similarities for the concepts. The concepts are clustered based on the identified relations within the at least one taxonomy.
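The clustering step summarized in the abstract can be illustrated with a minimal sketch. The claims describe each concept as a vector of values over taxonomy categories and a similarity measure based on distances between those vectors. Everything concrete here is an assumption made for illustration: the category names, the vector values, the choice of cosine similarity, the 0.8 threshold, and the simple threshold-linkage grouping are not taken from the patent.

```python
import math

# Hypothetical taxonomy categories forming the feature space (illustrative only).
CATEGORIES = ["cardiovascular", "endocrine", "respiratory"]

# Hypothetical concept vectors: one value per category, higher = stronger relation.
CONCEPT_VECTORS = {
    "hypertension": [1.0, 0.2, 0.0],
    "atrial fibrillation": [0.9, 0.0, 0.1],
    "type 2 diabetes": [0.1, 1.0, 0.0],
    "insulin": [0.0, 0.9, 0.0],
    "asthma": [0.0, 0.0, 1.0],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_matrix(vectors):
    """Return sorted concept names and the pairwise similarity matrix."""
    names = sorted(vectors)
    matrix = [[cosine_similarity(vectors[p], vectors[q]) for q in names] for p in names]
    return names, matrix


def cluster(vectors, threshold=0.8):
    """Group concepts whose similarity meets the threshold (union-find linkage)."""
    names, sim = similarity_matrix(vectors)
    parent = list(range(len(names)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if sim[i][j] >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return sorted(groups.values())


clusters = cluster(CONCEPT_VECTORS)
for group in clusters:
    print(group)
```

With these made-up values, the two cardiovascular concepts group together, the two endocrine concepts group together, and asthma stays alone. A production system would likely use an established clustering library and a tuned threshold rather than this hand-rolled grouping.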

First claim

Opening claim text (preview).

What is claimed is:

1. A computer program product for clustering concepts extracted from electronic content, the computer program product comprising one or more non-transitory computer readable storage media collectively having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to:

    identify, within a plurality of different taxonomies, relations between taxonomy categories of the different taxonomies and the concepts extracted from the electronic content, wherein the electronic content is from a medical record, the concepts include medical concepts extracted from the medical record, and the plurality of different taxonomies includes medical taxonomies, and wherein the relations represent semantic similarities for the concepts and the identifying relations further includes:

        mapping the concepts to each of the plurality of different taxonomies, wherein mapping the concepts includes:

            determining a first concept extracted from the electronic content not found in a selected taxonomy of the plurality of different taxonomies;

            identifying one or more other taxonomies of the plurality of different taxonomies containing the first concept and determining a second concept that resides in the selected taxonomy and the identified one or more other taxonomies; and

            mapping the first concept to the second concept within the selected taxonomy when the second concept is closest to the first concept in the identified one or more other taxonomies and within a distance limit of the first concept, wherein the first concept remains unmapped to the selected taxonomy in response to the second concept not satisfying the distance limit;

        generating concept vectors relating each of the concepts to one or more corresponding taxonomy categories of the different taxonomies, wherein each concept vector is associated with a concept and includes a plurality of values with each value indicating a relationship between that associated concept and a corresponding taxonomy category, and wherein at least one concept has relations to taxonomy categories in two or more taxonomies; and

        determining a similarity measure between each of the concept vectors of the concepts based on distances between the concept vectors;

    cluster the concepts based on the determined similarity measure between the concept vectors; and

    generate a visualization of the electronic content with information arranged according to the clustered concepts to identify information within the electronic content relevant to a situation.

2. The computer program product of claim 1, wherein the program instructions further cause the processor to: perform named-entity recognition and disambiguation on the concepts, which includes concept identification, named-entity detection, and linking of each identified concept to a meaning.

3. The computer program product of claim 1, wherein the program instructions further cause the processor to: identify the concepts from both structured information and unstructured information within the electronic content.

4. The computer program product of claim 1, wherein the taxonomy categories represent a feature space for clustering of the concepts, and wherein the program instructions further cause the processor to: perform dimensionality reduction to remove features from the feature space to reduce processing time.

5. A system comprising: at least one processor configured to:

    identify, within a plurality of different taxonomies, relations between taxonomy categories of the different taxonomies and concepts extracted from electronic content, wherein the electronic content is from a medical record, the concepts include medical concepts extracted from the medical record, and the plurality of different taxonomies includes medical taxonomies, and wherein the relations represent semantic similarities for the concepts and the identifying relations further includes:

        mapping the concepts to each of the plurality of different taxonomies, wherein mapping the concepts includes:

            determining a first concept extracted from the electronic content not found in a selected taxonomy of the plurality of different taxonomies;

            identifying one or more other taxonomies of the plurality of different taxonomies containing the first concept and determining a second concept that resides in the selected taxonomy and the identified one or more other taxonomies; and

            mapping the first concept to the second concept within the selected taxonomy when the second concept is closest to the first concept in the identified one or more other taxonomies and within a distance limit of the first concept, wherein the first concept remains unmapped to the selected taxonomy in response to the second concept not satisfying the distance limit;

        generating concept vectors relating each of the concepts to one or more corresponding taxonomy categories of the different taxonomies, wherein each concept vector is associated with a concept and includes a plurality of values with each value indicating a relationship between that associated concept and a corresponding taxonomy category, and wherein at least one concept has relations to taxonomy categories in two or more taxonomies; and

        determining a similarity measure between each of the concept vectors of the concepts based on distances between the concept vectors;

    cluster the concepts based on the determined similarity measure between the concept vectors; and

    generate a visualization of the electronic content with information arranged according to the clustered concepts to identify information within the electronic content relevant to a situation.

6. The system of claim 5, wherein the at least one processor is further configured to: identify the concepts from both structured information and unstructured information within the electronic content.

7. The system of claim 5, wherein the at least one processor is further configured to: perform named-entity recognition and disambiguation on the concepts, which includes concept identification, named-entity detection, and linking of each identified concept to a meaning.

8. The system of claim 5, wherein the taxonomy categories represent a feature space for clustering of the concepts, and wherein the at least one processor is further configured to: perform dimensionality reduction to remove features from the feature space to reduce processing time.

9. The system of claim 5, wherein the at least one processor is further configured to: generate a similarity matrix relating the concepts based on the similarity measures; and cluster the concepts based on the similarity measures.

10. The system of claim 5, wherein the semantic similarities for the concepts represent relative relationships of the concepts to the taxonomy categories such that the concepts are clustered based on identified relevance.

11. The computer program product of claim 1, wherein the program instructions further cause the processor to: generate a similarity matrix relating the concepts based on the similarity measures; and cluster the concepts based on the similarity measures.

12. The computer program product of claim 1, wherein the semantic similarities for the concepts represent relative relationships of the concepts to the taxonomy categories such that the concepts are clustered based on identified relevance.
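The cross-taxonomy mapping element of claims 1 and 5 (a concept missing from a selected taxonomy is mapped to the closest shared concept found through another taxonomy, subject to a distance limit) might be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: taxonomies are modeled as undirected adjacency dictionaries, distance is taken to be the breadth-first hop count, and the function names `hop_distances` and `map_concept` and the sample taxonomies are hypothetical.

```python
from collections import deque


def hop_distances(graph, start):
    """Breadth-first hop counts from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def map_concept(concept, selected, others, distance_limit):
    """Map a concept into the selected taxonomy, or return None if unmapped.

    Assumption: each taxonomy is a dict of node -> set of neighboring nodes.
    """
    if concept in selected:
        return concept  # already present, nothing to map
    best = None  # (distance, candidate) of the closest shared concept so far
    for taxonomy in others:
        if concept not in taxonomy:
            continue  # only taxonomies containing the first concept are useful
        for candidate, d in hop_distances(taxonomy, concept).items():
            if candidate in selected and (best is None or (d, candidate) < best):
                best = (d, candidate)
    # The closest shared concept is used only if it satisfies the distance limit.
    if best is not None and best[0] <= distance_limit:
        return best[1]
    return None


# Hypothetical taxonomies for illustration only.
selected_taxonomy = {
    "heart disease": {"myocardial infarction"},
    "myocardial infarction": {"heart disease"},
}
other_taxonomy = {
    "STEMI": {"myocardial infarction"},
    "myocardial infarction": {"STEMI", "heart disease"},
    "heart disease": {"myocardial infarction"},
}

mapped = map_concept("STEMI", selected_taxonomy, [other_taxonomy], distance_limit=1)
unmapped = map_concept("STEMI", selected_taxonomy, [other_taxonomy], distance_limit=0)
print(mapped, unmapped)
```

In this toy data, "STEMI" is absent from the selected taxonomy but sits one hop from "myocardial infarction" in the other taxonomy, so it maps there when the limit is 1 and stays unmapped when the limit is 0. Ties between equally distant candidates are broken alphabetically here, which is a choice made for determinism rather than anything the claims specify.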

Assignees

  • IBM

Classifications

  • G16H10/60 · Primary

    for patient-specific data, e.g. for electronic patient records · CPC title

  • Clustering or classification · CPC title

  • for mining of medical data, e.g. analysing previous cases of other patients · CPC title

  • Query execution (filtering based on additional data G06F16/335) · CPC title

  • into predefined classes · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10839947B2 cover?
The present invention embodiments are directed to methods, systems, and computer programs for identifying relations, within at least one taxonomy, between taxonomy categories and concepts extracted from electronic content. The relations represent semantic similarities for the concepts. The concepts are clustered based on the identified relations within the at least one taxonomy.
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G16H10/60. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 17, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).