Synonym discovery

US10198499B1 · US · B1

Patent metadata
Field: Value
Publication number: US-10198499-B1
Application number: US-201213569781-A
Country: US
Kind code: B1
Filing date: Aug 8, 2012
Priority date: Aug 8, 2011
Publication date: Feb 5, 2019
Grant date: Feb 5, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and computer-readable media are provided for facilitating mapping of semantically similar terms between and among two or more information systems. In particular, to facilitate automatic discovery, establishment, and/or statistical validation of linkages between a plurality of different nomenclatures employed by a plurality of information systems, such as multiple electronic health record systems. In embodiments, the imputation of latent synonymy in corpora comprised of samples of historical records from each system enables automated terminology mapping between disparate systems' records, thereby establishing reliable linkages that may subsequently be utilized for realtime decision support, data mining-based research, or other valuable purposes.

First claim

Opening claim text (preview).

What is claimed is:

1. One or more non-transitory computer-readable media having computer-usable instructions embodied thereon that, when executed, enable a processor to perform a method of discovering latent relationships in data, said method comprising:

    obtaining a first set of records from a first health-records system, the first health-records system having a first structure corresponding to the organization of the first set of records and comprising a set of unified codesets, nomenclature rubric, or unified ontology;

    obtaining a second set of records from a second health-records system, the second health-records system having a second structure corresponding to the organization of the second set of records and comprising a set of unified codesets, nomenclature rubric, or unified ontology, based on one or more terms, the first structure being incompatible with the second structure;

    identifying at least one data item associated with episodes of care within said first set of records;

    selecting first raw data from the first set of records comprising a plurality of instances associated with the at least one data item, the first raw data comprising a set of values associated with the at least one data item;

    discarding extreme values of said first raw data;

    selecting second raw data from the second set of records, the second raw data comprising a set of values;

    determining a subset of the second raw data records as closely matching the at least one data item, the identifying for each of the at least one data item comprising: generating one or more clusters by applying a clustering method to the subset of the second raw data; calculating at least one measure quantifying similarity between the at least one cluster and the first raw data; and determining the subset of the second raw data as closely matching the data item in response to the at least one measure quantifying similarity being less than a predetermined threshold; and

    in response to determining that the subset of the second raw data records closely match the at least one data item, creating a provisional binding of at least one of the one or more terms associated with the at least one cluster in the second health record system to the data item in the first health-records system, thereby generating, at least partially, a mapping of the first structure to the second structure.

2. The non-transitory computer-readable media of claim 1, wherein the selecting first raw data from the first set of records comprises selecting records containing demographic attributes associated with episodes of care that are associated with the data item.

3. The non-transitory computer-readable media of claim 2, wherein the selecting second raw data from the second set of records comprises matching one or more demographic attributes associated with the first set of records to demographic attributes from the second set of records.

4. The non-transitory computer-readable media of claim 1, further comprising: using a term identified by mapped clusters as a basis for cross-mapping the term to a third health-records system.

5. The non-transitory computer-readable media of claim 1, wherein the calculating at least one measure quantifying similarity comprises using a two-sample Kolmogorov-Smirnov D test.

6. The non-transitory computer-readable media of claim 1, wherein the calculating at least one measure quantifying similarity comprises using a non-parametric metric.

7. The non-transitory computer-readable media of claim 6, wherein the calculating at least one measure quantifying similarity comprises using a Cramer V test.

8. The non-transitory computer-readable media of claim 1, wherein the applying a cluster method involves reducing the dimensionality.

9. The non-transitory computer-readable media of claim 1, wherein the applying a cluster method comprises generating a decision-tree classifier.

10. The non-transitory computer-readable media of claim 1, further comprising displaying a supervisory screen presenting the provisional binding, thereby permitting a user to modify the mapping by including or excluding terms from the provisional mapping.

11. The non-transitory computer-readable media of claim 1, wherein the identifying a subset of said second raw data records as closely matching the data item comprises matching a plurality of data values within the first raw data and the second raw data.

12. The non-transitory computer-readable media of claim 1, further comprising reducing said subset of said second raw data records by cleaning the subset of extreme values.

13. The non-transitory computer-readable media of claim 12, further comprising transforming some values of the subset of the second raw data.

14. A method for discovering latent relationships in data, the method comprising:

    obtaining a first set of records from a first health-records system, the first health-records system having a first structure corresponding to the organization of the first set of records and comprising a set of unified codesets, nomenclature rubric, or unified ontology;

    obtaining a second set of records from a second health-records system, the second health-records system having a second structure corresponding to the organization of the second set of records and comprising a set of unified codesets, nomenclature rubric, or unified ontology, based on one or more terms, the first structure being incompatible with the second structure;

    identifying at least one data item associated with episodes of care within said first set of records;

    selecting first raw data from the first set of records comprising a plurality of instances associated with the at least one data item, the first raw data comprising a set of values associated with the at least one data item;

    discarding extreme values of said first raw data;

    selecting second raw data from the second set of records, the second raw data comprising a set of values;

    determining a subset of the second raw data records as closely matching the at least one data item, the identifying for each of the at least one data item comprising: generating one or more clusters by applying a clustering method to the subset of the second raw data; calculating at least one measure quantifying similarity between the at least one cluster and the first raw data; and determining the subset of the second raw data as closely matching the data item in response to comparing the measure quantifying similarity being less than a predetermined threshold; and

    in response to determining that the subset of the second raw data records closely match the at least one data item, creating a provisional binding of at least one of the one or more terms associated with the at least one cluster in the second health record system to the data item in the first health-records system thereby generating, at least partially, a mapping of the first structure to the second structure.

15. The method of claim 14, further comprising displaying a supervisory screen presenting the provisional binding, thereby permitting a user to modify the mapping by including or excluding clusters from the provisional mapping.

16. The method of claim 15, wherein said measure is a non-parametric measure.

17. The method of claim 16, wherein said non-parametric measure uses a two-sample Kolmogorov-Smirnov D test.

18. The method of claim 16, wherein said non-parametric measure uses a Cramer V test.

19. The method of claim 16, wherein the selecting first raw data from the first set of records comprises selecting records containi
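
To make the matching step in claims 1, 5, and 14 more concrete, the following is a minimal Python sketch, not the patented implementation. It assumes numeric values, uses a percentile band as one possible way of "discarding extreme values", and treats values already grouped by the second system's term as a stand-in for the clustering step. The function names, the term labels `NA_SER` and `K_SER`, the sample values, and the threshold of 0.2 are all hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles


def trim_extremes(values, lower_pct=1, upper_pct=99):
    """Keep values inside a percentile band (an assumed outlier rule)."""
    cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points
    lo, hi = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [v for v in values if lo <= v <= hi]


def ks_d(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: largest gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def provisional_bindings(reference_values, candidates, threshold=0.2):
    """Map each candidate term to its D value when D is below the threshold."""
    ref = trim_extremes(reference_values)
    bindings = {}
    for term, values in candidates.items():
        d = ks_d(ref, trim_extremes(values))
        if d < threshold:  # smaller D means more similar distributions
            bindings[term] = d
    return bindings


# Hypothetical data: a sodium-like item in the first system, and two
# unlabeled terms from the second system (values grouped per term).
reference = [135 + (i % 11) for i in range(200)]
candidates = {
    "NA_SER": [135 + ((i * 3) % 11) for i in range(150)],
    "K_SER": [3.5 + (i % 16) * 0.1 for i in range(150)],
}
result = provisional_bindings(reference, candidates)
print(sorted(result))
```

In this sketch only the term whose value distribution resembles the reference item receives a provisional binding; a reviewer could then accept or reject it, in the spirit of the supervisory screen described in claims 10 and 15.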

Classifications

  • G06F16/288 (Primary)

    Entity relationship models · CPC title

  • ICT specially adapted for medical reports, e.g. generation or transmission thereof · CPC title

  • for mining of medical data, e.g. analysing previous cases of other patients · CPC title

  • Physics · mapped topic

  • G16H10/60 (Primary)

    for patient-specific data, e.g. for electronic patient records · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10198499B1 cover?
Methods, systems, and computer-readable media are provided for facilitating mapping of semantically similar terms between and among two or more information systems. In particular, to facilitate automatic discovery, establishment, and/or statistical validation of linkages between a plurality of different nomenclatures employed by a plurality of information systems, such as multiple electronic he…
Who is the assignee on this patent?
Mcnair Douglas S, Kailasam Kanakasabha K, Murrish John Christopher, and 1 more
What technology area does this patent fall under?
Primary CPC classification G06F16/288. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 5, 2019 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).