Clustering and analysis of electronic medical records
US-2015025908-A1 · Jan 22, 2015 · US
US2020098453A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2020098453-A1 |
| Application number | US-201816139678-A |
| Country | US |
| Kind code | A1 |
| Filing date | Sep 24, 2018 |
| Priority date | Sep 24, 2018 |
| Publication date | Mar 26, 2020 |
| Grant date | — |
Official abstract text for this publication.
The disclosure provides a method for data instance processing. The method includes obtaining a set of data instances collected from a plurality of organizations. Each of the data instances includes at least one record formed in an organization that stores values of a plurality of attributes of the data instance. The method also includes dividing the set of data instances into groups, wherein data instances with conflicting values for the same attribute are divided into different groups. The method further includes subdividing data instances in each of the groups into clusters.
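The grouping step can be read as a graph-coloring problem: the claims describe a conflict network whose nodes are data instances and whose edges join instances with conflicting values, with labels assigned so that connected nodes never share one. The sketch below is a minimal illustration under stated assumptions, not the patented implementation. The function names, the sample attributes (`sex`, `birth_year`), and the rule that a missing value (`None`) never conflicts are all hypothetical. It uses a greedy coloring heuristic, which keeps the label count low but does not guarantee the minimum in general.

```python
from itertools import combinations


def conflicts(a, b, attributes):
    """Assumed rule: two instances conflict when both record a value
    for the same attribute and those values differ."""
    return any(
        a.get(k) is not None and b.get(k) is not None and a[k] != b[k]
        for k in attributes
    )


def group_by_conflict(instances, attributes):
    """Build a conflict network and color it greedily.

    Returns a dict mapping each label to the list of instance indices
    that received it. Instances sharing a label never conflict.
    """
    n = len(instances)
    # Conflict network: node = instance index, edge = conflicting pair.
    neighbors = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if conflicts(instances[i], instances[j], attributes):
            neighbors[i].add(j)
            neighbors[j].add(i)

    # Greedy coloring, highest-degree nodes first (a heuristic only).
    labels = {}
    for node in sorted(neighbors, key=lambda i: (-len(neighbors[i]), i)):
        used = {labels[m] for m in neighbors[node] if m in labels}
        label = 0
        while label in used:
            label += 1
        labels[node] = label

    groups = {}
    for node in range(n):
        groups.setdefault(labels[node], []).append(node)
    return groups


# Hypothetical sample data, not taken from the patent.
attributes = ["sex", "birth_year"]
instances = [
    {"sex": "F", "birth_year": 1980},
    {"sex": "M", "birth_year": 1980},
    {"sex": "F", "birth_year": None},
    {"sex": "F", "birth_year": 1975},
]
groups = group_by_conflict(instances, attributes)
print(groups)
```

Instances 0, 1, and 3 conflict pairwise in this sample, so no labeling can use fewer than three groups; instance 2 only conflicts with instance 1 and can share a group with instance 0.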
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method for data instance processing, the method comprising: obtaining, by one or more processors, a set of data instances collected from a plurality of organizations over a network, wherein each of the data instances includes at least one record formed in an organization that stores values of a plurality of attributes of the data instance; dividing, by one or more processors, the set of data instances into groups, wherein data instances with conflicting values for the same attribute are divided into different groups; and subdividing, by one or more processors, data instances in each of the groups into clusters combining the data instances in each cluster into a new record having values of the plurality of attributes of the data instances in the cluster.

2. The method according to claim 1, wherein dividing the set of data instances into groups further includes: for each data instance in the set of data instances, forming, by one or more processors, a value sequence based on the at least one record of the data instance, wherein the value sequence includes selected attributes as its elements; constructing, by one or more processors, a conflict network in which a data instance is represented by a node and there is an edge between two nodes if two data instances represented by the two nodes have at least one conflicting element in their value sequences; and assigning, by one or more processors, labels to nodes of the conflict network with minimum labels, wherein nodes directly connected by an edge in the conflict network have different labels, and data instances represented by nodes with a same label are divided into a same group.

3. The method according to claim 1, wherein subdividing data instances in each of the groups into clusters further includes: constructing, by one or more processors, for each of the data instances in each of the groups, a feature vector based on the at least one record of the data instance; calculating, by one or more processors, distances between every two feature vectors; and clustering, by one or more processors, the data instances in each of the groups based on the calculated distances.

4. The method according to claim 3, wherein constructing the feature vector further includes at least one of: transforming, by one or more processors, values of attributes stored in the at least one record of the data instance into binary values; and splitting, by one or more processors, a first attribute of the plurality of attributes into a plurality of second attributes, wherein each of the plurality of second attributes is used to construct an element of the feature vector.

5. The method according to claim 4, wherein the first attribute is a clinical attribute.

6. The method according to claim 1, wherein the set of data instances are patient instances.

7. The method according to claim 2, wherein the plurality of attributes include at least one demographic attribute and at least one clinical attribute.

8. A system for data instance processing, the system comprising: one or more processors; a memory coupled to at least one of the one or more processors; a network interface coupling the one or more processors to processors of a plurality of organizations over a network; a set of computer program instructions stored in the memory and executed by at least one of the one or more processors in order to perform actions of: obtaining a set of data instances collected over the network from the plurality of organizations, wherein each of the data instances includes at least one record formed in an organization of the plurality of organizations that stores values of a plurality of attributes of the data instance; dividing the set of data instances into groups within the memory, wherein data instances with conflicting values for the same attribute are divided into different groups within the memory; subdividing data instances in each of the groups into clusters; and combining the data instances in each cluster into a new record having values of the plurality of attributes of the data instances in the cluster.

9. The system according to claim 8, wherein dividing the set of data instances into groups further includes: for each data instance in the set of data instances, forming a value sequence based on the at least one record of the data instance, wherein the value sequence includes selected attributes as its elements; constructing a conflict network in which a data instance is represented by a node and there is an edge between two nodes if two data instances represented by the two nodes have at least one conflicting element in their value sequences; and assigning labels to nodes of the conflict network with minimum labels, wherein nodes directly connected by an edge in the conflict network have different labels, and data instances represented by nodes with a same label are divided into a same group.

10. The system according to claim 8, wherein subdividing data instances in each of the groups into clusters further includes: constructing, for each of the data instances in each of the groups, a feature vector based on the at least one record of the data instance; calculating distances between every two feature vectors; and clustering the data instances in each of the groups based on the calculated distances.

11. The system according to claim 10, wherein constructing the feature vector further includes at least one of: transforming values of attributes stored in the at least one record of the data instance into binary values; and splitting a first attribute of the plurality of attributes into a plurality of second attributes, wherein each of the plurality of second attributes is used to construct an element of the feature vector.

12. The system according to claim 11, wherein the first attribute is a clinical attribute.

13. The system according to claim 8, wherein the set of data instances are patient instances.

14. The system according to claim 9, wherein the plurality of attributes include at least one demographic attribute and at least one clinical attribute.

15. A computer program product for data instance processing, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, wherein the program instructions being executable by a device to cause the device to perform a method comprising: obtaining a set of data instances collected from a plurality of organizations over a network, wherein each of the data instances includes at least one record formed in an organization of the plurality of organizations that stores values of a plurality of attributes of the data instance; dividing the set of data instances into groups, wherein data instances with conflicting values for the same attribute are divided into different groups; subdividing data instances in each of the groups into clusters; and combining the data instances in each cluster into a new record having values of the plurality of attributes of the data instances in the cluster.

16. The computer program product according to claim 15, wherein dividing the set of data instances into groups further includes: for each data instance in the set of data instances, forming a value sequence based on the at least one record of the data instance, wherein the value sequence includes selected attributes as its elements; constructing a conflict network in which a data instance is represented by a node and there is an edge between two nodes if two da
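The subdividing and combining steps recited in the claims (feature vectors with binary elements, pairwise distances, distance-based clustering, then one merged record per cluster) might look like the sketch below. This is an illustration under assumptions the claims do not fix: Hamming distance, single-linkage clustering with a hypothetical `threshold`, a multi-valued `diagnoses` attribute split into one binary element per code, and a merge rule that takes the first non-missing value. The sample diagnosis codes are placeholders.

```python
from itertools import combinations


def build_vocabulary(records, split_attr):
    """Split one multi-valued attribute into one binary attribute per value."""
    return sorted({v for r in records for v in r.get(split_attr, ())})


def feature_vector(record, vocabulary, split_attr):
    """Binary vector: 1 if the record carries that value, else 0."""
    present = set(record.get(split_attr, ()))
    return [1 if v in present else 0 for v in vocabulary]


def hamming(u, v):
    """Number of positions where two equal-length vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def cluster(vectors, threshold):
    """Single-linkage clustering via union-find (an assumed choice):
    any pair within `threshold` ends up in the same cluster."""
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(vectors)), 2):
        if hamming(vectors[i], vectors[j]) <= threshold:
            parent[find(i)] = find(j)

    result = {}
    for i in range(len(vectors)):
        result.setdefault(find(i), []).append(i)
    return list(result.values())


def merge(records, members, split_attr):
    """Combine a cluster into one record: union of the split attribute,
    first non-missing value for every other attribute."""
    merged = {}
    for i in members:
        for key, value in records[i].items():
            if key == split_attr:
                merged[key] = sorted(set(merged.get(key, [])) | set(value))
            elif value is not None:
                merged.setdefault(key, value)
    return merged


# Hypothetical conflict-free group, as produced by the grouping step.
group = [
    {"sex": "F", "birth_year": 1980, "diagnoses": ["E11", "I10"]},
    {"sex": "F", "birth_year": None, "diagnoses": ["E11", "I10", "J45"]},
    {"sex": "F", "birth_year": None, "diagnoses": ["K21", "M54"]},
]
vocabulary = build_vocabulary(group, "diagnoses")
vectors = [feature_vector(r, vocabulary, "diagnoses") for r in group]
clusters = cluster(vectors, threshold=1)
new_records = [merge(group, members, "diagnoses") for members in clusters]
print(new_records)
```

Because every group is conflict-free by construction, taking the first non-missing value during the merge cannot discard a competing value for the single-valued attributes in this sketch.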
for patient-specific data, e.g. for electronic patient records · CPC title
ICT specially adapted for medical reports, e.g. generation or transmission thereof · CPC title
Clustering or classification · CPC title
Physics · mapped topic