Cross-organization data instance matching

US2020098453A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2020098453-A1
Application number	US-201816139678-A
Country	US
Kind code	A1
Filing date	Sep 24, 2018
Priority date	Sep 24, 2018
Publication date	Mar 26, 2020
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The disclosure provides a method for data instance processing. The method includes obtaining a set of data instances collected from a plurality of organizations. Each of the data instances includes at least one record formed in an organization that stores values of a plurality of attributes of the data instance. The method also includes dividing the set of data instances into groups, wherein data instances with conflicting values for the same attribute are divided into different groups. The method further includes subdividing data instances in each of the groups into clusters.
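The grouping step can be illustrated with a small sketch. Claim 2 describes it as labeling a conflict network so that connected nodes get different labels, which is a graph-coloring problem. The code below is only an illustration under stated assumptions: the attribute names, the helper names `conflicts` and `divide_into_groups`, and the rule that a missing value never counts as a conflict are all hypothetical, and the greedy heuristic used here does not guarantee the minimum number of labels.

```python
from itertools import combinations

def conflicts(a, b, attrs):
    """True if both instances have a value for some attribute and the values differ.
    Assumption: a missing value (None) never counts as a conflict."""
    return any(
        a.get(k) is not None and b.get(k) is not None and a[k] != b[k]
        for k in attrs
    )

def divide_into_groups(instances, attrs):
    """Build the conflict network, then color it greedily.
    Returns groups as lists of instance indices, ordered by label."""
    n = len(instances)
    neighbors = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if conflicts(instances[i], instances[j], attrs):
            neighbors[i].add(j)
            neighbors[j].add(i)

    labels = {}
    # Visit high-degree nodes first: a common heuristic, not an optimal method.
    for i in sorted(range(n), key=lambda i: -len(neighbors[i])):
        used = {labels[j] for j in neighbors[i] if j in labels}
        label = 0
        while label in used:  # smallest label not used by a neighbor
            label += 1
        labels[i] = label

    groups = {}
    for i in range(n):
        groups.setdefault(labels[i], []).append(i)
    return [groups[k] for k in sorted(groups)]

# Hypothetical patient instances collected from different organizations.
instances = [
    {"sex": "F", "birth_year": 1980},
    {"sex": "F", "birth_year": 1980},
    {"sex": "M", "birth_year": 1980},
    {"sex": None, "birth_year": 1975},
]
groups = divide_into_groups(instances, ["sex", "birth_year"])
print(groups)
```

Instances 0 and 1 agree on every attribute, so they can share a label; instances 2 and 3 each conflict with every other instance and end up in groups of their own.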

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for data instance processing, the method comprising: obtaining, by one or more processors, a set of data instances collected from a plurality of organizations over a network, wherein each of the data instances includes at least one record formed in an organization that stores values of a plurality of attributes of the data instance; dividing, by one or more processors, the set of data instances into groups, wherein data instances with conflicting values for the same attribute are divided into different groups; and subdividing, by one or more processors, data instances in each of the groups into clusters combining the data instances in each cluster into a new record having values of the plurality of attributes of the data instances in the cluster.

2. The method according to claim 1, wherein dividing the set of data instances into groups further includes: for each data instance in the set of data instances, forming, by one or more processors, a value sequence based on the at least one record of the data instance, wherein the value sequence includes selected attributes as its elements; constructing, by one or more processors, a conflict network in which a data instance is represented by a node and there is an edge between two nodes if two data instances represented by the two nodes have at least one conflicting element in their value sequences; and assigning, by one or more processors, labels to nodes of the conflict network with minimum labels, wherein nodes directly connected by an edge in the conflict network have different labels, and data instances represented by nodes with a same label are divided into a same group.

3. The method according to claim 1, wherein subdividing data instances in each of the groups into clusters further includes: constructing, by one or more processors, for each of the data instances in each of the groups, a feature vector based on the at least one record of the data instance; calculating, by one or more processors, distances between every two feature vectors; and clustering, by one or more processors, the data instances in each of the groups based on the calculated distances.

4. The method according to claim 3, wherein constructing the feature vector further includes at least one of: transforming, by one or more processors, values of attributes stored in the at least one record of the data instance into binary values; and splitting, by one or more processors, a first attribute of the plurality of attributes into a plurality of second attributes, wherein each of the plurality of second attributes is used to construct an element of the feature vector.

5. The method according to claim 4, wherein the first attribute is a clinical attribute.

6. The method according to claim 1, wherein the set of data instances are patient instances.

7. The method according to claim 2, wherein the plurality of attributes include at least one demographic attribute and at least one clinical attribute.

8. A system for data instance processing, the system comprising: one or more processors; a memory coupled to at least one of the one or more processors; a network interface coupling the one or more processors to processors of a plurality of organizations over a network; a set of computer program instructions stored in the memory and executed by at least one of the one or more processors in order to perform actions of: obtaining a set of data instances collected over the network from the plurality of organizations, wherein each of the data instances includes at least one record formed in an organization of the plurality of organizations that stores values of a plurality of attributes of the data instance; dividing the set of data instances into groups within the memory, wherein data instances with conflicting values for the same attribute are divided into different groups within the memory; subdividing data instances in each of the groups into clusters; and combining the data instances in each cluster into a new record having values of the plurality of attributes of the data instances in the cluster.

9. The system according to claim 8, wherein dividing the set of data instances into groups further includes: for each data instance in the set of data instances, forming a value sequence based on the at least one record of the data instance, wherein the value sequence includes selected attributes as its elements; constructing a conflict network in which a data instance is represented by a node and there is an edge between two nodes if two data instances represented by the two nodes have at least one conflicting element in their value sequences; and assigning labels to nodes of the conflict network with minimum labels, wherein nodes directly connected by an edge in the conflict network have different labels, and data instances represented by nodes with a same label are divided into a same group.

10. The system according to claim 8, wherein subdividing data instances in each of the groups into clusters further includes: constructing, for each of the data instances in each of the groups, a feature vector based on the at least one record of the data instance; calculating distances between every two feature vectors; and clustering the data instances in each of the groups based on the calculated distances.

11. The system according to claim 10, wherein constructing the feature vector further includes at least one of: transforming values of attributes stored in the at least one record of the data instance into binary values; and splitting a first attribute of the plurality of attributes into a plurality of second attributes, wherein each of the plurality of second attributes is used to construct an element of the feature vector.

12. The system according to claim 11, wherein the first attribute is a clinical attribute.

13. The system according to claim 8, wherein the set of data instances are patient instances.

14. The system according to claim 9, wherein the plurality of attributes include at least one demographic attribute and at least one clinical attribute.

15. A computer program product for data instance processing, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, wherein the program instructions being executable by a device to cause the device to perform a method comprising: obtaining a set of data instances collected from a plurality of organizations over a network, wherein each of the data instances includes at least one record formed in an organization of the plurality of organizations that stores values of a plurality of attributes of the data instance; dividing the set of data instances into groups, wherein data instances with conflicting values for the same attribute are divided into different groups; subdividing data instances in each of the groups into clusters; and combining the data instances in each cluster into a new record having values of the plurality of attributes of the data instances in the cluster.

16. The computer program product according to claim 15, wherein dividing the set of data instances into groups further includes: for each data instance in the set of data instances, forming a value sequence based on the at least one record of the data instance, wherein the value sequence includes selected attributes as its elements; constructing a conflict network in which a data instance is represented by a node and there is an edge between two nodes if two da…
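The subdividing and combining steps (feature vectors, pairwise distances, clustering, then merging each cluster into a new record) can be sketched as follows. This is a hedged illustration, not the method disclosed in the patent: the claims do not name a distance measure or a clustering algorithm, so Hamming distance and threshold-based single-linkage clustering are assumptions chosen for brevity, and the attribute names, diagnosis codes, and helper functions are hypothetical.

```python
from itertools import combinations

def feature_vector(instance, vocabulary):
    """Split the multi-valued 'diagnoses' attribute into one binary element per code."""
    return [1 if code in instance["diagnoses"] else 0 for code in vocabulary]

def hamming(u, v):
    """Number of positions where two binary vectors differ (assumed distance)."""
    return sum(a != b for a, b in zip(u, v))

def cluster(vectors, max_distance):
    """Single-linkage clustering by distance threshold, using union-find."""
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(vectors)), 2):
        if hamming(vectors[i], vectors[j]) <= max_distance:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(vectors)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())

def combine(instances):
    """Merge a cluster into one record: first non-missing scalar value per
    attribute, and the union of all diagnosis codes."""
    merged = {"diagnoses": set()}
    for inst in instances:
        for key, value in inst.items():
            if key == "diagnoses":
                merged["diagnoses"] |= set(value)
            elif value is not None and merged.get(key) is None:
                merged[key] = value
    return merged

# One conflict-free group of hypothetical patient instances.
group = [
    {"sex": "F", "birth_year": 1980, "diagnoses": {"E11", "I10"}},
    {"sex": None, "birth_year": 1980, "diagnoses": {"E11", "I10", "J45"}},
    {"sex": "F", "birth_year": None, "diagnoses": {"M54"}},
]
vocabulary = sorted(set().union(*(g["diagnoses"] for g in group)))
vectors = [feature_vector(g, vocabulary) for g in group]
clusters = cluster(vectors, max_distance=1)
records = [combine([group[i] for i in c]) for c in clusters]
print(clusters)
print(records)
```

Because the group was already made conflict-free by the dividing step, merging scalar attributes inside a cluster only has to fill in missing values; it never has to choose between two different ones.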

Assignees

IBM

Inventors

Classifications

G16H10/60 · Primary
for patient-specific data, e.g. for electronic patient records · CPC title
G16H15/00
ICT specially adapted for medical reports, e.g. generation or transmission thereof · CPC title
G06F16/285
Clustering or classification · CPC title
G06F17/30598
Physics · mapped topic

Patent family

Related publications grouped by family.

View patent family 69885652

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2020098453A1 cover?: The disclosure provides a method for data instance processing. The method includes obtaining a set of data instances collected from a plurality of organizations. Each of the data instances includes at least one record formed in an organization that stores values of a plurality of attributes of the data instance. The method also includes dividing the set of data instances into groups, wherein da…
Who is the assignee on this patent?: IBM
What technology area does this patent fall under?: Primary CPC classification G16H10/60. Mapped technology areas include Physics.
When was this patent published?: Publication date Mar 26, 2020 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).