Using vertex self-information scores for vertices in an entity graph to determine whether to perform entity resolution on the vertices in the entity graph

US2016012151A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2016012151-A1
Application number: US-201514610557-A
Country: US
Kind code: A1
Filing date: Jan 30, 2015
Priority date: Jul 9, 2014
Publication date: Jan 14, 2016
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Provided are a computer program product, system, and method to determine whether to perform entity resolution on vertices in an entity graph. A determination is made of pairs of records in a database having a relationship value satisfying a threshold. An entity relationship graph has a vertex for each of the records of the pairs and an edge between two vertices. Each vertex has a self-information score based on content in the record, an initial unique entity identifier, and an entity information score. For each subject vertex of the vertices, a determination is made of a target vertex directly connected to the subject vertex that has a highest entity information score and whether to set the subject vertex entity identifier and entity information score to the entity identifier and entity information score of the target vertex based on the target vertex self-information score.
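The per-vertex decision described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation disclosed in the patent: the names `Vertex`, `link`, `resolve_step`, and `resolve`, the sample scores, and the threshold value are all hypothetical. The acceptance test follows the comparison in claim 18 (the subject adopts the target's entity unless the target self-information score is less than the subject's entity information score minus a threshold). Vertices are updated sequentially in place here, whereas the claims describe message passing between vertices.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Vertex:
    vid: str                      # vertex identifier (hypothetical name)
    self_info: float              # self-information score of the record
    entity_id: str = field(init=False)
    entity_info: float = field(init=False)
    neighbors: list = field(default_factory=list, repr=False)

    def __post_init__(self):
        # Initial unique entity identifier; the entity information score
        # starts out equal to the vertex's own self-information score.
        self.entity_id = self.vid
        self.entity_info = self.self_info


def link(a, b):
    """Add an undirected edge between two vertices."""
    a.neighbors.append(b)
    b.neighbors.append(a)


def resolve_step(subject, threshold):
    """Apply the decision once to one subject vertex; return True on change."""
    better = [n for n in subject.neighbors
              if n.entity_info > subject.entity_info]
    if not better:
        return False
    # Target: directly connected vertex with the highest entity information score.
    target = max(better, key=lambda n: n.entity_info)
    # Comparison from claim 18: reject when the target's own score is too low.
    if target.self_info < subject.entity_info - threshold:
        return False
    subject.entity_id = target.entity_id
    subject.entity_info = target.entity_info
    return True


def resolve(vertices, threshold, max_rounds=100):
    """Repeat the step over all vertices until no vertex changes (assumed loop)."""
    for _ in range(max_rounds):
        changed = [resolve_step(v, threshold) for v in vertices]
        if not any(changed):
            break


# Hypothetical data: b is a low-information record sitting between a and c.
a, b, c = Vertex("a", 10.0), Vertex("b", 1.0), Vertex("c", 8.0)
link(a, b)
link(b, c)
resolve([a, b, c], threshold=2.0)
print([(v.vid, v.entity_id) for v in (a, b, c)])
```

In this example b joins a's entity, but c does not follow: b's own self-information score (1.0) is less than c's entity information score minus the threshold (8.0 - 2.0), so the low-information record does not act as a bridge between the two high-information records.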

First claim

Opening claim text (preview).

1-15. (canceled)

16. A method for entity resolution of records in a database, comprising: determining pairs of records in the database having a relationship value satisfying a threshold; generating an entity relationship graph having a vertex for each of the records of the pairs and an edge for each of the determined pairs between two vertices representing records in one of the determined pairs, wherein each vertex is associated with a self-information score based on content in the record represented by the vertex and is assigned an initial unique entity identifier and an entity information score, which is initially set to the information score of the vertex; for each subject vertex of the vertices, performing: determining a target vertex directly connected to the subject vertex that has a highest entity information score of at least one vertex directly connected to the subject vertex that has an entity information score greater than the entity information score of the subject vertex; and determining whether to set the subject vertex entity identifier and entity information score to the entity identifier and entity information score of the target vertex based on the target vertex self-information score.

17. The method of claim 16, wherein setting the entity identifier and entity information score of the subject vertex to those of the target vertex is performed in response to determining that the self-information score of the target vertex and the subject vertex entity information score satisfies a comparison criteria.

18. The method of claim 17, wherein the comparison criteria comprises determining whether the target vertex self-information score is less than the subject vertex entity information score minus a threshold, wherein the entity identifier and entity information score of the subject vertex is set to those of the target vertex when the target vertex self-information score is not less than the subject vertex entity information score minus the threshold.

19. The method of claim 17, wherein a group of vertices sharing a common entity identifier and common entity information score change their entity identifier and entity information to that of a target vertex when the target vertex comprises the best directly connected vertex having the self-information score satisfying the comparison criteria.

20. The method of claim 16, further comprising: sending, by the vertices having changed their entity identifier and entity information score information, a new message to each directly linked vertex on one of the edges of the vertex indicating a vertex identifier, vertex self-information score, and the updated entity identifier and the entity information score; and receiving, by each of the vertices, the new message from each directly linked vertex that changed its entity information, wherein each of the receiving vertices performs an additional iteration of the operations of determining the target vertex and determining whether to set the receiving vertex entity identifier and entity information score to the entity identifier and entity information of the sending vertex comprising the target vertex.

21. The method of claim 16, further comprising: sending, by each of the vertices, a message to a directly linked vertex on one of the edges of the vertex indicating a vertex identifier, vertex self-information score, the entity identifier, and the entity information score for the vertex; and receiving, by each of the vertices, the message from each directly linked vertex, wherein the receiving vertex comprises the subject vertex and the sending vertex the target vertex, wherein the setting of the entity identifier and entity information score of the receiving vertex to that of the sending vertex is based on the information in the message from the sending vertex.

22. The computer program product of claim 16, further comprising: initiating an unlinking procedure in response to determining that information has changed for one of the records represented by a vertex in the graph, wherein all the vertices have a common entity identifier and common entity information score; determining whether all the vertices having the common entity identifier are linked directly or indirectly to an entity vertex comprising the vertex having the information score equal to the common entity information score; and unlinking any of the vertices having the common entity identifier that are not linked directly or indirectly to the center vertex.

23. The method of claim 22, wherein the vertices are unlinked by assigning a new unique entity identifier to each of the vertices that are not linked directly or indirectly to the center vertex.

24. The method of claim 22, wherein the determining whether all the vertices having the common entity identifier are directly or indirectly linked to the entity vertex comprises: sending the entity vertex a list of all the vertices having the common entity identifier; removing, by the entity vertex, its vertex from the list; sending, by the entity vertex, a message to each directly connected vertex in the graph indicating each of the vertices that have been sent the message; replying, by each of the vertices receiving the message, to the entity vertex; removing, by the entity vertex, the replying vertices from the list; forwarding, by each of the vertices receiving the message, the message to directly connected vertices that have not already received the message to cause them to reply to the entity vertex to enable the entity vertex to remove them from the list, wherein the list resulting from the replies from all the vertices that have been forwarded the message indicates vertices having the common entity identifier that are not directly or indirectly linked to the center vertex.
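The unlinking check in claims 22 to 24 amounts to a reachability test from the entity (center) vertex. The Python sketch below is a hedged illustration only: it replaces the message forwarding and replies of claim 24 with a breadth-first search, and the function name `unlink_unreachable`, the dictionary-based graph representation, and the sample graph are hypothetical. Two further assumptions are labeled in the code: the search follows edges only between vertices that share the common entity identifier, and an unlinked vertex reuses its own vertex identifier as its new unique entity identifier.

```python
from collections import deque


def unlink_unreachable(adjacency, entity_of, center):
    """Unlink group members that cannot be reached from the center vertex.

    adjacency: dict mapping each vertex id to a set of neighbor ids.
    entity_of: dict mapping each vertex id to its entity id (updated in place).
    center:    id of the entity vertex for the group.
    Returns the set of vertex ids that were unlinked.
    """
    common_id = entity_of[center]
    # The "list" of claim 24: every vertex with the common entity identifier,
    # with the entity vertex itself removed.
    pending = {v for v, e in entity_of.items() if e == common_id}
    pending.discard(center)

    seen = {center}
    queue = deque([center])
    while queue:
        current = queue.popleft()
        for nxt in adjacency.get(current, ()):
            # Assumption: only vertices in the same entity group forward
            # the message; the claims do not state this restriction.
            if nxt in seen or entity_of.get(nxt) != common_id:
                continue
            seen.add(nxt)
            pending.discard(nxt)   # stands in for the reply to the center
            queue.append(nxt)

    # Whatever remains was never reached. Assumption: the vertex's own id
    # serves as the new unique entity identifier (claim 23 does not say how
    # the new identifier is chosen).
    for v in pending:
        entity_of[v] = v
    return pending


# Hypothetical group a-b-c-d after the b-c edge has been dropped.
adjacency = {"a": {"b"}, "b": {"a"}, "c": {"d"}, "d": {"c"}}
entity_of = {"a": "a", "b": "a", "c": "a", "d": "a"}
unlinked = unlink_unreachable(adjacency, entity_of, center="a")
print(sorted(unlinked), entity_of)
```

After unlinking, the detached vertices hold fresh entity identifiers and could be resolved again against their remaining neighbors.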

Assignees

Inventors

Classifications

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F17/30958. Mapped technology areas include Physics.
When was this patent published?
Published January 14, 2016 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).