Entity explanation in data management

US12045291B2 · US · B2

Patent metadata
Publication number: US-12045291-B2
Application number: US-202217980477-A
Country: US
Kind code: B2
Filing date: Nov 3, 2022
Priority date: Nov 3, 2022
Publication date: Jul 23, 2024
Grant date: Jul 23, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Records can be matched by a graph neural network model performing entity resolution on the records, and representing each record as a respective node in a graph. Record matching explanations can be generated, each record matching explanation indicating a first set of attributes, and a first set of corresponding values, used for the matching at least two of the records. Nodes can be clustered into a plurality of clusters by aggregating the record matching explanations and, based on the record matching explanations, determining which of the records have high importance values, in the first set of values, that match. At least one cluster explanation can be generated, the cluster explanation indicating a second set of attributes, and a second set of values corresponding to the second set of attributes, used for the clustering the nodes. The record matching explanation and the cluster explanation can be output.
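The abstract's explanation and clustering steps can be illustrated with a small Python sketch. This is not the patented method. The graph neural network is replaced by precomputed, hypothetical pair explanations that map each attribute to an importance score. The clustering rule is one assumed reading of "determining which of the records have high importance values that match": link a matched pair only when both records agree on its top-k attributes, then take connected components with union-find. The names `cluster_records` and `cluster_explanation`, the explanation format, and the value of `k` are all illustrative.

```python
from collections import defaultdict

# Hypothetical data shapes (not from the patent): a record maps attribute -> value,
# and a record matching explanation maps attribute -> importance score for one
# matched pair. The model that would produce these scores is not shown.
Record = dict[str, str]
Explanation = dict[str, float]


def top_attributes(expl: Explanation, k: int) -> list[str]:
    """Return the k attributes with the highest importance scores."""
    return sorted(expl, key=expl.get, reverse=True)[:k]


def cluster_records(records: dict[str, Record],
                    explanations: dict[tuple[str, str], Explanation],
                    k: int = 2) -> list[set[str]]:
    """Union-find clustering: link a matched pair only when both records
    agree on all of the pair's k highest-importance attributes."""
    parent = {rid: rid for rid in records}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), expl in explanations.items():
        if all(records[a].get(attr) == records[b].get(attr)
               for attr in top_attributes(expl, k)):
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = defaultdict(set)
    for rid in records:
        groups[find(rid)].add(rid)
    return list(groups.values())


def cluster_explanation(cluster: set[str],
                        records: dict[str, Record],
                        explanations: dict[tuple[str, str], Explanation]
                        ) -> list[tuple[str, list[str], float]]:
    """Sum pair-level scores over the pairs inside the cluster and report
    (attribute, distinct values in the cluster, aggregated score),
    highest score first."""
    totals: dict[str, float] = defaultdict(float)
    for (a, b), expl in explanations.items():
        if a in cluster and b in cluster:
            for attr, score in expl.items():
                totals[attr] += score
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [(attr,
             sorted({records[r][attr] for r in cluster if attr in records[r]}),
             score)
            for attr, score in ranked]


# Illustrative data only.
records = {
    "r1": {"name": "Ann Lee", "dob": "1990-01-02", "city": "Austin"},
    "r2": {"name": "Ann Lee", "dob": "1990-01-02", "city": "Dallas"},
    "r3": {"name": "Bob Ray", "dob": "1985-05-05", "city": "Austin"},
}
explanations = {
    ("r1", "r2"): {"name": 0.5, "dob": 0.4, "city": 0.1},
    ("r1", "r3"): {"city": 0.7, "name": 0.2, "dob": 0.1},
}

clusters = cluster_records(records, explanations, k=2)
ann = next(c for c in clusters if "r1" in c)
explanation = cluster_explanation(ann, records, explanations)
print(clusters)
print(explanation)
```

Here the pair (r1, r3) was scored mainly on `city`, but the records disagree on `name`, the second most important attribute, so r3 stays in its own cluster. The cluster explanation for {r1, r2} ranks `name` and `dob` above `city`, whose values differ inside the cluster.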

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: matching a plurality of records by a graph neural network model performing entity resolution on the plurality of records, and representing each of the plurality of records as a respective node in a graph; generating a plurality of record matching explanations, each record matching explanation indicating a first set of attributes, and a first set of values corresponding to the first set of attributes, used for the matching at least two of the plurality of records; clustering the nodes into a plurality of clusters by aggregating the plurality of record matching explanations and, based on the plurality of record matching explanations, determining which of the plurality of records have high importance values, in the first set of values, that match; generating, using a processor, at least one cluster explanation, the cluster explanation indicating a second set of attributes, and a second set of values corresponding to the second set of attributes, used for the clustering the nodes; outputting the record matching explanation and the cluster explanation; and training the graph neural network model to perform the entity resolution, the training comprising using probabilistic matching engine output data as input data to the graph neural network.

2. The method of claim 1, further comprising: identifying at least one anomalous node or cluster in the graph; outputting data indicating the at least one anomalous node or cluster in the graph; receiving a user input indicating the at least one anomalous node or cluster to unlink from the graph; responsive to the receiving the user input indicating the at least one anomalous node or cluster to unlink from the graph, determining an impact on the graph of unlinking the at least one anomalous node or cluster from the graph by analyzing the graph without the indicated anomalous nodes and/or anomalous clusters to unlink from the graph; and outputting the determined impact on the graph of unlinking the indicated anomalous nodes and/or anomalous clusters from the graph.

3. The method of claim 1, wherein: the generating the plurality of record matching explanations comprises assigning, to each value in the first set of values, a score indicating how significant the value is toward a determination that a pair of records match; and the determining which of the plurality of records have high importance values comprises determining values in the first set of values having highest scores.

4. The method of claim 3, wherein the cluster explanation indicates, for each value in the second set of values, the score indicating how significant the value is toward the determination that the pair of records match.

5. The method of claim 1, further comprising: assigning a respective score to each node of the graph based on analyzing the record represented by the node; and determining a representative record for the entity based on the scores assigned to the nodes.

6. The method of claim 1, further comprising: generating an inter-cluster edge explanation, the inter-cluster edge explanation indicating at least one dissimilarity between at least two nodes.

7. A system, comprising: a processor programmed to initiate executable operations comprising: matching a plurality of records by a graph neural network model performing entity resolution on the plurality of records, and representing each of the plurality of records as a respective node in a graph; generating a plurality of record matching explanations, each record matching explanation indicating a first set of attributes, and a first set of values corresponding to the first set of attributes, used for the matching at least two of the plurality of records; clustering the nodes into a plurality of clusters by aggregating the plurality of record matching explanations and, based on the plurality of record matching explanations, determining which of the plurality of records have high importance values, in the first set of values, that match; generating at least one cluster explanation, the cluster explanation indicating a second set of attributes, and a second set of values corresponding to the second set of attributes, used for the clustering the nodes; outputting the record matching explanation and the cluster explanation; and training the graph neural network model to perform the entity resolution, the training comprising using probabilistic matching engine output data as input data to the graph neural network.

8. The system of claim 7, the executable operations further comprising: identifying at least one anomalous node or cluster in the graph; outputting data indicating the at least one anomalous node or cluster in the graph; receiving a user input indicating the at least one anomalous node or cluster to unlink from the graph; responsive to the receiving the user input indicating the at least one anomalous node or cluster to unlink from the graph, determining an impact on the graph of unlinking the at least one anomalous node or cluster from the graph by analyzing the graph without the indicated anomalous nodes and/or anomalous clusters to unlink from the graph; and outputting the determined impact on the graph of unlinking the indicated anomalous nodes and/or anomalous clusters from the graph.

9. The system of claim 7, wherein: the generating the plurality of record matching explanations comprises assigning, to each value in the first set of values, a score indicating how significant the value is toward a determination that a pair of records match; and the determining which of the plurality of records have high importance values comprises determining values in the first set of values having highest scores.

10. The system of claim 9, wherein the cluster explanation indicates, for each value in the second set of values, the score indicating how significant the value is toward the determination that the pair of records match.

11. The system of claim 7, the executable operations further comprising: assigning a respective score to each node of the graph based on analyzing the record represented by the node; and determining a representative record for the entity based on the scores assigned to the nodes.

12. The system of claim 7, the executable operations further comprising: generating an inter-cluster edge explanation, the inter-cluster edge explanation indicating at least one dissimilarity between at least two nodes.

13. A computer program product, comprising: one or more computer readable storage mediums having program code stored thereon, the program code stored on the one or more computer readable storage mediums collectively executable by a data processing system to initiate operations including: matching a plurality of records by a graph neural network model performing entity resolution on the plurality of records, and representing each of the plurality of records as a respective node in a graph; generating a plurality of record matching explanations, each record matching explanation indicating a first set of attributes, and a first set of values corresponding to the first set of attributes, used for the matching at least two of the plurality of records; clustering the nodes into a plurality of clusters by aggregating the plurality of record matching explanations and, based on the plurality of record matching explanations, determining which of the plurality of records have high importance values, in the first set of values, that match; generating at least one cluster explanation, the cluster explanation indicating a second set of attributes, and a second set of values corresponding to…
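Claims 2 and 8 describe determining the impact of unlinking an anomalous node or cluster "by analyzing the graph without" it. The claims do not say how that analysis is performed. This Python sketch assumes clusters are simply the connected components of the match graph and reports how each affected cluster splits once the selected nodes are removed. The anomalous nodes are taken as given (standing in for the user's unlink input), and `components` and `unlink_impact` are hypothetical names.

```python
from collections import defaultdict


def components(nodes: set[str], edges: list[tuple[str, str]]) -> list[set[str]]:
    """Connected components of an undirected graph, found by iterative DFS.
    Starting nodes are visited in sorted order so the output is deterministic."""
    adj: dict[str, set[str]] = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen: set[str] = set()
    comps: list[set[str]] = []
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            n = stack.pop()
            comp.add(n)
            for m in adj[n]:
                if m in nodes and m not in seen:
                    seen.add(m)
                    stack.append(m)
        comps.append(comp)
    return comps


def unlink_impact(nodes, edges, to_unlink):
    """For every original cluster that contains an unlinked node, report the
    removed nodes and the sub-clusters that remain after removal."""
    nodes, to_unlink = set(nodes), set(to_unlink)
    kept = nodes - to_unlink
    kept_edges = [(a, b) for a, b in edges if a in kept and b in kept]
    after = components(kept, kept_edges)
    impact = {}
    for cluster in components(nodes, edges):
        removed = cluster & to_unlink
        if removed:
            impact[tuple(sorted(cluster))] = {
                "removed": sorted(removed),
                # Each post-removal component lies inside exactly one original cluster.
                "becomes": [sorted(c) for c in after if c <= cluster],
            }
    return impact


# Illustrative graph: "c" bridges {a, b} and {d}; "e" is an unrelated singleton.
nodes = {"a", "b", "c", "d", "e"}
edges = [("a", "b"), ("b", "c"), ("c", "d")]
impact = unlink_impact(nodes, edges, ["c"])
print(impact)
# {('a', 'b', 'c', 'd'): {'removed': ['c'], 'becomes': [['a', 'b'], ['d']]}}
```

Reporting the split per original cluster is one simple way to make the "impact" reviewable before the unlink is committed; a production system might also re-run matching on the remaining records, which this sketch does not attempt.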
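Claims 5 and 6 (and their system counterparts, claims 11 and 12) add a representative record chosen from per-node scores and an inter-cluster edge explanation that indicates dissimilarity between nodes. The claims leave the scoring method open. This Python sketch assumes a hypothetical completeness score (the share of non-empty attributes in a record) and treats every attribute whose values differ as a dissimilarity. `ATTRIBUTES`, `completeness`, `representative`, and `inter_cluster_explanation` are illustrative names, not taken from the patent.

```python
# Hypothetical attribute schema used for scoring and comparison.
ATTRIBUTES = ("name", "dob", "city", "email")


def completeness(record: dict[str, str]) -> float:
    """Assumed node score: fraction of schema attributes with a non-empty value."""
    filled = sum(1 for attr in ATTRIBUTES if record.get(attr))
    return filled / len(ATTRIBUTES)


def representative(cluster: set[str], records: dict[str, dict[str, str]]) -> str:
    """Pick the highest-scoring record; iterating in sorted order makes ties
    resolve to the smallest record ID."""
    return max(sorted(cluster), key=lambda rid: completeness(records[rid]))


def inter_cluster_explanation(a: str, b: str,
                              records: dict[str, dict[str, str]]) -> dict:
    """List each attribute whose values differ between two nodes, as (a, b) pairs."""
    return {attr: (records[a].get(attr), records[b].get(attr))
            for attr in ATTRIBUTES
            if records[a].get(attr) != records[b].get(attr)}


# Illustrative data only.
records = {
    "r1": {"name": "Ann Lee", "dob": "1990-01-02", "city": "Austin"},
    "r2": {"name": "Ann Lee", "dob": "1990-01-02", "city": "Austin",
           "email": "ann@example.com"},
    "r3": {"name": "Ann Leigh", "dob": "1991-03-04", "city": "Austin"},
}

print(representative({"r1", "r2"}, records))  # r2
print(inter_cluster_explanation("r2", "r3", records))
```

In this example r2 is chosen as the representative because it fills all four attributes, and the edge between r2 and r3 is explained by their differing `name`, `dob`, and `email` values while `city` agrees.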

Assignees

IBM

Classifications

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • Combinations of networks · CPC title

  • Machine learning · CPC title

  • Clustering; Classification · CPC title · G06F16/906 (primary)

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12045291B2 cover?
Records can be matched by a graph neural network model performing entity resolution on the records, and representing each record as a respective node in a graph. Record matching explanations can be generated, each record matching explanation indicating a first set of attributes, and a first set of corresponding values, used for the matching at least two of the records. Nodes can be clustered in…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/906. Mapped technology areas include Physics.
When was this patent published?
Published Jul 23, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).