Record matching in a database system

US11687574B2 · US · B2

Patent metadata
Field | Value
Publication number | US-11687574-B2
Application number | US-202117215071-A
Country | US
Kind code | B2
Filing date | Mar 29, 2021
Priority date | Mar 29, 2021
Publication date | Jun 27, 2023
Grant date | Jun 27, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer implemented method comprising processing the unstructured objects of each record of records of a database for identifying a set of one or more values of attributes in the unstructured objects of the each record. The sets of unstructured attribute values of two records of the database may be compared for determining a similarity level between the two sets. It may be determined whether the two records are representing a same entity based on the comparison result.
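
The comparison described in the abstract can be sketched as a set-similarity check. The abstract does not name a similarity measure, so the Jaccard index used here, the function names, the sample values, and the 0.5 threshold are all assumptions made for illustration, not the patented method.

```python
def jaccard_similarity(values_a: set, values_b: set) -> float:
    """Fraction of distinct values shared by both sets (0.0 if both are empty).

    Jaccard is an assumed measure; the abstract only speaks of a "similarity level".
    """
    union = values_a | values_b
    if not union:
        return 0.0
    return len(values_a & values_b) / len(union)


def same_entity(values_a: set, values_b: set, threshold: float = 0.5) -> bool:
    """Decide whether two records may represent the same entity.

    The threshold is a hypothetical cut-off, not a value from the patent.
    """
    return jaccard_similarity(values_a, values_b) >= threshold


# Hypothetical attribute values extracted from each record's unstructured objects.
record_a_values = {"acme corp", "berlin", "2019"}
record_b_values = {"acme corp", "berlin", "2021"}

# Two shared values out of four distinct values.
similarity = jaccard_similarity(record_a_values, record_b_values)
print(similarity, same_entity(record_a_values, record_b_values))
```

Any other set or vector similarity could be substituted for `jaccard_similarity` without changing the overall flow of extract, compare, decide.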

First claim

Opening claim text (preview).

What is claimed is:

1. A computer implemented method for record matching in a database system, the method comprising: identifying records representing respective entities, wherein a record of the identified records comprises structured attributes; assigning an initial contribution weight to the structured attributes; identifying one or more unstructured data objects corresponding to the records; processing the one or more unstructured data objects to identify unstructured attribute values corresponding to respective records of the identified records; identifying entity relation scores corresponding to the identified records, wherein an entity relation score indicates how often an entity represented by a record occurs alongside a selected entity; comparing two records based, at least in part, on the updated contribution weight of the selected structured attribute and a comparison of the entity relation scores and the unstructured attribute values of the two records to determine a similarity level between the two records; selecting unstructured attribute values that are present with respect to the identified records; and responsive to determining a structured attribute value of the structured attribute values does not match any of the selected unstructured attributes, replacing the contribution weight of said structured attribute by an updated contribution weight indicative of the similarity between the two records of the identified records.

2. The method of claim 1, further comprising: evaluating one or more occurrence properties corresponding to the identified unstructured attribute values, wherein the occurrence property of a specific unstructured attribute value identified in the unstructured object(s) of a specific record is selected from the group consisting of: a frequency of occurrence of the specific unstructured attribute value in the unstructured data objects of the specific record, and an indication of other identified unstructured attribute values for that specific record which are collocated with the specific unstructured attribute value in the unstructured data objects, wherein comparing two sets of unstructured attribute values corresponding to the two records comprises comparing the evaluated occurrence properties of the unstructured attribute values of one set of the two sets with the evaluated occurrence properties of the unstructured attribute values of the other set of the two sets.

3. The method of claim 1, further comprising grouping the unstructured attribute values into groups based on their category, wherein the comparison between the two sets is performed by comparing groups of the same category.

4. The method of claim 1, further comprising: responsive to determining a selected structured attribute value matches at least one unstructured attribute value present with respect to each of two records, increasing the initial contribution weight of a selected structured attribute associated with the selected structured attribute value.

5. The method of claim 4, wherein the selecting comprises intersecting the two sets resulting in an intersection set.

6. The method of claim 5, wherein the unstructured attribute value is a portion of a full value of the unstructured attribute, the selecting further comprising executing an aggregation algorithm for aggregating values of the selected unstructured attribute values that form a full value of the respective unstructured attribute, resulting in zero or more aggregated values, wherein the comparison with the structured attribute values is performed with the processed selected unstructured attribute values.

7. The method of claim 2, wherein the aggregating comprises grouping the unstructured attribute values into groups based on their category, wherein the aggregation is performed for values belonging to the same group.

8. The method of claim 2, wherein the selected unstructured attribute values are present with a same occurrence frequency in each of the two sets.

9. The method of claim 6, wherein comparing the two records comprises: comparing the values of the structured attributes of the two records resulting in an individual matching score per structured attribute of the record, combining the individual matching scores using the contribution weights and comparing the combined score with a predefined threshold.

10. The method of claim 1, further comprising merging the two records in a single record wherein the two records represent a same entity.

11. The method of claim 1, occurring responsive to receiving a respective request for matching the records.

12. The method of claim 1, further comprising repeating the method for comparing further records of the database until all records of the database are compared.

13. The method of claim 1, being performed by a master data management (MDM) system, wherein the compared records are MDM records, wherein the processing of the one or more unstructured data objects is performed by an entity detection module of the master data management system.

14. The method of claim 1, wherein the unstructured data objects correspond to documents.

15. The method of claim 14, wherein the unstructured data objects correspond to scanned documents.

16. The method of claim 1, further comprising providing to a person associated with the compared records information indicative of the unstructured data objects associated with the two records.

17. The method of claim 1, occurring responsive to storing the compared records.

18. The method of claim 5, wherein the intersection set comprises the selected unstructured attribute values.

19. A computer program product for record matching in a database system, the computer program product comprising: one or more computer readable storage media and program instructions stored on the one or more computer readable storage media, the program instructions comprising instructions to: identify records representing respective entities wherein a record of the identified records comprises structured attributes; identify one or more unstructured data objects corresponding to the records; process the one or more unstructured data objects to identify unstructured attribute values corresponding to respective records of the identified records; identify entity relation scores corresponding to the identified records, wherein the entity relation score indicates how often an entity represented by a record occurs alongside a selected entity; compare two records based, at least in part, on the increased initial contribution weight of the selected structured attribute and a comparison of the entity relation scores and the unstructured attribute values of the two records to determine a similarity level between the two records; select unstructured attribute values that are present with respect to the identified records; and responsive to determining a structured attribute value of the structured attribute values does not match any of the selected unstructured attributes, replace the contribution weight of said structured attribute by an updated contribution weight indicative of the similarity between the two records of the identified records.

20. A computer system for record matching, wherein a record represents an entity, the record being associated with one or more unstructured data objects, the computer system comprising: one or more computer processors: one or more computer readable storage media and program instructions stored on the one or more computer readable storage media, the
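
The weighting steps in the claims above (selecting unstructured values by intersection, adjusting contribution weights, then combining per-attribute scores against a threshold) can be sketched roughly as follows. This is only an illustration under stated assumptions: the `boost` and `damping` factors, exact-equality scoring, the weighted-average normalization, the 0.8 threshold, and all sample data are hypothetical and do not come from the patent text.

```python
def adjust_weights(record_a, record_b, weights, unstructured_a, unstructured_b,
                   boost=1.5, damping=0.5):
    """Return contribution weights adjusted by evidence from unstructured data.

    boost and damping are hypothetical factors chosen for illustration only.
    """
    # Select unstructured values present for both records (set intersection).
    selected = unstructured_a & unstructured_b
    adjusted = {}
    for attribute, weight in weights.items():
        structured_values = {record_a.get(attribute), record_b.get(attribute)} - {None}
        if structured_values & selected:
            # A structured value is confirmed by the unstructured data: raise its weight.
            adjusted[attribute] = weight * boost
        else:
            # No confirmation: replace the weight with an updated (here, lower) one.
            adjusted[attribute] = weight * damping
    return adjusted


def combined_score(record_a, record_b, weights):
    """Weighted average of per-attribute matching scores (exact equality assumed)."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        return 0.0
    matched_weight = sum(
        weight
        for attribute, weight in weights.items()
        if record_a.get(attribute) is not None
        and record_a.get(attribute) == record_b.get(attribute)
    )
    return matched_weight / total_weight


# Hypothetical records and extracted unstructured values.
rec_a = {"name": "Acme Corp", "city": "Berlin", "phone": "555-0100"}
rec_b = {"name": "Acme Corp", "city": "Berlin", "phone": "555-0199"}
initial_weights = {"name": 1.0, "city": 1.0, "phone": 1.0}
docs_a = {"Acme Corp", "Berlin", "invoice"}
docs_b = {"Acme Corp", "Berlin", "contract"}

THRESHOLD = 0.8  # assumed predefined threshold
adjusted = adjust_weights(rec_a, rec_b, initial_weights, docs_a, docs_b)
baseline = combined_score(rec_a, rec_b, initial_weights)
score = combined_score(rec_a, rec_b, adjusted)
is_match = score >= THRESHOLD
print(adjusted, round(baseline, 3), round(score, 3), is_match)
```

In this toy data the mismatched phone number keeps the unadjusted score below the threshold, while the adjusted weights let the attributes confirmed by the documents dominate, which is the general effect the claims describe.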

Assignees

Inventors

Classifications

  • Filtering based on additional data, e.g. user or group profiles (filtering in web context G06F16/9535, G06F16/9536)

  • Query execution (filtering based on additional data G06F16/335)

  • Presentation of query results

  • using system suggestions (G06F16/3325 takes precedence)

  • G06F16/219 (Primary)

    Managing data history or versioning (querying versioned data G06F16/2474; querying temporal data G06F16/2477)

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11687574B2 cover?
A computer implemented method comprising processing the unstructured objects of each record of records of a database for identifying a set of one or more values of attributes in the unstructured objects of the each record. The sets of unstructured attribute values of two records of the database may be compared for determining a similarity level between the two sets. It may be determined whether…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/3322. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 27, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).