Data matching accuracy based on context features

US11074230B2 · US · B2

Patent metadata
Publication number: US-11074230-B2
Application number: US-201816120949-A
Country: US
Kind code: B2
Filing date: Sep 4, 2018
Priority date: Sep 4, 2018
Publication date: Jul 27, 2021
Grant date: Jul 27, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method is provided for data matching between a set of source data structures and a set of target data structures. The method includes obtaining, using a processor device configured to perform machine learning, source to target matching results with matching scores, based on the sets of source and target data structures. The method further includes calculating, by the processor device, context information for data structure pairs based on a structure similarity and an ontology similarity between constituent data structures thereof. Each of data structure pairs include as the constituent data structures a respective source data structure and a respective target data structure from the sets of source and target data structures. The method also includes updating, by the processor device, the matching scores based on the context information. The method additionally includes controlling, by the processor device, a hardware device responsive to at least one updated matching score.
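The score-update step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how context information is combined with the initial matching scores, so the capped additive boost, the `weight` parameter, the function name `update_scores`, and the sample column names below are all assumptions made for illustration, not the patented method.

```python
from typing import Dict, Tuple

# A pair is (source element, target element); names here are hypothetical.
Pair = Tuple[str, str]


def update_scores(initial: Dict[Pair, float],
                  context: Dict[Pair, float],
                  weight: float = 0.2) -> Dict[Pair, float]:
    """Raise each initial matching score by a weighted context score.

    Assumption: context scores lie in [0, 1] and are combined additively,
    with the result capped at 1.0. The source only states that matching
    scores are updated "based on the context information".
    """
    updated: Dict[Pair, float] = {}
    for pair, score in initial.items():
        boost = weight * context.get(pair, 0.0)  # no context -> no change
        updated[pair] = min(1.0, score + boost)
    return updated


# Initial scores as they might come from a machine-learning matcher.
initial = {
    ("cust_nm", "customer_name"): 0.55,
    ("cust_nm", "country"): 0.50,
}
# Context scores, e.g. derived from structure and ontology similarity.
context = {("cust_nm", "customer_name"): 0.9}

updated = update_scores(initial, context)
```

In this sketch only the pair with supporting context moves up, which separates two candidates whose initial scores were nearly tied.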

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for data matching between a set of source data structures and a set of target data structures, the method comprising: obtaining, using a processor device configured to perform machine learning, source to target matching results with matching scores, based on the sets of source and target data structures; calculating, by the processor device, context information for data structure pairs based on a structure similarity and an ontology similarity between constituent data structures thereof, each of data structure pairs comprising as the constituent data structures a respective source data structure and a respective target data structure from the sets of source and target data structures; updating, by the processor device, the matching scores based on the context information; controlling, by the processor device, a hardware device responsive to at least one of the updated matching scores; and wherein the method further comprises determining the structure similarity by finding all neighborhood mappings from an initial mapping and computing an aggregation score for each of the data structure pairs, and wherein the matching scores for the data structure pairs having the aggregation score greater than a threshold are increased by said updating step.

2. The computer-implemented method of claim 1, wherein for a relational database, the context information based on the structure similarity is, in turn, based on a respective source table and a respective target table and columns in the respective source and target tables.

3. The computer-implemented method of claim 2, wherein the structure similarity score for a given data structure pair increases responsive to most of the columns matching between the respective source and target tables that comprise the given data structure pair, and wherein the structure similarity score for a column pair formed from a respective column in each of the respective source and target tables increases responsive to the structure similarity score for the respective source and target tables that include the column pair being greater than a threshold.

4. The computer-implemented method of claim 2, wherein the context information based on the source structure is calculated, in turn, based on a first premise that the respective source and target tables are considered to match responsive to most of the columns therebetween matching, and a second premise that at least one column in each of the respective source and target tables are considered to match responsive to the respective source and target tables being considered a match.

5. The computer-implemented method of claim 1, wherein for extensible markup language, the context information based on the structure similarity is, in turn, based on layered elements and attributes of the layered elements.

6. The computer-implemented method of claim 1, wherein the context information based on the ontology structure is calculated, in turn, based on a premise that a particular element of a data structure is likely to match an ontology target responsive to a neighboring element of the particular element matching a related ontology target with respect to the ontology target.

7. The computer-implemented method of claim 1, further comprising determining the ontology similarity by finding all neighborhood mappings having an ontology relationship from an initial mapping and computing an ontology distance between members of the data structure pairs, wherein the matching scores for the data structure pairs having the ontology distance greater than a threshold are increased by said updating step.

8. The computer-implemented method of claim 1, wherein the method is performed by a server comprised in a cloud computing platform, and the processor device is comprised in the server.

9. A computer program product for data matching between a set of source data structures and a set of target data structures, the computer program product comprising a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to perform a method comprising: obtaining, using a processor device configured to perform machine learning, source to target matching results with matching scores, based on the sets of source and target data structures; calculating, by the processor device, context information for data structure pairs based on a structure similarity and an ontology similarity between constituent data structures thereof, each of data structure pairs comprising as the constituent data structures a respective source data structure and a respective target data structure from the sets of source and target data structures; updating, by the processor device, the matching scores based on the context information; controlling, by the processor device, a hardware device responsive to at least one of the updated matching scores; and wherein the method further comprises determining the structure similarity by finding all neighborhood mappings from an initial mapping and computing an aggregation score for each of the data structure pairs, and wherein the matching scores for the data structure pairs having the aggregation score greater than a threshold are increased by said updating step.

10. The computer program product of claim 9, wherein for a relational database, the context information based on the structure similarity is, in turn, based on a respective source table and a respective target table and columns in the respective source and target tables.

11. The computer program product of claim 10, wherein the structure similarity score for a given data structure pair increases responsive to most of the columns matching between the respective source and target tables that comprise the given data structure pair, and wherein the structure similarity score for a column pair formed from a respective column in each of the respective source and target tables increases responsive to the structure similarity score for the respective source and target tables that include the column pair being greater than a threshold.

12. The computer program product of claim 10, wherein the context information based on the source structure is calculated, in turn, based on a first premise that the respective source and target tables are considered to match responsive to most of the columns therebetween matching, and a second premise that at least one column in each of the respective source and target tables are considered to match responsive to the respective source and target tables being considered a match.

13. The computer program product of claim 9, wherein for extensible markup language, the context information based on the structure similarity is, in turn, based on layered elements and attributes of the layered elements.

14. The computer program product of claim 9, wherein the context information based on the ontology structure is calculated, in turn, based on a premise that a particular element of a data structure is likely to match an ontology target responsive to a neighboring element of the particular element matching a related ontology target with respect to the ontology target.

15. The computer program product of claim 9, wherein the method further comprises determining the ontology similarity by finding all neighborhood mappings having an ontology relationship from an initial mapping and computing an ontology distance between members of the data structure pairs, wherein the matching scores for the data structure pairs having the ontology distance greater t
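The relational-database case in the claims (a table pair scores higher when most of its columns match, and column pairs are then boosted when the table-level score exceeds a threshold) might be sketched as follows. The claims do not define the aggregation formula, so the "fraction of source columns with a good match" rule, the cutoff, threshold, and boost values, and all function, table, and column names are assumptions chosen only to make the idea concrete.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (source column, target column); hypothetical names


def aggregation_score(src_cols: List[str], tgt_cols: List[str],
                      col_scores: Dict[Pair, float],
                      match_cutoff: float = 0.5) -> float:
    """Fraction of source columns whose best target column scores at
    least match_cutoff. This stands in for the claimed aggregation score
    over neighborhood mappings; the real formula is not given."""
    if not src_cols:
        return 0.0
    matched = 0
    for s in src_cols:
        best = max((col_scores.get((s, t), 0.0) for t in tgt_cols),
                   default=0.0)
        if best >= match_cutoff:
            matched += 1
    return matched / len(src_cols)


def boost_column_scores(src_cols: List[str], tgt_cols: List[str],
                        col_scores: Dict[Pair, float],
                        threshold: float = 0.6,
                        boost: float = 0.1) -> Tuple[float, Dict[Pair, float]]:
    """If the table pair's aggregation score exceeds the threshold,
    increase the score of every scored column pair between the tables."""
    agg = aggregation_score(src_cols, tgt_cols, col_scores)
    updated = dict(col_scores)
    if agg > threshold:
        for s in src_cols:
            for t in tgt_cols:
                if (s, t) in updated:
                    updated[(s, t)] = min(1.0, updated[(s, t)] + boost)
    return agg, updated


src_cols = ["id", "name", "zip"]
tgt_cols = ["cust_id", "full_name", "postal"]
col_scores = {
    ("id", "cust_id"): 0.8,
    ("name", "full_name"): 0.7,
    ("zip", "postal"): 0.4,  # weak on its own
}
agg, updated = boost_column_scores(src_cols, tgt_cols, col_scores)
```

Here two of the three source columns match well, so the table pair clears the assumed threshold and the weak `zip`/`postal` pair is lifted by its context.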

Classifications

  • G06F16/211 · Primary

    Schema design and management · CPC title

  • G06F16/254 · Primary

    Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11074230B2 cover?
A method is provided for data matching between a set of source data structures and a set of target data structures. The method includes obtaining, using a processor device configured to perform machine learning, source to target matching results with matching scores, based on the sets of source and target data structures. The method further includes calculating, by the processor device, context…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/211. Mapped technology areas include Physics.
When was this patent published?
Published Jul 27, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).