Identifying entity mappings across data assets

US10120930B2 · US · B2

Patent metadata
Publication number: US-10120930-B2
Application number: US-201615268400-A
Country: US
Kind code: B2
Filing date: Sep 16, 2016
Priority date: Sep 14, 2015
Publication date: Nov 6, 2018
Grant date: Nov 6, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Entity mappings that produce matching entities for a first data asset having attributes and a second data asset having attributes are generated by: generating entity mappings that produce matching entities for a first data asset having attributes with attribute values and a second data asset having attributes with attribute values by: matching the attribute values of the attributes of the first data asset with the attribute values of the attributes of the second data asset, using the matching attribute values to generate matching attribute pairs, and using the matching attribute pairs to identify entity mappings; computing an entity mapping score for each of the entity mappings based on a combination of factors; ranking the entity mappings based on each entity mapping score; and using some of the ranked entity mappings to determine whether a same real-world entity is described by the first data asset and the second data asset.
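The steps named in the abstract (match attribute values, derive matching attribute pairs, score, rank) can be illustrated with a minimal Python sketch. It is not the patented method: the sample data, the function names, and the scoring rule (count of shared distinct values) are all assumptions made for illustration, whereas the abstract only says the score is based on "a combination of factors". Each matching attribute pair is treated here as a single-attribute entity mapping.

```python
from collections import defaultdict


def attribute_values(asset):
    """Collect the set of distinct values seen under each attribute."""
    values = defaultdict(set)
    for row in asset:
        for attr, value in row.items():
            values[attr].add(value)
    return values


def matching_attribute_pairs(asset_a, asset_b):
    """Return {(attr_a, attr_b): number of distinct values the two attributes share}."""
    values_a = attribute_values(asset_a)
    values_b = attribute_values(asset_b)
    pairs = {}
    for attr_a, vals_a in values_a.items():
        for attr_b, vals_b in values_b.items():
            shared = vals_a & vals_b
            if shared:
                pairs[(attr_a, attr_b)] = len(shared)
    return pairs


def rank_mappings(pairs):
    """Rank candidate mappings by score, highest first (ties broken by name).

    Assumption: the score is simply the shared-value count.
    """
    return sorted(pairs.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical data assets, each a list of records.
customers = [
    {"cust_id": "C1", "email": "ann@example.com", "city": "Austin"},
    {"cust_id": "C2", "email": "bob@example.com", "city": "Boston"},
]
orders = [
    {"order_id": "O1", "buyer_email": "ann@example.com", "ship_city": "Austin"},
    {"order_id": "O2", "buyer_email": "bob@example.com", "ship_city": "Austin"},
]

ranked = rank_mappings(matching_attribute_pairs(customers, orders))
for (attr_a, attr_b), score in ranked:
    print(f"{attr_a} <-> {attr_b}: {score}")
```

In this toy data the email attributes share two values and the city attributes share one, so the email pair ranks first as the more promising basis for deciding whether records describe the same real-world entity.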

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: generating entity mappings that produce matching entities for a first data asset having attributes with attribute values and a second data asset having attributes with attribute values by: matching the attribute values of the attributes of the first data asset with the attribute values of the attributes of the second data asset; using the matching attribute values to generate matching attribute pairs; and using the matching attribute pairs to identify entity mappings; computing an entity mapping score for each of the entity mappings based on a combination of factors; ranking the entity mappings based on each entity mapping score; and using the ranked entity mappings to determine which of the entity mappings are to be used to determine whether a same real-world entity is described by the first data asset and the second data asset.

2. The method of claim 1, further comprising: generating a first inverted index of entity identifier pairs for the first data asset; generating a second inverted index of entity identifier pairs for the second data asset; and using the first inverted index and the second inverted index to generate the matching attribute pairs based on matching attribute values that form the entity mappings.

3. The method of claim 1, wherein values match fuzzily for the matching entities.

4. The method of claim 1, wherein computing the entity mapping score for each of the entity mappings comprises: generating an entity mapping score for factors selected from: a number of attributes involved in an entity mapping, a cardinality of that individual entity mapping, support of that entity mapping, a probability of one to one matching for that entity mapping, a join utility measure for that entity mapping, and a probability of previous user selections for that entity mapping; and adding the entity mapping score for each of the factors to generate the entity mapping score for that entity mapping.

5. The method of claim 1, wherein one of the first data asset and the second data asset is semi-structured data having hierarchical data that is flattened.

6. The method of claim 1, wherein one of the first data asset and the second data asset is an unstructured data asset formed by a collection of documents and is modelled based one of a bag of words and annotated words.

7. The method of claim 1, further comprising: integrating the first data asset and the second data asset using ranked entity mappings by performing one of a join operation, a merge operation, and a union operation.

8. The method of claim 1, wherein Software as a Service (SaaS) is configured to perform method operations.
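Claims 2 and 4 describe inverted indexes and an additive score. The Python sketch below shows one possible reading, not the implementation disclosed in the patent. The index layout (attribute value mapped to entity identifier and attribute name), the sample data, and the two factor formulas (support as the count of supporting identifier pairs, and a crude 0/1 one-to-one indicator) are assumptions chosen for illustration; the claims name the factors but this page does not define them.

```python
from collections import defaultdict


def build_inverted_index(asset, id_attr):
    """Map each attribute value to the (entity_id, attribute) entries that hold it."""
    index = defaultdict(set)
    for row in asset:
        for attr, value in row.items():
            if attr != id_attr:
                index[value].add((row[id_attr], attr))
    return index


def matching_pairs(index_a, index_b):
    """Map (attr_a, attr_b) to the (id_a, id_b) pairs that share a value."""
    support = defaultdict(set)
    for value in index_a.keys() & index_b.keys():
        for id_a, attr_a in index_a[value]:
            for id_b, attr_b in index_b[value]:
                support[(attr_a, attr_b)].add((id_a, id_b))
    return support


def is_one_to_one(id_pairs):
    """True if no entity identifier appears twice on either side."""
    left = [a for a, _ in id_pairs]
    right = [b for _, b in id_pairs]
    return len(set(left)) == len(left) and len(set(right)) == len(right)


def mapping_score(id_pairs):
    """Add per-factor scores, as claim 4 describes; the factor formulas are assumed."""
    factors = {
        "support": float(len(id_pairs)),
        "one_to_one": 1.0 if is_one_to_one(id_pairs) else 0.0,
    }
    return sum(factors.values())


# Hypothetical data assets.
customers = [
    {"cust_id": "C1", "email": "ann@example.com", "city": "Austin"},
    {"cust_id": "C2", "email": "bob@example.com", "city": "Boston"},
]
orders = [
    {"order_id": "O1", "buyer_email": "ann@example.com", "ship_city": "Austin"},
    {"order_id": "O2", "buyer_email": "bob@example.com", "ship_city": "Austin"},
]

support = matching_pairs(
    build_inverted_index(customers, "cust_id"),
    build_inverted_index(orders, "order_id"),
)
scores = {pair: mapping_score(ids) for pair, ids in support.items()}
ranked = sorted(scores, key=lambda pair: (-scores[pair], pair))
print(ranked)
```

Intersecting the two indexes on shared values avoids comparing every record in one asset with every record in the other, which is the usual motivation for an inverted index in this kind of matching.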

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10120930B2 cover?
Entity mappings that produce matching entities for a first data asset having attributes and a second data asset having attributes are generated by: generating entity mappings that produce matching entities for a first data asset having attributes with attribute values and a second data asset having attributes with attribute values by: matching the attribute values of the attributes of the first…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F17/30604. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 6, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).