Entity resolution techniques for matching entity records from different data sources

US11080272B2 · US · B2

Patent metadata
Publication number: US-11080272-B2
Application number: US-201916457666-A
Country: US
Kind code: B2
Filing date: Jun 28, 2019
Priority date: Jun 28, 2019
Publication date: Aug 3, 2021
Grant date: Aug 3, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Entity resolution techniques for matching entity records from different data sources are provided. In one technique, an entity record from a source database is identified along with multiple data items included therein. Each data item corresponds to an attribute of multiple source attributes. For one of the data items that corresponds to a first source attribute, multiple target attributes are identified. A first query is generated that includes the data items and associates the data item with each of the multiple target attributes. A second query that is different than the first query is also generated. Two searches are performed of a target database: one based on the first query and the other based on the second query. A scoring model generates multiple scores, one for each search result. It is determined whether the entity record matches an entity record in the target database based on the set of scores.
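The flow in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `ATTRIBUTE_MAP` table, the helper functions, the exact-match in-memory search, the 0.75 threshold, and above all the `score` function, which is a simple stand-in for whatever scoring model an implementation would use. Results from the two searches are also merged by id before scoring, which is a simplification.

```python
# Hypothetical mapping: one source attribute -> several target attributes.
ATTRIBUTE_MAP = {
    "first_name": ["first_name", "last_name"],
    "last_name": ["first_name", "last_name"],
    "company": ["current_company", "previous_company"],
    "title": ["current_title", "previous_title"],
}


def expand(record):
    """Pair each data item with every target attribute it may match."""
    return [(value, ATTRIBUTE_MAP.get(attr, [attr])) for attr, value in record.items()]


def item_matches(value, targets, candidate):
    """True if the data item equals the candidate's value in any target attribute."""
    return any(candidate.get(t, "").lower() == value.lower() for t in targets)


def search(groups, target_db, require_all):
    """Toy search: require every data item to match (strict) or any one (loose)."""
    combine = all if require_all else any
    return [c for c in target_db if combine(item_matches(v, t, c) for v, t in groups)]


def score(record, candidate):
    """Stand-in scoring model: fraction of source data items found in the candidate."""
    groups = expand(record)
    hits = sum(item_matches(v, t, candidate) for v, t in groups)
    return hits / len(groups) if groups else 0.0


def resolve(record, target_db, threshold=0.75):
    """Run two different queries, score every result, and decide on a match."""
    groups = expand(record)
    first_results = search(groups, target_db, require_all=True)    # first query
    second_results = search(groups, target_db, require_all=False)  # second, different query
    candidates = {c["id"]: c for c in first_results + second_results}
    scores = {cid: score(record, c) for cid, c in candidates.items()}
    if not scores:
        return None, scores
    best = max(scores, key=scores.get)
    return (best if scores[best] >= threshold else None), scores


# Illustrative data: the source record has first and last name swapped and
# lists a company and title that the target stores as "previous" values.
SOURCE_RECORD = {"first_name": "Smith", "last_name": "John",
                 "company": "Acme", "title": "Engineer"}
TARGET_DB = [
    {"id": 1, "first_name": "John", "last_name": "Smith",
     "current_company": "Globex", "previous_company": "Acme",
     "current_title": "Manager", "previous_title": "Engineer"},
    {"id": 2, "first_name": "John", "last_name": "Doe",
     "current_company": "Initech", "previous_company": "",
     "current_title": "Analyst", "previous_title": ""},
]

match_id, all_scores = resolve(SOURCE_RECORD, TARGET_DB)
```

Because each data item is checked against several target attributes, the swapped names and the outdated employer still line up with record 1, while the looser second query also surfaces record 2 as a low-scoring candidate.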

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: identifying a first entity record from a source database; identifying, in the first entity record, a plurality of data items, each of which corresponds to an attribute of a plurality of source attributes; for a first data item, of the plurality of data items, that corresponds to a first source attribute of the plurality of source attributes, identifying a plurality of target attributes; generating a first query that includes the plurality of data items and that associates the first data item with each of the plurality of target attributes; generating a second query that is different than the first query; performing a first search of a target database based on the first query, wherein the first search results in a first set of results; performing a second search of the target database based on the second query, wherein the second search results in a second set of results; using a scoring model to generate a set of scores, one score for each result in the first and second sets of results; determining whether the first entity record matches an entity record in the target database based on the set of scores generated by the scoring model; wherein the method is performed by one or more computing devices.

2. The method of claim 1, further comprising: for a second data item, of the plurality of data items, that corresponds to a second source attribute of the plurality of source attributes, identifying a second plurality of target attributes; generating a first plurality of predicates, one for each target attribute of the second plurality of target attributes, wherein each predicate in the first plurality of predicates includes the second data item and is combined with each other predicate of first plurality of predicates with a disjunctive OR; for a third data item, of the plurality of data items, that corresponds to a third source attribute of the plurality of source attributes, identifying a third plurality of target attributes; generating a second plurality of predicates, one for each target attribute of the third plurality of target attributes, wherein each predicate in the second plurality of predicates includes the third data item and is combined with each other predicate of second plurality of predicates with the disjunctive OR; wherein the second query includes the first plurality of predicates and the second plurality of predicates, wherein the first plurality of predicates and the second plurality of predicates are combined with the disjunctive OR.

3. The method of claim 1, wherein: generating the first query that includes generating a plurality of predicates, each predicate corresponding to a different data item-target attribute pair and including the first data item and a different target attribute of the plurality of target attributes; the plurality of predicates are combined with a disjunctive OR.

4. The method of claim 1, further comprising: for a second data item, of the plurality of data items, that corresponds to a second source attribute of the plurality of source attributes, identifying a second plurality of target attributes; wherein the first query associates the second data item with each of the second plurality of target attributes.

5. The method of claim 4, wherein: the first query includes a first compound predicate that associates the first data item with each of the plurality of target attributes and a second compound predicate that associates the second data item with each of the second plurality of target attributes; the first compound predicate and the second compound predicate are combined using a conjunctive AND.

6. The method of claim 4, wherein: the plurality of target attributes is a first plurality of target attributes; the first plurality of target attributes include first name and last name; the first plurality of target attributes is the same as the second plurality of target attributes; the first source attribute is first name and the second source attribute is last name.

7. The method of claim 1, wherein: the first source attribute is job title; the plurality of target attributes include previous job title and current job title.

8. The method of claim 1, wherein: the first source attribute is organization name; the plurality of target attributes include previous organization name and current organization name.

9. The method of claim 8, further comprising: for a second data item, of the plurality of data items, that corresponds to a second source attribute of the plurality of source attributes, identifying a second plurality of target attributes; wherein the first query associates the second data item with each of the second plurality of target attributes; wherein the second source attribute is job title; the second plurality of target attributes include previous job title and current job title.

10. A method comprising: identifying a first entity record from a source database; identifying, in the first entity record, a plurality of data items, each of which corresponds to an attribute of a plurality of source attributes; for a first data item, of the plurality of data items, that corresponds to a first source attribute of the plurality of source attributes, identifying a plurality of target attributes; generating a first query that includes a plurality of predicates, each predicate corresponding to a different data item-target attribute pair and including the first data item and a different target attribute of the plurality of target attributes; wherein the plurality of predicates are combined with a disjunctive OR; performing a first search of a target database based on the first query, wherein the first search results in a first set of results; using a scoring model to generate a set of scores, one for each result in the first set of results; determining whether the first entity record matches an entity record in the target database based on the set of scores generated by the scoring model; generating a second query that is different than the first query; performing a second search of the target database based on the second query, wherein the second search results in a second set of results; wherein using the scoring model comprises using the scoring model to generate a score for each result in the first and second sets of results; wherein the method is performed by one or more computing devices.

11. One or more storage media storing instructions which, when executed by one or more processors, cause: identifying a first entity record from a source database; identifying, in the first entity record, a plurality of data items, each of which corresponds to an attribute of a plurality of source attributes; for a first data item, of the plurality of data items, that corresponds to a first source attribute of the plurality of source attributes, identifying a plurality of target attributes; generating a first query that includes the plurality of data items and that associates the first data item with each of the plurality of target attributes; generating a second query that is different than the first query; performing a first search of a target database based on the first query, wherein the first search results in a first set of results; performing a second search of the target database based on the second query, wherein the second search results in a second set of results; using a scoring model to generate a set of scores, one score for each result in the first and second sets of results; determining whether the first entity record matches an entity record in the target database based on the set of scores generated by the scoring model
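The predicate structure described in claims 2, 3, and 5 can be sketched as plain string building. This is only an illustration: the `field:"value"` syntax, the function names, and the sample attribute names are hypothetical, the claims do not prescribe any query language, and values are not escaped here.

```python
def compound_predicate(value, target_attributes):
    """One predicate per (data item, target attribute) pair, joined with OR."""
    parts = [f'{attr}:"{value}"' for attr in target_attributes]
    return "(" + " OR ".join(parts) + ")"


def first_query(items):
    """Compound predicates joined with a conjunctive AND (as in claim 5)."""
    return " AND ".join(compound_predicate(v, attrs) for v, attrs in items)


def second_query(items):
    """All pluralities of predicates joined with a disjunctive OR (as in claim 2)."""
    return " OR ".join(compound_predicate(v, attrs) for v, attrs in items)


# Claim 6 example: first name and last name share the same target attributes.
NAME_ITEMS = [
    ("Jane", ["first_name", "last_name"]),
    ("Doe", ["first_name", "last_name"]),
]

strict = first_query(NAME_ITEMS)
loose = second_query(NAME_ITEMS)

# Claim 7 example: a job title may match the current or a previous title.
title_predicate = compound_predicate(
    "Engineer", ["current_job_title", "previous_job_title"]
)
```

Under these assumptions the AND form narrows the search to records that account for every data item, while the OR form casts a wider net whose extra results are left to the scoring model to rank.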

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

  • Interactive query statement specification based on a database schema · CPC title

  • Iterative querying; Query formulation based on the results of a preceding query · CPC title

  • using ranking · CPC title

  • Integrating or interfacing systems involving database management systems · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11080272B2 cover?
Entity resolution techniques for matching entity records from different data sources are provided. In one technique, an entity record from a source database is identified along with multiple data items included therein. Each data item corresponds to an attribute of multiple source attributes. For one of the data items that corresponds to a first source attribute, multiple target attributes are …
Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/2423. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 3, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).