Systems, methods, and software for entity relationship resolution

US9600509B2 · US · B2

Patent metadata
Publication number: US-9600509-B2
Application number: US-34191308-A
Country: US
Kind code: B2
Filing date: Dec 22, 2008
Priority date: Dec 21, 2007
Publication date: Mar 21, 2017
Grant date: Mar 21, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

To facilitate access to public records, the present inventors devised, among other things, an entity resolution system. The exemplary system includes master records database of 300 million entities, which is partitioned into multiple distinct portions. The exemplary system extracts entity information from input public records and constructs one or more blocking queries against specific portions of the master records database to identify one or more sets of candidate records. Feature vectors are defined for the candidate records and machine learning techniques, such as Support Vector Machine, are used to determine which of the candidate records from the master records database match the input public records. Candidate records that match are logically associated with public records, enabling ready access via direct or indirect queries.
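The blocking step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the partition count, the MD5-based hash, the field names (`first`, `last`, `city`), and the exact-agreement features are all assumptions made for illustration. The classifier stage (for example, a Support Vector Machine trained on such feature vectors) is left out.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 8  # assumption: illustrative partition count

# Hypothetical master records; the field names are assumptions.
MASTER = [
    {"id": 1, "first": "Ann", "last": "Lee", "city": "Austin"},
    {"id": 2, "first": "Ann", "last": "Lee", "city": "Boston"},
    {"id": 3, "first": "Bob", "last": "Ray", "city": "Austin"},
]


def partition_of(last_name: str) -> int:
    """Map a last name to a partition using a stable hash (hypothetical scheme)."""
    digest = hashlib.md5(last_name.strip().lower().encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS


def build_partitions(master_records):
    """Group master records by the hash of their last-name field."""
    partitions = defaultdict(list)
    for record in master_records:
        partitions[partition_of(record["last"])].append(record)
    return partitions


def blocking_query(partitions, last, first, city=None):
    """Scan only the partition for `last` and keep records that agree on the keys."""
    candidates = []
    for record in partitions.get(partition_of(last), []):
        if record["last"].lower() != last.lower():
            continue
        if record["first"].lower() != first.lower():
            continue
        if city is not None and record["city"].lower() != city.lower():
            continue
        candidates.append(record)
    return candidates


def feature_vector(public_record, candidate):
    """One exact-agreement feature per field; a real system would use richer features."""
    fields = ("first", "last", "city")
    return [
        1.0 if public_record[f].lower() == candidate[f].lower() else 0.0
        for f in fields
    ]


if __name__ == "__main__":
    parts = build_partitions(MASTER)
    public = {"first": "Ann", "last": "Lee", "city": "Austin"}
    for cand in blocking_query(parts, public["last"], public["first"]):
        print(cand["id"], feature_vector(public, cand))
```

Because only one partition is scanned per query, the cost of candidate retrieval depends on the size of that partition rather than on the size of the whole master database.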

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: one or more processors; an entity resolution database (“ERD”) resolution engine adapted to retrieve, responsive to a first set of data in one or more data fields in a public record, a set of candidate named entity records from a master named entity database based on one of a set of two or more blocking queries, wherein each blocking query in the set of two or more blocking queries comprises a query for a last name and a first name, and a city name, all extracted from the public record, and a query for a last name and a first name, all from the public record; the ERD resolution engine further adapted to automatically determine a permutation for each blocking query in the set of two or more blocking queries and an order of execution for the set of two or more blocking queries based on the first set of data; the ERD resolution engine further adapted to calculate similarity scores for the first set of data in the one or more of the data fields in the public record and a second set of data in a set of data fields in the set of candidate named entity records by comparing the second set of data in the set of data fields in the set of candidate named entity records retrieved by the set of blocking queries with the first set of data in the one or more data fields in the public record; and the ERD resolution engine further adapted to determine a confidence rating for one or more of the set of similarity scores between the public record and the candidate named entity record.

2. The system of claim 1, wherein the ERD resolution engine is further adapted to, responsive to the confidence rating, determine whether to retrieve another set of candidate named entity records from the master named entity database based on another of the set of two or more blocking queries.

3. The system of claim 2, wherein the other of the set of two or more blocking queries is broader in scope that the one blocking query.

4. The system of claim 1, wherein the set of blocking queries includes: a query for a social security number from the public record; a query for a last name and a first name, and a city name, all extracted from the public record; and a query for a last name and a first name, all from the public record.

5. The system of claim 4, wherein the system is implemented as a client-server architecture and one or more of the processors is a component of a web server and wherein one or more client access devices interface with the web server via a wide or local area network to request and receive public record information.

6. The system of claim 1 wherein the master named entity database is partitioned into a number of blocks based on corresponding hashes of a name field associated with each record in the master named entity database.

7. The system of claim 1 wherein each similarity score ranges from 0 and 1.0, wherein 0 indicates a non-match and 1.0 indicates an identical match.

8. The system of claim 1 further comprising a lookup table for determining whether one or more of the blocking queries will return a number of candidate named entity records in excess of a threshold.

9. The system of claim 1, wherein one or more of the recited means is implemented using in combination machine-executable instruction sets stored on a machine-readable magnetic, electrical, or optical medium, with the instruction sets executed using one or more processors.

10. A method comprising: retrieving a set of candidate named entity records from a master named entity database based on one of a set of two or more blocking queries, with each blocking query based on one or more data fields in a public record, and wherein each blocking query comprises a query for a last name and a first name, and a city name, all extracted from the public record, and a query for a last name and a first name, all from the public record, and wherein a permutation for each blocking query in the set of two or more blocking queries and an order of execution for the set of two or more blocking queries is automatically determined based on the one or more data fields in the public record; calculating similarity scores for one or more of the data fields in the public record and a set of data fields in the set of candidate named entity records by comparing the set of data fields in the set of candidate named entity records retrieved by the set of blocking queries with the one or more data fields in the public record; and determining a confidence rating for one or more of the set of similarity scores between the public record and the candidate named entity record.

11. The method of claim 10, further comprising: determining whether to retrieve another set of candidate named entity records from the master named entity database based on another of the set of two or more blocking queries.

12. The method of claim 11, wherein the other of the set of two or more blocking queries is broader in scope that the one blocking query.

13. The method of claim 10, wherein the set of blocking queries includes: a query for a social security number extracted from the public record; a query for a last name and a first name, and a city name, all extracted from the public record; and a query for a last name and a first name, all extracted from the public record.

14. The method of claim 10 wherein the master named entity database is partitioned into a number of blocks based on corresponding hashes of a name field associated with each record in the master named entity database.

15. The method of claim 10 wherein each similarity score ranges from 0 and 1.0, wherein 0 indicates a non-match and 1.0 indicates an identical match.

16. The method of claim 10 further comprising: using a lookup table to determine whether the one of the blocking queries will return a number of candidate named entity records in excess of a threshold.

17. An entity resolution system comprising: a computer based system comprising an input adapted to receive user-defined inputs, a processor adapted to process executable code and user-defined inputs and a memory adapted to store the executable code and user-defined inputs, the executable code comprising: a retrieval code set stored on the memory, when executed by the processor, being responsive to a first set of data in one or more data fields in a public record and adapted to retrieve a set of candidate named entity records from a master named entity database based on one of a set of two or more blocking queries, wherein each blocking query in the set of two or more blocking queries includes a query for a last name and a first name, and a city name, all extracted from the public record, and a query for a last name and a first name, all from the public record; the retrieval set of code further adapted to automatically determine a permutation for each blocking query in the set of two or more blocking queries and an order of execution for the set of two or more blocking queries based on the first set of data; a matching code set stored on the memory and being adapted to, when executed by the processor, calculate similarity scores for the first set of data in the one or more of the data fields in the public record and a second set of data from a set of data fields in the set of candidate named entity records by comparing the second set of data from the set of data fields in the set of candidate named entity records retrieved by the set of blocking queries with the first set of data from the one or more data fields in the public record; and a confidence code set stored on the memory and being
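As a rough illustration of how the claimed elements could fit together (ordered blocking queries, a lookup table for oversized queries, similarity scores between 0 and 1.0, and a confidence rating that decides whether to fall back to a broader query), here is a minimal sketch. Everything concrete in it is an assumption: the field names, both thresholds, the contents of the lookup table, the use of `difflib.SequenceMatcher` as the similarity measure, and the mean of the scores as a stand-in for the confidence rating. The claims do not specify any of these.

```python
from difflib import SequenceMatcher

MAX_CANDIDATES = 1000       # assumption: illustrative candidate-count threshold
CONFIDENCE_THRESHOLD = 0.9  # assumption: illustrative confidence cutoff

# Hypothetical lookup table: blocking-key values mapped to an estimated
# number of candidate records the query would return.
ESTIMATED_COUNTS = {("smith", "john"): 50000}


def similarity(a: str, b: str) -> float:
    """Score in [0, 1]; 1.0 means the two strings are identical (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def plan_queries(record):
    """Order blocking queries from narrowest to broadest, using only fields present."""
    plan = []
    if record.get("ssn"):
        plan.append(("ssn",))
    if record.get("city"):
        plan.append(("last", "first", "city"))
    plan.append(("last", "first"))
    return plan


def run_query(master, record, keys):
    """Return master records that agree with the public record on every blocking key."""
    return [
        m for m in master
        if all(m.get(k, "").lower() == record[k].lower() for k in keys)
    ]


def resolve(master, record):
    """Try each blocking query in order; broaden when confidence is too low."""
    fields = [f for f in ("first", "last", "city") if record.get(f)]
    for keys in plan_queries(record):
        key_values = tuple(record[k].lower() for k in keys)
        if ESTIMATED_COUNTS.get(key_values, 0) > MAX_CANDIDATES:
            continue  # lookup table says this query would return too many records
        best, best_conf = None, 0.0
        for cand in run_query(master, record, keys):
            scores = [similarity(record[f], cand.get(f, "")) for f in fields]
            conf = sum(scores) / len(scores)  # stand-in for a learned confidence
            if conf > best_conf:
                best, best_conf = cand, conf
        if best is not None and best_conf >= CONFIDENCE_THRESHOLD:
            return best, best_conf, keys
    return None, 0.0, None


if __name__ == "__main__":
    master = [
        {"id": 1, "ssn": "111", "first": "Ann", "last": "Lee", "city": "Austin"},
        {"id": 2, "ssn": "222", "first": "Ann", "last": "Lee", "city": "Boston"},
    ]
    public = {"ssn": "111", "first": "Ann", "last": "Lee", "city": "Austin"}
    print(resolve(master, public))
```

In this sketch a low-confidence result simply triggers the next, broader query in the plan. A production system would more plausibly derive the confidence rating from a trained classifier than from an average of string similarities.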

Assignees

Inventors

Classifications

  • G06F16/23 · Primary

    Updating · CPC title

  • Physics · mapped topic

  • G06F16/215 · Primary

    Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9600509B2 cover?
To facilitate access to public records, the present inventors devised, among other things, an entity resolution system. The exemplary system includes master records database of 300 million entities, which is partitioned into multiple distinct portions. The exemplary system extracts entity information from input public records and constructs one or more blocking queries against specific portions…
Who is the assignee on this patent?
Conrad Jack G, Dozier Christopher C, Veeramachaneni Sriharsha, and 1 more
What technology area does this patent fall under?
Primary CPC classification G06F16/23. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 21, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).