Qualification of match results

US9805072B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9805072-B2
Application number  US-201414222445-A
Country             US
Kind code           B2
Filing date         Mar 21, 2014
Priority date       Mar 21, 2014
Publication date    Oct 31, 2017
Grant date          Oct 31, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Qualification of data matches can be improved relative to existing approaches by use of first, second, and third similarity criteria, which can be used to identify a set of near match record pairs, identify a set of actual match record pairs and to flag as near matches those record pairs of the set of near matches that were not identified as actual matches, and to identify and flag one or more suspect matches.
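The three tiers of criteria described in the abstract can be pictured as successively stricter cutoffs on a similarity score. The following is a minimal sketch under stated assumptions, not the patented implementation: the names `Thresholds`, `similarity`, and `qualify` are hypothetical, the numeric cutoffs are invented for illustration, and `difflib.SequenceMatcher` is only a stand-in for whatever similarity measure a real system would use. It also assumes that a suspect match is an actual match that fails the strictest (third) criteria, which is one possible reading of the claim language.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical cutoffs; the patent text shown here gives no numeric values.
    near: float = 0.70       # first criteria (loosest)
    match: float = 0.85      # second criteria, stricter than the first
    confident: float = 0.95  # third criteria, stricter than the second


def similarity(a: str, b: str) -> float:
    """Stand-in similarity score in [0, 1]; an assumption, not the patent's measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def qualify(pairs, t: Thresholds = Thresholds(), score=similarity):
    """Sort record pairs into non-match, near match, suspect match, and confident match."""
    result = {"non_match": [], "near": [], "suspect": [], "confident": []}
    for a, b in pairs:
        s = score(a, b)
        if s < t.near:
            result["non_match"].append((a, b))   # fails the first criteria
        elif s < t.match:
            result["near"].append((a, b))        # near match, not an actual match
        elif s < t.confident:
            result["suspect"].append((a, b))     # actual match, flagged for review
        else:
            result["confident"].append((a, b))   # actual match, passes all tiers
    return result
```

In this sketch the `near` and `suspect` buckets are the ones that would be provided for further analysis, while the `confident` bucket could be routed to automatic post match processing.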

First claim

Opening claim text (preview).

What is claimed is:

1. A computer program product comprising a non-transitory machine-readable medium storing instructions that, when executed by at least one programmable processor, cause the at least one programmable processor to perform operations comprising: applying first similarity criteria to a set of record pairs to identify a set of near match record pairs, the applying comprising: identifying a pair of records in the set of record pairs as a near match record pair when: a similarity between the pair of records exceeds a minimum similarity threshold and when the pair of records is a name, the pair of records contains an email address comprising a domain name that is the same in the pair of records, the only difference between the pair of records is a numeric component, the pair of records match when the pair of records is translated to a same script, and adding, the near match record pair to the set of near match record pairs; applying, after identifying the set of near match record pairs, second similarity criteria to the set of near match record pairs to identify a set of actual match record pairs, the second similarity criteria being more strict than the first similarity criteria; flagging, as near matches, those record pairs of the set of near matches that were not identified as actual matches; applying, after identifying the set of actual match record pairs, third similarity criteria to the set of actual match record pairs to identify and flag one or more suspect matches, the third similarity criteria being more strict than the second similarity criteria; flagging, as a suspect match group, the set of actual match record pairs that contain the one or more suspect matches; and providing at least one of the flagged near matches, the flagged suspect matches, and the flagged suspect match group for further analysis.

2. A computer program product as in claim 1, wherein the operations further comprise routing the set of actual matches for automatic application of one or more post match processes.

3. A computer program product as in claim 2, wherein the one or more post match processes comprise at least one of consolidation of data, formation of a best record, and disregarding one or more duplicate records.

4. A computer program product as in claim 1, wherein the providing comprises presenting a user interface displaying at least some of the flagged near matches and/or the flagged suspect matches and supporting functionality that allows a user to perform fine-tuning of one or more of the first criteria, the second criteria, and the third criteria.

5. A computer program product as in claim 4, wherein the user interface further comprises an auto-generated suggestion feature that proposes changes to one or more of the first criteria, the second criteria, and the third criteria.

6. A computer program product as in claim 4, wherein the user interface further comprises functionality for illustrating an impact of a change to one or more of the first criteria, the second criteria, and the third criteria on classification of the record pairs.

7. A system comprising: computer hardware configured to provide operations comprising: applying first similarity criteria to a set of record pairs to identify a set of near match record pairs, the applying comprising: identifying a pair of records in the set of record pairs as a near match record pair when: a similarity between the pair of records exceeds a minimum similarity threshold and when the pair of records is a name, the pair of records contains an email address comprising a domain name that is the same in the pair of records, the only difference between the pair of records is a numeric component, the pair of records match when the pair of records is translated to a same script, and adding, the near match record pair to the set of near match record pairs; applying, after identifying the set of near match record pairs, second similarity criteria to the set of near match record pairs to identify a set of actual match record pairs, the second similarity criteria being more strict than the first similarity criteria; flagging, as near matches, those record pairs of the set of near matches that were not identified as actual matches; applying, after identifying the set of actual match record pairs, third similarity criteria to the set of actual match record pairs to identify and flag one or more suspect matches, the third similarity criteria being more strict than the second similarity criteria; flagging, as a suspect match group, the set of actual match record pairs that contain the one or more suspect matches; and providing at least one of the flagged near matches, the flagged suspect matches, and the flagged suspect match group for further analysis.

8. A system as in claim 7, wherein the operations further comprise routing the set of actual matches for automatic application of one or more post match processes.

9. A system as in claim 8, wherein the one or more post match processes comprise at least one of consolidation of data, formation of a best record, and disregarding one or more duplicate records.

10. A system as in claim 7, wherein the providing comprises presenting a user interface displaying at least some of the flagged near matches and/or the flagged suspect matches and supporting functionality that allows a user to perform fine-tuning of one or more of the first criteria, the second criteria, and the third criteria.

11. A system as in claim 10, wherein the user interface further comprises an auto-generated suggestion feature that proposes changes to one or more of the first criteria, the second criteria, and the third criteria.

12. A system as in claim 10, wherein the user interface further comprises functionality for illustrating an impact of a change to one or more of the first criteria, the second criteria, and the third criteria on classification of the record pairs.

13. A system as in claim 7, wherein the computing hardware comprises: at least one programmable processor; and a machine-readable medium storing instructions that, when executed by the at least one programmable processor, cause the at least one programmable processor to provide the operations.

14. A computer-implemented method comprising: applying first similarity criteria to a set of record pairs to identify a set of near match record pairs, the applying comprising: identifying a pair of records in the set of record pairs as a near match record pair when: a similarity between the pair of records exceeds a minimum similarity threshold and when the pair of records is a name, the pair of records contains an email address comprising a domain name that is the same in the pair of records, the only difference between the pair of records is a numeric component, the pair of records match when the pair of records is translated to a same script, and adding, the near match record pair to the set of near match record pairs; applying, after identifying the set of near match record pairs, second similarity criteria to the set of near match record pairs to identify a set of actual match record pairs, the second similarity criteria being more strict than the first similarity criteria; flagging, as near matches, those record pairs of the set of near matches that were not identified as actual matches; applying, after identifying the set of actual match record pairs, third similarity criteria to the set of actual match record pairs to identify and flag one or more suspect matches, the third similarity criteria being more strict than the second similarity criteria; flagging, as a s
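Two of the near match conditions recited in the independent claims (an email address with the same domain name in both records, and records whose only difference is a numeric component) lend themselves to a short illustration. This is a hedged sketch only: the helper names `email_domain`, `same_email_domain`, and `differs_only_in_numbers` are hypothetical, the claims do not specify how these checks are implemented, and the script-translation condition is left out because it would need a transliteration step that the text shown here does not describe.

```python
import re


def email_domain(value: str):
    """Return the lower-cased domain of an email-like string, or None if it has no usable '@'."""
    local, sep, domain = value.rpartition("@")
    return domain.lower() if sep and local and domain else None


def same_email_domain(a: str, b: str) -> bool:
    """True when both values look like email addresses and share the same domain name."""
    da, db = email_domain(a), email_domain(b)
    return da is not None and da == db


def differs_only_in_numbers(a: str, b: str) -> bool:
    """True when the values differ, but become equal once every digit run is masked."""
    def mask(s: str) -> str:
        return re.sub(r"\d+", "#", s)
    return a != b and mask(a) == mask(b)
```

For example, `same_email_domain("j.doe@example.com", "jdoe@Example.COM")` and `differs_only_in_numbers("12 Main St", "14 Main St")` both return `True` in this sketch, so either pair would be a candidate for the near match set.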

Assignees

Inventors

Classifications

  • G06F16/215 · Primary

    Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9805072B2 cover?
Qualification of data matches can be improved relative to existing approaches by use of first, second, and third similarity criteria, which can be used to identify a set of near match record pairs, identify a set of actual match record pairs and to flag as near matches those record pairs of the set of near matches that were not identified as actual matches, and to identify and flag one or more susp…
Who is the assignee on this patent?
Spiess Mark, Woody Jeffrey, Dupey Ronald, and 1 more
What technology area does this patent fall under?
Primary CPC classification G06F16/215. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 31, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).