Analysis of a system for matching data records

US10698755B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10698755-B2
Application number: US-201414290030-A
Country: US
Kind code: B2
Filing date: May 29, 2014
Priority date: Sep 28, 2007
Publication date: Jun 30, 2020
Grant date: Jun 30, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments disclosed herein provide a system and method for analyzing an identity hub. Particularly, a user can connect to the identity hub, load an initial set of data records, create and/or edit an identity hub configuration locally, analyze and/or validate the configuration via a set of analysis tools, including an entity analysis tool, a data analysis tool, a bucket analysis tool, and a linkage analysis tool, and remotely deploy the validated configuration to an identity hub instance. In some embodiments, through a graphical user interface, these analysis tools enable the user to analyze and modify the configuration of the identity hub in real time while the identity hub is operating to ensure data quality and enhance system performance.

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for analyzing a system for matching data records, the method comprising:

    producing a configuration of said system for matching data records, the configuration of the system including a bucketing strategy employing matching functions and matching parameters to create buckets containing data records, wherein said buckets are created by comparing sets of one or more attributes of initial data records with corresponding attributes of candidate data records in said system, wherein each bucket is associated with a corresponding set of attributes, and wherein data records associated with a same entity are determined and linked by comparing one or more attributes of the initial data records to corresponding attributes of the candidate data records within the buckets in accordance with the matching functions and matching parameters;

    applying said configuration to said system and analyzing buckets created during operation of said system according to the bucketing strategy associated with said configuration of said system;

    analyzing an effect of said buckets on throughput of said system via a bucket analysis tool providing a user interface, wherein analyzing an effect of said buckets further comprises: executing one or more queries from the user interface of the bucket analysis tool to produce characteristics associated with the buckets created during operation of the system, wherein the characteristics include distribution of data within the created buckets and data records not placed in the created buckets; and identifying performance issues of the system from the characteristics of the created buckets produced from the one or more queries; and

    modifying said configuration during operation of said system to adjust distribution of the data records within said buckets in real time to address the identified performance issues and enable the throughput of said system to reside within a predetermined desired range, wherein modifying said configuration includes: changing said matching functions and matching parameters of said bucketing strategy for creating said buckets based on said identified performance issues to alter the comparing of said attributes and determination of the association of data records with the same entity for said buckets, wherein changing said matching functions and matching parameters includes providing a different combination of attributes for the corresponding set of attributes for at least one bucket.

2. The method of claim 1, wherein said changing said matching functions and matching parameters of said bucketing strategy further comprises editing an algorithm utilized in creating said buckets or changing one or more parameter values associated with said algorithm.

3. The method of claim 1, wherein said modifying said configuration further comprises: estimating performance of said system with said modified configuration under a real time load via the bucket analysis tool to ensure the throughput of said system resides within said predetermined desired range.

4. The method of claim 2, wherein said algorithm is associated with an entity type, and said method further comprises analyzing entities categorized as having said entity type in said system.

5. The method of claim 4, wherein said analyzing said entities further comprises one or more from a group of analyzing an entity size distribution, analyzing said entities by size, analyzing said entities by composition, analyzing a score distribution associated with said entities, and analyzing member comparisons associated with said entities.

6. The method of claim 1, further comprising analyzing validity of attributes of said initial data records.

7. The method of claim 1, wherein said analyzing said buckets further comprises one or more from a group of analyzing statistics associated with said buckets, analyzing a bucket size distribution, analyzing said buckets by size, analyzing said buckets by composition, analyzing a bulk cross match comparison distribution, analyzing members by bucket count, analyzing member bucket values, analyzing member bucket frequencies, and analyzing a member comparison distribution.

8. The method of claim 1, further comprising analyzing error rates associated with said initial data records, wherein said error rates comprise a record error rate and a person error rate.

9. The method of claim 1, wherein said configuration of said system comprises a clerical review threshold and an autolink threshold, and wherein said clerical review threshold and said autolink threshold are indicative of tolerance of said system to false positive and false negative rates in matching said initial data records, further comprising analyzing said clerical review threshold and said autolink threshold.

10. A system for analyzing an identity system for matching data records, the system comprising: at least one processor with logic to:

    produce a configuration of said identity system for matching data records, the configuration of the identity system including a bucketing strategy employing matching functions and matching parameters to create buckets containing data records, wherein said buckets are created by comparing sets of one or more attributes of initial data records with corresponding attributes of candidate data records in said identity system, wherein each bucket is associated with a corresponding set of attributes, and wherein data records associated with a same entity are determined and linked by comparing one or more attributes of the initial data records to corresponding attributes of the candidate data records within the buckets in accordance with the matching functions and matching parameters;

    apply said configuration to said identity system and analyze buckets created during operation of said identity system according to the bucketing strategy associated with said configuration of said identity system;

    analyze an effect of said buckets on throughput of said identity system via a bucket analysis tool providing a user interface, wherein analyzing an effect of said buckets further comprises: executing one or more queries from the user interface of the bucket analysis tool to produce characteristics associated with the buckets created during operation of the identity system, wherein the characteristics include distribution of data within the created buckets and data records not placed in the created buckets; and identifying performance issues of the identity system from the characteristics of the created buckets produced from the one or more queries; and

    modify said configuration during operation of said identity system to adjust distribution of the data records within said buckets in real time to address the identified performance issues and enable the throughput of said identity system to reside within a predetermined desired range, wherein modifying said configuration includes: changing said matching functions and matching parameters of said bucketing strategy for creating said buckets based on said identified performance issues to alter the comparing of said attributes and determination of the association of data records with the same entity for said buckets, wherein changing said matching functions and matching parameters includes providing a different combination of attributes for the corresponding set of attributes for at least one bucket.

11. The system of claim 10, wherein said at least one processor further displays an algorithm editor through which an algorithm utilized in creating said buckets is edited.

12. The system of claim 10, wherein said bucketing strategy is associated with an entity type, and wherein said at least o
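
The bucketing and bucket-analysis steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the sample records, the field names (`last`, `zip`, `dob`), and the helpers `bucket_key`, `build_buckets`, and `analyze_buckets` are all hypothetical, and exact matching on normalized values stands in for whatever matching functions a real configuration would use. The sketch groups records into buckets per attribute set, reports the size distribution and the records left out of every bucket, and shows how a different combination of attributes changes those characteristics.

```python
from collections import Counter, defaultdict

# Hypothetical sample records; field names are illustrative only.
RECORDS = [
    {"id": 1, "last": "Smith", "zip": "78701", "dob": "1980-01-02"},
    {"id": 2, "last": "Smith", "zip": "78701", "dob": "1975-06-30"},
    {"id": 3, "last": "Smyth", "zip": "78701", "dob": "1980-01-02"},
    {"id": 4, "last": "Jones", "zip": "", "dob": "1990-12-12"},
    {"id": 5, "last": "", "zip": "", "dob": ""},
]


def bucket_key(record, attributes):
    """Return a bucket key for one attribute set, or None if any value is missing."""
    values = [record.get(name, "").strip().lower() for name in attributes]
    if not all(values):
        return None
    return (attributes, tuple(values))


def build_buckets(records, strategy):
    """Group record ids into buckets; each bucket belongs to one attribute set."""
    buckets = defaultdict(set)
    for record in records:
        for attributes in strategy:
            key = bucket_key(record, attributes)
            if key is not None:
                buckets[key].add(record["id"])
    return buckets


def analyze_buckets(records, buckets):
    """Summarize bucket characteristics that affect comparison cost."""
    placed = set().union(*buckets.values())
    sizes = [len(members) for members in buckets.values()]
    return {
        # Maps bucket size to the number of buckets of that size.
        "size_distribution": dict(Counter(sizes)),
        "largest_bucket": max(sizes, default=0),
        # Records are compared pairwise within a bucket, so this approximates cost.
        "candidate_pairs": sum(n * (n - 1) // 2 for n in sizes),
        # Records that no attribute set could place in any bucket.
        "unbucketed_ids": sorted(r["id"] for r in records if r["id"] not in placed),
    }


# Two hypothetical strategies: a single coarse attribute set, then a
# different combination of attributes, as in the modifying step of claim 1.
coarse = [("zip",)]
revised = [("last", "zip"), ("dob",)]

for name, strategy in (("coarse", coarse), ("revised", revised)):
    report = analyze_buckets(RECORDS, build_buckets(RECORDS, strategy))
    print(name, report)
```

Counting candidate pairs is one simple proxy for the throughput effect the claim refers to, since large buckets force many pairwise comparisons. In this toy data the revised attribute combination shrinks the largest bucket and places one more record, which is the kind of trade-off a bucket analysis tool would surface.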

Assignees

Inventors

Classifications

  • Approximate or statistical queries · CPC title

  • Root cause analysis, i.e. error or fault diagnosis (in a hardware test environment G06F11/22; in a software test environment G06F11/36) · CPC title

  • Readable error formats, e.g. cross-platform generic formats, human understandable formats · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10698755B2 cover?
Embodiments disclosed herein provide a system and method for analyzing an identity hub. Particularly, a user can connect to the identity hub, load an initial set of data records, create and/or edit an identity hub configuration locally, analyze and/or validate the configuration via a set of analysis tools, including an entity analysis tool, a data analysis tool, a bucket analysis tool, and a li…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/2462. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 30, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).