Automatic generation of instantiation rules to determine quality of data migration

US10013439B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10013439-B2
Application number  US-201113169211-A
Country             US
Kind code           B2
Filing date         Jun 27, 2011
Priority date       Jun 27, 2011
Publication date    Jul 3, 2018
Grant date          Jul 3, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

During migration of data from at least one data source to a target system, data quality is determined by obtaining metadata associated with the target system, automatically generating instantiated rules for assessing a quality of data to be loaded from the at least one data source into the target system, where the instantiated rules are dependent upon the obtained metadata associated with the target system, and applying a quality analysis based upon the instantiated rules to the data to be loaded into the target system. The quality analysis provides an indication of a level of compliance of the data with requirements of the target system.
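The abstract's central idea, generating rules by combining target-system metadata with rule templates and then scoring the data against those rules, can be illustrated with a minimal sketch. Everything named here (`FieldMeta`, `Rule`, `length_rule`, `mandatory_rule`, `instantiate_rules`, `compliance_level`, and the sample table and field names) is hypothetical and chosen for illustration only; the patent does not prescribe this structure, and the compliance measure below is simply the fraction of passed checks.

```python
from dataclasses import dataclass
from string import Template
from typing import Callable


@dataclass(frozen=True)
class FieldMeta:
    """Hypothetical metadata describing one field of the target system."""
    table: str
    field: str
    max_length: int
    mandatory: bool


@dataclass(frozen=True)
class Rule:
    """An instantiated rule: a readable description plus a per-record check."""
    description: str
    check: Callable[[dict], bool]  # returns True when the record complies


def length_rule(meta: FieldMeta) -> Rule:
    """Rule template: the field value must fit the target field length."""
    desc = Template("$table.$field must be at most $n characters").substitute(
        table=meta.table, field=meta.field, n=meta.max_length
    )
    return Rule(desc, lambda rec: len(str(rec.get(meta.field) or "")) <= meta.max_length)


def mandatory_rule(meta: FieldMeta) -> Rule:
    """Rule template: the field must be populated."""
    desc = Template("$table.$field must not be empty").substitute(
        table=meta.table, field=meta.field
    )
    return Rule(desc, lambda rec: rec.get(meta.field) not in (None, ""))


def instantiate_rules(metadata: list[FieldMeta]) -> list[Rule]:
    """Apply the metadata to the templates, yielding target-specific rules."""
    rules = []
    for meta in metadata:
        rules.append(length_rule(meta))
        if meta.mandatory:
            rules.append(mandatory_rule(meta))
    return rules


def compliance_level(records: list[dict], rules: list[Rule]) -> float:
    """Fraction of (record, rule) checks that pass; 1.0 if nothing to check."""
    checks = [rule.check(rec) for rec in records for rule in rules]
    return sum(checks) / len(checks) if checks else 1.0
```

Because the rules are derived from the metadata rather than written by hand, pointing the same templates at a different target system (different field lengths or mandatory flags) would yield a different rule set, which is the dependency the abstract describes.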

First claim

Opening claim text (preview).

What is claimed:

1. A computer-implemented method of determining data quality during migration of data from at least one data source to a target system, the method comprising: obtaining metadata associated with the target system; automatically generating instantiated rules specific to the target system for assessing a quality of data to be loaded from the at least one data source into the target system by applying the obtained metadata to one or more rule templates, wherein the instantiated rules are dependent upon the obtained metadata and vary between different target systems; performing a quality analysis by applying the instantiated rules to the data to be loaded into the target system and providing an indication of a level of compliance of the data with requirements of the target system, wherein the performing a quality analysis further comprises: discovering, from the metadata, whether each field within each table is supported by a respective reference data table, and for each respective reference data table supporting a field, performing: generating a respective extraction job to extract the respective reference data table from configuration data, validating, against the extracted respective reference data table, each respective field within each respective table supported by the respective reference data table, and when a value of a respective field supported by the respective reference data table does not exist in the extracted respective reference data table, identifying a data validity gap; and providing a visualization of a level of compliance of the data in relation to requirements of the target system resulting from the performance of the quality analysis on the data utilizing the instantiated rules, wherein the visualization comprises a plurality of gap reports, each gap report providing an indication of one of a plurality of specific gap types of the data in relation to the requirements of the target system so as to identify quality issues and enhance cleaning or correction of the data during the migration.

2. The method of claim 1, wherein the instantiated rules require a check of specific fields and columns in data structures to verify whether the data within the specific fields and columns complies with the instantiated rules.

3. The method of claim 1, wherein the instantiated rules are applied to data within a migration database prior to loading the data into the target system.

4. The method of claim 1, wherein the respective reference data table comprises information about values that are valid for the respective field supported by the respective reference data table.

5. The method of claim 1, wherein each instantiated rule is automatically generated by combining metadata associated with the target system with a pre-defined rule template.

6. The method of claim 1, wherein the plurality of specific gap types comprise at least two of a Data Completeness Gap Report (DCGR), a Data Validity Gap Report (DVGR), a Field Length Gap Report (FLGR), a Category Completeness Gap Report (CCGR), a Relationship Orphan Gap Report (ROGR), a Record Relationship Gap Report (RRGR) and a Data Type Gap Report (DTGR).

7. A system for assessing a quality of data during migration of the data from at least one data source to a target system, wherein after the assessing and possible data cleansing, the data is loaded into the target system, the system comprising: at least one processor configured with logic to: obtain metadata associated with the target system and automatically generate instantiated rules specific to the target system for assessing a quality of data while in transit to the target system by applying the obtained metadata to one or more rule templates, wherein the instantiated rules are dependent upon the obtained metadata and vary between different target systems; perform a quality analysis by applying the instantiated rules to the data while in transit to the target system and provide an indication of a level of compliance of the data with requirements of the target system, wherein the logic to perform a quality analysis further comprises logic for the at least one processor to be configured to: discover, from the metadata, whether each field within each table is supported by a respective reference data table, and for each respective reference data table supporting a field, perform: generate a respective extraction job to extract the respective reference data table from configuration data, validate, against the extracted respective reference data table, each respective field within each respective table supported by the respective reference data table, and when a value of a respective field supported by the respective reference data table does not exist in the extracted respective reference data table, identify a data validity gap; and provide a visualization of a level of compliance of the data in relation to requirements of the target system resulting from the performance of the quality analysis on the data utilizing the instantiated rules, wherein the visualization comprises a plurality of gap reports, each gap report providing an indication of one of a plurality of specific gap types of the data in relation to the requirements of the target system so as to identify quality issues and enhance cleaning or correction of the data during the migration.

8. The system of claim 7, wherein the at least one processor is further configured to utilize the instantiated rules so as to require a check of specific fields and columns in data tables to verify whether the data within the specific fields and columns complies with the instantiated rules.

9. The system of claim 7, wherein the respective reference data table comprises information about values that are valid for the respective field supported by the respective reference data table.

10. The system of claim 7, wherein each instantiated rule is automatically generated by combining metadata associated with the target system with a pre-defined rule template.

11. The system of claim 7, wherein the plurality of specific gap types comprise at least two of a Data Completeness Gap Report (DCGR), a Data Validity Gap Report (DVGR), a Field Length Gap Report (FLGR), a Category Completeness Gap Report (CCGR), a Relationship Orphan Gap Report (ROGR), a Record Relationship Gap Report (RRGR) and a Data Type Gap Report (DTGR).

12. A computer program product for determining data quality during migration of data from at least one data source to a target system, the computer program product comprising: a computer readable storage medium having computer readable program code embodied therewith, the computer readable program code configured to: obtain metadata associated with the target system; automatically generate instantiated rules specific to the target system for assessing a quality of data to be loaded from the at least one data source into the target system by applying the obtained metadata to one or more rule templates, wherein the instantiated rules are dependent upon the obtained metadata and vary between different target systems; perform a quality analysis by applying the instantiated rules to the data to be loaded into the target system and provide an indication of a level of compliance of the data with requirements of the target system, wherein the computer readable program code being configured to perform a quality analysis further comprises the computer readable program code being configured to: discover, from the metadata, whether each field within each table is supported by a respective reference data table, and for each respective reference data table supporting a field, perform: generate a respect…
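The reference-data validation recited in the independent claims (discover which fields are supported by a reference data table, extract that table from configuration data, validate each field value against it, and record a data validity gap when a value is missing) can be sketched roughly as follows. This is only an illustration under stated assumptions: the dictionary layouts, the function names `extract_reference_table`, `find_validity_gaps`, and `group_by_gap_type`, and the sample table names are all hypothetical. Only the abbreviation DVGR (Data Validity Gap Report) comes from the claims.

```python
from collections import defaultdict


def extract_reference_table(name: str, configuration: dict) -> set:
    """Stand-in for an 'extraction job': pull the set of valid values for
    one reference table out of (hypothetical) configuration data."""
    return set(configuration[name])


def find_validity_gaps(tables: dict, field_to_reference: dict,
                       configuration: dict) -> list[dict]:
    """Validate every reference-backed field and collect validity gaps.

    tables:             {table_name: [record_dict, ...]} data awaiting load
    field_to_reference: {(table, field): reference_table_name or None},
                        an assumed shape for the discovered metadata
    """
    gaps = []
    for (table, field), ref_name in field_to_reference.items():
        if ref_name is None:
            continue  # field is not supported by a reference data table
        valid_values = extract_reference_table(ref_name, configuration)
        for row_number, record in enumerate(tables.get(table, [])):
            value = record.get(field)
            if value not in valid_values:
                gaps.append({
                    "gap_type": "DVGR",  # Data Validity Gap Report
                    "table": table,
                    "field": field,
                    "row": row_number,
                    "value": value,
                })
    return gaps


def group_by_gap_type(gaps: list[dict]) -> dict:
    """Group gaps so each gap type can feed its own report."""
    reports = defaultdict(list)
    for gap in gaps:
        reports[gap["gap_type"]].append(gap)
    return dict(reports)
```

Grouping by gap type mirrors the claims' "plurality of gap reports": other checks (field length, completeness, data type) could append entries with their own `gap_type` codes to the same list, and each group would then back one report.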

Assignees

Inventors

Classifications

  • Database migration support · CPC title

  • G06F16/215 · Primary

    Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10013439B2 cover?
During migration of data from at least one data source to a target system, data quality is determined by obtaining metadata associated with the target system, automatically generating instantiated rules for assessing a quality of data to be loaded from the at least one data source into the target system, where the instantiated rules are dependent upon the obtained metadata associated with the t…
Who is the assignee on this patent?
Gruenheid Anja, Maier Albert, Oberhofer Martin, and 3 more
What technology area does this patent fall under?
Primary CPC classification G06F16/215. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jul 3, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).