Data cleansing and governance using prioritization schema

US9836488B2 · US · B2

Patent metadata
Publication number: US-9836488-B2
Application number: US-201414553303-A
Country: US
Kind code: B2
Filing date: Nov 25, 2014
Priority date: Nov 25, 2014
Publication date: Dec 5, 2017
Grant date: Dec 5, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

According to an embodiment of the present invention, a computer-implemented method of cleansing data is provided that comprises determining a criticality score and a complexity score for identified attributes of an enterprise, wherein the criticality score represents a relevance of an attribute to one or more enterprise dimensions and the complexity score represents complexity of cleansing data for an attribute. The identified attributes for data cleansing based on the criticality and complexity scores are prioritized, and data of the identified attributes is cleansed in accordance with priority of the identified attributes. Embodiments further include a system, apparatus and computer readable media to cleanse data in substantially the same manner as described above.
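The prioritization step described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the abstract does not say how the two scores are combined, so the ordering rule here (highest criticality first, ties broken by lowest complexity) is an assumption, and the `Attribute` class, the `prioritize` function, and the sample attribute names and scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    name: str
    criticality: float  # relevance to enterprise dimensions (higher = more relevant)
    complexity: float   # difficulty of cleansing this attribute (higher = harder)


def prioritize(attributes):
    """Return attributes ordered for cleansing.

    Assumed rule: sort by criticality descending, then complexity ascending.
    """
    return sorted(attributes, key=lambda a: (-a.criticality, a.complexity))


# Hypothetical attributes and scores, for illustration only.
attrs = [
    Attribute("customer_email", criticality=9.0, complexity=2.0),
    Attribute("legacy_fax", criticality=1.0, complexity=6.0),
    Attribute("tax_id", criticality=9.0, complexity=7.0),
]

ordered = [a.name for a in prioritize(attrs)]
print(ordered)
```

Under this assumed rule, an attribute that is both critical and cheap to cleanse is handled before one that is equally critical but costly, and low-criticality attributes come last.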

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method of cleansing and migrating data comprising: determining a criticality score and a complexity score for identified attributes of an enterprise, wherein the criticality score represents a relevance of an attribute to one or more enterprise dimensions and is based on criticality factors including usage of the attribute across the enterprise and one or more from a group of: relevance of the attribute to a strategic initiative, relevance of the attribute to a key performance indicator, relevance of the attribute to a current or planned initiative, and a data domain of the attribute, and wherein the complexity score represents complexity of cleansing data for an attribute and is based on complexity factors including one or more from a group of: a business impact of the attribute, a data volume, and a type of cleansing strategy; classifying the identified attributes into categories with different priorities based on the criticality and complexity scores to prioritize the identified attributes for data cleansing; selecting a category of the identified attributes for cleansing based on priorities of the categories; cleansing data of the identified attributes of the selected category, wherein attributes of remaining categories are uncleansed; and migrating data of the enterprise including the cleansed and uncleansed attributes from a source system to a target system.

2. The computer-implemented method of claim 1, wherein determining a criticality score and complexity score for identified attributes includes: assigning a weight to each of the criticality factors and the complexity factors for the identified attributes; and aggregating the weights of the criticality factors and complexity factors for each identified attribute to produce the criticality and complexity scores for the identified attributes.

3. The computer-implemented method of claim 1, wherein classifying the identified attributes includes: graphically visualizing the identified attributes with respect to the criticality and complexity scores to prioritize the identified attributes for data cleansing.

4. The computer-implemented method of claim 3, further comprising graphically visualizing attribute density of a region of a graph by clustering attributes into a plurality of groups, and displaying an indicator proportional in size to a number of attributes in each group.

5. The computer-implemented method of claim 1, wherein identified attributes including greater criticality scores and lesser complexity scores have greater priority for data cleansing.

6. A system for cleansing and migrating data comprising: at least one processor configured to: determine a criticality score and a complexity score for identified attributes of an enterprise, wherein the criticality score represents a relevance of an attribute to one or more enterprise dimensions and is based on criticality factors including usage of the attribute across the enterprise and one or more from a group of: relevance of the attribute to a strategic initiative, relevance of the attribute to a key performance indicator, relevance of the attribute to a current or planned initiative, and a data domain of the attribute, and wherein the complexity score represents complexity of cleansing data for an attribute and is based on complexity factors including one or more from a group of: a business impact of the attribute, a data volume, and a type of cleansing strategy; classify the identified attributes into categories with different priorities based on the criticality and complexity scores to prioritize the identified attributes for data cleansing; select a category of the identified attributes for cleansing based on priorities of the categories; cleanse data of the identified attributes of the selected category, wherein attributes of remaining categories are uncleansed; and migrate data of the enterprise including the cleansed and uncleansed attributes from a source system to a target system.

7. The system of claim 6, wherein determining a criticality score and complexity score for identified attributes includes: assigning a weight to each of the criticality factors and the complexity factors for the identified attributes; and aggregating the weights of the criticality factors and complexity factors for each identified attribute to produce the criticality and complexity scores for the identified attributes.

8. The system of claim 6, wherein classifying the identified attributes includes: displaying on a display screen the identified attributes with respect to the criticality and complexity scores to prioritize the identified attributes for data cleansing.

9. The system of claim 8, wherein the at least one processor is further configured to display on a display screen attribute density of a region of a graph by clustering attributes into a plurality of groups, and displaying an indicator proportional in size to a number of attributes in each group.

10. The system of claim 6, wherein identified attributes including greater criticality scores and lesser complexity scores have greater priority for data cleansing.

11. A computer program product for cleansing and migrating data comprising a computer readable storage medium having computer readable program code embodied therewith, the computer readable program code, when executed by a processor, causes the processor to: determine a criticality score and a complexity score for identified attributes of an enterprise, wherein the criticality score represents a relevance of an attribute to one or more enterprise dimensions and is based on criticality factors including usage of the attribute across the enterprise and one or more from a group of: relevance of the attribute to a strategic initiative, relevance of the attribute to a key performance indicator, relevance of the attribute to a current or planned initiative, and a data domain of the attribute, and wherein the complexity score represents complexity of cleansing data for an attribute and is based on complexity factors including one or more from a group of: a business impact of the attribute, a data volume, and a type of cleansing strategy; classify the identified attributes into categories with different priorities based on the criticality and complexity scores to prioritize the identified attributes for data cleansing; select a category of the identified attributes for cleansing based on priorities of the categories; cleanse data of the identified attributes of the selected category, wherein attributes of remaining categories are uncleansed; and migrate data of the enterprise including the cleansed and uncleansed attributes from a source system to a target system.

12. The computer program product of claim 11, wherein determining a criticality score and complexity score for identified attributes includes: assigning a weight to each of the criticality factors and the complexity factors for the identified attributes; and aggregating the weights of the criticality factors and complexity factors for each identified attribute to produce the criticality and complexity scores for the identified attributes.

13. The computer program product of claim 11, wherein classifying the identified attributes includes: displaying on a display screen the identified attributes with respect to the criticality and complexity scores to prioritize the identified attributes for data cleansing; and displaying on a display screen attribute density of a region of a graph by clustering attributes into a plurality of groups, and displaying an indicator proportional in size to a number of attributes in
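
The flow recited in claims 1 and 2 (weight the factors, aggregate them into scores, classify attributes into prioritized categories, cleanse only the selected category, then migrate everything) might look roughly like the sketch below. All concrete values are assumptions made for illustration: the claims name the factors but give no weights, thresholds, category count, or cleansing rule, so the weight tables, the 0 to 1 factor ratings, the four-way split in `classify`, and the trim-and-lowercase `cleanse` function are hypothetical stand-ins.

```python
# Hypothetical factor weights; the claims name the factors but give no values.
CRITICALITY_WEIGHTS = {"usage": 3.0, "kpi_relevance": 2.0, "strategic_initiative": 2.0}
COMPLEXITY_WEIGHTS = {"business_impact": 2.0, "data_volume": 1.0, "cleansing_strategy": 3.0}


def score(ratings, weights):
    """Aggregate one attribute's score: each factor weight scaled by a 0..1 rating."""
    return sum(weights[factor] * rating for factor, rating in ratings.items())


def classify(criticality, complexity, crit_cut=3.0, comp_cut=2.0):
    """Map a score pair to a category; 1 is the highest priority (assumed cutoffs)."""
    high_crit = criticality >= crit_cut
    low_comp = complexity < comp_cut
    if high_crit and low_comp:
        return 1
    if high_crit:
        return 2
    if low_comp:
        return 3
    return 4


def cleanse(value):
    """Stand-in cleansing rule: trim whitespace and lowercase."""
    return value.strip().lower()


def cleanse_and_migrate(records, attr_ratings, selected_category=1):
    """Cleanse attributes in the selected category; copy all others unchanged."""
    categories = {
        attr: classify(score(crit, CRITICALITY_WEIGHTS), score(comp, COMPLEXITY_WEIGHTS))
        for attr, (crit, comp) in attr_ratings.items()
    }
    # The "target system" is modeled as a new list of records.
    target = [
        {k: cleanse(v) if categories.get(k) == selected_category else v
         for k, v in rec.items()}
        for rec in records
    ]
    return categories, target


# Hypothetical ratings: (criticality factors, complexity factors) per attribute.
attr_ratings = {
    "email": ({"usage": 1.0, "kpi_relevance": 1.0}, {"data_volume": 0.5}),
    "notes": ({"usage": 0.2}, {"cleansing_strategy": 1.0, "data_volume": 1.0}),
}
source = [{"email": "  Ann@Example.COM ", "notes": "  Call back  "}]

categories, target = cleanse_and_migrate(source, attr_ratings)
print(categories)
print(target)
```

In this example only `email` falls into the selected category, so it is cleansed, while `notes` is migrated to the target as-is, mirroring the claim language that attributes of the remaining categories stay uncleansed.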

Assignees

IBM

Inventors

Classifications

  • Physics · mapped topic

  • G06F16/215 · Primary

    Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9836488B2 cover?
According to an embodiment of the present invention, a computer-implemented method of cleansing data is provided that comprises determining a criticality score and a complexity score for identified attributes of an enterprise, wherein the criticality score represents a relevance of an attribute to one or more enterprise dimensions and the complexity score represents complexity of cleansing data…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F17/30303. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 5, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).