Slowly changing dimension attributes in extract, transform, load processes

US9311368B2 · US · B2

Patent metadata
Publication number: US-9311368-B2
Application number: US-201213618158-A
Country: US
Kind code: B2
Filing date: Sep 14, 2012
Priority date: Nov 10, 2011
Publication date: Apr 12, 2016
Grant date: Apr 12, 2016

Abstract

A computer-implemented method, computer program product and a system for identifying and handling slowly changing dimension (SCD) attributes for use with an Extract, Transform, Load (ETL) process, comprising importing a data model for dimensional data into a data integration system, where the dimensional data comprises a plurality of attributes, identifying via a data discovery analyzer one or more attributes in the data model as SCD attributes, importing the identified SCD attributes into the data integration system, selecting a data source comprising dimensional data, automatically generating an ETL job for the dimensional data utilizing the imported SCD attributes, and executing the automatically generated ETL to extract the dimensional data from the data source and loading the dimensional data into the imported SCD attributes in a target data system.
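A minimal Python sketch of the identify-then-load flow the abstract describes, under loudly stated assumptions: SCD attributes are recognized by looking up normalized column names in a stored set of names (the comparison claim 1 recites), and the generated ETL job is assumed to apply Type 2 history handling, closing the current row and appending a new version. The stored name list, the helper names identify_scd_attributes and scd2_apply, and the Type 2 behavior are hypothetical illustrations; the patent text shown here does not say how the generated job loads data.

```python
from datetime import date

# Hypothetical stored set of names used for SCD attributes, mapped to roles
# listed in claim 3. The patent does not publish its actual list.
STORED_SCD_NAMES = {
    "surrogate_key": "surrogate key",
    "version": "version",
    "start_date": "start date",
    "effective_date": "start date",
    "end_date": "end date",
    "expiry_date": "end date",
    "is_current": "current",
    "current_flag": "current",
}


def identify_scd_attributes(columns):
    """Return {role: column} for columns whose name is in the stored set."""
    roles = {}
    for column in columns:
        role = STORED_SCD_NAMES.get(column.strip().lower())
        if role is not None:
            roles[role] = column
    return roles


def scd2_apply(dimension, record, natural_key, tracked, roles, load_date):
    """Apply one source record to an in-memory dimension (assumed Type 2 rules).

    If a tracked attribute changed, the current row is closed (end date set,
    current flag cleared) and a new version with a fresh surrogate key is
    appended. Records with no change in tracked attributes are ignored.
    """
    sk, ver = roles["surrogate key"], roles["version"]
    start, end, cur = roles["start date"], roles["end date"], roles["current"]
    current = next(
        (row for row in dimension
         if row[natural_key] == record[natural_key] and row[cur]),
        None,
    )
    if current is not None and all(current[c] == record[c] for c in tracked):
        return
    version = 1
    if current is not None:
        current[end] = load_date
        current[cur] = False
        version = current[ver] + 1
    new_row = dict(record)
    new_row.update({
        sk: max((row[sk] for row in dimension), default=0) + 1,
        ver: version,
        start: load_date,
        end: None,
        cur: True,
    })
    dimension.append(new_row)


# Example: hypothetical column names as they might appear in an imported model.
columns = ["Surrogate_Key", "customer_id", "city",
           "version", "start_date", "end_date", "is_current"]
roles = identify_scd_attributes(columns)

dimension = []
scd2_apply(dimension, {"customer_id": 7, "city": "Pune"},
           "customer_id", ["city"], roles, date(2024, 1, 1))
scd2_apply(dimension, {"customer_id": 7, "city": "Mumbai"},
           "customer_id", ["city"], roles, date(2024, 6, 1))
```

A real generated job would read from the selected data source and write to tables in the target data system; the in-memory list only keeps the sketch self-contained.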

Claims

Claim text (preview, truncated partway through claim 14).

What is claimed is:

1. A computer-implemented method of identifying and handling slowly changing dimension (SCD) attributes for use with an Extract, Transform, Load (ETL) process, comprising: importing a data model for dimensional data into a data integration system, wherein the dimensional data comprises a plurality of attributes for one or more dimensions; analyzing the data model and identifying, via a data discovery analyzer, one or more attributes in the data model as SCD attributes, wherein the identifying includes identifying at least one attribute in the data model as an SCD attribute based on a comparison of a name of the at least one attribute to a stored set of names used for SCD attributes; importing the identified SCD attributes into the data integration system; selecting a data source comprising dimensional data; automatically generating an ETL job for the dimensional data utilizing the imported SCD attributes; and executing the automatically generated ETL job to extract the dimensional data from the data source and loading the dimensional data into the imported SCD attributes in a target data system.

2. The method of claim 1, wherein each SCD attribute is associated with a set of conditions for values of that SCD attribute, and the method further comprising: verifying the identified SCD attributes in the data model against the associated sets of conditions by comparing values for each identified SCD attribute from data present in one or more physical data tables in a data source to at least one of each other and a reference value to determine compliance of that identified SCD attribute with the corresponding set of conditions.

3. The method of claim 1, wherein the one or more SCD attributes are selected from the group consisting of surrogate key, version, start date, end date, current, original, SCD type, and calendar time.

4. The method of claim 1, wherein the identifying one or more attributes in the data model as SCD attributes further comprises: determining a data type of an attribute in the dimensional data; and analyzing the data type to identify an SCD attribute.

5. The method of claim 1, wherein the identifying one or more attributes in the data model as SCD attributes further comprises: determining a similarity score measuring the similarity of an attribute name for an attribute in the plurality of attributes with a stored attribute name; if the similarity score is above a predetermined threshold score, identifying the attribute as an SCD attribute.

6. The method of claim 1, wherein an SCD type for a dimension contains a corresponding set of SCD attributes, and the method further comprising: identifying the SCD type for each dimension in the data model with identified SCD attributes by comparing the identified SCD attributes in the data model for that dimension with the corresponding sets of SCD attributes associated with the SCD types.

7. The method of claim 1, wherein the identifying one or more attributes in the data model as SCD attributes further comprises: comparing a first attribute name for a first attribute in the plurality of attributes with a second attribute name for a second attribute in the plurality of attributes to determine if the first and second attribute names comprise common tokens; and if the first and second attribute names comprise common tokens, identifying the first and second attributes as SCD attributes if one of the first and second attribute names comprises an active attribute modifier and the other of the first and second attribute names comprises a passive attribute modifier.

8. A computer program product for identifying and handling slowly changing dimension (SCD) attributes for use with an Extract, Transform, Load (ETL) process, comprising: a computer readable storage medium having computer readable program code embodied therewith, the computer readable program code comprising computer readable program code configured to: import a data model for dimensional data into a data integration system, wherein the dimensional data comprises a plurality of attributes for one or more dimensions; analyze the data model and identify, via a data discovery analyzer, one or more attributes in the data model as SCD attributes, wherein the identifying includes identifying at least one attribute in the data model as an SCD attribute based on a comparison of a name of the at least one attribute to a stored set of names used for SCD attributes; import the identified SCD attributes into the data integration system; select a data source comprising dimensional data; automatically generate an ETL job for the dimensional data utilizing the imported SCD attributes; and execute the automatically generated ETL job to extract the dimensional data from the data source and load the dimensional data into the imported SCD attributes in a target data system.

9. The computer program product of claim 8, wherein each SCD attribute is associated with a set of conditions for values of that SCD attribute, and the computer readable program code is further configured to: verify the identified SCD attributes in the data model against the associated sets of conditions by comparing values for each identified SCD attribute from data present in one or more physical data tables in a data source to at least one of each other and a reference value to determine compliance of that identified SCD attribute with the corresponding set of conditions.

10. The computer program product of claim 8, wherein the one or more SCD attributes are selected from the group consisting of surrogate key, version, start date, end date, current, original, SCD type, and calendar time.

11. The computer program product of claim 8, wherein the identifying one or more attributes in the data model as SCD attributes comprises the computer readable program code being further configured to: determine a data type of an attribute in the dimensional data; and analyze the data type to identify an SCD attribute.

12. The computer program product of claim 8, wherein the identifying one or more attributes in the data model as SCD attributes comprises the computer readable program code being further configured to: determine a similarity score measuring the similarity of an attribute name for an attribute in the plurality of attributes with a stored attribute name; if the similarity score is above a predetermined threshold score, identify the attribute as an SCD attribute.

13. The computer program product of claim 8, wherein an SCD type for a dimension contains a corresponding set of SCD attributes, and the computer readable program code being further configured to: identify the SCD type for each dimension in the data model with identified SCD attributes by comparing the identified SCD attributes in the data model for that dimension with the corresponding sets of SCD attributes associated with the SCD types.

14. The computer program product of claim 8, wherein the identifying one or more attributes in the data model as SCD attributes comprises the computer readable program code being further configured to: compare a first attribute name for a first attribute in the plurality of attributes with a second attribute name for a second attribute in the plurality of attributes to determine if the first and second attribute names comprise common tokens; and if the first and second attribute names comprise common tokens, identify the first and second attributes as SCD attributes if one of the first and second attribute names comprises an active attribute modifier and the other of the first and second attribute names comprises a pass…
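Claims 5 and 6 refine the identification step: an attribute qualifies when the similarity score between its name and a stored name exceeds a predetermined threshold, and a dimension's SCD type is inferred by comparing its identified attributes with per-type attribute sets. The sketch below is hypothetical throughout. The claims name no similarity measure, so it substitutes the ratio from Python's difflib.SequenceMatcher; the stored names, the 0.8 threshold, and the type signatures are illustrative assumptions, and a tie between types simply resolves to the first one listed.

```python
from difflib import SequenceMatcher

# Hypothetical stored attribute names and the SCD role each stands for.
STORED_NAMES = {
    "surrogate_key": "surrogate key",
    "version": "version",
    "start_date": "start date",
    "end_date": "end date",
    "current_flag": "current",
    "original_value": "original",
}

# Hypothetical attribute sets that characterize each SCD type (claim 6 style).
SCD_TYPE_SIGNATURES = {
    "Type 2": {"surrogate key", "start date", "end date", "current"},
    "Type 3": {"original"},
}

# Assumed value; the claims only require "a predetermined threshold score".
THRESHOLD = 0.8


def normalize(name):
    """Lower-case and unify separators before comparing names."""
    return name.strip().lower().replace("-", "_").replace(" ", "_")


def best_match(name):
    """Return (role, score) for the stored name most similar to `name`."""
    candidate = normalize(name)
    return max(
        ((role, SequenceMatcher(None, candidate, stored).ratio())
         for stored, role in STORED_NAMES.items()),
        key=lambda pair: pair[1],
    )


def identify_by_similarity(names, threshold=THRESHOLD):
    """Claim-5 style check: keep names whose best score exceeds the threshold."""
    roles = {}
    for name in names:
        role, score = best_match(name)
        if score > threshold:
            roles[name] = role
    return roles


def infer_scd_type(roles):
    """Claim-6 style check: pick the type whose attribute set is best covered."""
    found = set(roles.values())
    coverage = {
        scd_type: len(signature & found) / len(signature)
        for scd_type, signature in SCD_TYPE_SIGNATURES.items()
    }
    best_type, best_coverage = max(coverage.items(), key=lambda kv: kv[1])
    # Assumption: with no history-tracking attributes, treat it as Type 1.
    return best_type if best_coverage > 0 else "Type 1"


roles = identify_by_similarity(
    ["SurrogateKey", "customer_name", "StartDate", "end-date", "current_flag", "city"]
)
scd_type = infer_scd_type(roles)
```

Here the four history-related names clear the threshold while customer_name and city do not, so the dimension is classified against the hypothetical Type 2 signature.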

Classifications

  • G06F16/254 (primary)

    Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses · CPC title

  • Physics · mapped topic

Frequently asked questions

Who are the inventors on this patent?
Bhide Manish A, Mittapalli Srinivas Kiran, Padmanabhan Sriram, and 1 more
What technology area does this patent fall under?
Primary CPC classification G06F16/254. Mapped technology areas include Physics.
When was this patent published?
It was published on Apr 12, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.