US9870418B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9870418-B2 |
| Application number | US-98304410-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 31, 2010 |
| Priority date | Dec 31, 2010 |
| Publication date | Jan 16, 2018 |
| Grant date | Jan 16, 2018 |
Official abstract text for this publication.
In an embodiment of the invention, a method for data profiling incorporating an enterprise service bus (ESB) coupling the target and source systems following an extraction, transformation, and loading (ETL) process for a target system and a source system is provided. The method includes receiving baseline data profiling results obtained during ETL from a source application to a target application, caching the updates, determining current data profiling results within the ESB for cached updates, and triggering an action if a threshold disparity is detected upon the current data profiling results and the baseline data profiling results.
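The comparison step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the profile metrics (null ratio and distinct ratio), the `ColumnProfile` type, and the names `profile_column`, `disparity`, and `check_cached_updates` are all assumptions chosen for illustration, since the abstract does not say which statistics make up a data profile or how disparity is measured.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class ColumnProfile:
    """Hypothetical profile of one column (metrics are assumed, not from the patent)."""
    null_ratio: float
    distinct_ratio: float


def profile_column(values: Sequence[object]) -> ColumnProfile:
    """Compute a simple profile over a batch of column values."""
    if not values:
        return ColumnProfile(0.0, 0.0)
    total = len(values)
    non_null = [v for v in values if v is not None]
    return ColumnProfile(
        null_ratio=(total - len(non_null)) / total,
        distinct_ratio=len(set(non_null)) / total,
    )


def disparity(baseline: ColumnProfile, current: ColumnProfile) -> float:
    """Largest absolute difference across the profiled metrics (assumed measure)."""
    return max(
        abs(baseline.null_ratio - current.null_ratio),
        abs(baseline.distinct_ratio - current.distinct_ratio),
    )


def check_cached_updates(
    baseline: ColumnProfile,
    cached_updates: Sequence[object],
    threshold: float,
    on_disparity: Callable[[float], None],
) -> bool:
    """Profile the cached updates and call on_disparity if the gap exceeds threshold."""
    gap = disparity(baseline, profile_column(cached_updates))
    if gap > threshold:
        on_disparity(gap)
        return True
    return False
```

In this sketch the baseline would be computed once during ETL, for example `baseline = profile_column(staged_values)`, and `check_cached_updates` would run each time the cache for a source table is evaluated. The callback stands in for whatever action the bus triggers.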
Opening claim text (preview).
We claim:

1. A computer program product for data profiling, the computer program product comprising: a computer readable storage medium having computer readable program code embodied therewith, the computer readable program code comprising: computer readable program code for executing an Extract Transfer Load (ETL) process in which data is first moved from a source database of a source application to a target database of a target application during which movement the data is extracted from the source database into a persistency comprising a staging area, an alignment area and a preload area, then transformed to a model that is common to both the source and target databases, and finally cleansed and loaded into the target database so as to initially populate a data warehouse; computer readable program code for performing baseline data profiling on the extracted data in the persistency during ETL in order to produce baseline data profiling results; and, computer readable program code for, subsequent to the ETL, receiving in an enterprise service bus (ESB) updates to the source database and placing the updates in a cache memory on the ESB, determining whether multi-record profiling or only single record profiling has been selected for profiling cached updates, and on condition that multi-record profiling is selected, performing data profiling on the updates in the cache on the ESB and determining current data profiling results for the cached updates and, comparing the current data profiling results for the cached updates to the baseline data profiling results, but otherwise performing single record profiling on the updates without comparing the current data profiling results for the cached updates to the baseline data profiling results, and triggering an action if a threshold disparity is detected based upon the current data profiling results.

2. The computer program product of claim 1, wherein the action is a data governance action.

3. The computer program product of claim 2, wherein the data governance action is integrated with at least one data governance application.

4. The computer program product of claim 2, wherein the action is notifying a data steward.

5. An in-memory cache profiler system comprising: a computer with at least one processor and memory; at least one source application executing on the computer; an ESB coupled to the at least one source application and a target application; at least one connector coupled to the source application and the ESB and at least one connector coupled to the ESB and the target application; at least one cache in the ESB corresponding to each source application; a persistency coupled to the ESB; and, an in-memory cache profiler module coupled to the ESB, the module comprising program code enabled, subsequent to (ETL) processing of the at least one source application in which data is first moved from a source database of a source application to a target database of a target application during which movement the data is extracted from the source database into a persistency comprising a staging area, an alignment area and a preload area, then transformed to a model that is common to both the source and target databases, and finally cleansed and loaded into the target database so as to initially populate a data warehouse: to receive baseline data profiling results obtained during ETL, to receive in the ESB updates to the source database of the source application and to place the updates in the cache, to determine whether multi-record profiling or only single record profiling has been selected for profiling cached updates, and on condition that multi-record profiling is selected, to perform data profiling on the updates in the cache on the ESB and determine current data profiling results for the cached updates and, compare the current data profiling results for the cached updates to the baseline data profiling results, but otherwise to perform single record profiling on the updates without comparing the current data profiling results for the cached updates to the baseline data profiling results, and to trigger an action if a threshold disparity is detected based upon the current data profiling results.

6. The system of claim 5, further comprising: at least one data governance application which allows a data steward to review current profiling results with the baseline profile result with the ability, if required, to update the baseline profile results with the current profiling results forming a new baseline profile.

7. The system of claim 5, wherein the persistency is one persistency selected from the group consisting of a database and a flat file for the baseline data profiling results.

8. The system of claim 5, wherein the size of at least one cache is determined by defining the total size of the cache and splitting the total assigned memory cache autonomically into chunks for a corresponding table.

9. The system of claim 5, wherein the program code of the in-memory cache profiler module is further enabled to trigger data governance actions.

10. The system of claim 9, wherein the program code of the in-memory cache profiler enabled to trigger the data governance actions is further enabled to trigger actions integrated with at least one data governance application.

11. The system of claim 9, wherein the program code of the in-memory cache profiler enabled to trigger the data governance actions is further enabled to notify a data steward.
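Two mechanisms in the claims lend themselves to a short illustration: the branch between multi-record and single record profiling (claims 1 and 5) and the division of a total cache size into per-table chunks (claim 8). The sketch below is a reading aid, not the claimed implementation. The names `ProfilingMode`, `split_cache`, and `should_trigger` are hypothetical, the equal split across tables is an assumption (the claim says only that the split is autonomic), and the single record rule used here (a required field must not be null) is likewise an assumed example.

```python
from enum import Enum


class ProfilingMode(Enum):
    SINGLE_RECORD = "single"
    MULTI_RECORD = "multi"


def split_cache(total_bytes: int, tables: list[str]) -> dict[str, int]:
    """Divide a total cache budget into one chunk per table.

    Assumption: an equal split, with leftover bytes given to the first tables.
    """
    if not tables:
        return {}
    base, extra = divmod(total_bytes, len(tables))
    return {t: base + (1 if i < extra else 0) for i, t in enumerate(tables)}


def should_trigger(
    mode: ProfilingMode,
    updates: list[dict],
    field: str,
    baseline_null_ratio: float,
    threshold: float,
) -> bool:
    """Decide whether cached updates warrant an action."""
    if not updates:
        return False
    if mode is ProfilingMode.MULTI_RECORD:
        # Multi-record: profile the whole batch and compare it to the baseline.
        current = sum(r.get(field) is None for r in updates) / len(updates)
        return abs(current - baseline_null_ratio) > threshold
    # Single record: check each record on its own, with no baseline comparison.
    return any(r.get(field) is None for r in updates)
```

For example, `split_cache(10, ["orders", "customers", "items"])` assigns 4, 3, and 3 units to the three tables. A weighted split by table size or update rate would fit the claim language equally well.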
Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP · CPC title
Physics · mapped topic