US9542469B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9542469-B2 |
| Application number | US-86836310-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 25, 2010 |
| Priority date | Aug 25, 2010 |
| Publication date | Jan 10, 2017 |
| Grant date | Jan 10, 2017 |
Official abstract text for this publication.
In the context of data administration in enterprises, an effective manner of providing a central data warehouse, particularly via employing a tool that helps by analyzing existing data and reports from different business units. In accordance with at least one embodiment of the invention, such a tool analyzes the data model of an enterprise and proposes alternatives for building a new data warehouse. The tool, in accordance with at least one embodiment of the invention, models the problem of identifying fact/dimension attributes of a warehouse model as a graph cut on a Dependency Analysis Graph (DAG). The DAG is built using existing data models and the report generation scripts. The tool also uses the DAG for generation of ETL (Extract, Transform, Load) scripts that can be used to populate the newly proposed data warehouse from data present in the existing schemas.
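As a rough illustration of the dependency-analysis idea in the abstract, the sketch below represents one report as a small graph whose edges point from each report column to the attributes it is computed from, then walks the graph back to base-table attributes. The graph layout, the table and column names (`sales.quantity`, `store.region`, and so on), and the helper `base_attributes` are all hypothetical. The patent's actual DAG construction and its graph-cut formulation are not reproduced here.

```python
from collections import deque

# Hypothetical dependency graph: each node maps to the nodes it is derived from.
# Nodes with no entry are treated as base-table attributes (leaves).
DEPENDS_ON = {
    "report.total_revenue": ["tmp.line_total"],
    "tmp.line_total": ["sales.quantity", "sales.unit_price"],
    "report.region": ["store.region"],
}


def base_attributes(node, graph):
    """Return the set of leaf (base-table) attributes reachable from `node`."""
    seen, leaves = set(), set()
    queue = deque([node])
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        sources = graph.get(current)
        if not sources:
            # No recorded sources: assume this is a base-table attribute.
            leaves.add(current)
        else:
            queue.extend(sources)
    return leaves


if __name__ == "__main__":
    for column in ("report.total_revenue", "report.region"):
        print(column, "->", sorted(base_attributes(column, DEPENDS_ON)))
```

In this toy graph, `report.region` depends on a base attribute directly, while `report.total_revenue` reaches its base attributes only through an intermediate node. That distinction loosely mirrors the direct and indirect projection attributes mentioned in the claims, under the assumptions stated above.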
Opening claim text (preview).
What is claimed is:
1. A method comprising utilizing at least one processor to execute computer code that performs the steps of: analyzing base table scripts which generate a plurality of reports from preexisting base tables; and developing a schema for a new data warehouse with merged data from the preexisting base tables and the schema configured for singly generating reports relating to the merged data, said developing comprising: forming a fact table in the new data warehouse, wherein the forming a fact table comprises identifying a set of fact attributes, from the plurality of reports generated, wherein the set of fact attributes comprises attributes on which an aggregate operation is defined and attributes referenced in at least one of the plurality of reports generated; forming dimensions in the new data warehouse, wherein the forming dimensions comprises forming a candidate table set by identifying a set of preexisting base tables which include at least one candidate attribute, wherein the at least one candidate attribute comprises an attribute that is contained within a group-by clause within the plurality of reports; and generating warehouse scripts for populating the formed fact table and the formed dimensions for the new data warehouse with data from the preexisting base tables.
2. The method according to claim 1, further comprising ensuring adherence to predetermined design criteria.
3. The method according to claim 1, further comprising merging like attributes from different base tables.
4. The method according to claim 1, wherein said generating comprises referring to base tables.
5. The method according to claim 1, wherein the forming a fact table comprises scanning a report-generating query to identify attributes on which the aggregate operation is defined.
6. The method according to claim 5, wherein said scanning comprises identifying a direct projection attribute and an indirect projection attribute.
7. The method according to claim 5, wherein said scanning comprises employing a dependency analysis graph which represents the report-generating query.
8. The method according to claim 1, wherein the forming a fact table comprises ascertaining a need for multiple fact tables in the new data warehouse.
9. The method according to claim 1, wherein the forming dimensions comprises identifying a set of all attributes which are used in a group-by clause of a report-generating query.
10. The method according to claim 1, wherein the forming dimensions comprises determining candidate dimension tables and ascertaining a potential hierarchy among the candidate dimension tables.
11. The method according to claim 1, wherein said generating comprises determining a granularity for the fact table in the new data warehouse.
12. An apparatus comprising: one or more hardware processors; and a computer readable storage medium having computer readable program code embodied therewith and executable by the one or more hardware processors, the computer readable program code comprising: computer readable program code configured to analyze scripts which generate a plurality of reports from preexisting base tables; and computer readable program code configured to develop a schema for a new data warehouse with merged data from the preexisting base tables and the schema configured for singly generating reports relating to the merged data, via: forming a fact table in the new data warehouse, wherein the forming a fact table comprises identifying a set of fact attributes, from the plurality of reports generated, wherein the set of fact attributes comprises attributes on which an aggregate operation is defined and attributes referenced in at least one of the plurality of reports generated; forming dimensions in the new data warehouse, wherein the forming dimensions comprises forming a candidate table set by identifying a set of preexisting base tables which include at least one candidate attribute, wherein the at least one candidate attribute comprises an attribute that is contained within a group-by clause within the plurality of reports; and generating scripts for populating the formed fact table and the formed dimensions for the new data warehouse with data from the preexisting base tables.
13. A computer program product comprising: a non-transitory computer readable storage medium having computer readable program code embodied therewith, the computer readable program code comprising: computer readable program code configured to analyze scripts which generate a plurality of reports from preexisting base tables; and computer readable program code configured to develop a schema for a new data warehouse with merged data from the preexisting base tables and the schema configured for singly generating reports relating to the merged data, via: forming a fact table in the new data warehouse, wherein the forming a fact table comprises identifying a set of fact attributes, from the plurality of reports generated, wherein the set of fact attributes comprises attributes on which an aggregate operation is defined and attributes referenced in at least one of the plurality of reports generated; forming dimensions in the new data warehouse, wherein the forming dimensions comprises forming a candidate table set by identifying a set of preexisting base tables which include at least one candidate attribute, wherein the at least one candidate attribute comprises an attribute that is contained within a group-by clause within the plurality of reports; and generating scripts for populating the formed fact table and the formed dimensions for the new data warehouse with data from the preexisting base tables.
14. The computer program product according to claim 13, wherein said computer readable program code is further configured to ensure adherence to predetermined design criteria.
15. The computer program product according to claim 13, wherein said computer readable program code is further configured to merge like attributes from different base tables.
16. The computer program product according to claim 13, wherein said computer readable program code is configured to refer to base tables in generating scripts for populating the new data warehouse.
17. The computer program product according to claim 13, wherein said computer readable program code is configured to scan a report-generating query to identify attributes on which the aggregate operation is defined.
18. The computer program product according to claim 17, wherein said computer readable program code is configured to identify a direct projection attribute and an indirect projection attribute.
19. The computer program product according to claim 17, wherein said computer readable program code is configured to employ a dependency analysis graph which represents the report-generating query.
20. The computer program product according to claim 13, wherein said computer readable program code is configured to ascertain a need for multiple fact tables in the new data warehouse.
21. The computer program product according to claim 13, wherein said computer readable program code is configured to identify a set of all attributes which are used in a group-by clause of a report-generating query.
22. The computer program product according to claim 13, wherein said computer readable program code is configured to determine candidate dimension tables and ascertain a potential hierarchy among the candidate dimension tables.
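The attribute-identification steps recited in the independent claims can be pictured with a minimal sketch: scan report-generating queries for attributes under an aggregate operation (fact candidates) and for attributes in a group-by clause (dimension candidates), then collect the base tables that contain a group-by attribute. Everything here is an assumption made for illustration: the regular expressions are a crude stand-in for real SQL parsing, and the sample queries, the `BASE_TABLES` layout, and the function names are hypothetical rather than taken from the patent. The sketch also covers only the aggregate part of the fact-attribute definition, not the "attributes referenced in at least one report" part.

```python
import re

# Simplified patterns; a real tool would parse SQL rather than use regexes.
AGGREGATE = re.compile(
    r"\b(?:SUM|AVG|MIN|MAX|COUNT)\s*\(\s*([\w.]+)\s*\)", re.IGNORECASE
)
GROUP_BY = re.compile(
    r"\bGROUP\s+BY\s+(.+?)(?:\bHAVING\b|\bORDER\s+BY\b|;|$)",
    re.IGNORECASE | re.DOTALL,
)

# Hypothetical base tables (table name -> column names) and report queries.
BASE_TABLES = {
    "sales": {"store_id", "product_id", "amount"},
    "store": {"id", "region"},
    "product": {"id", "category"},
}
REPORTS = [
    "SELECT store.region, SUM(sales.amount) FROM sales "
    "JOIN store ON sales.store_id = store.id GROUP BY store.region",
    "SELECT product.category, AVG(sales.amount) FROM sales "
    "JOIN product ON sales.product_id = product.id "
    "GROUP BY product.category ORDER BY product.category",
]


def fact_attributes(queries):
    """Attributes that appear inside an aggregate function call."""
    found = set()
    for query in queries:
        found.update(AGGREGATE.findall(query))
    return found


def group_by_attributes(queries):
    """Attributes that appear in a GROUP BY clause."""
    found = set()
    for query in queries:
        for clause in GROUP_BY.findall(query):
            found.update(p.strip() for p in clause.split(",") if p.strip())
    return found


def candidate_tables(base_tables, attributes):
    """Base tables that contain at least one of the 'table.column' attributes."""
    result = set()
    for attribute in attributes:
        table, _, column = attribute.partition(".")
        if column in base_tables.get(table, set()):
            result.add(table)
    return result


if __name__ == "__main__":
    print("fact candidates:", sorted(fact_attributes(REPORTS)))
    dims = group_by_attributes(REPORTS)
    print("group-by attributes:", sorted(dims))
    print("candidate dimension tables:", sorted(candidate_tables(BASE_TABLES, dims)))
```

The sketch stops at candidate identification. Steps such as merging like attributes, detecting hierarchies among dimension tables, choosing fact-table granularity, and generating the population scripts are left out because the claims name them without giving enough detail to illustrate them faithfully.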
Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP · CPC title
Physics · mapped topic