
1 Rule-based Management of Schema Changes at ETL sources
G. Papastefanatos (1), P. Vassiliadis (2), A. Simitsis (3), T. Sellis (1,4), Y. Vassiliou (1)
(1) National Technical University of Athens, Athens, Hellas (Greece) {gpapas, yv}@dblab.ece.ntua.gr
(2) University of Ioannina, Ioannina, Hellas (Greece) pvassil@cs.uoi.gr
(3) HP Labs, Palo Alto, California, USA alkis@hp.com
(4) Institute for the Management of Information Systems (Greece) timos@imis.athena-innovation.gr

2 Outline (MEDWa '09, Riga, September 2009)
– Motivation
– Graph-based representation of ETL processes
– Regulating ETL Evolution
– Hecataeus Internals
– Conclusions


4 Data Warehouse Environment

5 Data Warehouse Schema Evolution
Data warehouses are evolving environments, e.g.:
– A dimension is removed or renamed
– The structure of a dimension table is updated
– A fact table is completely decoupled from a dimension
– The measures of a fact table change
– An ETL source is modified, etc.

6 Evolving ETL sources…
Schema changes on the sources of ETL processes. Design constructs are:
– Added, Removed, Modified
ETL processes are affected:
– Syntactically, i.e., they become invalid
– Semantically, i.e., they must conform to the new source database semantics
Adaptation of ETL flows is:
– a time-consuming task,
– in most cases handled manually by the administrators/developers

7 We would like to know...
– Which part of the process is affected, and how, if, e.g., an attribute is deleted?
– Can we predict and handle the impact of changes?
– To what extent can readjustment be automated?

8 Hecataeus Framework
– Mechanism for performing what-if analysis for potential changes of ETL sources
– Graph-based representation of ETL workflows
– Annotation of the graph with rules for adapting ETL processes to source schema evolution
– Evolution events are mapped to changes on the graph constructs
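The what-if analysis listed above can be pictured as a reachability question on a dependency graph: given a changed source construct, which downstream constructs depend on it? The following is a minimal sketch under stated assumptions, not the Hecataeus implementation. The edge list, the node names (including the hypothetical `DW.load` activity) and the function `affected_nodes` are all illustrative.

```python
from collections import defaultdict, deque


def affected_nodes(edges, changed):
    """Return every node that depends, directly or transitively, on `changed`.

    `edges` is a list of (consumer, provider) pairs: the consumer reads from
    the provider. This encoding is an assumption made for the sketch.
    """
    dependents = defaultdict(set)
    for consumer, provider in edges:
        dependents[provider].add(consumer)

    seen = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for dep in dependents[node]:
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


# Hypothetical workflow: query Q reads two source attributes and feeds a load step.
EDGES = [
    ("Q.Emp#", "EMP.Emp#"),
    ("Q.T_Hours", "WORKS.Hours"),
    ("Q", "Q.Emp#"),
    ("Q", "Q.T_Hours"),
    ("DW.load", "Q"),
]

impacted = affected_nodes(EDGES, "WORKS.Hours")
```

Here `impacted` contains the output attribute computed from `WORKS.Hours`, the query that exposes it, and the load step that consumes the query, which is the kind of answer a what-if analysis is expected to give before the change is actually applied.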


10 ETL Workflow representation

11 Query representation
Q: SELECT EMP.Emp#, Sum(WORKS.Hours) as T_Hours
   FROM EMP, WORKS
   WHERE EMP.Emp# = WORKS.Emp#
   GROUP BY EMP.Emp#
[Figure: graph of query Q, including Join and GB (group-by) nodes]
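To make the graph view of Q concrete, here is a small sketch that encodes the query as typed nodes and labeled edges. The node kinds and edge labels (`schema`, `map-select`, `from`, `where`, `operand`, `group-by`) are assumptions chosen for illustration; the slide's figure is not available, so this should not be read as the exact Hecataeus graph model. The `Sum` aggregate is folded into the `map-select` edge for brevity.

```python
def build_query_graph():
    """Build an illustrative graph for query Q over relations EMP and WORKS."""
    nodes = {}   # node name -> kind
    edges = []   # (source, label, target)

    def node(name, kind):
        nodes[name] = kind

    def edge(src, label, dst):
        edges.append((src, label, dst))

    # Source relations with the attributes that Q touches.
    for rel, attrs in {"EMP": ["Emp#"], "WORKS": ["Emp#", "Hours"]}.items():
        node(rel, "relation")
        for attr in attrs:
            node(f"{rel}.{attr}", "attribute")
            edge(rel, "schema", f"{rel}.{attr}")

    # The query, its output attributes, and what each output maps to.
    node("Q", "query")
    for out, src in {"Q.Emp#": "EMP.Emp#", "Q.T_Hours": "WORKS.Hours"}.items():
        node(out, "attribute")
        edge("Q", "schema", out)
        edge(out, "map-select", src)
    for rel in ("EMP", "WORKS"):
        edge("Q", "from", rel)

    # Join condition: EMP.Emp# = WORKS.Emp#
    node("Q.join", "condition")
    edge("Q", "where", "Q.join")
    edge("Q.join", "operand", "EMP.Emp#")
    edge("Q.join", "operand", "WORKS.Emp#")

    # GROUP BY EMP.Emp#
    node("Q.GB", "group-by")
    edge("Q", "group-by", "Q.GB")
    edge("Q.GB", "operand", "EMP.Emp#")

    return nodes, edges


nodes, edges = build_query_graph()
```

Representing every clause as its own node means that a change to a single source attribute, such as `EMP.Emp#`, can be traced to each place in the query that uses it: the select list, the join condition and the grouping.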


13 Graph Annotation with rules
According to the prevailing policy, the proper action is taken, which results in the evolution of the graph.
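One way to read "prevailing policy" is as a lookup with precedence: a rule attached to a specific node wins over a graph-wide default for the same event. The sketch below assumes exactly that precedence and assumes three policy names (`PROPAGATE`, `BLOCK`, `PROMPT`); neither the names nor the precedence order is stated on the slide, so treat both as illustrative.

```python
from enum import Enum


class Policy(Enum):
    PROPAGATE = "propagate"  # adapt the node to the change
    BLOCK = "block"          # keep the node as it is
    PROMPT = "prompt"        # ask the administrator


def prevailing_policy(node_rules, default_rules, node, event):
    """Pick the policy for `event` arriving at `node`.

    Assumed precedence: node-specific rule, then graph-wide default,
    then PROMPT when nothing has been declared.
    """
    if (node, event) in node_rules:
        return node_rules[(node, event)]
    return default_rules.get(event, Policy.PROMPT)


# Hypothetical annotations.
node_rules = {("Q", "add_attribute"): Policy.PROPAGATE}
default_rules = {"add_attribute": Policy.BLOCK, "delete_attribute": Policy.BLOCK}

choice_for_q = prevailing_policy(node_rules, default_rules, "Q", "add_attribute")
choice_for_other = prevailing_policy(node_rules, default_rules, "Q2", "add_attribute")
choice_undeclared = prevailing_policy(node_rules, default_rules, "Q", "rename_attribute")
```

Falling back to `PROMPT` when no rule exists is a conservative design choice for the sketch: an unanticipated event is escalated to a person rather than silently applied or ignored.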

14 Example
Event: Add attribute Phone to relation EMP
Q (before): SELECT EMP.Emp#, EMP.Name FROM EMP
Q (after): SELECT EMP.Emp#, EMP.Name, Phone FROM EMP
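The example can be replayed in a few lines of code. This sketch assumes that the query is annotated with a hypothetical "propagate" policy for attribute additions, and it stores a query as a plain dictionary; both choices are illustrative and not taken from the tool. It also emits the new column in qualified form (`EMP.Phone`), whereas the slide writes it unqualified.

```python
def add_attribute(schema, queries, policies, relation, attribute):
    """Apply an 'add attribute' event and adapt the queries that opt in.

    `policies` maps a query name to a policy string; only queries marked
    "propagate" are rewritten (an assumed convention for this sketch).
    """
    schema[relation].append(attribute)
    for name, query in queries.items():
        if query["from"] == relation and policies.get(name) == "propagate":
            query["select"].append(f"{relation}.{attribute}")


def to_sql(query):
    """Render the dictionary form of a query back to SQL text."""
    return f"SELECT {', '.join(query['select'])} FROM {query['from']}"


schema = {"EMP": ["Emp#", "Name"]}
queries = {"Q": {"select": ["EMP.Emp#", "EMP.Name"], "from": "EMP"}}
policies = {"Q": "propagate"}

before = to_sql(queries["Q"])
add_attribute(schema, queries, policies, "EMP", "Phone")
after = to_sql(queries["Q"])
# before: SELECT EMP.Emp#, EMP.Name FROM EMP
# after:  SELECT EMP.Emp#, EMP.Name, EMP.Phone FROM EMP
```

With a "block" policy on Q instead, the schema would still gain the `Phone` attribute but the query text would stay unchanged.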


16 System architecture
[Figure: system architecture diagram. Labels: DDL files, SQL scripts, DB Catalog, Parser, Create DB Schema, Evolution Manager, Workload representation, Evolution Semantics, Validate Workload, Graph Viewer, DB Schema representation, XML, jpeg, Import/Export, Scenarios, Graph Visualization, Metric Manager]

17 Evolution Manager Architecture


19 Research in DB Evolution
DB Schema Evolution
– OODB evolution
– Schema versioning
DW Schema Evolution
– Taxonomy of evolution events
– Versioning
– Materialized Views Evolution
– View adaptation & synchronization
Evolution wrt Model Mappings

20 Summarizing
– The problem of adaptation of ETL workflows to evolvable data sources
– Graph-based representation of ETL activities
– Graph enrichment with semantics for evolution events
– Graph annotation with rules for handling a priori evolution events
– Hecataeus: Framework for performing and evaluating evolution scenarios in DW environments

21 Thank you...
Hecataeus: A tool for visualizing and performing what-if analysis for evolution scenarios
http://www.cs.uoi.gr/~pvassil/projects/hecataeus/

