Rule-based Management of Schema Changes at ETL sources
G. Papastefanatos (1), P. Vassiliadis (2), A. Simitsis (3), T. Sellis (1,4), Y. Vassiliou (1)
(1) National Technical University of Athens, Athens, Hellas (Greece) {gpapas,
(2) University of Ioannina, Ioannina, Hellas (Greece)
(3) HP Labs, Palo Alto, California, USA
(4) Institute for the Management of Information Systems (Greece)
MEDWa ‘09, Riga, September 2009
Outline
Motivation
Graph-based representation of ETL processes
Regulating ETL Evolution
Hecataeus Internals
Conclusions
Data Warehouse Environment
Data Warehouse Schema Evolution
Data warehouses are evolving environments, e.g.:
– A dimension is removed or renamed
– The structure of a dimension table is updated
– A fact table is completely decoupled from a dimension
– The measures of a fact table change
– An ETL source is modified, etc.
Evolving ETL sources…
Schema changes on the sources of ETL processes. Design constructs are
– Added, Removed, Modified
ETL processes are affected:
– Syntactically, i.e., they become invalid
– Semantically, i.e., they must conform to the new source database semantics
Adaptation of ETL flows is
– a time-consuming task,
– handled manually by the administrators/developers in most cases
We would like to know...
What part of the process is affected, and how, if, e.g., an attribute is deleted?
Can we predict and handle the impact of changes?
To what extent can readjustment be automated?
Hecataeus Framework
Mechanism for performing what-if analysis for potential changes of ETL sources
Graph-based representation of ETL workflows
Annotation of the graph with rules for adapting ETL processes to source schema evolution
Evolution events are mapped to changes on the graph constructs
ETL Workflow representation
Query representation
Q: SELECT EMP.Emp#, Sum(WORKS.Hours) as T_Hours
   FROM EMP, WORKS
   WHERE EMP.Emp# = WORKS.Emp#
   GROUP BY EMP.Emp#
[Figure: graph of query Q; surviving labels: Join, GB]
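The graph view of query Q can be approximated in a few lines of Python. This is a minimal sketch and not the Hecataeus data model: the node names, the "dependent relies on provider" edge direction, and the helper `affected_by` are all assumptions made for illustration. It shows how a change to a source attribute could be traced to the query constructs that depend on it.

```python
from collections import defaultdict, deque

# Each key depends on the nodes in its list (dependent -> providers).
# Node names are illustrative; the slides do not define a naming scheme.
DEPENDS_ON = {
    "Q": ["Q.Emp#", "Q.T_Hours", "Q.Join", "Q.GB"],
    "Q.Emp#": ["EMP.Emp#"],                 # SELECT EMP.Emp#
    "Q.T_Hours": ["WORKS.Hours"],           # Sum(WORKS.Hours) as T_Hours
    "Q.Join": ["EMP.Emp#", "WORKS.Emp#"],   # WHERE EMP.Emp# = WORKS.Emp#
    "Q.GB": ["EMP.Emp#"],                   # GROUP BY EMP.Emp#
    "EMP.Emp#": ["EMP"],
    "WORKS.Emp#": ["WORKS"],
    "WORKS.Hours": ["WORKS"],
}


def affected_by(changed, depends_on=DEPENDS_ON):
    """Return every node that transitively depends on `changed`."""
    # Invert the edges so we can walk from a provider to its dependents.
    dependents = defaultdict(set)
    for node, providers in depends_on.items():
        for provider in providers:
            dependents[provider].add(node)

    # Breadth-first traversal over the inverted edges.
    seen, queue = set(), deque([changed])
    while queue:
        current = queue.popleft()
        for dep in dependents[current]:
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


# What-if: which parts of Q are touched if WORKS.Hours is deleted?
print(sorted(affected_by("WORKS.Hours")))
```

Under these assumptions, deleting `WORKS.Hours` reaches only the aggregate output `Q.T_Hours` and the query `Q` itself, whereas a change to `EMP.Emp#` also reaches the join and group-by nodes.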
Graph Annotation with rules
According to the prevailing policy, the proper action is taken when the graph evolves
Example
Q (original): SELECT EMP.Emp#, EMP.Name FROM EMP
Event: Add attribute Phone to relation EMP
Q (rewritten): SELECT EMP.Emp#, EMP.Name, Phone FROM EMP
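The example can be replayed with a small rule-annotated query node. This is a sketch under stated assumptions rather than the tool's implementation: the policy names `PROPAGATE` and `BLOCK`, the event key `"ADD_ATTRIBUTE"`, the class `QueryNode`, and the function `on_attribute_added` are hypothetical, as is the choice to leave a query unchanged when no rule is annotated. The slides only state that the graph is annotated with rules and that the prevailing policy determines the action.

```python
from dataclasses import dataclass, field
from enum import Enum


class Policy(Enum):
    # Hypothetical policy names; the slides do not list the available policies.
    PROPAGATE = "propagate"  # adapt the query to the new source schema
    BLOCK = "block"          # keep the query exactly as it was


@dataclass
class QueryNode:
    relation: str
    attributes: list
    # Rule annotation: event type -> policy chosen in advance by the designer.
    rules: dict = field(default_factory=dict)

    def sql(self):
        return f"SELECT {', '.join(self.attributes)} FROM {self.relation}"


def on_attribute_added(query, relation, attribute):
    """Apply the rule annotated on `query` for an 'add attribute' event."""
    if query.relation != relation:
        return query  # the event does not concern this query
    # Assumption: with no annotated rule, the query is left untouched.
    policy = query.rules.get("ADD_ATTRIBUTE", Policy.BLOCK)
    if policy is Policy.PROPAGATE:
        return QueryNode(relation, query.attributes + [attribute], dict(query.rules))
    return query


q = QueryNode("EMP", ["EMP.Emp#", "EMP.Name"], {"ADD_ATTRIBUTE": Policy.PROPAGATE})
q_after = on_attribute_added(q, "EMP", "Phone")
print(q_after.sql())  # SELECT EMP.Emp#, EMP.Name, Phone FROM EMP
```

Returning a new node instead of mutating the old one keeps both versions of the query available, which suits what-if analysis: the original workflow stays intact while the adapted one is inspected.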
System architecture
[Figure: architecture diagram. Surviving labels: DDL files; SQL scripts; DB Catalog; Parser; Create DB Schema; Evolution Manager; Workload representation; Evolution Semantics; Validate Workload; Graph Viewer; DB Schema representation; XML, jpeg; Import/Export Scenarios; Graph Visualization; Metric Manager]
Evolution Manager Architecture
Research in DB Evolution
DB Schema Evolution
– OODB evolution
– Schema versioning
DW Schema Evolution
– Taxonomy of evolution events
– Versioning
– Materialized Views Evolution
– View adaptation & synchronization
Evolution with respect to Model Mappings
Summarizing
The problem: adapting ETL workflows to evolving data sources
Graph-based representation of ETL activities
Graph enrichment with semantics for evolution events
Graph annotation with rules for handling evolution events a priori
Hecataeus: a framework for performing and evaluating evolution scenarios in DW environments
Thank you...
Hecataeus: A tool for visualizing and performing what-if analysis for evolution scenarios