G. Papastefanatos 1, P. Vassiliadis 2, A. Simitsis 3, T. Sellis 1,4, Y. Vassiliou 1 (1) National Technical University of Athens, Athens, Hellas (Greece)

Presentation transcript:

Rule-based Management of Schema Changes at ETL sources
G. Papastefanatos (1), P. Vassiliadis (2), A. Simitsis (3), T. Sellis (1,4), Y. Vassiliou (1)
(1) National Technical University of Athens, Athens, Hellas (Greece)
(2) University of Ioannina, Ioannina, Hellas (Greece)
(3) HP Labs, Palo Alto, California, USA
(4) Institute for the Management of Information Systems (Greece)

Outline (MEDWa ‘09, Riga, September 2009)
• Motivation
• Graph-based representation of ETL processes
• Regulating ETL Evolution
• Hecataeus Internals
• Conclusions

Data Warehouse Environment

Data Warehouse Schema Evolution
Data warehouses are evolving environments, e.g.:
• A dimension is removed or renamed
• The structure of a dimension table is updated
• A fact table is completely decoupled from a dimension
• The measures of a fact table change
• An ETL source is modified, etc.

Evolving ETL sources…
Schema changes on the sources of ETL processes: design constructs are added, removed, or modified.
ETL processes are affected:
– Syntactically, i.e., they become invalid
– Semantically, i.e., they must conform to the new source database semantics
Adaptation of ETL flows is:
– a time-consuming task
– in most cases handled manually by the administrators/developers

We would like to know...
• Which part of the process is affected, and how, if e.g. an attribute is deleted?
• Can we predict and handle the impact of changes?
• To what extent can readjustment be automated?

Hecataeus Framework
• Mechanism for performing what-if analysis for potential changes of ETL sources
• Graph-based representation of ETL workflows
• Annotation of the graph with rules for adapting ETL processes to source schema evolution
• Evolution events are mapped to changes on the graph constructs
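
The what-if analysis listed above can be illustrated with a minimal Python sketch. It is not the Hecataeus implementation: the graph layout, the node names (Q1, Q2, LoadDW) and the function `affected_by` are hypothetical, chosen only to show how a change on a source construct could be traced to the constructs that depend on it.

```python
from collections import deque

# Hypothetical dependency graph: each key is a construct, each value lists
# the constructs that directly depend on it (all names are illustrative).
DEPENDENTS = {
    "EMP": ["EMP.Emp#", "EMP.Name"],
    "EMP.Emp#": ["Q1"],
    "EMP.Name": ["Q1"],
    "WORKS.Hours": ["Q2"],
    "Q1": ["LoadDW"],
    "Q2": ["LoadDW"],
    "LoadDW": [],
}


def affected_by(changed, dependents):
    """Return every construct reachable from `changed`, in breadth-first order."""
    seen, queue = [], deque([changed])
    while queue:
        node = queue.popleft()
        for dep in dependents.get(node, []):
            if dep not in seen:
                seen.append(dep)
                queue.append(dep)
    return seen


# What-if: which parts of the workflow are touched if EMP.Name is deleted?
impact = affected_by("EMP.Name", DEPENDENTS)
print(impact)
```

Under these assumptions, the traversal reports the query that reads the attribute and the activity fed by that query, which is the kind of answer the what-if analysis is meant to give before a change is applied.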

ETL Workflow representation

Query representation
Q: SELECT EMP.Emp#, Sum(WORKS.Hours) as T_Hours FROM EMP, WORKS WHERE EMP.Emp# = WORKS.Emp# GROUP BY EMP.Emp#
Figure labels: Join, GB
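
As a rough illustration of how the query Q above could be held as a graph, the following Python sketch stores nodes and labelled edges in plain dictionaries and lists. The node kinds and edge labels ("schema", "from", "map-select", "where", "group-by", "operand") are assumptions made for this example, not the framework's confirmed vocabulary, and `used_by` is a hypothetical helper.

```python
# Nodes map a name to a kind. Edges are (source, label, target) triples.
# All kinds and labels are illustrative assumptions.
nodes = {
    "EMP": "relation", "EMP.Emp#": "attribute",
    "WORKS": "relation", "WORKS.Emp#": "attribute", "WORKS.Hours": "attribute",
    "Q": "query", "Q.Emp#": "attribute", "Q.T_Hours": "attribute",
    "Join": "condition", "GB": "group-by", "Sum": "function",
}

edges = [
    ("EMP", "schema", "EMP.Emp#"),
    ("WORKS", "schema", "WORKS.Emp#"),
    ("WORKS", "schema", "WORKS.Hours"),
    ("Q", "schema", "Q.Emp#"),
    ("Q", "schema", "Q.T_Hours"),
    ("Q", "from", "EMP"),
    ("Q", "from", "WORKS"),
    ("Q.Emp#", "map-select", "EMP.Emp#"),
    ("Q.T_Hours", "map-select", "Sum"),
    ("Sum", "operand", "WORKS.Hours"),
    ("Q", "where", "Join"),
    ("Join", "operand", "EMP.Emp#"),
    ("Join", "operand", "WORKS.Emp#"),
    ("Q", "group-by", "GB"),
    ("GB", "operand", "EMP.Emp#"),
]


def used_by(target, edges):
    """List, in sorted order, the nodes that point directly at `target`."""
    return sorted({src for src, _, dst in edges if dst == target})


# Which constructs refer to EMP.Emp# directly?
print(used_by("EMP.Emp#", edges))
```

Keeping the join condition and the group-by as nodes of their own means a change to `EMP.Emp#` can be related to each clause of the query separately, rather than to the query as a whole.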

Graph Annotation with rules
According to the prevailing policy, the proper action is taken → graph evolution

Example
Event: Add attribute Phone to relation EMP
Q (before): SELECT EMP.Emp#, EMP.Name FROM EMP
Q (after): SELECT EMP.Emp#, EMP.Name, Phone FROM EMP
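
The example can be mimicked with a small Python sketch of a rule attached to the query. The policy names "propagate" and "block" and the functions `apply_add_attribute` and `render` are assumptions introduced for illustration; the slides state only that the prevailing policy decides which action is taken.

```python
def apply_add_attribute(select_list, new_attr, policy):
    """Return the query's select list after an 'add attribute' event.

    `policy` is a hypothetical rule annotated on the query node:
      "propagate" -> adapt the query so that it includes the new attribute
      "block"     -> leave the query as it was
    """
    if policy == "propagate":
        return select_list + [new_attr]
    if policy == "block":
        return list(select_list)
    raise ValueError(f"unknown policy: {policy}")


def render(select_list, relation):
    """Render a single-relation SELECT statement from a select list."""
    return f"SELECT {', '.join(select_list)} FROM {relation}"


before = ["EMP.Emp#", "EMP.Name"]

# Event: add attribute Phone to relation EMP.
after = apply_add_attribute(before, "Phone", policy="propagate")
unchanged = apply_add_attribute(before, "Phone", policy="block")

print(render(before, "EMP"))
print(render(after, "EMP"))
```

With the assumed "propagate" policy the rendered statement matches the rewritten query of the example, while the assumed "block" policy keeps the original query text.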

System architecture
Figure labels: DDL files; SQL scripts; DB Catalog; Parser; Create DB Schema; Evolution Manager; Workload representation; Evolution Semantics; Validate Workload; Graph Viewer; DB Schema representation; XML, jpeg; Import/Export Scenarios; Graph Visualization; Metric Manager

Evolution Manager Architecture

Research in DB Evolution
• DB Schema Evolution
  – OODB evolution
  – Schema versioning
• DW Schema Evolution
  – Taxonomy of evolution events
  – Versioning
  – Materialized Views Evolution
  – View adaptation & synchronization
• Evolution with respect to Model Mappings

Summarizing
• The problem: adapting ETL workflows to evolving data sources
• Graph-based representation of ETL activities
• Graph enrichment with semantics for evolution events
• Graph annotation with rules for handling evolution events a priori
• Hecataeus: a framework for performing and evaluating evolution scenarios in DW environments

Thank you...
Hecataeus: A tool for visualizing and performing what-if analysis for evolution scenarios