Using AutoMed Metadata in Data Warehousing Environments. Hao Fan, Alexandra Poulovassilis. School of Computer Science & Information Systems, Birkbeck College, University of London

Presentation transcript:

Using AutoMed Metadata in Data Warehousing Environments. Hao Fan, Alexandra Poulovassilis. School of Computer Science & Information Systems, Birkbeck College, University of London. ACM International Workshop on Data Warehousing and OLAP, 7th November 2003

Outline: What is AutoMed?; Creating AutoMed DW Metadata; Using AutoMed DW Metadata; Comparison of AutoMed and Conceptual Data Model (CDM) approaches; Conclusion

What is AutoMed? HDM (Hypergraph Data Model): schemas consist of a set of Nodes, Edges and Constraints. Transformation pathways are built from the primitive steps add/extend, delete/contract and rename. IQL language (see the technical report "The AutoMed Intermediate Query Language").
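To make the idea of a pathway of primitive steps concrete, here is a minimal sketch in Python. It is not the AutoMed API: the `Schema` class, the tuple encoding of steps and all construct names are assumptions made for illustration, and the sketch ignores the queries that real pathway steps carry.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    """Toy stand-in for an HDM schema: sets of nodes, edges and constraints."""
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)
    constraints: set = field(default_factory=set)


def apply_step(schema, step):
    """Apply one primitive step (op, kind, name[, new_name]) in place."""
    op, kind, name, *rest = step
    target = getattr(schema, kind)  # kind is "nodes", "edges" or "constraints"
    if op in ("add", "extend"):
        target.add(name)
    elif op in ("delete", "contract"):
        target.remove(name)
    elif op == "rename":
        target.remove(name)
        target.add(rest[0])
    else:
        raise ValueError(f"unknown primitive: {op}")


def apply_pathway(schema, pathway):
    """Apply the steps of a pathway in order and return the schema."""
    for step in pathway:
        apply_step(schema, step)
    return schema


# Hypothetical source schema and pathway.
s = Schema(nodes={"person", "dept"}, edges={("worksIn", "person", "dept")})
pathway = [
    ("add", "nodes", "employee"),
    ("delete", "edges", ("worksIn", "person", "dept")),
    ("rename", "nodes", "dept", "department"),
]
apply_pathway(s, pathway)
print(sorted(s.nodes))
```

Representing a pathway as plain data means it can be stored and inspected as well as executed, which is the property the later slides on maintenance and lineage rely on.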

A Data Integration Example
1. addRel (<<…>>, <<…>>);
2. addAtt (<<…>>, <<…>>);
3. addAtt (<<…>>, gc sum [(d,s)|(i,j,s) <- <<…>>; (i',j',d) <- <<…>>; i=i'; j=j']);
4. delEdge (<<…>>, <<…>>);
5. delNode (<<…>>, [n|(d,n) <- <<…>>]);
6. delNode (<<…>>, <<…>>);
7. contractHierarchy (<<…>>);
8. contractHierarchy (<<…>>);
9. contractAtt (<<…>>);
10. contractAtt (<<…>>);
11. contractFact (<<…>>);
12. contractAtt (<<…>>);
13. contractDim (<<…>>);
14. contractAtt (<<…>>);
15. contractDim (<<…>>);
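The query in step 3 is a comprehension that joins two extents on i and j and then applies `gc sum` to the resulting (d, s) pairs. The sketch below mirrors it in Python, assuming that `gc sum` groups pairs on their first component and sums the second. The extents `sales` and `lookup` and their contents are hypothetical, since the scheme names did not survive in the transcript.

```python
from collections import defaultdict


def gc_sum(pairs):
    """Group (key, value) pairs on the key and sum the values of each group."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


# Hypothetical extents standing in for the two schemes of step 3.
sales = [(1, 10, 5), (1, 11, 7), (2, 10, 3)]            # tuples (i, j, s)
lookup = [(1, 10, "A"), (1, 11, "A"), (2, 10, "B")]     # tuples (i', j', d)

# [(d,s) | (i,j,s) <- sales; (i',j',d) <- lookup; i=i'; j=j']
pairs = [
    (d, s)
    for (i, j, s) in sales
    for (i2, j2, d) in lookup
    if i == i2 and j == j2
]

result = gc_sum(pairs)
print(result)
```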

Data Transformation/Integration

Creating AutoMed Metadata: (1) create the AutoMed metadata repository, using any DBMS supporting JDBC; (2) specify data models, i.e. all data models used in DW schemas, e.g. RDB, XML, multi-dimensional; (3) extract data source schemas; (4) define transformation pathways, manually or automatically.

Creating AutoMed Transformation Pathways. AutoMed transformation pathways can be used for the following data warehousing activities: 1) Transforming 2) Single-source cleansing 3) Multi-source cleansing 4) Integration 5) Summarizing 6) Creating data marts

Data Cleansing. The general pathway used for data cleansing: adds a new construct, temp, to the schema, whose extent consists of clean data; contracts the dirty construct, C, which is being cleaned; adds a new construct, C, derived from the data in temp; deletes or contracts the temp construct.
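The four-step pattern above can be sketched as follows. This is only an illustration under stated assumptions: extents are modelled as a Python dict from construct name to a list of values, `cleanse` and its parameters are hypothetical names, and the cleaning rule stands in for whatever external tool or query produces the clean data.

```python
def cleanse(extents, construct, clean, temp="temp"):
    """Run the four-step cleansing pattern over a dict of extents.

    Returns the list of (operation, construct) steps that were applied.
    """
    steps = []

    # 1. add a new construct, temp, whose extent consists of clean data
    extents[temp] = [clean(value) for value in extents[construct]]
    steps.append(("add", temp))

    # 2. contract the dirty construct that is being cleaned
    del extents[construct]
    steps.append(("contract", construct))

    # 3. add the construct again, derived from the data in temp
    extents[construct] = list(extents[temp])
    steps.append(("add", construct))

    # 4. delete the temp construct
    del extents[temp]
    steps.append(("delete", temp))

    return steps


# Hypothetical dirty extent and cleaning rule.
extents = {"city": [" london", "Paris "]}
steps = cleanse(extents, "city", lambda s: s.strip().title())
print(extents)
print(steps)
```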

Single-source Cleansing
Person (id, name, address, zip, city, country)
addRel (<<…>>, toolCall 'QuickAddress Batch' '<<…>>' ' ' '<<…>>');
contractAtt (<<…>>);
addAtt (<<…>>, [(i,z)|(i,a,z) <- <<…>>]);
addAtt (<<…>>, [(i,a)|(i,a,z) <- <<…>>]);
deleteRel (<<…>>, [(i,a,z)|(i,a) <- <<…>>; (i',z) <- <<…>>; i=i']);

Multi-source Cleansing
Person (id, maritalStatus)
Emp (id, name, maritalStatus)
addAtt (<<…>>, <<…>> -- [(i,s)|(i,s) <- <<…>>; (i',s') <- <<…>>; i = i'; not (s = s')]);
contractAtt (<<…>>);
renameAtt (<<…>>, <<…>>);
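The addAtt query above removes, by bag difference (`--`), the (id, maritalStatus) pairs that disagree with a second source. A minimal Python sketch of that query follows, assuming the left operand is the Person attribute and the conflicting values come from Emp. The data are hypothetical, and which source is treated as authoritative is an assumption, since the scheme names are missing from the transcript.

```python
from collections import Counter


def bag_difference(left, right):
    """Bag (multiset) difference: remove one copy from left per copy in right."""
    return sorted((Counter(left) - Counter(right)).elements())


# Hypothetical extents of the two maritalStatus attributes.
person = [(1, "single"), (2, "married"), (3, "single")]   # pairs (i, s)
emp = [(1, "married"), (2, "married")]                    # pairs (i', s')

# [(i,s) | (i,s) <- person; (i',s') <- emp; i = i'; not (s = s')]
conflicts = [
    (i, s)
    for (i, s) in person
    for (i2, s2) in emp
    if i == i2 and not s == s2
]

cleaned = bag_difference(person, conflicts)
print(cleaned)
```

Person 1 is dropped because the two sources disagree, person 2 is kept because they agree, and person 3 is kept because Emp holds no record to contradict it.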

Using AutoMed Metadata: Incremental View Maintenance; Data Lineage Tracing

Using AutoMed Metadata for IVM (Incremental View Maintenance). [Diagram: schemas S, GS, D and V linked by the transformation pathway TP = tp_1, …, tp_r, shown in the segments tp_1; tp_2, …, tp_i; tp_i+1, …, tp_r.] See H. Fan. Incremental view maintenance and data lineage tracing in heterogeneous database environments. In Proc. BNCOD02 PhD Summer School, Sheffield, 2002.

Using AutoMed Metadata for DLT (Data Lineage Tracing). Data lineage: affect-pool and origin-pool. DLT formulae: q_s^AP(t) and q_s^OP(t). Algorithms: fully materialized pathway; fully virtual pathway; partially materialized pathway. See H. Fan and A. Poulovassilis. Tracing data lineage using schema transformation pathways. In Knowledge Transformation for the Semantic Web, IOS Press, 2003.

AutoMed vs. CDM approach

Discussion
Conceptual Data Model: semantic mismatches; tightly coupled with the CDM; not straightforward to reuse the integration effort if a source schema is changed.
AutoMed: no semantic mismatch; possible to extend data warehouse views into a different data model; easy to reuse the transformation and integration efforts if a source schema is changed (see Section 5 of the paper).

Conclusion
AutoMed metadata can be used for expressing data warehousing activities, including data cleansing.
AutoMed metadata can be used for incrementally maintaining the DW data and for data lineage tracing.
Compared with CDM, AutoMed has several advantages.
In contrast to commercial ETL tools, AutoMed metadata provides sufficient information for IVM and DLT.
Limitations: not all data warehouse metadata can be captured by AutoMed. Currently, transformation pathways are created manually; however, we are investigating automatic and semi-automatic generation techniques.

Acknowledgements. Thank you!