
Aberdeen, 28/1/2003
AutoMed: Automatic generation of Mediator tools for heterogeneous data integration
Alex Poulovassilis, School of Computer Science and Information Systems, Birkbeck
AutoMed is a joint project with Peter McBrien (Imperial College), funded under the 2nd DIM call by EPSRC grants GR/N38107 and GR/N35915.

[Figure: Schema and Integrated Schema]

Background
• In earlier work (ER’97, IS’98, DKE’98) we developed a new framework to support transformation and integration of heterogeneous database schemas.
• Our framework consisted of:
  - a new notion of schema equivalence
  - a set of primitive schema transformations which can be composed to define unconditional or conditional equivalences between schemas

Background
• In our data integration approach, we represent the modelling constructs of higher-level data models (e.g. relational, object-oriented, semi-structured, XML, RDF) in terms of a low-level hypergraph data model, the HDM, whose constructs are nodes, edges and constraints
• The HDM common data model provides a unifying semantics for such higher-level modelling constructs
• It avoids the semantic mismatches that may occur between constructs of higher-level modelling languages
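As an illustration of the idea, the following is a minimal sketch of how ER-style constructs might be recorded as HDM nodes, edges and constraints. The class `HDMSchema`, the function `er_to_hdm` and the tuple encodings are hypothetical names chosen for this sketch and are not part of the AutoMed software; the schema names (Programme, Category, category, Prog, Film, Doc, Series) are taken from the example used in these slides.

```python
from dataclasses import dataclass, field


@dataclass
class HDMSchema:
    """Hypothetical container for the three kinds of HDM construct."""
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)        # (edge name, participating nodes)
    constraints: set = field(default_factory=set)  # (constraint kind, arguments...)


def er_to_hdm(entities, relationships, subclass_links):
    """Map ER entities to nodes, relationships to edges, subclass links to subset constraints."""
    schema = HDMSchema()
    for entity in entities:
        schema.nodes.add(entity)
    for name, participants in relationships.items():
        schema.edges.add((name, tuple(participants)))
    for sub, sup in subclass_links:
        schema.constraints.add(("subset", sub, sup))
    return schema


# Source schema: two entities linked by one relationship.
source = er_to_hdm(["Programme", "Category"],
                   {"category": ["Programme", "Category"]},
                   [])

# Target schema: a class hierarchy with no relationship.
target = er_to_hdm(["Prog", "Film", "Doc", "Series"],
                   {},
                   [("Film", "Prog"), ("Doc", "Prog"), ("Series", "Prog")])
```

Because both schemas end up in the same node/edge/constraint vocabulary, constructs that started life in different modelling languages can be compared and transformed uniformly.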

Background
• Our approach allows constructs from different modelling languages to be mixed within the same intermediate schema during the schema transformation/integration process (CAiSE’99)
• Our schema transformations are automatically reversible, setting up a two-way transformation pathway between pairs of schemas


addClass Series [p | (p,S) ∈ category]
addClass Doc [p | (p,D) ∈ category]
addClass Film [p | (p,F) ∈ category]
addClass Prog [p | (p,c) ∈ category]

addSubClass Film Prog
addSubClass Doc Prog
addSubClass Series Prog
addClass Series [p | (p,S) ∈ category]
addClass Doc [p | (p,D) ∈ category]
addClass Film [p | (p,F) ∈ category]
addClass Prog [p | (p,c) ∈ category]

addSubClass Film Prog
addSubClass Doc Prog
addSubClass Series Prog
addClass Series [p | (p,S) ∈ category]
addClass Doc [p | (p,D) ∈ category]
addClass Film [p | (p,F) ∈ category]
addClass Prog [p | (p,c) ∈ category]
delRel category [(p,F) | p ∈ Film] ∪ [(p,D) | p ∈ Doc] ∪ [(p,S) | p ∈ Series]

delSubClass Film Prog
delSubClass Doc Prog
delSubClass Series Prog
delClass Series [p | (p,S) ∈ category]
delClass Doc [p | (p,D) ∈ category]
delClass Film [p | (p,F) ∈ category]
delClass Prog [p | (p,c) ∈ category]
addRel category [(p,F) | p ∈ Film] ∪ [(p,D) | p ∈ Doc] ∪ [(p,S) | p ∈ Series]
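The reversal illustrated by these two pathways can be sketched mechanically: each add step becomes a del step with the same query, each del becomes an add, and the steps are replayed in the opposite order. The tuple encoding of a step and the function name `reverse_pathway` are assumptions made for this sketch, as is the particular ordering of the forward steps (classes first, then subclass links, then the relationship deletion). Only add and del steps are covered.

```python
# Each step is (operation, construct kind, construct name, query text or None).
INVERSE = {"add": "del", "del": "add"}


def reverse_pathway(pathway):
    """Return the reverse pathway: swap add/del and replay the steps backwards."""
    return [(INVERSE[op], kind, name, query)
            for op, kind, name, query in reversed(pathway)]


# Forward pathway from the example; the step order is an assumption.
pathway = [
    ("add", "Class", "Prog", "[p | (p,c) ∈ category]"),
    ("add", "Class", "Film", "[p | (p,F) ∈ category]"),
    ("add", "Class", "Doc", "[p | (p,D) ∈ category]"),
    ("add", "Class", "Series", "[p | (p,S) ∈ category]"),
    ("add", "SubClass", "Film Prog", None),
    ("add", "SubClass", "Doc Prog", None),
    ("add", "SubClass", "Series Prog", None),
    ("del", "Rel", "category",
     "[(p,F) | p ∈ Film] ∪ [(p,D) | p ∈ Doc] ∪ [(p,S) | p ∈ Series]"),
]

reverse = reverse_pathway(pathway)
```

The query attached to each step is carried over unchanged, which is what makes the reverse pathway derivable without further input from the designer.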

addConstraint subset Film Prog
addConstraint subset Doc Prog
addConstraint subset Series Prog
addNode Series [p | (p,S) ∈ category]
addNode Doc [p | (p,D) ∈ category]
addNode Film [p | (p,F) ∈ category]
addNode Prog [p | (p,c) ∈ category]
delEdge category [(p,F) | p ∈ Film] ∪ [(p,D) | p ∈ Doc] ∪ [(p,S) | p ∈ Series]
delNode Programme Prog
delNode Category [F,D,S]

delConstraint subset Film Prog
delConstraint subset Doc Prog
delConstraint subset Series Prog
delNode Series [p | (p,S) ∈ category]
delNode Doc [p | (p,D) ∈ category]
delNode Film [p | (p,F) ∈ category]
delNode Prog [p | (p,c) ∈ category]
addEdge category [(p,F) | p ∈ Film] ∪ [(p,D) | p ∈ Doc] ∪ [(p,S) | p ∈ Series]
addNode Programme Prog
addNode Category [F,D,S]

Background
• These pathways can be used to automatically translate data and queries between pairs of schemas (ER’99)
• From a pathway T: S → S’ we:
  - compose the queries in the add steps to derive a definition of each construct in S’ as a view over S, and
  - compose the queries in the del steps to derive a definition of each construct in S as a view over S’

Background
• Thus
  Prog = [p | (p,c) ∈ category]
  Film = [p | (p,F) ∈ category]
  Doc = [p | (p,D) ∈ category]
  Series = [p | (p,S) ∈ category]
  and
  category = [(p,F) | p ∈ Film] ∪ [(p,D) | p ∈ Doc] ∪ [(p,S) | p ∈ Series]
• These view definitions can then be used to automatically translate data and queries between S and S’
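The view definitions above read almost directly as set comprehensions. The sketch below evaluates them over a small extent; the programme titles are made-up sample data, and the function names `forward_views` and `backward_view` are hypothetical. It assumes, as the example does, that every category code is one of F, D or S.

```python
def forward_views(category):
    """Derive the extents of the S' constructs as views over the S relationship."""
    return {
        "Prog": {p for (p, c) in category},
        "Film": {p for (p, c) in category if c == "F"},
        "Doc": {p for (p, c) in category if c == "D"},
        "Series": {p for (p, c) in category if c == "S"},
    }


def backward_view(film, doc, series):
    """Derive the extent of category in S as a view over the S' classes."""
    return ({(p, "F") for p in film}
            | {(p, "D") for p in doc}
            | {(p, "S") for p in series})


# Made-up sample extent of the category relationship.
category = {("Casablanca", "F"), ("Blue Planet", "D"), ("Friends", "S")}

views = forward_views(category)
rebuilt = backward_view(views["Film"], views["Doc"], views["Series"])
```

Translating the data forwards and then backwards recovers the original extent, which is the behaviour expected when the two schemas are equivalent.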

Overview of the AutoMed Project
• The AutoMed project aims to investigate:
  - how our theoretical framework can be applied in practice to real data integration problems
  - how much of a mediator’s global query processing functionality can be automatically generated from our transformation pathways
  - evolutionary and heuristic techniques for schema improvement and global query optimisation

The AutoMed Architecture
[Figure: architecture diagram with the following components]
• Global Query Processor
• Global Query Optimiser
• Schema Evolution Tool
• Schema Transformation and Integration Tool
• Model Definition Tool
• Schema and Transformation Repository
• Model Definitions Repository

Schema Transformation/Integration Networks in AutoMed
[Figure: network diagram showing local schemas LS1, LS2, …, LSi, …, LSn, the schemas US1, US2, …, USi, …, USn connected by id steps, and the global schema GS]

Schema Transformation/Integration Networks in AutoMed
• On the previous slide:
  - GS is a global schema
  - LS1, …, LSn are local schemas
  - US1, …, USn are union-compatible schemas
  - the transformation pathway between each pair LSi and USi may consist of add, delete, rename, expand and contract primitive transformations, operating on any modelling construct defined in the AutoMed Model Definitions Repository
  - the transformation pathway between USi and GS is similar
  - the transformation pathway between each pair of union-compatible schemas consists of id transformation steps

Both-As-View integration
• Our schema transformation pathways capture at least the information available from global-as-view (GAV) or local-as-view (LAV) definitions
• We discuss this in a forthcoming paper (ICDE’03) and term our integration approach both-as-view (BAV)
• In particular, we discuss how:
  - GAV and LAV view definitions can be derived from a BAV specification
  - a BAV specification can be partially derived from a set of GAV or LAV view definitions

Schema Evolution
• Unlike GAV and LAV, our framework readily supports the evolution of both local and global schemas
• The first step is to define the evolution of the global or local schema as a schema transformation pathway from the old to the new schema
• There is then a systematic way of evolving, as opposed to regenerating, the transformation pathways
• In the case of a local schema evolution, the global schema may also be evolved

Schema Evolution
• In particular (see our CAiSE’02 and ICDE’03 papers for details):
  - if the evolved schema is semantically equivalent to the original schema, then the transformation network can be repaired automatically
  - if the evolved schema is a contraction of the original schema, the transformation network can again be repaired automatically
  - if the evolved schema is an extension of the original schema, then domain knowledge may be required (but again the network can be evolved rather than regenerated)

Global Query Processing
• We are handling query language heterogeneity by translation into and from a functional intermediate query language, IQL (Edgar Jasper: BNCOD’02 poster, BNCOD’02 summer school paper)
• A query Q expressed in a high-level query language on a global schema GS is first translated into IQL
• GAV view definitions are derived from the transformation pathways between GS and the local schemas
• These view definitions are substituted into Q, reformulating it into an IQL query over local schema constructs
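The substitution step can be pictured as unfolding view definitions inside a query expression tree. The sketch below uses a toy tuple-based expression format, not IQL syntax: `("ref", name)` stands for a schema construct, and the operator names `count`, `union` and `select` are placeholders invented for the example. It assumes the view definitions are acyclic.

```python
def unfold(query, views):
    """Replace every reference to a global construct by its view definition."""
    if isinstance(query, tuple) and query and query[0] == "ref":
        name = query[1]
        if name in views:
            # Unfold recursively in case a view mentions other views.
            return unfold(views[name], views)
        return query  # already a local construct
    if isinstance(query, tuple):
        # Keep the operator, unfold each argument.
        return (query[0],) + tuple(unfold(arg, views) for arg in query[1:])
    return query  # constants are left untouched


# Hypothetical GAV-style definitions of global constructs over a local one.
views = {
    "Film": ("select", ("ref", "category"), "F"),
    "Doc": ("select", ("ref", "category"), "D"),
}

# A global query: count the programmes that are films or documentaries.
global_query = ("count", ("union", ("ref", "Film"), ("ref", "Doc")))
local_query = unfold(global_query, views)
```

After unfolding, the query mentions only the local construct `category`, so it can be passed on for optimisation and evaluation against the data source.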

Global Query Processing
• Query optimisation and query evaluation then occur
• Specific issues for query optimisation in AutoMed are:
  - optimising the view definitions derived from the transformation pathways, and
  - handling heterogeneous modelling constructs appearing within these view definitions
• For query evaluation, wrappers translate IQL sub-queries into the local query language, and translate results back into the IQL type system
• Further query post-processing is possible

Why a Functional Language as the AutoMed Intermediate Query Language?
• Compositionality: operators can be composed to an arbitrary level of nesting within a query, provided the types of the operators are respected by the expressions passed to them
• Referential transparency: any query evaluates to a single answer, irrespective of the order of evaluation of its sub-expressions
• These properties make view generation, query reformulation and query rewriting simpler than they would be with imperative or logic notations

Why a Functional Language as the AutoMed Intermediate Query Language?
• Natural support for collection types and aggregation operators
• This makes a functional language a natural formalism for translating into and out of other query languages, e.g.
  - OQL is a functional query language
  - SQL can be considered to be a restriction of OQL
  - XQuery has a functional core language
  - other languages for semi-structured and RDF data are also functional (UnQL, YATL, RQL)

Why a Functional Language as the AutoMed Intermediate Query Language?
• Aggregation operators over collection types such as sets, bags and lists are generalised by a single fold function (Buneman, Tannen, Naqvi, 1990s)
• Optimisation techniques have been developed for fold which are applicable to all functional query languages with this formalism at their core (e.g. work by Wadler, Wong, Fegaras, Grust, Poulovassilis & Small)
• We plan to leverage these techniques, and perhaps even existing software, for global query optimisation in AutoMed
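To make the role of fold concrete, here is a small sketch in which several aggregation operators are all instances of one function. The name `fold` and its argument order are choices made for this sketch, not IQL's own definition; it is built on Python's standard `functools.reduce`. For the result not to depend on the order in which elements are visited, the combining operator should be associative and commutative.

```python
from functools import reduce


def fold(f, op, unit, collection):
    """Apply f to each element and combine the results with op, starting from unit."""
    return reduce(lambda acc, x: op(acc, f(x)), collection, unit)


def count(xs):
    return fold(lambda x: 1, lambda a, b: a + b, 0, xs)


def total(xs):
    return fold(lambda x: x, lambda a, b: a + b, 0, xs)


def maximum(xs, lowest=float("-inf")):
    return fold(lambda x: x, max, lowest, xs)


def flatten(collections):
    """Union of a collection of sets, again expressed as a fold."""
    return fold(lambda s: set(s), lambda a, b: a | b, set(), collections)


durations = [90, 45, 120]
```

Because every aggregate shares this one shape, a rewrite rule proved for `fold` applies to all of them at once, which is the appeal for query optimisation.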

XML Data Sources
• As well as integration of structured data sources, we have done some work on translating and integrating XML data (see our CAiSE’01 paper)
• We have defined a representation of XML in terms of the nodes, edges and constraints of the HDM
• We capture the ordering of XML elements by an order node and a hyperedge to it from the edge representing the parent-child relationship
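A rough sketch of this representation follows, using Python's standard `xml.etree.ElementTree` parser. The function `xml_to_hdm`, the `tag#n` identifiers and the list encodings are hypothetical; the sample document reuses element names from the slides' example, but its structure is an assumption. Text content and attributes are ignored to keep the sketch short.

```python
import xml.etree.ElementTree as ET
from itertools import count


def xml_to_hdm(xml_text):
    """Return (root id, element-name nodes, parent-child edges, order hyperedges)."""
    ids = count(1)
    nodes = set()       # one node per element name
    child_edges = []    # (parent instance, child instance)
    order_edges = []    # (parent-child edge, position among siblings)

    def visit(element):
        element_id = f"{element.tag}#{next(ids)}"
        nodes.add(element.tag)
        for position, child in enumerate(element, start=1):
            edge = (element_id, visit(child))
            child_edges.append(edge)
            # The hyperedge links the parent-child edge itself to an order value.
            order_edges.append((edge, position))
        return element_id

    root_id = visit(ET.fromstring(xml_text))
    return root_id, nodes, child_edges, order_edges


document = ("<root><customer><name>Ann</name><number>7</number></customer>"
            "<account/></root>")
root_id, nodes, child_edges, order_edges = xml_to_hdm(document)
```

Storing the position against the parent-child edge, rather than against the child node, mirrors the description above: ordering is a property of the relationship between an element and its parent.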

Translating XML into HDM
[Figure: HDM graph of an XML document, with nodes labelled root, customer, name, number, account and order]

XML Data Sources
• We have defined a set of primitive transformations on XML, in terms of the underlying transformations on the equivalent HDM representation (which is the general AutoMed methodology)
• XML documents are then translated into a simple ER representation, which allows them to be integrated with each other and with other structured data sources
• One possible direction of further work is automatic or semi-automatic transformation and integration of the ER models arising from XML documents

Unstructured Text Sources
• We have also been working on extracting structure from unstructured text sources (Dean Williams)
• The aim here is to integrate information extracted from unstructured text with structured or semi-structured information available from other sources
• We are using existing technology (the GATE tool) for the text annotation and information extraction (IE) part of this work

Unstructured Text Sources
• Natural language and domain ontologies will be used to extend these annotations
• These will be imported into RDF repositories, and we have extended AutoMed to encompass RDF and RDFS data sources
• The information extracted from the text will be matched with existing structured information to derive new facts and perhaps new schema information as well

Materialised integration
• Finally, as well as virtual integration of data sources, we are also investigating using the AutoMed framework for materialised integration, i.e. a data warehousing approach
• In particular, we are looking at incremental view maintenance and data lineage tracing using the AutoMed schema transformation pathways (Hao Fan)

Ongoing AutoMed Work at Imperial
• Automatic generation of equivalences between different data models
• A graphical schema and transformations editor
• Data mining techniques for extracting relational schema equivalences
• Using AutoMed for integrating semi-structured and structured data, in particular genomic data
• Optimising schema transformation pathways