Visit to HP Labs, 22/10/2002 Heterogeneous information integration Alex Poulovassilis Database and Web Technologies Group School of Computer Science and.


Visit to HP Labs, 22/10/2002 Heterogeneous information integration Alex Poulovassilis Database and Web Technologies Group School of Computer Science and Information Systems Birkbeck, University of London

Research in CS & IS at Birkbeck
• Main groups:
  - Database and Web Technologies
  - Computational Intelligence
  - Bioinformatics
  - Software Engineering
• Main research funding sources: EPSRC, BBSRC, EU, Wellcome Trust, HEFCE, industry
• URL

Teaching in CS & IS at Birkbeck
• Foundation Degree in IT (part-time)
• BSc Computing (pt)
• BSc Information Systems and Management (pt)
• MSc Computing Science (ft and pt)
• MSc in Advanced Information Systems (ft and pt)
• MRes in Computer Science (ft and pt)
• MPhil/PhD in Computer Science (ft and pt)
• URL

[Figure labels: Schema; Integrated Schema]

Background
• In earlier work with Peter McBrien (ER’97, IS’98, DKE’98) we developed a new framework to support transformation and integration of heterogeneous database schemas.
• Our framework consisted of:
  - a new notion of schema equivalence
  - a set of primitive schema transformations which can be composed to define unconditional or conditional equivalences between schemas

Background
• We represent the modelling constructs of higher-level data models (e.g. relational, object-oriented, semi-structured, XML) in terms of a hypergraph data model (HDM)
• The HDM common data model provides a unifying semantics for such higher-level modelling constructs

Background
• Our schema transformations allow constructs from different modelling languages to be mixed within the same intermediate schema (CAiSE’99)
• Our schema transformations are automatically reversible, setting up a two-way transformation pathway between pairs of schemas:

addClass Series [p | (p,S) ∈ category]
addClass Doc [p | (p,D) ∈ category]
addClass Film [p | (p,F) ∈ category]
addClass Prog [p | (p,c) ∈ category]

addSubClass Film Prog
addSubClass Doc Prog
addSubClass Series Prog
addClass Series [p | (p,S) ∈ category]
addClass Doc [p | (p,D) ∈ category]
addClass Film [p | (p,F) ∈ category]
addClass Prog [p | (p,c) ∈ category]

addSubClass Film Prog
addSubClass Doc Prog
addSubClass Series Prog
addClass Series [p | (p,S) ∈ category]
addClass Doc [p | (p,D) ∈ category]
addClass Film [p | (p,F) ∈ category]
addClass Prog [p | (p,c) ∈ category]
delRel category [(p,F) | p ∈ Film] U [(p,D) | p ∈ Doc] U [(p,S) | p ∈ Series]

delSubClass Film Prog
delSubClass Doc Prog
delSubClass Series Prog
delClass Series [p | (p,S) ∈ category]
delClass Doc [p | (p,D) ∈ category]
delClass Film [p | (p,F) ∈ category]
delClass Prog [p | (p,c) ∈ category]
addRel category [(p,F) | p ∈ Film] U [(p,D) | p ∈ Doc] U [(p,S) | p ∈ Series]
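The relationship between the forward and reverse pathways above can be sketched in a few lines of Python. This is only an illustration, not AutoMed code: the `Step` type and the function names are hypothetical, queries are kept as plain strings, and the order of the forward steps (classes first, then subclass links, then `delRel`) is an assumption, since the slide text does not state it.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    op: str         # primitive name, e.g. "addClass" or "delRel"
    construct: str  # construct the step acts on
    query: str      # extent query, kept as plain text ("" if the slide gives none)


def reverse_step(step: Step) -> Step:
    """Swap the add/del prefix; construct and query stay the same."""
    if step.op.startswith("add"):
        return Step("del" + step.op[3:], step.construct, step.query)
    if step.op.startswith("del"):
        return Step("add" + step.op[3:], step.construct, step.query)
    raise ValueError(f"unsupported primitive: {step.op}")


def reverse_pathway(pathway: List[Step]) -> List[Step]:
    """Reverse every step and apply them in the opposite order."""
    return [reverse_step(step) for step in reversed(pathway)]


# Step order is an assumption; the slide text does not state it.
forward = [
    Step("addClass", "Prog", "[p | (p,c) ∈ category]"),
    Step("addClass", "Film", "[p | (p,F) ∈ category]"),
    Step("addClass", "Doc", "[p | (p,D) ∈ category]"),
    Step("addClass", "Series", "[p | (p,S) ∈ category]"),
    Step("addSubClass", "Film Prog", ""),
    Step("addSubClass", "Doc Prog", ""),
    Step("addSubClass", "Series Prog", ""),
    Step("delRel", "category",
         "[(p,F) | p ∈ Film] U [(p,D) | p ∈ Doc] U [(p,S) | p ∈ Series]"),
]

backward = reverse_pathway(forward)
for step in backward:
    print(step.op, step.construct, step.query)
```

Reversing the result a second time gives back the original pathway, which is what makes the pathway usable in both directions.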

addConstraint subset Film Prog
addConstraint subset Doc Prog
addConstraint subset Series Prog
addNode Series [p | (p,S) ∈ category]
addNode Doc [p | (p,D) ∈ category]
addNode Film [p | (p,F) ∈ category]
addNode Prog [p | (p,c) ∈ category]
delEdge category [(p,F) | p ∈ Film] U [(p,D) | p ∈ Doc] U [(p,S) | p ∈ Series]
delNode Programme Prog
delNode Category [F,D,S]

delConstraint subset Film Prog
delConstraint subset Doc Prog
delConstraint subset Series Prog
delNode Series [p | (p,S) ∈ category]
delNode Doc [p | (p,D) ∈ category]
delNode Film [p | (p,F) ∈ category]
delNode Prog [p | (p,c) ∈ category]
addEdge category [(p,F) | p ∈ Film] U [(p,D) | p ∈ Doc] U [(p,S) | p ∈ Series]
addNode Programme Prog
addNode Category [F,D,S]

Query and Data Translation
• These pathways can thus be used to automatically translate data and queries between schemas (ER’99)
• From a pathway T: S → S’ we:
  - compose the queries in the add steps to derive a definition of each construct in S’ as a view over S, and
  - compose the queries in the del steps to derive a definition of each construct in S as a view over S’
• These view definitions can then be used to automatically translate data and queries between S and S’
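As a rough illustration of data translation along a pathway, the sketch below evaluates the add-step queries over the source extents to populate S’, and uses the del-step query when the pathway is run in reverse. It is a simplification under stated assumptions: queries are ordinary Python functions rather than a query language, the views are materialised step by step instead of being composed symbolically, and the names `translate` and `reverse` and the sample programme titles are made up for the example.

```python
from typing import Callable, Dict, List, Set, Tuple

Extents = Dict[str, Set]          # construct name -> extent
Query = Callable[[Extents], Set]  # query over the constructs present so far
Step = Tuple[str, str, Query]     # ("add" | "del", construct, query)


def translate(pathway: List[Step], source: Extents) -> Extents:
    """Apply a pathway to data: add steps populate new constructs, del steps drop old ones."""
    db = dict(source)
    for op, construct, query in pathway:
        if op == "add":
            db[construct] = query(db)
        elif op == "del":
            del db[construct]
        else:
            raise ValueError(f"unsupported step: {op}")
    return db


def reverse(pathway: List[Step]) -> List[Step]:
    """Swap add and del and reverse the step order."""
    swap = {"add": "del", "del": "add"}
    return [(swap[op], construct, query) for op, construct, query in reversed(pathway)]


def category_from_classes(db: Extents) -> Set:
    """The del-step query: rebuilds category from Film, Doc and Series."""
    return ({(p, "F") for p in db["Film"]}
            | {(p, "D") for p in db["Doc"]}
            | {(p, "S") for p in db["Series"]})


pathway: List[Step] = [
    ("add", "Prog", lambda db: {p for (p, c) in db["category"]}),
    ("add", "Film", lambda db: {p for (p, c) in db["category"] if c == "F"}),
    ("add", "Doc", lambda db: {p for (p, c) in db["category"] if c == "D"}),
    ("add", "Series", lambda db: {p for (p, c) in db["category"] if c == "S"}),
    ("del", "category", category_from_classes),
]

# Made-up sample data for schema S.
s = {"category": {("Alien", "F"), ("Blue Planet", "D"), ("ER", "S")}}
s_prime = translate(pathway, s)
print(sorted(s_prime))
print(translate(reverse(pathway), s_prime) == s)
```

Translating the data forward and then back along the reversed pathway recovers the original extent of category.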

Both-As-View integration
• Our schema transformation pathways capture at least the information available from global-as-view (GAV) or local-as-view (LAV)
• We discuss this in a forthcoming paper (ICDE’03) and term our integration approach both-as-view (BAV)
• In particular, we discuss how:
  - GAV and LAV view definitions can be derived from a BAV specification
  - a BAV specification can be partially derived from a set of GAV or LAV view definitions

Schema Evolution
• Unlike GAV and LAV, our framework readily supports the evolution of both local and global schemas (CAiSE’02, ICDE’03)
• The first step is to define the evolution of the global or local schema as a schema transformation pathway from the old to the new schema
• There is then a systematic way of evolving, as opposed to regenerating, the transformation pathways, and perhaps the global schema in the case of a local schema evolution

Schema Evolution
• In particular (see CAiSE’02 and ICDE’03 for details):
  - if the evolved schema is semantically equivalent to the original schema, then the transformation network can be repaired automatically
  - if the evolved schema is a contraction of the original schema, the transformation network can again be repaired automatically
  - if the evolved schema is an extension of the original schema, then domain knowledge may be required (but again the network is evolved rather than regenerated)
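One way to picture "evolved rather than regenerated" for the simplest case, an evolved local schema that is equivalent to the original, is to prefix the existing pathway with the reverse of the evolution pathway. The sketch below shows only that idea, as an assumption about how the repair could look; contractions and extensions need further handling that is not shown. The schema constructs (`genre`, the single-step `old_to_global`) and the function names are hypothetical, and queries are plain strings.

```python
from typing import List, Tuple

Step = Tuple[str, str, str]  # ("add" | "del", construct, query as text)


def reverse(pathway: List[Step]) -> List[Step]:
    """Swap add and del and reverse the step order."""
    swap = {"add": "del", "del": "add"}
    return [(swap[op], construct, query) for op, construct, query in reversed(pathway)]


def evolve_local_pathway(old_to_global: List[Step], evolution: List[Step]) -> List[Step]:
    """Pathway from the evolved local schema to the unchanged global schema.

    The existing steps are reused as they are; only a prefix is added.
    """
    return reverse(evolution) + old_to_global


# Existing pathway from the old local schema to the global schema (hypothetical).
old_to_global: List[Step] = [
    ("add", "Prog", "[p | (p,c) ∈ category]"),
]

# Hypothetical evolution of the local schema: category is replaced by genre.
evolution: List[Step] = [
    ("add", "genre", "[(p,c) | (p,c) ∈ category]"),
    ("del", "category", "[(p,c) | (p,c) ∈ genre]"),
]

new_to_global = evolve_local_pathway(old_to_global, evolution)
for step in new_to_global:
    print(step)
```

The new pathway first restores category from genre, then drops genre, and then continues with the untouched original steps.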

The AutoMed Project (funded by EPSRC, at Birkbeck and Imperial College)
• The aims of the AutoMed project are to investigate:
  - how our theoretical framework can be practically applied to real data integration problems
  - how much of a mediator’s global query processing functionality can be automatically generated from our transformation pathways
  - evolutionary and heuristic techniques for schema improvement and global query optimisation

AutoMed Architecture
[Diagram components:]
  - Global Query Processor
  - Global Query Optimiser
  - Schema Evolution Tool
  - Schema Transformation and Integration Tool
  - Model Definition Tool
  - Schema and Transformation Repository
  - Model Definitions Repository

Query Processing and Optimisation
• We are handling query language heterogeneity by translation into/from a functional intermediate query language, IQL (Edgar Jasper)
• A query Q expressed in a high-level query language on a global schema S is first translated into IQL
• GAV view definitions are derived from the transformation pathways from the local schemas to S, and are used to reformulate the query into an IQL query over the local schema constructs
• A LAV query processing approach would also be possible
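The GAV reformulation step described above amounts to view unfolding: each reference to a global construct is replaced by its definition over local constructs. The following sketch shows the idea on a toy expression language of nested tuples. It is not IQL, and the view definitions, the source names (`db1.film`, `db2.movie`, `db1.documentary`) and the data are all made up for the example.

```python
from typing import Dict, Set


def unfold(expr: tuple, views: Dict[str, tuple]) -> tuple:
    """Replace every reference to a global construct by its view definition."""
    if expr[0] == "ref":
        name = expr[1]
        return unfold(views[name], views) if name in views else expr
    return (expr[0],) + tuple(unfold(sub, views) for sub in expr[1:])


def evaluate(expr: tuple, sources: Dict[str, Set]) -> Set:
    """Evaluate an unfolded expression directly against local extents."""
    kind = expr[0]
    if kind == "ref":
        return sources[expr[1]]
    left, right = evaluate(expr[1], sources), evaluate(expr[2], sources)
    if kind == "union":
        return left | right
    if kind == "intersect":
        return left & right
    raise ValueError(f"unknown operator: {kind}")


# Hypothetical GAV view definitions: global construct -> expression over local constructs.
views = {
    "Film": ("union", ("ref", "db1.film"), ("ref", "db2.movie")),
    "Doc": ("ref", "db1.documentary"),
}

# Made-up local extents.
sources = {
    "db1.film": {"Alien"},
    "db2.movie": {"Alien", "Heat"},
    "db1.documentary": {"Blue Planet"},
}

global_query = ("union", ("ref", "Film"), ("ref", "Doc"))
local_query = unfold(global_query, views)
print(local_query)
print(sorted(evaluate(local_query, sources)))
```

After unfolding, the query mentions only local constructs, so it can be split into sub-queries for the individual sources.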

Query Processing and Optimisation
• Query optimisation and query evaluation then occur
• Specific issues for query optimisation in AutoMed include:
  - optimising the view definitions derived from the transformation pathways, and
  - handling heterogeneous modelling constructs appearing within these view definitions
• For query evaluation, wrappers will undertake translation of IQL sub-queries into the local query language, and translation of results back into the IQL type system. Further post-processing is possible.

XML Data Sources
• As well as integration of structured data sources, we have done some preliminary work on translating and integrating XML data (CAiSE’01)
• We have defined a representation of XML in terms of the nodes, edges and constraints of the HDM
• We capture the ordering of XML elements by an order node and a hyperedge to it from the edge representing the parent-child relationship

Translating XML into HDM
[Figure: example graph with nodes labelled root, customer, name, number, account and order]
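The representation of XML as nodes, parent-child edges and ordering information can be sketched as follows. This is an instance-level simplification and not the actual HDM encoding: element text and attributes are ignored, the "order" information is stored as a list of (edge, position) pairs standing in for the hyperedge to the order node, and the sample document and the function name `xml_to_graph` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

Edge = Tuple[str, str]  # (parent node, child node)


def xml_to_graph(xml_text: str) -> Tuple[List[str], List[Edge], List[Tuple[Edge, int]]]:
    """Return nodes, parent-child edges, and (edge, position) ordering pairs."""
    nodes: List[str] = []
    edges: List[Edge] = []
    order: List[Tuple[Edge, int]] = []

    def visit(elem: ET.Element, path: str) -> None:
        nodes.append(path)
        for position, child in enumerate(elem, start=1):
            # The [n] suffix is the position among all siblings, not an XPath index.
            child_path = f"{path}/{child.tag}[{position}]"
            edge = (path, child_path)
            edges.append(edge)
            order.append((edge, position))  # links the edge to its ordering value
            visit(child, child_path)

    root = ET.fromstring(xml_text)
    visit(root, root.tag)
    return nodes, edges, order


# Made-up document using element names from the figure.
doc = "<root><customer><name>Ann</name><number>7</number></customer><account/></root>"
nodes, edges, order = xml_to_graph(doc)
for edge, position in order:
    print(position, edge)
```

Keeping the position on the edge rather than on the child node mirrors the slide's point that ordering is a property of the parent-child relationship.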

XML Data Sources
• We have also defined a set of primitive transformations on XML (in terms of the underlying transformations on the equivalent HDM representation)
• XML documents are then translated into a simple ER representation, which allows them to be integrated with each other and with other structured data sources
• The above has been implemented by Tanvir Faqueer
• He is now looking at automatic or semi-automatic transformation and integration of the ER models arising from XML documents

Unstructured Text Sources
• We are also working on extracting structure from unstructured text sources (Dean Williams)
• The aim here is to integrate information extracted from unstructured text with structured or semi-structured information available from other sources
• We are using existing IE technology (the GATE tool) for text annotation. Natural language and domain ontologies will extend these annotations
• The extracted information will be matched with existing information to derive new facts and perhaps new global schema constructs

Materialised integration
• As well as virtual integration of data sources, we are also investigating using the AutoMed framework for materialised integration, i.e. a data warehousing approach
• In particular, we are looking at incremental view maintenance and data lineage tracing using the AutoMed schema transformation pathways (Hao Fan)

Event-Condition-Action Rules for XML
• XML is becoming a standard means of storing and exchanging information on the Web
• XML repositories are increasingly being used in dynamic applications where actions need to be taken in a timely fashion in response to updates to the data
• Periodic querying is not sufficient: it may be too infrequent or too frequent
• Thus, there is a need for reactive functionality on XML repositories: event-condition-action (ECA) rules are a natural candidate

ECA Rules for XML
• ECA rules take the form: on event if condition do action
[Diagram components: Users/Apps, Event Detection, Condition Evaluation, Action Execution]

ECA Rules for XML
• We are currently developing an ECA rule language for XML, with James Bailey and Peter Wood (WWW’2002):
  ON INSERT path | DELETE path
  IF condition
  DO INSERT subdocument BELOW path | DELETE path
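The on/if/do shape of these rules can be illustrated with a very small rule engine. This is only a sketch of the general ECA pattern, not the rule language from the WWW’2002 paper: the store is a plain dictionary from paths to lists of subdocuments rather than an XML repository, paths are matched by string equality, and the class names, paths and the sample rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Store = Dict[str, List[str]]  # path -> subdocuments stored below that path


@dataclass
class Rule:
    event: str                           # "INSERT" or "DELETE"
    path: str                            # path on which the event is watched
    condition: Callable[[Store], bool]   # evaluated after the update is applied
    action: Callable[["Engine"], None]   # may itself insert or delete


class Engine:
    def __init__(self, rules: List[Rule]) -> None:
        self.store: Store = {}
        self.rules = rules

    def insert(self, path: str, subdocument: str) -> None:
        self.store.setdefault(path, []).append(subdocument)
        self._fire("INSERT", path)

    def delete(self, path: str) -> None:
        self.store.pop(path, None)
        self._fire("DELETE", path)

    def _fire(self, event: str, path: str) -> None:
        # Event detection, then condition evaluation, then action execution.
        for rule in self.rules:
            if rule.event == event and rule.path == path and rule.condition(self.store):
                rule.action(self)


# ON INSERT /orders IF more than two orders DO INSERT an alert BELOW /alerts
rules = [
    Rule(
        event="INSERT",
        path="/orders",
        condition=lambda store: len(store["/orders"]) > 2,
        action=lambda engine: engine.insert("/alerts", "<alert>bulk</alert>"),
    )
]

engine = Engine(rules)
for n in range(3):
    engine.insert("/orders", f"<order id='{n}'/>")
print(engine.store["/alerts"])
```

Because an action is itself an update, it can trigger further rules; a real system has to consider whether such cascades terminate, which this sketch does not check.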

SeLeNe: Self e-Learning Networks (EU FP5)
• We are planning to extend this work to ECA rules on RDF, as part of the SeLeNe project
• SeLeNe is a technical feasibility study in using Semantic Web technology for dynamically integrating metadata from heterogeneous and autonomous learning resources, and for creating personalised views over this Knowledge Grid.
• ECA rules will be used for incremental maintenance of derived learning objects defined as views over source learning objects