1 Schema Mediation and Query Processing in Peer Data Management Systems Presenter: Jie Zhao Supervisor: Rachel Pottinger Sept. 29, 2006.

Presentation transcript:

1 Schema Mediation and Query Processing in Peer Data Management Systems Presenter: Jie Zhao Supervisor: Rachel Pottinger Sept. 29, 2006

2 Preliminaries
Datalog
  • Q(x) :- Airport(x, Vancouver)   (head :- body)
  • Example relation Airport:
      Code  City
      SEA   Seattle
      YVR   Vancouver
Mapping for heterogeneous schemas
  • Correspondences between two schemas
  • A medium for exchanging data, transferring queries, etc.
PDMS (Peer Data Management System)
  • Each peer has a database
  • Peers can leave or join the network voluntarily
  • Mappings between some peers are provided
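The Datalog query on this slide can be read as a filter over the Airport relation. A minimal Python sketch, assuming the relation is held as a set of (code, city) tuples; the function name q and its city parameter are illustrative and not part of the talk:

```python
# Airport(Code, City) facts from the slide's example table.
AIRPORT = {("SEA", "Seattle"), ("YVR", "Vancouver")}


def q(airport_facts, city="Vancouver"):
    """Evaluate Q(x) :- Airport(x, city).

    The body Airport(x, city) is matched against every fact;
    each match binds x, and the head Q(x) collects the bindings.
    """
    return {code for (code, fact_city) in airport_facts if fact_city == city}


print(q(AIRPORT))
```

The constant in the body (Vancouver) acts as a selection; the variable x in the head acts as a projection.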

3 A general query answering case in PDMS
[Diagram] Three peers, UBC, UW, and UT, each with a Local Database and a Local Schema. Mapping UBC_UW relates the UBC and UW schemas; Mapping UW_UT relates the UW and UT schemas.

4 A general query answering case in PDMS
[Diagram] Three peers, UBC, UW, and UT, each with a Local Database and a Local Schema. Mapping UBC_UW relates the UBC and UW schemas; Mapping UW_UT relates the UW and UT schemas.
Queries shown: Query Q over UBC, Query Q' over UW, Query Q'' over UT

5 Previous methods can only access the local schema
UW (Local Database UW, Local Schema UW): assume relation conf-paper(title, venue, year, pages)
UBC (Local Database UBC, Local Schema UBC): assume relation conf-paper(title, venue, year, URL)
The two schemas are related by Mapping UW_UBC
Query that a UW user can ask: q(x) :- conf-paper(t, v, y, x).
A UW user can never ask for information about URL!

6 What we'd like to improve…
Want to access more information, e.g. URL
Get rid of the restrictive query format, e.g. local schema only
Improve the comprehensibility of the PDMS
Reconsider the difficulties and complexity raised by mapping composition
Make good use of indirect mapping information
We have a method for mediated schema creation in PDMS that solves all of these

7 Challenges
How to create the mediated schema without a centralized authority?
How to arrive at the same mediated schema wherever mediation starts?
How can an automatically created mediated schema be comprehensible to users?
How can human intervention be minimized?
Where to store the mediated schema, and how to update it?

8 Related Work
Bernstein et al.: a vision for incorporating database research into the P2P scenario
Piazza project: provides a complete prototype for query answering in PDMS
Fagin et al.: use second-order (SO) logic as the mapping language
HePToX: XQuery reformulation
Hyperion: uses both data-level and schema-level mappings to specify the correspondences between acquainted peers
PeerDB: uses keywords as the basis for relation matching

9 Outline
Semantics in Conjunctive Mappings
Peer Schema Mediation
Updating the Mediated Schema
A Study of Mapping Composition
Experimental Study

10 Introducing concept into conjunctive mappings
A conjunctive mapping is in the following form:
  conf-paper(title,venue,yr) :- UW.conf-paper(title,venue,yr,pages)
  conf-paper(title,venue,yr) :- UBC.conf-paper(title,venue,yr,URL)
  • IDB name: “conf-paper”
  • Component: each Datalog query above is a component
  • Subgoal: each relation in the body, e.g. “UW.conf-paper(title,venue,yr,pages)”

11 Introducing concept into conjunctive mappings (Cont.)
Intuitively, a concept describes the common object across different schemas
Informally, two mappings CM1 and CM2 have the same concept if:
  • CM1 and CM2 have the same IDB names
  • The queries Q1 and Q2 constructed from the overlapping subgoals of CM1 and CM2 are equivalent
  • Subgoals should be compatible

12 Introducing concept into conjunctive mappings (Cont.)
Mappings that express the same concept:
  • Mapping 1, from UW to UBC:
      Paper(title,venue) :- UW.paper(title,venue,yr,pages)
      Paper(title,venue) :- UBC.paper(title,venue,author,URL)
  • Mapping 2, from UBC to UT:
      Paper(title,author) :- UBC.paper(title,venue,author,URL)
      Paper(title,author) :- UT.paper(title,author,area)
Mappings that do not express the same concept:
  • Mapping 1, from A to B:
      Manager(x, y) :- A.Mgr(x, y)
      Manager(x, y) :- B.Mgr1(x, y)
  • Mapping 2, from B to C:
      Manager(x) :- B.Mgr1(x, x)
      Manager(x) :- C.SelfMgr(x)
Mapping compatibility check before merge
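The difference between the two pairs of mappings above can be illustrated with a small Python sketch. This is a simplified stand-in for the compatibility check, not the algorithm from the talk: it assumes each mapping is a dict holding an IDB name and its subgoals, and it approximates "the queries over the overlapping subgoals are equivalent" by comparing the variable-repetition pattern of each shared subgoal. The names arg_pattern and same_concept are hypothetical.

```python
def arg_pattern(args):
    """Canonical form of a subgoal's arguments.

    Each variable is replaced by the index of its first occurrence,
    so (x, y) -> (0, 1) and (x, x) -> (0, 0).
    """
    first_seen = {}
    return tuple(first_seen.setdefault(a, len(first_seen)) for a in args)


def same_concept(m1, m2):
    """Simplified check (assumption): same IDB name, at least one shared
    subgoal, and every shared subgoal uses its variables in the same pattern."""
    if m1["idb"] != m2["idb"]:
        return False
    shared = m1["subgoals"].keys() & m2["subgoals"].keys()
    if not shared:
        return False
    return all(
        arg_pattern(m1["subgoals"][r]) == arg_pattern(m2["subgoals"][r])
        for r in shared
    )


# The Paper mappings from the slide (UW-UBC and UBC-UT).
paper_1 = {"idb": "Paper", "subgoals": {
    "UW.paper": ("title", "venue", "yr", "pages"),
    "UBC.paper": ("title", "venue", "author", "URL")}}
paper_2 = {"idb": "Paper", "subgoals": {
    "UBC.paper": ("title", "venue", "author", "URL"),
    "UT.paper": ("title", "author", "area")}}

# The Manager mappings from the slide (A-B and B-C).
mgr_1 = {"idb": "Manager", "subgoals": {
    "A.Mgr": ("x", "y"), "B.Mgr1": ("x", "y")}}
mgr_2 = {"idb": "Manager", "subgoals": {
    "B.Mgr1": ("x", "x"), "C.SelfMgr": ("x",)}}

print(same_concept(paper_1, paper_2))
print(same_concept(mgr_1, mgr_2))
```

In the Manager pair the shared subgoal B.Mgr1 appears once as (x, y) and once as (x, x); the second use restricts the relation to self-managers, so the two mappings describe different objects even though the IDB name matches.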

13 Outline
Semantics in Conjunctive Mappings
Peer Schema Mediation
Updating the Mediated Schema
A Study of Mapping Composition
Experimental Study

14 Pottinger's Schema Mediation Algorithm for DIS
  • Base of our approach
[Diagram] Local Database UW and Local Database UBC, with Local Schema UW and Local Schema UBC related by Mapping UW_UBC. Mediated Schema M is related to the local schemas by Mapping M_UBC and Mapping M_UW.

15 Peer Schema Mediation – How the system works

16 Schema Mediation Strategy
As explained in the previous slide
Merging two schemas is based on MappingTables

17 MappingTable creation
Purpose:
  • Relate a relation in M for a concept to the subgoals from the mappings
  • Transform unstructured mapping information into a structured form
  • The original mappings are easy to reconstruct from the MappingTables
  • Indirect mapping information is easy to represent in a MappingTable and hard to represent with mappings alone
  • Example: [figure]

18 Merge Two MappingTables
The MappingTable merging process follows these general principles:
  • Related attributes should be positioned in the same column
  • Unrelated attributes are in different columns
  • Overlapping local relations in the two MappingTables are how we determine the indirect mapping information
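These merging principles can be sketched in Python. The layout used here is an assumption, since the transcript does not include the slides' MappingTable figures: a table is a dict from a local relation name to a row of attribute names, one entry per column, with None where the relation has no attribute in that column. The function name merge_mapping_tables is hypothetical, and the input reuses the Paper mappings from slide 12.

```python
def merge_mapping_tables(t1, t2):
    """Merge two tables that share at least one local relation.

    A column of t2 is placed in an existing column of t1 when an
    overlapping relation holds the same attribute in both; otherwise
    it becomes a new column.
    """
    shared = [r for r in t1 if r in t2]
    if not shared:
        raise ValueError("no overlapping local relation to align on")

    width = len(next(iter(t1.values())))
    col_of = {}  # column index in t2 -> column index in the merged table
    for j in range(len(next(iter(t2.values())))):
        for r in shared:
            attr = t2[r][j]
            if attr is not None and attr in t1[r]:
                col_of[j] = t1[r].index(attr)  # related: same column
                break
        else:
            col_of[j] = width  # unrelated: new column
            width += 1

    # Pad t1's rows to the new width, then place t2's attributes.
    merged = {r: list(row) + [None] * (width - len(row)) for r, row in t1.items()}
    for r, row in t2.items():
        target = merged.setdefault(r, [None] * width)
        for j, attr in enumerate(row):
            if attr is not None:
                target[col_of[j]] = attr
    return merged


m1 = {"UW.paper": ["title", "venue"], "UBC.paper": ["title", "venue"]}
m2 = {"UBC.paper": ["title", "author"], "UT.paper": ["title", "author"]}
m3 = merge_mapping_tables(m1, m2)
for relation, row in m3.items():
    print(relation, row)
```

In the merged table, UW.paper and UT.paper share the title column even though no direct mapping relates UW and UT; that is the indirect mapping information recovered through the overlapping relation UBC.paper.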

19 Merge Two MappingTables (Cont.) M3: result of merging M1 and M2

20 Compute GLAV Mappings for Each Local Peer

22 Query Reformulation
Reformulate queries in both directions
  • Q over E → Q' over M
  • Q' over M → Q over E

23 Information that each peer maintains in the system set-up phase
Each peer stores:
  • E's local database schema
  • A list of mappings between E and its acquaintances
  • The current version of the mediated schema M
  • The set of MappingTables corresponding to M
  • GLAV mappings from M to E
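The per-peer state listed on this slide can be pictured as a single record. A minimal Python sketch; the class name, the field names, and the container types are all assumptions made for illustration, not the MePSys data model:

```python
from dataclasses import dataclass, field


@dataclass
class PeerState:
    """State a peer E keeps after system set-up (hypothetical layout)."""
    # E's local database schema: relation name -> attribute names.
    local_schema: dict
    # Mappings between E and its acquaintances: peer id -> list of mappings.
    acquaintance_mappings: dict = field(default_factory=dict)
    # Current version of the mediated schema M.
    mediated_schema: dict = field(default_factory=dict)
    # MappingTables corresponding to M.
    mapping_tables: list = field(default_factory=list)
    # GLAV mappings from M to E.
    glav_mappings: list = field(default_factory=list)


uw = PeerState(local_schema={"conf-paper": ["title", "venue", "year", "pages"]})
uw.acquaintance_mappings["UBC"] = ["Mapping UW_UBC"]
print(uw)
```

Every item is stored at the peer itself, which fits the talk's goal of creating the mediated schema without a centralized authority.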

24 Outline
Semantics in Conjunctive Mappings
Peer Schema Mediation
Updating the Mediated Schema
A Study of Mapping Composition
Experimental Study

25 Adding a Peer to the Network
Some peers build applications over M after the system setup phase
When a new peer joins, M will change; how should the already-built applications be handled?
  • Keep the transformation information so that old applications remain usable
[Figure panels] (a) Right after the system setup phase; (b) Sometime later, D joins…

26 Dropping a Peer from the Network
Strategy One: a peer's leaving the network triggers a schema mediation process from the very beginning
  • BAD: too much system work is spent on schema mediation alone
Strategy Two: redo the schema mediation once every assigned period
  • Two ways to know X is leaving:
    1. X notifies any other node before departure
    2. Another peer pings or communicates with X
  • BAD: the previously created mediated schema will be useless
Strategy Three:
  • X leaves without notifying others
  • X's acquaintance Y will recognize X's leaving
  • Y computes the new mediated schema
  • BAD: Y needs to be able to recognize which relations in the MappingTable come from X
Peers can easily lose connection with others

27 Dropping a Peer from the Network (Cont.)
Strategy Four: X wants to leave:
  • X calculates a new mediated schema
  • X assigns its acquaintance another acquaintance from its acquaintance list
  • “Removal” operator: given M and the peer X that is to be removed, compute the remaining part
  • The removed part can be relations or attributes in relations
Good because:
  • All previously constructed applications can still be available
  • All peers are still connected
  • No redundant work results: mediation does not start from the beginning
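The effect of the “Removal” operator can be sketched in Python. This is an illustration under assumptions, not the operator defined in the talk: a MappingTable is taken to be a dict from local relation name to a row of attribute names (None where a relation has no attribute in a column), relation names are assumed to be prefixed with their peer's name, and the function name remove_peer is hypothetical.

```python
def remove_peer(table, peer):
    """Drop the rows contributed by `peer`, then drop any column that
    no remaining relation populates."""
    kept = {r: row for r, row in table.items() if not r.startswith(peer + ".")}
    if not kept:
        return {}
    width = len(next(iter(kept.values())))
    # A column survives only if some remaining relation has an attribute in it.
    live = [c for c in range(width)
            if any(row[c] is not None for row in kept.values())]
    return {r: [row[c] for c in live] for r, row in kept.items()}


# Hypothetical table with columns title, venue, author, area.
table = {
    "UW.paper": ["title", "venue", None, None],
    "UBC.paper": ["title", "venue", "author", None],
    "UT.paper": ["title", None, "author", "area"],
}
remaining = remove_peer(table, "UT")
for relation, row in remaining.items():
    print(relation, row)
```

Removing UT drops both a relation (UT.paper) and an attribute (the area column, which only UT populated), matching the slide's remark that the removed part can be relations or attributes in relations. The rows for UW and UBC are computed from the existing table rather than by mediating again from the beginning.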

28 Information that each peer maintains in the system-steady state
Each peer stores the following information:
  • Local schema
  • Mappings to its acquaintances
  • Current mediated schema, MappingTables, and mappings to its own schema
  • Previous versions of the mediated schema that the local peer has applications built on, and mappings to the new mediated schema

29 Outline
Semantics in Conjunctive Mappings
Peer Schema Mediation
Updating the Mediated Schema
A Study of Mapping Composition
Experimental Study

30 A Study of Mapping Composition
MePSys restricts its input mappings:
  • Mappings must have the same Concept
  • Complicating factors such as self-join and self-restrictive components are ignored
Our approach transforms the problem of mapping composition into another: using the mediated schema to relate different schemas

31 Some facts
[Madhavan and Halevy] The number of composed mappings does not depend on the number of input mappings
[Madhavan and Halevy] The composition of finite mappings may result in an infinite set of composed mappings
[Fagin et al.] The composition of two mappings in first-order logic might not be expressible in first-order logic

32 Analysis for the Study
We compared Piazza, the SO logic algorithm, and MePSys
Whether the Piazza method is expressive depends entirely on whether existential attributes in the second schema are mapped to the third schema
The second-order logic mapping composition algorithm can handle cases with composed non-identical self-join components
  • However, results are hard to understand
MePSys does not handle patterns with self-restrictive components
  • Mappings in such patterns do not support concepts
MePSys has yet to realize the mediation of schemas if mappings contain composed non-identical self-join components
Aside from these two special groups of patterns, using the mediated schema to relate different sources is decidable.

33 Outline
Semantics in Conjunctive Mappings
Peer Schema Mediation
Updating the Mediated Schema
A Study of Mapping Composition
Experimental Study

34 System Settings
FreePastry
  • A P2P network layer, using an efficient routing strategy
  • Each node maintains a routing table
  • Keeps track of its immediate neighbors
  • Provides the functionality of notifying applications of message arrival, node failures, etc.
Emulab
  • Network emulation testbed
  • Access to different machines to emulate nodes in a real network
  • 900M memory with MHz processor
Input schemas and mappings
  • Input schema follows the TPC-H standard
  • Avg num of acquaintances per peer
  • Avg num of relations per peer schema
  • Avg num of attributes in a relation

35 Experiment 1: Schema Mediation in MePSys

36 Experiment 2: Query Reformulation For queries with similar size (less than 1k), time can be decidable

37 Experiment 2: Query Reformulation (Cont.)
In the maximum case, 10 rounds of query reformulation take only 2% of the total time

38 Experiment 3: Updating the Mediated Schema
Computing a new mediated schema always takes less than 2% of the total time
Updating takes almost no time

39 Our contributions
MePSys, in which a mediated schema is created dynamically and any information in the network can be queried without additional global services
An efficient algorithm, PSM, to create a mediated schema in a PDMS and to create mappings to local sources
The idea of automatically detecting specific Concepts in mappings
A study of how mapping composition impacts query reformulation with existing approaches
A solution to the problem of updating the mediated schema
Experiments on the efficiency and scalability of MePSys

40 Future Work
Explore the semantic issues when a broader range of mappings is considered, e.g., mappings with self-joins, mappings with different IDB names, etc.
Consider more optimization issues in the future system
Design a better approach to updating the mediated schema for local schema evolution

41 Acknowledgement

42 Thank you! Questions?