1 Schema Mediation and Query Processing in Peer Data Management Systems Presenter: Jie Zhao Supervisor: Rachel Pottinger Sept. 29, 2006.

2 Preliminaries
Datalog
  Q(x) :- Airport(x, Vancouver), where Q(x) is the head and Airport(x, Vancouver) is the body
  Example relation Airport(Code, City): (SEA, Seattle), (YVR, Vancouver)
Mapping for heterogeneous schemas
  Correspondences between two schemas
  A medium for exchanging data, transferring queries, etc.
PDMS (Peer Data Management System)
  Each peer has a database
  Peers can leave or join the network voluntarily
  Mappings between some peers are provided
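To make the Datalog example concrete, here is a minimal Python sketch that evaluates Q(x) :- Airport(x, Vancouver). It assumes the Airport relation has the two columns Code and City and the two rows shown on the slide; the function name q is illustrative only.

```python
# Airport(Code, City) as a set of tuples, matching the slide's example rows.
airport = {("SEA", "Seattle"), ("YVR", "Vancouver")}


def q(airport_rel):
    """Q(x) :- Airport(x, Vancouver): codes of airports whose city is Vancouver."""
    return {code for (code, city) in airport_rel if city == "Vancouver"}


answers = q(airport)
print(answers)
```

The body Airport(x, Vancouver) becomes the filter on the City column, and the head Q(x) becomes the projection onto the Code column.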

3 A general query answering case in PDMS
[Figure: peers UBC, UW, and UT, each with a local database and a local schema; Mapping UBC_UW relates UBC and UW, and Mapping UW_UT relates UW and UT]

4 A general query answering case in PDMS
[Figure: peers UBC, UW, and UT, each with a local database and a local schema; Mapping UBC_UW relates UBC and UW, and Mapping UW_UT relates UW and UT; Query Q over UBC, Query Q' over UW, Query Q'' over UT]

5 Previous methods can only access the local schema
[Figure: peers UW and UBC, each with a local database and a local schema, related by Mapping UW_UBC]
Assume the UW relation: conf-paper(title, venue, year, pages)
Assume the UBC relation: conf-paper(title, venue, year, URL)
Query that a UW user can ask: q(x) :- conf-paper(t, v, y, x).
A UW user can never ask for information about URL!

6 What we'd like to improve
Access more information, e.g. URL
Get rid of the restrictive query format, e.g. local schema only
Improve the comprehensibility of the PDMS
Reconsider the difficulties and complexity raised by mapping composition
Make good use of indirect mapping information
We have a method for mediated schema creation in PDMS that solves all of these

7 Challenges
How can the mediated schema be created without a centralized authority?
How can we arrive at the same mediated schema wherever mediation starts?
How can an automatically created mediated schema be comprehensible to users?
How can human intervention be minimized?
Where should the mediated schema be stored, and how should it be updated?

8 Related Work
Bernstein et al.: a vision for incorporating database research into the P2P scenario
Piazza project: provides a complete prototype for query answering in PDMS
Fagin et al.: use second-order (SO) logic as the mapping language
HePToX: XQuery reformulation
Hyperion: uses both data-level and schema-level mappings to specify the correspondences between acquainted peers
PeerDB: uses keywords as the basis for relation matching

9 Outline
Semantics in Conjunctive Mappings
Peer Schema Mediation
Updating the Mediated Schema
A Study of Mapping Composition
Experimental Study

10 Introducing concept into conjunctive mappings
A conjunctive mapping has the following form:
  conf-paper(title,venue,yr) :- UW.conf-paper(title,venue,yr,pages)
  conf-paper(title,venue,yr) :- UBC.conf-paper(title,venue,yr,URL)
IDB name: "conf-paper"
Component: each Datalog query above is a component
Subgoal: each relation in the body, e.g. "UW.conf-paper(title,venue,yr,pages)"
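The terminology above (IDB name, component, subgoal) can be illustrated with a small Python sketch. The class names Subgoal and Component, the helper evaluate_single_subgoal, and the sample tuples are all hypothetical and introduced only for illustration; the sketch handles only components whose body has a single subgoal, as in the slide's example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subgoal:
    relation: str   # e.g. "UW.conf-paper"
    args: tuple     # variable names in positional order


@dataclass(frozen=True)
class Component:
    idb_name: str   # name of the head predicate
    head_vars: tuple
    body: tuple     # tuple of Subgoal


# The conjunctive mapping from the slide: two components with the same IDB name.
mapping = (
    Component("conf-paper", ("title", "venue", "yr"),
              (Subgoal("UW.conf-paper", ("title", "venue", "yr", "pages")),)),
    Component("conf-paper", ("title", "venue", "yr"),
              (Subgoal("UBC.conf-paper", ("title", "venue", "yr", "URL")),)),
)


def evaluate_single_subgoal(component, database):
    """Evaluate a one-subgoal component by projecting onto the head variables."""
    (subgoal,) = component.body
    positions = [subgoal.args.index(v) for v in component.head_vars]
    rows = database.get(subgoal.relation, set())
    return {tuple(row[p] for p in positions) for row in rows}


# Hypothetical sample data for the two peers.
database = {
    "UW.conf-paper": {("Paper A", "VLDB", 2005, "1-12")},
    "UBC.conf-paper": {("Paper A", "VLDB", 2005, "http://example.org/a"),
                       ("Paper B", "SIGMOD", 2006, "http://example.org/b")},
}

idb_names = {c.idb_name for c in mapping}
views = {c.body[0].relation: evaluate_single_subgoal(c, database) for c in mapping}
print(idb_names, views)
```

Each component projects its peer's relation onto the shared attributes (title, venue, yr), so pages and URL do not appear in the head.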

11 Introducing concept into conjunctive mappings (Cont.)
Intuitively, a concept describes the common object across different schemas
Informally, two mappings CM1 and CM2 have the same concept if:
  CM1 and CM2 have the same IDB name
  The queries Q1 and Q2 constructed from the overlapping subgoals of CM1 and CM2 are equivalent
  The subgoals are compatible

12 Introducing concept into conjunctive mappings (Cont.)
Mappings that express the same concept:
  Mapping 1, from UW to UBC:
    Paper(title,venue) :- UW.paper(title,venue,yr,pages)
    Paper(title,venue) :- UBC.paper(title,venue,author,URL)
  Mapping 2, from UBC to UT:
    Paper(title,author) :- UBC.paper(title,venue,author,URL)
    Paper(title,author) :- UT.paper(title,author,area)
Mappings that do not express the same concept:
  Mapping 1, from A to B:
    Manager(x, y) :- A.Mgr(x, y)
    Manager(x, y) :- B.Mgr1(x, y)
  Mapping 2, from B to C:
    Manager(x) :- B.Mgr1(x, x)
    Manager(x) :- C.SelfMgr(x)
Mapping compatibility is checked before merging
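The two examples above can be told apart with a deliberately simplified check, sketched below in Python. This is not the presentation's algorithm: it assumes that two mappings can share a concept only if they have the same IDB name and use every overlapping relation with the same pattern of repeated variables, and it does not test query equivalence. The function names and the dictionary layout are hypothetical.

```python
def repetition_pattern(args):
    """Map each argument to the index of its first occurrence, e.g. (x, x) -> (0, 0)."""
    first_seen = {}
    return tuple(first_seen.setdefault(a, i) for i, a in enumerate(args))


def may_share_concept(m1, m2):
    """Simplified necessary condition, not a full concept test (see assumptions)."""
    if m1["idb"] != m2["idb"]:
        return False
    shared = set(m1["subgoals"]) & set(m2["subgoals"])
    if not shared:
        return False
    return all(
        repetition_pattern(m1["subgoals"][rel]) == repetition_pattern(m2["subgoals"][rel])
        for rel in shared
    )


paper_1 = {"idb": "Paper", "subgoals": {
    "UW.paper": ("title", "venue", "yr", "pages"),
    "UBC.paper": ("title", "venue", "author", "URL")}}
paper_2 = {"idb": "Paper", "subgoals": {
    "UBC.paper": ("title", "venue", "author", "URL"),
    "UT.paper": ("title", "author", "area")}}

manager_1 = {"idb": "Manager", "subgoals": {
    "A.Mgr": ("x", "y"), "B.Mgr1": ("x", "y")}}
manager_2 = {"idb": "Manager", "subgoals": {
    "B.Mgr1": ("x", "x"), "C.SelfMgr": ("x",)}}

same_paper = may_share_concept(paper_1, paper_2)
same_manager = may_share_concept(manager_1, manager_2)
print(same_paper, same_manager)
```

In the Manager example, B.Mgr1(x, x) restricts both attributes to be equal while B.Mgr1(x, y) does not, so the shared relation is used incompatibly and the check rejects the pair.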

13 Outline
Semantics in Conjunctive Mappings
Peer Schema Mediation
Updating the Mediated Schema
A Study of Mapping Composition
Experimental Study

14 Pottinger's Schema Mediation Algorithm for DIS
The base of our approach
[Figure: peers UW and UBC, each with a local database and a local schema, related by Mapping UW_UBC; Mediated Schema M is related to the local schemas by Mapping M_UBC and Mapping M_UW]

15 Peer Schema Mediation – How the system works

16 Schema Mediation Strategy
As explained in the previous slide
Merging two schemas is based on MappingTables

17 MappingTable creation
Purpose:
  Relate a relation in M for a concept to the subgoals from the mappings
  Transform unstructured mapping information into a structured form
  Make it easy to reconstruct the original mappings from the MappingTables
  Represent indirect mapping information, which is easy in a MappingTable but hard using mappings alone
Example: [MappingTable figure]

18 Merge Two MappingTables
The MappingTable merging process follows these general principles:
  Related attributes are positioned in the same column
  Unrelated attributes are placed in different columns
  Overlapping local relations in the two MappingTables determine the indirect mapping information

19 Merge Two MappingTables (Cont.) M3: result of merging M1 and M2
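The merging principles can be sketched in Python under strong assumptions. The slides do not show the MappingTable layout in text form, so the representation below, a dictionary from local relation name to a {column label: local attribute} row, is hypothetical, as are the function names and the column labels T, A, V, U used for M2. The example reuses the Paper mappings among UW, UBC, and UT.

```python
def merge_mapping_tables(m1, m2):
    """Merge two tables of the assumed form {relation: {column: local_attribute}}."""
    overlap = set(m1) & set(m2)
    # Align M2 columns with M1 columns wherever an overlapping relation
    # places the same local attribute in both tables (related attributes).
    rename = {}
    for rel in overlap:
        attr_to_col1 = {attr: col for col, attr in m1[rel].items()}
        for col2, attr in m2[rel].items():
            if attr in attr_to_col1:
                rename[col2] = attr_to_col1[attr]
    used = {col for row in m1.values() for col in row}
    merged = {rel: dict(row) for rel, row in m1.items()}
    for rel, row in m2.items():
        target = merged.setdefault(rel, {})
        for col2, attr in row.items():
            col = rename.get(col2)
            if col is None:
                # Unrelated attribute: give it a column of its own.
                col = col2
                while col in used:
                    col += "'"
                rename[col2] = col
                used.add(col)
            target[col] = attr
    return merged


def columns_of(table):
    return {col for row in table.values() for col in row}


m1 = {  # from the UW-UBC mapping
    "UW.paper": {"title": "title", "venue": "venue", "yr": "yr", "pages": "pages"},
    "UBC.paper": {"title": "title", "venue": "venue", "author": "author", "URL": "URL"},
}
m2 = {  # from the UBC-UT mapping, with its own column labels
    "UBC.paper": {"T": "title", "A": "author", "V": "venue", "U": "URL"},
    "UT.paper": {"T": "title", "A": "author", "area": "area"},
}

m3 = merge_mapping_tables(m1, m2)
print(m3["UT.paper"], sorted(columns_of(m3)))
```

Because UBC.paper appears in both tables, UT.paper.title ends up in the same column as UW.paper.title even though no direct UW-UT mapping was given, which is the indirect mapping information the principles refer to.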

20 Compute GLAV Mappings for Each Local Peer

22 Query Reformulation
Reformulate queries in both directions:
  Q over E → Q' over M
  Q' over M → Q over E

23 Information that each peer maintains in the system set-up phase
Each peer E stores:
  E's local database schema
  A list of mappings between E and its acquaintances
  The current version of the mediated schema M
  The set of MappingTables corresponding to M
  GLAV mappings from M to E

25 Adding a Peer to the Network
Some peers build applications over M after the system setup phase
When a new peer joins, M changes. How should the already-built applications be handled?
  Keep the transformation information so that old applications remain usable
[Figure: (a) Right after the system setup phase; (b) Sometime later, D joins]

26 Dropping a Peer from the Network
Strategy One: A peer leaving the network triggers schema mediation from the very beginning
  BAD: too much system work is spent on schema mediation alone
Strategy Two: Redo the schema mediation once every assigned period
  Two ways to know X is leaving:
    1. X notifies another node before departure
    2. Another peer pings or communicates with X
  BAD: the previously created mediated schema becomes useless
Strategy Three:
  X leaves without notifying others
  X's acquaintance Y recognizes that X has left
  Y computes the new mediated schema
  BAD: Y needs to be able to recognize which relations in the MappingTable come from X
  Peers can easily lose connection with others

27 Dropping a Peer from the Network (Cont.)
Strategy Four: X wants to leave:
  X calculates a new mediated schema
  X assigns its acquaintance another acquaintance from its acquaintance list
  "Removal" operator: given M and the peer X that is to be removed, compute the remaining part
  The removed part can be relations or attributes in relations
Good because:
  All previously constructed applications remain available
  All peers are still connected
  No redundant work results: mediation does not restart from the beginning
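A rough Python sketch of what a "Removal" operator could look like follows. It assumes, hypothetically, that the mediated schema is backed by MappingTables of the form {mediated relation: {local relation: {column: local attribute}}} and that local relation names are prefixed with the peer name; neither detail is confirmed by the slides. The Review relation is invented for the example.

```python
def remove_peer(mapping_tables, peer):
    """Drop every row contributed by `peer`; drop tables that become empty."""
    prefix = peer + "."
    remaining = {}
    for mediated_rel, table in mapping_tables.items():
        kept = {rel: dict(row) for rel, row in table.items()
                if not rel.startswith(prefix)}
        if kept:  # a mediated relation with no remaining source is removed
            remaining[mediated_rel] = kept
    return remaining


def columns_of(table):
    """Attributes of a mediated relation = columns still used by some row."""
    return {col for row in table.values() for col in row}


tables = {
    "Paper": {
        "UW.paper": {"title": "title", "venue": "venue", "yr": "yr", "pages": "pages"},
        "UBC.paper": {"title": "title", "venue": "venue", "author": "author", "URL": "URL"},
        "UT.paper": {"title": "title", "author": "author", "area": "area"},
    },
    # Hypothetical relation that only UW contributes to.
    "Review": {"UW.review": {"title": "title", "score": "score"}},
}

after_uw_leaves = remove_peer(tables, "UW")
print(sorted(after_uw_leaves), sorted(columns_of(after_uw_leaves["Paper"])))
```

Here UW's departure removes a whole relation (Review) and two attributes of Paper (yr and pages), matching the statement that the removed part can be relations or attributes in relations.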

28 Information that each peer maintains in the system-steady state
Each peer stores the following information:
  Local schema
  Mappings to its acquaintances
  Current mediated schema, MappingTables, and mappings to its own schema
  Previous versions of the mediated schema on which the local peer has applications built, and mappings from them to the new mediated schema
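The per-peer state listed above can be pictured as a small record type. The Python sketch below is illustrative only: the class PeerState, its field names, the version counter, and the method install_new_schema are assumptions, not part of MePSys as presented. It archives an old mediated schema only when local applications were built on it.

```python
from dataclasses import dataclass, field


@dataclass
class PeerState:
    local_schema: dict
    acquaintance_mappings: dict        # acquaintance name -> mapping
    mediated_schema: dict
    mapping_tables: dict
    glav_mappings: list                # mappings from M to the local schema
    version: int = 0
    apps_by_version: dict = field(default_factory=dict)  # version -> app names
    archived: dict = field(default_factory=dict)  # version -> (old M, mapping to new M)

    def install_new_schema(self, new_schema, new_tables, new_glav, old_to_new):
        """Switch to a new mediated schema, keeping the old one if apps depend on it."""
        if self.apps_by_version.get(self.version):
            self.archived[self.version] = (self.mediated_schema, old_to_new)
        self.mediated_schema = new_schema
        self.mapping_tables = new_tables
        self.glav_mappings = new_glav
        self.version += 1


peer = PeerState(
    local_schema={"paper": ["title", "venue", "yr", "pages"]},
    acquaintance_mappings={"UBC": "Mapping UW_UBC"},
    mediated_schema={"Paper": ["title", "venue"]},
    mapping_tables={},
    glav_mappings=[],
)
peer.apps_by_version[0] = ["report-app"]  # hypothetical application built on version 0

peer.install_new_schema({"Paper": ["title", "venue", "author"]}, {}, [], "v0 -> v1")
peer.install_new_schema({"Paper": ["title", "venue", "author", "area"]}, {}, [], "v1 -> v2")
print(peer.version, sorted(peer.archived))
```

Version 0 is archived because an application depends on it, while version 1 is discarded because nothing was built on it.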

30 A Study of Mapping Composition
MePSys considers only input mappings that:
  Have the same concept
  Do not involve such complicating factors as self-join and self-restrictive components
Our approach transforms the problem of mapping composition into another problem: using the mediated schema to relate different schemas

31 Some facts
[Madhavan and Halevy] The number of composed mappings does not depend on the number of input mappings
[Madhavan and Halevy] The composition of finite mappings may result in an infinite set of composed mappings
[Fagin et al.] The composition of two mappings in first-order logic might not be expressible in first-order logic

32 Analysis for the Study
We compared Piazza, the SO logic algorithm, and MePSys
Whether the Piazza method is expressive depends entirely on whether existential attributes in the second schema are mapped to the third schema
The second-order logic mapping composition algorithm can handle cases with composed non-identical self-join components
  However, the results are hard to understand
MePSys does not handle patterns with self-restrictive components
  Mappings in such patterns do not support concepts
MePSys has yet to realize the mediation of schemas if mappings contain composed non-identical self-join components
Aside from these two special groups of patterns, using the mediated schema to relate different sources is decidable.

34 System Settings
FreePastry
  A P2P network layer that uses an efficient routing strategy
  Each node maintains a routing table
  Each node keeps track of its immediate neighbors
  Notifies applications of message arrival, node failures, etc.
Emulab
  Network emulation testbed
  Access to different machines to emulate nodes in a real network
  900 MB memory with a 2992.787 MHz processor
Input schemas and mappings
  Input schemas follow the TPC-H standard
  Average number of acquaintances per peer
  Average number of relations per peer schema
  Average number of attributes in a relation

35 Experiment 1: Schema Mediation in MePSys

36 Experiment 2: Query Reformulation For queries with similar size (less than 1k), time can be decidable

37 Experiment 2: Query Reformulation (Cont.)
In the maximum case, reformulating a query 10 times takes only 2% of the total time

38 Experiment 3: Updating the Mediated Schema
Computing a new mediated schema always takes less than 2% of the total time
Updating takes almost no time

39 Our contributions
MePSys, in which a mediated schema is created dynamically and any information in the network can be queried without additional global services
An efficient algorithm, PSM, that creates a mediated schema in a PDMS and then creates mappings to the local sources
The idea of automatically detecting specific concepts in mappings
A study of how mapping composition impacts query reformulation in existing approaches
A solution to the problem of updating the mediated schema
Experiments on the efficiency and scalability of MePSys

40 Future Work
Explore the semantic issues that arise when a broader range of mappings is considered, e.g. mappings with self-joins or mappings with different IDB names
Consider further optimization issues in the future system
Design a better approach to updating the mediated schema when local schemas evolve

41 Acknowledgement

42 Thank you! Questions?
