Published by Oswald Goodwin. Modified over 9 years ago.
1
Data Management for Peer-to-Peer Computing: A Vision
Ali Rahbari
2
Outline
– P2P Data Networks
– Why P2P Databases are Different
– A P2P Database Scenario
– A Logic for P2P Databases
– Propagation Strategy
– Architecture and Implementation Issues
3
P2P Data Networks: Basic Notions
Node
– Database, file system, etc.
P2P network
– Indexed nodes with equal participant rights
Services
– Query answering
– Query, result, and update propagation
Locality
– No global schema, no centralized control
– Nodes have only a partial view of the world
Autonomy
– Nodes are largely independent in their language, content, etc.
4
Roles for P2P DBs?
Peers come and go, but must still be able to interoperate.
To us, the big question is how to cope with DBs that
– are incomplete, overlapping, and mutually inconsistent
– dynamically appear and disappear
– have limited connectivity.
Scenario
– Databases of medical patients
– Complete integration is likely to be infeasible
– But dynamic integration of DBs relevant to one patient could have high value.
5
A Model for P2P Databases
Each peer is a node with a database.
It exchanges data and services with acquaintances (i.e. other peers).
The set of acquaintances changes often, due to
– site availability
– changing usage patterns
Peers are fully autonomous.
– No global control or central server.
6
A Motivating Scenario
Participants: H (Hospital), P (Pharmacist), D (Doctor), Ski Clinic
A patient may be described in several DBs, but the databases can use different patient id formats, disease descriptions, etc.
1. When a patient is admitted to the hospital, H becomes acquainted with D
2. The acquaintance is dropped when treatment is over
3. When the doctor prescribes a drug, D becomes acquainted with P
4. A patient is injured skiing, so more DBs get involved (e.g. the Ski Clinic)
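The changing acquaintance sets in this scenario can be sketched in a few lines of Python. This is only an illustration: the `Peer` class, its method names, and the choice to treat an acquaintance as mutual are assumptions, not part of the presentation.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """A node with a local database and a changing set of acquaintances."""
    name: str
    acquaintances: set = field(default_factory=set)

    def acquaint(self, other: "Peer") -> None:
        # Assumption: an acquaintance is recorded on both sides.
        self.acquaintances.add(other.name)
        other.acquaintances.add(self.name)

    def drop(self, other: "Peer") -> None:
        self.acquaintances.discard(other.name)
        other.acquaintances.discard(self.name)


hospital, doctor, pharmacist = Peer("H"), Peer("D"), Peer("P")
hospital.acquaint(doctor)      # step 1: the patient is admitted
doctor.acquaint(pharmacist)    # step 3: the doctor prescribes a drug
hospital.drop(doctor)          # step 2: treatment is over
```

No central registry is involved: each peer only knows its own current acquaintances.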
7
Proposal: Local Relational Model (LRM)
A logic for P2P data integration
Instead of a global schema, each peer has
– coordination formulas: each specifies semantic interdependencies between two acquaintances
– binary domain relations: each specifies how symbols in one database translate to symbols in an acquaintance’s database.
Each expression in a coordination formula is relative to just one participating database.
Use coordination formulas and domain relations for query and update processing.
8
A Coordination Formula
p: pharmacist DB
– medication(PrescriptionID, PatientID, Prod)
d: doctor DB
– treatment(TreatmentID, PatientID, Description, Type), where Type ∈ {“hospital”, “home”}
(∀i:x).A(x) means that for all x in the domain of database i, A(x) is true.
A coordination formula:
(∀p:y).(∀p:z).(p:(∃x).medication(x, y, z) → d:(∃w).treatment(w, y, z, “home”))
“There’s a row in treatment in the doctor DB for each row in medication in the pharmacist DB”
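A minimal sketch of checking this coordination formula over two in-memory tables follows. The sample rows and the function name `violations` are made up for illustration, and the check assumes the identity domain relation between p and d (patient ids and product names are written the same way in both databases).

```python
# medication(PrescriptionID, PatientID, Prod) in the pharmacist DB p (sample rows)
medication = [("rx1", "pat7", "aspirin"), ("rx2", "pat9", "ibuprofen")]
# treatment(TreatmentID, PatientID, Description, Type) in the doctor DB d (sample rows)
treatment = [
    ("t1", "pat7", "aspirin", "home"),
    ("t2", "pat9", "ibuprofen", "hospital"),
]


def violations(medication, treatment):
    """Return (PatientID, Prod) pairs in p with no matching "home" treatment in d.

    Assumes the identity domain relation between p and d.
    """
    home = {(pat, desc) for _, pat, desc, typ in treatment if typ == "home"}
    prescribed = {(pat, prod) for _, pat, prod in medication}
    return sorted(prescribed - home)


print(violations(medication, treatment))
```

An empty result means the pair of databases satisfies the formula; each returned pair is a medication row without a corresponding home treatment.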
9
Domain Relation
A row in domain relation r_ik specifies that value d_1 in DB i corresponds to value d_2 in DB k
– r_ik may be partial
– r_ik and r_ki need not be symmetric
Example: DB i contains lengths in meters and DB k in kilometers (total but not symmetric)
– r_ik(x) = roundToClosestK(x), e.g. r_ik(653) = 1, r_ik(453) = 0
– r_ki(x) = x*1000, e.g. r_ki(1) = 1000
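The meters and kilometers example can be written down directly. In this sketch `roundToClosestK` is approximated with Python's built-in `round`, which is an assumption about the intended rounding rule.

```python
def r_ik(metres: int) -> int:
    """Map a length in DB i (metres) to DB k (kilometres), rounded to the closest km.

    Note: Python's round() sends exact halves to the nearest even integer,
    which may differ from the rounding the slide has in mind.
    """
    return round(metres / 1000)


def r_ki(kilometres: int) -> int:
    """Map a length in DB k (kilometres) back to DB i (metres)."""
    return kilometres * 1000


# Going from i to k and back does not return the original value,
# which is why the two relations are not symmetric.
round_trip = r_ki(r_ik(653))
```

Both functions are total, yet 653 maps to 1 and comes back as 1000, so r_ki is not the inverse of r_ik.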
10
Queries
A query is a coordination formula of the form A(x) → i:q(x), where
– A(x) is a coordination formula
– x has n variables
– i is the database against which the query is posed
– q is a new n-ary predicate symbol
A relational space is a pair ⟨db, r⟩, where db is a set of DBs and r associates an r_ik with each pair of DBs.
⟨db, r⟩ ⊨ A means that the relational space satisfies the coordination formula A.
The answer to a query: {d ∈ dom_i | ⟨db, r⟩ ⊨ ((∃i:x).A(x) ∧ i:x = d)}
11
Interpreting a Query
A query: ((i:P(x) j:R(y)) k:S(x,y)) → h:q(x,y)
– Evaluate P, R, S in i, j, k (respectively)
– Map these results via r_ih, r_jh, r_kh to sets s_i, s_j, s_k
– And then compute ((s_i s_j) s_k)
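The first two steps (local evaluation, then mapping through a domain relation into the database where the query was posed) can be sketched as below. The data, the helper name `translate`, and the representation of a domain relation as a set of pairs are illustrative assumptions; the final step of combining the mapped sets is not shown.

```python
def translate(values, relation):
    """Map values from a source DB into the target DB's domain.

    `relation` is a set of (source_value, target_value) pairs. It may be
    partial, so values without a counterpart are dropped.
    """
    return {t for v in values for (s, t) in relation if s == v}


# Hypothetical data: DB i stores ids as strings, DB h stores them as integers.
P_in_i = {"007", "042", "999"}        # result of evaluating P locally in i
r_ih = {("007", 7), ("042", 42)}      # "999" has no counterpart in h
s_i = translate(P_in_i, r_ih)         # the set s_i, expressed in h's domain
```

Because the relation is a set of pairs rather than a function, one source value may also map to several target values.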
12
P2P Databases: Proposed Solution
Coordinate query and update exchange between autonomous DBs using:
Coordination Formulas
– Specify semantic interdependencies between data from two nodes
– table to table: Cust corresponds to Customer
– column to column: name(Cust) corresponds to nm(Customer)
Binary Domain Relations
– Specify how the symbols used in one database translate to symbols used in another database
– ‘one’ translates to ‘uno’
– CAN$1.00 translates to US$0.65
Keep AUTONOMY and COORDINATION, as much as possible
13
What’s New in the Solution?
– No global schema, no central registry, no form of control
– No need for system restructuring when new nodes come and old ones go away
– We do not integrate, we COORDINATE: integration is built at design time, coordination happens at runtime
14
Propagation Strategy: Basic Notions
Acquaintance
– Pair of nodes that have coordination formulas and binary domain relations with respect to each other
– Acquaintances can exchange data and services
Interest Group
– Set of nodes with related content that are acquainted with one another
Group Manager (GM)
– Node of an Interest Group that is dedicated to group and query propagation management
– The GM has higher stability requirements and must be permanently active
Query Scope (QS)
– Set of nodes that are supposed to answer a given query
– The Query Scope is defined by the Group Manager
15
Query Propagation Strategy
1. User submits query Q ( )
2. Node defines query topic
3. Node sends to Group Manager (GM) a request to define Query Scope (QS)
4. GM computes and sends back QS
5. Node 1 sends query to acquaintances in QS, and reports this fact to GM
6. Nodes 2 and 4 send answer to node 1
7. Nodes propagate the query to their acquaintances from QS and report this fact to GM
8. And so on…
9. Nodes which do not propagate any further report this fact to GM
10. Propagation stops when “no more propagation” has been received from all boundary nodes
Figure: network of nodes 1 to 11 and the GM, with message labels:
– 1. Q ( )
– 2. Q ( , topic)
– 3. QS ( , topic) = ?
– 4. QS ( , topic) = (2, 4, 6, 8, 9, 11)
– 5. “nodes 2 and 4 are reached”
– ← Res 2, ← Res 4
– “node 6 is reached”, “node 8 is reached”
– “no more propagation from 8”, “no more propagation from 9”
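The propagation steps above can be simulated as a breadth-first traversal restricted to the Query Scope. This is a sketch under stated assumptions: the acquaintance graph is hypothetical (the edges of the slide's figure are not recoverable), the function name `propagate` is invented, and the GM is reduced to two bookkeeping lists.

```python
from collections import deque


def propagate(start, acquaintances, query_scope):
    """Simulate query propagation restricted to a Query Scope.

    Returns (reached, finished): the nodes that received the query, in order
    of first receipt, and the nodes that reported "no more propagation".
    """
    reached, finished = [], []      # what the Group Manager is told
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        targets = [n for n in acquaintances.get(node, ())
                   if n in query_scope and n not in seen]
        if not targets and node != start:
            finished.append(node)   # boundary node reports to the GM
        for n in targets:
            seen.add(n)
            reached.append(n)       # sender reports "n is reached" to the GM
            queue.append(n)
    return reached, finished


# Hypothetical acquaintance graph; only the Query Scope comes from the slide.
acquaintances = {1: [2, 3, 4], 2: [6], 4: [8], 6: [9, 11]}
query_scope = {2, 4, 6, 8, 9, 11}
reached, finished = propagate(1, acquaintances, query_scope)
```

Node 3 is acquainted with node 1 but lies outside the Query Scope, so it never receives the query. Propagation ends once every boundary node appears in `finished`.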
16
Implementation Architecture
A classic multi-database system, with
– A protocol for adding/dropping acquaintances
– LRM query processing (domain mapping logic) that can cope with chains of acquaintances
– Dynamic approach to materialized view creation
Tools to help a user establish an acquaintance
17
Architecture
P2P Layer
– Add-on that provides P2P functionality
Local Data Source (LDS)
– Database
– File system
User Interface
– User queries
– Results
Query Manager (QM) and Update Manager (UM)
– Responsible for query and update propagation
– Manage coordination and correspondence rules, acquaintances, and interest groups
Wrapper
– Provides a translation layer between the QM and UM on one side and the LDS on the other
18
Summary
– Why P2P databases are different
– A P2P database scenario
– A logic for P2P databases (LRM): coordination formulas and domain relations, query semantics
– Architecture and implementation issues