
1 Making Peer Databases Interact – A Vision for an Architecture Supporting Data Coordination
Working Group (in alph. order): Bernstein Phil (4), Kementsietsidis Tasos (2), Kuper Gabriel (1), Mylopoulos John (2), Serafini Luciano (3), Shvaiko Pavel (1), Zaihrayeu Ilya (1)
Sites: (1) University of Trento, (2) University of Toronto, (3) ITC-Irst, Trento, (4) Microsoft Research
Fausto Giunchiglia (1)
Madrid, 20 September 2002

2 The Talk
- Peer-to-Peer Databases – The Intuition
- Preliminary Logical Architecture
- The Running Example
- Conclusion
- … and Agents???

3 PEER-TO-PEER DATABASES – THE INTUITION

4 The Peer-to-Peer (P2P)
“Peer-to-peer is a class of applications that take advantage of resources – storage, cycles, content, human presence – available at the edges of the Internet. Because accessing these decentralized resources means operating in an environment of unstable connectivity and unpredictable IP addresses, peer-to-peer nodes must … have significant or total autonomy from central servers.”
Quote from Clay Shirky

5 Examples of P2P Computing
- Napster – a shared directory of available music, plus client software that lets users, for instance, import and export files
- Gnutella – a decentralized group membership and search protocol, mainly used for file sharing
- Groove – a system that implements a secure shared space among peers
- JXTA – aims to create a common platform that makes it simple to build a wide range of distributed services and applications in which every device is addressable as a peer
Is there a place for databases?

6 Motivating Example: Databases of Medical Patients
- One patient may be described in several databases: pharmacist, family doctor and hospital
- But the databases can use different patient ID formats, disease descriptions, etc.
- Nevertheless, they may still need to interoperate
- At this point data integration may suffice, if the patient goes to the same doctor, pharmacist and hospital
- When a patient is injured on a ski holiday in another country, yet more databases need to get involved
- Complete integration is likely to be infeasible
- But dynamic integration of the databases relevant to one patient could have high value

7 Data (base) Coordination
“... Coordination is managing dependencies between interacting databases”
Why is it different from data (base) integration?
- No statically maintained global schema
- Many of the parameters (metadata) influencing the interaction among peer databases are decided at run time, whereas integration is done at design time
- A change in the content of a node does not affect the overall system performance
… and
- For any given query, nodes coordinate in order to define and use the most “appropriate” (virtual) schema – this is crucial for dealing with the strong dynamics of a P2P network

8 The Three Variances
Data integration mechanisms become impractical for randomly acquainted databases
Three kinds of unpredictable run-time factors influence the answer to a given query in a P2P network:
- Network (dependent) variance: the network changes over time
- Database (dependent) variance: different databases, if asked the same global query, will provide different answers
- Query (dependent) variance: different queries, even if posed to the same database, will impose different points of view on the network

9 Good Enough Answers
- In data coordination, it becomes hard to maintain a high quality level in the answers provided by the P2P network
- High quality data can flow among the databases, preserving (at the best possible level of approximation) soundness and completeness
- Good Enough Answer (intuition) – an answer whose quality level serves its purpose, given the amount of effort made in computing it

10 Example of a Good Enough Answer
- When planning his vacation in Trentino, John goes to a local travel agency (TA)
- Unluckily, TA cannot offer John anything from its own database
- Instead, TA searches for individual operators in the Trentino region (hotels, ski resorts, etc.)
- TA starts communication sessions with some operators
- TA queries for the necessary information (e.g., prices, conditions, availability)
- As long as, for instance, TA gets a hotel John likes, this is Good Enough
- Compared to the Motivating Example, much lower quality data coordination will probably suffice
[Figure: sample answer – Cost: 150 $; Avail: 05/01/03 – 15/01/03; Services: …]

11 Tuning Coordination Over Time
- A lot of metadata needs to be produced and maintained
- Due to the strong dynamics of a P2P network, this is a crucial and hard task, because:
  - A node will never know the full list of its peers
  - A node will never know everything about its peers
  - Its knowledge will be hard to maintain and will easily become obsolete
- There is a need to tune/improve, on each peer, the quality of the interaction (for instance, with the help of learning algorithms, metadata editors, and so on)
- There is an obvious trade-off between the quality of the answers and the effort made in maintaining coordination

12 VERY PRELIMINARY HINTS OF A LOGICAL ARCHITECTURE

13 A Proposed Architecture
Four basic ingredients:
- Interest Groups
- Acquaintances
- Coordination Rules
- Correspondence Rules

14 Interest Groups
- Peer nodes know very little about the other nodes of the P2P network, and about the topics (e.g., Tourism, Medical care, …) on which their peers are able to answer queries
- An Interest Group is a set of nodes which are able to answer queries about a certain topic
- A Group Manager (GM) is in charge of managing the metadata needed to run the group
- The main goal of the GM is to compute the Query Scope (QS) – the set of nodes a query should be propagated to

15 Acquaintances
- Acquaintances are nodes that a node knows about and that have data relevant to answering specific queries
- A node is an acquaintance of another node only with respect to (possibly, a schematic representation of) a query
- There must be a way to compute how to propagate a query, to propagate results back, and to reconcile them with the results coming from the other acquaintances

16 Coordination Rules
- Each acquaintance may be associated with one or more Coordination Rules
- Coordination rules specify under what conditions, when, how and where to propagate queries or updates
- A proposed implementation of coordination rules is as Event-Condition-Action (ECA) rules:
  - Event can be an update or a query coming from the user or from another node
  - Condition refers to properties of the update or query (e.g., the type of query and/or which data are referenced by the query)
  - Action can be the translation and propagation of a given update or query to a particular acquaintance
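The Event-Condition-Action idea can be sketched in a few lines of Python. This is only an illustration, not the working group's implementation: the `Message` and `CoordinationRule` classes, the `fire` helper and the dictionary shape of a query are all assumptions made for the example, and the names `H` and `Address_Reason` are borrowed from the running example later in the talk.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Message:
    kind: str      # "query" or "update"
    source: str    # "user" or the name of the sending node
    payload: dict  # hypothetical query shape: {"select": [...], "where": {...}}


@dataclass
class CoordinationRule:
    acquaintance: str                     # node the action propagates to
    event: Callable[[Message], bool]      # which incoming messages trigger the rule
    condition: Callable[[Message], bool]  # property check on the query or update
    action: Callable[[Message], dict]     # translation applied before sending


def fire(rules: List[CoordinationRule], msg: Message) -> List[Tuple[str, dict]]:
    """Return (acquaintance, translated payload) for every rule that matches msg."""
    return [(r.acquaintance, r.action(msg))
            for r in rules
            if r.event(msg) and r.condition(msg)]


# Illustrative rule: forward queries that select Address_Reason to node H.
rule = CoordinationRule(
    acquaintance="H",
    event=lambda m: m.kind == "query",
    condition=lambda m: "Address_Reason" in m.payload["select"],
    action=lambda m: {**m.payload, "translated_for": "H"},  # stand-in for a real translation
)

query = Message("query", "user", {"select": ["Address_Reason"], "where": {}})
update = Message("update", "user", {"select": [], "where": {}})

sent_for_query = fire([rule], query)    # the rule matches: one message for H
sent_for_update = fire([rule], update)  # the event test fails: nothing is sent
```

Keeping event, condition and action as three separate callables mirrors the slide's decomposition: the action only runs once both tests have passed.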

17 Correspondence Rules
- Each acquaintance is associated with one or more Correspondence Rules
- Correspondence Rules translate queries and query results (semantic heterogeneity)
- They are implemented as rewrite rules and are called by coordination rules, in the action and condition components
- They can be used, for instance, to translate attribute or element names (Domain Relations)

18 Level One Architecture
- P2P Layer: P2P functionality add-on
- Local Data Source (LDS): database, file system, web site
- User Interface: user queries, results
- Query Manager (QM) and Update Manager (UM): responsible for query and update propagation; manage coordination and correspondence rules, acquaintances, and interest groups
- Wrapper: provides a translation layer between QM and UM on one side and LDS on the other

19 A Proposed Strategy for Query Propagation
- User submits query Q ()
- Node defines query topic
- Node sends to Group Manager (GM) a request to define the Query Scope (QS)
- GM computes and sends back QS
- Node 1 sends the query to its acquaintances in QS, and reports this fact to GM
- Nodes 2 and 4 send their answers to node 1
- Nodes propagate the query to their acquaintances from QS and report this fact to GM
- And so on…
- Nodes which do not propagate any further report this fact to GM
- Propagation stops when “no more propagation” has been received from all boundary nodes
[Figure: network of nodes 1–11 and the GM. Message labels: 1. Q (); 2. Q (, topic); 3. QS (, topic) = ?; 4. QS (, topic) = (2, 4, 6, 8, 9, 11); 5. “nodes 2 and 4 are reached”; “node 6 is reached”; “node 8 is reached”; “no more propagation from 8”; “no more propagation from 9”; ←Res2; ←Res4]
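The strategy above can be simulated with a breadth-first walk that only follows acquaintance links into the query scope. The sketch below is a minimal illustration under stated assumptions: the edges in `acquaintances` are hypothetical (the slide's figure did not survive in the transcript), the function name `propagate` is invented, and the report strings only loosely imitate the messages sent to the GM.

```python
from collections import deque


def propagate(start, acquaintances, query_scope):
    """Breadth-first propagation restricted to the query scope.

    Returns (reached, reports): the set of nodes that received the query and
    the list of messages the Group Manager would have received.
    """
    reached = set()
    reports = []
    seen = {start}
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        # A node forwards only to acquaintances inside QS that were not reached yet.
        targets = [n for n in acquaintances.get(node, [])
                   if n in query_scope and n not in seen]
        if targets:
            seen.update(targets)
            reached.update(targets)
            frontier.extend(targets)
            reports.append(f"{node}: reached {sorted(targets)}")
        else:
            reports.append(f"no more propagation from {node}")
    return reached, reports


# Query scope taken from the slide; the acquaintance edges are assumed.
query_scope = {2, 4, 6, 8, 9, 11}
acquaintances = {1: [2, 3, 4], 2: [6], 4: [8], 6: [9]}

reached, reports = propagate(1, acquaintances, query_scope)
```

With these assumed edges, node 3 is skipped because it lies outside QS, and node 11 is never reached although it belongs to QS, since no acquaintance link leads to it. Propagation ends once every boundary node has reported that it forwards no further.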

20 THE RUNNING EXAMPLE

21 “Toy” Databases
Recall the Motivating Example:
- Family Doctor DB F: Prescription (PatID, P_Name, Illness_Desc, StartDate, RecoveryDate, Treatment, Type, Prescriptions)
- Hospital DB H: Patients (PID, Name, Disease, Treatment_Desc, In, Out)
- Medical Office DB M: Accidents (P_id, FN, LN, Address_Reason, Treatment_Taken, Prescription_Given, Date)
John, who suffers the accident, is described in H with ID “P12” and in F with ID “8”; when he turns to M, he is assigned ID “A13”

22 Query Example
Let's suppose the following query QM is posed to M:
Select FN, LN, Address_Reason, Treatment_Taken, Prescription_Given, Date
From “M:Accidents”
Where (Address_Reason Like ‘%Fracture%’ Or Address_Reason Like ‘%Dislocation%’) And P_id = ‘A13’
QM is marked as a global query with topic T = “Medical Care in Canada”
After some search, T is matched with the topic “Medical Care in Toronto” of the interest group G

23 Group G
[Figure: interest group G. H is acquainted with F, and P is acquainted with F; dashed lines are group metadata channels; H is the GM of G]
- GM computes the query scope QS = G = {F, H, P} for query QM
- M gets acquainted with H
- M:Accidents and H:Patients are matched
- As a result, a set of Coordination Rules is generated

24 Examples of Coordination Rules
Coor # 1
Event: M:Q
Condition: Q:(Address_Reason ∈ Select OR Treatment_Taken ∈ Select) AND (P_id = ‘A13’ ∈ Where)
Action: Q = Apply (Q, Corr_Rules_Query); Send (Q, H)
Coor # 2
Event: M:RH
Condition: None
Action: RM = Apply (RH, Corr_Rules_Results)
where Corr_Rules_Query and Corr_Rules_Results are correspondence rules which translate the outgoing query and the incoming results

25 Query Propagation
- P is not reachable because there is no acquaintance path from M to P
- The following queries circulate in the graph:
QH = Select Name, Disease, Treatment_Desc
From “H:Patients”
Where (Disease Like ‘%Fracture%’ Or Disease Like ‘%Dislocation%’) And PID = ‘P12’
QF = Select P_Name, Illness_Desc, Treatment
From “F:Prescription”
Where (Illness_Desc Like ‘%Fracture%’ Or Illness_Desc Like ‘%Dislocation%’) And PatID = ‘8’
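One way to picture how correspondence rules could turn the query posed to M into QH is a pair of lookup tables: one for attribute names and one for patient IDs (a domain relation). The sketch below is an assumption-laden illustration, not the talk's actual rule language: `ATTR_MAP`, `ID_MAP`, `translate_query` and the dictionary encoding of a query are all hypothetical, and the attribute pairings are inferred by comparing the two queries on the slides.

```python
# Hypothetical attribute correspondences from M:Accidents to H:Patients.
# Attributes of M without an entry (Prescription_Given, Date) are dropped.
ATTR_MAP = {
    "FN": "Name",
    "LN": "Name",
    "Address_Reason": "Disease",
    "Treatment_Taken": "Treatment_Desc",
    "P_id": "PID",
}

# Domain relation for patient IDs: John is "A13" in M and "P12" in H.
ID_MAP = {"A13": "P12"}


def translate_query(query):
    """Rewrite a query over M:Accidents into one over H:Patients.

    Returns None when the patient ID has no counterpart in H,
    in which case there is nothing to propagate.
    """
    select = []
    for attr in query["select"]:
        target = ATTR_MAP.get(attr)
        if target is not None and target not in select:
            select.append(target)

    where = {}
    for attr, value in query["where"].items():
        target = ATTR_MAP.get(attr)
        if target is None:
            continue  # condition on an attribute H does not have
        if attr == "P_id":
            if value not in ID_MAP:
                return None  # patient is not shared with H
            value = ID_MAP[value]
        where[target] = value

    return {"select": select, "from": "H:Patients", "where": where}


QM = {
    "select": ["FN", "LN", "Address_Reason", "Treatment_Taken",
               "Prescription_Given", "Date"],
    "from": "M:Accidents",
    "where": {"Address_Reason": ["%Fracture%", "%Dislocation%"], "P_id": "A13"},
}

QH = translate_query(QM)
QX = translate_query({**QM, "where": {"P_id": "A99"}})  # unknown patient
```

Under these assumptions `QH` selects Name, Disease and Treatment_Desc from H:Patients with PID ‘P12’, matching the query shown on the slide, while a patient ID unknown to H yields no query at all.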

26 Results Propagation and Reconciliation
H and F generate the following results:
ResH = <‘John’, ‘Forearm dislocation’, ‘Bandage’>
ResF = <‘John’, ‘Leg fracture’, ‘Leg put in plaster’>
On reaching M, the results are reconciled as follows:
ResM =

27 Variance and Good Enough Answers
- ResM is incomplete: some fields from H:Patients and F:Prescription are missing
- Nevertheless, the results are good enough because they still serve the needs of M
- Network Variance: if F is down, the results are even more incomplete
- Database Variance: if M gets acquainted with F instead of H, only ResF is retrievable. F has a different “vision” of the world, as it is not acquainted with H
- Query Variance: if the ID of John in QM is replaced by the ID of another patient who is not shared, then no Coordination Rules apply and therefore there is no propagation
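Network and database variance can be made concrete with a small check of which peers' results can reach M. The sketch assumes that acquaintance links are directed (M knows H, H knows F, P knows F), which is how the running example reads, since P is said to be unreachable from M. The function name `retrievable` and the `down` parameter are invented for the illustration. Query variance is not covered here, because it depends on the rules rather than on the graph.

```python
def retrievable(start, acquaintances, down=frozenset()):
    """Nodes whose results can flow back to `start` along acquaintance links."""
    found, stack = set(), [start]
    while stack:
        node = stack.pop()
        for peer in acquaintances.get(node, []):
            # Skip peers already visited, peers that are offline, and the origin itself.
            if peer not in found and peer not in down and peer != start:
                found.add(peer)
                stack.append(peer)
    return found


# Acquaintance links of the running example (direction is an assumption).
base = {"M": ["H"], "H": ["F"], "P": ["F"]}

normal = retrievable("M", base)                 # H and F answer
f_is_down = retrievable("M", base, down={"F"})  # network variance: only H answers

# Database variance: M is acquainted with F instead of H.
alt = {"M": ["F"], "H": ["F"], "P": ["F"]}
via_f = retrievable("M", alt)                   # only F answers
```

In all three cases P stays out of reach, which matches the observation that no acquaintance path leads from M to P.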

28 Conclusion
A first investigation of how to make databases interact in a P2P network. There are four main dimensions:
- We must integrate data coming from autonomous, most often semantically heterogeneous, databases
- We must deal with network, database, and query variance. This is why we talk of data coordination, as distinct from data integration
- We will almost never get correct and complete answers. We must be content with answers which are good enough
- There is a need to tune metadata. This is required in order to cope with the dynamics of a P2P network

29 References
Project website: http://www.dit.unitn.it/~p2p/
- P. Bernstein, F. Giunchiglia, A. Kementsietsidis, J. Mylopoulos, L. Serafini, and I. Zaihrayeu, “Data Management for Peer-to-Peer Computing: A Vision”, WebDB 2002
- L. Serafini, F. Giunchiglia, J. Mylopoulos, and P. Bernstein, “The Local Relational Model: Model and Proof Theory”, tech. rep., IRST, Trento

