Making Peer Databases Interact – A Vision for an Architecture Supporting Data Coordination

Fausto Giunchiglia (1)
Madrid, 20 September 2002

Working Group (in alphabetical order): Bernstein Phil (4), Kementsietsidis Tasos (2), Kuper Gabriel (1), Mylopoulos John (2), Serafini Luciano (3), Shvaiko Pavel (1), Zaihrayeu Ilya (1)

Sites: (1) University of Trento, (2) University of Toronto, (3) ITC-Irst, Trento, (4) Microsoft Research

The Talk
- Peer-to-Peer Databases – The Intuition
- Preliminary Logical Architecture
- The Running Example
- Conclusion
- … and Agents???

PEER-TO-PEER DATABASES – THE INTUITION

Peer-to-Peer (P2P)
"Peer-to-peer is a class of applications that take advantage of resources – storage, cycles, content, human presence – available at the edges of the Internet. Because accessing these decentralized resources means operating in an environment of unstable connectivity and unpredictable IP addresses, peer-to-peer nodes must … have significant or total autonomy from central servers."
Quote from Clay Shirky.

Examples of P2P Computing
- Napster – a shared directory of available music, plus client software which allows users, for instance, to import and export files
- Gnutella – a decentralized group membership and search protocol, mainly used for file sharing
- Groove – a system which implements a secure shared space among peers
- JXTA – a project which aims at creating a common platform that makes it simple and easy to build a wide range of distributed services and applications in which every device is addressable as a peer
Is there a place for databases?

Motivating Example: Databases of Medical Patients
- One patient may be described in several databases: those of the pharmacist, the family doctor and the hospital
- But the databases can use different patient ID formats, disease descriptions, etc.
- Nevertheless, they may still need to interoperate
- Up to this point, data integration may suffice, if the patient always goes to the same doctor, pharmacist and hospital
- When a patient is injured on a ski holiday in another country, yet more databases need to get involved
- Complete integration is likely to be infeasible
- But dynamic integration of the databases relevant to one patient could have high value

Data (base) Coordination
"... Coordination is managing dependencies between interacting databases"
Why is it different from data (base) integration?
- There is no statically maintained global schema
- Many of the parameters (metadata) influencing the interaction among peer databases are decided at run time, whereas integration is done at design time
- A change in the content of a node does not affect the overall system performance … and
- For any given query, nodes coordinate in order to define and use the most "appropriate" (virtual) schema – this is crucial for dealing with the strong dynamics of a P2P network

The Three Variances
Data integration mechanisms become impractical for randomly acquainted databases. Three kinds of unpredictable run-time factors influence the answer to a given query in a P2P network:
- Network (dependent) variance: the network changes over time
- Database (dependent) variance: different databases, if asked the same global query, will provide different answers
- Query (dependent) variance: different queries, even if posed to the same database, will impose different points of view on the network

Good Enough Answers
- In data coordination, it becomes hard to maintain a high quality level in the answers provided by the P2P network
- High-quality data can flow among the databases, preserving soundness and completeness (at the best possible level of approximation)
- Good Enough Answer (intuition) – a high-quality answer which serves its purpose, given the amount of effort made in computing it

Example of a Good Enough Answer
- When planning his vacation in Trentino, John goes to a local travel agency (TA)
- Unfortunately, TA cannot offer John anything from its own database
- Instead, TA searches for individual operators in the Trentino region (hotels, ski resorts, etc.)
- TA starts communication sessions with some of these operators
- TA queries them for the necessary information (e.g., prices, conditions, availability)
- Provided that TA gets, for instance, a hotel John likes, the answer is Good Enough
- Compared to the Motivating Example, much lower quality data coordination will probably suffice
[Slide graphic: sample hotel offer – Cost: 150 $; Avail: 05/01/03 – 15/01/03; Services: …]
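The travel agency's behaviour can be sketched as a search that stops at the first acceptable offer and counts how many operators were contacted, as a rough measure of effort. This is a minimal sketch under assumptions: the function `first_good_enough`, the operator names, the offer fields and the acceptance test are all hypothetical; only the 150 $ price and the availability dates come from the slide.

```python
from typing import Callable, List, Optional, Tuple

Offer = dict
Operator = Tuple[str, List[Offer]]  # (operator name, offers it returns when queried)


def first_good_enough(
    operators: List[Operator], acceptable: Callable[[Offer], bool]
) -> Tuple[Optional[Tuple[str, Offer]], int]:
    """Contact operators in order and stop at the first acceptable offer.

    Returns ((operator, offer) or None, number of operators contacted).
    """
    contacted = 0
    for name, offers in operators:
        contacted += 1  # one communication session per operator
        for offer in offers:
            if acceptable(offer):
                return (name, offer), contacted  # good enough: stop searching
    return None, contacted  # nothing acceptable after contacting everyone


# Hypothetical operators in the Trentino region (names and most values invented).
OPERATORS: List[Operator] = [
    ("hotel_a", [{"cost": 220, "services": "spa"}]),
    ("hotel_b", [{"cost": 150, "avail": "05/01/03 - 15/01/03"}]),
    ("ski_resort_c", [{"cost": 90, "services": "lift pass"}]),
]

answer, effort = first_good_enough(OPERATORS, lambda offer: offer["cost"] <= 180)
print(answer, effort)
```

Here the search stops at `hotel_b` after two sessions, even though the untried `ski_resort_c` is cheaper: the answer is good enough rather than optimal, and the third session is never paid for.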

Tuning Coordination Over Time
- A lot of metadata needs to be produced and maintained
- Due to the strong dynamics of a P2P network, this is a crucial and hard task because:
  - a node will never know the full list of its peers
  - a node will never know everything about its peers
  - its knowledge will be hard to maintain and will easily become obsolete
- There is a need for tuning/improving, on each peer, the quality of the interaction (for instance, with the help of learning algorithms, metadata editors, and so on)
- There is an obvious trade-off between the quality of the answers and the effort made in maintaining coordination

VERY PRELIMINARY HINTS OF A LOGICAL ARCHITECTURE

A Proposed Architecture
Four basic ingredients:
1. Interest Groups
2. Acquaintances
3. Coordination Rules
4. Correspondence Rules

Interest Groups
- Peer nodes know very little about the other nodes of the P2P network, and about the topics (e.g., Tourism, Medical care, …) on which their peers are able to answer queries
- An Interest Group is a set of nodes which are able to answer queries about a certain topic
- A Group Manager (GM) is in charge of managing the metadata needed to run the group
- The main goal of the GM is to compute the Query Scope (QS) – the set of nodes a query should be propagated to
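A Group Manager that computes a Query Scope can be sketched as a small class holding the group's topic and members. The class, its methods and the scoping policy (a query on the group's topic is scoped to all current members) are assumptions for illustration; the slides do not say how the GM computes QS. The names anticipate the running example, where QS = G = {F, H, P}.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class GroupManager:
    """Hypothetical Group Manager for a single interest group."""

    topic: str
    members: Set[str] = field(default_factory=set)

    def join(self, node: str) -> None:
        # A node joins because it can answer queries about self.topic.
        self.members.add(node)

    def leave(self, node: str) -> None:
        # Nodes may disappear at any time in a P2P network.
        self.members.discard(node)

    def query_scope(self, query_topic: str) -> Set[str]:
        # Simplest policy (an assumption): a query on the group's topic is
        # scoped to all current members; any other topic gets an empty scope.
        return set(self.members) if query_topic == self.topic else set()


gm = GroupManager("Medical Care in Toronto")
for node in ("F", "H", "P"):
    gm.join(node)
print(sorted(gm.query_scope("Medical Care in Toronto")))
```

Returning a copy of the member set keeps callers from mutating the GM's metadata; a real GM would also need topic matching that is looser than string equality, since in the running example "Medical Care in Canada" has to be matched to "Medical Care in Toronto".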

Acquaintances
- Acquaintances are nodes that a node knows about and that have data relevant to answering specific queries
- A node is an acquaintance of another node only with respect to a query (or, possibly, a schematic representation of it)
- There must be a way to compute how to propagate a query, how to propagate the results back, and how to reconcile them with the results coming from the other acquaintances

Coordination Rules
- Each acquaintance may be associated with one or more Coordination Rules
- Coordination rules specify under what conditions, when, how and where to propagate queries or updates
- A proposed implementation of coordination rules is as Event-Condition-Action (ECA) rules:
  - Event can be an update or a query coming from the user or from another node
  - Condition refers to properties of the update or query (e.g., the type of query and/or which data are referenced by the query)
  - Action can be the translation and propagation of a given update or query to a particular acquaintance
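The ECA structure described above can be sketched in a few lines of Python. This is a minimal illustration, not the proposed implementation: the `Message` and `CoordinationRule` types and the `handle` dispatcher are hypothetical, and the condition is a plain substring test standing in for an inspection of a parsed query.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    kind: str    # event type: "query" or "update"
    source: str  # "user" or the id of the sending node
    text: str    # the query or update itself


@dataclass
class CoordinationRule:
    event: str                            # which kind of message triggers the rule
    condition: Callable[[Message], bool]  # property the query/update must satisfy
    action: Callable[[Message], None]     # e.g. translate and propagate


def handle(msg: Message, rules: List[CoordinationRule]) -> int:
    """Fire every rule whose event matches and whose condition holds; return the count."""
    fired = 0
    for rule in rules:
        if rule.event == msg.kind and rule.condition(msg):
            rule.action(msg)
            fired += 1
    return fired


# Hypothetical rule: forward queries mentioning Address_Reason to acquaintance "H".
outbox: List[tuple] = []
forward_to_h = CoordinationRule(
    event="query",
    condition=lambda m: "Address_Reason" in m.text,
    action=lambda m: outbox.append(("H", m.text)),
)

handle(Message("query", "user", "SELECT Address_Reason FROM Accidents"), [forward_to_h])
handle(Message("update", "user", "UPDATE Accidents SET Address_Reason = 'x'"), [forward_to_h])
print(outbox)
```

Only the query is forwarded: the update matches neither the rule's event type nor, therefore, its action, so handling updates would need a separate rule.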

Correspondence Rules
- Each acquaintance is associated with one or more Correspondence Rules
- Correspondence Rules translate queries and query results (to deal with semantic heterogeneity)
- They are implemented as rewrite rules and are called by coordination rules, in their condition and action components
- They can be used, for instance, to translate attribute or element names (Domain Relations)
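Correspondence rules that translate attribute names can be sketched as a rename map applied in both directions: to the outgoing query text and to each incoming result row. The functions `rewrite` and `translate_row` are hypothetical, and the M:Accidents to H:Patients mapping is an assumption inferred from the queries Q_M and Q_H of the running example, not stated on the slides. The rewrite is purely lexical; it would also rename matches inside string literals, which a real implementation would avoid by working on a parsed query.

```python
import re
from typing import Dict


def rewrite(query: str, renames: Dict[str, str]) -> str:
    """Rename whole-word identifiers in a query according to `renames`."""
    if not renames:
        return query
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, renames)) + r")\b")
    return pattern.sub(lambda m: renames[m.group(1)], query)


def translate_row(row: Dict[str, object], renames: Dict[str, str]) -> Dict[str, object]:
    """Rename the attributes of one result row; unknown attributes are kept."""
    return {renames.get(attr, attr): value for attr, value in row.items()}


# Assumed domain relations between M:Accidents and H:Patients.
M_TO_H = {"Accidents": "Patients", "Address_Reason": "Disease", "Treatment_Taken": "Treatment_Desc"}
H_TO_M = {h: m for m, h in M_TO_H.items()}

print(rewrite("SELECT Address_Reason, Treatment_Taken FROM Accidents", M_TO_H))
print(translate_row({"Disease": "Fracture", "Name": "John"}, H_TO_M))
```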

Level One Architecture
- P2P Layer: P2P functionality add-on
- Local Data Source (LDS): database, file system, web site, …
- User Interface: user queries, results, …
- Query Manager (QM) and Update Manager (UM): responsible for query and update propagation; they manage coordination and correspondence rules, acquaintances, and interest groups
- Wrapper: provides a translation layer between QM and UM, and the LDS

A Proposed Strategy for Query Propagation
1. User submits query Q
2. Node defines the query topic
3. Node sends the Group Manager (GM) a request to define the Query Scope (QS)
4. GM computes QS and sends it back
5. Node 1 sends the query to its acquaintances in QS, and reports this fact to GM
6. Nodes 2 and 4 send their answers to node 1
7. Nodes propagate the query to their acquaintances in QS and report this fact to GM
8. And so on…
9. Nodes which do not propagate any further report this fact to GM
10. Propagation stops when "no more propagation" has been received from all boundary nodes
[Figure: node 1 receives Q and asks GM for the QS of its topic; GM returns QS = (2, 4, 6, 8, 9, 11); messages reported to GM: "nodes 2 and 4 are reached", "node 6 is reached", "node 8 is reached", "no more propagation from 8", "no more propagation from 9"; results Res_2 and Res_4 flow back to node 1]
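The propagation steps above can be simulated as a breadth-first traversal of the acquaintance graph restricted to QS, with each node's report to the GM recorded in a log. This is a sketch under assumptions: the function `propagate` and the acquaintance graph are hypothetical (the slide's figure does not give the edges), and nodes are assumed to be able to consult the GM's record of which nodes have already been reached. Node 3 is an acquaintance outside QS, and node 11 is in QS but has no acquaintance path from node 1, so neither is reached.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def propagate(
    origin: int, acquaintances: Dict[int, List[int]], query_scope: Set[int]
) -> Tuple[Set[int], List[str]]:
    """Simulate scoped query propagation; return (reached nodes, GM log)."""
    gm_log: List[str] = []
    reached = {origin}
    frontier = deque([origin])
    while frontier:
        node = frontier.popleft()
        # Forward only to acquaintances inside QS that the GM has not yet seen reached.
        targets = [n for n in acquaintances.get(node, [])
                   if n in query_scope and n not in reached]
        if not targets:
            # Boundary node: it tells the GM it will not propagate any further.
            gm_log.append(f"no more propagation from {node}")
            continue
        reached.update(targets)
        frontier.extend(targets)
        gm_log.append(f"nodes {targets} reached via {node}")
    # The loop ends once every reached node has either forwarded the query or
    # reported "no more propagation", which is when the GM stops waiting.
    return reached - {origin}, gm_log


# Hypothetical acquaintance graph; QS is the one shown on the slide.
ACQUAINTANCES = {1: [2, 3, 4], 2: [6], 4: [8], 6: [9], 8: [], 9: [], 11: []}
QS = {2, 4, 6, 8, 9, 11}

reached, log = propagate(1, ACQUAINTANCES, QS)
for line in log:
    print(line)
```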

THE RUNNING EXAMPLE

"Toy" Databases
Recall the Motivating Example:
- Family Doctor DB (F): Prescription (PatID, P_Name, Illness_Desc, StartDate, RecoveryDate, Treatment, Type, Prescriptions)
- Hospital DB (H): Patients (PID, Name, Disease, Treatment_Desc, In, Out)
- Medical Office DB (M): Accidents (P_id, FN, LN, Address_Reason, Treatment_Taken, Prescription_Given, Date)
John, who suffers the accident, is described in H with ID "P12" and in F as "8"; when he goes to M, he is assigned ID "A13".
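The slides do not say how John's three local IDs are matched. One simple possibility, sketched here as an assumption, is an explicit correspondence table acting as a value-level correspondence rule; `PATIENT_IDS` and `translate_id` are hypothetical names.

```python
from typing import Dict, List, Optional

# Hypothetical correspondence table: one entry per real-world patient,
# mapping each database to the local ID it uses for that patient.
PATIENT_IDS: List[Dict[str, str]] = [
    {"H": "P12", "F": "8", "M": "A13"},  # John, as in the running example
]


def translate_id(src_db: str, local_id: str, dst_db: str) -> Optional[str]:
    """Return dst_db's ID for the patient known as local_id in src_db, if any."""
    for entry in PATIENT_IDS:
        if entry.get(src_db) == local_id:
            return entry.get(dst_db)
    return None  # no known correspondence for this patient


print(translate_id("M", "A13", "H"), translate_id("M", "A13", "F"))
```

A `None` result corresponds to a patient who is not shared between the databases; in that case there is nothing to translate the query's ID into.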

Query Example
Let's suppose Q_M is asked to M:
SELECT FN, LN, Address_Reason, Treatment_Taken, Prescription_Given, Date
FROM "M:Accidents"
WHERE (Address_Reason LIKE '%Fracture%' OR Address_Reason LIKE '%Dislocation%') AND P_id = 'A13'
with the indication that Q_M is a global query with topic T = "Medical Care in Canada". After some search, T is matched with the topic "Medical Care in Toronto" of the interest group G.

Group G
- H is acquainted with F, and P is acquainted with F; dashed lines (in the slide figure) are group metadata channels; H is the GM of G
- The GM computes the query scope QS = G = {F, H, P} for query Q_M
- M gets acquainted with H
- M:Accidents and H:Patients are matched
- As a result, a set of Coordination Rules is generated

Examples of Coordination Rules
Coor #1
- Event: M:Q
- Condition: Q: (Address_Reason ∈ Select OR Treatment_Taken ∈ Select) AND (P_id = 'A13' ∈ Where)
- Action: Q = Apply(Q, Corr_Rules_Query); Send(Q, H)
Coor #2
- Event: M:R_H
- Condition: None
- Action: R_M = Apply(R_H, Corr_Rules_Results)
where Corr_Rules_Query and Corr_Rules_Results are correspondence rules which translate the outgoing query and the incoming results, respectively.

Query Propagation
P is not reachable because there is no acquaintance path from M to P. The following queries circulate in the acquaintance graph:
- Q_H = SELECT Name, Disease, Treatment_Desc FROM "H:Patients" WHERE (Disease LIKE '%Fracture%' OR Disease LIKE '%Dislocation%') AND PID = 'P12'
- Q_F = SELECT P_Name, Illness_Desc, Treatment FROM "F:Prescription" WHERE (Illness_Desc LIKE '%Fracture%' OR Illness_Desc LIKE '%Dislocation%') AND PatID = '8'
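Q_H can be checked against a toy copy of H's table using Python's built-in `sqlite3` module. All rows below are invented for illustration, the table is named `Patients` rather than `H:Patients`, and the patient ID is passed as a parameter; note that SQLite's `LIKE` is case-insensitive for ASCII by default, which is why '%Fracture%' matches "Wrist fracture" here but might not on another engine.

```python
import sqlite3

# In-memory stand-in for H:Patients; all rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE Patients (PID TEXT, Name TEXT, Disease TEXT, '
    'Treatment_Desc TEXT, "In" TEXT, "Out" TEXT)'  # "In" must be quoted: IN is a keyword
)
conn.executemany(
    "INSERT INTO Patients VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("P12", "John", "Wrist fracture", "Cast applied", "2003-01-07", "2003-01-08"),
        ("P12", "John", "Influenza", "Rest", "2001-11-02", "2001-11-04"),
        ("P40", "Anna", "Shoulder dislocation", "Reduction", "2003-01-09", "2003-01-09"),
    ],
)

# Q_H with the disjunction spelled out, one LIKE per pattern.
Q_H = """
SELECT Name, Disease, Treatment_Desc
FROM Patients
WHERE (Disease LIKE '%Fracture%' OR Disease LIKE '%Dislocation%')
  AND PID = ?
"""
rows = conn.execute(Q_H, ("P12",)).fetchall()
print(rows)
```

Only John's fracture row is returned: his influenza record fails the disease filter, and Anna's dislocation fails the ID filter. Q_F could be checked the same way against a toy `Prescription` table.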

Results Propagation and Reconciliation
H and F generate the results Res_H and Res_F; when these reach M, they are reconciled into Res_M.
[Tables: Res_H, Res_F, Res_M]
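The slides do not spell out how results are reconciled. One possible policy, sketched here as an assumption, is to translate each peer's rows into M's attributes, project them onto M's schema (dropping attributes with no correspondence and leaving unsupplied ones empty), and discard exact duplicates. The function `reconcile`, the correspondence maps and the sample rows are all hypothetical; H's `Name` is dropped because it has no one-to-one correspondence with M's `FN` and `LN`.

```python
from typing import Dict, List, Optional, Tuple

# M's result schema, taken from M:Accidents (P_id omitted).
M_SCHEMA = ["FN", "LN", "Address_Reason", "Treatment_Taken", "Prescription_Given", "Date"]


def reconcile(
    answers: List[Tuple[Dict[str, str], List[Dict[str, str]]]]
) -> List[Tuple[Optional[str], ...]]:
    """Merge answers from several peers into rows over M_SCHEMA.

    Each answer is (correspondences to M's attributes, rows from that peer).
    Attributes with no correspondence are dropped; attributes no peer supplies
    stay None, so the merged result may be incomplete.
    """
    merged: List[Tuple[Optional[str], ...]] = []
    for renames, rows in answers:
        for row in rows:
            in_m_terms = {renames[a]: v for a, v in row.items() if a in renames}
            record = tuple(in_m_terms.get(attr) for attr in M_SCHEMA)
            if record not in merged:  # drop exact duplicates
                merged.append(record)
    return merged


# Hypothetical answers and correspondences.
RES_H = [{"Name": "John", "Disease": "Wrist fracture", "Treatment_Desc": "Cast applied"}]
RES_F = [{"P_Name": "John", "Illness_Desc": "Wrist fracture", "Treatment": "Cast applied"}]
H_TO_M = {"Disease": "Address_Reason", "Treatment_Desc": "Treatment_Taken"}
F_TO_M = {"Illness_Desc": "Address_Reason", "Treatment": "Treatment_Taken"}

res_m = reconcile([(H_TO_M, RES_H), (F_TO_M, RES_F)])
print(res_m)
```

The two answers collapse into one row with several empty fields, which is the kind of incomplete but usable result the next slide calls good enough.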

Variance and Good Enough Answers
Good Enough answers
- Res_M is incomplete: some fields from H:Patients and F:Prescription are missing
- Nevertheless, the results are good enough because they still serve the needs of M
Network Variance
- If F is down, the results are even more incomplete
Database Variance
- If M gets acquainted with F instead of H, only Res_F is retrievable: F has a different "vision" of the world, as it is not acquainted with H
Query Variance
- If, in Q_M, John's ID is replaced by the ID of another patient who is not shared, then no Coordination Rule applies and therefore there is no propagation

Conclusion
This is a first investigation of how to make databases interact in a P2P network. There are four main dimensions:
- We must integrate data coming from autonomous, most often semantically heterogeneous, databases;
- We must deal with network, database, and query variance. This is why we talk of data coordination, as distinct from data integration;
- We will almost never get correct and complete answers. We must be content with answers which are good enough;
- There is a need to tune metadata. This is required in order to cope with the dynamics of a P2P network.
