

Making Peer Databases Interact – A Vision for an Architecture Supporting Data Coordination
Fausto Giunchiglia (1)
Madrid, 20 September 2002
Working Group (in alph. order): Bernstein Phil (4), Kementsietsidis Tasos (2), Kuper Gabriel (1), Mylopoulos John (2), Serafini Luciano (3), Shvaiko Pavel (1), Zaihrayeu Ilya (1)
Sites: (1) University of Trento, (2) University of Toronto, (3) ITC-Irst, Trento, (4) Microsoft Research

The Talk
Peer-to-Peer Databases – The intuition
Preliminary Logical Architecture
The Running Example
Conclusion
… and Agents???

PEER-TO-PEER DATABASES – THE INTUITION

The Peer-to-Peer (P2P)
“Peer-to-peer is a class of applications that take advantage of resources – storage, cycles, content, human presence – available at the edges of the Internet. Because accessing these decentralized resources means operating in an environment of unstable connectivity and unpredictable IP addresses, peer-to-peer nodes must … have significant or total autonomy from central servers”
Quote from Clay Shirky (www.shirky.com)

Examples of P2P Computing
Napster – a shared directory of available music, plus client software which lets users, for instance, import and export files
Gnutella – a decentralized group membership and search protocol, mainly used for file sharing
Groove – a system which implements a secure shared space among peers
JXTA – which aims at creating a common platform that makes it simple and easy to build a wide range of distributed services and applications in which every device is addressable as a peer
Is there a place for databases?

Motivating Example: Databases of Medical Patients
One patient may be described in several databases: pharmacist, family doctor and hospital
But the databases can use different patient ID formats, disease descriptions, etc.
Nevertheless they may still need to interoperate
At this point data integration may suffice, if the patient goes to the same doctor, pharmacist and hospital
When a patient is injured on a ski holiday in another country, yet more databases need to get involved
Complete integration is likely to be infeasible
But dynamic integration of the databases relevant to one patient could have high value

Data (base) Coordination
“... Coordination is managing dependencies between interacting databases”
Why is it different from data (base) integration?
  No statically maintained global schema
  Many of the parameters (metadata) influencing the interaction among peer databases are decided at run time, whereas integration is done at design time
  A change in the content of a node does not affect the overall system performance
  … and for any given query, nodes coordinate in order to define and use the most “appropriate” (virtual) schema – this is crucial for dealing with the strong dynamics of a P2P network

The Three Variances
Data integration mechanisms become impractical for randomly acquainted databases
We have three kinds of unpredictable run-time factors, which influence the answer to a given query in a P2P network:
  Network (dependent) variance: the network changes over time
  Database (dependent) variance: different databases, if asked the same global query, will provide different answers
  Query (dependent) variance: different queries, even if posed to the same database, will impose different points of view on the network

Good Enough Answers
In data coordination, it becomes hard to maintain a high quality level in the answers provided by the P2P network
High quality data can flow among the databases preserving (at the best possible level of approximation) soundness and completeness
Good Enough Answer (intuition) – an answer of sufficient quality to serve its purpose, given the amount of effort made in computing it

Example of a Good Enough Answer
When planning his vacation in Trentino, John goes to a local travel agency (TA)
Unluckily, TA cannot offer John anything from its own database
Instead, TA searches for single operators in the Trentino region (hotels, ski resorts, etc.)
TA starts communication sessions with some operators
TA queries for the necessary information (e.g., prices, conditions, availability)
As long as, for instance, TA gets a hotel John likes, this is Good Enough
Compared to the Motivating Example, much lower quality data coordination will probably suffice
Sample answer from an operator: Cost: 150 $; Avail: 05/01/03 – 15/01/03; Services: …

Tuning Coordination Over Time
A lot of metadata needs to be produced and maintained
Due to the strong dynamics of a P2P network, this is a crucial and hard task to perform because:
  A node will never know the full list of its peers
  A node will never know everything about its peers
  Its knowledge will be hard to maintain and will easily become obsolete
There is a need to tune and improve, on each peer, the quality of the interaction (for instance, with the help of learning algorithms, metadata editors, and so on)
There is an obvious trade-off between the quality of the answers and the effort made in maintaining coordination

VERY PRELIMINARY HINTS OF A LOGICAL ARCHITECTURE

A Proposed Architecture
Four basic ingredients:
  Interest Groups
  Acquaintances
  Coordination Rules
  Correspondence Rules

Interest Groups
Peer nodes know very little about the other nodes of the P2P network, or about the topics (e.g., Tourism, Medical care, …) on which their peers are able to answer queries
An Interest Group is a set of nodes which are able to answer queries about a certain topic
Each group has a Group Manager (GM), which is in charge of managing the metadata needed to run the group
The main goal of GM is to compute the Query Scope (QS) – the set of nodes a query should be propagated to
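The role of the Group Manager can be pictured with a minimal Python sketch. Everything here is an assumption made for illustration: the class and function names, the node identifiers, the exact-match topic lookup, and the policy that the query scope is simply every group member other than the asking node. The slides do not say how GM actually computes QS.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class InterestGroup:
    """A set of nodes able to answer queries about one topic."""
    topic: str
    manager: str                       # node acting as Group Manager (GM)
    members: set[str] = field(default_factory=set)

    def query_scope(self, origin: str) -> set[str]:
        # Assumed policy: every member except the asking node is in scope.
        return self.members - {origin}


def find_group(groups: list[InterestGroup], topic: str) -> InterestGroup | None:
    # Hypothetical matching: exact, case-insensitive topic comparison.
    for group in groups:
        if group.topic.lower() == topic.lower():
            return group
    return None


groups = [
    InterestGroup("Tourism", manager="n1", members={"n1", "n2", "n3"}),
    InterestGroup("Medical care", manager="n4", members={"n4", "n5"}),
]
group = find_group(groups, "medical care")
scope = group.query_scope(origin="n9") if group else set()
```

A real Group Manager would presumably prune the scope using group metadata rather than return all members; the sketch only fixes the shape of the interaction (topic in, set of nodes out).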

Acquaintances
Acquaintances are nodes that a node knows about and that have data relevant to answering specific queries
A node is an acquaintance of another node only with respect to (possibly, a schematic representation of) a query
There must be a way to compute how to propagate a query, how to propagate results back, and how to reconcile them with the results coming from the other acquaintances

Coordination Rules
Each acquaintance may be associated with one or more Coordination Rules
Coordination rules specify under what conditions, when, how and where to propagate queries or updates
A proposed implementation of coordination rules is as Event-Condition-Action (ECA) rules:
  Event can be an update or a query coming from the user or from another node
  Condition refers to properties of the update or query (e.g., the type of query and/or which data are referenced by the query)
  Action can be the translation and propagation of a given update or query to a particular acquaintance
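As a rough illustration of the Event-Condition-Action idea, the following Python sketch represents a coordination rule as a condition and an action attached to one acquaintance. The class names, the dictionary-based query representation and the attribute name are hypothetical; the slides do not prescribe a concrete rule syntax.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Event:
    kind: str        # "query" or "update"
    source: str      # "user" or the id of the sending node
    payload: dict    # hypothetical structured form of the query or update


@dataclass
class CoordinationRule:
    acquaintance: str                    # where the action sends its output
    condition: Callable[[Event], bool]   # property of the query or update
    action: Callable[[Event], dict]      # e.g. translate the payload


def dispatch(event: Event, rules: list[CoordinationRule]) -> list[tuple[str, dict]]:
    """Return (acquaintance, message) pairs for every rule whose condition holds."""
    outbox = []
    for rule in rules:
        if rule.condition(event):
            outbox.append((rule.acquaintance, rule.action(event)))
    return outbox


rules = [
    CoordinationRule(
        acquaintance="H",
        condition=lambda e: e.kind == "query" and "Disease" in e.payload["select"],
        action=lambda e: dict(e.payload),   # identity translation, for brevity
    ),
]
fired = dispatch(Event("query", "user", {"select": ["Disease"]}), rules)
skipped = dispatch(Event("update", "user", {"select": ["Disease"]}), rules)
```

Here `fired` holds one message addressed to `H`, while `skipped` is empty because the rule only reacts to queries.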

Correspondence Rules
Each acquaintance is associated with one or more Correspondence Rules
Correspondence Rules translate queries and query results (semantic heterogeneity)
They are implemented as rewrite rules and are called by coordination rules, in the action and condition components
They can be used, for instance, to translate attribute or element names (Domain Relations)
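One simple way to picture a correspondence rule that translates attribute names is a renaming table applied in both directions. The sketch below is only an illustration under that assumption: the mapping, the function names and the choice to drop attributes without a counterpart are hypothetical, not part of the proposed architecture.

```python
# Hypothetical domain relation: local attribute names mapped to the names
# used by one acquaintance.
ATTRIBUTE_MAP = {"PatID": "PID", "P_Name": "Name", "Illness_Desc": "Disease"}


def rewrite_names(names, mapping):
    """Translate outgoing attribute names, dropping those with no counterpart."""
    return [mapping[n] for n in names if n in mapping]


def rewrite_row(row, mapping):
    """Translate an incoming result row (a dict) back into local names."""
    inverse = {theirs: ours for ours, theirs in mapping.items()}
    return {inverse[k]: v for k, v in row.items() if k in inverse}


outgoing = rewrite_names(["P_Name", "Illness_Desc", "StartDate"], ATTRIBUTE_MAP)
incoming = rewrite_row({"Name": "John", "Disease": "Leg fracture"}, ATTRIBUTE_MAP)
```

Silently dropping untranslatable attributes is one possible design choice; it loses information but keeps the exchange going, which fits the idea of answers that are merely good enough.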

Level One Architecture
User Interface: user queries, results
P2P Layer: add-on providing the P2P functionality
  Query Manager (QM) and Update Manager (UM): responsible for query and update propagation; manage coordination and correspondence rules, acquaintances, and interest groups
  Wrapper: provides a translation layer between QM and UM on one side and LDS on the other
Local Data Source (LDS): database, file system, web site, …

A Proposed Strategy for Query Propagation
Figure: a network of nodes 1 to 11 and a Group Manager (GM), annotated with the messages below
Steps:
  1. User submits query Q () to node 1
  2. Node defines the query topic: Q (, topic)
  3. Node sends GM a request to define the Query Scope (QS): QS (, topic) = ?
  4. GM computes and sends back QS: QS (, topic) = (2, 4, 6, 8, 9, 11)
  5. Node 1 sends the query to its acquaintances in QS, and reports this fact to GM: “nodes 2 and 4 are reached”
  Nodes 2 and 4 send their answers to node 1 (←Res2, ←Res4)
  Nodes propagate the query to their acquaintances from QS and report this fact to GM (“node 6 is reached”, “node 8 is reached”)
  And so on…
  Nodes which do not propagate any further report this fact to GM (“no more propagation from 8”, “no more propagation from 9”)
  Propagation stops when “no more propagation” has been received from all boundary nodes
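The propagation steps can be simulated with a short Python sketch. The query scope is the one shown on the slide; the acquaintance graph itself is an assumption, since the figure's edges are not recoverable from the transcript, and the breadth-first order and the wording of the log messages are likewise illustrative.

```python
from collections import deque

# Hypothetical acquaintance graph: node -> nodes it is acquainted with.
ACQUAINTANCES = {1: [2, 3, 4], 2: [6, 7], 4: [5, 8, 11], 6: [9, 10]}
QUERY_SCOPE = {2, 4, 6, 8, 9, 11}   # QS as returned by GM on the slide


def propagate(origin, acquaintances, scope):
    """Breadth-first propagation restricted to the query scope.

    Returns the set of reached nodes and the log of reports sent to GM.
    """
    reached, gm_log = set(), []
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        # Forward only to acquaintances that are in scope and not yet reached.
        targets = [n for n in acquaintances.get(node, [])
                   if n in scope and n not in reached and n != origin]
        if not targets:
            # Boundary node: it tells GM that it will not propagate further.
            gm_log.append(f"no more propagation from {node}")
            continue
        for target in targets:
            reached.add(target)
            queue.append(target)
        gm_log.append(f"{node} reached {targets}")
    return reached, gm_log


reached, gm_log = propagate(1, ACQUAINTANCES, QUERY_SCOPE)
```

With this assumed graph every node in QS is reached, and GM can declare the propagation finished once each boundary node (8, 9 and 11 here) has reported that it does not forward the query.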

THE RUNNING EXAMPLE

“Toy” Databases
Recall the Motivating Example:
  Family Doctor DB F: Prescription (PatID, P_Name, Illness_Desc, StartDate, RecoveryDate, Treatment, Type, Prescriptions);
  Hospital DB H: Patients (PID, Name, Disease, Treatment_Desc, In, Out);
  Medical Office DB M: Accidents (P_id, FN, LN, Address_Reason, Treatment_Taken, Prescription_Given, Date)
John, who suffers the accident, is described in H with ID “P12”, in F as “8”, and, when he comes to M, he is assigned ID “A13”

Query Example
Let's suppose query QM is posed to M:
  Select FN, LN, Address_Reason, Treatment_Taken, Prescription_Given, Date
  From “M:Accidents”
  Where Address_Reason Like (‘%Fracture%’ Or ‘%Dislocation%’) And PID = ‘A13’
QM is flagged as a global query with topic T = “Medical Care in Canada”
After some search, T is matched with the topic “Medical Care in Toronto” of the interest group G

Group G
Figure: acquaintance graph of group G; dashed lines are group metadata channels
H is acquainted with F and P is acquainted with F
H is GM of G
GM computes query scope QS = G = {F, H, P} for query QM
M gets acquainted with H
M: Accidents and H: Patients are matched
As a result, a set of Coordination Rules is generated

Examples of Coordination Rules
Coor # 1
  Event: M:Q
  Condition: Q:(Address_Reason ∈ Select OR Treatment_Taken ∈ Select) AND (PID = ‘A13’ ∈ Where)
  Action: Q = Apply (Q, Corr_Rules_Query); Send (Q, H)
Coor # 2
  Event: M:RH
  Condition: None
  Action: RM = Apply (RH, Corr_Rules_Results)
Here Corr_Rules_Query and Corr_Rules_Results are correspondence rules which translate the outgoing query and the incoming results

Query Propagation
P is not reachable because there is no acquaintance path from M to P
In the graph the following queries are circulating:
  QH = Select Name, Disease, Treatment_Desc
       From “H:Patients”
       Where Disease Like (‘%Fracture%’ Or ‘%Dislocation%’) And PID = ‘P12’
  QF = Select P_Name, Illness_Desc, Treatment
       From “F:Prescriptions”
       Where Illness_Desc Like (‘%Fracture%’ Or ‘%Dislocation%’) And PID = ‘8’
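How QM might be turned into QH can be sketched in Python as follows. The attribute and patient-ID correspondences are read off the QM and QH queries; the dictionary representation of a query, the function name and the rule that a non-shared patient yields no translation are assumptions made for the example.

```python
# Assumed M -> H correspondences, read off the QM and QH queries.
ATTRS_M_TO_H = {
    "FN": "Name",
    "LN": "Name",
    "Address_Reason": "Disease",
    "Treatment_Taken": "Treatment_Desc",
}
PATIENT_IDS_M_TO_H = {"A13": "P12"}   # John's ID in M and in H

q_m = {
    "select": ["FN", "LN", "Address_Reason", "Treatment_Taken",
               "Prescription_Given", "Date"],
    "like_attr": "Address_Reason",
    "patterns": ["%Fracture%", "%Dislocation%"],
    "patient_id": "A13",
}


def translate_m_to_h(query):
    """Return the H-side query, or None if the patient is not shared with H."""
    if query["patient_id"] not in PATIENT_IDS_M_TO_H:
        return None
    select = []
    for attr in query["select"]:
        target = ATTRS_M_TO_H.get(attr)       # attributes unknown to H are dropped
        if target and target not in select:   # FN and LN both map to Name
            select.append(target)
    return {
        "select": select,
        "like_attr": ATTRS_M_TO_H[query["like_attr"]],
        "patterns": list(query["patterns"]),
        "patient_id": PATIENT_IDS_M_TO_H[query["patient_id"]],
    }


q_h = translate_m_to_h(q_m)
not_shared = translate_m_to_h({**q_m, "patient_id": "A99"})   # "A99" is made up
```

The resulting `q_h` selects Name, Disease and Treatment_Desc for patient “P12”, which matches QH; the attributes Prescription_Given and Date have no counterpart in H and are dropped.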

Results Propagation and Reconciliation
H and F generate the following results:
  ResH = <‘John’, ‘Forearm dislocation’, ‘Bandage’>
  ResF = <‘John’, ‘Leg fracture’, ‘Leg put in plaster’>
When they reach M, the results are reconciled as follows:
  ResM =

Variance and Good Enough Answers
ResM is incomplete: some fields from H: Patients and F: Prescription are missing
Nevertheless the results are good enough because they still serve the needs of M
Network Variance: if F is down, the results are even more incomplete
Database Variance: if M gets acquainted with F instead of H, only ResF is retrievable. F has a different “vision” of the world, as it is not acquainted with H
Query Variance: if in QM the ID of John is replaced by the ID of another patient who is not shared, then no Coordination Rules apply and therefore there is no propagation
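Network and database variance in this example come down to which nodes a query from M can reach. The Python sketch below makes that concrete under one assumption: that “X is acquainted with Y” is a directed link from X to Y, along which X forwards queries. The function and variable names are illustrative.

```python
def reachable(start, acquaintances, down=frozenset()):
    """Nodes a query from `start` can reach along acquaintance links."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for peer in acquaintances.get(node, ()):
            if peer not in seen and peer not in down and peer != start:
                seen.add(peer)
                stack.append(peer)
    return seen


# Directed links from the Group G slide, plus M's new acquaintance with H.
links = {"M": ["H"], "H": ["F"], "P": ["F"]}

baseline = reachable("M", links)                 # H and F both answer
f_down = reachable("M", links, down={"F"})       # network variance: F is down
via_f = reachable("M", {**links, "M": ["F"]})    # database variance: M knows F, not H
```

In all three cases P stays out of reach, because no acquaintance link leads to it; with F down only H answers, and when M is acquainted with F instead of H only F answers.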

Conclusion
A first investigation of how to make databases interact in a P2P network. There are four main dimensions:
  We must integrate data coming from autonomous, most often semantically heterogeneous, databases;
  We must deal with network, database, and query variance. This is why we talk of data coordination, as distinct from data integration;
  We will almost never get correct and complete answers. We must be content with answers which are good enough;
  There is a need to tune metadata. This is required in order to cope with the dynamics of a P2P network.

References
Project website: http://www.dit.unitn.it/~p2p/
P. Bernstein, F. Giunchiglia, A. Kementsietsidis, J. Mylopoulos, L. Serafini, and I. Zaihrayeu, “Data Management for Peer-to-Peer Computing: A Vision”, WebDB 2002
L. Serafini, F. Giunchiglia, J. Mylopoulos, and P. Bernstein, “The Local Relational Model: Model and Proof Theory”, tech. rep., IRST, Trento