AOIS’02 - June 02, 2002. Coordinating Peer-to-Peer information sources. Fausto Giunchiglia, University of Trento.

Presentation transcript:

Slide 1. Coordinating Peer-to-Peer information sources
Fausto Giunchiglia, University of Trento

Slide 2. The talk
- Intuitions
- The underlying theory: the Local Relational Model (an application of the Local Models Semantics [Ghidini and Giunchiglia, AIJ 2001])
- Some theoretical results
- VERY PRELIMINARY logical architecture
- … and agents?

Slide 3. INTUITIONS

Slide 4. Peer to Peer (P2P) Computing
Peers come and go, but must nevertheless be able to interoperate.
There are many examples outside the database field:
- Napster: a shared directory of available music, plus client software to read/write the directory and import/export files.
- Gnutella: a decentralized group membership and search protocol, mainly used for file sharing.
- Groove: a secure shared space among intermittently connected systems with no central server.
- …

Slide 5. Is There a Role for P2P Databases?
There’s hardly any literature:
- WebDB ’01 paper (Gribble, Halevy, Ives, Rodrig, Suciu) focuses on data placement
- This implies some control over data placement
- They’re serious about building a system (“Piazza”)
Is it really a new research problem? Or only a new application with a lot of hype around it?
Compare it with the work on data integration (local-as-view, global-as-view approaches). Can’t we just apply the same techniques?

Slide 6. Data integration: a snapshot
- Global schema (defined at design time).
- Integration defined at design time by mapping local databases into the global database.
- Global schema as primitive (LAV: local-as-view), or local schemas as primitive (GAV: global-as-view).
- In all cases: take one domain of interpretation (as implicitly defined by the global schema) and MAP all individuals, relations and attributes of the databases to be integrated into it.
- Want correctness (query containment).
But:
- What if a new node comes in?
- Can we really deal with completely autonomous nodes?
- What about autonomy at run time (change schema?)
- …

Slide 7. Coordinating P2P databases: is it a new research problem?
Domain characteristics:
- Autonomy: peer databases are largely independent (in their language, contents, in how they answer queries, …). They may be incomplete, overlapping, semantically heterogeneous, mutually inconsistent, …
- Dynamicity: nodes come and go … and maybe come again …; schemas, attributes, values may change over time, …
- You know something about the peer databases, but almost never everything. This knowledge is hard to maintain and may be obsolete.

Slide 8. Is it a new research problem?
Solution desiderata:
- Need scalability over the number of nodes.
- Want “incrementality” as a function of the effort made in developing a solution (design time) and in getting “good” answers (run time).
- (Design or run time) correctness and completeness should be limit cases (most of the time too costly to be implemented).
- Want robustness with respect to the autonomy of peer databases.

Slide 9. Is it a new research problem?
Solution characteristics:
- Keep autonomy, add coordination, as much as can be afforded (see incrementality).
- Notion of a good enough answer, as a function of the coordination effort.
NOTE: Coordination is NOT (data) integration.
- Integration is defined once and for all at design time. Coordination may change at run time.
- Unlike data integration, there is no global schema. By the way, what is a global schema in the P2P domain? How much are we willing to pay to approximate it … and maintain it in time?
- …

Slide 10. The Local Relational Model

Slide 11. A Motivating Example (1): Scenario
- Databases of medical patients.
- Complete integration is likely to be infeasible.
- But dynamic integration of the databases relevant to one patient could have high value.

Slide 12. A Motivating Example (2)
Consider 3 databases, one table per DB:
- f: family doctor. f:Prescription(PatID, treatment, disease)
- p: pharmacist. p:Medication(PID, Prod, PrescriptionID)
- h: hospital. h:Patients(PATid, disease, in, out)
A given patient may be described in all 3 databases, but the databases might use different patient id formats and disease descriptions.
When a patient is injured on a ski holiday in another country, yet more databases need to get involved.

Slide 13. Domain Relations
- Each database DB_i has its language L_i, with a set A_i of unary predicates for Attributes, a set of constant symbols DOM_i for Elements, and a set of predicates R_i for Relations.
- Take a set of such DB_i, i in I.
- Define the domain relation r_ij as a subset of DOM_i x DOM_j. r_ij is the set of pairs ⟨d_i, d_j⟩ where, intuitively, d_i and d_j (usually different constants) stand for the same object in the world.
- Each row in the domain relation r_ik specifies that value d_1 in database i corresponds to value d_2 in database k.
- Clearly, it’s a simplification to have one domain per database. This is just for notational convenience.
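The definition of a domain relation can be made concrete with a minimal Python sketch. It assumes finite domains and represents a domain relation as a set of pairs; the constants `a1`, `b1`, … and the helper name `image` are illustrative assumptions, not taken from the talk.

```python
# Hypothetical finite domains for two databases i and j.
DOM_i = {"a1", "a2", "a3"}
DOM_j = {"b1", "b2"}

# A domain relation r_ij is a subset of DOM_i x DOM_j: each pair (d_i, d_j)
# says that d_i in database i and d_j in database j stand for the same object.
r_ij = {("a1", "b1"), ("a2", "b2")}


def image(relation, d):
    """Return every value that `relation` associates with the constant d."""
    return {dj for (di, dj) in relation if di == d}


# Sanity check: r_ij really is a subset of the Cartesian product.
assert all(di in DOM_i and dj in DOM_j for (di, dj) in r_ij)

print(image(r_ij, "a1"))  # {'b1'}
print(image(r_ij, "a3"))  # set(), because a3 has no counterpart in j
```

Returning a set rather than a single value keeps the sketch faithful to the relational (not functional) reading: one constant may correspond to zero, one, or several constants on the other side.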

Slide 14. A Motivating Example (3)
Consider the previous 3 databases:
- f: family doctor. f:Prescription(p12, Aspirin, Headache)
- p: pharmacist. p:Medication(31, Aspirin-Bayer, fd23)
- h: hospital. h:Patients(r3, car_accid, 1/1/01, 3/1/01)
We may have:
- ⟨…⟩ in r_hf
- ⟨…⟩ in r_pf
- ⟨…⟩ in r_fh, if we have the inverse mapping

Slide 15. Domain relations … more …
Suppose we have:
- ⟨…⟩ in r_hf
- ⟨…⟩ in r_pf
- ⟨…⟩ in r_fh
- …
NOTE: We do not collapse local domains into a universal domain (as in data integration). We keep them distinct, and introduce mappings between pairs of domains as objects.
Domain relations are explicitly manipulated at run time to implement coordination between peer databases.

Slide 16. Domain Relations: Examples
- r_ij may be partial and not surjective (most often the case).
- r_ij, r_ji need not be symmetric: r_ij(r_ji(x)) ≠ x. For example, consider DB i containing length measurements in meters and DB j in kilometers. One can have
  - r_ij(x) = roundToClosestK(x), e.g., r_ij(653) = 1, r_ij(453) = 0
  - r_ji(x) = x*1000, e.g., r_ji(1) = 1000
- r_ij = inverse(r_ji): different but equivalent representations of the same domain.
- r_ij = r_ji = emptyset: disjoint domains (what if only one is the emptyset?).
- r_ik = (r_ij composed r_jk): transitive mappings among domains.
- r_ij(ds) = emptyset, with ds a subset of d_i: keep ds secret.
- ∀d1,d2 in DOM_i, d1 <_i d2 → ∀d1’ in r_ij(d1), ∀d2’ in r_ij(d2). (d1’ <_j d2’): preserving order (currency exchange).
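A short Python sketch of the examples on this slide. The meters/kilometers mappings follow the slide's numbers; rounding is implemented half-up, which is an assumption about what roundToClosestK does at exactly 500 m. The helpers `inverse` and `compose`, and the constants `a1`, `b1`, `c1`, are hypothetical names introduced for illustration.

```python
def r_ij(meters):
    """DB i (meters) -> DB j (kilometers), rounded to the closest kilometer."""
    return (meters + 500) // 1000


def r_ji(kilometers):
    """DB j (kilometers) -> DB i (meters)."""
    return kilometers * 1000


def inverse(rel):
    """Inverse of a finite domain relation given as a set of pairs."""
    return {(b, a) for (a, b) in rel}


def compose(r1, r2):
    """Composition of two finite domain relations (r1 first, then r2)."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}


print(r_ij(653), r_ij(453))  # 1 0
print(r_ji(1))               # 1000
print(r_ji(r_ij(653)))       # 1000: the round trip does not give back 653

# Finite relations over hypothetical constants: r_ik as (r_ij composed r_jk).
rel_ij = {("a1", "b1"), ("a2", "b2")}
rel_jk = {("b1", "c1")}
print(compose(rel_ij, rel_jk))  # {('a1', 'c1')}
print(inverse(rel_jk))          # {('c1', 'b1')}
```

The round trip through kilometers loses information, which is one concrete way in which the two directions of a mapping fail to be inverses of each other.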

Slide 17. P2P Coordination
Instead of a global schema, assume each peer has
- pair-wise coordination formulas that specify interdependencies.
- binary domain relations that specify how the symbols used in one database translate to symbols used in another database.
Coordination formulas and domain relations can only refer to acquaintances.
Use domain relations and coordination formulas for query and update processing.

Slide 18. Coordination Formulas: Examples
(∀p:x).(∀p:y).(p:(∃z).medication(x,z,y) → f:treatments(x, home, y))
(∀h:x).(∀h:y).(h:(∃z1,z2).patient(x,y,z1,z2) → f:treatments(x, hospital, y))
“There’s a row in the treatments table in the family doctor database for each row in the patient and hospital databases”
NOTE: see indexing of formulas and variables

Slide 19. Coordination formulas
Coordination formulas are built from atomic formulas i:φ(x), where φ(x) is a first-order formula, using the standard connectives: and, or, →, ∀, ∃.
Variables quantified on one DB may have to be interpreted on other DBs. The mapping is done by exploiting domain relations. Consider, e.g.:
- ∀(i:x).j:P(x) “for each object d_i in DOM_i, the corresponding object d_j = r_ij(d_i) in DOM_j has the property P”
- ∀(i:x).(i:P(x) → j:Q(x) and k:R(x)) “for each object d_i in DOM_i, if P holds of d_i …
Quantification is always done with respect to the domain of one database. However, notice the difference between
- ∀(i:x).A(x), with A(x) a coordination formula
- i:∀x.B(x), with B(x) a first-order formula. It holds iff ∀(i:x).i:B(x) holds
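A minimal Python sketch of how one might check a coordination formula of the shape ∀(i:x).(i:P(x) → j:Q(x)) over finite extensions. It assumes a particular reading, which is not spelled out on the slide: whenever P holds of d_i in database i, Q must hold in database j of every object that r_ij relates to d_i, and an object with no counterpart in j satisfies the formula vacuously. All names and data are hypothetical.

```python
def image(rel, d):
    """All values that the domain relation `rel` associates with d."""
    return {b for (a, b) in rel if a == d}


def forall_implies(dom_i, P_i, Q_j, r_ij):
    """Check the formula  forall (i:x). (i:P(x) -> j:Q(x)).

    Assumed reading: for each d_i in dom_i with P(d_i) true in i,
    every d_j in r_ij(d_i) must satisfy Q in j.
    """
    for d_i in dom_i:
        if d_i in P_i and not image(r_ij, d_i) <= Q_j:
            return False
    return True


# Hypothetical extensions: P is evaluated in i, Q in j.
dom_i = {"a1", "a2", "a3"}
P_i = {"a1", "a2"}
r_ij = {("a1", "b1"), ("a2", "b2")}

print(forall_implies(dom_i, P_i, {"b1", "b2"}, r_ij))  # True
print(forall_implies(dom_i, P_i, {"b1"}, r_ij))        # False: b2 lacks Q
```

Note that each predicate is only ever evaluated against its own database; the domain relation is the only thing that crosses the boundary between i and j.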

Slide 20. Higher Level Correspondences
One can generalize the domain relation to correspondences at higher meta-levels:
- constant to constant, e.g., ‘one’ ↔ ‘uno’; or CAN$1.00 ↔ US$0.65
- table to table, e.g., Cust ↔ Customer
- column to column, e.g., name(Cust) ↔ nm(Customer)
This is also captured in coordination formulas.
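As a rough illustration of the three levels of correspondence, the sketch below stores each level in its own lookup table and rewrites a single (table, column, value) condition from one peer's vocabulary into another's. The dictionaries reuse the slide's example names; the function `rewrite` and the idea of rewriting one condition at a time are assumptions made for this sketch, not the talk's mechanism.

```python
# Correspondences taken from the slide's examples, one table per level.
TABLE_MAP = {"Cust": "Customer"}
COLUMN_MAP = {("Cust", "name"): "nm"}
CONSTANT_MAP = {"one": "uno"}


def rewrite(table, column, value):
    """Translate a (table, column, value) condition into the peer's terms.

    Anything without a known correspondence is passed through unchanged.
    """
    new_table = TABLE_MAP.get(table, table)
    new_column = COLUMN_MAP.get((table, column), column)
    new_value = CONSTANT_MAP.get(value, value)
    return new_table, new_column, new_value


print(rewrite("Cust", "name", "one"))   # ('Customer', 'nm', 'uno')
print(rewrite("Cust", "city", "Rome"))  # ('Customer', 'city', 'Rome')
```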

Slide 21. Answering Queries
Local queries. Treated as if no peer databases exist. They are first-order formulas of the form A(x) → q(x), with A(x) a first-order formula, and x and q as below.
Global queries. They are coordination formulas of the form A(x) → i:q(x), where
- A(x) is a coordination formula
- x has n variables
- q is a new n-ary predicate symbol
- i is the database which gets the query
The answer to a global query is {d ∈ dom_i^n such that (∃i:x).(A(x) ∧ i:x=d)}

Slide 22. Answering Queries: An example
Consider the query below, submitted to database h:
((i:P(x) ∧ j:R(y)) ∨ k:S(x,y)) → h:q(x,y)
Three steps:
- Evaluate P, R, S in i, j, k (respectively)
- map the results via r_ih, r_jh, r_kh to sets s_i, s_j, s_k, and then
- compute ((s_i × s_j) ∪ s_k)
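The three steps can be sketched in Python as follows. The sketch assumes the query reads ((i:P(x) ∧ j:R(y)) ∨ k:S(x,y)) → h:q(x,y), so that the final combination is a Cartesian product followed by a union; the local extensions and the domain relations r_ih, r_jh, r_kh below are made-up data, and `translate` is a hypothetical helper.

```python
from itertools import product


def translate(d, rel):
    """All values in h's domain that the domain relation `rel` gives for d."""
    return {b for (a, b) in rel if a == d}


# Step 1: evaluate P, R, S locally in i, j, k (hypothetical extensions).
P_i = {"i1", "i2"}
R_j = {"j1"}
S_k = {("k1", "k2")}

# Hypothetical domain relations from i, j, k into h.
r_ih = {("i1", "h1"), ("i2", "h2")}
r_jh = {("j1", "h3")}
r_kh = {("k1", "h4"), ("k2", "h5")}

# Step 2: map the local results into h's domain.
s_i = {h for d in P_i for h in translate(d, r_ih)}
s_j = {h for d in R_j for h in translate(d, r_jh)}
s_k = {(hx, hy)
       for (x, y) in S_k
       for hx in translate(x, r_kh)
       for hy in translate(y, r_kh)}

# Step 3: combine, with product for the conjunction and union for the disjunction.
answer = set(product(s_i, s_j)) | s_k
print(sorted(answer))  # [('h1', 'h3'), ('h2', 'h3'), ('h4', 'h5')]
```

Only the already-evaluated result sets travel between peers here; no peer needs to see another peer's schema, which matches the slide's point that each predicate is evaluated where it lives.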

Slide 23. SOME THEORETICAL RESULTS

Slide 24. Theoretical Results (1)
- Provide a model theory by defining the Local Relational Model in terms of relational spaces, where a relational space is defined as a pair.
- Provide a notion of satisfiability and logical consequence of coordination formulas with respect to relational frames.
- Provide inference rules for using coordination formulas. Prove them sound and complete with respect to the LRM.

Slide 25. Theoretical Results (2)
- Define a generalized relational theory as a theory with domain closure, distinct domain values, and a finite number of possible relation extensions (closed world assumption).
- Define a relational multi-context system as a family of relational languages (one per database) with a generalized relational theory (in T) and a set of coordination formulas (in R).
- Prove that for any relational multi-context system, there’s a unique maximal relational space that satisfies it. (Generalizes Reiter’s result on CWA and single databases.)

Slide 26. Theoretical Results (3)
- Given a multi-context system that represents it, the answer to a query A(x) → i:q(x) is the set of all d such that {i:T_i}_{i∈I}, R ⊢ (∃i:x).(A(x) ∧ i:x=d)
- This result is the basis for a correct and complete query answering mechanism (for a given set of coordination formulas … which may implement something totally different from the data integration approach (LAV, GAV))

Slide 27. VERY PRELIMINARY HINTS OF A LOGICAL ARCHITECTURE

Slide 28. A proposed architecture (prelim.) (1)
Four basic ingredients:
- Interest Group: a set of nodes able to answer queries about a certain topic (e.g., tourism, medical care). Needed to compute the scope of query answering.
- Acquaintance (with respect to a node and a given query): a node which is supposed to have information that can be used to answer the query.
- Coordination rule (with respect to an acquaintance): it says how to propagate the query forward and the results back.
- Correspondence rule (with respect to an acquaintance): it takes care of the semantic heterogeneity problem.

Slide 29. A proposed architecture (prelim.) (2)
From theory to practice:
- Interest Group: in LRM, the set of databases in a relational frame.
- Acquaintance (of a node n1): in LRM, any node n2 for which there is a coordination formula involving n1 and n2.
- Coordination rule: an implementation of coordination formulas, parametric on correspondence rules.
- Correspondence rule: a set of rewrite rules which implement the language-dependent part of coordination formulas and take care of semantic heterogeneity (domain relations are implemented as special kinds of correspondence rules).

Slide 30. Level 1 architecture: the P2P layer
P2P Layer
- P2P functionality is add-on
Local Data Source (LDS)
- Database
- File system
- Web site
- …
User Interface
- User queries
- Results
- …
Query Manager (QM) and Update Manager (UM)
- responsible for query and update propagation
- manage coordination and correspondence rules, acquaintances, and interest groups
Wrapper
- provides a translation layer between QM and UM, and LDS

Slide 31. Level 2 architecture: the Query Manager
Propagation Planner
- Talks to the group manager
Query Formation
- Responsible for forming outgoing queries, as well as for querying the local data source
Results Handler
- Responsible for sending and receiving query results
- Shows results to the user
Executed Query History
- Prevents duplicate query execution
Acquaintances
Interest Groups
Group Management
- Used only by node managers for management of groups and query propagation
Coordination and Correspondence Rules

Slide 32. Query propagation strategy
1. Node defines query topic
2. Node sends Group Manager (GM) a request for the Query Scope (QS)
3. GM computes QS
4. Node 1 sends the query to its acquaintances in QS, namely 2 and 4, and reports this fact to GM
5. Nodes 2 and 4 send their answers to node 1
6. Node 2 propagates the query to its acquaintances in QS, namely 4 and 6, and reports this fact to GM
7. And so on…
8. Nodes which do not propagate any further report this fact to GM
9. Propagation stops when “no more propagation” has been received from all boundary nodes (all reachable acquaintances have been reached)
[Diagram labels: Q(…); Q(…, topic) sent to GM; QS(…, topic) = (2, 4, 6, 8, 9, 11); Res 2 and Res 4 returned to node 1]
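The propagation steps can be simulated with a small Python sketch. It is only an illustration under stated assumptions: the query scope (2, 4, 6, 8, 9, 11) and the acquaintances of nodes 1 and 2 come from the slide, while the acquaintances of the remaining nodes, the breadth-first ordering, and the function name `propagate` are made up. Result passing back to the sender is omitted.

```python
from collections import deque


def propagate(origin, acquaintances, query_scope):
    """Simulate query propagation inside a query scope.

    Returns (messages, executed, finished):
      messages - (sender, receiver) pairs in the order they are sent
      executed - nodes that ran the query (each node runs it once)
      finished - nodes that report "no more propagation" to the GM
    """
    executed = {origin}          # plays the role of the executed-query history
    messages, finished = [], []
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        targets = [n for n in acquaintances.get(node, []) if n in query_scope]
        if not targets:
            finished.append(node)      # boundary node: nothing to forward
        for n in targets:
            messages.append((node, n))
            if n not in executed:      # duplicates are received but not re-run
                executed.add(n)
                queue.append(n)
    return messages, executed, finished


# Query scope as computed by the GM on the slide.
QS = {2, 4, 6, 8, 9, 11}
# Acquaintances of 1 and 2 follow the slide; the rest are hypothetical.
ACQ = {1: [2, 4], 2: [4, 6], 4: [8], 6: [9, 11], 8: [], 9: [3], 11: []}

messages, executed, finished = propagate(1, ACQ, QS)
print(messages)  # [(1, 2), (1, 4), (2, 4), (2, 6), (4, 8), (6, 9), (6, 11)]
print(finished)  # [8, 9, 11]
```

Node 4 receives the query twice (from 1 and from 2) but executes it once, and node 3 is never contacted because it lies outside the query scope. Propagation ends when every boundary node has reported that it has nothing left to forward.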

Slide 33. Summary
- Coordinating P2P information sources: keep autonomy, add (run-time) coordination. Be content with good enough answers.
- Theoretically, model coordination using four notions: set of local databases, domain relations, coordination formulas, global answer to a query.
- Implementationally, implement coordination using five notions: interest groups, acquaintances, coordination rules, correspondence rules, coordination algorithm.
- … and agents?

Slide 34. Published work (not much … yet)
- Paper on LRM still unpublished, but see the project Web page
- Paper on basic ideas in WEBDB 2002
- Paper on architecture in CIA 2002
- These slides soon on my Web page
- Project Web page (to be put up soon) will be accessible from my Web page: