Introduction Zachary G. Ives University of Pennsylvania CIS 700 – Internet-Scale Distributed Computing January 13, 2004.

2 Welcome!  To the initial version of the Penn Systems Seminar  First of an ongoing series, focusing on systems research topics of general interest  Format: reading and discussion (no homework or exams)  Independent Study encouraged to supplement the seminar  Our focus: P2P and distributed ad hoc systems

3 What Is the Vision of Peer-to-Peer Computing? Loose coupling, auto configuration:  No central administration  Scalability  Adaptability/resiliency  Nodes contribute as well as consume resources  System continues as peers join and leave

4 How Does P2P Work?  P2P infrastructure forms an overlay network over the real Internet, which supports:  Schemes for distributing resources (data, computation) without a directory structure  Unstructured: query by flooding or over advertisements  Structured: query according to an algorithm that organizes the peers into a consistent structure (hash table, tree, …)  Graceful handling of loss or gain of nodes  Replication “where appropriate”  Provides reliability/availability  Improves performance (self-tuning)  More on this later, from Honghui
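To make the "structured" case concrete, here is a minimal sketch of one possible consistent structure: a hash ring in which each key belongs to the first peer found clockwise from the key's hash. This is an illustration only, not a specific system from the talk. The names `ring_hash` and `HashRing`, the use of SHA-1, and the 32-bit ring size are all assumptions made for the example.

```python
import bisect
import hashlib


def ring_hash(value: str, bits: int = 32) -> int:
    """Map a string onto a ring of size 2**bits (SHA-1 is an arbitrary choice)."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class HashRing:
    """Toy structured overlay: each key is owned by its clockwise successor peer."""

    def __init__(self) -> None:
        self._points = []   # sorted ring positions of the current peers
        self._owners = {}   # ring position -> peer name

    def join(self, peer: str) -> None:
        point = ring_hash(peer)
        if point not in self._owners:
            bisect.insort(self._points, point)
        self._owners[point] = peer

    def leave(self, peer: str) -> None:
        point = ring_hash(peer)
        if self._owners.get(point) == peer:
            self._points.remove(point)
            del self._owners[point]

    def lookup(self, key: str) -> str:
        if not self._points:
            raise LookupError("no peers in the overlay")
        # First peer at or after the key's position, wrapping around the ring.
        idx = bisect.bisect_left(self._points, ring_hash(key))
        return self._owners[self._points[idx % len(self._points)]]
```

When a peer leaves, only the keys it owned move to another peer; every other key keeps its owner. That is one way to read "graceful handling of loss or gain of nodes". A real overlay would also route lookups across peers instead of consulting a single in-memory table.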

5 The Promise of P2P  Major challenge for applications is generally scalability  Traditional systems definition:  Scalability of systems to numbers of requests, clients, etc.  But we need “human” scalability as well:  Avoid human administration, tuning, oversight, custom code  Self-administering; auto-tuning  Providing the “right” abstractions  Human contributors often create heterogeneity among components, data, participation levels, etc.  Aspects of P2P should help with all of these

6 The Central Questions: Goals of this Seminar  1. “What is the killer app for a P2P substrate?”  Is there more to this P2P idea than pirating music and searching for little green men (and women)?  What applications can benefit from P2P-like techniques?  What are their key properties?  2. What programming models are most appropriate for building such applications?  3. How can P2P techniques be improved to better support the applications we want to build?  Security, trust, reliability, consistency, …

7 Some P2P Applications  Early in the semester: examining apps built over P2P overlay networks  We’ll start with two projects here at Penn  We’d like to talk with you if you’re interested in working or collaborating on these projects!  BRIEF overviews of the issues – more detailed talks later in the semester  Later: P2P games  First: Orchestra – P2P meets data integration…

8 Key Problem: Coordinating Efforts between Collaborators  Today, to collaboratively edit structured data, we centralize  For many applications, this isn’t a good model, e.g.:  Bioinformatics groups have multiple standard schemas and warehouses for genomic information – each group wants to incorporate the info of the others, but have it in their format, with their own unique information preserved, and the ability to override info from elsewhere  Different neuroscientists may have data from measuring electrical activity in the same part of the brain – they may want to share common information but maintain their specific local information; each scientist wants the ability to control when their updates are propagated  Work-in-progress with Nitin Khandelwal; other contributors: Murat Cakir, Charuta Joshi, Ivan Terziev

9 The Orchestra System: Infrastructure for Collaborative Data Sharing  Each participant is a logical peer, with some XML schema that is mapped to at least one other peer’s schema  Schemas’ contents are logically synchronized initially and then on demand  [Figure: Part 1, Part 2, and Part 3, with Schema 1, Schema 2, and Schema 3, linked by mappings between XML schemas. Updates: + XML tree A, - XML tree B. Translated updates from 3: + XML tree A’, - XML tree B’ and + XML tree A’’, - XML tree B’’.]

10 Some Challenges in Orchestra  Mappings  How to express them  Using them to translate updates, queries  Inconsistency  How to represent conflicts  How to resolve them  Update propagation  Consistency with intermittent connectivity  Scaling  To many updates  To many queries  [Slide annotations: Logical & semantics-level; Implementation-level (P2P-based)]

11 Mappings  Some peers may be replicas  Others need mappings, expressed as “views”  Views: functions from one schema to another  Can be inverted (may lose some information)  Can be “chained” when there is no direct connection  (Much research in generating these automatically [DDH00][MB01], …)  Prior work on propagating updates through relational views [BD82][K85][C+96]…  Ensuring the mapping specifies a deterministic, side-effect-free translation  Algorithmically applying the translation  Ongoing work with Nitin Khandelwal:  Extending the model to handle (unordered) XML  Challenge: dealing with XML’s nesting and its repercussions
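The idea of views as functions that can be chained and (lossily) inverted can be sketched with a toy example. The sketch deliberately uses flat records instead of XML, since XML handling is described above as ongoing work. The `Mapping` class, the `rename` helper, and all field names are hypothetical and are not Orchestra's actual mapping language.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Record = Dict[str, str]


@dataclass(frozen=True)
class Mapping:
    """A view between two schemas, with a best-effort inverse."""
    forward: Callable[[Record], Record]
    backward: Callable[[Record], Record]

    def then(self, other: "Mapping") -> "Mapping":
        # Chain two mappings when there is no direct connection.
        return Mapping(
            forward=lambda r: other.forward(self.forward(r)),
            backward=lambda r: self.backward(other.backward(r)),
        )

    def inverted(self) -> "Mapping":
        return Mapping(self.backward, self.forward)


def rename(fields: Dict[str, str]) -> Mapping:
    """Keep only the listed fields, renaming source names to target names.

    Unlisted fields are dropped, which is where information can be lost.
    """
    reverse = {dst: src for src, dst in fields.items()}

    def apply(table: Dict[str, str]) -> Callable[[Record], Record]:
        return lambda r: {table[k]: v for k, v in r.items() if k in table}

    return Mapping(apply(fields), apply(reverse))


# Hypothetical schemas: schema 1 -> schema 2 -> schema 3.
s1_to_s2 = rename({"gene": "gene_id", "seq": "sequence"})
s2_to_s3 = rename({"gene_id": "id"})
s1_to_s3 = s1_to_s2.then(s2_to_s3)
```

Here `s1_to_s3.forward({"gene": "BRCA1", "seq": "ATG", "note": "local"})` yields `{"id": "BRCA1"}`, and mapping that result back with `s1_to_s3.backward` recovers only `{"gene": "BRCA1"}`: the round trip is deterministic and side-effect free, but the dropped fields are gone.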

12 A Globally Consistent Model that Encodes Conflicts  Even in the presence of conflicts, want a “global state” (from perspective of some schema) when we synchronize  Allows us to determine what’s agreed-upon, what’s conflicting  Can define conflict resolution strategies  Goal: “union of all states” with a way of specifying conflicts  Define conditional XML tree based on a subset of c-tables [IM84]  Each peer p_i has a boolean flag P_i representing “perspective i”  [Figure: conditional XML tree with nodes root, auth, Smith (if P_1), and Lee (if P_2)]
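A rough sketch of a conditional tree follows, assuming the slide's example is read as root containing auth, which contains Smith under perspective 1 and Lee under perspective 2. Conditions are simplified here to a set of perspective indices per node, whereas c-tables allow more general conditions. `CNode`, `project`, and `conditional_labels` are names invented for this example.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Tuple


@dataclass
class CNode:
    label: str
    # Perspectives i whose flag P_i makes this node visible;
    # None means the node is unconditional (agreed upon by everyone).
    visible_to: Optional[FrozenSet[int]] = None
    children: List["CNode"] = field(default_factory=list)


def project(node: CNode, perspective: int) -> Optional[Tuple[str, list]]:
    """Return the plain tree seen from one perspective, or None if hidden."""
    if node.visible_to is not None and perspective not in node.visible_to:
        return None
    kids = []
    for child in node.children:
        seen = project(child, perspective)
        if seen is not None:
            kids.append(seen)
    return (node.label, kids)


def conditional_labels(node: CNode) -> List[str]:
    """List, in preorder, the nodes that only some perspectives see."""
    found = [node.label] if node.visible_to is not None else []
    for child in node.children:
        found.extend(conditional_labels(child))
    return found


# Assumed reading of the slide's example tree.
tree = CNode("root", children=[
    CNode("auth", children=[
        CNode("Smith", visible_to=frozenset({1})),
        CNode("Lee", visible_to=frozenset({2})),
    ]),
])
```

Projecting with `project(tree, 1)` gives the tree peer 1 believes in, while `conditional_labels(tree)` lists the candidates a conflict resolution strategy would have to choose among. The single tree thus holds the "union of all states" without forcing an immediate decision.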

13 Propagating Updates with Intermittent Connectivity  How to synchronize among n peers (even assuming the same schema)?  Not all are connected simultaneously  Usual approaches:  Locking (doesn’t scale)  Epidemic algorithms (only eventually consistent)  Approach:  “Shadow instance” of the schema, replicated within the other peers of the network  Everyone syncs with the shadow instance  Benefits: state is deterministic after each sync

14 Scaling, Using P2P Techniques  Update synchronization  Key problem: find values conflicting with “shadow instance”  Partition the “shadow instance” across the network  Query execution  Partition computation across multiple peers (PIER does this)  Query optimization  Optimization breaks the query into sub-problems, uses dynamic programming to build up estimates of the costs of applying operators  Can recast as recursion + memoization  Use P2P overlay to distribute each recursive step  Memoize results at every node  Why is this useful? Suppose 2 peers ask the same query!
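The "recursion + memoization" recasting of query optimization can be sketched locally as follows. This toy join-order search considers left-deep plans only and uses a made-up cost model (the sum of intermediate result sizes). The relation names, cardinalities, and the uniform 1/100 join selectivity are all assumptions for illustration. Distribution over a P2P overlay is not shown; `functools.lru_cache` stands in for the per-node memo table.

```python
from functools import lru_cache
from math import prod
from typing import FrozenSet, Tuple

# Hypothetical statistics: relation name -> cardinality.
CARD = {"R": 1000, "S": 100, "T": 10}
SELECTIVITY_DENOM = 100  # assumed: every join keeps 1/100 of the cross product


def result_size(rels: FrozenSet[str]) -> int:
    """Estimated size of joining all relations in rels (toy model)."""
    return prod(CARD[r] for r in rels) // SELECTIVITY_DENOM ** (len(rels) - 1)


@lru_cache(maxsize=None)
def best_plan(rels: FrozenSet[str]) -> Tuple[int, object]:
    """Return (cost, plan) for the cheapest left-deep join of rels.

    Each recursive call is one sub-problem; the cache plays the role of
    the memo table that dynamic programming would fill bottom-up.
    """
    if len(rels) == 1:
        (only,) = rels
        return 0, only
    best = None
    for last in sorted(rels):            # relation joined in last
        sub_cost, sub_plan = best_plan(rels - {last})
        cost = sub_cost + result_size(rels)
        if best is None or cost < best[0]:
            best = (cost, (sub_plan, last))
    return best


cost, plan = best_plan(frozenset({"R", "S", "T"}))
```

With these numbers the search settles on joining S and T first and R last, at a cost of 110. Asking for the same set of relations a second time is answered from the cache. In a distributed setting one could imagine hashing the set of relations to choose the peer that owns each memo entry, so that two peers issuing the same query share the work, though that step is only an assumption about how the idea might be realized.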

15 Current Status  Have a basic strategy for addressing many of the problems in collaborative data sharing  Initial sketches of the core algorithms  Need to develop them further  … And to implement (and validate) them in a real system!
