Introduction Zachary G. Ives University of Pennsylvania CIS 700 – Internet-Scale Distributed Computing January 13, 2004.


2 Welcome!
- To the initial version of the Penn Systems Seminar
- First of an ongoing series, focusing on systems research topics of general interest
- Format: reading and discussion (no homework or exams)
- Independent Study encouraged to supplement the seminar
- Our focus: P2P and distributed ad hoc systems

3 What Is the Vision of Peer-to-Peer Computing?
Loose coupling, auto configuration:
- No central administration
- Scalability
- Adaptability/resiliency
- Nodes contribute as well as consume resources
- System continues as peers join and leave

4 How Does P2P Work?
- P2P infrastructure forms an overlay network over the real Internet, which supports:
  - Schemes for distributing resources (data, computation) without a directory structure
    - Unstructured: query by flooding or over advertisements
    - Structured: query according to an algorithm that organizes the peers into a consistent structure (hash table, tree, …)
  - Graceful handling of loss or gain of nodes
  - Replication “where appropriate”
    - Provides reliability/availability
    - Improves performance (self-tuning)
- More on this later, from Honghui
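To make the "structured" case concrete, here is a minimal Python sketch of one common way to organize peers into a consistent structure: a hash ring in which each key belongs to the first peer at or after the key's hash position. This is an illustration only, not something the slides specify; the class `Ring`, the helper `ring_position`, and the peer names are all hypothetical.

```python
import hashlib
from bisect import bisect_left, insort


def ring_position(name: str, bits: int = 32) -> int:
    """Hash a peer name or data key onto a ring of size 2**bits."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class Ring:
    """Hypothetical structured overlay: peers sorted by hash position."""

    def __init__(self, peers):
        self._points = sorted((ring_position(p), p) for p in peers)

    def owner(self, key: str) -> str:
        """Return the first peer clockwise from the key's position."""
        positions = [pos for pos, _ in self._points]
        i = bisect_left(positions, ring_position(key))
        return self._points[i % len(self._points)][1]  # wrap around the ring

    def join(self, peer: str) -> None:
        insort(self._points, (ring_position(peer), peer))

    def leave(self, peer: str) -> None:
        self._points.remove((ring_position(peer), peer))


ring = Ring(["peer-a", "peer-b", "peer-c", "peer-d"])
keys = [f"key-{i}" for i in range(200)]
before = {k: ring.owner(k) for k in keys}
ring.leave("peer-b")
after_leave = {k: ring.owner(k) for k in keys}
```

In this sketch, a departing peer only affects the keys it owned; every other key keeps its owner, which is one way to get the "graceful handling of loss or gain of nodes" the slide asks for.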

5 The Promise of P2P
- Major challenge for applications is generally scalability
- Traditional systems definition:
  - Scalability of systems to numbers of requests, clients, etc.
- But we need “human” scalability as well:
  - Avoid human administration, tuning, oversight, custom code
    - Self-administering; auto-tuning
    - Providing the “right” abstractions
  - Human contributors often create heterogeneity among components, data, participation levels, etc.
- Aspects of P2P should help with all of these

6 The Central Questions: Goals of this Seminar
1. “What is the killer app for a P2P substrate?”
   - Is there more to this P2P idea than pirating music and searching for little green men (and women)?
   - What applications can benefit from P2P-like techniques?
   - What are their key properties?
2. What programming models are most appropriate for building such applications?
3. How can P2P techniques be improved to better support the applications we want to build?
   - Security, trust, reliability, consistency, …

7 Some P2P Applications
- Early in the semester: examining apps built over P2P overlay networks
  - We’ll start with two projects here at Penn
  - We’d like to talk with you if you’re interested in working or collaborating on these projects!
  - BRIEF overviews of the issues; more detailed talks later in the semester
- Later: P2P games
- First: Orchestra, where P2P meets data integration…

8 Key Problem: Coordinating Efforts between Collaborators
- Today, to collaboratively edit structured data, we centralize
- For many applications, this isn’t a good model, e.g.:
  - Bioinformatics groups have multiple standard schemas and warehouses for genomic information. Each group wants to incorporate the info of the others, but have it in their own format, with their own unique information preserved, and with the ability to override info from elsewhere
  - Different neuroscientists may have data from measuring electrical activity in the same part of the brain. They may want to share common information but maintain their specific local information; each scientist wants the ability to control when their updates are propagated
Work-in-progress with Nitin Khandelwal; other contributors: Murat Cakir, Charuta Joshi, Ivan Terziev

9 The Orchestra System: Infrastructure for Collaborative Data Sharing
- Each participant is a logical peer, with some XML schema that is mapped to at least one other peer’s schema
- Schemas’ contents are logically synchronized initially and then on demand
[Figure: Part 1, Part 2 and Part 3, with Schema 1, Schema 2 and Schema 3, connected by mappings between XML schemas. Labels: "Updates: + XML tree A, - XML tree B"; "Translated updates from 3: + XML tree A’, - XML tree B’"; "Translated updates from 3: + XML tree A’’, - XML tree B’’".]

10 Some Challenges in Orchestra
- Mappings
  - How to express them
  - Using them to translate updates, queries
- Inconsistency
  - How to represent conflicts
  - How to resolve them
- Update propagation
  - Consistency with intermittent connectivity
- Scaling
  - To many updates
  - To many queries
[Slide side labels: "Logical & semantics-level"; "Implementation-level (P2P-based)"]

11 Mappings
- Some peers may be replicas
- Others need mappings, expressed as “views”
  - Views: functions from one schema to another
  - Can be inverted (may lose some information)
  - Can be “chained” when there is no direct connection
  - (Much research in generating these automatically [DDH00][MB01], …)
- Prior work on propagating updates through relational views [BD82][K85][C+96]…
  - Ensuring the mapping specifies a deterministic, side-effect-free translation
  - Algorithmically applying the translation
- Ongoing work with Nitin Khandelwal:
  - Extending the model to handle (unordered) XML
  - Challenge: dealing with XML’s nesting and its repercussions
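The idea of views as functions that can be chained and (lossily) inverted can be sketched in a few lines of Python. Everything here is a made-up illustration, not Orchestra's mapping language: the three toy schemas, their field names, and the helper `compose` are assumptions, and records are plain dictionaries rather than XML.

```python
from functools import reduce


def compose(*views):
    """Chain views left to right, for peers with no direct mapping."""
    def chained(record):
        return reduce(lambda acc, view: view(acc), views, record)
    return chained


# Hypothetical schemas: Schema 1 {first, last, gene, lab_note},
# Schema 2 {name, gene}, Schema 3 {author, locus}.
def schema1_to_schema2(rec):
    # Deterministic and side-effect free; drops the local-only lab_note.
    return {"name": f"{rec['first']} {rec['last']}", "gene": rec["gene"]}


def schema2_to_schema3(rec):
    return {"author": rec["name"].upper(), "locus": rec["gene"]}


def schema2_to_schema1(rec):
    # Inverse view: lab_note cannot be recovered, so information is lost.
    first, _, last = rec["name"].partition(" ")
    return {"first": first, "last": last, "gene": rec["gene"]}


schema1_to_schema3 = compose(schema1_to_schema2, schema2_to_schema3)

original = {"first": "Ada", "last": "Lovelace", "gene": "BRCA1",
            "lab_note": "local only"}
translated = schema1_to_schema3(original)
round_trip = schema2_to_schema1(schema1_to_schema2(original))
```

Here `round_trip` matches `original` except for the missing `lab_note`, which is the kind of information loss the slide warns about when inverting a view.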

12 A Globally Consistent Model that Encodes Conflicts
- Even in the presence of conflicts, want a “global state” (from perspective of some schema) when we synchronize
  - Allows us to determine what’s agreed-upon, what’s conflicting
  - Can define conflict resolution strategies
- Goal: “union of all states” with a way of specifying conflicts
  - Define conditional XML tree based on a subset of c-tables [IM84]
  - Each peer p_i has a boolean flag P_i representing “perspective i”
[Figure: conditional XML tree with nodes root, auth, Smith, Lee and condition labels "If P_1" and "If P_2".]
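A rough Python sketch of a conditional tree follows, assuming that in the slide's figure "Smith" is present under P_1 and "Lee" under P_2 (the flattened transcript does not state which label goes with which node). Conditions are simplified to a set of perspective indices instead of the boolean formulas that c-tables allow; `CNode`, `project`, and `conditional_nodes` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional


@dataclass
class CNode:
    label: str
    # None means unconditional (agreed upon by every perspective);
    # otherwise the node exists only for the listed perspectives.
    perspectives: Optional[FrozenSet[int]] = None
    children: List["CNode"] = field(default_factory=list)


def project(node: CNode, i: int):
    """Return the plain tree seen from perspective i, or None if absent."""
    if node.perspectives is not None and i not in node.perspectives:
        return None
    kids = []
    for child in node.children:
        projected = project(child, i)
        if projected is not None:
            kids.append(projected)
    return (node.label, kids)


def conditional_nodes(node: CNode):
    """List (label, perspectives) for every node that carries a condition."""
    found = []
    if node.perspectives is not None:
        found.append((node.label, sorted(node.perspectives)))
    for child in node.children:
        found.extend(conditional_nodes(child))
    return found


tree = CNode("root", children=[
    CNode("auth", children=[
        CNode("Smith", perspectives=frozenset({1})),
        CNode("Lee", perspectives=frozenset({2})),
    ]),
])
```

Unconditional nodes are the agreed-upon part of the "union of all states"; the conditional ones are where a conflict resolution strategy would have to choose.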

13 Propagating Updates with Intermittent Connectivity
- How to synchronize among n peers (even assuming the same schema)?
  - Not all are connected simultaneously
- Usual approaches:
  - Locking (doesn’t scale)
  - Epidemic algorithms (only eventually consistent)
- Approach:
  - “Shadow instance” of the schema, replicated within the other peers of the network
  - Everyone syncs with the shadow instance
  - Benefits: state is deterministic after each sync
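The slide gives only the outline of the shadow-instance approach, so the Python sketch below fills in details that are purely assumptions: per-key version numbers, first-writer-wins acceptance, and a single in-memory `ShadowInstance` object instead of one replicated across peers. It is meant only to show why the state is deterministic after each sync.

```python
class ShadowInstance:
    """Hypothetical shadow copy: key -> (version, value)."""

    def __init__(self):
        self.state = {}

    def sync(self, base_versions, pending):
        """Apply a peer's pending writes; return (state copy, conflicting keys)."""
        conflicts = []
        for key in sorted(pending):  # fixed order keeps the outcome deterministic
            current_version = self.state.get(key, (0, None))[0]
            if base_versions.get(key, 0) == current_version:
                # Peer saw the latest version, so its write is accepted.
                self.state[key] = (current_version + 1, pending[key])
            else:
                # Someone else synced first; report instead of overwriting.
                conflicts.append(key)
        return dict(self.state), conflicts


class Peer:
    def __init__(self, name):
        self.name = name
        self.snapshot = {}  # shadow state as of this peer's last sync
        self.pending = {}   # local writes made while disconnected

    def write(self, key, value):
        self.pending[key] = value

    def sync(self, shadow):
        base = {key: version for key, (version, _) in self.snapshot.items()}
        self.snapshot, conflicts = shadow.sync(base, self.pending)
        self.pending = {}  # conflicting writes are handed back via `conflicts`
        return conflicts


shadow = ShadowInstance()
alice, bob = Peer("alice"), Peer("bob")
alice.write("gene", "X")
bob.write("gene", "Y")
alice_conflicts = alice.sync(shadow)
bob_conflicts = bob.sync(shadow)
```

After both syncs, `bob.snapshot` equals the shadow's state, and the conflicting key is reported to Bob for resolution.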

14 Scaling, Using P2P Techniques
- Update synchronization
  - Key problem: find values conflicting with “shadow instance”
  - Partition the “shadow instance” across the network
- Query execution
  - Partition computation across multiple peers (PIER does this)
- Query optimization
  - Optimization breaks the query into sub-problems, uses dynamic programming to build up estimates of the costs of applying operators
  - Can recast as recursion + memoization
  - Use P2P overlay to distribute each recursive step
  - Memoize results at every node
  - Why is this useful? Suppose 2 peers ask the same query!
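The "recursion + memoization" view of query optimization can be sketched as a join-order search in Python. The cost model is a toy assumption (cost of a join is its estimated output size, with a fixed 1/100 selectivity per join), the relation names and cardinalities are invented, and a local `lru_cache` stands in for the memo tables that the slide proposes to spread across the overlay.

```python
from functools import lru_cache
from itertools import combinations

# Hypothetical base-relation cardinalities and selectivity (1/100 per join).
CARD = {"R": 1000, "S": 100, "T": 10}
SELECTIVITY_DENOM = 100


def size(rels: frozenset) -> int:
    """Estimated number of rows after joining all relations in `rels`."""
    rows = 1
    for rel in rels:
        rows *= CARD[rel]
    return rows // SELECTIVITY_DENOM ** (len(rels) - 1)


@lru_cache(maxsize=None)
def best_plan(rels: frozenset):
    """Return (cost, plan) for joining `rels`; each subset is solved once."""
    if len(rels) == 1:
        (rel,) = rels
        return 0, rel
    best = None
    members = sorted(rels)
    for k in range(1, len(members)):
        for left in combinations(members, k):
            left_set = frozenset(left)
            left_cost, left_plan = best_plan(left_set)            # recursive step
            right_cost, right_plan = best_plan(rels - left_set)   # recursive step
            cost = left_cost + right_cost + size(rels)
            if best is None or cost < best[0]:
                best = (cost, (left_plan, right_plan))
    return best


cost, plan = best_plan(frozenset({"R", "S", "T"}))
hits_after_first = best_plan.cache_info().hits
best_plan(frozenset({"R", "S", "T"}))  # a second peer asking the same query
hits_after_second = best_plan.cache_info().hits
```

The second call is answered straight from the memo table, which is the point of the slide's last bullet: when two peers ask the same query, the shared sub-problems have already been solved.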

15 Current Status
- Have a basic strategy for addressing many of the problems in collaborative data sharing
- Initial sketches of the core algorithms
  - Need to develop them further
  - … And to implement (and validate) them in a real system!