Peer-to-Peer Data Management

Slides:



Advertisements
Similar presentations
P2P data retrieval DHT (Distributed Hash Tables) Partially based on Hellerstein’s presentation at VLDB2004.
Advertisements

Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan MIT and Berkeley presented by Daniel Figueiredo Chord: A Scalable Peer-to-peer.
Peer to Peer and Distributed Hash Tables
Peer-to-Peer Systems Chapter 25. What is Peer-to-Peer (P2P)? Napster? Gnutella? Most people think of P2P as music sharing.
Peer-to-Peer (P2P) Distributed Storage 1Dennis Kafura – CS5204 – Operating Systems.
Clayton Sullivan PEER-TO-PEER NETWORKS. INTRODUCTION What is a Peer-To-Peer Network A Peer Application Overlay Network Network Architecture and System.
Node Lookup in Peer-to-Peer Network P2P: Large connection of computers, without central control where typically each node has some information of interest.
The Chord P2P Network Some slides have been borowed from the original presentation by the authors.
Xiaowei Yang CompSci 356: Computer Network Architectures Lecture 22: Overlay Networks Xiaowei Yang
Distributed Hash Tables CPE 401 / 601 Computer Network Systems Modified from Ashwin Bharambe and Robert Morris.
Robert Morris, M. Frans Kaashoek, David Karger, Hari Balakrishnan, Ion Stoica, David Liben-Nowell, Frank Dabek Chord: A scalable peer-to-peer look-up protocol.
Peer-to-Peer Distributed Search. Peer-to-Peer Networks A pure peer-to-peer network is a collection of nodes or peers that: 1.Are autonomous: participants.
Topics in Reliable Distributed Systems Lecture 2, Fall Dr. Idit Keidar.
Introduction to Peer-to-Peer (P2P) Systems Gabi Kliot - Computer Science Department, Technion Concurrent and Distributed Computing Course 28/06/2006 The.
Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek and Hari alakrishnan.
Object Naming & Content based Object Search 2/3/2003.
Chord-over-Chord Overlay Sudhindra Rao Ph.D Qualifier Exam Department of ECECS.
Topics in Reliable Distributed Systems Fall Dr. Idit Keidar.
1 CS 194: Distributed Systems Distributed Hash Tables Scott Shenker and Ion Stoica Computer Science Division Department of Electrical Engineering and Computer.
Peer To Peer Distributed Systems Pete Keleher. Why Distributed Systems? l Aggregate resources! –memory –disk –CPU cycles l Proximity to physical stuff.
P2P Course, Structured systems 1 Introduction (26/10/05)
Introduction to Peer-to-Peer Networks. What is a P2P network Uses the vast resource of the machines at the edge of the Internet to build a network that.
INTRODUCTION TO PEER TO PEER NETWORKS Z.M. Joseph CSE 6392 – DB Exploration Spring 2006 CSE, UT Arlington.
Introduction Widespread unstructured P2P network
Network Layer (3). Node lookup in p2p networks Section in the textbook. In a p2p network, each node may provide some kind of service for other.
Peer-to-Peer Overlay Networks. Outline Overview of P2P overlay networks Applications of overlay networks Classification of overlay networks – Structured.
Introduction to Peer-to-Peer Networks. What is a P2P network A P2P network is a large distributed system. It uses the vast resource of PCs distributed.
Chord & CFS Presenter: Gang ZhouNov. 11th, University of Virginia.
Introduction of P2P systems
Peer to Peer Research survey TingYang Chang. Intro. Of P2P Computers of the system was known as peers which sharing data files with each other. Build.
Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications Xiaozhou Li COS 461: Computer Networks (precept 04/06/12) Princeton University.
Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan MIT and Berkeley presented by Daniel Figueiredo Chord: A Scalable Peer-to-peer.
Node Lookup in P2P Networks. Node lookup in p2p networks In a p2p network, each node may provide some kind of service for other nodes and also will ask.
Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications.
SIGCOMM 2001 Lecture slides by Dr. Yingwu Zhu Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications.
Lecture 2 Distributed Hash Table
1 Peer-to-Peer Technologies Seminar by: Kunal Goswami (05IT6006) School of Information Technology Guided by: Prof. C.R.Mandal, School of Information Technology.
Peer to Peer A Survey and comparison of peer-to-peer overlay network schemes And so on… Chulhyun Park
1. Efficient Peer-to-Peer Lookup Based on a Distributed Trie 2. Complex Queries in DHT-based Peer-to-Peer Networks Lintao Liu 5/21/2002.
Algorithms and Techniques in Structured Scalable Peer-to-Peer Networks
INTERNET TECHNOLOGIES Week 10 Peer to Peer Paradigm 1.
CS 347Notes081 CS 347: Parallel and Distributed Data Management Notes 08: P2P Systems.
P2P Search COP6731 Advanced Database Systems. P2P Computing  Powerful personal computer Share computing resources P2P Computing  Advantages: Shared.
P2P Search COP P2P Search Techniques Centralized P2P systems  e.g. Napster, Decentralized & unstructured P2P systems  e.g. Gnutella.
CS 425 / ECE 428 Distributed Systems Fall 2015 Indranil Gupta (Indy) Peer-to-peer Systems All slides © IG.
Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications * CS587x Lecture Department of Computer Science Iowa State University *I. Stoica,
CS Spring 2010 CS 414 – Multimedia Systems Design Lecture 24 – Introduction to Peer-to-Peer (P2P) Systems Klara Nahrstedt (presented by Long Vu)
Composing Web Services and P2P Infrastructure. PRESENTATION FLOW Related Works Paper Idea Our Project Infrastructure.
A Survey of Peer-to-Peer Content Distribution Technologies Stephanos Androutsellis-Theotokis and Diomidis Spinellis ACM Computing Surveys, December 2004.
CSE 486/586 Distributed Systems Distributed Hash Tables
Distributed Hash Tables
Improving and Generalizing Chord
CHAPTER 3 Architectures for Distributed Systems
PROGRAM STUDI TEKNIK INFORMATIKA FAKULTAS ILMU KOMPUTER
EE 122: Peer-to-Peer (P2P) Networks
DHT Routing Geometries and Chord
A Scalable content-addressable network
Prof. Leonardo Mostarda University of Camerino
Peer-to-Peer (P2P) File Systems
Building Peer-to-Peer Systems with Chord, a Distributed Lookup Service
P2P Systems and Distributed Hash Tables
Peer-to-Peer Storage Systems
Distributed Hash Tables
A Semantic Peer-to-Peer Overlay for Web Services Discovery
MIT LCS Proceedings of the 2001 ACM SIGCOMM Conference
Consistent Hashing and Distributed Hash Table
P2P: Distributed Hash Tables
CSE 486/586 Distributed Systems Distributed Hash Tables
A Scalable Peer-to-peer Lookup Service for Internet Applications
#02 Peer to Peer Networking
Presentation transcript:

Peer-to-Peer Data Management COMP3211 Advanced Databases Dr Nicholas Gibbins – nmg@ecs.soton.ac.uk 2014-2015

Peer-to-Peer Characteristics Autonomy Query Expressiveness Efficiency Quality of Service Fault-Tolerance Security

Peer-to-Peer Characteristics Autonomy Query Expressiveness Efficiency Quality of Service Fault-Tolerance Security Peers may join or leave at any time without restriction Peers control the data they store Peers control which other peers store their data

Peer-to-Peer Characteristics Autonomy Query Expressiveness Efficiency Quality of Service Fault-Tolerance Security User may describe desired data at appropriate level of detail Key lookup Keyword search with ranking Structured queries

Peer-to-Peer Characteristics Autonomy Query Expressiveness Efficiency Quality of Service Fault-Tolerance Security Efficient use of resources Lower cost Higher throughput

Peer-to-Peer Characteristics Autonomy Query Expressiveness Efficiency Quality of Service Fault-Tolerance Security User-perceived: Completeness of results Data consistency Data availability Query response time

Peer-to-Peer Characteristics Autonomy Query Expressiveness Efficiency Quality of Service Fault-Tolerance Security Efficiency and quality maintained despite peer failures Requires replication

Peer-to-Peer Characteristics Autonomy Query Expressiveness Efficiency Quality of Service Fault-Tolerance Security Cannot rely on trusted servers! Access controls

Reference Peer-to-Peer Architecture Query and Management Interface Query Manager Network Interface Peer Local Data Peer

Data Management Issues Data Location: peers must be able to refer to and locate data stored by other peers Query Processing: given a query, discover peers that contribute relevant data and efficiently execute the query Data Integration: access data despite heterogeneous schemas Data Consistency: maintain consistency on duplicate data when replicating or caching

Overlay Networks Peers connect to each other via an overlay network Usually has different topology to underlying physical network Concentrate on optimising communication over overlay network Two common distinctions: Pure versus hybrid Structured versus unstructured

Pure versus Hybrid Networks Pure Overlay Network No differentiation between peers – all are considered equal Hybrid Overlay Network Some peers are given special tasks to perform Also referred to as super-peer systems

Structured versus Unstructured Networks Unstructured Overlay Network Peers may communicate directly with any of their neighbours Peers may join network by attaching to any other peer Structured Overlay Network Tightly controlled topology and message routing Peers may only communicate with certain other peers Peers may only join the network in certain places

Unstructured Peer-to-Peer Networks Overlay network constructed in an ad hoc manner Data placement is unrelated to overlay topology Examples: Gnutella, Freenet, BitTorrent, Kazaa Fundamental issue of index management: Which peers hold which resources? Centralised versus distributed

Centralised Index Management: Napster Find resource R Who has resource R? Give me resource R Peer-23 has resource R Here is resource R Index server R Peer-23

Centralised Index Management P2P networks with centralised indexes are hybrid networks Specialised peer for handling index Not all peers are equal Centralised indexes are problematic Central point of failure (kill the index, kill the network) Bottleneck (limits throughput)

Distributed Index Management: Gnutella Find resource R Who has resource R? Who has resource R? Who has resource R? Here is resource R Who has resource R? Who has resource R? Flooding of messages R Who has resource R?

Distributed Index Management Search neighbour using flooding or gossiping Flooding Each peer sends the request to all of its neighbours Not scalable – heavy demands on network resources Time-to-live (maximum number of hops) on messages Gossiping Each peer knows all other peers in network Chooses one peer to pass on request Request eventually propagated to all nodes

Super-Peer Networks “Normal” peers index their content at super-peers Super-peers conduct neighbourhood search Single super-peer (original Napster, more or less)

Super-Peer Networks

Structured Peer-to-Peer Networks Address scalability shortcomings of unstructured networks Distributed Hash Tables are most popular approach Range of applications: Peer-to-peer file sharing Content distribution networks Distributed file systems

Distributed Hash Tables Provides a lookup service similar to a hash table put(key, resource) get(key) → resource Hash of resource key used to locate peer holding that resource Routing protocol used to locate target peer (requires that peers share routing information between themselves) Several approaches, based on different routing geometries: Ring, hypercube, tree

Chord: A ring geometry DHT P1 Peers and keys are given m-bit hash identifiers Peers and keys are then arranged into an identifier circle with at most 2m-1 nodes Keys are stored at the first peer whose identifier is greater than the key’s identifier P56 K54 K54 P8 55 7 P51 K10 K10 P48 m=6 id: 0..63 P14 47 15 P42 P21 39 23 K38 K38 P38 K24 31 P32 K30 K24 K30

Chord: A ring geometry DHT P1 Each peer has a successor and predecessor Successor chain used to find peer holding resource X.get(k) { peer = find-successor(k) resource = peer.lookup(k) return resource } X.find-successor(k) { if H(k) in (H(k), H(succ)] return succ else return succ.find-successor(k) } P56.succ P8.find-successor(K54) P8.get(K54) P56 P1.succ K54 P8 P51.succ 55 7 P51 P8.succ P48.succ K10 P48 m=6 id: 0..63 P14 47 P42.succ P14.succ 15 P42 P38.succ P21.succ P21 39 23 P32.succ K38 P38 31 P32 K24 K30

Chord: A ring geometry DHT P1 Following successors gives linear search Improve using a finger table: Contains up to m entries ith entry of peer n contains succ((n+2i-1) mod 2m) P56 K54 P8 55 7 P51 K10 P48 m=6 id: 0..63 P14 47 15 P8+1 P14 P8+2 P8+4 P8+8 P21 P8+16 P32 P8+32 P42 P42 P21 39 23 K38 P38 31 P32 K24 K30

Join Queries in DHTs Assume a binary join Q = R ⨝ S Tuples of R and S are distributed across peers in system HR, HS are the sets of peers storing tuples of R, S Hres is the set of peers storing the output relation Example algorithm: PIERjoin Three phases: Multicast, Hash, Probe/Join

PIERjoin Multicast Phase Hash Phase Probe/Join Multicast Q to all peers in HR, HS Hash Phase Each peer receiving Q scans its local tuples Any tuples that match the predicate are sent to peers in Hres using put() Probe/Join Each peer in Hres inserts the new tuples into its local tuples, probes the other members to find matching tuples and constructs the joined tuples Similar to Parallel Hash Join

Comparison of Peer-to-Peer System Requirement Unstructured Super-Peer Structured Autonomy Low Moderate Query Expressivity High Efficiency Quality of Service Fault-Tolerance Security