On Improving the Performance Dependability of Unstructured P2P Systems via Replication ANIRBAN MONDAL YI LIFU MASARU KITSUREGAWA Institute of Industrial.

On Improving the Performance Dependability of Unstructured P2P Systems via Replication ANIRBAN MONDAL YI LIFU MASARU KITSUREGAWA Institute of Industrial Science, University of Tokyo.

PRESENTATION OUTLINE
- Introduction
- Related Work
- System Overview
- Proposed Replication Scheme
- Performance Evaluation
- Conclusion and Future Work

INTRODUCTION
- P2P systems are becoming increasingly popular.
- A dependable P2P system is the need of the hour.
- Two perspectives of dependability:
  - System reliability: the availability of the individual peers
  - System performance: data availability
- We define a performance-dependable P2P system as one that users can rely on to obtain the data files of their interest in real time.
- We focus on improving the performance-dependability of unstructured P2P systems via dynamic replication.

Motivation
- Free-riders: a majority of the peers typically download data from a small percentage of peers that offer data.
- High skews in the initial data distribution.
- A disproportionately high number of queries need to be answered by a few ‘hot’ peers.
- Severe load imbalance throughout the system.
- Job queues of the ‘hot’ peers keep increasing.
- Increased waiting times lead to high response times.
- This decreases the dependability of the system.

The Challenges
- Sheer size of P2P networks
- Heterogeneity
  - CPU capacity
  - Available disk space
  - Transfer rate of connections
- Dynamism of the environment
  - Peers joining / leaving the system
  - Hot data becoming cold and vice versa

MAIN CONTRIBUTIONS
- A dynamic data placement strategy involving data replication
  - Objective: to reduce the loads of the overloaded peers
- A dynamic query redirection technique
  - Objective: to reduce response times

RELATED WORK
- Broadcast (Gnutella)
- Centralized (Napster)
- Routing indices [Crespo2002]
- Distributed hash tables
  - Chord [Stoica2001]
  - Pastry [Rowstron2001]

RELATED WORK (CONT.)
- [Kangasharju2002] investigates optimal replication of content in P2P systems
  - Adaptive, fully distributed algorithm that dynamically replicates content in a near-optimal manner
- [Cohen2002, Lv2002] facilitate search via replication.
- Dependability via load-balancing in structured P2P systems (using DHTs): [Dabek2001], [Rao2003]
- [Triantafillou2003]
  - Divides the system into clusters based on semantic categories
  - Discusses dependability via inter-cluster and intra-cluster load-balancing

How does this proposal differ from our previous spatial GRID proposal?

Our GRID-related work
- Imposes structure on the system
- Data movement in the KB range
- Data scattering is avoided
- Individual nodes are usually dedicated and expected to be available most of the time.
- Main aim is load-balancing

This proposal
- No structure imposed
- Data movement in the MB/GB range
- Data scattering is acceptable
- Individual nodes may join/leave at any time.
- Main aim is replication, not load-balancing

SYSTEM OVERVIEW
- Each peer is assigned a globally unique identifier PID.
- Broadcast-based search.
- Every peer maintains its own access statistics:
  - Number of accesses made to each of its data files
  - List of peers which have downloaded each of its files
- Given that very ‘hot’ files may be aggressively downloaded by hundreds of peers very quickly, a peer keeps track only of those peers which have downloaded directly from itself.
- Every peer provides a certain amount, Space, of its disk space for replication.
  - An LRU scheme is deployed for Space.
  - Periodic deletion of unused replicas
- We sacrifice replica consistency for improving query response times.
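The slide states only that an LRU scheme manages the replica area Space; it gives no implementation details. As a minimal sketch of how such an area might behave, the class below evicts the least recently used replicas until a new one fits. The class name `ReplicaSpace`, its methods, and the byte-based accounting are illustrative assumptions, not taken from the presentation, and the periodic deletion of unused replicas is not modelled.

```python
from collections import OrderedDict


class ReplicaSpace:
    """Hypothetical fixed-size replica area with LRU eviction."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        # Maps file id -> size in bytes; least recently used entry comes first.
        self._replicas = OrderedDict()

    def access(self, file_id):
        """Record an access. Returns True if the replica is held locally."""
        if file_id not in self._replicas:
            return False
        self._replicas.move_to_end(file_id)  # now the most recently used
        return True

    def store(self, file_id, size_bytes):
        """Store a replica and return the ids of any replicas evicted for it."""
        if size_bytes > self.capacity_bytes:
            raise ValueError("replica is larger than the whole replica space")
        if file_id in self._replicas:
            # Re-storing an existing replica: release its old size first.
            self.used_bytes -= self._replicas.pop(file_id)
        evicted = []
        while self.used_bytes + size_bytes > self.capacity_bytes:
            victim, victim_size = self._replicas.popitem(last=False)
            self.used_bytes -= victim_size
            evicted.append(victim)
        self._replicas[file_id] = size_bytes
        self.used_bytes += size_bytes
        return evicted
```

A peer would call `access` whenever a query is served from a replica and `store` whenever it accepts a new replica from a ‘hot’ peer.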

SYSTEM OVERVIEW (CONT.)
- Distance between two peers: communication time between them
- Two peers are regarded as neighbours if they are directly connected to each other.
- Periodic exchange of status messages between neighbours:
  - Load information
  - Available disk space information

SYSTEM OVERVIEW (CONT.)
- Load of a peer: number of queries waiting in the peer’s job queue
- Load is normalized w.r.t. CPU capacity.
- Assumptions:
  - Peers know the transfer rates between themselves and other peers.
  - Every peer knows the availability information of its neighbouring peers.

Replication Scheme
- Each peer P periodically checks its neighbours’ loads.
- If P’s load exceeds the average load of its neighbouring peers by 10%, replication is initiated.
- Selection of hot data files:
  - Using recent access statistics, P sorts its files in descending order of access frequency.
  - P traverses this sorted list and selects as ‘hot’ files the top N files whose access frequency exceeds a pre-defined threshold Tfreq.
- Number of replicas:
  - For every Nd accesses to a data file D, a new replica is created for D.
- Tfreq and Nd are pre-specified at design time.
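The rules on this slide can be sketched in a few lines. The function names and the `margin`, `t_freq`, `top_n`, and `n_d` parameters below are illustrative names for the 10% margin, Tfreq, N, and Nd. Reading "exceeds by 10%" as "greater than 1.1 times the neighbours' average" is an assumption, as is counting replicas by integer division.

```python
def should_replicate(own_load, neighbour_loads, margin=0.10):
    """True if own_load exceeds the neighbours' average load by more than margin."""
    if not neighbour_loads:
        return False  # no neighbours to compare against
    average = sum(neighbour_loads) / len(neighbour_loads)
    return own_load > average * (1 + margin)


def select_hot_files(access_counts, t_freq, top_n):
    """Return up to top_n file ids whose access count exceeds t_freq, hottest first."""
    ranked = sorted(access_counts.items(), key=lambda item: item[1], reverse=True)
    return [file_id for file_id, count in ranked if count > t_freq][:top_n]


def replicas_due(access_count, n_d):
    """Number of replicas warranted so far: one for every n_d accesses."""
    return access_count // n_d


if __name__ == "__main__":
    counts = {"a": 50, "b": 5, "c": 30, "d": 20}
    if should_replicate(own_load=12, neighbour_loads=[10, 10]):
        for file_id in select_hot_files(counts, t_freq=10, top_n=2):
            print(file_id, replicas_due(counts[file_id], n_d=10))
```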

Criteria for selection of the destination peer Dest for replication
- Dest should have a high probability of being online.
- Dest should have adequate available disk space.
- Load difference with Dest should be significant.
- Transfer time TRep with Dest should be minimized.
- Dest should be chosen from the peers which have already downloaded that data file. This makes TRep effectively equal to 0.

Replication Strategy
- For each ‘hot’ data file D, the ‘hot’ peer PHot sends a message to each peer which has downloaded D.
- The peers in which a copy of D exists reply to PHot with their respective load and available disk space.
- Only the peers with high availability and sufficient available disk space are candidates.
- Among these candidate peers, PHot first puts the peer MIN with the lowest load into a set Candidate.
- Peers whose normalized load difference with MIN is less than δ are also put into Candidate (δ is a small integer).
- The peer in Candidate whose available disk space is maximum is selected as the destination peer.

Algorithm for selecting the destination peer
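The authors' algorithm listing is not reproduced in this transcript. The following is only a minimal sketch of the selection steps described on the Replication Strategy slide. The `PeerStatus` record, the `min_availability` cut-off (standing in for "high availability"), and the `file_size` check (standing in for "sufficient available disk space") are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class PeerStatus:
    pid: str
    load: float          # normalized load reported by the peer
    free_space: int      # available disk space, in bytes
    availability: float  # assumed estimate of the probability of being online


def select_destination(replies, file_size, min_availability, delta):
    """Pick the destination peer among repliers that already hold a copy of D."""
    # Keep only peers with high availability and enough available disk space.
    eligible = [p for p in replies
                if p.availability >= min_availability and p.free_space >= file_size]
    if not eligible:
        return None
    # MIN is the eligible peer with the lowest load.
    min_peer = min(eligible, key=lambda p: p.load)
    # Candidate holds MIN plus peers whose load is within delta of MIN's load.
    candidate = [p for p in eligible
                 if p is min_peer or p.load - min_peer.load < delta]
    # Among near-equally loaded peers, prefer the one with the most free space.
    return max(candidate, key=lambda p: p.free_space)
```

Grouping peers whose loads differ by less than δ treats them as equally loaded, so available disk space breaks the tie instead of an insignificant load difference.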

Query Redirection to replicas
- What happens when a peer PIssue issues a query Q for a data item D to a ‘hot’ peer PHot?
- PHot needs to redirect Q to a peer REDIRECT containing D’s replica, if any such replica exists.
- Objective: to minimize Q’s response time
- PHot checks the list of peers having D’s replica.
- Selection criteria for query redirection:
  - REDIRECT should be highly available.
  - Load difference between PHot and REDIRECT should be significant.
  - Transfer time between REDIRECT and PIssue should be low.

Query Redirection (Cont.)
- The ‘hot’ peer PHot first selects the set of peers which contain a replica of the data file D and whose load difference with itself exceeds TDiff.
- TDiff is a parameter which is application-dependent and subjective.
- Among these selected peers, the peer with the maximum transfer rate with the query-issuing peer PIssue is selected for query redirection.

Query redirection algorithm
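The authors' algorithm listing is not reproduced in this transcript. As a rough sketch of the two steps described on the preceding slide, the function below filters replica holders by load difference and then picks the one with the highest transfer rate to PIssue. The `ReplicaHolder` record, the `transfer_rate_to_issuer` mapping, and the optional `min_availability` filter are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class ReplicaHolder:
    pid: str
    load: float          # normalized load of the peer holding a replica of D
    availability: float  # assumed estimate of the probability of being online


def select_redirect(hot_load, holders, transfer_rate_to_issuer, t_diff,
                    min_availability=0.0):
    """Return the peer to redirect a query to, or None if PHot should answer it."""
    # Step 1: keep replica holders whose load is lower than PHot's by more than t_diff.
    eligible = [h for h in holders
                if h.availability >= min_availability
                and hot_load - h.load > t_diff]
    if not eligible:
        return None
    # Step 2: pick the peer with the maximum transfer rate to the issuing peer.
    return max(eligible, key=lambda h: transfer_rate_to_issuer[h.pid])
```

Returning `None` when no holder qualifies models the case in which PHot serves the query itself; the slides do not say what happens then, so this fallback is an assumption.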

Performance Evaluation
- Investigates the following:
  - Effect of variations in workload skew
  - Effect of variations in number of peers
- Performance metric:
  - Average response time

PARAMETERS USED IN PERFORMANCE EVALUATION

Average Response times at the hot nodes

Snapshot of Load distribution

Effect of varying the Workload Skew

Effect of varying the number of peers

CONCLUSION AND FUTURE WORK
- We have proposed a strategy for enhancing the dependability of P2P systems via dynamic replication.
- Our strategy takes free-riders into account.
- Our performance evaluation demonstrates the effectiveness of our replication-based strategy.
- Future scope of work:
  - Dealing with very large data items, e.g., video files
  - Cost-effective integration into existing P2P systems
  - Load-balancing