1
On Improving the Performance Dependability of Unstructured P2P Systems via Replication
Anirban Mondal, Yi Lifu, Masaru Kitsuregawa
Institute of Industrial Science, University of Tokyo
E-mail: anirban@tkl.iis.u-tokyo.ac.jp
2
PRESENTATION OUTLINE
Introduction
Related Work
System Overview
Proposed Replication Scheme
Performance Evaluation
Conclusion and Future Work
3
INTRODUCTION
P2P systems are becoming increasingly popular.
A dependable P2P system is the need of the hour.
Two perspectives of dependability: system reliability (the availability of the individual peers) and system performance (data availability).
We define a performance-dependable P2P system as one that users can rely on for obtaining data files of their interest in real-time.
We focus on improving the performance-dependability of unstructured P2P systems via dynamic replication.
5
Motivation
Free-riders: a majority of the peers typically download data from a small percentage of peers that offer data.
High skews in the initial data distribution: a disproportionately high number of queries need to be answered by a few 'hot' peers.
Severe load imbalance throughout the system: the job queues of the 'hot' peers keep increasing.
Increased waiting times lead to high response times.
This decreases the dependability of the system.
6
The Challenges
Sheer size of P2P networks.
Heterogeneity: CPU capacity, available disk space, transfer rate of connections.
Dynamism of the environment: peers joining/leaving the system, and hot data becoming cold and vice versa.
7
MAIN CONTRIBUTIONS
A dynamic data placement strategy involving data replication. Objective: to reduce the loads of the overloaded peers.
A dynamic query redirection technique. Objective: to reduce response times.
8
PRESENTATION OUTLINE
Introduction
Related Work
System Overview
Proposed Replication Scheme
Performance Evaluation
Conclusion and Future Work
9
RELATED WORK
Broadcast (Gnutella)
Centralized (Napster)
Routing indices [Crespo2002]
Distributed hash tables: Chord [Stoica2001], Pastry [Rowstron2001]
10
RELATED WORK (CONT.)
[Kangasharju2002] investigates optimal replication of content in P2P systems: an adaptive, fully distributed algorithm that dynamically replicates content in a near-optimal manner.
[Cohen2002, Lv2002] facilitate search via replication.
Dependability via load-balancing in structured P2P systems (using DHTs): [Dabek2001], [Rao2003].
[Triantafillou2003] divides the system into clusters based on semantic categories and discusses dependability via inter-cluster and intra-cluster load-balancing.
11
How does this proposal differ from our previous spatial GRID proposal?
Our GRID-related work: imposes structure on the system; data movement in the KB range; avoids data scattering; individual nodes are usually dedicated and expected to be available most of the time; main aim is load-balancing.
This proposal: no structure imposed; data movement in the MB/GB range; data scattering is acceptable; individual nodes may join/leave anytime; the aim is replication, not load-balancing.
12
PRESENTATION OUTLINE
Introduction
Related Work
System Overview
Proposed Replication Scheme
Performance Evaluation
Conclusion and Future Work
13
SYSTEM OVERVIEW
Each peer is assigned a globally unique identifier PID.
Broadcast-based search.
Every peer maintains its own access statistics: the number of accesses made to each of its data files, and the list of peers which have downloaded each of its files.
Given that very 'hot' files may be aggressively downloaded by hundreds of peers very quickly, a peer keeps track only of those peers which have directly downloaded from itself.
Every peer provides a certain amount of its disk space, Space, for replication. An LRU scheme is deployed for Space, with periodic deletion of unused replicas.
We sacrifice replica consistency for improving query response times.
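To make this bookkeeping concrete, here is a minimal sketch (not the authors' code; class and field names are assumptions) of the per-peer state the slide describes: per-file access counts, the peers that downloaded each file directly, and an LRU-managed replica area bounded by Space.

```python
from collections import OrderedDict, defaultdict

class PeerState:
    def __init__(self, pid, space_bytes):
        self.pid = pid
        self.space_bytes = space_bytes          # disk space donated for replicas (Space)
        self.access_count = defaultdict(int)    # file_id -> number of accesses
        self.downloaders = defaultdict(set)     # file_id -> peers that downloaded directly from us
        self.replicas = OrderedDict()           # file_id -> size in bytes, kept in LRU order

    def record_access(self, file_id, requesting_peer):
        self.access_count[file_id] += 1
        self.downloaders[file_id].add(requesting_peer)

    def store_replica(self, file_id, size):
        # Evict least-recently-used replicas until the new replica fits in Space.
        while self.replicas and sum(self.replicas.values()) + size > self.space_bytes:
            self.replicas.popitem(last=False)
        self.replicas[file_id] = size

    def touch_replica(self, file_id):
        # Mark a replica as recently used so LRU eviction skips it.
        self.replicas.move_to_end(file_id)
```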
14
SYSTEM OVERVIEW (CONT.)
Distance between two peers: the communication time between them.
Two peers are regarded as neighbours if they are directly connected to each other.
Periodic exchange of status messages between neighbours: load information and available disk space information.
15
SYSTEM OVERVIEW (CONT.)
Load of a peer: the number of queries waiting in the peer's job queue, normalized w.r.t. CPU capacity.
Assumptions: peers know the transfer rates between themselves and other peers, and every peer knows the availability information of its neighbouring peers.
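The slide does not give the exact normalization formula; one plausible reading, stated purely as an assumption, is queue length scaled by relative CPU capacity:

```python
# Assumed normalization: a faster peer with the same queue length reports a
# lower load. The exact formula is not stated on the slide.
def normalized_load(num_queued_queries, cpu_capacity, reference_cpu_capacity=1.0):
    return num_queued_queries / (cpu_capacity / reference_cpu_capacity)
```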
16
PRESENTATION OUTLINE
Introduction
Related Work
System Overview
Proposed Replication Scheme
Performance Evaluation
Conclusion and Future Work
17
Replication Scheme
Each peer P periodically checks its neighbours' loads. If P's load exceeds the average load of its neighbouring peers by 10%, replication is initiated.
Selection of hot data files: using recent access statistics, P sorts its files in descending order of access frequency, traverses this sorted list, and selects as 'hot' the top N files whose access frequency exceeds a pre-defined threshold Tfreq.
Number of replicas: for every Nd accesses to a data file D, a new replica of D is created.
Tfreq and Nd are pre-specified at design time.
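A minimal Python sketch of these rules follows; the 10% slack, Tfreq and Nd come from the slide, while the function and variable names are illustrative assumptions.

```python
# Sketch of the replication trigger, hot-file selection and replica-count
# rules described above; names are assumptions.
def should_replicate(own_load, neighbour_loads, slack=0.10):
    # Replication is initiated when our load exceeds the neighbours'
    # average load by more than 10%.
    avg = sum(neighbour_loads) / len(neighbour_loads)
    return own_load > avg * (1.0 + slack)

def select_hot_files(access_count, t_freq, top_n):
    # Sort files by recent access frequency (descending) and keep the top
    # N files whose frequency exceeds the threshold Tfreq.
    ranked = sorted(access_count.items(), key=lambda kv: kv[1], reverse=True)
    return [f for f, freq in ranked[:top_n] if freq > t_freq]

def replicas_needed(num_accesses, n_d):
    # One new replica of D for every Nd accesses to D.
    return num_accesses // n_d
```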
18
Criteria for Selection of the Destination Peer PDest for Replication
PDest should have a high probability of being online.
PDest should have adequate available disk space.
The load difference with PDest should be significant.
The transfer time TRep to PDest should be minimized. PDest is therefore chosen from the peers which have already downloaded the data file, which makes TRep effectively equal to 0.
19
Replication Strategy
For each 'hot' data file D, the 'hot' peer PHot sends a message to each peer which has downloaded D.
The peers at which a copy of D exists reply to PHot with their respective load and available disk space.
Only the peers with high availability and sufficient available disk space are candidates.
Among these candidate peers, PHot first puts the peer MIN with the lowest load into a set Candidate. Peers whose normalized load difference with MIN is less than δ (a small integer) are also put into Candidate.
The peer in Candidate whose available disk space is maximum is selected as the destination peer.
20
Algorithm for selecting the destination peer
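The original slide presents this algorithm as a figure. Below is a hedged Python reconstruction based only on the textual description on the previous slide; the reply format, the availability and free-space cut-offs, and all names are assumptions.

```python
# Hedged reconstruction of destination-peer selection; not the authors' figure.
AVAILABILITY_THRESHOLD = 0.8   # assumed cut-off for "high availability"
MIN_FREE_SPACE = 100 * 2**20   # assumed minimum free disk space (100 MB)

def select_destination_peer(replies, delta):
    """replies: list of (peer_id, availability, load, free_space) gathered from
    the peers that already hold a copy of the hot data file D."""
    # Only peers with high availability and sufficient free space are candidates.
    candidates = [r for r in replies
                  if r[1] >= AVAILABILITY_THRESHOLD and r[3] >= MIN_FREE_SPACE]
    if not candidates:
        return None
    # MIN is the candidate with the lowest (normalized) load.
    min_peer = min(candidates, key=lambda r: r[2])
    # Candidate set: MIN plus peers whose load differs from MIN's by less than delta.
    candidate_set = [r for r in candidates if r[2] - min_peer[2] < delta]
    # Choose the candidate with the maximum available disk space.
    return max(candidate_set, key=lambda r: r[3])[0]
```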
21
Query Redirection to Replicas
What happens when a peer PIssue issues a query Q for a data file D to a 'hot' peer PHot?
PHot needs to redirect Q to a peer REDIRECT containing a replica of D, if any such replica exists. Objective: to minimize Q's response time.
PHot checks the list of peers having D's replica.
Selection criteria for query redirection: REDIRECT should be highly available; the load difference between PHot and REDIRECT should be significant; the transfer time between REDIRECT and PIssue should be low.
22
Query Redirection (Cont.)
The 'hot' peer PHot first selects the set of peers which contain a replica of the data file D and whose load difference with PHot exceeds TDiff.
TDiff is a parameter which is application-dependent and subjective.
Among these selected peers, the peer with the maximum transfer rate to the query-issuing peer PIssue is selected for query redirection.
23
Query redirection algorithm
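The slide shows the redirection algorithm as a figure; the sketch below is assembled from the two preceding slides' description, with the availability cut-off and the reply format assumed.

```python
# Hedged sketch of the query-redirection choice; names are assumptions.
AVAILABILITY_THRESHOLD = 0.8   # assumed cut-off for "highly available"

def select_redirect_peer(hot_peer_load, replica_holders, t_diff):
    """replica_holders: list of (peer_id, availability, load, transfer_rate_to_issuer)
    for peers known to hold a replica of the requested data file D."""
    # Keep highly available replica holders whose load is lower than the hot
    # peer's load by more than TDiff.
    candidates = [r for r in replica_holders
                  if r[1] >= AVAILABILITY_THRESHOLD
                  and hot_peer_load - r[2] > t_diff]
    if not candidates:
        return None   # no suitable replica: PHot answers Q itself
    # Redirect Q to the candidate with the best transfer rate to PIssue.
    return max(candidates, key=lambda r: r[3])[0]
```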
24
PRESENTATION OUTLINE
Introduction
Related Work
System Overview
Proposed Replication Scheme
Performance Evaluation
Conclusion and Future Work
25
Performance Evaluation
Investigates the following:
Effect of variations in workload skew
Effect of variations in the number of peers
Performance metric: Average Response Time
26
PARAMETERS USED IN PERFORMANCE EVALUATION
27
Average Response times at the hot nodes
28
Snapshot of Load distribution
30
Effect of varying the Workload Skew
31
Effect of varying the number of peers
32
PRESENTATION OUTLINE
Introduction
Related Work
System Overview
Proposed Replication Scheme
Performance Evaluation
Conclusion and Future Work
33
CONCLUSION AND FUTURE WORK
We have proposed a strategy for enhancing the dependability of P2P systems via dynamic replication.
Our strategy takes free-riders into account.
Our performance evaluation demonstrates the effectiveness of our replication-based strategy.
Future scope of work: dealing with very large data items (e.g., video files), cost-effective integration into existing P2P systems, and load-balancing.