Published by Donald Newton. Modified over 9 years ago.
1
Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload
Presented for CS294-4, Fall 2003, by Jon Hess
2
Goal - Overview
– Determine whether the KaZaA search space is queried in such a way that a group of 25,000 clients can satisfy most of their own requests.
3
Goals - Details
– Capture an extensive trace
– Use that trace to understand file-sharing traffic flows
– Model user and object activity
– Determine inefficiencies in the distribution model
– Propose solutions to those inefficiencies
4
Motivations
– Beginning in 1999-2000, file-sharing traffic began to exceed HTTP traffic in aggregate bandwidth consumed
– File-sharing traffic is much less understood than HTTP traffic, even though it represents such a large share of bandwidth usage
– Bandwidth is expensive
[Figure: HTTP traffic vs. P2P traffic, 2000 to 2002]
5
The Trace
– 2 machines
– 203 days, 5 hours, and 6 minutes
– 22.7 TB of KaZaA file-transfer traffic
– Captured seasonal variations
    – End of spring
    – Summer
    – Fall semester
6
Trace Conclusions
– Users are patient
    – 30 minutes to retrieve a small object
    – Up to 1 week to retrieve a large object
– Users consume less as they age
7
Trace Conclusions
– Users' machines are not very active
    – A session is an unbroken length of time during which a client has one or more file transfers in progress.
    – The average session is only 2 minutes
        – 90th percentile: 28 minutes
    – Over its lifetime, a client is active only 5.54% of the time, or 0.20% of the trace period
        – 90th-percentile clients are active most of their life, and 4.15% of the trace
– Without control traffic analysis, is this meaningful?
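The session definition above amounts to merging overlapping transfer intervals. The following is a minimal sketch of that idea, not the authors' analysis code; the function names and the sample transfer times (in minutes) are hypothetical.

```python
def merge_sessions(transfers):
    """Merge (start, end) transfer intervals into sessions.

    A session is a maximal span during which at least one transfer is
    in progress, so overlapping or touching intervals are merged.
    """
    sessions = []
    for start, end in sorted(transfers):
        if sessions and start <= sessions[-1][1]:
            # Overlaps the current session: extend it if needed.
            sessions[-1][1] = max(sessions[-1][1], end)
        else:
            # Gap with no transfer in progress: a new session begins.
            sessions.append([start, end])
    return [(s, e) for s, e in sessions]


def active_fraction(sessions, period_start, period_end):
    """Fraction of [period_start, period_end] covered by sessions."""
    busy = sum(e - s for s, e in sessions)
    return busy / (period_end - period_start)


# Hypothetical transfers for one client, in minutes since first sighting.
transfers = [(0, 2), (1, 3), (10, 11), (10.5, 12)]
sessions = merge_sessions(transfers)
print(sessions)                           # [(0, 3), (10, 12)]
print(active_fraction(sessions, 0, 100))  # 0.05
```

Dividing by the client's lifetime or by the whole trace period gives the two activity percentages quoted on the slide; which denominator to use is a choice the caller makes.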
8
[Figure: Session Lengths. Labels: Transfer A, Transfer B, Transfer D, Transfer E; sessions of 3 minutes and 2 minutes]
9
Trace Conclusions – Objects
– Most requests are for small objects – 91%
– Most bytes transferred are part of large objects – 65%
– There are many small objects
– There are few large objects
– Small objects’ popularity is subject to heavy churn
    – No small object was in the top 10 for all 6 months
    – Only 1 large object stayed in the top 10 for 6 months
    – 44 large files remained in the top 100 for 6 months
    – The most popular small objects are new objects
– Most requests are for old objects
10
Fetch-at-most-once
– Once a KaZaA user obtains an object, they will not need to retrieve another copy
    – 94% of objects are fetched once per user
    – 99% are fetched at most twice per user
– This stems from the fact that media files are immutable and never ‘stale’
    – You may refresh ‘slashdot.org’ three times a day, but there is no point in downloading ‘thriller.mpeg’ seventeen times.
– This keeps the KaZaA workload from following a Zipf curve even though object popularity does.
11
Workload Modeling
– Create a set of objects and give them popularity based on a Zipf distribution
– Create a set of clients that request objects in proportion to their popularity
– Have each client ‘fetch-at-most-once’
– Measure the distribution of transfers
    – Does it follow a Zipf curve?
    – How many big-object requests can a population of size N satisfy?
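The modeling steps above can be sketched in a few lines of Python. This is an illustrative toy, not the simulator used in the paper: the function names, the Zipf exponent `alpha`, and the population sizes are all assumptions, and a repeated draw is simply dropped rather than redrawn.

```python
import itertools
import random
from collections import Counter


def zipf_cum_weights(num_objects, alpha=1.0):
    """Cumulative Zipf weights: the object of rank r has weight 1 / r**alpha."""
    weights = (1.0 / rank ** alpha for rank in range(1, num_objects + 1))
    return list(itertools.accumulate(weights))


def simulate(num_clients, num_objects, requests_per_client,
             fetch_at_most_once=True, alpha=1.0, seed=0):
    """Return a Counter mapping object index (0 = most popular) to transfers."""
    rng = random.Random(seed)
    cum = zipf_cum_weights(num_objects, alpha)
    objects = range(num_objects)
    transfers = Counter()
    for _ in range(num_clients):
        owned = set()  # objects this client has already fetched
        for _ in range(requests_per_client):
            obj = rng.choices(objects, cum_weights=cum)[0]
            if fetch_at_most_once and obj in owned:
                continue  # assumption: a repeat draw causes no transfer
            owned.add(obj)
            transfers[obj] += 1
    return transfers


# Hypothetical sizes, chosen only to keep the run fast.
zipf = simulate(200, 1000, 50, fetch_at_most_once=False)
famo = simulate(200, 1000, 50, fetch_at_most_once=True)
print("top-5 transfers, plain Zipf:        ", [zipf[i] for i in range(5)])
print("top-5 transfers, fetch-at-most-once:", [famo[i] for i in range(5)])
```

Under fetch-at-most-once, no object can be transferred more times than there are clients, so the head of the transfer distribution is capped while the tail is left largely unchanged.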
12
Popular objects are not requested as often as the Zipf curve would predict
13
Would a proxy cache help?
– At first the proxy will cache the popular objects and succeed.
– But as ‘fetch-at-most-once’ draws clients away from the Zipf curve, the proxy begins to fail.
– What happens if we increase the density of popularity?
    – The curve starts higher and falls faster
14
The previous model did not insert new objects.
– New popular objects tend to ‘correct’ the workload
    – They do so by providing locality
– New clients, however, do not help: they contribute to keeping old objects popular and destroy locality
15
Validating the Model
– Capture the parameters that are inputs to the model from the trace
    – Number of clients
    – Number of objects
    – User request rate
    – Probability that a user requests a given file – Guess
    – Probability of popularity of new objects – Guess
    – Object arrival rates – Guess
– Run the simulation with the harvested parameters
– See if the simulation predicts what actually happened
16
The simulation seems to predict reality successfully. But with three free variables used to tune the results, is this fair?
17
What inefficiencies can we eliminate?
– Analysis against the trace shows that 86% of object transfers came from external sources when an internal source possessed the object.
    – A traditional proxy, given the resources, could cut bandwidth utilization by 86%
        – Would have to host pirated data
    – Could use a proxy redirector instead. Must know the availability of the objects
        – Control traffic is obfuscated
    – Build locality into the protocol
        – Does this sacrifice anonymity?
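One way to measure this kind of redundancy is to replay the trace in time order and check, for each external transfer, whether some internal client had already obtained the object. The sketch below assumes a hypothetical record format of `(client, object_id, source)` and counts transfers rather than bytes; it also ignores whether the internal holder was online at the time, so it gives an upper bound.

```python
def redundant_external_fraction(transfers):
    """Fraction of external transfers whose object was already held internally.

    transfers: chronological list of (client, object_id, source) tuples,
    where source is "internal" or "external" (hypothetical record format).
    """
    held_internally = set()  # objects some internal client has downloaded
    external = redundant = 0
    for _client, obj, source in transfers:
        if source == "external":
            external += 1
            if obj in held_internally:
                redundant += 1  # an internal copy existed before this fetch
        held_internally.add(obj)
    return redundant / external if external else 0.0


# Hypothetical five-record trace.
trace = [
    ("a", "x", "external"),
    ("b", "x", "external"),  # redundant: client a already holds x
    ("c", "y", "external"),
    ("a", "y", "internal"),
    ("d", "x", "external"),  # redundant: clients a and b hold x
]
print(redundant_external_fraction(trace))  # 0.5
```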
18
How successful would a locality-aware protocol be?
– Assume that a client is available only for the periods in which the trace shows it as active
    – That is, during a file transfer – extremely conservative
19
Questions?
– Will increasing efficiency decrease load, as the authors would like? Or simply increase the work achieved per dollar?
– Do clients have insatiable appetites?
– Are you worried that a large number of queries might have already been locally satisfied?