Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload Presented For Cs294-4 Fall 2003 By Jon Hess.

Presentation transcript:

Measurement, Modeling, and Analysis of a Peer-2-Peer File-Sharing Workload Presented For Cs294-4 Fall 2003 By Jon Hess

Goal - Overview
– Determine if the KaZaA search space is queried in such a way that a group of 25,000 clients can satisfy most of their own requests.

Goals - Details
– Capture an extensive trace
– Utilize that trace to understand file-sharing traffic flows
– Model user and object activity
– Determine inefficiencies in the distribution model
– Propose solutions to inefficiencies

Motivations
– File-sharing traffic began to exceed HTTP traffic in terms of aggregate bandwidth consumed
– File-sharing traffic is much less understood than HTTP traffic, even though it represents such a large segment of bandwidth usage
– Bandwidth is expensive
[Figure: HTTP Traffic vs. P2P Traffic]

The Trace
– 2 machines
– 203 days, 5 hours, and 6 minutes
– 22.7 TB of KaZaA file transfer traffic
– Captured seasonal variations
    – End of spring
    – Summer
    – Fall semester

Trace Conclusions
– Users are patient
    – 30 minutes to retrieve a small object
    – Up to 1 week to retrieve a large object
– Users consume less as they age

Trace Conclusions
– Users' machines are not very active
    – A session is an unbroken length of time where a client has one or more file transfers in progress.
    – Average sessions are only 2 minutes
        – 90th percentile: 28 minutes
    – Over the life of a client, it is only active 5.54% of the time, or 0.20% of the trace period
        – 90th percentile clients are active most of their life, and 4.15% of the trace
– Without control traffic analysis, is this meaningful?

[Figure: Session Lengths. Labels: Transfer A, Transfer B, Transfer D, Transfer E; 3 Minutes, 2 Minutes]
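The session definition from the trace conclusions (an unbroken stretch of time in which a client has at least one transfer in progress) amounts to merging overlapping transfer intervals. The following is a minimal sketch of that idea, not the authors' tooling: the `(start, end)` interval format, the function names, the example numbers, and the choice to treat back-to-back transfers as one session are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical record format: (start_minute, end_minute) of one file transfer.
Interval = Tuple[float, float]


def sessions_from_transfers(transfers: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching transfer intervals into sessions."""
    sessions: List[Interval] = []
    for start, end in sorted(transfers):
        if sessions and start <= sessions[-1][1]:
            # This transfer begins before the current session ends,
            # so the session is unbroken: extend it if needed.
            prev_start, prev_end = sessions[-1]
            sessions[-1] = (prev_start, max(prev_end, end))
        else:
            # A gap with no transfer in progress starts a new session.
            sessions.append((start, end))
    return sessions


def active_fraction(sessions: List[Interval], lifetime: float) -> float:
    """Fraction of a client's lifetime spent inside sessions."""
    return sum(end - start for start, end in sessions) / lifetime


# Made-up transfers for one client, in minutes since it first appeared.
example_transfers = [(0, 2), (1, 3), (10, 11), (11, 12)]
example_sessions = sessions_from_transfers(example_transfers)
print(example_sessions)
print(active_fraction(example_sessions, lifetime=100))
```

Running the same merge per client over a whole trace would give the session-length distribution and the per-client activity percentages quoted above.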

Trace Conclusions: Objects
– Most requests are for small objects: 91%
– Most bytes transferred are part of large objects: 65%
– There are many small objects
– There are few large objects
– Small objects' popularity is subject to heavy churn
    – No small object was in the top 10 for all 6 months
    – Only 1 large object lived in the top 10 for 6 months
    – 44 large files remained in the top 100 for 6 months
    – The most popular small objects are new objects
– Most requests are for old objects

Fetch-at-most-once
– Once a KaZaA user obtains an object, they will not need to retrieve another copy
    – 94% of objects are fetched once per user
    – 99% are fetched at most twice per user
– Stems from the fact that media files are immutable and never ‘stale’
    – You may refresh ‘slashdot.org’ three times a day, but there is no point in downloading ‘thriller.mpeg’ seventeen times.
– This keeps the KaZaA workload from following a Zipf curve even though object popularity does.

Workload Modeling
– Create a set of objects and give them popularity based on a Zipf distribution
– Create a set of clients that request objects in proportion to their popularity
– Have each client ‘fetch-at-most-once’
– Measure the distribution of transfers
    – Does it follow a Zipf curve?
    – How many big-object requests can a population of size N satisfy?
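The modeling steps above can be sketched in a few lines of Python. This is an illustrative toy, not the simulator used in the paper: the function name, the Zipf exponent `alpha=1.0`, the population sizes, and the use of rejection sampling to enforce fetch-at-most-once are all assumptions. It covers only the first question (whether the resulting transfer counts still follow a Zipf curve), not how many big-object requests a population can satisfy locally.

```python
import random
from collections import Counter
from itertools import accumulate


def simulate_requests(num_clients, num_objects, requests_per_client,
                      fetch_at_most_once, alpha=1.0, seed=0):
    """Return a Counter mapping object rank (0 = most popular) to transfers."""
    if fetch_at_most_once and requests_per_client > num_objects:
        raise ValueError("a client cannot fetch more distinct objects than exist")
    rng = random.Random(seed)
    # Zipf popularity: the object of rank r gets weight 1 / r**alpha.
    cum_weights = list(accumulate(1.0 / rank ** alpha
                                  for rank in range(1, num_objects + 1)))
    objects = list(range(num_objects))
    counts = Counter()
    for _ in range(num_clients):
        already_fetched = set()
        for _ in range(requests_per_client):
            obj = rng.choices(objects, cum_weights=cum_weights)[0]
            # Fetch-at-most-once: redraw until the client picks an object
            # it does not already hold.
            while fetch_at_most_once and obj in already_fetched:
                obj = rng.choices(objects, cum_weights=cum_weights)[0]
            already_fetched.add(obj)
            counts[obj] += 1
    return counts


# Made-up sizes, chosen only to keep the run fast.
plain = simulate_requests(200, 500, 20, fetch_at_most_once=False)
famo = simulate_requests(200, 500, 20, fetch_at_most_once=True)
print("transfers of the most popular object, pure Zipf:", plain[0])
print("transfers of the most popular object, fetch-at-most-once:", famo[0])
```

Under fetch-at-most-once, no object can be transferred more times than there are clients, so the head of the distribution is flattened relative to the pure Zipf run.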

[Figure] Popular objects are not requested as the Zipf curve would predict.

Would a proxy cache help?
– At first the proxy will cache the popular objects and succeed.
– But ‘fetch-at-most-once’ draws clients away from the Zipf curve, and the proxy begins to fail.
– What happens if we increase the density of popularity?
    – Curve starts higher and falls faster

Previous model did not insert new objects.
– New popular objects tend to ‘correct’ the workload.
    – Through providing locality
– New clients, however, do not help: they contribute to keeping old objects popular and destroy locality

Validating The Model
– Capture parameters that are inputs to the model from the trace
    – Number of clients
    – Number of objects
    – User request rate
    – Probability user requests given file (guess)
    – Probability of popularity of new objects (guess)
    – Object arrival rates (guess)
– Run simulation with harvested parameters
– See if simulation predicts what actually happened

Simulation seems to successfully predict reality. But with three free variables used to tune results, is this fair?

What inefficiencies can we eliminate?
– Analysis against the trace shows 86% of object transfers were from external sources when an internal source possessed the object.
    – A traditional proxy, given the resources, could cut bandwidth utilization by 86%
        – Would have to host pirated data
    – Could use a proxy redirector instead. Must know the availability of the objects
        – Control traffic is obfuscated
    – Build locality into the protocol
        – Does this sacrifice anonymity?
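One way to read the trace analysis above is as a single pass over the transfers in time order, counting external transfers of objects that some internal client had already downloaded. The sketch below illustrates that reading only. The `Transfer` record, its field names, and the exact metric (share of external transfers that were redundant) are assumptions, and it ignores whether the internal holder was actually online.

```python
from typing import Iterable, NamedTuple


class Transfer(NamedTuple):
    """Hypothetical trace record; the field names are illustrative."""
    time: float
    client: str
    obj: str
    external: bool  # True if the object came from outside the organization


def redundant_external_fraction(trace: Iterable[Transfer]) -> float:
    """Share of external transfers whose object was already held internally."""
    held_internally = set()
    external = 0
    redundant = 0
    for transfer in sorted(trace, key=lambda t: t.time):
        if transfer.external:
            external += 1
            if transfer.obj in held_internally:
                redundant += 1
        # After any completed download, an internal client holds the object.
        held_internally.add(transfer.obj)
    return redundant / external if external else 0.0


# Made-up trace: the second download of "x" could have been served internally.
example_trace = [
    Transfer(0.0, "a", "x", True),
    Transfer(1.0, "b", "x", True),
    Transfer(2.0, "c", "y", True),
    Transfer(3.0, "d", "x", False),
]
print(redundant_external_fraction(example_trace))
```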

How successful would a locality-aware protocol be?
– Assume that a client is available for the periods the trace shows it as active
    – During a file transfer (extremely conservative)
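The availability assumption above can be expressed as a simple check: a request is served locally only if some internal client that holds the object is inside one of its active periods at request time. The sketch below assumes hypothetical `holders` and `active_periods` tables built from the trace. The names and data layout are illustrative and are not taken from the paper.

```python
from typing import Dict, List, Set, Tuple


def can_serve_locally(obj: str,
                      when: float,
                      holders: Dict[str, Set[str]],
                      active_periods: Dict[str, List[Tuple[float, float]]]) -> bool:
    """True if an internal holder of `obj` is active at time `when`."""
    for client in holders.get(obj, set()):
        for start, end in active_periods.get(client, []):
            # Conservative rule: a client counts as available only while
            # the trace shows it in the middle of a transfer.
            if start <= when <= end:
                return True
    return False


# Made-up data: client "a" holds object "x" and is active from t=10 to t=20.
example_holders = {"x": {"a"}}
example_active = {"a": [(10.0, 20.0)]}
print(can_serve_locally("x", 15.0, example_holders, example_active))
print(can_serve_locally("x", 25.0, example_holders, example_active))
```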

Questions?
– Will increasing efficiency decrease load as the authors would like? Or simply increase work achieved per dollar?
– Do clients have insatiable appetites?
– Are you worried that a large number of queries might have already been locally satisfied?