Published by Gyles West. Modified over 9 years ago.
1. Analyzing Peer-To-Peer Traffic Across Large Networks
Subhabrata Sen, Member, IEEE, and Jia Wang, Member, IEEE
IEEE/ACM Transactions on Networking, Vol. 12, No. 2, April 2004
Group members: 李英宗 (d96725004), 林慶和 (d95725005)
June 15, 2009
2. Authors (ACN 2009)
Subhabrata Sen received the B.Eng. degree in computer science from Jadavpur University, India, in 1992, and the M.S. and Ph.D. degrees in computer science from the University of Massachusetts, Amherst, in 1997 and 2001, respectively.
Jia Wang received the B.S. degree in computer science from the State University of New York, Binghamton, in 1996, and the M.S. and Ph.D. degrees in computer science from Cornell University, Ithaca, NY, in 1999 and 2001, respectively.
Both are currently members of the Internet and Networking Systems Research Center at AT&T Labs–Research in Florham Park, NJ. Their research interests include network measurement, routing and topology analysis, traffic flow measurement, overlay networks and applications, network security and anomaly detection, Web performance, content distribution networks, and other Internet-related research.
Dr. Sen and Dr. Wang are members of the Association for Computing Machinery (ACM).
3. Introduction
Motivation & Goals
- P2P applications are used for distributed file sharing.
- Their large and growing traffic volume has an impact on the underlying network.
- Goal: characterize P2P behavior, both to understand how these systems impact the network and to gain insights into developing P2P systems with superior performance.
Previous research
- Focused almost exclusively on P2P signaling traffic, setting up P2P crawlers on the Internet (an “active probing” approach).
- Early work was based on data from edge networks, which provides a view of local P2P usage.
This work
- Provides a complementary “backbone view” from a large tier-1 ISP, gathering data at multiple border routers across the ISP.
4. Outline
- Methodology
- Characterization Metrics
- View and Analysis Results
- P2P vs. Web
5. Methodology
Popular P2P applications
- Three systems: Gnutella, FastTrack, DirectConnect
- All are decentralized and self-organizing
- Data and index information are distributed over peers
- Peer membership is transient
Measurement approach
- Large-scale passive measurement
- Flow-level data gathered from routers across a large tier-1 ISP’s backbone
- Analyzes both signaling and data traffic
- Three levels of granularity: IP address, network prefix, autonomous system (AS)
- Data collected using Cisco’s NetFlow
6. Methodology
Advantages
- Requires little knowledge about the P2P protocol: only the port number
- Non-intrusive measurement
- Easier than running a crawler
- More complete view of P2P traffic
- Allows localized analysis
Limitations
- Flow-level data only, with no application-level details
- May not capture the complete flow
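The port-based identification mentioned under Advantages could be sketched roughly as follows. The slides do not list the port numbers; the values below are the commonly cited default ports for these applications and are an assumption, as are the `Flow` record layout and the `classify_flow` helper.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed default ports (not given in the slides).
P2P_PORTS = {
    1214: "FastTrack",
    6346: "Gnutella",
    6347: "Gnutella",
    411: "DirectConnect",
    412: "DirectConnect",
}


@dataclass(frozen=True)
class Flow:
    """Hypothetical, simplified flow record (not the NetFlow wire format)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    num_bytes: int


def classify_flow(flow: Flow) -> Optional[str]:
    """Return the P2P application name if either port matches, else None."""
    for port in (flow.src_port, flow.dst_port):
        app = P2P_PORTS.get(port)
        if app is not None:
            return app
    return None
```

A classifier like this misses peers that run on non-default ports, which is one reason a port-based view may undercount P2P traffic.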
7. Characterization Metrics
Characterization
- Topology: host distribution, application-level overlay
- Traffic distribution: downstream and upstream
- Dynamic behavior: how frequently hosts join and leave the system, how long a host stays, and so on
8. Characterization Metrics
Metrics
- Host distribution
- Traffic volume
- Host connectivity
- Traffic pattern over time
- Connection duration and on-time
Data cleaning
- Invalid IP addresses: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
- No matching prefix in the routing tables
- Invalid AS numbers (>64512)
- Removes 4% of the flow records
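A minimal sketch of the data-cleaning rules above, using Python's standard `ipaddress` module. It uses the RFC 1918 private ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) and the AS-number cutoff from the slide; the function name, its arguments, and the `has_prefix_match` flag (standing in for a routing-table lookup) are hypothetical.

```python
import ipaddress

# RFC 1918 private address ranges, treated as invalid in backbone flow data.
PRIVATE_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

# Cutoff taken from the slide: AS numbers above this are considered invalid.
MAX_VALID_AS = 64512


def is_valid_record(src_ip, dst_ip, src_as, dst_as, has_prefix_match=True):
    """Return True if a flow record passes all three cleaning rules."""
    for ip in (src_ip, dst_ip):
        addr = ipaddress.ip_address(ip)
        if any(addr in net for net in PRIVATE_NETS):
            return False  # private address on either end
    if not has_prefix_match:
        return False  # no matching prefix in the routing tables
    if src_as > MAX_VALID_AS or dst_as > MAX_VALID_AS:
        return False  # AS number above the cutoff
    return True
```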
9. Overview of P2P traffic
- Table I: NetFlow data set of P2P traffic over TCP
- Total: around 800 million flow records
10. Host distribution
Fig. 2. Host density: the distribution of the hosts participating in the three P2P systems per day (y-axis is in log scale).
11. Traffic volume distribution
Fig. 3. Cumulative distribution of traffic volume associated with IP addresses ranked in decreasing order of volume, for September 14, 2001 (x-axis is in log scale). Aggregate traffic observed for FastTrack on this day was 960 GB.
- Significant skew in traffic volume across granularities
- A few entities source or receive most of the traffic
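The kind of curve shown in Fig. 3 can be approximated with a short sketch: rank entities by volume in decreasing order and accumulate their share of the total. The function names and the toy numbers in the usage note are illustrative assumptions, not data from the paper.

```python
def cumulative_share(volumes):
    """Cumulative fraction of total volume, entities ranked by decreasing volume."""
    ranked = sorted(volumes, reverse=True)
    total = sum(ranked)
    if total == 0:
        return []
    shares = []
    running = 0
    for v in ranked:
        running += v
        shares.append(running / total)
    return shares


def top_fraction_share(volumes, fraction):
    """Share of total volume contributed by the top `fraction` of entities."""
    ranked = sorted(volumes, reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0.0
    k = max(1, int(len(ranked) * fraction))  # always include at least one entity
    return sum(ranked[:k]) / total
```

For example, with per-host volumes `[50, 30, 10, 5, 5]`, the top 20% of hosts (one host) carry half of the traffic.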
12. Host connectivity
Fig. 5. Cumulative distribution of network connectivity at the IP and network prefix (PR) levels, for hosts participating in FastTrack on September 14, 2001.
- Connectivity is very small for most hosts and very high for a few hosts
- The distribution is less skewed at the prefix and AS levels
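One way to compute connectivity at different granularities is sketched below, assuming connectivity means the number of distinct peers an entity exchanges flows with. The `key` argument maps an IP address to the aggregation level; `to_slash24` is a hypothetical stand-in for a real longest-prefix match against routing tables, which the slides refer to but do not detail.

```python
import ipaddress
from collections import defaultdict


def to_slash24(ip):
    """Hypothetical prefix mapping: truncate to a /24 instead of a routing-table lookup."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))


def connectivity(flows, key=lambda ip: ip):
    """Count distinct peers per entity; `flows` is an iterable of (src_ip, dst_ip)."""
    peers = defaultdict(set)
    for src, dst in flows:
        a, b = key(src), key(dst)
        if a == b:
            continue  # both ends fall in the same entity at this granularity
        peers[a].add(b)
        peers[b].add(a)
    return {entity: len(p) for entity, p in peers.items()}
```

Aggregating to prefixes merges peers that share a prefix, which is consistent with the less skewed distribution noted at the prefix and AS levels.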
13. Time of day effect
Fig. 6. Distribution of number of IP addresses and traffic volume across hours in FastTrack on September 14, 2001 (GMT). (a) The traffic volume transferred in each bin. (b) The number of unique IP addresses, network prefixes, and ASes that are active in each bin.
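The hourly binning behind a plot like Fig. 6 might look like the sketch below. The tuple layout `(unix_timestamp, src_ip, dst_ip, num_bytes)` and the function name are assumptions, and only the IP level is shown; prefixes and ASes would need an additional mapping step.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hourly_profile(flows):
    """Return {hour: (total_bytes, unique_ip_count)} with hours in GMT/UTC."""
    volume = defaultdict(int)
    hosts = defaultdict(set)
    for ts, src_ip, dst_ip, num_bytes in flows:
        hour = datetime.fromtimestamp(ts, tz=timezone.utc).hour
        volume[hour] += num_bytes
        hosts[hour].update((src_ip, dst_ip))
    return {h: (volume[h], len(hosts[h])) for h in sorted(volume)}
```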
14. Host connection duration & on-time
- Substantial transience: most hosts stay in the system for a short time
- The distribution is less skewed at the prefix and AS levels
- FastTrack (September 14, 2001), threshold = 30 min
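The slide does not spell out how the 30-minute threshold is applied. One plausible reading, sketched here as an assumption, is that a host counts as continuously "on" while the gap between consecutive flows stays within the threshold, and a longer gap starts a new session. The interval format `(start_seconds, end_seconds)` and the function name are hypothetical.

```python
def on_time(intervals, threshold=30 * 60):
    """Total on-time in seconds, bridging idle gaps of at most `threshold` seconds."""
    if not intervals:
        return 0
    ordered = sorted(intervals)
    total = 0
    cur_start, cur_end = ordered[0]
    for start, end in ordered[1:]:
        if start - cur_end <= threshold:
            # Gap is short enough: extend the current session.
            cur_end = max(cur_end, end)
        else:
            # Gap exceeds the threshold: close the session and start a new one.
            total += cur_end - cur_start
            cur_start, cur_end = start, end
    total += cur_end - cur_start
    return total
```

Under this reading, a larger threshold merges more sessions and lengthens the measured on-time, so the reported distributions depend on the chosen value.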
15. Mean bandwidth usage
Fig. 9. Cumulative distribution of the mean upstream and downstream bandwidth usage of hosts participating in FastTrack and DirectConnect on September 14, 2001 (x-axis is in log scale). (a) FastTrack. (b) DirectConnect.
- Upstream bandwidth is lower than downstream bandwidth; possible causes are ADSL and rate limiting
16. Traffic Characterization
- P2P traffic does not fit power-law distributions well.
- Relationships between measures: traffic volume, number of IP addresses, on-time, mean bandwidth usage
17. The power laws
Fig. 10. Rank-frequency plots of the P2P metrics for FastTrack on September 14, 2001: (a) overall host connectivity; (b) host connectivity for the top 10% IP addresses; (c) traffic volume of the top 10% IP addresses; (d) on-time of the top 10% IP addresses (both x-axis and y-axis are in log scale).
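A power law appears as a straight line on a log-log rank-frequency plot, so a rough check is to fit a line to log(value) against log(rank) and inspect the fit. The sketch below uses ordinary least squares, which is a crude diagnostic chosen for illustration and not necessarily the test used in the paper; the function name is hypothetical.

```python
import math


def loglog_fit(values):
    """Least-squares fit of log10(value) vs. log10(rank); returns (slope, r_squared)."""
    ranked = sorted((v for v in values if v > 0), reverse=True)
    n = len(ranked)
    if n < 2:
        raise ValueError("need at least two positive values")
    xs = [math.log10(rank) for rank in range(1, n + 1)]
    ys = [math.log10(v) for v in ranked]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # If all values are equal, the flat line fits perfectly.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return slope, r_squared
```

An R² well below 1, or a visibly curved plot, suggests the data is skewed without strictly following a power law.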
18. Relationships: traffic volume vs. on-time, connectivity, and bandwidth
- Volume heavy hitters are likely to have long on-times; hosts with short on-times contribute small traffic volumes.
- A host communicating with many others can transmit a small amount of traffic; a host communicating with few others can also source significant traffic.
- Volume heavy hitters are likely to have large bandwidths; hosts with small bandwidths contribute small traffic volumes.
19. Traffic volume vs. on-time, connectivity, and bandwidth
Fig. 11. FastTrack data set for September 14, 2001: top 1% of IP addresses ranked by volume of data sent out. Scatter plots (log-log scale): (a) upstream volume versus upstream on-time; (b) upstream volume versus number of unique upstream IP addresses that an IP address connects to; (c) upstream volume versus average upstream bandwidth of an IP address.
20. Connectivity, on-time, and bandwidth
- Hosts with high connectivity have long on-times; hosts with short on-times communicate with few other hosts.
- Hosts with high upstream bandwidths have low connectivity counts; hosts that send traffic to many others span a range of bandwidths, but none of them has the highest bandwidths.
- Hosts with low upstream bandwidths have very long on-times (possibly large file downloads or SuperNodes).
21. Connectivity, on-time, and bandwidth
Fig. 12. FastTrack data set for September 14, 2001: top 1% of IP addresses ranked by volume of data sent out. Scatter plots (log-log scale): (a) number of unique upstream IP addresses that a host connects to versus total upstream on-time of the IP address; (b) number of unique upstream IP addresses versus average upstream bandwidth; (c) average upstream bandwidth versus total upstream on-time.
22. P2P vs. Web
- 97% of the prefixes contributing P2P traffic also contribute Web traffic.
- Heavy hitter prefixes for P2P traffic tend to be heavy hitters for Web traffic.
- P2P traffic contributed by the top heavy hitter prefixes is more stable than either Web or total traffic.
- The top 0.01%, 0.1%, 1%, and 10% heavy hitters contribute 10%, 30%, 50%, and 90% of the traffic volume, respectively.
23. P2P vs. Web
Fig. 13. Cumulative distribution of the traffic volume changes for top heavy hitter prefixes. (a) Top 0.01% prefixes. (b) Top 1% prefixes.
24. Summary
- The analysis covers both signaling and data traffic, and complements previous work on Gnutella.
- There is a significant increase in both traffic volume and number of users.
- The traffic volume generated by individual hosts is extremely variable: fewer than 10% of the IP addresses account for 99% of the traffic volume.
- Traffic distributions are extremely skewed. This holds for traffic volume, connectivity, on-time, and average bandwidth usage, but the distributions do not strictly obey power laws.
25. Summary
- All three P2P systems exhibit a high level of system dynamics; only a small fraction of hosts are persistent over long time periods.
- P2P is a significant but stable component of Internet traffic, more stable than Web traffic or overall traffic.
- Application-specific layer-3 traffic engineering is a promising way to manage the P2P workload in an ISP’s network.