An Analysis of Bulk Data Movement Patterns in Large-scale Scientific Collaborations W. Wu, P. DeMar, A. Bobyshev Fermilab CHEP 2010, TAIPEI TAIWAN

Topics
1. Background
2. Problems
3. Fermilab Flow Data Collection & Analysis System
4. Bulk Data Movement Pattern Analysis
– Data analysis methodology
– Bulk data movement patterns
5. Questions

1. Background Large-scale research efforts such as the LHC experiments, ITER, and climate modeling are built upon large, globally distributed collaborations. – They depend on predictable and efficient bulk data movement between collaboration sites. Industry and scientific institutions have created data centers comprising hundreds, or even thousands, of computation nodes to develop massively scaled, highly distributed cluster-computing platforms. Existing transport protocols such as TCP do not work well over long fat pipes (LFPs). Parallel data transmission tools such as GridFTP have therefore been widely applied to bulk data movement.

2. Problems What is the current status of bulk data movement in support of Large-scale Scientific Collaborations? What are the bulk data movement patterns in Large-scale Scientific Collaborations?

3. Fermilab Flow Data Collection & Analysis System

Fermilab Flow Data Collection and Analysis System Flow-based analysis produces data that are more fine-grained than SNMP counters, yet not as detailed or voluminous as full packet traces. We have deployed Cisco NetFlow and CMU's SiLK toolset for network traffic analysis. Cisco NetFlow runs at Fermilab's border router with no sampling configured and exports flow records to our SiLK traffic analysis system. The SiLK analysis suite is a collection of command-line tools for processing flow records; the tools are intended to be combined in various ways to perform an analysis task.
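
To keep the later steps concrete, the sketches in this transcript use short Python snippets; these are illustrations, not the authors' actual tooling. The loader below assumes the SiLK flow records have first been exported to pipe-delimited text (for example with SiLK's rwfilter/rwcut tools); the field order in the comment is an assumption for illustration, not SiLK's default output.

# Minimal sketch: load flow records exported from SiLK as pipe-delimited text.
# Assumed field order (illustrative):
#   source IP | destination IP | bytes | packets | start epoch (s) | duration (s)
def load_flows(path):
    flows = []
    with open(path) as fh:
        for line in fh:
            sip, dip, nbytes, pkts, start, dur = line.rstrip("\n").split("|")
            flows.append({
                "sip": sip, "dip": dip,
                "bytes": int(nbytes), "packets": int(pkts),
                "start": float(start), "dur": float(dur),
            })
    return flows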

4. Bulk Data Movement Pattern Analysis

Data Analysis Methodology We analyzed the traffic flow records from 11/11/2009 to 12/23/2009. – The total flow record database has a size of 60 GB, with 2,679,598,772 flow records. – In total, 23,764,073 GB of data and 2.221×10^12 packets were transferred between Fermilab and other sites. We analyzed only TCP bulk data transfers. We analyzed data transfers between Fermilab and /24 subnets: – the TOP 100 /24 sites that transfer to FNAL, and – the TOP 100 /24 sites that transfer from FNAL.
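
A sketch of how the TOP 100 /24 selection could be computed from the exported records: rank remote /24 subnets by total bytes moved in each direction. The flow records are assumed to be dicts with sip/dip/bytes fields and to be pre-filtered to TCP; the FNAL address prefix used here is an illustrative stand-in.

from collections import Counter

FNAL_PREFIX = "131.225."   # illustrative stand-in for Fermilab's address space

def slash24(ip):
    return ".".join(ip.split(".")[:3]) + ".0/24"

def top_sites(flows, direction="in", n=100):
    """Rank remote /24 subnets by bytes moved to FNAL ('in') or from FNAL ('out')."""
    totals = Counter()
    for f in flows:
        if direction == "in" and not f["dip"].startswith(FNAL_PREFIX):
            continue
        if direction == "out" and not f["sip"].startswith(FNAL_PREFIX):
            continue
        remote = f["sip"] if direction == "in" else f["dip"]
        totals[slash24(remote)] += f["bytes"]
    return totals.most_common(n)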

Bulk Data Movement Patterns & Status

TOP 100 Sites [Tables: TOP 20 sites that transfer from FNAL; TOP 20 sites that transfer to FNAL] We analyzed data transfers between Fermilab and /24 subnets. – In the IN direction, the TOP 100 sites carry 99.04% of the traffic. – In the OUT direction, the TOP 100 sites carry 95.69% of the traffic.

TOP 100 Sites (Cont) The TOP 100 sites are located around the world. We collected and calculated the Round Trip Time (RTT) between FNAL and these TOP 100 sites in both the IN and OUT directions. [RTT charts: labels include N American sites, Europe, S America, parts of Asia, Indore (India), Troitsk (Russia), and Beijing (China)] A few sites have circuitous paths to/from FNAL, due to many factors: lack of peering, traffic engineering, or physical limitations.
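
The slides do not say how the RTTs were obtained; the sketch below shows one simple way to collect them, by active probing with the system ping utility (Linux output format assumed).

import re
import subprocess

def avg_rtt_ms(host, count=4):
    """Average RTT to `host` in milliseconds via the system ping;
    returns None if the host does not answer or ping fails."""
    try:
        out = subprocess.run(["ping", "-c", str(count), host],
                             capture_output=True, text=True,
                             timeout=60).stdout
    except (OSError, subprocess.TimeoutExpired):
        return None
    # Linux summary line: "rtt min/avg/max/mdev = 1.23/4.56/7.89/0.12 ms"
    m = re.search(r"min/avg/max[^=]*= [\d.]+/([\d.]+)/", out)
    return float(m.group(1)) if m else None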

Circuitous Paths [Map: the traceroute from FNAL to Indore, India, passes through Chicago IL, Kansas, Denver, Sunnyvale, and Seattle in the USA, then Tokyo (JP) and Singapore (SG), before reaching India]

Single Flow Throughputs We calculated statistics for single-flow throughputs between FNAL and the TOP100 sites in both IN and OUT directions. – Each flow record includes data such as the number of packets and bytes in the flow and the timestamps of the first and last packet. A few concerns: – We should exclude the flow records for pure ACKs of the reverse path. – A bulk data movement usually involves frequent administrative message exchanges between sites, which generate a significant number of flow records. These flow records usually contain a small number of packets with short durations; the calculated throughputs are usually inaccurate and vary greatly, so they should also be excluded from the throughput calculation. [Figures: histogram of packet size (peaks for ACKs and for the 1500 B Ethernet payload) and histogram of flow size]
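
A minimal sketch (not the authors' exact code) of per-flow throughput with the two exclusions just described. The field names and the numeric thresholds are assumptions for illustration.

def flow_throughput_mbps(rec):
    """Return the flow's throughput in Mbps, or None if the record
    should be excluded from the single-flow statistics."""
    duration = rec["dur"]                                   # seconds
    avg_pkt = rec["bytes"] / rec["packets"] if rec["packets"] else 0
    # Exclude pure-ACK flows of the reverse path: tiny average packet size.
    if avg_pkt < 100:                                       # threshold is an assumption
        return None
    # Exclude short administrative exchanges: few packets, short duration.
    if rec["packets"] < 100 or duration < 1.0:              # thresholds are assumptions
        return None
    return rec["bytes"] * 8 / duration / 1e6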

Single Flow Throughputs (cont) [Charts: average single-flow throughput per TOP 100 site, IN and OUT directions] Most average throughputs are less than 10 Mbps, even though 1 Gbps NICs are widely deployed! The slowest throughputs are observed for Indore, India, and for Houston, TX. For Houston, either the path from FNAL is highly congested, some network equipment is malfunctioning, or the systems at the Houston site are badly configured.

Single Flow Throughputs (cont) In the IN direction: – 2 sites' average throughputs are less than 1 Mbps ("National Institute of Nuclear Physics, Italy" and "Indore, India"); – 63 sites' average throughputs are less than 10 Mbps; – only 1 site's average throughput is greater than 100 Mbps. In the OUT direction: – 7 sites' average throughputs are less than 1 Mbps ("Indore, India", "Kurchatov Institute, Russia", "University of Ioannina, Greece", "IHP, Russia", "ITEP, Russia", "LHC Computing Grid LAN, Russia", "Rice University, TX"); – 60 sites' average throughputs are less than 10 Mbps; – no site's average throughput is greater than 100 Mbps. TCP throughput <= ~0.7 * MSS / (RTT * sqrt(packet_loss)). We calculated the correlation of average throughput vs. RTT for both the IN and OUT directions.
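
The Mathis-style bound quoted above already accounts for much of this: at wide-area RTTs, even modest loss caps a single TCP flow far below NIC speed. A small worked example follows; the numbers are illustrative, not measurements from this study.

import math

def mathis_bound_mbps(mss_bytes, rtt_s, loss_rate):
    """Approximate upper bound on single-flow TCP throughput in Mbps:
    ~0.7 * MSS / (RTT * sqrt(loss))."""
    return 0.7 * mss_bytes * 8 / (rtt_s * math.sqrt(loss_rate)) / 1e6

# 1460-byte MSS, 200 ms RTT, 1e-4 loss probability -> about 4 Mbps,
# well below 1 Gbps NIC capacity and consistent with most sites
# averaging under 10 Mbps.
print(round(mathis_bound_mbps(1460, 0.200, 1e-4), 1))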

Aggregated Throughputs Parallel data transmission tools such as GridFTP have been widely applied to bulk data movement. In both IN and OUT directions, for each of the TOP100 sites, we bin traffic at 10-minute intervals: – we calculate the aggregated throughputs; – we collect the flow statistics; – we collect statistics on the number of different IP addresses (hosts) involved.
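
A sketch of this binning step, assuming flow records as dicts with bytes, start-time, and IP fields (names are illustrative); which side counts as "remote" depends on the direction being analyzed.

from collections import defaultdict

BIN_SECONDS = 600   # 10-minute bins

def bin_by_site(flows, remote_field="sip"):
    """Aggregate flows into 10-minute bins per remote /24 site.
    Returns {(site, bin_start): {"bytes": ..., "flows": ..., "ips": set}}."""
    bins = defaultdict(lambda: {"bytes": 0, "flows": 0, "ips": set()})
    for f in flows:
        ip = f[remote_field]
        site = ".".join(ip.split(".")[:3]) + ".0/24"
        key = (site, int(f["start"] // BIN_SECONDS) * BIN_SECONDS)
        bins[key]["bytes"] += f["bytes"]
        bins[key]["flows"] += 1
        bins[key]["ips"].add(ip)
    return bins

# Aggregated throughput of one bin, in Mbps:
#   bins[key]["bytes"] * 8 / BIN_SECONDS / 1e6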

From TOP100 Sites to FNAL [Figures: histogram of aggregated throughputs, histogram of average # of flows, histogram of max # of flows] Some sites run thousands of parallel flows. In general, the aggregated throughputs are higher than the single-flow throughputs; we see the effect of parallel data transmission.

From TOP100 Sites to FNAL [Figures: histogram of the correlation of aggregated throughput vs. # of flows, and of aggregated throughput vs. # of source IPs; the histogram for # of destination IPs is similar] We calculated the correlation between aggregated throughput and the number of flows. In general, more parallel data transmission (more flows) yields higher aggregated throughput. But for some sites, more parallel data transmission yields lower aggregated throughput: – more parallel transfers can cause network congestion; – more parallel transfers can make disk I/O less efficient. "College of William & Mary, USA" appears on the negative-correlation side of the histogram.

From TOP100 Sites to FNAL In total, 35 sites show a negative correlation between aggregated throughput and the number of flows. – The worst case is from "College of William & Mary, USA" to FNAL. 31 sites have a correlation greater than 0.5, which implies that increasing the number of flows can effectively enhance the throughput. – The best case is from "University of Glasgow, UK" to FNAL.
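
The per-site correlations can be computed over the 10-minute bins with a plain Pearson coefficient; the sketch below uses illustrative bin values, not data from the study.

import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# One aggregated-throughput value and one flow count per 10-minute bin for a
# site. Near +1: more flows coincide with higher throughput; near -1: the
# extra flows coincide with lower throughput (congestion, disk I/O contention).
thru_mbps  = [120, 240, 300, 180, 90]   # illustrative
flow_count = [40, 110, 160, 90, 30]     # illustrative
print(round(pearson(thru_mbps, flow_count), 2))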

From TOP100 Sites to FNAL [Figures: histograms of the average and max # of source IPs and of the average and max # of destination IPs] Some sites use only a single host to transfer, while others utilize hundreds of hosts!

From FNAL to TOP100 Sites [Figures: histogram of aggregated throughputs, histogram of average # of flows, histogram of max # of flows] In general, the aggregated throughputs are higher than the single-flow throughputs; we see the effect of parallel data transmission. Transmission from FNAL to the TOP100 sites is better than in the other direction.

From FNAL to TOP100 Sites We calculated the correlation between aggregated throughput and the number of flows. In general, more parallel data transmission (more flows) yields higher aggregated throughput, but for some sites it yields lower aggregated throughput: – more parallel transfers can cause network congestion; – more parallel transfers can make disk I/O less efficient. [Figures: histogram of the correlation of aggregated throughput vs. # of flows, and of aggregated throughput vs. # of destination IPs; the histogram for # of source IPs is similar]

From FNAL to TOP100 Sites [Figures: histograms of the average and max # of source IPs and of the average and max # of destination IPs] Some sites use only a single host to transfer, while others utilize hundreds of hosts!

Conclusion We studied the bulk data movement status and patterns for FNAL-related large-scale scientific collaborations.