FTS Issue in Beijing
Erming PEI, 2010/06/18

Failures IN2P3BEIJING experienced many transfer failures since the middle of May. Almost all large files failed, only small files passed. Only ~500KB/s per file in average. FTS Channel configuration: 10files and 10streams shared by ATLAS and CMS ~1Gb/s bandwidth in total (iperf result: ~90MB) ATLAS failures is proportional to CMS transfers CMS had aggressive transfers from all over the world with multiple channels and 2 routes, while ATLAS transfers only from CC-IN2P3 to Beijing in single channel.

Network Routes
[Diagram: transfer routes from Europe and America into IHEP via CSTNET, with labelled link capacities of 2.5 Gbps, 622 Mbps and 1 Gbps; CMS and ATLAS traffic both terminate at IHEP.]

Reason / Solution
- There was a network bottleneck in CSTNET!
- No matter how we tuned the parameters, the total bandwidth peaked at only ~550 Mbps, even when we had CMS activities stopped.
- After negotiation with CNIC, the total bandwidth was restored to ~1 Gbps.
- Both ATLAS and CMS file transfers performed better afterwards; all transfers passed.

Total Throughput

Intermediation
- ATLAS and CMS still compete for the bandwidth: CMS transfers files from all the T1s and many T2s to Beijing, which ate up most of the bandwidth.
- To intermediate between the two experiments, we (see the sketch below):
  - Tuned the FTS channel to 40 concurrent files and 16 streams.
  - Tuned the TCP maximum window to 32 MB.
  - Set the concurrent number of files and the VO share specifically for CMS to limit its activity:
    - CMS channel limit on IN2P3-BEIJING and STAR-BEIJING: 5 files
    - CMS VO share on IN2P3-BEIJING reduced to 20%
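A minimal sketch of how we read the CMS constraints on the IN2P3-BEIJING channel (our interpretation of how the share and the hard file limit interact, not FTS's actual scheduling code, which may differ in detail):

    # Rough illustration of the CMS constraints configured above
    # (our reading of the policy; FTS's real scheduler may behave differently).
    channel_files = 40      # concurrent files configured on the channel
    cms_share = 0.20        # CMS VO share on IN2P3-BEIJING
    cms_file_limit = 5      # hard cap on concurrent CMS files

    cms_slots_by_share = int(channel_files * cms_share)   # 8 slots from the 20% share
    cms_slots = min(cms_slots_by_share, cms_file_limit)   # the hard limit wins: 5
    atlas_slots = channel_files - cms_slots                # at least 35 slots left for ATLAS
    print(f"CMS concurrent files: {cms_slots}; remaining for ATLAS: {atlas_slots}")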

Now the file transfer rate is very good; big files are all transferred successfully.

BEIJINGIN2P3 Since June 4th, BEJINGIN2P3 had very low performance. It turned out that the fault of bad TCP window default size: only ~80KB

Individual Throughput: <50 KB/s

TCP Window Parameters
- After tuning the default TCP window size to 4 MB, the situation improved (per-stream ceiling sketched below).
- TCP window parameters:
  net.ipv4.tcp_rmem = 1048576 4194304 16777216
  net.ipv4.tcp_wmem = 1048576 4194304 16777216
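The window size matters because a single TCP stream cannot exceed roughly window / RTT. A minimal sketch of that ceiling, assuming an RTT of about 300 ms between CC-IN2P3 and Beijing (the RTT value is an assumption for illustration, not a figure from the slides):

    # Per-stream TCP throughput ceiling: throughput <= window / RTT.
    # The ~300 ms Lyon-Beijing RTT is assumed for illustration only.
    rtt_s = 0.300  # assumed round-trip time in seconds

    for window_bytes, label in [(80 * 1024, "old default ~80 KB"),
                                (4 * 1024 * 1024, "tuned default 4 MB")]:
        ceiling_kBps = window_bytes / rtt_s / 1024
        print(f"{label}: <= {ceiling_kBps:,.0f} KB/s per stream")

    # => ~267 KB/s per stream with an 80 KB window, vs ~13,653 KB/s (~13 MB/s)
    #    with a 4 MB window, which is why the tiny default capped the transfers.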

Conclusion
- Found a network bottleneck and finally got it eliminated.
- Tuning the TCP window size has a clear positive effect on performance.
- Intermediation between ATLAS and CMS is necessary.
- Thanks to Eric for keeping an eye on this issue…