Identifying Gaps in Grid Middleware on Fast Networks with the Advanced Networking Initiative

Motivation
The goal of the High Throughput Data Program (HTDP) at the Fermilab Computing Sector is to support Fermilab and its stakeholders in the adoption of a 100 GE networking infrastructure. The focus is to:
- compile a list of key services used by the relevant research communities and facilities
- identify gaps in the current infrastructure and in the tools that interface with 100 GE networks
We are conducting a series of tests of key tools on a 100 GE testbed network operated by the US DOE ESnet Advanced Networking Initiative (ANI).

LIMAN Testbed: 40 GE (May – October 2011)
The main tools tested on the Long Island Metropolitan Area Network were Globus Online (GO) and GridFTP (GF).
- We compared three transfer mechanisms, to isolate the overheads of GO and of the control channels: local GF transfers (server to server), FNAL-controlled GridFTP transfers, and GO-controlled GridFTP transfers.
- We compared three sets of files of different sizes, to see the effect of transfer-protocol overhead on small files.
- RTT between FNAL and BNL (control): 36 ms; RTT among testbed nodes (data): 2 ms.
Result: overheads were observed with Globus Online and with small files.

SC2011 Demo: shared 100 GE (November 2011)
The Grid and Cloud Computing Department of Fermilab demonstrated the use of a 100 GE network to move CMS data with GridFTP.
Test characteristics:
- 15 NERSC and 26 ANL nodes, each with a 10 GE NIC
- 10 CMS files of 2 GB each (RAM to RAM only)
- 30 TB transferred in total in one hour
Result: a sustained transfer rate of ~70 Gbps, with peaks at 75 Gbps.

Current Testbed: 100 GE (January – present)
- 2 sites (NERSC, ANL) with 3 nodes each; each node has 4 x 10 GE NICs.
- We measure the overheads introduced by protocols and file sizes, testing basic network capacity with nuttcp, GridFTP and Globus Online, and XrootD.
- RTT between FNAL and NERSC (control): 55 ms; RTT between FNAL and ANL (control): 108 ms; RTT between NERSC and ANL (data): 54 ms.

Tests on the Current ANI 100 GE Testbed

Basic Network Throughput Test with nuttcp
Motivation: confirm the basic performance of the network with tuned parameters and compare it against the baseline provided by the ANI team.
Results:
- NIC to NIC: 9.89 Gbps (as expected from a 10 GE NIC)
- 4 NICs to 4 NICs between 2 nodes: 39 Gbps (as expected from 4 NICs)
- Aggregate throughput with 10 TCP streams (10 NIC-to-NIC pairs): 99 Gbps

GridFTP and Globus Online Test
Motivation 1: a single GridFTP client/server instance is not efficient, so what is an efficient way to increase the throughput through each NIC?
- To transfer a single file efficiently, use multiple parallel streams per transfer (globus-url-copy -p N).
- To transfer a set of files efficiently, use multiple concurrent globus-gridftp-server instances (globus-url-copy -cc M).
- We therefore launch multiple clients and servers with multiple streams open between them (a command sketch follows this setup description).
Motivation 2: we expect the protocol overhead to differ across file sizes. Files of various sizes are transferred from client disk to server memory; the dataset is split into three sets: Small (8 KB – 4 MB), Medium (8 MB – 1 GB), and Large (2, 4, 8 GB).
Motivation 3: in addition to locally controlled GridFTP, we tested two remotely controlled configurations:
1. port-forwarding to reach the GridFTP clients/servers (labeled "Remote")
2. Globus Online-controlled transfers
We also compared server-to-server transfers with client-to-server transfers.
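The sketch below illustrates, at the command level, the kind of invocations behind these tests; it is not the exact test harness, and the host names, paths, stream counts, and buffer sizes are placeholders rather than the values used on the testbed.

    # Baseline NIC-to-NIC capacity with nuttcp:
    #   on the receiving node:  nuttcp -S
    #   on the sending node (30 s run, 1 s reports, 8 MB TCP window):
    nuttcp -T30 -i1 -w8m receiver.example.org

    # Single-file transfer with multiple parallel TCP streams (Motivation 1, "-p N"),
    # from local disk to a RAM disk on the server; 2 MB TCP buffer:
    globus-url-copy -vb -fast -p 4 -tcp-bs 2097152 \
        file:///data/src/file.dat gsiftp://server.example.org/dev/shm/file.dat

    # Set of files with multiple concurrent transfers (Motivation 1, "-cc M");
    # urls.txt lists one "source-URL destination-URL" pair per line:
    globus-url-copy -vb -fast -cc 8 -p 4 -f urls.txt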
Results:
- GridFTP does not suffer from protocol overhead for large and medium files.
- We observe significant overhead for small files.
- Remote use of GridFTP via Globus Online suffers from protocol overhead.
- We believe the RTT affects the results for small files.

GridFTP and Globus Online throughput by file size and configuration:

  File size   Local: Client-Server   Local: Server-Server   Remote: Server-Server   Globus Online
  Large       87.92 Gbps             92.74 Gbps             91.19 Gbps              62.90 Gbps
  Medium      76.90 Gbps             90.94 Gbps             81.79 Gbps              28.49 Gbps
  Small        2.99 Gbps              2.57 Gbps              2.11 Gbps               2.36 Gbps

globus-url-copy (GUC) configurations and measured bandwidth (Gbps):

  Config   GUC/core   GUC streams   GUC TCP window size   Files/GUC   Max BW   Sustained BW
  T1, D1   1          2             Default               -           -        -
  T2       1          2             2 MB                  1           65       52
  D2       1          2             2 MB                  1           65       52
  T3       4          2             2 MB                  1           73       70
  D3       4          2             2 MB                  1           75       70

XrootD Test
Motivation: what is an efficient way to increase the throughput through each NIC? We focus only on tuning the transfer parameters of xrootd. The test starts with a single instance of xrdcp and xrootd; on the server side, one xrootd writes to a RAM disk or to an HDD.
- Are multiple concurrent transfers possible with xrootd? The equivalent of the GridFTP "-cc" option is not available, but we can emulate it by launching multiple xrdcp clients (a command-level sketch appears at the end of this page); the xrootd server accepts multiple connections through multithreading. How efficient is this?
- Are multiple parallel transfers possible with xrootd? Not practical for our test.
Results: because we are limited by the RAM disk, we estimate the aggregate throughput for files of 2 GB and above by scaling the one-NIC results. The estimated aggregate rates for 2 GB, 4 GB, and 8 GB files are 77 Gbps, 87 Gbps, and 80 Gbps respectively (assuming 10 NICs; the 8 GB case uses at most 4 clients).

XrootD throughput versus the number of concurrent xrdcp clients:

  File size                 1 client           2 clients          4 clients           8 clients
  8 GB, 1-NIC               3 Gbps             5 Gbps             7.9 Gbps            N/A
  Large, 1-NIC (2 / 4 GB)   2.3 / 2.7 Gbps     3.5 / 4.4 Gbps     5.6 / 6.9 Gbps      7.7 / 8.7 Gbps
  Medium (64 MB / 256 MB)   2.9 / 8.8 Gbps     5.7 / 14.7 Gbps    11.2 / 23.9 Gbps    22 / 39 Gbps
  Small (256 KB / 4 MB)     0.03 / 0.19 Gbps   0.07 / 0.38 Gbps   0.11 / 0.76 Gbps    0.1 / 1.4 Gbps

Conclusion
- The basic network capacity measured on the testbed is close to 100 GE.
- The bandwidth can be saturated by increasing the number of streams.
- GridFTP suffers from protocol overhead for small files.
- Globus Online: we are working with the GO team to improve performance.
- XrootD: the tests are at an initial stage but already give throughput comparable to GridFTP; few performance-tuning options are available.

D. Dykstra, G. Garzoglio, H. Kim, P. Mhashilkar
Scientific Computing Division, Fermi National Accelerator Laboratory
CHEP 2012, Poster ID 214
Fermilab is operated by the Fermi Research Alliance, LLC under Contract No. DE-AC02-07CH11359 with the United States Department of Energy.
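As a footnote to the XrootD test above, the sketch below shows how the GridFTP "-cc" behaviour can be emulated by launching several independent xrdcp clients against the multithreaded xrootd server; the host name, file names, destination path, and client count are placeholders, not the testbed configuration.

    # Emulate concurrency ("-cc 4") with four independent xrdcp clients;
    # the xrootd server handles the concurrent connections via multithreading.
    for i in 1 2 3 4; do
        xrdcp -f /data/src/file${i}.dat \
              root://xrootd.example.org//dev/shm/file${i}.dat &
    done
    wait   # return once all four transfers have completed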