CMS Data Transfer Challenges
LHCOPN-LHCONE meeting, Michigan, Sept 15/16, 2014
Azher Mughal, Caltech

Caltech Tier2 Center (data center remodeled in late 2013)
- Servers are spread across 3 data centers within the campus
- Total nodes, including storage servers: 300
- CPU cores: 4,500
- Storage space: 3 PB raw, 1.5 PB usable (Hadoop storage replication factor: 50%)
- GridFTP servers are the gateway to the rest of the CMS grid traffic

Data Center Connectivity
- 100G uplink to CENIC (NSF CC-NIE award), with 10GE as a backup
- 40G inter-building connectivity
- Vendor-neutral switching hardware from Brocade, Dell, and Cisco
- Active ports: 8 x 40GE, ~40 x 10GE, ~500 x 1GE
- Core switches support OpenFlow 1.0 (OpenFlow 1.3 by Q4 2014)
- OSCARS / OESS controller in production, peering with Internet2
- DYNES-connected storage node
- NSI-ready using the OSCARS NSI Bridge

100G LHCONE Connectivity, IP Peerings
- The CC-NIE award helped purchase the 100G equipment
- Tier2 connected to the Internet2 LHCONE VRF since April 2014
- In case of link failure, the primary 100G path fails over to the 10G link
- Direct IP peering with UFL over AL2S and FLR
- Ongoing point-to-point performance analysis experiments with Nebraska through Internet2 Advanced Layer2 Services (AL2S)

perfSONAR LHCONE Tests
- Participating in the US CMS Tier2 and LHCONE perfSONAR mesh tests
- Separate perfSONAR instances for OWAMP (1G) and BWCTL (10G)
- RTT is a major factor in single-stream bandwidth throughput:

~]# ping perfsonar-de-kit.gridka.de
64 bytes from perfsonar-de-kit.gridka.de ( ): icmp_seq=1 ttl=53 time=172 ms
~]# ping ps-bandwidth.lhcmon.triumf.ca
64 bytes from ps-bandwidth.lhcmon.triumf.ca ( ): icmp_seq=2 ttl=58 time=29.8 ms
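As a rough illustration, the commands below sketch how such tests can be run by hand against the same hosts and why the 172 ms path limits a single stream. The owping/bwctl invocations are typical perfSONAR toolkit usage (assumed here, not taken from these slides), and the window-size arithmetic is generic, not a measurement from these tests.

# One-way latency/loss toward the GridKa perfSONAR host (OWAMP, 1G instance)
owping perfsonar-de-kit.gridka.de

# Single-stream throughput toward the TRIUMF bandwidth host (BWCTL, 10G instance),
# 30-second test, results reported in Mbps
bwctl -c ps-bandwidth.lhcmon.triumf.ca -t 30 -f m

# Why RTT dominates a single TCP stream: the window must cover the
# bandwidth-delay product. For 10 Gbps at 172 ms RTT:
echo "scale=1; 10*10^9 * 0.172 / 8 / 2^20" | bc   # ~205 MiB in flight per stream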

Capacity vs Application Optimization: CMS Software Components Primer
- PhEDEx – bookkeeping for CMS data sets; knows the endpoints and manages high-level aspects of the transfers (e.g. the file router).
- FTS – negotiates transfers among end sites/points and initiates them through the GridFTP servers.
- SRM – selects the appropriate GridFTP server (mostly round-robin).
- GridFTP – the actual workhorse of the grid middleware for transfers between end sites; in effect, the interface between the storage element and the wide area network.
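To make the division of labor concrete, here is a hedged sketch of a single transfer at the GridFTP and FTS layers. The hostnames, paths, and FTS endpoint are placeholders, and the options shown (parallel streams, enlarged TCP buffer) are common usage rather than the configuration used at Caltech.

# GridFTP third-party copy: 16 parallel TCP streams, 32 MB TCP buffer
globus-url-copy -vb -p 16 -fast -tcp-bs 33554432 \
    gsiftp://gridftp.example-t2.edu/store/test/file.root \
    gsiftp://gridftp.example-t1.ch/store/test/file.root

# The same source/destination pair submitted as an FTS job, letting the
# FTS server schedule, throttle, and retry it
fts-transfer-submit -s https://fts.example.org:8446 \
    gsiftp://gridftp.example-t2.edu/store/test/file.root \
    gsiftp://gridftp.example-t1.ch/store/test/file.root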

Capacity vs Application Optimization
- US CMS mandated 20 Gbps disk-to-disk test rates using PhEDEx load tests.
- The software stack does not reach high inter-site throughput out of the box; it requires a lot of tuning.
- Benchmarking the individual GridFTP servers to understand their limits.
- GridFTP uses 3 different file checksum algorithms for each file transfer, which consumes a lot of CPU cycles.
- During the first round of tests, FTS (the older version) was limited to 50 parallel transfers; this limit was removed in the August release.
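Much of the tuning referred to above is host-level TCP tuning for high bandwidth-delay-product paths. The settings below are the usual ESnet-style knobs, with illustrative values and a placeholder interface name, not the exact configuration used in these tests.

# Allow large socket buffers so one stream can cover the WAN bandwidth-delay product
sysctl -w net.core.rmem_max=268435456
sysctl -w net.core.wmem_max=268435456
sysctl -w net.ipv4.tcp_rmem="4096 87380 134217728"
sysctl -w net.ipv4.tcp_wmem="4096 65536 134217728"
# Congestion control better suited to long fat networks than the default
sysctl -w net.ipv4.tcp_congestion_control=htcp
# Deeper transmit queue on the 10/40GE interface (eth0 is a placeholder)
ip link set dev eth0 txqueuelen 10000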

Capacity vs Application Optimization
10 Gbps GridFTP servers behave very well:
- Up to 88% of capacity sustained, peaking at 96% of capacity
- Optimal CPU/memory consumption
[Chart: dual 10 Gbps NICs; 20G total = 10G local in + 10G WAN out]

Capacity vs Application Optimization: Manual Ramp-up, LHCONE -> ANA 100G
- Increased transfer rates observed during the LHCONE ANA integration.
- The manual ramp-up was only a test, because remote PhEDEx sites were not subscribing enough transfers to the FTS links (e.g. Caltech - CNAF).

Capacity vs Application Optimization
Tier2 traffic flows: peaks of 43G over AL2S, with 20G to CNAF alone.

Testing High Speed Transfers
Logical layout from CERN (USLHCNet) to Caltech (Pasadena).

Testing High Speed Transfers: Server and Operating System Specifications

Storage server:
- RHEL 7.0, Mellanox OFED
- SuperMicro (X9DRX+-F)
- Dual Intel Xeon E5 v2 (Ivy Bridge)
- 64 GB DDR3 RAM
- Intel NVMe SSD drives (Gen3 PCIe)
- Two Mellanox 40GE VPI NICs

Client systems:
- SL 6.5, Mellanox OFED
- SuperMicro (X9DR3-F)
- Dual Intel Xeon E5 (Sandy Bridge)
- 64 GB DDR3 RAM
- One Mellanox 40GE VPI NIC

Testing High Speed Transfers
Disk write at 8.2 GB/s: data written to the SSD drives on the destination server at Caltech.

Facts:
- Despite all the tuning, and due to the higher RTT, a single TCP stream performed poorly at 360 Mbps; approximately 200 streams were used.
- AL2S is a shared infrastructure; during these transfers the LA node showed 97.03 Gbps.
- CPU is the bottleneck when using TCP at this rate and number of flows (network and I/O processes compete with each other).
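A quick sanity check on the numbers above, using only the approximate figures quoted in the slide (back-of-the-envelope arithmetic, not additional measurements):

echo "scale=1; 200 * 360 / 1000" | bc   # ~200 streams x 360 Mbps ~= 72 Gbps aggregate TCP ceiling
echo "scale=1; 8.2 * 8" | bc            # 8.2 GB/s of disk writes ~= 65.6 Gbps on the wire, consistent with the above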

Conclusions / Moving Forward
- It is very important for the LHC experiments to be aware of the impact of their large data flows on the R&E networks, both in the US and Europe and across the Atlantic. With modern off-the-shelf equipment, 100GE paths can easily be overloaded.
- Caltech is able to meet the US CMS Tier2 milestones, with peaks reaching 43 Gbps (CC-NIE 100G, LHCONE, ANA).
- We are looking at how to integrate the OLiMPS multipath controller with the Internet2 flowspace firewall (point-to-point services), either to create parallel paths to the same destination (avoiding backbone congestion) or to pick individual paths per destination based on the observed WAN load.
- Benchmarking next-generation CPUs and memory, keeping the software stack tuned, and knowing its limitations under different application requirements.
- SDN multipath demonstrations over the WAN and on the show floor will be showcased during the Supercomputing Conference 2014 in Louisiana.

Time for Questions!