100G R&D for Big Data at Fermilab
Gabriele Garzoglio, Grid and Cloud Computing Department, Computing Sector, Fermilab
ISGC – March 22, 2013

Presentation transcript:

100G R&D for Big Data at Fermilab
Gabriele Garzoglio, Grid and Cloud Computing Department, Computing Sector, Fermilab
ISGC – March 22, 2013

Overview:
 Fermilab Network R&D
 100G Infrastructure at Fermilab
 Results from the ESnet 100G testbed

Fermilab Users and 100G
Fermilab users have relied on the network for decades in the process of scientific discovery, for sustained, high-speed, large- and wide-scale distribution of and access to data:
 High Energy Physics community
 Multi-disciplinary communities using grids (OSG, XSEDE)
Figures of merit:
 94 Petabytes written to tape (Enstore archive), today mostly coming from offsite
 160 Gbps peak LAN traffic from the archive to the local processing farms
 LHC peak WAN usage in/out of Fermilab at ~20 Gbps
[Figures: CMS WAN traffic in/out of Fermilab, Dec 2012, routinely peaking at ~20 Gbps; data written to the Enstore tape archive over the last 2 years, 94 PB total.]

Network R&D at Fermilab
 A diverse program of work that spans all layers of computing for scientific discovery
 A collaborative process benefiting from the effort of multiple research organizations
 A broad range of activities, internally and externally funded
Layers covered: Infrastructure / Network (Layers 1-3), OS / Network Driver, Data Management, Application I/O, Storage Management, Tools & Services, Network Management, Science Code
Projects: E-Center, OSG / LHC Dashboard, End-Site Control Plane System, 100G High-Throughput Data, CortexNet Data Selection, Multicore-Aware Data Transfers, Fermilab Network R&D Facility, G-NetMon

Pulling all the R&D effort together, from the top layers…
Providing tools & services to enable users / applications to optimize their use of the network:
 Collaborating with the OSG Network Area on the deployment of perfSONAR at 100 OSG facilities
 Aggregating and displaying data through E-Center and the OSG Dashboard for end-to-end, hop-by-hop paths across network domains
Developing tools to monitor real-time 100G network traffic on multi-core architectures
Proposed integration with Data Management through network-aware data source selection (CortexNet):
 Seeking collaborators for the network forecast module

Pulling all the R&D effort together, from the bottom layers…
Application-level R&D through the High Throughput Data Program:
 R&D on 100G for production use by CMS & the FNAL high-capacity, high-throughput storage facility
 Identifying gaps in the data movement middleware used for scientific discovery: GridFTP, SRM, Globus Online, XRootD, Frontier / Squid, NFS v4
OS-level R&D on multicore-aware data transfer middleware:
 Optimizing network I/O for 40/100G environments (see the sketch below)
Integrating the local network infrastructure with WAN circuit technologies through policy-driven configuration (ESCPS)
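
To illustrate the kind of OS-level network I/O tuning this involves, here is a minimal Python sketch of requesting large TCP socket buffers for a bulk transfer over a high bandwidth-delay-product path; the buffer size and everything else in it are illustrative assumptions, not settings from the program of work.

    import socket

    # Hypothetical example: request large TCP buffers for a bulk transfer
    # over a high bandwidth-delay-product (e.g., 40/100G WAN) path.
    # BDP ~ bandwidth x round-trip time; the value below is illustrative only.
    BUF_SIZE = 64 * 1024 * 1024  # 64 MB, assumed; real tuning is empirical

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, BUF_SIZE)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BUF_SIZE)

    # The kernel may clamp these to its configured maxima; check what was granted.
    print("send buffer:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
    print("recv buffer:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))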

A dedicated R&D Network facility
 Nexus 7000 w/ 2-port 100GE module / 6-port 40GE module / 10GE copper module
 12 nodes w/ 10GE Intel X540-AT2 (PCIe) / 8 cores / 16 GB RAM
 2 nodes w/ 40GE Mellanox ConnectX®-2 (PCIe-3) / 8 cores w/ NVIDIA M2070 GPU
 Catalyst 6509E for 1GE systems: IPv6 tests / F5 load balancer / Infoblox DNS / Palo Alto firewall
Uses: 100G R&D; production-like environment for technology evaluation; testing of firmware upgrades
* Diagram courtesy of Phil Demar

Current Fermilab WAN Capabilities
The Metropolitan Area Network provides 10GE channels:
 Currently 8 deployed
 Five channels used for circuit traffic (supports CMS WAN traffic)
 Two used for normal routed IP traffic
 Backup 10GE for redundancy; circuits fail over to routed IP paths
* Diagram courtesy of Phil Demar

Near-Future Fermilab WAN Capabilities
ESnet ChiExpress MAN:
 One 100G channel: circuit-based high-impact science data traffic and network R&D activities
 Three 10G channels: for default routed IP traffic
 Full geographic diversity within the MAN
 Production deployment in spring of 2013

Use of 100G Wave for FNAL R&D Test Bed
 The 100G wave will support 50G of CMS traffic
 Remaining ~50G for the FNAL R&D network; potentially higher when CMS traffic levels are low
 Planning a WAN circuit into the ESnet 100G testbed
 Potential for circuits to other R&D collaborations

Goals of the 100G Program at Fermilab
Experiment analysis systems include a deep stack of software layers and services. We need to ensure these are functional and effective at the 100G scale, end-to-end:
 Determine and tune the configuration of all layers to ensure full throughput in and across each layer/service.
 Measure and determine the efficiency of the end-to-end solutions.
 Monitor, identify, and mitigate error conditions.
[Diagram: Fermilab Network R&D Facility]

100G High Throughput Data Program
2011: Advanced Network Initiative (ANI) Long Island MAN (LIMAN) testbed
 GO / GridFTP over 3x10GE
2011: Super Computing ’11
 Fast access to ~30 TB of CMS data in 1 h from NERSC to ANL using GridFTP
 15 servers / 28 clients; 4 GridFTP processes per core; 2 streams; TCP window 2 MB
2012: ESnet 100G testbed
 Tuning parameters of middleware for data movement: xrootd, GridFTP, SRM, Globus Online, Squid. Achieved ~97 Gbps
 Rapid turnaround on the testbed thanks to custom boot images
Commissioning of the Fermilab Network R&D facility: 8.5 Gbps per 10G node
Spring 2013: 100G endpoint at Fermilab
 Validate the hardware link w/ transfer apps on current CMS datasets
 Test NFS v4 over 100G using dCache (collaboration w/ IBM Research)
[Plot: transfer throughput in Gbps (peak: 75 Gbps)]
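
As a back-of-the-envelope check of the SC’11 figure (this calculation is ours, not from the slides), moving ~30 TB in one hour corresponds to roughly 67 Gbps of sustained throughput:

    # Sustained rate needed to move ~30 TB of CMS data in one hour (SC'11 demo).
    # Assumes decimal terabytes (10**12 bytes).
    terabytes = 30
    seconds = 3600
    gbps = terabytes * 1e12 * 8 / seconds / 1e9
    print(f"{gbps:.1f} Gbps sustained")   # ~66.7 Gbps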

GridFTP / SRM / Globus Online Tests
Data movement using GridFTP:
 3rd-party server-to-server transfers: source at NERSC / destination at ANL
 Dataset split into 3 file-size sets
 Large-file transfer performance: ~92 Gbps
 Small-file transfer performance: abysmally low
Issues uncovered on the ESnet 100G Testbed:
 GridFTP pipelining needs to be fixed in the Globus implementation
 GO control-channel latencies; the GO control channel was sent to the VPN through port forwarding
Optimal performance: 97 Gbps w/ GridFTP, 2 GB files, 3 nodes x 16 streams per node
[Plot: aggregate GridFTP throughput, 97 Gbps]
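
For context, the sketch below shows how a third-party GridFTP transfer with parallel streams and pipelining can be driven from Python via globus-url-copy; the hostnames, paths, and tuning values are assumptions for illustration, not the actual test configuration, while the flags (-p, -pp, -tcp-bs, -fast, -r) are standard globus-url-copy options.

    import subprocess

    # Hypothetical 3rd-party transfer between two GridFTP servers; endpoints
    # and tuning values are illustrative assumptions only.
    src = "gsiftp://dtn1.example-nersc.gov/data/set-large/"
    dst = "gsiftp://dtn1.example-anl.gov/scratch/set-large/"

    cmd = [
        "globus-url-copy",
        "-p", "16",            # 16 parallel TCP streams
        "-pp",                 # pipelining (helps with many small files)
        "-tcp-bs", "2097152",  # 2 MB TCP buffer, as in the SC'11 setup
        "-fast",               # reuse data channels between files
        "-r",                  # recurse into the source directory
        src, dst,
    ]
    subprocess.run(cmd, check=True)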

XRootD Tests
Data movement over XRootD, testing LHC experiment (CMS / ATLAS) analysis use cases:
 Clients at NERSC / servers at ANL
 Using a RAM disk as the storage area on the server side
Challenges:
 Tests limited by the size of the RAM disk
 Little control over xrootd client / server tuning parameters
 For datasets too large to fit on the RAM disk, calculation of a scaling factor between 1 NIC and 12 aggregated NICs (see the sketch below)
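
A minimal sketch of the kind of extrapolation implied by that last point; all throughput numbers are hypothetical placeholders, not measured results: a small dataset is measured on both a single NIC and the 12-NIC aggregate, the ratio gives a scaling factor, and that factor is applied to the single-NIC rate observed with the large dataset.

    # Hypothetical extrapolation for datasets that do not fit on the RAM disk.
    # All throughput numbers below are placeholders, not measured results.
    small_1nic_gbps = 8.0       # small dataset, single 10GE NIC
    small_12nic_gbps = 80.0     # same dataset, 12 NICs aggregated
    scaling_factor = small_12nic_gbps / small_1nic_gbps   # ~10x here

    large_1nic_gbps = 7.5       # large dataset, single NIC
    estimated_aggregate_gbps = large_1nic_gbps * scaling_factor
    print(f"estimated 12-NIC throughput: {estimated_aggregate_gbps:.1f} Gbps")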

Squid / Frontier Tests
Data transfers:
 Cache an 8 MB file on Squid; this size mimics the LHC use case for large calibration data
 Clients (wget) at NERSC / servers at ANL
 Data always in RAM
Setup:
 Using Squid2: single-threaded
 Multiple Squid processes per node (4 NICs per node)
 Testing core affinity on/off: pin each Squid process to a core, i.e. to its L2 cache (see the sketch below)
 Testing all client nodes vs. all servers AND, in aggregate, one node vs. only one server
Results:
 Core affinity improves performance by 21% in some tests
 Increasing the number of Squid processes improves performance
 Best performance w/ 9000 clients: ~100 Gbps
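
To illustrate the core-affinity setup, here is a minimal, Linux-specific Python sketch that launches one single-threaded worker per NIC and pins each to its own core; the per-NIC Squid config files and the one-worker-per-core mapping are assumptions for illustration.

    import os
    import subprocess

    # Hypothetical: one single-threaded Squid worker per NIC, each pinned to
    # its own CPU core so it stays on the same L2 cache (Linux only).
    configs = ["squid-nic0.conf", "squid-nic1.conf",
               "squid-nic2.conf", "squid-nic3.conf"]   # assumed per-NIC configs

    workers = []
    for core, conf in enumerate(configs):
        proc = subprocess.Popen(["squid", "-N", "-f", conf])  # -N: no daemon mode
        os.sched_setaffinity(proc.pid, {core})  # pin this worker to one core
        workers.append(proc)

    for proc in workers:
        proc.wait()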

Summary
 The network R&D at Fermilab spans all layers of the communication stack
 Scientific discovery by HEP and astrophysics drives the program of work
 Fermilab is deploying a Network R&D facility with 100G capability
 The ESnet 100G Testbed has been fundamental for our middleware validation program
 Fermilab will have 100GE capability in Spring 2013
 Planning to participate in the ESnet 100G Testbed