Site Throughput Review and Issues Shawn McKee/University of Michigan US ATLAS Tier2/Tier3 Workshop May 27th, 2008

US ATLAS Throughput Working Group
- In Fall 2007 a "Throughput" working group was put together to study our current Tier-1 and Tier-2 configurations and tunings and see where we could improve.
- First steps were identifying current levels of performance. Typically sites performed significantly below expectations.
- During weekly calls we would schedule Tier-1 to Tier-2 testing for the following week: gather a set of experts on a chat line or phone line and debug the setup.

Roadmap for Data Transfer Tuning
- Our first goal is to achieve ~200 MB/s (megabytes, not megabits) from the Tier-1 to each Tier-2. This can be achieved by either or both of the following (worked numbers in the sketch below):
- Option 1: a small number of high-performance systems (1-5) at each site transferring data via GridFTP, FDT, or similar applications.
  - Single systems benchmarked with I/O > 200 MB/sec
  - 10 GE NICs ("back-to-back" tests achieved 9981 Mbps)
- Option 2: a large number of "average" systems (10-50), each transferring to a corresponding system at the other site. This is a good match for an SRM/dCache transfer of many files between sites.
  - Typical current disks are … MB/sec
  - A gigabit network is 125 MB/sec
  - Doesn't require great individual performance per host: 5-25 MB/sec
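Not part of the original slides: a quick back-of-the-envelope check of these two options, using only the numbers quoted above, written as a small Python snippet:

```python
# Back-of-the-envelope sizing for the ~200 MB/s Tier-1 -> Tier-2 goal.
# All figures come from the bullets above; nothing here is measured.

TARGET_MB_S = 200          # aggregate goal per Tier-2, in megabytes/second
GIGABIT_MB_S = 125         # ceiling of a 1 Gb/s link in MB/s
PER_HOST_MB_S = (5, 25)    # quoted per-host rate range for "average" systems

# Option 1: a few high-performance systems.
# A single host benchmarked at >200 MB/s of local I/O with a 10 GE NIC can in
# principle meet the goal on its own, if the WAN path and the far end keep up.

# Option 2: many "average" host pairs in parallel (SRM/dCache style).
low, high = PER_HOST_MB_S
print(f"Host pairs needed at {high} MB/s each: {TARGET_MB_S / high:.0f}")  # -> 8
print(f"Host pairs needed at {low} MB/s each:  {TARGET_MB_S / low:.0f}")   # -> 40
print(f"Share of a gigabit NIC used per host at {high} MB/s: "
      f"{high / GIGABIT_MB_S:.0%}")                                        # -> 20%
```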

FDT Examples: High Performance Is Possible
- New York -> Geneva: 1U server with 4 HDs; reads and writes on 4 SATA disks in parallel on each server. Mean traffic ~210 MB/s, about 0.75 TB per hour.
- CERN -> Caltech: 4U disk server with 24 HDs; reads and writes on 2 RAID controllers in parallel on each server. Mean traffic ~545 MB/s, about 2 TB per hour.
- Working on integrating FDT with dCache.

Data Transfers for US ATLAS (Oct 2007)
- Typical rate is 0.5 – 3 MB/s! Why aren't we achieving ~30 MB/s?

Beginning "Load Tests"
- Untuned: ~800 Mbps

What Are the Primary Issues?
- Network
  - First documented what the existing network capabilities were for each site. Typically POOR by default.
  - After tuning, we could achieve "wire speed" in memory-to-memory tests.
- Storage
  - Highly variable. Individual disks can vary from 5 – 90 MB/sec depending upon type of disk interface, RPM, cache, etc.
  - RAID controllers can utilize disks in parallel, in principle leading to linear increases in I/O performance as a function of the number of spindles (disks).
  - Drivers/OS/kernel can impact performance, as can various tunable parameters.
- End-to-end paths
  - What bottlenecks exist between sites? Is competing traffic a problem?
- Overall operations
  - Many "small" files limit achievable throughput.
  - How well does the system function when data has to move disk -> driver -> memory -> driver -> network -> WAN -> network -> driver -> memory -> driver -> disk?

Tuning/Debugging Methodology
- First document the site topology: create network diagrams representative of the path to "doors" and "storage".
- Gather information on the servers involved in data transport at the site (a small collection sketch follows below):
  - TCP stack settings
  - NIC model and firmware
  - OS, processors, memory
  - Storage details (including benchmarking)
- Schedule a working call involving the Tier-1 and the Tier-2 under test. Perform a series of tests and tunings.
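The inventory step above lends itself to a small script. Below is a minimal sketch (not from the talk), assuming a Linux host with ethtool installed and a NIC named eth0; adjust the interface name for your site.

```python
"""Collect the per-host facts the working group asks for before a tuning call.
Minimal sketch; assumes a Linux host with ethtool installed and a NIC named eth0."""
import platform
import subprocess

def read_sysctl(name):
    # sysctl values live under /proc/sys with dots replaced by slashes
    path = "/proc/sys/" + name.replace(".", "/")
    with open(path) as f:
        return f.read().strip()

TCP_PARAMS = [
    "net.core.rmem_max",
    "net.core.wmem_max",
    "net.ipv4.tcp_rmem",
    "net.ipv4.tcp_wmem",
]

def main(nic="eth0"):
    print("Kernel:", platform.release())
    for p in TCP_PARAMS:
        print(f"{p} = {read_sysctl(p)}")
    # NIC model, driver, and firmware as reported by the driver itself
    print(subprocess.run(["ethtool", "-i", nic],
                         capture_output=True, text=True).stdout)

if __name__ == "__main__":
    main()
```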

Network Tools and Tunings
- The network stack is the first candidate for optimization.
- The amount of memory allocated for data "in flight" determines the maximum achievable bandwidth for a given source-destination pair.
- Key parameters (a sizing sketch follows below):
  - net.core.rmem_max
  - net.core.wmem_max
  - net.ipv4.tcp_rmem
  - net.ipv4.tcp_wmem
- Other useful tools: iperf, NDT, Wireshark, tracepath, ethtool, ifconfig, sysctl, netperf, FDT.
- Lots more info/results in this area are available online.
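The sizing rule behind these parameters is the bandwidth-delay product: buffer size is roughly bottleneck bandwidth times round-trip time. The sketch below is illustrative only (the 1 Gb/s bottleneck and 80 ms RTT are assumptions, not values from the slide) and prints candidate sysctl lines in the format of the parameters listed above:

```python
# Bandwidth-delay-product sizing for TCP buffers (illustrative values only).
# Assumptions: 1 Gb/s bottleneck and an 80 ms worst-case RTT; substitute your own.

BOTTLENECK_BPS = 1_000_000_000   # 1 gigabit per second
RTT_SECONDS = 0.080              # worst-case round-trip time to the far site

bdp_bytes = int(BOTTLENECK_BPS / 8 * RTT_SECONDS)   # data "in flight"
max_buf = 2 * bdp_bytes                              # headroom for kernel overhead

print(f"BDP ~ {bdp_bytes/1e6:.1f} MB; suggested maximums ~ {max_buf/1e6:.1f} MB")
# Candidate settings: maximums raised, defaults and minimums left at modest
# stock-like values (per the "leave the defaults low" advice on the next slide).
print(f"net.core.rmem_max = {max_buf}")
print(f"net.core.wmem_max = {max_buf}")
print(f"net.ipv4.tcp_rmem = 4096 87380 {max_buf}")
print(f"net.ipv4.tcp_wmem = 4096 16384 {max_buf}")
```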

Achieving Good Networking Results
- Test system pairs with iperf (TCP) to determine achievable bandwidth.
- Check 'ifconfig <interface>' to see whether errors or packet loss exist.
- Examine driver info with 'ethtool -i <interface>'.
- Set TCP stack parameters to allow full use of the bottleneck bandwidth, typically 1 gigabit. The maximum expected round-trip time (RTT) should be used to estimate the amount of memory needed for data "in flight", and this should be set in the TCP stack parameters. NOTE: set the maximums large enough, but leave the default and "pressure" values low.
- Retest with iperf to determine the effect of the change.
- Debug with iperf (UDP), ethtool, NDT, and Wireshark if there are problems.
- Remember to check both directions (a sketch of a simple bidirectional check follows below).
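One way to script the test/retest loop, including the reminder to check both directions. This is an illustrative sketch rather than tooling from the talk: it assumes iperf (version 2) servers are already listening on both hosts, passwordless ssh to the remote host, and placeholder host names.

```python
"""Run memory-to-memory iperf tests in both directions between two hosts.
Illustrative sketch only; assumes iperf servers are listening on both ends,
passwordless ssh to the remote host, and placeholder host names."""
import re
import subprocess

def iperf_mbps(client_cmd):
    # '-f m' forces the report into Mbits/sec so one regex covers all cases
    out = subprocess.run(client_cmd, capture_output=True, text=True).stdout
    rates = re.findall(r"([\d.]+) Mbits/sec", out)
    return float(rates[-1]) if rates else None

def both_directions(local, remote, seconds=20):
    forward = iperf_mbps(["iperf", "-c", remote, "-t", str(seconds), "-f", "m"])
    # Reverse direction: start the client on the far end via ssh
    reverse = iperf_mbps(["ssh", remote, "iperf", "-c", local,
                          "-t", str(seconds), "-f", "m"])
    print(f"{local} -> {remote}: {forward} Mbits/sec")
    print(f"{remote} -> {local}: {reverse} Mbits/sec")

if __name__ == "__main__":
    both_directions("tier2-gw.example.edu", "tier1-gw.example.edu")
```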

Tunable Parameters Impacting I/O
- There are potentially MANY places in a Linux OS that can have an impact on I/O for WAN transfers.
- We need to explore the impact of various tunings and options (an example of reading a few such knobs follows below).
- The purple areas in the slide's figure have been at least initially explored by Kyu Park (UFL).
- Wenjing Wu (UM) is continuing work in this area.
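As one concrete example of the kind of knob meant here (my example, not taken from the slide's figure): the Linux block layer exposes per-device parameters such as the I/O scheduler and read-ahead size under /sys. A short sketch that dumps a few of them, assuming a block device named sda:

```python
"""Dump a few per-device block-layer tunables that affect streaming I/O.
Illustrative sketch; assumes a Linux host with a block device named sda."""
from pathlib import Path

QUEUE = Path("/sys/block/sda/queue")

for name in ("scheduler", "read_ahead_kb", "nr_requests", "max_sectors_kb"):
    param = QUEUE / name
    if param.exists():
        print(f"{name}: {param.read_text().strip()}")
    else:
        print(f"{name}: not exposed by this kernel")
```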

Initial Tuned Tests (Dec 31 – Jan 6)
- We have set up manual "on-demand" load tests for this (contact Hiro).

Initial Findings (1)
- Most sites had at least a gigabit-capable path in principle.
  - Some sites had < 1 Gbit/sec bottlenecks that weren't discovered until we started trying to document and test sites.
  - Many sites had 10 GE "in principle", but getting end-to-end connections clear at 10 GE required changes.
- Most sites were untuned or not properly tuned to achieve high throughput.
- Sometimes flaky hardware was the issue: bad NICs, bad cables, underpowered CPUs, insufficient memory, etc.
- BNL was limited by the GridFTP doors to around 700 MB/s.

Initial Findings (2)
- For properly tuned hosts, the network is not the bottleneck.
- Memory-to-disk tests are interesting in that they can expose problematic I/O systems (or give confidence in them).
- Disk-to-disk tests do poorly; still a lot of work is required in this area. Possible issues:
  - Wrongly tuned parameters for this task (driver, kernel, OS)
  - Competing I/O interfering
  - Conflicts/inefficiency in the Linux "data path" (bus-driver-memory)
  - Badly organized hardware, e.g., network and storage cards sharing the same bus
  - Underpowered hardware or applications ill-suited to driving gigabit links

T1/T2 Host Throughput Monitoring
- How it works (an illustrative sketch follows below):
  - A control plugin for MonALISA runs iperf and GridFTP tests twice a day, from a selected Tier-1 (BNL) host to the Tier-2 hosts and from each Tier-2 host to the Tier-1 (production SE hosts).
  - Results are logged to a file.
  - A monitoring plugin for MonALISA reads the log and graphs the results.
- What it currently provides:
  - Network throughput, to determine whether network TCP parameters need to be tuned.
  - GridFTP throughput, to determine degradation due to GridFTP software and disk.
  - Easy configuration to add or remove tests.
- What else can be added within this framework:
  - GridFTP memory-to-memory and memory-to-disk (and vice versa) tests, to isolate disk and software degradation separately.
  - SRM tests, to isolate SRM degradation.
  - FTS tests, to isolate FTS degradation.
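For illustration only, a stripped-down stand-in for the control/logging cycle described above; the real implementation is a MonALISA plugin, and the host names, log path, and 12-hour interval here are placeholders. Only the iperf (memory-to-memory) part is sketched; a GridFTP test would slot into the same loop.

```python
"""Stand-in for the twice-daily test-and-log cycle (not the MonALISA plugin).
Host names and log path are placeholders; only the iperf test is sketched."""
import re
import subprocess
import time

TIER1 = "t1-se.example.gov"
TIER2_HOSTS = ["aglt2-se.example.edu", "mwt2-se.example.edu"]
LOGFILE = "throughput_tests.log"
INTERVAL = 12 * 3600                      # twice a day

def iperf_mbps(server):
    """Memory-to-memory rate toward `server` (same parsing as the earlier sketch)."""
    out = subprocess.run(["iperf", "-c", server, "-t", "30", "-f", "m"],
                         capture_output=True, text=True).stdout
    rates = re.findall(r"([\d.]+) Mbits/sec", out)
    return float(rates[-1]) if rates else None

def run_cycle():
    with open(LOGFILE, "a") as log:
        for t2 in TIER2_HOSTS:
            # One line per test; a monitoring/graphing step would read this later.
            log.write(f"{int(time.time())} iperf {TIER1}->{t2} {iperf_mbps(t2)}\n")

if __name__ == "__main__":
    while True:
        run_cycle()
        time.sleep(INTERVAL)
```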

Iperf (Mem-to-Mem) Tests: T1 ↔ T2

GridFTP (Disk-to-Disk) Tests: T1 ↔ T2

Iperf History Graphing

GridFTP History Plots

perfSONAR in USATLAS
- There is a significant, coordinated effort underway to instrument the network in a standardized way. This effort, called perfSONAR, is jointly supported by DANTE, ESnet, GEANT2, Internet2, and numerous university groups.
- Within USATLAS we will be targeting implementation of a perfSONAR instance for our facilities. Its primary purpose is to aid in network diagnosis by quickly allowing users to isolate the location of problems. In addition, it provides standard measurements of various network-performance-related metrics.
- USATLAS plan: each Tier-2, and the Tier-1, will provide a dedicated system to run perfSONAR services (more soon).
- See the perfSONAR web site for more information.

Near-term Steps for the Working Group
- Lots of work is needed at our Tier-2s on end-to-end throughput.
- Each site will want to explore options for system/architecture optimization specific to its hardware.
- Most reasonably powerful storage systems should be able to exceed 200 MB/s; getting this consistently across the WAN is the challenge!
- For each Tier-2 we need to tune GSIFTP (or FDT) transfers between "powerful" storage systems to achieve "bottleneck"-limited performance.
  - This implies we document the bottleneck: I/O subsystem, network, or processor?
  - Include 10 GE-connected pairs where possible; target 400 MB/s per host pair.
- For those sites supporting SRM, try many host-pair transfers to "go wide" in achieving high-bandwidth transfers.

Throughput Goals
- We have a set of goals for our "Throughput" work.
- For the last two goals we are awaiting GridFTP upgrades at BNL to support this level of throughput; right now this looks like mid-June.
- This is meant to be a "living table" on our Twiki site: as we modify and upgrade our sites, we retest to verify what the changes have done for our performance.

Plans and Future Goals
- We still have a significant amount of work to do to reach consistently high throughput between our sites.
- Performance analysis is central to identifying bottlenecks and achieving high performance.
- Automated testing via BNL's MonALISA plugin will be maintained to provide both baseline and near-term monitoring.
- perfSONAR will be deployed in the next month or so.
- Continue "on-demand" load tests to verify burst capacity and the impact of changes.
- Finish all sites in our goal table and document our methodology.

Questions?

Some Monitoring Links
- https://svn.usatlas.bnl.gov/svn/usatlas-int/load-test/Control/conf/network_m2m
- https://svn.usatlas.bnl.gov/svn/usatlas-int/load-test/Control/conf/gridftp_d2d