Factors affecting ANALY_MWT2 performance
MWT2 team, August 28, 2012

Factors to check
– Storage servers
– Internal UC network
– Internal IU network
– WAN network
– Effect of dCache-locality caching versus WAN direct access
– IU analysis nodes specifically

Individual storage servers
We have previously measured the performance of each storage node individually with various “blessing tests”
– Nodes are uct2-s[14] and iut2-s[6]
– Note xxt2-s[3] are first generation; xxt2-s[4-14] are SAS2 H800
Each storage node is over-provisioned for CPU and memory (96G), even while running dCache services and the Xrootd overlay
Each node has a single 10G NIC, a potential bottleneck
– Some of the s-nodes have an additional 10G port that could be cabled and bonded
Currently only UC and IU have storage; since all accesses are local, no analysis jobs run at UIUC at present
– UIUC will add 300 TB this fall
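As a quick cross-check outside the blessing framework, a per-node sequential-read timing like the sketch below can be compared against the ~1.25 GB/s ceiling of a single 10G NIC. This is not the actual blessing test; the file path is a placeholder, and the test file should be larger than the node's 96 GB of RAM so the page cache does not flatter the number.

```python
#!/usr/bin/env python
"""Minimal sequential-read throughput check for a storage node (a sketch,
not the MWT2 "blessing tests"). Read a large test file in big chunks and
report MB/s; rates near ~1250 MB/s would saturate a single 10G NIC."""
import sys, time

def read_throughput(path, chunk_mb=64):
    chunk = chunk_mb * 1024 * 1024
    total = 0
    start = time.time()
    with open(path, "rb", buffering=0) as f:   # unbuffered binary read
        while True:
            data = f.read(chunk)
            if not data:
                break
            total += len(data)
    elapsed = time.time() - start
    return total / (1024 * 1024) / elapsed     # MB/s

if __name__ == "__main__":
    # Placeholder path: point this at a file (> 96 GB) on the node under test.
    path = sys.argv[1] if len(sys.argv) > 1 else "/data/blessing/testfile"
    print(f"{path}: {read_throughput(path):.0f} MB/s sequential read")
```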

Typical Storage node (diagram)

Storage Network utilization at UC
A one-hour sample: I/O is nicely spread over the servers, with no obvious bottlenecks or hot spots.

Storage Network utilization at UC (week)
Over the past week, more or less the same picture: a good spread, with sustained MB/s rates per system.

UC Network
The link shown in the diagram is now 2x10G.

UC Network – Bottlenecks (1)
PC8024F and PC6248 stack
– Cacti (guest/cacti): w&local_graph_id=3581&rra_id=all
– Last week: there are moments of saturation (3)

UC Network (2)
The PC6248 to Cisco 6509 link is 2x10G bonded
– Cacti: w&local_graph_id=3757&rra_id=all
– Last week looks fine, mostly below a single 10G
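When the Cacti graphs are not at hand, a rough utilization check on a host-facing link can be done by sampling the kernel byte counters, as in the sketch below; the interface name and the 20G (2x10G bond) capacity are placeholders, and this measures a host NIC rather than the switch-to-switch trunk itself.

```python
#!/usr/bin/env python
"""Rough link-utilization sampler (a sketch, not the Cacti setup above).
Read /sys/class/net byte counters twice and report Gbps against a nominal
capacity. Interface name and capacity are placeholders to adjust."""
import sys, time

def rate_gbps(iface, direction="rx", interval=10):
    path = f"/sys/class/net/{iface}/statistics/{direction}_bytes"
    with open(path) as f:
        before = int(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = int(f.read())
    return (after - before) * 8 / interval / 1e9   # bytes over interval -> Gbps

if __name__ == "__main__":
    iface = sys.argv[1] if len(sys.argv) > 1 else "eth2"   # placeholder NIC
    capacity_gbps = 20.0                                   # e.g. a 2x10G bond
    for d in ("rx", "tx"):
        g = rate_gbps(iface, d)
        print(f"{iface} {d}: {g:.2f} Gbps ({100 * g / capacity_gbps:.0f}% of {capacity_gbps:.0f}G)")
```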

WAN network from UC to IU, UIUC, and BNL

WAN Network
Last week’s I/O to UC: green is usually FTS transfers from BNL; blue is I/O mostly to IU, with some to UIUC. This is for one of the 10G NICs from the 6509 to the campus core (there is a second NIC to the campus core, and the bonded plot, neither of which I can find in our Cacti hierarchy at the moment).

Local IU – MWT2 (diagram)

IU local network
– Storage nodes are each connected via a 10 Gb link to the 6248 switch stack
– Compute nodes are connected to the same stack via 1 Gb connections
– The 6248 stack has a dual 10 Gb uplink to rtsw2, the 100 Gb Brocade switch
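As a rough illustration of why the uplink is the number to watch, the back-of-the-envelope check below compares the edge bandwidth that can funnel into the dual 10 Gb uplink with its capacity. The worker-node count is an assumption chosen for illustration; only the storage-link and uplink figures come from the slide above, and traffic that stays between compute and storage nodes inside the stack never touches the uplink.

```python
#!/usr/bin/env python
"""Back-of-the-envelope oversubscription check for the IU 6248 stack uplink.
The compute-node count is an assumed, illustrative number; the 10 Gb storage
links and the dual 10 Gb uplink are taken from the slide text."""
N_COMPUTE, COMPUTE_GBPS = 50, 1      # assumed number of 1 Gb worker nodes
N_STORAGE, STORAGE_GBPS = 6, 10      # iut2-s nodes on 10 Gb links (assumed count)
UPLINK_GBPS = 2 * 10                 # dual 10 Gb uplink to rtsw2

edge = N_COMPUTE * COMPUTE_GBPS + N_STORAGE * STORAGE_GBPS
print(f"edge bandwidth into the stack: {edge} Gb/s")
print(f"uplink capacity:               {UPLINK_GBPS} Gb/s")
print(f"worst-case oversubscription:   {edge / UPLINK_GBPS:.1f}:1")
```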

IU-centric WAN picture
This picture does not show connectivity to the other MWT2 sites; to UC, the connection is currently through MREN.

IU WAN network
WAN traffic last month (peaks up to 8 Gbps)
There are 100 Gbps links to Chicago available, though not all the way to UIUC and UC

WAN direct access versus dCache locality-mode caching
We believe that since turning on dCache caching we have reduced the load on the WAN
– dCache-locality was turned on circa 8/6/2012 (~ week 31)

dCache locality mode
– The cache hit rate is about 75%
– Individual files are used an average of 4 times
– The average transfer reads 25% of the file
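These figures come from the site's own accounting; the sketch below only illustrates how such numbers can be derived from per-transfer records. The record layout here (file id, bytes read, file size, cached flag) and the toy values are our own simplification, not the actual dCache billing schema.

```python
#!/usr/bin/env python
"""Illustrative derivation of cache hit rate, mean uses per file and mean
fraction of a file read per access, from simplified transfer records."""
from collections import defaultdict

# (file_id, bytes_read, file_size, served_from_cache) -- toy records
records = [
    ("f1", 250_000_000, 1_000_000_000, True),
    ("f1", 500_000_000, 1_000_000_000, True),
    ("f2", 2_000_000_000, 2_000_000_000, False),
    ("f1", 100_000_000, 1_000_000_000, True),
]

uses = defaultdict(int)
hits = 0
frac_read = 0.0
for fid, nbytes, size, cached in records:
    uses[fid] += 1
    hits += cached
    frac_read += nbytes / size

n = len(records)
print(f"cache hit rate:              {100 * hits / n:.0f}%")
print(f"mean uses per file:          {n / len(uses):.1f}")
print(f"mean fraction read per use:  {100 * frac_read / n:.0f}%")
```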

dCache Site Caches (diagram: IU pools, UC pools, cached data)

Low efficiency at IU
– Slow jobs are not associated with a particular data server
– According to strace, most system time is spent in the munmap call
– Jobs are slow even on a completely empty node
– The data is cached at IU
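One way to reproduce the strace observation on a slow worker is to attach strace with its -c summary option to the job's process (for example: strace -c -f -o summary.txt -p <pid>), let it run for a while, and then rank syscalls by system-time share; the sketch below parses such a summary file, with the PID and file name as placeholders.

```python
#!/usr/bin/env python
"""Rank syscalls from an `strace -c` summary file by share of system time.
Capture the summary with e.g.:  strace -c -f -o summary.txt -p <slow job pid>
(the PID and file name are placeholders)."""
import sys

def parse_summary(path):
    rows = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            # Skip the header, the dashed separators and the trailing "total" row.
            if len(parts) < 5 or parts[0].startswith(("%", "-")) or parts[-1] in ("syscall", "total"):
                continue
            try:
                pct, seconds = float(parts[0]), float(parts[1])
            except ValueError:
                continue
            rows.append((parts[-1], pct, seconds))
    return sorted(rows, key=lambda r: r[1], reverse=True)

if __name__ == "__main__":
    for name, pct, seconds in parse_summary(sys.argv[1])[:10]:
        print(f"{name:<16} {pct:6.2f}% of system time  ({seconds:.3f} s)")
```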

Summary and improvements
– Add a second 10G link to the 8024F-6248 connection at UC and bond it
– Cable up the second 10G port on uct2-s[11-14]; this will need another 8024F, which will also require further trunking rearrangement
– Adding additional storage nodes will increase the number of I/O channels, decreasing single-node contention