FDR readiness & testing plan
R. Gardner, 1/21/13
Site and redirection status
All monitoring links at https://twiki.cern.ch/twiki/bin/view/Atlas/MonitoringFax
Redirectors
Site reporting to monitoring
- Display from new UDP collector
- Most, but not all, sites reporting
- Other small issues to check
WLCG transfer dashboard
- Data from UDP ActiveMQ
- Some sites missing, some inconsistent labels
- Will need to scrub starting from the UDP collector
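One way the label scrubbing could be approached is sketched below. This is only an illustration: the alias table, the record layout (a dict with a "site" key) and the function names are assumptions, not the collector's actual format, and the label variants are invented.

```python
# Hypothetical alias table mapping label variants to one canonical site name.
# The variants are invented for illustration, not taken from real records.
ALIASES = {
    "mwt2": "MWT2",
    "mwt2_uc": "MWT2",
    "lrz-lmu": "LRZ-LMU",
    "lrz_lmu": "LRZ-LMU",
}


def canonical_site(label):
    """Return the canonical site name for a raw label, or None if unknown."""
    return ALIASES.get(label.strip().lower())


def scrub(records):
    """Split records into (clean, unknown) by whether the site label resolves."""
    clean, unknown = [], []
    for rec in records:
        site = canonical_site(rec.get("site", ""))
        if site is None:
            unknown.append(rec)  # keep for manual follow-up
        else:
            clean.append({**rec, "site": site})
    return clean, unknown
```

Records that do not resolve are kept in a separate list rather than dropped, so missing or mislabeled sites can be followed up by hand.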
Testing elements (arranged along the slide's COMPLEXITY axis)
- At-large users
- HammerCloud & WAN-FDR jobs (programmatic)
- Cost matrix (continuous)
- Basic dashboard functionality (continuous)
Site Metrics
- “Connectivity”: copy and read test matrices
  - Snapshots per site as server
- HC runs with modest job numbers
  - Stage-in & direct read
  - Local, nearby, far-away
- HC metrics
  - Simple job efficiency
  - Wallclock, # files, CPU %, event rate
- Load tests
  - For well-functioning sites only
  - Graduated tests: 50, 100, 200 jobs vs. various # files
  - Will notify the site and/or list when these are launched
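A minimal sketch of how the HC metrics and the graduated load-test grid might be computed. The definitions are assumptions: CPU % is taken here as CPU time over wallclock time and event rate as events per wallclock second; HammerCloud's own definitions may differ. The files-per-job values are placeholders, since the slide only says "various".

```python
from itertools import product


def cpu_percent(cpu_sec, wallclock_sec):
    """CPU time as a percentage of wallclock time (assumed definition)."""
    if wallclock_sec <= 0:
        raise ValueError("wallclock time must be positive")
    return 100.0 * cpu_sec / wallclock_sec


def event_rate_hz(n_events, wallclock_sec):
    """Events processed per wallclock second (assumed definition)."""
    if wallclock_sec <= 0:
        raise ValueError("wallclock time must be positive")
    return n_events / wallclock_sec


def load_test_plan(job_counts=(50, 100, 200), files_per_job=(1, 5, 10)):
    """Grid of graduated load tests; files_per_job values are placeholders."""
    return [
        {"jobs": jobs, "files_per_job": files}
        for jobs, files in product(job_counts, files_per_job)
    ]
```

The plan is ordered from the smallest job count upward, which matches the idea of ramping load gradually on a site.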
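Site | # client sites | Local copy (sec) | Regional copy (sec, %LOC) | Global copy (sec, %LOC) | HC job eff | HC CPU % | HC WC | HC event rate (Hz)
AGLT2 | | | | | | | |
BNL | | | | | | | |
BU | | | | | | | |
CERN | | | | | | | |
DESY | | | | | | | |
INFN-FRASCATI | | | | | | | |
INFN-NAPOLI | | | | | | | |
INFN-ROMA | | | | | | | |
JINR | | | | | | | |
LRZ-LMU | | | | | | | |
MPPMU | | | | | | | |
MWT2 | 13 | | | | | | |
OU | | | | | | | |
PRAGUE | | | | | | | |
RAL | | | | | | | |
PROTVINO | | | | | | | |
SWT2_CPB | | | | | | | |
UKI-LT2-QMUL | | | | | | | |
UKI-LIV-HEP | | | | | | | |
UKI-ECDF | | | | | | | |
UKI-GLASGOW | | | | | | | |
UKI-OX | | | | | | | |
WT2 | | | | | | | |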
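The copy columns could be filled along the lines of the sketch below. It assumes "%LOC" means the remote copy time expressed as a percentage of the local copy time; that reading, the function name and the output layout are assumptions, not something the slides define.

```python
def copy_row(local_sec, regional_sec, global_sec):
    """Build the three copy columns for one site.

    Regional and global entries are (seconds, percent of local copy time),
    under the assumed reading of %LOC.
    """
    if local_sec <= 0:
        raise ValueError("local copy time must be positive")

    def pct(remote_sec):
        return round(100.0 * remote_sec / local_sec, 1)

    return {
        "local_sec": local_sec,
        "regional": (regional_sec, pct(regional_sec)),
        "global": (global_sec, pct(global_sec)),
    }
```

With this convention a value above 100 means the remote copy was slower than the local one.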
Site copy connectivity tests
- MWT2 as server, read from 18 ANALY queues
- Accessible by 13 of the 18 queues tested
- Record for all sites: http://ivukotic.web.cern.ch/ivukotic/WAN/index.asp
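A count like "13 of 18" could be derived from per-queue test results roughly as follows. The input format (queue name mapped to pass/fail) is an assumption for illustration and is not the format used by the WAN test page.

```python
def summarize_access(results):
    """Summarize one server's connectivity snapshot.

    results: dict mapping queue name -> True if the copy/read test from
    that queue against the server succeeded (hypothetical input format).
    """
    failed = sorted(q for q, passed in results.items() if not passed)
    return {
        "tested": len(results),
        "accessible": len(results) - len(failed),
        "failed_queues": failed,
    }
```

Listing the failed queues alongside the count gives the sites something concrete to follow up on.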
Metrics (global)
- No. sites, clouds participating
- No. ANALY queues tested as FAX-capable
- Average, peak aggregate rate (MB/s)
- Number of jobs from HC tests
- Number of jobs from WAN-FDR tests
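The average and peak aggregate rate could be computed as in this sketch. It assumes rate samples arrive as (timestamp, site, MB/s) tuples sharing common timestamps; that layout is hypothetical and the real monitoring data may need binning first.

```python
def aggregate_rates(samples):
    """Return (average, peak) of the aggregate rate in MB/s.

    samples: iterable of (timestamp, site, mb_per_s) tuples. Rates from all
    sites are summed per timestamp, then averaged and maximized over time.
    """
    totals = {}
    for timestamp, _site, rate in samples:
        totals[timestamp] = totals.get(timestamp, 0.0) + rate
    if not totals:
        return 0.0, 0.0
    values = list(totals.values())
    return sum(values) / len(values), max(values)
```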
Still need to do:
- Resolve pilot and wrapper issues; verify correct pilots are being sent
- Finish HammerCloud testing for direct access
- Placement of test datasets (files at LRZ and MWT2 only)
- Finalize examples for at-large users
- Set up metrics tables and start gathering statistics