1
www.egi.eu EGI-InSPIRE RI-261323
Feedback to sites from the VO auger
Jiří Chudoba (Institute of Physics and CESNET), with input from the Auger production team (J. Lozano Bahilo, G. Rubio, M. D. Serrano, UGR) and Jean-Noel Albert (LAL)
2
The Observatory
- The Pierre Auger Observatory (PAO) is an astroparticle physics project that measures ultra-high-energy cosmic rays
- See my talk on Friday for more details about the project
10.4.2012 Jiri.Chudoba@cern.ch
3
VO auger
- Mostly used for the organized production of simulations of cosmic-ray showers and of the detector response
- CORSIKA with different models: FORTRAN
- Offline code: C++, with many packages included (GEANT4, ROOT)
4
Sites supporting VO auger
- 23 sites in 10 countries
- How shall we acknowledge the sites' contribution?
5
Some issues
- feedback from sites
- change of the VOMS server certificate
- too many jobs in the queue
- hanging lcg-cp
- SE occupancy, data movement
- slow LFC response
- efficiency evaluation
6
Feedback from sites
- Production, VO management, data management and bulk transfers to SRB are done by a geographically distributed team
- Sites should preferably handle all issues via GGUS
- We may not know about some problems; sometimes we learn about them only from the sites we manage ourselves
7
Change of the VOMS certificate
- The DN changed
- Sites must download the new certificate from the CIC portal and reconfigure their services
  - announced by a broadcast message
  - shall we create a GGUS ticket for each site?
  - we did not get the configuration right on our own site at the first attempt
- Can production continue?
  - running jobs hold proxies signed by the “old” VOMS server
  - could using two VOMS servers be a solution?
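One reading of the "two VOMS servers" question is that sites would trust attributes signed by either the old or the new server DN during the transition. Proxies already held by running jobs would then keep working until they expire. The sketch below models only that acceptance decision. The DNs and the `Proxy`/`proxy_accepted` names are hypothetical, and how a site actually configures VOMS trust is not shown.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Proxy:
    """Hypothetical stand-in for a VOMS proxy: who signed its VO attributes and when it expires."""
    voms_signer_dn: str
    not_after: datetime


def proxy_accepted(proxy: Proxy, trusted_signer_dns: set[str], now: datetime) -> bool:
    """Accept the proxy if its attributes were signed by a trusted VOMS server DN
    and the proxy has not expired yet."""
    return proxy.voms_signer_dn in trusted_signer_dns and now < proxy.not_after


# Hypothetical placeholder DNs, not the real auger VOMS server DNs.
OLD_DN = "/DC=org/DC=example/CN=voms-old.example.org"
NEW_DN = "/DC=org/DC=example/CN=voms-new.example.org"

now = datetime.now(timezone.utc)
# A proxy created before the DN change, still valid for 12 hours.
running_job_proxy = Proxy(OLD_DN, now + timedelta(hours=12))

print(proxy_accepted(running_job_proxy, {NEW_DN}, now))          # False: only the new DN is trusted
print(proxy_accepted(running_job_proxy, {OLD_DN, NEW_DN}, now))  # True: both DNs trusted in the transition
```

With both DNs trusted, a running job fails only when its proxy expires, not at the moment of the switch.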
8
Too many waiting jobs
- Some sites reported too many (thousands of) waiting jobs in the auger queue
- Jobs are distributed by the WMS servers; we do not submit directly to sites
  - wrong values in the BDII? slow updates? a bug in the WMS?
- We decreased the submitted/running parameter
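Assume the submitted/running parameter acts as a throttle. Under that reading, the production system submits new jobs only while submitted-but-not-yet-running jobs stay below a fixed multiple of the running ones. Lowering the multiple then directly limits how many jobs can pile up waiting. The helper name, the minimum floor and the numbers below are hypothetical.

```python
def jobs_to_submit(submitted: int, running: int, max_ratio: float,
                   batch: int, min_submitted: int = 10) -> int:
    """Hypothetical throttle: how many new jobs may be submitted without
    letting submitted (not yet running) jobs exceed max_ratio * running.

    min_submitted keeps a small allowance so a fresh start (no running
    jobs yet) can still submit something."""
    allowed_submitted = max(int(max_ratio * running), min_submitted)
    headroom = allowed_submitted - submitted
    return max(0, min(batch, headroom))


# 400 jobs running, 100 waiting: a ratio of 0.5 allows 100 more, a ratio of 0.2 allows none.
print(jobs_to_submit(submitted=100, running=400, max_ratio=0.5, batch=500))  # 100
print(jobs_to_submit(submitted=100, running=400, max_ratio=0.2, batch=500))  # 0
```

Decreasing `max_ratio` trades a little throughput at start-up for much shorter site queues.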
9
Hanging jobs
- CORSIKA in an infinite loop
  - only a small fraction of jobs
  - difficult to debug
  - CPU is used, but the output files are not updated
- Fixed by the CORSIKA developers
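The symptom above (CPU in use but no updates to the output files) suggests a simple watchdog that a job wrapper could run periodically. The sketch only checks how long ago any file in the job's output directory was last modified. The directory layout, the threshold and the `looks_hung` helper are hypothetical, not part of the auger production setup.

```python
import os
import tempfile
import time
from pathlib import Path


def looks_hung(output_dir: Path, max_silence_s: float, job_start: float) -> bool:
    """Return True if no file in output_dir has been modified for more than
    max_silence_s seconds. Before any output exists, silence is measured
    from job_start (a Unix timestamp)."""
    mtimes = [p.stat().st_mtime for p in output_dir.iterdir() if p.is_file()]
    last_activity = max(mtimes, default=job_start)
    return time.time() - last_activity > max_silence_s


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        out = Path(d) / "shower.part"
        out.write_text("partial output")
        # Pretend the file was last written 20 minutes ago.
        old = time.time() - 1200
        os.utime(out, (old, old))
        print(looks_hung(Path(d), max_silence_s=600, job_start=old))  # True
```

A wrapper could kill a job flagged this way instead of letting it run to the walltime limit. The silence threshold would have to exceed the longest legitimate gap between output writes.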
10
Hanging jobs II
- lcg-cp is used to download the software if it is not available locally
- It hung in some cases
  - a very “expensive” error: the job slot stays blocked until the job is killed at the walltime limit
- GGUS #90936
  - Jiri Horky debugged it, Michail Salichos provided a patch
  - a lot of work; it took more than 2 months
  - should be fixed in the next release
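Independently of the lcg-cp fix, a job wrapper can bound the cost of a hanging download by running it under its own timeout. A stuck copy then frees the job slot long before the walltime limit. The minimal sketch below uses Python's `subprocess.run(..., timeout=...)`, which kills the child process when the timeout expires. The retry count and timeout values are assumptions. The demo runs a dummy sleeping command instead of a real lcg-cp invocation.

```python
import subprocess
import sys


def run_with_timeout(cmd: list[str], timeout_s: float, attempts: int = 2) -> bool:
    """Run cmd up to `attempts` times; each attempt is killed after timeout_s seconds.
    Return True as soon as one attempt exits with status 0."""
    for attempt in range(1, attempts + 1):
        try:
            result = subprocess.run(cmd, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            # subprocess.run has already killed and reaped the child here.
            print(f"attempt {attempt}: timed out after {timeout_s} s", file=sys.stderr)
            continue
        if result.returncode == 0:
            return True
        print(f"attempt {attempt}: exit status {result.returncode}", file=sys.stderr)
    return False


if __name__ == "__main__":
    # Dummy command that "hangs" for 5 s, standing in for a stuck download.
    hanging = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(hanging, timeout_s=0.5, attempts=1))  # False
```

In a real wrapper, `cmd` would be the actual copy command line. The timeout should allow for the slowest legitimate download of the software bundle.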
11
SE occupancy
- Production stores results on the available SEs (some sites are excluded)
  - it can fill all available space
- Space tokens should be used to set quotas (AUGERPROD), with write access limited to the production role
- We are unable to respond quickly to requests to move TBs of data away from a site
  - there is not enough space on other sites
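The sketch below shows how the two constraints on this slide could interact when production picks a destination. Excluded sites are skipped, and no write is allowed beyond the free space left in the production space token. As a result, production can fill its quota but not the whole SE. The SE names, sizes and the `pick_se` helper are hypothetical. Neither the real production system nor SRM space-token accounting is modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StorageElement:
    name: str
    token_quota_tb: float  # size of the production space token at this SE
    token_used_tb: float   # space already used in that token

    @property
    def free_tb(self) -> float:
        return max(0.0, self.token_quota_tb - self.token_used_tb)


def pick_se(ses: list[StorageElement], excluded: set[str], file_tb: float) -> Optional[str]:
    """Choose the non-excluded SE with the most free space in its production
    token that can still hold the file; None if no SE qualifies."""
    candidates = [se for se in ses if se.name not in excluded and se.free_tb >= file_tb]
    if not candidates:
        return None
    return max(candidates, key=lambda se: se.free_tb).name


# Hypothetical SEs and sizes.
ses = [
    StorageElement("se-a.example.org", token_quota_tb=50, token_used_tb=49.5),
    StorageElement("se-b.example.org", token_quota_tb=20, token_used_tb=5),
    StorageElement("se-c.example.org", token_quota_tb=100, token_used_tb=10),
]
print(pick_se(ses, excluded={"se-c.example.org"}, file_tb=0.002))  # se-b.example.org
```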
12
Data transfers to SRB
Decommissioning of an SE with many auger files
FTS transfers from Lille to Lyon
- 2 months, 1.9 M files, 38.7 TB
- less than 1% of files lost
- 31 750 operations/day, 1300 ops/hour
- 650 GB/day, 27 GB/hour, 8 MB/s
FTS transfers from Bordeaux to Lyon
- 1 month, 700 K files, 7.1 TB
- .6% of files lost
- 12 200 operations/day, 500 ops/hour
- 160 GB/day, 7 GB/hour, 2 MB/s
Many more small files in Bordeaux
Large files stored to tape in Lyon
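The per-day, per-hour and per-second figures follow from the campaign totals. The check below uses the Lille transfers. It assumes 2 months ≈ 60 days, roughly one operation per file, and decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes).

```python
def transfer_rates(n_files: float, n_bytes: float, days: float) -> dict[str, float]:
    """Average rates of a bulk transfer campaign, in decimal units."""
    seconds = days * 86_400
    return {
        "files_per_day": n_files / days,
        "files_per_hour": n_files / (days * 24),
        "gb_per_day": n_bytes / days / 1e9,
        "gb_per_hour": n_bytes / (days * 24) / 1e9,
        "mb_per_s": n_bytes / seconds / 1e6,
    }


lille = transfer_rates(n_files=1.9e6, n_bytes=38.7e12, days=60)
for key, value in lille.items():
    print(f"{key:>15}: {value:,.1f}")
```

This gives about 31 700 files/day, 1320 files/hour, 645 GB/day, 26.9 GB/hour and 7.5 MB/s. These values match the rounded numbers on the slide.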
13
Effectiveness evaluation
Efficiency: cputime/walltime
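For a set of jobs, efficiency can be aggregated as total CPU time over total walltime. This weights long jobs more than averaging per-job ratios would; that aggregation choice is an assumption. The job records and the `Job` fields below are made up. They show how a single job blocked in a hanging download drags the figure down.

```python
from dataclasses import dataclass


@dataclass
class Job:
    cpu_s: float   # CPU time consumed by the job, seconds
    wall_s: float  # walltime the job occupied its slot, seconds


def efficiency(jobs: list[Job]) -> float:
    """Aggregate efficiency = sum of CPU time / sum of walltime."""
    total_wall = sum(j.wall_s for j in jobs)
    if total_wall == 0:
        raise ValueError("no walltime recorded")
    return sum(j.cpu_s for j in jobs) / total_wall


# Made-up example: two healthy jobs and one stuck in a hanging download
# (holding its slot for 48 h while using almost no CPU).
jobs = [Job(3400, 3600), Job(7000, 7200), Job(60, 172800)]
print(f"{efficiency(jobs):.3f}")      # 0.057
print(f"{efficiency(jobs[:2]):.3f}")  # 0.963
```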
14
Top ten VOs efficiency
[Figure: efficiency of the biggest VOs, 2012-01 to 2012-12]
15
VO auger efficiency
[Figure: VO auger efficiency from 2012-01 to 2013-03; the efficiency improves]
16
Effectiveness evaluation
$$\text{Effectiveness} = \frac{\text{cputime of jobs with good output}}{\text{total walltime}}$$
- Just one of many possible definitions
- Difficult to estimate
  - no information about cancelled or lost jobs
  - some jobs without a stored job log file still stored correct results
- Production maximizes throughput
  - each job processes 1 shower 5 times
  - jobs are resent if there are not enough (<3) output files
- A more detailed view from the accounting portal could help
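The calculation below assumes a job counts as having "good output" when it stored at least 3 of its 5 output files, the same threshold production uses for resending jobs. Cancelled or lost jobs would be missing from the input entirely. Their walltime would then be absent from the denominator, so the result overestimates effectiveness. The `Job` fields and the sample data are hypothetical.

```python
from dataclasses import dataclass

MIN_GOOD_OUTPUTS = 3  # assumption: same threshold production uses to resend a job


@dataclass
class Job:
    cpu_s: float    # CPU time consumed, seconds
    wall_s: float   # walltime occupied, seconds
    n_outputs: int  # output files actually stored (out of 5 runs of the shower)


def effectiveness(jobs: list[Job]) -> float:
    """CPU time of jobs with good output divided by the total walltime of all jobs."""
    total_wall = sum(j.wall_s for j in jobs)
    if total_wall == 0:
        raise ValueError("no walltime recorded")
    good_cpu = sum(j.cpu_s for j in jobs if j.n_outputs >= MIN_GOOD_OUTPUTS)
    return good_cpu / total_wall


jobs = [Job(3400, 3600, 5), Job(3300, 3600, 2), Job(3500, 3600, 3)]
print(f"{effectiveness(jobs):.3f}")  # 0.639
```

The plain efficiency of the same three jobs would be 10200/10800 ≈ 0.944. The gap comes from the second job, which used CPU but produced too few files to count.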
17
Instead of conclusions
We thank all sites supporting the VO auger for their hardware resources and manpower.