Analysis Trains
Costin Grigoras, Jan Fiete Grosse-Oetringhaus
ALICE Offline Week,

LEGO Trains
42 trains configured (37 active)
– 5 CF, 4 GA, 1 PP, 8 JE, 5 DQ, 11 HF, 8 LF
Submitted trains this year
– 213 CF, 35 DQ, 24 GA, 124 HF, 173 JE, 114 LF, 3 PP
1-5 train operators per train
Operator mailing list
TWiki page: viewauth/ALICE/AnalysisTrains
Table (PWG, Jobs [k], Wall in years): CF ,8 DQ ,1 GA852140,1 HF ,4 JE ,4 LF ,5 PP110,2
since on average 2400 jobs at any given time

Running Statistics
[Plot legend: alidaq, aliprod, alitrain, SUM]

Time until trains finish
Time between train submission and submission of the final merging job
Average below 2 days (good!), but with considerable spread
[Plot: average per month per train]
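
The "average per month per train" figure can be reproduced from two timestamps per train run. The following is a minimal sketch, not the monitoring code actually used: the function name, the tuple layout, the train names, and the dates are all made up for illustration.

```python
from collections import defaultdict
from datetime import datetime


def average_turnaround_days(runs):
    """Average time (in days) between train submission and submission of the
    final merging job, grouped by (train name, month of submission).

    `runs` is an iterable of (train_name, submitted, final_merge_submitted)
    tuples. This layout is an assumption made for this sketch.
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of days, run count]
    for train, submitted, final_merge in runs:
        key = (train, submitted.strftime("%Y-%m"))
        days = (final_merge - submitted).total_seconds() / 86400.0
        totals[key][0] += days
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Invented example data, not real train runs.
example_runs = [
    ("train_A", datetime(2000, 3, 1, 12), datetime(2000, 3, 2, 12)),  # 1.0 day
    ("train_A", datetime(2000, 3, 10, 0), datetime(2000, 3, 13, 0)),  # 3.0 days
    ("train_B", datetime(2000, 4, 5, 6), datetime(2000, 4, 6, 18)),   # 1.5 days
]
averages = average_turnaround_days(example_runs)
```

Grouping by the submission month (rather than the finishing month) is a design choice of this sketch; the spread could be exposed by also keeping the minimum and maximum per group.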

AliEn Upgrade
The partial upgrade to v2-20 this Monday had a few side effects
– General interruption from … to midnight; during this period Costin & Pablo were continuously working on fixing the situation
– Jobs (in particular merging jobs) that were submitted during that time failed and needed to be retried later. Mistake: LPM should have been disabled for the upgrade
– The new status FAILED, which is not considered a final state, led to some delay for merging jobs; fixed today (a parallel failure of CERN EOS makes submission very slow)
– Bug in SE selection: some jobs go to FAILED; being fixed by Pablo at present
I propose that planned upgrades be evaluated in particular with respect to the analysis trains, and that a plan be made for how to recover from failures that occur during the upgrade period
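
Why a status that is not treated as final can delay merging is easiest to see in a small sketch. This is an illustration under assumptions, not the actual AliEn or LPM logic: only the status name FAILED comes from the slide, while `DONE`, `ERROR`, and the function name are hypothetical.

```python
# Hypothetical set of final states; only "FAILED" is named on the slide,
# and it is deliberately missing here to mimic the situation described.
FINAL_STATES = frozenset({"DONE", "ERROR"})


def ready_to_merge(job_states, final_states=FINAL_STATES):
    """Return True once every analysis job has reached a final state.

    Assumption of this sketch: merging is only submitted when no job can
    change state any more.
    """
    return all(state in final_states for state in job_states)


states = ["DONE", "DONE", "FAILED"]

# FAILED is not known as final, so merging keeps waiting.
blocked = ready_to_merge(states)

# Once FAILED is treated as final, merging can proceed.
unblocked = ready_to_merge(states, FINAL_STATES | {"FAILED"})
```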

Planned Improvements

Improve Merging
Merging
– Dedicated CE/SE for merging (at CERN) being investigated
– Merging job submission to be sped up (at the moment it depends on the number of waiting analysis jobs)
Job Splitting
– Investigate the new AliEn option to select the input files once the job has started; this increases the number of files per job (less merging, more files for event mixing)

Train Statistics
Add consumed CPU and wall time, in total and per job, to the run view
– 2.2 y CPU total
– 3.2 y wall total
– 3.2 h CPU / job
– 4.2 h wall / job
– 4.7 files / job
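
The totals and per-job averages shown in the run view could be derived from per-job records roughly as follows. This is a sketch under assumptions: the record keys (`cpu_seconds`, `wall_seconds`, `n_files`), the function name, and the example numbers are invented, and a year is taken as 365 days.

```python
SECONDS_PER_HOUR = 3600.0
SECONDS_PER_YEAR = 365 * 24 * SECONDS_PER_HOUR  # assumption: 365-day year


def train_statistics(jobs):
    """Summarise CPU time, wall time and input files for one train run.

    `jobs` is a list of dicts with the hypothetical keys
    'cpu_seconds', 'wall_seconds' and 'n_files'.
    """
    n_jobs = len(jobs)
    if n_jobs == 0:
        raise ValueError("no jobs to summarise")
    cpu_total = sum(job["cpu_seconds"] for job in jobs)
    wall_total = sum(job["wall_seconds"] for job in jobs)
    files_total = sum(job["n_files"] for job in jobs)
    return {
        "cpu_years_total": cpu_total / SECONDS_PER_YEAR,
        "wall_years_total": wall_total / SECONDS_PER_YEAR,
        "cpu_hours_per_job": cpu_total / n_jobs / SECONDS_PER_HOUR,
        "wall_hours_per_job": wall_total / n_jobs / SECONDS_PER_HOUR,
        "files_per_job": files_total / n_jobs,
    }


# Invented example records, not real train data.
example_jobs = [
    {"cpu_seconds": 7200, "wall_seconds": 10800, "n_files": 4},
    {"cpu_seconds": 14400, "wall_seconds": 18000, "n_files": 6},
]
stats = train_statistics(example_jobs)
```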

Dataset Selection
Allow users to indicate on the interface which datasets they would like to run on
– Operator marks dataset as "active" (similar to wagons)
– User selects the desired datasets among those
Desired datasets: LHC10h_AOD086, LHC11h_AOD095, …
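
The two-step selection (the operator activates datasets, the user then picks among the active ones) can be sketched as a small data structure. The class and method names below are hypothetical and do not describe the actual train interface; the two dataset names are the ones shown on the slide.

```python
class DatasetRegistry:
    """Tracks which datasets an operator has marked as active (sketch only)."""

    def __init__(self, names):
        # Every dataset starts inactive until the operator enables it.
        self._active = {name: False for name in names}

    def set_active(self, name, active=True):
        """Operator action: mark a known dataset as active or inactive."""
        if name not in self._active:
            raise KeyError(name)
        self._active[name] = active

    def active_datasets(self):
        """Datasets a user is allowed to choose from."""
        return sorted(name for name, on in self._active.items() if on)

    def select(self, requested):
        """User action: accept the request only if every dataset is active."""
        rejected = [n for n in requested if not self._active.get(n, False)]
        if rejected:
            raise ValueError(f"not active: {rejected}")
        return list(requested)


registry = DatasetRegistry(["LHC10h_AOD086", "LHC11h_AOD095"])
registry.set_active("LHC10h_AOD086")
chosen = registry.select(["LHC10h_AOD086"])
```

Requesting `LHC11h_AOD095` in this state raises a `ValueError`, because the operator has not activated it.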

Merging Test
Test also the merging per wagon
[Screenshot: merging test result per wagon, OK / Failed]

Further Ideas
Number of wagons
Enabling/disabling by lists (of wagon numbers / names?)
Saving / loading of train configurations
Groups of wagons
Ordering of wagons
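
One possible shape for enabling and disabling wagons by list, accepting either wagon numbers or names, is sketched below. Everything here is an assumption for illustration: the function names, the 1-based numbering, and the rule that disabling wins over enabling are not taken from the train system.

```python
def resolve_wagon(selector, names):
    """Map a 1-based wagon number or a wagon name to the wagon name."""
    if isinstance(selector, int):
        if not 1 <= selector <= len(names):
            raise IndexError(f"wagon number out of range: {selector}")
        return names[selector - 1]
    if selector not in names:
        raise KeyError(selector)
    return selector


def set_wagons(state, enable=(), disable=()):
    """Return a new {wagon name: enabled} mapping with the lists applied.

    `state` keeps the wagon order; `enable` and `disable` may mix numbers
    and names. Disabling is applied last, so it wins on conflicts.
    """
    names = list(state)
    new_state = dict(state)
    for selector in enable:
        new_state[resolve_wagon(selector, names)] = True
    for selector in disable:
        new_state[resolve_wagon(selector, names)] = False
    return new_state


# Hypothetical wagon names.
wagons = {"wagon_a": True, "wagon_b": True, "wagon_c": False}
updated = set_wagons(wagons, enable=[3], disable=["wagon_a", 2])
```

Returning a new mapping instead of modifying the input keeps the previous configuration available, which would also fit the idea of saving and loading train configurations.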

Demo
…some new features…

Summary
The LEGO train system has become very popular
The average finishing time of a train is 2 days, but with considerable spread
We have many improvement requests and ideas
We lack manpower (there are only Costin and me, both with many other tasks), which sometimes leads to long response times