
1 QA train tests M. Gheata

2 Known problems
QA tasks create too many histograms
– Pushing resident memory above the 3 GB limit
– Train gets kicked out by some sites
– See next slides
Final merging of results was affected by the xml+zlib bug
– Even after several resubmissions, results were not available for some runs
– No OCDB access in merging
Merging is heavy for runs with too many chunks
– Memory in the last merging step gets too big
– Process only 10% of chunks
Some new trigger classes were used recently (CINT7, EMC7, …)
– QA train was set for kMB; now we need separate trains to select these classes
– Not clear how to automatically run the appropriate train for a given run

3 Testing QA wagons
Run a single script on a test machine
– Testing each QA wagon independently on a local dataset
– Producing syswatch.root (or not, if the task is crashing)
Run a macro that plots the memory profiles extracted by the analysis manager
– Doing a linear fit in a specified range (stable regions) and extracting memory value + leak
– Subtracting the baseline given by EventSelection task + OCDB connect
Some jumps in the memory profile seem related to the usage of friends and need to be understood better
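The fit-and-subtract step described on this slide can be sketched as follows. This is a minimal illustration in plain Python, not the macro used for the tests: the function names, the stable-range limits and the memory profiles are all hypothetical, and the profiles are synthetic rather than read from syswatch.root.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = p0 + p1*x; returns (p0, p1)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    p1 = sxy / sxx
    p0 = mean_y - p1 * mean_x
    return p0, p1


def fit_stable_region(events, mem_mb, first, last):
    """Fit only the points whose event number lies in [first, last]."""
    selected = [(e, m) for e, m in zip(events, mem_mb) if first <= e <= last]
    xs = [e for e, _ in selected]
    ys = [m for _, m in selected]
    return linear_fit(xs, ys)


# Synthetic profiles (assumed values, for illustration only).
events = list(range(0, 10001, 500))

# Baseline: flat 400 MB, standing in for EventSelection + OCDB connect.
baseline_mem = [400.0 for _ in events]

# Task: start-up ramp up to event 2000, then a slow leak of 0.002 MB/event.
task_mem = [
    400.0 + 150.0 * e / 2000.0 if e < 2000 else 550.0 + 0.002 * (e - 2000)
    for e in events
]

# Fit both profiles in the same (hypothetical) stable range.
baseline_p0, baseline_p1 = fit_stable_region(events, baseline_mem, 2000, 10000)
task_p0, task_p1 = fit_stable_region(events, task_mem, 2000, 10000)

# Memory attributed to the task itself, and its leak rate.
task_memory_mb = task_p0 - baseline_p0
task_leak_mb_per_event = task_p1 - baseline_p1

print(f"task memory above baseline: {task_memory_mb:.1f} MB")
print(f"task leak: {task_leak_mb_per_event * 100000:.1f} MB per 100K events")
```

Restricting the fit to a stable range keeps the start-up ramp, and any jumps such as those attributed to friends, from biasing the slope.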

4 Current version
Versions tested on pcaliense05
– Root v5-28-00d
– AliRoot v4-21-25-AN-1
Run tested: 152371 from LHC11c
Wagons that did not finish the test:
– ZDC (segfault)
  0x00007f983b12429c in AliAnalysisTaskZDC::UserExec(char const*) at /home/aliroot/aliroot/v4-21-25-AN/PWG1/ZDC/AliAnalysisTaskZDC.cxx:326

5 Resident memory for the train
Train run on 50 local files (~ twice the size of QA jobs)
Some leaks make the memory go above 3 GB

6 Resident memory (1)
Memory profile plots: BASELINE (PhysSel + CDBconnect), SDDdEdx, SDDTPC (!), SPD, VERTEX, VZERO, QAsym

7 Resident memory (2)
Memory profile plots: TRD (jump – due to friends (?)), ITS, ITSsaTracks, ITSalign, CALO, MUONtrig, ImpParRes, MUON tracking (to be redone – typo in my macro)

8 Resident memory (3)
Memory profile plots: TOF, HMPID, T0, ZDC (segfault)

9 Train global plot
Fit parameters (p0 + p1*event) for all tasks in a single plot, in MB
Baseline p0 value subtracted, but not the same for all tasks (those using friends)
p1 (slope) plotted as error bars per 100K events (2x what the train usually processes per job)
Some leaks visible, at a level that does not affect the job (except TPC – more than 200 MB per 100K events)
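Turning the fitted parameters into the quantities shown in the global plot is simple arithmetic; a minimal sketch follows. The task names and fit values are hypothetical placeholders, not measured results. The 200 MB threshold is an assumption that reuses the TPC figure quoted above as a cut-off.

```python
EVENTS_PER_UNIT = 100_000  # the slope is quoted per 100K events on the slide


def summarize_fits(fits, baseline_p0, leak_threshold_mb=200.0):
    """Convert (p0, p1) fit results into per-task summary values.

    fits maps task name -> (p0 in MB, p1 in MB per event).
    Returns task name -> dict with the baseline-subtracted memory,
    the leak per 100K events, and a flag for leaks above the threshold.
    """
    summary = {}
    for task, (p0, p1) in fits.items():
        leak = p1 * EVENTS_PER_UNIT
        summary[task] = {
            "memory_mb": p0 - baseline_p0,
            "leak_mb_per_100k": leak,
            "flagged": leak > leak_threshold_mb,
        }
    return summary


# Hypothetical fit results, for illustration only.
example_fits = {
    "task_A": (546.0, 0.0025),  # leaks about 250 MB per 100K events
    "task_B": (430.0, 0.0001),  # leaks about 10 MB per 100K events
}

summary = summarize_fits(example_fits, baseline_p0=400.0)
flagged = sorted(task for task, row in summary.items() if row["flagged"])

for task, row in summary.items():
    print(
        f"{task}: {row['memory_mb']:.1f} MB above baseline, "
        f"{row['leak_mb_per_100k']:.1f} MB per 100K events"
    )
print("flagged:", flagged)
```

A single baseline p0 is subtracted here for brevity; as the slide notes, tasks that use friends would need their own baseline value.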

10 Performance checks
CPU time vs. I/O
Local dataset on a test machine, processing took 2 days for ~100K events
valgrind --tool=callgrind aliroot -b -q QA.C
61.5 % CPU, 38.5 % I/O (ratio is different if accessing remote files)

11 CPU performance per task
– ITS ImpParRes: 28.2%
– ITS VertexESD: 10.1%
– QASym: 5.7%
– Friends: 10.4%
– ITS align: 5.6%
– ITS tracking: 3.5%
ITS tasks use ~50% of the CPU time
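The "~50%" figure can be checked by summing the ITS entries listed above. The short sketch below does that, assuming every entry labelled "ITS" counts as an ITS task and that the percentages are shares of the same total CPU time.

```python
# CPU shares in percent, copied from the list above.
cpu_share = {
    "ITS ImpParRes": 28.2,
    "ITS VertexESD": 10.1,
    "QASym": 5.7,
    "Friends": 10.4,
    "ITS align": 5.6,
    "ITS tracking": 3.5,
}

# Sum of the entries whose label starts with "ITS".
its_total = sum(v for name, v in cpu_share.items() if name.startswith("ITS"))

# Sum of everything listed; the remainder is spent in tasks not shown.
listed_total = sum(cpu_share.values())

print(f"ITS tasks: {its_total:.1f}% of CPU time")
print(f"all listed entries: {listed_total:.1f}% of CPU time")
```

The four ITS entries add up to 47.4%, consistent with the rounded "~50%" on the slide.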

