WP12 - General Development News
Sandro Wenzel
Software quality / development workflow
ROOT macros
- ROOT macros and compiled libraries quickly go out of sync, which can lead to defective code
- Ran a campaign to check systematically whether the macros in O2 still execute with ROOT
- Fixed many macros that did not run (see https://github.com/AliceO2Group/AliceO2/issues/594)
- A few macros could not be fixed and were renamed to ".C_backup" (please check whether they are still needed)
- From now on, the pull request checker automatically checks all our macros and flags an error when there is a problem
- This makes it much easier to keep the code in a good state
Arbitrary Types with FairRoot/IO
Status quo
- So far, only classes of type TNamed or TCollection could be exchanged with the FairRootManager for persistent IO and for data exchange between tasks
- Registering outgoing data:
    TClonesArray *mHits;
    mHits = new TClonesArray("tpc::Hit");
    iomgr->Register("TPCHit", mHits, true);
- Asking for incoming data:
    mHits = dynamic_cast<TClonesArray*>(iomgr->GetObject("TPCHit"));
- Memory overhead, because
    - data elements need to be TObjects (~16 bytes overhead)
    - data elements are stored as pointers (8 bytes overhead)
- Type-unsafe code: TClonesArray cannot do type checking at compile time
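The per-element overhead quoted above can be illustrated with a small standalone sketch. It does not use ROOT: ObjectLikeBase is a hypothetical stand-in for a TObject-like base class (a vtable pointer plus two 32-bit words), and PlainHit / ObjectLikeHit are made-up hit types. Exact sizes depend on the platform ABI, so the numbers should be read as an estimate only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a TObject-like base class (NOT the real ROOT
// class): a virtual destructor (vtable pointer) plus two 32-bit words.
struct ObjectLikeBase {
  virtual ~ObjectLikeBase() = default;
  std::uint32_t uniqueID = 0;
  std::uint32_t bits = 0;
};

// Hypothetical hit payload stored by value in a contiguous container.
struct PlainHit {
  float x = 0.f, y = 0.f, z = 0.f, energy = 0.f;
};

// The same payload, but deriving from the object-like base class.
struct ObjectLikeHit : ObjectLikeBase {
  float x = 0.f, y = 0.f, z = 0.f, energy = 0.f;
};

// Bytes per element when hits are stored by value (std::vector<PlainHit>).
constexpr std::size_t bytesPerElementByValue() { return sizeof(PlainHit); }

// Bytes per element when the container holds a pointer to each
// object-like hit: the object itself plus the pointer slot.
constexpr std::size_t bytesPerElementByPointer() {
  return sizeof(ObjectLikeHit) + sizeof(ObjectLikeHit*);
}

// Extra bytes spent on nHits hits by the pointer-based layout.
inline std::size_t extraBytes(std::size_t nHits) {
  // Guard against unsigned underflow on an unusual platform.
  assert(bytesPerElementByPointer() >= bytesPerElementByValue());
  return nHits * (bytesPerElementByPointer() - bytesPerElementByValue());
}
```

Calling extraBytes(1) gives the per-hit overhead on the platform at hand; on common 64-bit systems it is expected to be in the region of the 16 + 8 bytes mentioned above.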
Arbitrary Types with FairRoot/IO
New feature
- A pull request has been made to FairRoot which extends the FairRootManager to handle branches of any type for IO and data exchange
- Hits can now be stored in STL containers
- Registering typed outgoing data:
    std::vector<tpc::Hit> *mHits;
    iomgr->RegisterAs("TPCHit", mHits, true);
- Asking for typed incoming data:
    mHits = iomgr->InitObjectAs<std::vector<tpc::Hit>*>("TPCHit");
- No negative effect on the split level of persistent branches
- Feedback/comments on the PR are welcome
- Once the PR is accepted, I would like to move completely away from TClonesArrays in favour of std::vectors
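To illustrate the idea of typed registration and retrieval, here is a minimal standalone sketch. It is not the FairRootManager implementation from the pull request: TypedRegistry, registerAs and getAs are hypothetical names, and the sketch only shows how a type tag stored next to the pointer lets the manager reject a request for the wrong type instead of relying on an unchecked cast.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <vector>

namespace tpc {
// Hypothetical hit type used only for this illustration.
struct Hit {
  float x = 0.f, y = 0.f, z = 0.f;
};
}  // namespace tpc

// Hypothetical, simplified registry (NOT the FairRootManager API).
class TypedRegistry {
 public:
  // Store a non-owning pointer together with the static type of the object.
  template <typename T>
  void registerAs(const std::string& name, T* object) {
    assert(object != nullptr);
    mEntries.erase(name);  // allow re-registration under the same name
    mEntries.emplace(name, Entry{object, std::type_index(typeid(T))});
  }

  // Return the object only if it was registered with exactly type T;
  // return nullptr for unknown names or mismatching types.
  template <typename T>
  T* getAs(const std::string& name) const {
    auto it = mEntries.find(name);
    if (it == mEntries.end() || it->second.type != std::type_index(typeid(T))) {
      return nullptr;
    }
    return static_cast<T*>(it->second.object);
  }

 private:
  struct Entry {
    void* object;
    std::type_index type;
  };
  std::map<std::string, Entry> mEntries;
};
```

With this sketch, a task that registers a std::vector<tpc::Hit> under "TPCHit" gets the same pointer back from getAs<std::vector<tpc::Hit>>("TPCHit"), while a request with any other type yields nullptr.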
Lazy loading of branches with FairRoot analysis runs
- Simulation writes the hits of all detectors into the same tree (different branches)
- This can quickly lead to very large trees/files
    - Example: 4 PbPb HIJING events with ~15000 primaries each lead to ~1.7GB on disk
- When processing these hits (digitization), the standard FairRunAna class loads every single branch for a given event into memory
    - Causes large memory buffers to be allocated (~3GB for the above example)
- This can be a substantial overhead in cases where we do not consume all of these data, for example:
    - Digitize only ITS hits
    - Digitize only one TPC sector
Lazy loading of branches with FairRoot analysis runs
- Proposing an addition to FairRootManager (PR636) that keeps track of which processing task needs which data, plus a prototype modification of FairRunAna (O2RunAna)
- Now able to load only the branches actually needed for processing
- May lead to a large gain in memory and CPU time:
    tpc-run-sim -m digi -n 4 --dSector 2   # digitize only sector 2
    loading all branches:          MaxMemory: 3742.56MB   CPU time 21.08s
    loading only needed branches:  MaxMemory: 789.52MB    CPU time 7.66s
- Definitely useful for rapid prototyping and profiling of algorithms
- A step towards organizing data and processing for efficient parallel device-based processing
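The bookkeeping idea (each task declares the branches it consumes, and the run loads only the union of what the active tasks need) can be sketched as follows. This is a standalone illustration under assumed names: BranchRequests, request and branchesToLoad are hypothetical and do not reflect the interface proposed in the FairRoot pull request.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical bookkeeping of "which task needs which branch".
class BranchRequests {
 public:
  // Record that a task consumes a given branch.
  void request(const std::string& task, const std::string& branch) {
    assert(!task.empty() && !branch.empty());
    mRequests[task].insert(branch);
  }

  // Union of all branches requested by the given active tasks.
  // Branches outside this set would not need to be read at all.
  std::set<std::string> branchesToLoad(
      const std::vector<std::string>& activeTasks) const {
    std::set<std::string> needed;
    for (const auto& task : activeTasks) {
      auto it = mRequests.find(task);
      if (it != mRequests.end()) {
        needed.insert(it->second.begin(), it->second.end());
      }
    }
    return needed;
  }

 private:
  std::map<std::string, std::set<std::string>> mRequests;
};
```

In a ROOT-based run class, the resulting set could then be used to decide which branches to read per event; how exactly that is wired into FairRunAna is left to the actual pull request.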
MC labels
- The issue of MC labels was discussed some time ago
- TPC has restructured its digitization to make use of external MC labels
    - Allows arbitrarily many labels per data element
    - Labels are stored in a separate branch; there is no longer a distinction between DigitMC and DigitRaw
- Will report in one of the next meetings…
- For now, interested developers can take a look here:
    - PR #578: shows how the final label branch is filled
    - PR #586: generalization allowing labels to be filled in random data order
- Feedback welcome!
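As a rough illustration of external labels, the sketch below keeps labels in a container separate from the digits, allows any number of labels per data element, and accepts them in arbitrary index order. MCLabel and LabelContainer are hypothetical names chosen for this example; the actual data layout used in PR #578 and PR #586 may well differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical label: which event and which track contributed.
struct MCLabel {
  int eventID;
  int trackID;
};

// Hypothetical external label store, kept apart from the digits.
// Entry i holds all labels belonging to data element i.
class LabelContainer {
 public:
  // Attach a label to a data element; indices may arrive in any order.
  void addLabel(std::size_t dataIndex, const MCLabel& label) {
    if (dataIndex >= mLabels.size()) {
      mLabels.resize(dataIndex + 1);
    }
    mLabels[dataIndex].push_back(label);
  }

  // All labels of one data element (possibly none).
  const std::vector<MCLabel>& labelsOf(std::size_t dataIndex) const {
    assert(dataIndex < mLabels.size());
    return mLabels[dataIndex];
  }

  // Number of data elements covered so far.
  std::size_t size() const { return mLabels.size(); }

 private:
  std::vector<std::vector<MCLabel>> mLabels;
};
```

Because the digits themselves carry no label members in such a scheme, the same digit type can serve both simulated and raw data.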