Statistical feature extraction, calibration and numerical debugging
Marian Ivanov
Outline: Motivation; Requirements; TTreeStream and TTreeSRedirector classes; Conclusion
Reconstruction
Reconstruction is statistical data analysis: the transformation from the space of measurements to the space of physical observables.
Reconstruction is an iterative process in two senses:
- the reconstruction algorithm itself; simplified TPC example: raw data -> clusters -> tracks -> V0s, kinks
- the development of the reconstruction algorithms: starting from simplified models (iteration 0) -> { adding new information -> extracting new features -> new parameterizations -> } towards MIP algorithms
Feedback
Feedback is necessary in each iteration step.
Feedback of the type "working / not working" is provided by standard tools: segmentation violations, printf, the debugger, memory profilers, memory checkers.
For statistical algorithms this is not sufficient: a statistical algorithm has to be debugged in a statistical way.
Feedback is needed in the multidimensional space of observables: decomposition and localization of the problem in terms of observables (efficiency, resolution, ...). Better integral characteristics follow as a consequence. Feedback for iteration 0:
- standard ROOT tools: tree player, histogramming package, statistical package (extended functionality needed, under development in ROOT and by MI)
- ALICE event display (with extended functionality)
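Decomposing feedback into observables can be illustrated with efficiency. The following is a minimal C++ sketch, not ALICE or ROOT code: the function name binnedEfficiency and the per-bin counts are hypothetical, and the error is the simple binomial estimate sqrt(eff(1-eff)/N), which degenerates to zero at efficiencies of exactly 0 or 1. Computing the efficiency per bin of an observable (for example pt) instead of as one integral number localizes a problem to the region where it occurs.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical illustration: per-bin tracking efficiency with a simple
// binomial error, eff = found/foundable, sigma = sqrt(eff*(1-eff)/foundable).
struct EfficiencyBin {
  double eff;    // fraction of foundable tracks that were found
  double sigma;  // binomial standard error (0 for empty bins)
};

// found[i] and foundable[i] are counts in bin i of some observable (e.g. pt).
std::vector<EfficiencyBin> binnedEfficiency(const std::vector<int>& found,
                                            const std::vector<int>& foundable) {
  assert(found.size() == foundable.size());
  std::vector<EfficiencyBin> out;
  out.reserve(found.size());
  for (std::size_t i = 0; i < found.size(); ++i) {
    assert(found[i] >= 0 && found[i] <= foundable[i]);
    if (foundable[i] == 0) {
      out.push_back({0.0, 0.0});  // no information in this bin
      continue;
    }
    const double n = foundable[i];
    const double eff = found[i] / n;
    out.push_back({eff, std::sqrt(eff * (1.0 - eff) / n)});
  }
  return out;
}
```

An integral efficiency of 70% over the first two bins in the test below would hide that one bin is at 90% and the other at 50%.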
Effective development of reconstruction algorithms
The time for reconstruction algorithm development is spent on:
- reconstruction algorithm coding
- test algorithm coding (numerical debugging)
- statistical feature extraction (calibration, alignment) algorithm coding
Time is consumed for all three kinds of algorithms by debugging of the test algorithms and by the feedback data analysis (comparisons).
Where do we spend time? (0)
Reconstruction algorithm coding: <<1% of the development time.
Coding of the other algorithms (numerical debugging, feature extraction / calibration):
- implementation of loops over heterogeneous containers (AliESDs, TreeKine, TreeHits, TreeTR); the current default approach in the AliRoot framework is copy-paste
- debugging of the test algorithms
Where do we spend time? (1)
Statistical analysis (tests and feature extraction / calibration):
- data access: loops over heterogeneous containers (the n², n³ problem), >99% of the time
- statistical data analysis
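The n², n³ problem arises when containers are connected by nested loops over all of their entries. A minimal C++ sketch, assuming hypothetical simplified records McParticle and RecTrack that share an MC label (the real AliESDs and TreeKine contents are much richer): indexing one container by label once replaces the O(nMC*nRec) nested loop with a single pass that is O(nMC + nRec) on average. The sketch assumes MC labels are unique.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified records; real containers hold much more.
struct McParticle { int label; double pt; };
struct RecTrack   { int label; double pt; };  // label < 0: fake / unmatched

// One matched (MC, reconstructed) pair, ready to be written out for analysis.
struct MatchedPair { std::size_t mcIndex; std::size_t recIndex; };

// Naive nested loop: O(nMC * nRec) label comparisons.
std::vector<MatchedPair> matchNested(const std::vector<McParticle>& mc,
                                     const std::vector<RecTrack>& rec) {
  std::vector<MatchedPair> out;
  for (std::size_t i = 0; i < mc.size(); ++i)
    for (std::size_t j = 0; j < rec.size(); ++j)
      if (rec[j].label == mc[i].label) out.push_back({i, j});
  return out;
}

// Index the MC container once by label, then make a single pass over the
// reconstructed tracks: O(nMC + nRec) on average.
std::vector<MatchedPair> matchIndexed(const std::vector<McParticle>& mc,
                                      const std::vector<RecTrack>& rec) {
  std::unordered_map<int, std::size_t> byLabel;
  byLabel.reserve(mc.size());
  for (std::size_t i = 0; i < mc.size(); ++i) {
    const bool inserted = byLabel.emplace(mc[i].label, i).second;
    assert(inserted && "MC labels are assumed to be unique");
    (void)inserted;  // silence unused-variable warnings when NDEBUG is set
  }
  std::vector<MatchedPair> out;
  for (std::size_t j = 0; j < rec.size(); ++j) {
    auto it = byLabel.find(rec[j].label);
    if (it != byLabel.end()) out.push_back({it->second, j});
  }
  return out;
}
```

Both functions find the same pairs; the nested version lists them in MC order, the indexed version in reconstruction order. With a third container (hits), the nested approach grows to three loops, while the indexed approach adds one more linear pass.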
Requirements
To speed up the reconstruction algorithm development and the calibration process, the test and statistical feature extraction (calibration) algorithms should be:
- reusable (well tested by a group of users)
- standardized and supported by the framework
Supporting data structures:
- uncomplicated data storage
- fast (multidimensional) data access
- scalable (easy to include new information, observables)
- fast and universal query language over the data
ROOT framework (0)
The ROOT framework provides classes whose functionality fulfils our requirements for data structures:
- uncomplicated data storage: TTree
- fast (multidimensional) data access: TTree and TChain, optimized for sequential data access (random access is much slower)
- scalable (easy to include new information, observables): several branches of information are easy to include, and friend TTrees can be used
- fast and universal query language over the data: TTreePlayer is a powerful query language, and object functionality is preserved
ROOT framework (1)
The ROOT framework provides classes whose functionality fulfils our requirements for statistical algorithms (standardized, reusable, well tested):
- histogramming package
- statistical package: basic algorithms implemented, development ongoing
- additional functionality on top of TTrees implemented in the ALICE team (efficiency and resolution calculations)
- some very important robust algorithms implemented independently (a 1-dimensional robust spline fit; a multidimensional version is needed but not yet implemented)
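As a simple illustration of why robust estimators matter, the sketch below computes a truncated mean: the values are sorted and only a central fraction is averaged, so a few outliers in the tails no longer pull the estimate. This generic technique is chosen for brevity; it is not the robust spline fit mentioned above, and the function name truncatedMean is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Truncated mean: sort, keep the central `keep` fraction (0 < keep <= 1)
// of the values and average them. Values far in the tails, such as wrongly
// assigned clusters, are excluded from the estimate.
double truncatedMean(std::vector<double> values, double keep) {
  assert(!values.empty() && keep > 0.0 && keep <= 1.0);
  std::sort(values.begin(), values.end());
  const std::size_t n = values.size();
  std::size_t nKeep = static_cast<std::size_t>(keep * n + 0.5);  // round
  if (nKeep == 0) nKeep = 1;                  // always keep at least one value
  const std::size_t first = (n - nKeep) / 2;  // drop equally from both tails
  double sum = 0.0;
  for (std::size_t i = first; i < first + nKeep; ++i) sum += values[i];
  return sum / static_cast<double>(nKeep);
}
```

For the values {1, 2, 3, 4, 100}, the plain mean is 22, whereas keeping the central 60% gives the mean of {2, 3, 4}, that is 3.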
Numerical debugging (0)
The possibilities for undetected errors (bugs) in reconstruction and Monte Carlo algorithms are numerous. A complex system leads to complex calculations, and errors can be made on many levels:
- logical understanding of the problem
- typing errors in the programs
- inconsistent data
Numerical debugging (1)
The basic principle is to output not only the number we are interested in but also as many other intermediate results as possible, especially those for which we know in advance what answer to expect. Even if we are only interested in the global average of some quantity, we should print out its dependence on some other interesting quantity. This generally costs little or nothing extra in a big calculation, and it may give considerable insight into the system being studied or allow a powerful check of the correctness of the computation. The quantities to look at depend on the problem, but the general rule is to examine the quantities of interest in more dimensions than required.
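The principle can be shown with a hypothetical residual that depends linearly on another variable, for example a local track angle. The global average of such a residual can be zero, while its mean in bins of the second variable reveals the dependence. The following C++ sketch computes per-bin means, a minimal analog of what a profile histogram (ROOT TProfile) shows; the function names profileMeans and globalMean are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Mean of y in nBins equal-width bins of x over [xMin, xMax).
// Entries outside the range are ignored; empty bins give NaN.
std::vector<double> profileMeans(const std::vector<double>& x,
                                 const std::vector<double>& y,
                                 std::size_t nBins, double xMin, double xMax) {
  assert(x.size() == y.size() && nBins > 0 && xMax > xMin);
  std::vector<double> sum(nBins, 0.0);
  std::vector<std::size_t> count(nBins, 0);
  const double width = (xMax - xMin) / static_cast<double>(nBins);
  for (std::size_t i = 0; i < x.size(); ++i) {
    if (x[i] < xMin || x[i] >= xMax) continue;
    std::size_t bin = static_cast<std::size_t>(std::floor((x[i] - xMin) / width));
    if (bin >= nBins) bin = nBins - 1;  // guard against rounding at the upper edge
    sum[bin] += y[i];
    ++count[bin];
  }
  std::vector<double> mean(nBins, std::numeric_limits<double>::quiet_NaN());
  for (std::size_t b = 0; b < nBins; ++b)
    if (count[b] > 0) mean[b] = sum[b] / static_cast<double>(count[b]);
  return mean;
}

// The single integral number we would otherwise look at.
double globalMean(const std::vector<double>& y) {
  assert(!y.empty());
  double s = 0.0;
  for (double v : y) s += v;
  return s / static_cast<double>(y.size());
}
```

For residuals y = {-0.2, -0.1, 0.1, 0.2} at x = {-1, -0.5, 0.5, 1}, the global mean is zero within rounding, while two bins over [-1.5, 1.5) give means of about -0.15 and +0.15, exposing the x dependence.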
TTreeStream & TTreeSRedirector (0)
A streamer combining a basic stream interface (operator<<, as used with std::cout) with TTree functionality, implemented to speed up the software development process.
Advantages: data structures are defined on the fly; new information is easy to include; full TTree functionality.
Extensively used during the development of the ITS, TPC, TRD and TOF reconstruction and alignment.
Code committed to CVS.
Examples and test functions: TTreeStream::Test() and TTreeSRedirector::Test() in TTreeStream.cxx
TTreeStream & TTreeSRedirector (1)
Example:
Create the redirector associated with a file (testredirector.root):
   TTreeSRedirector *pmistream = new TTreeSRedirector("testredirector.root");
   TTreeSRedirector &mistream = *pmistream;
Fill a tree:
- the tree identifier has to be specified as the first argument; the layout is specified by the sequence of the following arguments
- if the tree and layout were already defined, the consistency is checked; if the data are consistent, the given tree is filled
- the name of a branch can be specified using a string with = at the end; if no name is specified, the automatic convention B0, B1, ... Bn is used
   mistream<<"TreeIdentifier"<<"i="<<i<<"ch="<<ch<<"f="<<f<<"po="<<po<<"\n";
   delete pmistream;  // writes the trees to the file and closes it
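To make the streaming convention concrete, here is a ROOT-free C++ sketch of the same usage pattern: the first string selects the tree, strings ending in = name the next value, unnamed values get the automatic names B0, B1, ..., the first filled row fixes the layout, and "\n" fills the entry after a consistency check. All names (MiniRedirector, MiniTree, Row) are hypothetical; values are kept as doubles in memory and only int and double arguments are supported, whereas the real TTreeStream fills typed branches of a TTree in a file.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for a tree: a fixed layout plus rows.
struct MiniTree {
  std::vector<std::string> layout;        // branch names, fixed by the first row
  std::vector<std::vector<double>> rows;  // one vector per filled entry
};

class MiniRedirector {
 public:
  // Collects one entry; lives until the end of the streaming statement.
  class Row {
   public:
    explicit Row(MiniTree& tree) : tree_(tree) {}
    Row& operator<<(const char* s) {      // "name=" tag or end of entry "\n"
      const std::string tag(s);
      if (tag == "\n") { fill(); return *this; }
      if (tag.empty() || tag.back() != '=')
        throw std::invalid_argument("expected \"name=\" or \"\\n\": " + tag);
      pending_ = tag.substr(0, tag.size() - 1);  // strip the trailing '='
      return *this;
    }
    Row& operator<<(double v) { add(v); return *this; }
    Row& operator<<(int v) { add(v); return *this; }

   private:
    void add(double v) {
      // Unnamed values get automatic names B0, B1, ... by position.
      names_.push_back(pending_.empty() ? "B" + std::to_string(names_.size())
                                        : pending_);
      values_.push_back(v);
      pending_.clear();
    }
    void fill() {
      if (tree_.rows.empty()) tree_.layout = names_;  // first entry defines layout
      else if (names_ != tree_.layout)
        throw std::logic_error("layout inconsistent with existing tree");
      tree_.rows.push_back(values_);
      names_.clear(); values_.clear(); pending_.clear();
    }
    MiniTree& tree_;
    std::string pending_;
    std::vector<std::string> names_;
    std::vector<double> values_;
  };

  // The first string selects (or creates) the tree by its identifier.
  Row operator<<(const char* treeId) { return Row(trees_[treeId]); }
  const MiniTree& tree(const std::string& id) const { return trees_.at(id); }

 private:
  std::map<std::string, MiniTree> trees_;
};
```

For example, `r << "T" << "i=" << i << "f=" << 0.5 * i << "\n";` in a loop fills tree T with branches i and f, `r << "U" << 1 << 2.0 << "\n";` creates tree U with branches B0 and B1, and a later entry for T with a different layout throws std::logic_error. The design keeps per-statement state in a temporary Row, so each streaming statement describes exactly one entry.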
TRD real life example (0)
AliTRDtracker::FindTracklet(AliTRDtrack *track)
{
  //
  // algorithm ...
  if (DebugMode || AlignmentMode)
    cstream<<"tracklet"<<
      "track.="<<track<<                 // track parameters
      "tany="<<tany<<                    // tangent of the local track angle
      "xmean="<<xmean<<                  // xmean - reference x of tracklet
      "tilt="<<h01<<                     // tilt angle
      "nall="<<nall<<                    // number of foundable clusters
      "nfound="<<nfound<<                // number of found clusters
      "clfound="<<clfound<<              // total number of found clusters in road
      "mpads="<<mpads<<                  // mean number of pads per cluster
      "plane="<<plane<<                  // plane number
      "road="<<road<<                    // the width of the used road
      "graph0.="<<&graph0<<              // x - y = dy for closest cluster
      "graph1.="<<&graph1<<              // x - y = dy for second closest cluster
      "graphy.="<<&graphy<<              // y position of the track
      "graphz.="<<&graphz<<              // z position of the track
      "fCl.="<<&array0<<                 // closest cluster
      "fCl2.="<<&array1<<                // second closest cluster
      //…….//
      "angle0="<<angle[0]<<              // angle deviation in iteration number 0
      "sangle0="<<sangle[0]<<            // sigma of angular deviation in iteration number 0
      "angleb="<<angle[bestiter]<<       // angle deviation in the best iteration
      "sangleb="<<sangle[bestiter]<<     // sigma of angular deviation in the best iteration
      //
      "expectederr="<<expectederr<<      // expected error of cluster position
      "\n";
}
TRD real life example - Analysis (1)
  .L AliGenInfo.C+
  .L AliESDComparisonMI.C+
  .L AliTRDComparison.C+
  MakeTree();     // connect MC information with information retrieved during reconstruction (if MC information is available)
  MakeComparison
  Comp.DrawPoolsY(MC cuts, quality cuts)
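Comparing reconstructed and MC quantities is commonly done with pull distributions. A minimal C++ sketch with hypothetical names, which does not reproduce the AliRoot comparison macros: for each track the pull (reconstructed - true)/sigma is computed, where sigma is the error estimated by the reconstruction. If the error estimate is correct and the residuals are Gaussian, the pulls have a mean close to 0 and an RMS close to 1; an RMS well above or below 1 indicates under- or overestimated errors.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct PullSummary { double mean; double rms; };

// Mean and RMS (standard deviation) of pull = (reco - truth) / sigma.
PullSummary pullSummary(const std::vector<double>& reco,
                        const std::vector<double>& truth,
                        const std::vector<double>& sigma) {
  assert(!reco.empty() && reco.size() == truth.size() &&
         reco.size() == sigma.size());
  double sum = 0.0, sum2 = 0.0;
  for (std::size_t i = 0; i < reco.size(); ++i) {
    assert(sigma[i] > 0.0);
    const double pull = (reco[i] - truth[i]) / sigma[i];
    sum += pull;
    sum2 += pull * pull;
  }
  const double n = static_cast<double>(reco.size());
  const double mean = sum / n;
  // Clamp at 0: rounding can make the variance estimate slightly negative.
  const double var = std::max(0.0, sum2 / n - mean * mean);
  return {mean, std::sqrt(var)};
}
```

In practice the pulls would also be inspected in bins of other variables (momentum, angle), following the numerical debugging rule of looking in more dimensions than required.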
TOF real life example (0)
Float_t AliTOFtracker::GetLinearDistances(AliTOFtrack *track, AliTOFcluster *cluster, Float_t distances[5])
{
  // algorithm ...
  if (DebugMode || AlignmentMode) {
    cstream<<"Tracks"<<
      "TOF.="<<track<<
      "Cx="<<cpos0[0]<<
      "Cy="<<cpos0[1]<<
      "Cz="<<cpos0[2]<<
      "Dist="<<k<<
      "Dist0="<<distances[0]<<
      "Dist1="<<distances[1]<<
      "Dist2="<<distances[2]<<
      "TDC="<<tdc<<
      "\n";
  }
  // ...
}
TRD and TOF real life example
To get access to the same information using the standard schema:
- define a data structure (~1000 lines of code) and place it somewhere; hundreds of data classes are needed
- global trees have to be defined at some point, so that they can be accessed in member function code
- SetBranchAddress is necessary
No problems with the AliRoot Code Checker and Smell checker. Direct access to raw data is, however, very suspicious according to the Smell checker (bad design).
Conclusion
To speed up and improve reconstruction algorithm development, standardized tools for numerical debugging and feature extraction have to be implemented as an integral part of the framework.
TTreeStream and TTreeSRedirector are implemented as a first attempt at such standardization.