Experience in ALICE – Analysis Framework and Train
Andreas Morsch, CERN
Analysis in ALICE
Three main analysis modes: prompt data processing (calibration, alignment, reconstruction, analysis) at CERN with PROOF; analysis on local PROOF clusters; batch analysis on the Grid infrastructure.
Plans for Scheduled Analysis
Scheduled analysis is the most efficient way for many analysis tasks to read and process the full data set, in particular when resources are scarce: the data are read once for all tasks, which optimises the CPU/IO ratio. It also helps to develop a common, well-tested framework for analysis, builds a common knowledge base and terminology, documents the analysis procedure and makes results reproducible.
Plans for scheduled analysis: analysis train producing AODs
[Diagram: ESD/AOD input and Monte Carlo truth feed TASK 1, TASK 2, TASK …, TASK N, which produce AOD output; acceptance and efficiency correction services are also shown.]
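The train idea can be illustrated in plain C++: each event is read once and handed to every task in the train, so the I/O cost is paid once instead of once per task. This is a minimal, self-contained sketch; `Event`, `Task`, `HighPtCounter` and `Train` are hypothetical stand-ins, not AliRoot classes.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an event read from ESD/AOD.
struct Event {
    int id;
    std::vector<double> trackPt;  // transverse momenta of the tracks
};

// Hypothetical task interface: every task sees every event.
class Task {
public:
    virtual ~Task() = default;
    virtual void Exec(const Event& ev) = 0;
};

// Example task: counts tracks above a pt threshold.
class HighPtCounter : public Task {
public:
    explicit HighPtCounter(double cut) : fCut(cut) {}
    void Exec(const Event& ev) override {
        for (double pt : ev.trackPt)
            if (pt > fCut) ++fCount;
    }
    int Count() const { return fCount; }
private:
    double fCut;
    int fCount = 0;
};

// The train: one pass over the input, each event passed to all tasks.
class Train {
public:
    void Add(std::shared_ptr<Task> t) { fTasks.push_back(std::move(t)); }
    // Returns the number of event reads performed.
    int Run(const std::vector<Event>& input) {
        int reads = 0;
        for (const Event& ev : input) {  // single pass over the data
            ++reads;                       // one "read" per event, shared by all tasks
            for (auto& t : fTasks) t->Exec(ev);
        }
        return reads;
    }
private:
    std::vector<std::shared_ptr<Task>> fTasks;
};
```

With N tasks in the train the input is read once rather than N times, which is the CPU/IO argument made for scheduled analysis.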
What the Analysis Framework does in ALICE
Transparent access to all computing resources with the same code: local, AliEn Grid, CAF/PROOF. Transparent access to different inputs: ESD, AOD, kinematics tree (MC truth). Support for "scheduled" analysis. A common, well-tested environment to run several tasks.
Solutions
Transparent access to computing resources: hide the code that depends on the computing scheme in one manager class. Transparent access to data: make intensive use of interfaces (VEventHandler, VEvent, VTrack).
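The interface idea can be sketched as follows: user code is written only against abstract event and track interfaces, so the same function works for ESD-like and AOD-like data. The classes below are simplified, hypothetical stand-ins named after the slide's VEvent/VTrack; they are not the real AliVEvent/AliVTrack declarations.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Simplified, hypothetical track and event interfaces.
class VTrack {
public:
    virtual ~VTrack() = default;
    virtual double Pt() const = 0;
};

class VEvent {
public:
    virtual ~VEvent() = default;
    virtual std::size_t GetNumberOfTracks() const = 0;
    virtual const VTrack& GetTrack(std::size_t i) const = 0;
};

// "ESD-like" track: stores momentum components.
class EsdLikeTrack : public VTrack {
public:
    EsdLikeTrack(double px, double py) : fPx(px), fPy(py) {}
    double Pt() const override { return std::sqrt(fPx * fPx + fPy * fPy); }
private:
    double fPx, fPy;
};

// "AOD-like" track: stores pt directly.
class AodLikeTrack : public VTrack {
public:
    explicit AodLikeTrack(double pt) : fPt(pt) {}
    double Pt() const override { return fPt; }
private:
    double fPt;
};

// Event container for any concrete track type.
template <typename TrackT>
class SimpleEvent : public VEvent {
public:
    void Add(const TrackT& t) { fTracks.push_back(t); }
    std::size_t GetNumberOfTracks() const override { return fTracks.size(); }
    const VTrack& GetTrack(std::size_t i) const override { return fTracks.at(i); }
private:
    std::vector<TrackT> fTracks;
};

// User code sees only the interfaces, so it runs unchanged on both formats.
double SumPt(const VEvent& ev) {
    double sum = 0.0;
    for (std::size_t i = 0; i < ev.GetNumberOfTracks(); ++i)
        sum += ev.GetTrack(i).Pt();
    return sum;
}
```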
AliAnalysis… Framework
A data-oriented model composed of independent tasks; task execution is triggered by data readiness. Tasks are owned and managed by the AliAnalysisManager. Parallel execution and the event loop are implemented via the TSelector functionality, which is mandatory for use with PROOF.
[Diagram: an AliAnalysisTask with input slots INPUT 0 and INPUT 1 and output slot OUTPUT 0, connected to data containers CONT 0, CONT 1 and CONT 2.]
N.B.: The analysis framework itself has a very general design and is not bound to ALICE software. (Slide: A. Gheata)
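The "execution triggered by data readiness" rule can be sketched with a toy manager: posting data to a container marks it ready, and any task whose inputs are all ready runs and posts its own output, which may trigger further tasks. This is a hypothetical model; `Container`, `TaskNode`, `Manager` and `PostData` are illustrative names, not the AliAnalysisManager API.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// A container holds at most one value and a readiness flag.
struct Container {
    bool ready = false;
    double value = 0.0;
};

struct TaskNode {
    std::string name;
    std::vector<std::string> inputs;  // names of input containers
    std::string output;               // name of the output container
    std::function<double(const std::vector<double>&)> work;
    bool done = false;
};

class Manager {
public:
    void AddContainer(const std::string& n) { fConts[n] = Container{}; }
    void AddTask(TaskNode t) { fTasks.push_back(std::move(t)); }

    // Marks the container ready and runs every task whose inputs are now available.
    void PostData(const std::string& cont, double v) {
        fConts.at(cont).ready = true;
        fConts.at(cont).value = v;
        for (TaskNode& t : fTasks) {
            if (t.done || !AllReady(t)) continue;
            std::vector<double> args;
            for (const auto& in : t.inputs) args.push_back(fConts.at(in).value);
            t.done = true;                     // mark before posting the output
            fOrder.push_back(t.name);
            PostData(t.output, t.work(args));  // the output may trigger other tasks
        }
    }
    double Value(const std::string& cont) const { return fConts.at(cont).value; }
    const std::vector<std::string>& Order() const { return fOrder; }

private:
    bool AllReady(const TaskNode& t) const {
        for (const auto& in : t.inputs)
            if (!fConts.at(in).ready) return false;
        return true;
    }
    std::map<std::string, Container> fConts;
    std::vector<TaskNode> fTasks;
    std::vector<std::string> fOrder;  // execution order, for inspection
};
```

The execution order follows the data flow, not the order in which tasks were registered.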
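AliAnalysisManager and PROOF: The Manager in Disguise
[Diagram: AliAnalysisSelector inherits from TSelector; the AliAnalysisManager creates the AliAnalysisSelector, which delegates its work back to the AliAnalysisManager.]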
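The "manager in disguise" pattern can be sketched as a selector that only forwards each hook to the manager, so the loop driver (a chain or PROOF) never needs to know about tasks. `SelectorBase`, `RunLoop`, `AnalysisSelector` and `Manager` below are simplified, hypothetical stand-ins; the real TSelector interface has more hooks and different signatures.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the TSelector protocol seen by the loop driver.
class SelectorBase {
public:
    virtual ~SelectorBase() = default;
    virtual void SlaveBegin() = 0;
    virtual bool Process(long entry) = 0;
    virtual void Terminate() = 0;
};

// Plays the role of Chain->Process(selector).
inline void RunLoop(SelectorBase& sel, long nEntries) {
    sel.SlaveBegin();
    for (long i = 0; i < nEntries; ++i) sel.Process(i);
    sel.Terminate();
}

class Manager;  // forward declaration

// The selector forwards every hook to the manager.
class AnalysisSelector : public SelectorBase {
public:
    explicit AnalysisSelector(Manager* m) : fMgr(m) {}
    void SlaveBegin() override;
    bool Process(long entry) override;
    void Terminate() override;
private:
    Manager* fMgr;
};

class Manager {
public:
    // Creates the selector and hands it to the loop driver.
    void StartAnalysis(long nEntries) {
        AnalysisSelector sel(this);
        RunLoop(sel, nEntries);
    }
    void Init() { fLog.push_back("init"); }
    void ExecAll(long entry) { fSum += entry; }  // stands in for running all tasks
    void Finish() { fLog.push_back("terminate"); }
    long Sum() const { return fSum; }
    const std::vector<std::string>& Log() const { return fLog; }
private:
    long fSum = 0;
    std::vector<std::string> fLog;
};

void AnalysisSelector::SlaveBegin() { fMgr->Init(); }
bool AnalysisSelector::Process(long entry) { fMgr->ExecAll(entry); return true; }
void AnalysisSelector::Terminate() { fMgr->Finish(); }
```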
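AliAnalysisManager – PROOF mode
[Diagram: on the client, MyAnalysis.C calls AM->StartAnalysis("proof"); the AliAnalysisSelector (a TSelector) sends the analysis manager with its tasks (task1, task2, task3, …, taskN) and the input list to the PROOF master, which distributes the input chain to the workers. On each worker, SlaveBegin() sets up the manager and its tasks, Process() runs the tasks on the inputs, and SlaveTerminate() puts the outputs (O1, O2, …, On) into the output list, which is returned to the client, where Terminate() is called.]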
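The PROOF-mode flow can be mimicked sequentially: each worker gets its own copy of the task, processes one packet of the input, and the per-worker outputs are merged for the client. This is only a hypothetical sketch; `Histogram`, `CountingTask` and `RunProofLike` are illustrative names, and real PROOF packetizing and merging are more elaborate.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal mergeable output object.
struct Histogram {
    std::vector<int> bins;
    explicit Histogram(std::size_t n = 0) : bins(n, 0) {}
    void Fill(std::size_t bin) { if (bin < bins.size()) ++bins[bin]; }
    void Add(const Histogram& other) {  // merging step
        for (std::size_t i = 0; i < bins.size() && i < other.bins.size(); ++i)
            bins[i] += other.bins[i];
    }
    int Entries() const { int s = 0; for (int b : bins) s += b; return s; }
};

struct CountingTask {
    Histogram output = Histogram(4);
    void SlaveBegin() { output = Histogram(4); }  // fresh output on each worker
    void Process(int value) { output.Fill(static_cast<std::size_t>(value % 4)); }
};

// Splits the input into contiguous packets, one per worker (nWorkers > 0),
// and merges the worker outputs into one result.
Histogram RunProofLike(const CountingTask& prototype, const std::vector<int>& input,
                       std::size_t nWorkers) {
    Histogram merged(4);
    std::size_t chunk = (input.size() + nWorkers - 1) / nWorkers;
    for (std::size_t w = 0; w < nWorkers; ++w) {
        CountingTask worker = prototype;  // copy "shipped" to the worker
        worker.SlaveBegin();
        for (std::size_t i = w * chunk; i < input.size() && i < (w + 1) * chunk; ++i)
            worker.Process(input[i]);
        merged.Add(worker.output);        // SlaveTerminate + merge on the client side
    }
    return merged;
}
```

Because the outputs are additive, the merged result does not depend on the number of workers.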
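Tasks and event loop
[Diagram: the AliAnalysisManager holds the containers (TObjArray *fContainers) and the tasks (TObjArray *fTasks). Chain->Process() with the AliAnalysisSelector drives the EVENT LOOP: the top container (ESD chain) feeds the top-level tasks and containers (the "train"), e.g. task1 and task2 writing output1 and output2. In the POST EVENT LOOP stage, tasks such as a fit task (task4) run on these outputs and produce results.]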
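The split between event-loop tasks and post-event-loop tasks can be sketched minimally: a task inside the loop accumulates its output event by event, and a task after the loop runs once on the accumulated output, as a fit must, since it needs the full statistics. `Output`, `FillTask`, `MeanTask` and `RunWithPostLoop` are hypothetical names; a mean stands in for the fit.

```cpp
#include <cassert>
#include <vector>

// Accumulated output of the event-loop task.
struct Output {
    double sum = 0.0;
    int n = 0;
};

struct FillTask {  // runs for every event
    Output out;
    void Exec(double x) { out.sum += x; ++out.n; }
};

struct MeanTask {  // runs once, after the event loop (stand-in for a fit)
    double result = 0.0;
    void Exec(const Output& in) { result = in.n > 0 ? in.sum / in.n : 0.0; }
};

double RunWithPostLoop(const std::vector<double>& events) {
    FillTask fill;
    MeanTask mean;
    for (double x : events) fill.Exec(x);  // EVENT LOOP
    mean.Exec(fill.out);                    // POST EVENT LOOP
    return mean.result;
}
```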
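Tasks and Common I/O
[Diagram: the AliAnalysisManager owns event handlers derived from AliVEventHandler: the input handler AliESDInputHandler (or AliAODInputHandler), providing an AliESDEvent (or AliAODEvent) with AliESDtrack objects; the output handler AliAODHandler, providing an AliAODEvent with AliAODTrack objects; and AliMCEventHandler, providing an AliMCEvent with AliMCParticle objects. The events implement the AliVEvent interface and the tracks and particles implement AliVParticle. Tasks (AliAnalysisTask) exchange data via slots.]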
AliAnalysisTask
ConnectInputData(): define which data is connected to which slot.
CreateOutputObjects(): create the output objects, e.g. histograms.
Init(), LocalInit(): optional, e.g. to read parameters.
Exec(): called for each event in the event loop.
Terminate(): called at the end; can, for example, draw a histogram.
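The hook sequence listed above can be sketched with a simplified base class. `TaskSketch`, `MyTask` and `RunLocally` are hypothetical; the real AliAnalysisTask has more methods and its own signatures, and the call order in `RunLocally` (setup hooks once, `Exec()` per event, `Terminate()` at the end) is an assumption for a local run.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified, hypothetical base class mirroring the hooks listed above.
class TaskSketch {
public:
    virtual ~TaskSketch() = default;
    virtual void ConnectInputData(const char* /*option*/) {}
    virtual void CreateOutputObjects() {}
    virtual void LocalInit() {}
    virtual void Exec(const char* /*option*/) = 0;
    virtual void Terminate(const char* /*option*/) {}
};

// Example user task that records which setup hooks were called.
class MyTask : public TaskSketch {
public:
    std::vector<std::string> calls;
    int nEvents = 0;
    void ConnectInputData(const char*) override { calls.push_back("ConnectInputData"); }
    void CreateOutputObjects() override { calls.push_back("CreateOutputObjects"); }
    void LocalInit() override { calls.push_back("LocalInit"); }
    void Exec(const char*) override { ++nEvents; }  // once per event
    void Terminate(const char*) override { calls.push_back("Terminate"); }
};

// Assumed order for a local run.
void RunLocally(TaskSketch& task, int nEvents) {
    task.LocalInit();
    task.ConnectInputData("");
    task.CreateOutputObjects();
    for (int i = 0; i < nEvents; ++i) task.Exec("");
    task.Terminate("");
}
```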
Common ESD Access Handling
[Diagram: AliAnalysisManager, AliVirtualEventHandler, AliESDInputHandler, AliESDEvent.]
    AliESDInputHandler* inpHandler = new AliESDInputHandler();
    inpHandler->SetInactiveBranches(" Calo FMD ");  // do not read the Calo and FMD branches
    AliAnalysisManager* mgr = new AliAnalysisManager("Analysis Train", "Test");
    mgr->SetInputEventHandler(inpHandler);
Common ESD Access Handling
    void AliAnalysisTaskXYZ::ConnectInputData(Option_t* /*option*/)
    {
        // Connect the input data: get the ESD input handler from the analysis manager
        AliESDInputHandler* esdH = dynamic_cast<AliESDInputHandler*>(
            AliAnalysisManager::GetAnalysisManager()->GetInputEventHandler());
        if (esdH) fESD = esdH->GetEvent();
    }
Common AOD Access Handling
[Diagram: AliAnalysisManager, AliVirtualEventHandler, AliAODHandler, AliAODEvent.]
    AliAODHandler* aodHandler = new AliAODHandler();
    aodHandler->SetOutputFileName("aod.root");
    AliAnalysisManager* mgr = new AliAnalysisManager("Analysis Train", "Test");
    mgr->SetOutputEventHandler(aodHandler);
    AliAnalysisDataContainer* coutput1 = mgr->CreateContainer("AODTree", TTree::Class(),
        AliAnalysisManager::kOutputContainer, "default");
User Analysis Code: Output Data
    void AliAnalysisTaskXYZ::CreateOutputObjects()
    {
        // Create the output container
        // Default AOD: take the AOD event from the output event handler
        AliAODHandler* handler = (AliAODHandler*)
            AliAnalysisManager::GetAnalysisManager()->GetOutputEventHandler();
        fAOD = handler->GetAOD();
    }
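The access pattern in the snippets above (a manager reachable through a static accessor, owning the event handlers from which tasks fetch their input and output events) can be sketched without AliRoot. `EventSketch`, `HandlerSketch`, `ManagerSketch` and `TaskXYZ` are hypothetical simplifications, not the real class declarations.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical event and handler stand-ins.
struct EventSketch {
    std::string format;
    int nTracks = 0;
};

class HandlerSketch {
public:
    explicit HandlerSketch(std::string format) { fEvent.format = std::move(format); }
    EventSketch* GetEvent() { return &fEvent; }
private:
    EventSketch fEvent;
};

// Manager registered in a static pointer when constructed.
class ManagerSketch {
public:
    ManagerSketch() { fgInstance = this; }
    ~ManagerSketch() { if (fgInstance == this) fgInstance = nullptr; }
    static ManagerSketch* GetAnalysisManager() { return fgInstance; }
    void SetInputEventHandler(HandlerSketch* h) { fInput = h; }
    void SetOutputEventHandler(HandlerSketch* h) { fOutput = h; }
    HandlerSketch* GetInputEventHandler() const { return fInput; }
    HandlerSketch* GetOutputEventHandler() const { return fOutput; }
private:
    static ManagerSketch* fgInstance;
    HandlerSketch* fInput = nullptr;
    HandlerSketch* fOutput = nullptr;
};
ManagerSketch* ManagerSketch::fgInstance = nullptr;

// A task connects to its data without storing a manager pointer.
struct TaskXYZ {
    EventSketch* fESD = nullptr;
    EventSketch* fAOD = nullptr;
    void ConnectInputData() {
        ManagerSketch* mgr = ManagerSketch::GetAnalysisManager();
        if (mgr && mgr->GetInputEventHandler()) fESD = mgr->GetInputEventHandler()->GetEvent();
    }
    void CreateOutputObjects() {
        ManagerSketch* mgr = ManagerSketch::GetAnalysisManager();
        if (mgr && mgr->GetOutputEventHandler()) fAOD = mgr->GetOutputEventHandler()->GetEvent();
    }
};
```

Swapping the handler objects changes the data source without touching the task code.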
AliVirtualEventHandler
[Diagram: AliAnalysisManager, AliVirtualEventHandler, AliVEvent, AliMCEventHandler, AliMCEvent.]
    AliMCEventHandler* mcHandler = new AliMCEventHandler();
    AliAnalysisManager* mgr = new AliAnalysisManager("Analysis Train", "Test");
    mgr->SetMCtruthEventHandler(mcHandler);
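User Analysis Code: MC truth
    void AliAnalysisTaskXYZ::Exec(Option_t* /*option*/)
    {
        // Called for each event: loop over the MC truth particles
        AliAnalysisManager* mgr = AliAnalysisManager::GetAnalysisManager();
        AliMCEventHandler* mcH = (AliMCEventHandler*) mgr->GetMCtruthEventHandler();
        AliMCEvent* mc = mcH->MCEvent();
        Int_t ntrack = mc->GetNumberOfTracks();
        for (Int_t i = 0; i < ntrack; i++) {
            AliVParticle* particle = mc->GetTrack(i);
            Double_t pt = particle->Pt();
            // ... use pt, e.g. fill a histogram
        }
    }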
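(Recommended) Integration of User Analysis Code
[Diagram: AliAnalysisTask, AliAnalysisUserTask and the user analysis code, connected by "steers" and "delegates" relations. The user analysis code implements an interface, deals with the AliAODEvent and documents the selection and analysis parameters; a factory is also shown.]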
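A possible reading of this design, sketched under assumptions: a thin framework task steers the user code and delegates the physics to an object implementing a small interface, and a factory is the single place where the concrete analysis and its parameters are chosen. All names (`UserAnalysis`, `PtCutAnalysis`, `MakeAnalysis`, `UserTask`, `AODEventSketch`) are hypothetical and not taken from AliRoot.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

struct AODEventSketch { std::vector<double> trackPt; };

// Interface implemented by the user analysis code.
class UserAnalysis {
public:
    virtual ~UserAnalysis() = default;
    virtual void Analyse(const AODEventSketch& ev) = 0;
    virtual std::map<std::string, double> Parameters() const = 0;  // documents the selection
};

class PtCutAnalysis : public UserAnalysis {
public:
    explicit PtCutAnalysis(double ptMin) : fPtMin(ptMin) {}
    void Analyse(const AODEventSketch& ev) override {
        for (double pt : ev.trackPt)
            if (pt >= fPtMin) ++fSelected;
    }
    std::map<std::string, double> Parameters() const override { return {{"ptMin", fPtMin}}; }
    int Selected() const { return fSelected; }
private:
    double fPtMin;
    int fSelected = 0;
};

// Factory: chooses the concrete analysis and its settings.
std::unique_ptr<UserAnalysis> MakeAnalysis(const std::string& name, double param) {
    if (name == "ptcut") return std::make_unique<PtCutAnalysis>(param);
    throw std::invalid_argument("unknown analysis: " + name);
}

// Framework side: steers and delegates, contains no physics.
class UserTask {
public:
    explicit UserTask(std::unique_ptr<UserAnalysis> a) : fAnalysis(std::move(a)) {}
    void Exec(const AODEventSketch& ev) { fAnalysis->Analyse(ev); }
    UserAnalysis& Analysis() { return *fAnalysis; }
private:
    std::unique_ptr<UserAnalysis> fAnalysis;
};
```

Keeping the physics behind an interface means the same user code can be tested outside the framework and its parameters can be read back for documentation.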
Example: Gamma Analysis Task
Classes: AliAnalysisGammaTask, AliAnaGamma, AliAnaGammaDirect, AliAnaGammaCorrelation (AliAnaGammaParton, AliAnaGammaHadron, AliAnaGammaJetLeadCone, AliAnaGammaJetFinder) and the readers AliGammaReader (AliGammaDataReader, AliGammaMCReader, AliGammaMCDataReader).
CAF-Related Issues
We produce large output trees, which are currently memory-resident on the worker side. We urgently need the TProofFile/TFileMerger mechanism to handle file-resident trees.
Future challenges: event mixing with nested event loops; repeated loops (calibration, see M. Ivanov).
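Event mixing is named above only as a future challenge; a generic pool-based version illustrates why it needs nested loops. Tracks of the current event are paired with tracks of the last `depth` buffered events. This is a textbook-style sketch, not the framework's implementation; `MixingPool`, `Tracks` and the pair observable `a + b` are hypothetical placeholders.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

using Tracks = std::vector<double>;  // one value per track, e.g. an energy

class MixingPool {
public:
    explicit MixingPool(std::size_t depth) : fDepth(depth) {}

    // Pairs the current event with all buffered events, then stores it.
    // Returns the number of mixed pairs formed for this event.
    std::size_t MixAndStore(const Tracks& current) {
        std::size_t before = fMixed.size();
        for (const Tracks& old : fPool)        // loop over buffered events
            for (double a : current)
                for (double b : old)
                    fMixed.push_back(a + b);   // placeholder pair observable
        fPool.push_back(current);
        if (fPool.size() > fDepth) fPool.pop_front();  // keep only the last `depth` events
        return fMixed.size() - before;
    }
    const std::vector<double>& Mixed() const { return fMixed; }

private:
    std::size_t fDepth;
    std::deque<Tracks> fPool;
    std::vector<double> fMixed;
};
```

On PROOF, each worker would only see the events of its own packets, which is one reason mixing is harder there than in a local loop.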
Integration of User Tasks
Relatively smooth so far. Users need support to scrutinise their code (in particular for CAF/PROOF) for memory requirements (leaks) and for correct data member initialization on the client and on the workers.
Analysis train producing AOD
Tested in local, Grid and PROOF modes on p-p and Pb-Pb events.
[Diagram: ESD/AOD and Monte Carlo truth feed ESD FILTERING, JET ANALYSIS, GAMMA TASK and others to come …, which produce AOD.]
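The ESD-filtering step at the head of this train can be sketched as a task that copies only tracks passing a set of cuts into a reduced AOD-like output, which the following tasks then read. The structs, the cut variables and their values are assumptions for illustration only, not the cuts used in ALICE.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct EsdTrackSketch { double pt; double eta; int nClusters; };
struct AodTrackSketch { double pt; double eta; };  // reduced information

struct TrackCuts {
    double ptMin = 0.15;   // GeV/c, assumed value
    double etaMax = 0.9;   // assumed acceptance
    int minClusters = 50;  // assumed quality cut
    bool Accept(const EsdTrackSketch& t) const {
        return t.pt >= ptMin && std::fabs(t.eta) <= etaMax && t.nClusters >= minClusters;
    }
};

// Filters ESD-like tracks into AOD-like tracks.
std::vector<AodTrackSketch> FilterEsd(const std::vector<EsdTrackSketch>& esd,
                                      const TrackCuts& cuts) {
    std::vector<AodTrackSketch> aod;
    for (const auto& t : esd)
        if (cuts.Accept(t)) aod.push_back({t.pt, t.eta});
    return aod;
}
```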
Summary
ALICE Offline has developed an analysis framework that hides computing-scheme dependences from the user. The same user code runs on a local PC, on CAF/PROOF and on the Grid. The framework manages a list of independent tasks: execution is triggered by data readiness, and the sequential execution of the top-level tasks (the train) is driven by the input chain. Common I/O is managed by event handlers. Tasks and handlers are configured at run time.
Thanks to: A. Gheata, M. Gheata, J.-F. Grosse-Oetringhaus, Ch. Klein-Boesing, M. Oldenburg, F. Carminati, Y. Schutz, G. Conesa and many others.