1 CHEP03: Analysis of CMS Heavy Ion Simulation Data Using ROOT/PROOF/Grid
Jinghua Liu for:
Pablo Yepes, Jinghua Liu (Rice University, Houston, TX)
Maarten Ballintijn, Gunther Roland, Bolek Wyslouch, Jinlong Zhang (MIT, Cambridge, MA)
Supported by NSF grants #0218603, #0219063

2 Outline
From a data analysis user's point of view:
- Why: ROOT/PROOF/Grid
- How: Step by Step
- What: Test Result
- Summary
Other PROOF talks in this conference: Fons Rademakers, Maarten Ballintijn

3 ROOT/PROOF
- ROOT as a data analysis tool
- PROOF: Parallel ROOT Facility, based on and part of ROOT
  - on clusters of heterogeneous machines
  - parallel analysis of objects in a set of files
  - parallel execution of scripts
- Transparency, Scalability, Adaptability, Error handling, Authentication
- "Bring the KB to the PB, not the PB to the KB" (KB: code --> CPU, PB: data)
- Use distributed CPUs to analyze distributed data

4 PROOF/Grid Interface
- Use a Grid Resource Broker to detect which nodes in a cluster can be used in the parallel session
- Use Grid File Catalogue and Replication Manager
- Utilize Grid Monitoring Services
- Support Globus Authentication
- Abstract Grid interface

5 Step by Step
- Set up PC cluster(s) (for PROOF/Grid)
- Prepare the data files
- Write analysis code (algorithm)
- Compile a data set for PROOF
- Run a PROOF job
- Get the results

6 PC Clusters
Client machine (desktop): P4 @ 1.8GHz / 512MB / 40GB
Cluster 1:
- 2 Dual Xeon @ 2.4GHz / 1GB / 360GB
- 1 Dual Athlon @ 1.73GHz / 1GB / 240GB
- 8 Dual PIII @ 400MHz / 512MB / 60GB
Cluster 2:
- 3 Dual Athlon @ 1.67GHz / 2GB / 200GB
Operating systems: RedHat 6.1, RedHat 7.3, Slackware 8.1
Globus version: 2.2

7 CMS Heavy Ion Simulation
- Jet & high-pT particle angular correlation
- Use calorimeters only

8 CMS Heavy Ion Simulation
- Pythia (event generator): 10,000 jet events
- Hijing (heavy ion event generator): 1000 events
- Each Hijing event (dN/dy ~ 5000) was divided into ~500 sub-events
- Randomly re-combine 500 sub-events (from different events) to form a new Hijing event, a cheap way to obtain more Monte Carlo events
- CMSIM (GEANT 3 based simulation program for CMS)
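The sub-event recombination can be illustrated with a minimal sketch. The slides do not say how sub-events were defined or selected, so the following is an assumption: each original event is cut into numbered slots, and a mixed event takes slot k from a different, randomly chosen source event for every k. The names `SubEvent` and `mixEvent` are hypothetical and only record which pieces would be combined.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical bookkeeping record: which source event a sub-event was cut
// from, and which slot of that event it occupied.
struct SubEvent {
    std::size_t sourceEvent;
    std::size_t slot;
};

// Assumed mixing scheme (not taken from the slides): fill every slot of the
// new event from a distinct, randomly chosen source event.
std::vector<SubEvent> mixEvent(std::size_t nSourceEvents,
                               std::size_t nSlots,
                               std::mt19937 &rng) {
    // Distinct sources are only possible if there are enough source events.
    assert(nSlots <= nSourceEvents);

    // Shuffle the source event indices; the first nSlots are then distinct.
    std::vector<std::size_t> order(nSourceEvents);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::shuffle(order.begin(), order.end(), rng);

    std::vector<SubEvent> mixed;
    mixed.reserve(nSlots);
    for (std::size_t slot = 0; slot < nSlots; ++slot) {
        mixed.push_back({order[slot], slot});
    }
    return mixed;
}
```

With the numbers quoted on the slide, a call such as `mixEvent(1000, 500, rng)` would describe one new event built from 500 sub-events, each taken from a different one of the 1000 generated events.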

9 Data Production: Globus Jobs
[Diagram: Client PC; Globus Gate Keeper (PBS) with work nodes; Globus Gate Keeper (Condor) with work nodes]
- Globus used to submit & manage the jobs
- No data replication (files were intentionally stored locally)

10 Build ROOT Tree
- Superimpose jet events on top of Hijing events and generate ROOT Tree
- Standalone code linked with ROOT libraries
- CMS calorimeters:
  - ECal (Electromagnetic Calorimeter): barrel 61200 cells, endcap 14648 cells
  - HCal (Hadronic Calorimeter): 14616 cells (multi-layer), 4032 towers
- calotree:
  - ECal cells (energy, position)
  - HCal towers (energy, position)
- 10,000 events were split into 100 files, 100 events each; file size ~160MB, total data 16GB
- Data distributed, each node got some local files
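The file bookkeeping quoted here (10,000 events, 100 files, ~160MB each) can be checked with a short sketch. It assumes events were written to the files in order and in equal-sized chunks, which the slides do not state; `FileEntry`, `locateEvent`, and `totalSizeMB` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical record: position of one event inside the split data set.
struct FileEntry {
    std::size_t fileIndex;   // 0-based index of the file
    std::size_t localEntry;  // 0-based entry number inside that file
};

// Map a 0-based global event number to its file and local entry, assuming
// equal-sized files filled in order.
FileEntry locateEvent(std::size_t globalEvent, std::size_t eventsPerFile) {
    assert(eventsPerFile > 0);
    return {globalEvent / eventsPerFile, globalEvent % eventsPerFile};
}

// Total data volume in MB for nFiles files of the same approximate size.
double totalSizeMB(std::size_t nFiles, double fileSizeMB) {
    return static_cast<double>(nFiles) * fileSizeMB;
}
```

For 100 files of ~160MB, `totalSizeMB(100, 160.0)` gives 16000 MB, consistent with the ~16GB total on the slide.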

11 TSelector – The Algorithms
Create TSelector from TTree:
    $ root
    root[0] TFile f("heavyion001.root")
    root[1] calotree->MakeSelector("myselector")
    root[2] .q
    $ ls
    myselector.C myselector.h
Add the analysis code (algorithm) into TSelector:
    $ vi myselector.h
    $ vi myselector.C

12 TSelector – The Algorithms
myselector.h:
    class myselector : public TSelector {
    public:
        TTree *fChain;
        ...
    private:
        TH1F *hist1d;
        TH2F *hist2d;
        ...
    };

13 TSelector – The Algorithms
myselector.C:
    void myselector::Begin(TTree *tree)
    {
        // Book the histograms and register them in the output list
        hist1d = new TH1F("DeltaPhi", "DeltaPhi", 100, -180., 180.);
        hist2d = new TH2F("EtaPhi", "EtaPhi", 100, -5., 5., 100, -4., 4.);
        fOutput->Add(hist1d);
        fOutput->Add(hist2d);
    }

    Bool_t myselector::Process(Int_t entry)
    {
        // user's analysis code goes here!
        for (i = 0; i < nclusters; i++) {
            if (Et1 > 5)
                for (j = i + 1; j < nclusters; j++) {
                    if (Et2 > 5) {
                        DeltaPhi = …
                        hist1d->Fill(DeltaPhi);
                    }
                }
        }
        return kTRUE;
    }
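The pair loop leaves the DeltaPhi calculation elided. As a minimal standalone sketch, and not the authors' code, the following computes the azimuthal difference for every pair of clusters above an Et threshold. It assumes angles in degrees (suggested by the 180 histogram limit) wrapped into [-180, 180); the `Cluster` struct and both function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical cluster record: transverse energy and azimuth in degrees.
struct Cluster {
    double et;
    double phiDeg;
};

// Difference phi1 - phi2 wrapped into [-180, 180) degrees.
double deltaPhiDeg(double phi1, double phi2) {
    assert(std::isfinite(phi1) && std::isfinite(phi2));
    double d = std::fmod(phi1 - phi2, 360.0);  // now in (-360, 360)
    if (d >= 180.0) d -= 360.0;
    if (d < -180.0) d += 360.0;
    return d;
}

// Collect DeltaPhi for all pairs (i < j) where both clusters exceed etMin,
// mirroring the nested loop on the slide.
std::vector<double> pairDeltaPhi(const std::vector<Cluster> &clusters,
                                 double etMin) {
    std::vector<double> values;
    for (std::size_t i = 0; i < clusters.size(); ++i) {
        if (clusters[i].et <= etMin) continue;
        for (std::size_t j = i + 1; j < clusters.size(); ++j) {
            if (clusters[j].et > etMin) {
                values.push_back(
                    deltaPhiDeg(clusters[i].phiDeg, clusters[j].phiDeg));
            }
        }
    }
    return values;
}
```

In a selector, each returned value would be passed to `hist1d->Fill(...)` instead of being stored in a vector. The wrapping matters because a raw difference of two angles can fall outside the histogram range.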

14 TDSet – Data Location
Specify a collection of TTrees or files:
    [] TDSet *ds = new TDSet("TTree", "calotree");
    [] ds->Add("/data1/cms/cmsim/heavyion001.root");
    [] ds->Add("/data1/cms/cmsim/heavyion002.root");
    …
    [] ds->Add("lfn://");
    [] ds->Add("lfn://");
    …
    [] ds->Print();
- Returned by DB or File Catalog query, etc.
- It's better to put these into a macro
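A macro for this step mostly needs to generate the list of file names. The sketch below builds that list in plain C++. It assumes, hypothetically, that all files follow the zero-padded numbering suggested by the two paths shown; the slides elide the remaining names, so the pattern may not match the real data set. `makeFileList` is an illustrative helper, not part of ROOT.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Build paths of the form <dir>/<stem>NNN.root for NNN = 001 .. nFiles.
// The three-digit numbering is an assumption based on the two example paths.
std::vector<std::string> makeFileList(const std::string &dir,
                                      const std::string &stem,
                                      int nFiles) {
    assert(nFiles >= 0 && nFiles <= 999);  // three digits hold at most 999
    std::vector<std::string> files;
    files.reserve(static_cast<std::size_t>(nFiles));
    for (int i = 1; i <= nFiles; ++i) {
        char number[8];
        std::snprintf(number, sizeof number, "%03d", i);
        files.push_back(dir + "/" + stem + number + ".root");
    }
    return files;
}
```

Inside a ROOT macro, each entry could then be handed to the data set with `ds->Add(path.c_str())`, replacing the hand-typed `ds->Add(...)` lines.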

15 Running a PROOF Job
    $ root
    [] gROOT->Proof("");
    [] TDSet *ds = new TDSet("TTree", "calotree");
    [] ds->Add("...");
    ...
    [] ds->Process("myselector.C+", "options", nentries, first);
    [] TH1F *h1 = (TH1F *)gProof->GetOutput("DeltaPhi");
    [] h1->Draw();
Note: options must be pre-coded in myselector.C

16 Angular Correlation

17 Scale plot
[Plot: analysis speed vs. CPUs (PIII 1GHz equivalent)]
- CPU power/data size balanced
- CPU intensive calculations

18 Summary
- CMS Heavy Ion Analysis implemented and tested with PROOF
- Scales well with CPUs
- PROOF/Grid can provide data analysis power unavailable otherwise; this power can be achieved without much extra effort
- The PROOF/Grid interface is under rapid development
- The plan is to extend the presented study to use the Grid interface

