Vincenzo Innocente: CMS Software Introduction for Summer Students, 13th July 2004
CMS Software: Physics Analysis in a Brave New Woorld
Vincenzo Innocente (original from Stephan Wynhoff)
Introduction for Summer Students, 13th July 2004
Overview
Not a talk about why things are the way they are
1. The main projects and tools
2. The role of the central framework and the persistent store
3. A glance at how to use it
[Figure: project map with COBRA, IGUANA, ORCA, FAMOS, LCG/AA, OSCAR, OVAL, SCRAM, IGUANACMS, IGNOMINY]
Challenges: Geographical Spread
● 1700 physicists, 150 institutes, 32 countries (and growing!)
● CERN Member States 55%, Non-Member States (NMS) 45%
● Major challenges associated with:
● Communication and collaboration at a distance
● Distributed computing resources
● Remote software development and physics analysis
CMS Computing Model
[Figure: tiered CMS computing model]
● Online System: ~PByte/sec from the detector, ~100 MBytes/sec to the offline farm
● Tier 0 +1: Offline Farm, CERN Computer Centre (> 20 TIPS)
● Tier 1: FNAL, IN2P3, INFN and RAL centres
● Tier 2: Tier2 centres
● Tier 3: institutes (~0.25 TIPS) with a physics data cache
● Tier 4: workstations
● Network links shown: ~2.5 Gbits/sec, ~622 Mbits/sec and Mbits/sec
Experiment software is needed to:
● keep the systems running
● distribute data and jobs
● run simulation, reconstruction and analysis
(Some of the) Major Challenges for LHC SW
● Events are big (a raw event is 1 MB, 2 MB with Monte-Carlo information)
● Detector digitization has to take multiple crossings into account
● At L = 10^34 cm^-2 s^-1 there are ~17 minimum bias events per crossing
● Calorimetry needs crossings -5 to +3
● Muon DTs also need several crossings
● Tracker loopers can persist for many crossings
● Typically we need information from ~200 minimum bias (mb) events per signal event
● Studies at different luminosities imply different pileup
● Pileup is included in the digitization (the front end of reconstruction)
● Track finding in a very complex environment
● High magnetic field and ~1 radiation length of tracker material:
● lots of bremsstrahlung for the electrons
● tracker-ECAL matching is non-trivial
[Figure: H → ZZ event display (ee final state shown) with M_H = 300 GeV, at four different luminosities in cm^-2 s^-1]
Pileup
● For 1 million signal events we need 200 million minimum bias (mb) events
● Impossible with current CPU, storage, etc.
● Solution: sample a finite pool of mb events pseudo-randomly
● Problems arise when a single mb event would by itself trigger the detector
● you would get this trigger many, many times
● Filter the minimum bias events, but remember to account for the removed events
● Sample from the full range of mb events so that patterns do not repeat too often
● 200 mb events = 70 MB
● a massive data-movement problem
● Pileup is CPU intensive
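A minimal C++ sketch of the pseudo-random sampling idea, under stated assumptions: the number of mb events per crossing is drawn from a Poisson distribution with mean 17 (the high-luminosity figure above), crossings -5 to +3 are overlaid, and events are picked uniformly with replacement from a finite pool, so they get "recycled". The names CrossingPileup and samplePileup are hypothetical and not part of ORCA, and the filtering and re-weighting of mb events that would trigger by themselves is not modelled. With these numbers the average is 9 × 17 = 153 mb events per signal event, the same order as the ~200 quoted, and 70 MB / 200 events ≈ 350 kB per mb event, which gives a feel for the data-movement problem.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical type: the mb events overlaid on one bunch crossing.
struct CrossingPileup {
  int bx;                            // crossing offset relative to the signal crossing
  std::vector<std::size_t> mbIndex;  // indices into the finite mb pool
};

// Assumption: Poisson-distributed multiplicity, uniform sampling with replacement.
std::vector<CrossingPileup> samplePileup(std::size_t poolSize,
                                         double meanPerCrossing,
                                         int firstBx, int lastBx,
                                         std::mt19937& rng) {
  assert(poolSize > 0 && meanPerCrossing > 0.0 && firstBx <= lastBx);
  std::poisson_distribution<int> nEvents(meanPerCrossing);
  std::uniform_int_distribution<std::size_t> pick(0, poolSize - 1);
  std::vector<CrossingPileup> result;
  for (int bx = firstBx; bx <= lastBx; ++bx) {
    CrossingPileup c{bx, {}};
    const int n = nEvents(rng);  // fluctuating number of interactions in this crossing
    for (int i = 0; i < n; ++i) {
      c.mbIndex.push_back(pick(rng));  // the same pool event may be reused ("recycled")
    }
    result.push_back(std::move(c));
  }
  return result;
}
```

A job would typically seed one std::mt19937 reproducibly and call samplePileup(poolSize, 17.0, -5, 3, rng) once per signal event; a large pool keeps the same patterns from repeating too often.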
Simulation, Reconstruction, Analysis
[Figure: the old and new software chains]
● Old chain: Monte-Carlo generator, Ntuple, CMSIM, Zebra FZ files, CMSjet, User Analysis
● New chain: Monte-Carlo generator, OSCAR, OODB, ORCA, User Analysis; FAMOS for fast simulation
The Analysis Chain
[Figure: the analysis chain, from production to the user]
● Generation: MC generator (CMKIN), output as HEPEVT Ntuple
● Simulation: OSCAR, output as OODB SimHits (signal and minbias)
● Digitization: ORCA SimReader, output as OODB Digis
● Reconstruction: ORCA RecReader, output as OODB RecObjs (DST)
● Analysis: by the user, output as histograms
Simulation
Tasks:
● Understand detector response
● Optimise geometry
● Set up and test alignment
● Understand signal and background topologies
● Feasibility studies of analyses
● Optimise trigger
● Optimise physics selections
CMSIM:
● Mainly FORTRAN (some non-standard)
● Uses GEANT3, Zebra
● (Almost) out of use
OSCAR:
● OO design
● Uses GEANT4
● Integrated with CMS software (COBRA)
OSCAR in 2004
● Official simulation production program
● Creates and writes to the OO database, read back with ORCA
● User interface (.orcarc) compatible with ORCA
● Performance:
● cuts per volume with the same values as in CMSIM
● magnetic field tracking tuned
● two times slower than CMSIM (133)
● Released version capable of simulating several million events
OSCAR Information
● Albert De Roeck et al.
● Meetings: SPROM (Simulation PROject Management), every 2 weeks, Monday 16:30 in 40-2A-01
Detector Description DataBase
DDD Information
● Michael Case
Geometry
● Collects the XML description of the CMS detector:
● materials
● positions
● (cuts for OSCAR)
● Contact: Pedro Arce
[Figure labels: CMSIM 133, Geometry 182]
Reconstruction
● Combination of signal and pile-up events
● Detector digitisation
● (this is what comes out of the real detector)
● Reconstruction of detector-level and simple objects
● tracks, clusters, vertices
● Reconstruction of physics objects
● jets, electrons, photons, muons
● Simulation of L1 Trigger decisions
● High Level Trigger (HLT) algorithms
Digitisation and Pileup
● High luminosity → 17 minimum bias events in one bunch crossing
● Overlay crossings -5 to +3
● ~200 minimum bias events for 1 signal event
● "Recycle" minimum bias events
Stages of Reconstruction
● 4-vectors: produced by the MC generator, stored in an Ntuple
● SimHits: produced by Geant, stored in the DB
● Digis: include pileup, some stored in the DB (Tk)
● RecHits: pre-processed digis, some stored in the DB (Calo)
● RecObj: tracks, clusters, etc., some stored in the DB
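To make the stages concrete, here is a deliberately simplified C++ sketch of how data could flow from SimHits to a RecObj. The types SimHit, Digi, RecHit, RecObj and the functions digitize, makeRecHits and makeCluster are hypothetical illustrations, not the ORCA classes: digitisation sums signal and pileup deposits per channel and quantises them with an assumed constant calibration, local reconstruction converts back to energy and applies a threshold, and the "cluster" simply sums all RecHits.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <vector>

// Hypothetical, heavily simplified data types for each stage.
struct SimHit { int channel; double energy; };  // energy deposit from Geant (GeV)
struct Digi   { int channel; int adc; };        // quantised readout, pileup included
struct RecHit { int channel; double energy; };  // calibrated, pre-processed digi
struct RecObj { double energy; int nHits; };    // here: a single calorimeter "cluster"

// Digitisation: add signal and pileup SimHits per channel, then quantise.
std::vector<Digi> digitize(const std::vector<SimHit>& signal,
                           const std::vector<SimHit>& pileup,
                           double gevPerAdc) {
  assert(gevPerAdc > 0.0);
  std::map<int, double> sum;  // ordered by channel number
  for (const auto& h : signal) sum[h.channel] += h.energy;
  for (const auto& h : pileup) sum[h.channel] += h.energy;
  std::vector<Digi> digis;
  for (const auto& kv : sum) {
    digis.push_back({kv.first, static_cast<int>(std::lround(kv.second / gevPerAdc))});
  }
  return digis;
}

// Local reconstruction: calibrate and apply a zero-suppression threshold.
std::vector<RecHit> makeRecHits(const std::vector<Digi>& digis,
                                double gevPerAdc, double thresholdGeV) {
  std::vector<RecHit> hits;
  for (const auto& d : digis) {
    const double e = d.adc * gevPerAdc;
    if (e > thresholdGeV) hits.push_back({d.channel, e});
  }
  return hits;
}

// Global reconstruction: one naive cluster summing all RecHits.
RecObj makeCluster(const std::vector<RecHit>& hits) {
  RecObj c{0.0, 0};
  for (const auto& h : hits) {
    c.energy += h.energy;
    ++c.nHits;
  }
  return c;
}
```

The point of the layering is that each stage only depends on the output of the previous one, which is also what allows any intermediate stage to be made persistent.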
ORCA
● The flagship OO software project
● Started in Sept
● Currently in its eighth major release (8.2.0)
● Widely used by physicists for:
● HLT studies
● DAQ TDR
● Physics TDR
● DC04
● Not everything can be stored in the database
● Storage uses POOL ROOT streaming
● A prototype is not a final product
ORCA Applications
Fully embedded in the COBRA framework, currently:
1. G3Reader (also called ooHits)
● reads CMSIM FZ files
● writes CMSIM hits to the DB
2. SimReader (also called ooDigi)
● pileup of arbitrary numbers of crossings, sampled pseudo-randomly
● full digitization
● persistent storage of results
3. RecReader: reads and writes Digis and RecObj (DST)
● calorimetric clustering
● jet finding (with any type of object)
● muon segment and track finding
● track finding
● primary vertices
ORCA Project Structure
ORCA Information
● Stephan Wynhoff et al.
● Meetings: RPROM (Reconstruction PROject Management), every 2 weeks, Monday 16:30 in 40-2A-01
COBRA: CMS Analysis and Reconstruction Framework
● Glues it all together
● Insulates user code from services
● Manages persistent data transparently
● user code does not see any ROOT/IO-related code
● Manages collections, runs, etc.
● Manages the order of reconstruction
● Ensures a uniform interface to all CMS code
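One common way for a framework to "manage the order of reconstruction" is reconstruction on demand: an object is built the first time someone asks for it, which automatically pulls in whatever it depends on. The C++ sketch below illustrates that general pattern only; the OnDemand class is hypothetical and is not the COBRA API.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <utility>

// Hypothetical sketch: a product built lazily by its producer and cached per event.
template <typename T>
class OnDemand {
 public:
  explicit OnDemand(std::function<T()> producer) : producer_(std::move(producer)) {
    assert(producer_);  // a product must know how to build itself
  }

  const T& get() {
    if (!value_) value_ = producer_();  // run the algorithm only on first access
    return *value_;
  }
  void reset() { value_.reset(); }  // to be called when a new event starts
  bool ready() const { return value_.has_value(); }

 private:
  std::function<T()> producer_;
  std::optional<T> value_;
};
```

If a "tracks" product's producer calls hits.get(), asking for tracks first builds the hits, and asking again reuses both cached results; nobody has to schedule the algorithms by hand.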
Architecture Overview
[Figure: CMS software architecture overview]
● Layers: Consistent User Interface; Coherent set of basic tools and mechanisms; Distributed Data Store & Computing Infrastructure; Software development and installation
● Components shown: federation wizards, detector/event display, data browser, analysis job wizards, generic analysis tools, ORCA, FAMOS, OSCAR, COBRA, LCG tools, GRID, CMStools
COBRA Components
[Figure: COBRA component layers]
● Physics modules: Reconstruction Algorithms, Data Monitoring, Event Filter, Physics Analysis
● Specific Framework: Event Objects, Calibration Objects, Configuration Objects, Geometry Objects
● Generic Application Framework
● CMS adapters and extensions
● Utility Toolkit and Extension toolkit: ODBMS (POOL), Geant3/4, SEAL, CLHEP, ROOT, C++ standard library, GRID tools
Persistent Store
● CMS data taking, simulation, selection, calibration, analysis and other modules need to communicate with each other via a shared event store
● The persistent store uses POOL (from LCG)
● We try to use a single store for everything
● In the past, very few people actually made persistent objects
● Zebra banks, for example
● We hope to give that ability to “everyone”
● It is a bit complicated :-)
● Most people used existing persistent objects, analysed them and then worked from an Ntuple for their final analysis
● But how many times have you been in the middle of that when your Ntuple was invalidated by a new calibration?
● How much nicer would it be to have full access to the event from your Ntuple, so you could quickly apply a new calibration?
COBRA Information
● Vincenzo Innocente et al.
IGUANA
● Browse federations
● Display stored histograms
● Event display
OSCAR Visualization
● Interactive Geant4 3D CMS detector geometry: physical volume tree
● Interactive overlap detection: find overlaps and show result details in a list
ORCA Visualization
● Interactive 3D CMS detector geometry (Geant3) for sensitive volumes, with levels of detail
● Interactive 3D representations of reconstructed and simulated events, including visualisation of physics quantities such as the tangent of a SimHit
● Access event by event or fetch events automatically (no batch mode)
● Event and run number displayed
● Multiple (cloned) views; slices and cuts; printout for the selected object; zoom and search; context help; viewpoints
IGUANA Information
● Lucas Taylor, Ianna Osborne, Lassi Tuura
Fast MonteCarlo Simulation
● FAMOS for OSCAR
● fewer geometry volumes
● less detailed GEANT4 options
● FAMOS for ORCA
● faster algorithms
● simple parametrisation of resolutions & efficiencies
● tuned to full simulation/reconstruction
● Working for all sub-detectors, soon also for HLT
[Figure: fast-simulation paths from MC 4-vector to SimHit, RecHit/Digi and analysis objects, with typical per-event times]
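The "simple parametrisation of resolutions & efficiencies" can be pictured with the C++ sketch below: instead of tracking a particle through the detector, its generator-level transverse momentum is smeared with a Gaussian and the object is kept with a given probability. The types and the function fastSimulate are hypothetical, and a constant relative resolution and efficiency are simplifying assumptions; in a real fast simulation both would be functions of the kinematics, fitted to the full simulation/reconstruction.

```cpp
#include <cassert>
#include <random>

// Hypothetical types: only pT is modelled to keep the sketch short.
struct FourVectorPt { double pt; };

struct Parametrisation {
  double relResolution;  // sigma(pT)/pT, assumed constant here
  double efficiency;     // probability that the object is reconstructed
};

// Returns true and fills 'out' if the object is "reconstructed".
bool fastSimulate(const FourVectorPt& gen, const Parametrisation& par,
                  std::mt19937& rng, FourVectorPt& out) {
  assert(par.relResolution > 0.0);  // std::normal_distribution needs sigma > 0
  assert(par.efficiency >= 0.0 && par.efficiency <= 1.0);
  std::bernoulli_distribution found(par.efficiency);
  if (!found(rng)) return false;  // object lost, as the efficiency dictates
  std::normal_distribution<double> smear(1.0, par.relResolution);
  out.pt = gen.pt * smear(rng);  // Gaussian smearing of the true pT
  return true;
}
```

This is why the parametrised path costs a tiny fraction of the full chain: a few random numbers per object replace tracking, digitisation and reconstruction.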
FAMOS Information
● Patrick Janot
Project Relations
● All projects listed on
[Figure: relations between COBRA, IGUANA, ORCA, FAMOS, LCG/AA, OSCAR, IGUANACMS, IGNOMINY (1.9.0)]
External Projects
SCRAM
● Manages our code building and configuration (more reproducible than make and autoconf)
● Typical commands:
● scram list
lists all the current code releases known to SCRAM
● scram project ORCA ORCA_8_2_0
creates a local area containing all the directories and configuration files you need to work with this release of ORCA
● eval `scram runtime -csh`
sets the required environment variables
● scram b
compiles and/or links the code
● Information
● Shaun Ashby
Varia
● CVS
● that is where the code is stored
● maintains many versions of each file
● full history, parallel development
● OVAL
● tool to automate validation
● Ecole Polytechnique
● Savannah
● feature (bug) reporting system
Pure C++
● We work entirely in C++
● You can use any standard C++ tools
● COBRA pins DB objects in memory for the duration of the event
● You access C++ objects
● Whether they are transient or in the DB is invisible to you
● This is true for most “developer” code and all “user” code
● Direct ROOT usage only for very private code
● Avoid it if you think your code will ever become official
● You can change local copies of the data, but you cannot inadvertently change persistent data, even as viewed by another module in the same reconstruction job
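The last point relies on ordinary C++ const-correctness. In the sketch below, which is a hypothetical illustration and not the COBRA event interface, the event hands out only const references to stored products, so a module that wants, for example, recalibrated tracks must work on a local copy and cannot alter what other modules see. The names Event, Track and rescaled are invented for this example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical product type.
struct Track { double pt; };

// Hypothetical event store: products are written once and read as const.
class Event {
 public:
  void put(const std::string& label, std::vector<Track> tracks) {
    assert(products_.count(label) == 0);  // each product is stored only once
    products_[label] = std::move(tracks);
  }
  const std::vector<Track>& get(const std::string& label) const {
    return products_.at(label);  // throws std::out_of_range if the label is absent
  }

 private:
  std::map<std::string, std::vector<Track>> products_;
};

// A user module applying a new calibration works on its own copy.
std::vector<Track> rescaled(const Event& ev, const std::string& label, double scale) {
  std::vector<Track> local = ev.get(label);  // copy; the stored product is untouched
  for (auto& t : local) t.pt *= scale;
  return local;
}
```

Trying to write ev.get("tracks")[0].pt = 0.0 would not compile, which is exactly the protection described above.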
Generic WWW
● Main CMS page
● Public relations
● OO software
● Finding MC events (very little for ORCA-7)
● PRS groups
● meetings every 2nd Tuesday/Wednesday
CMS Software
● There is much that will be very new
● But you should quickly be able to do the things you are used to doing
● And the future possibilities are exciting and can give you and CMS a significant advantage in getting at the physics first
● C++/OO code can be hard to understand
● Not everything works as intended!
● Be a little patient too: what seems to you the highest priority may not yet have reached the top of the stack
● All meetings (RPROM, SPROM, PRS) are open to all and video-conferenced at times that, while not always convenient, are not impossible for most time zones
● Everyone has a steep learning curve to follow
● Use the documentation tools
● Ask people, they want to help
Summary
● LHC is extremely demanding on software
● Object Oriented techniques will help in answering that challenge
● CMS is well advanced in deploying tools for simulation, reconstruction and analysis
● Physicists can work successfully with the software
● Summer students can contribute!
New Woorld for the Brave
Brave New Woorld