The ALICE Computing F.Carminati May 4, 2006 Madrid, Spain.


2 4/5/2006 fca @ CIEMAT
level 0 - special hardware
8 kHz (160 GB/sec)
level 1 - embedded processors
level 2 - PCs
200 Hz (4 GB/sec)
30 Hz (2.5 GB/sec)
30 Hz (1.25 GB/sec)
data recording & offline analysis
Total weight: 10,000 t
Overall diameter: 16.00 m
Overall length: 25 m
Magnetic field: 0.4 Tesla
ALICE Collaboration ~ 1/2 ATLAS, CMS, ~ 2x LHCb
~1000 people, 30 countries, ~80 Institutes
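The four rate and bandwidth pairs quoted on this slide imply an average data volume per event at each stage, which is simply bandwidth divided by rate. The sketch below only redoes that arithmetic; the function name and the choice of 1 GB = 1000 MB are assumptions made for illustration, not something the slide states.

```python
# (event rate in Hz, bandwidth in GB/sec) pairs, in the order quoted on the slide.
STAGES = [
    (8000.0, 160.0),  # 8 kHz, 160 GB/sec
    (200.0, 4.0),     # 200 Hz, 4 GB/sec
    (30.0, 2.5),      # 30 Hz, 2.5 GB/sec
    (30.0, 1.25),     # 30 Hz, 1.25 GB/sec
]


def event_size_mb(rate_hz, bandwidth_gb_per_s):
    """Average data volume per event in MB, assuming 1 GB = 1000 MB."""
    return bandwidth_gb_per_s * 1000.0 / rate_hz


for rate, bandwidth in STAGES:
    size = event_size_mb(rate, bandwidth)
    print(f"{rate:g} Hz at {bandwidth:g} GB/sec -> {size:.1f} MB per event")
```

The first two pairs both give 20 MB per event, so the reduction between them comes from the rate alone; the last two differ only in bandwidth at the same 30 Hz.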

3 The history
Developed since 1998 along a coherent line
Developed in close collaboration with the ROOT team
No separate physics and computing team
–Minimises communication problems
–May lead to “double counting” of people
Used for the simulations and reconstructions in the TDRs of all detectors and in the Computing TDR


5 The code
0.5 MLOC C++
0.5 MLOC “vintage” FORTRAN code
Nightly builds
Strict coding conventions
Subset of C++ (no templates, STL or exceptions!)
–“Simple” C++, fast compilation and link (see R.Brun’s talk)
–No configuration management tools (only cvs)
–aliroot is a single package to install
Maintained on several systems
–DEC-Tru64, Mac OSX, Linux RH/SLC/Fedora (i32:i64:AMD), Sun Solaris
30% developed at CERN and 70% outside

6 The tools
Coding convention checker
Reverse engineering
Smell detection
Branch instrumentation
Genetic testing (in preparation)
Aspect Oriented Programming (in preparation)

7 The Simulation
[Diagram components: User Code; VMC; Geometrical Modeller; G3 transport (G3); G4 transport (G4); FLUKA transport (FLUKA); Reconstruction; Visualisation; Generators]
See A.Morsch’s talk
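The idea behind the VMC box is that user code talks to one virtual interface while G3, G4 or FLUKA does the actual transport behind it. A minimal sketch of that pattern follows; `TransportEngine`, `DummyEngine` and `run_simulation` are hypothetical names invented here, and the "transport" is a placeholder that only records which engine visited which volume. It is not the real VMC API.

```python
from abc import ABC, abstractmethod


class TransportEngine(ABC):
    """Hypothetical stand-in for a virtual transport interface."""

    @abstractmethod
    def name(self):
        """Return a label identifying the concrete engine."""

    @abstractmethod
    def transport(self, particles, volumes):
        """Propagate particles through volumes and return a list of hits."""


class DummyEngine(TransportEngine):
    """Placeholder engine: emits one fake hit per particle per volume."""

    def __init__(self, label):
        self._label = label

    def name(self):
        return self._label

    def transport(self, particles, volumes):
        return [(p, v, self._label) for p in particles for v in volumes]


def run_simulation(engine, particles, volumes):
    """User code: depends only on the interface, never on a concrete engine."""
    return engine.transport(particles, volumes)


for label in ("G3", "G4", "FLUKA"):
    hits = run_simulation(DummyEngine(label), ["pi+"], ["ITS", "TPC"])
    print(label, len(hits))
```

Swapping the engine changes only the object passed to `run_simulation`; the user code itself is untouched.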


9 TGeo modeller

10 The reconstruction
Incremental process
–Forward propagation towards the vertex: TPC → ITS
–Back propagation: ITS → TPC → TRD → TOF
–Refit inward: TOF → TRD → TPC → ITS
Continuous seeding
–Track segment finding in all detectors
[Figure: best track 1 and best track 2 in conflict; detectors TRD, TPC, ITS, TOF]
Combinatorial tracking in ITS
–Weighted two-track χ² calculated
–Effective probability of cluster sharing
–Probability not to cross a given layer for secondary particles
See P.Hristov’s talk
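The three passes of the incremental process can be written down as ordered detector sequences. The sketch below only encodes that ordering and calls a user-supplied update step at each detector; `run_passes` and the `update` callback are hypothetical, and no actual track fitting is implemented.

```python
# Pass name and the order in which detectors are visited, as listed on the slide.
PASSES = [
    ("forward propagation towards the vertex", ["TPC", "ITS"]),
    ("back propagation", ["ITS", "TPC", "TRD", "TOF"]),
    ("refit inward", ["TOF", "TRD", "TPC", "ITS"]),
]


def run_passes(track, update):
    """Apply update(track, pass_name, detector) for every step of every pass."""
    for pass_name, detectors in PASSES:
        for detector in detectors:
            track = update(track, pass_name, detector)
    return track


def record_visit(track, pass_name, detector):
    """Toy update step: the 'track' is just the list of detectors visited so far."""
    return track + [detector]


print(run_passes([], record_visit))
```

A real update step would refine the track parameters with the clusters found in each detector; here it is replaced by bookkeeping so that the pass order is visible.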

11 Calibration
[Diagram components: DAQ; Trigger; DCS; ECS; HLT; Physics data; DCDB; shuttle; AliEn+LCG metadata and file store; calibration procedures; calibration files; AliRoot Calibration classes; API; files]
From URs: Source, volume, granularity, update frequency, access pattern, runtime environment and dependencies
API – Application Program Interface
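The slide describes calibration files kept in a file store, described by metadata, and reached through an API. One way such a lookup could work is sketched below. Everything in it is an assumption made for illustration: the run-range and version fields, the `CalibEntry` and `CalibStore` names, and the example path and file names are hypothetical and not taken from AliRoot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CalibEntry:
    """Hypothetical metadata record pointing at one calibration file."""
    path: str        # logical name of the calibration object
    first_run: int   # first run for which the file is valid (assumed field)
    last_run: int    # last run for which the file is valid (assumed field)
    version: int     # higher version wins when validity ranges overlap
    filename: str    # file in the file store


class CalibStore:
    """In-memory stand-in for the metadata catalogue."""

    def __init__(self):
        self._entries = []

    def put(self, entry):
        self._entries.append(entry)

    def get(self, path, run):
        """Return the highest-version entry valid for the given path and run."""
        matches = [
            e for e in self._entries
            if e.path == path and e.first_run <= run <= e.last_run
        ]
        if not matches:
            raise KeyError(f"no calibration for {path} at run {run}")
        return max(matches, key=lambda e: e.version)


store = CalibStore()
store.put(CalibEntry("TPC/Calib/Gain", 0, 99, 1, "gain_v1.root"))
store.put(CalibEntry("TPC/Calib/Gain", 50, 150, 2, "gain_v2.root"))
print(store.get("TPC/Calib/Gain", 60).filename)
```

Keeping the selection logic behind a single `get` call means reconstruction code never needs to know where the files physically live.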

12 Alignment
[Diagram components: Simulation; Ideal Geometry; Misalignment; Reconstruction; Raw data; File from survey; Ideal Geometry; Alignment procedure]
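The alignment slide combines an ideal geometry with misalignment information. A minimal sketch of that combination is given below, under strong simplifying assumptions: volumes are reduced to a single position, misalignments are pure translations, and the function names are hypothetical. Real alignment objects would also carry rotations.

```python
def apply_deltas(geometry, deltas):
    """Return a new geometry with per-volume translations added.

    geometry: {volume_name: (x, y, z)} ideal positions
    deltas:   {volume_name: (dx, dy, dz)}; missing volumes are left untouched
    """
    shifted = {}
    for volume, (x, y, z) in geometry.items():
        dx, dy, dz = deltas.get(volume, (0.0, 0.0, 0.0))
        shifted[volume] = (x + dx, y + dy, z + dz)
    return shifted


def invert(deltas):
    """Corrections that undo the given translations."""
    return {v: (-dx, -dy, -dz) for v, (dx, dy, dz) in deltas.items()}


ideal = {"A": (0.0, 0.0, 0.0), "B": (1.0, 2.0, 3.0)}
misalignment = {"B": (0.1, 0.0, -0.2)}

misaligned = apply_deltas(ideal, misalignment)           # what simulation would use
realigned = apply_deltas(misaligned, invert(misalignment))  # after a perfect correction
print(misaligned["B"], realigned["B"])
```

In this toy setting a perfect alignment procedure recovers the ideal geometry up to floating-point rounding.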

13 Tag architecture
Tag records (one row per event, repeated on the slide): ev# | guid | Tag1, tag2, tag3…
[Diagram components: Reconstruction; Bitmap Index; Index builder; Analysis job; Selection; List of ev#guid’s; proof#1, proof#2, proof#3 … proof#n; guid#{ev1…evn}]
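The tag architecture pairs per-event tag records with a bitmap index, so that a selection yields a list of events grouped by file guid. The sketch below shows one simple way to build and query such an index with Python integers as bitmaps. The row layout, the function names and the use of boolean tags are assumptions for illustration; the slide does not specify the actual index format.

```python
def build_bitmap_index(tag_rows):
    """Build {tag: bitmap}, where bit i is set if row i carries that tag.

    tag_rows: list of (event_number, guid, set_of_tags)
    """
    index = {}
    for row_number, (_event, _guid, tags) in enumerate(tag_rows):
        for tag in tags:
            index[tag] = index.get(tag, 0) | (1 << row_number)
    return index


def select(tag_rows, index, required_tags):
    """Return {guid: [event numbers]} for rows carrying all required tags."""
    bitmap = (1 << len(tag_rows)) - 1  # start with every row selected
    for tag in required_tags:
        bitmap &= index.get(tag, 0)    # AND the bitmaps of the required tags
    selected = {}
    for row_number, (event, guid, _tags) in enumerate(tag_rows):
        if (bitmap >> row_number) & 1:
            selected.setdefault(guid, []).append(event)
    return selected


rows = [
    (1, "guid-1", {"tagA", "tagB"}),
    (2, "guid-1", {"tagA"}),
    (3, "guid-2", {"tagA", "tagB"}),
]
index = build_bitmap_index(rows)
print(select(rows, index, ["tagA", "tagB"]))
```

The per-guid event lists returned by `select` correspond to the `guid#{ev1…evn}` items on the slide, which could then be handed to individual PROOF workers.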

14 Visualisation
See M.Tadel’s talk

15 ALICE Analysis Basic Concepts
Analysis Models
–Prompt reco/analysis at T0 using PROOF infrastructure
–Batch Analysis using GRID infrastructure
–Interactive Analysis using PROOF(+GRID) infrastructure
User Interface
–ALICE users access any GRID infrastructure via AliEn or ROOT/PROOF UIs
AliEn
–Native and “GRID on a GRID” (LCG/EGEE, ARC, OSG)
–Integrate common components as much as possible: LFC, FTS, WMS, MonALISA...
PROOF/ROOT
–Single-tier and multi-tier, static and dynamic PROOF clusters
–GRID API class TGrid (virtual) → TAliEn (real) → p

16 If you thought this was difficult...
NA49 experiment: A Pb-Pb event

17 ALICE Pb-Pb central event
Nch(-0.5 < η < 0.5) = 8000
… then what about this!

18 ALICE Collaboration
~1000 Members (63% from CERN MS)
~30 Countries
~80 Institutes

19 CERN computing power
“High throughput” computing based on reliable commercial components
More than 1500 dual-CPU PCs
–5000 in 2007
More than 3 PB of data on disks & tapes
–> 15 PB in 2007
Far from enough!

20 EGEE production service
>180 sites
>15 000 CPUs
~14 000 jobs completed per day
20 VOs
>800 registered users, representing thousands of scientists
Situation on 20 September 2005

21 ALICE view on the current situation
[Diagram, in slide order: EDG; AliEn; Exp specific services; LCG; AliEn arch + LCG code → EGEE; Exp specific services (AliEn’ for ALICE); EGEE, ARC, OSG…]

22 ALICE Job Catalogue (as listed on the slide):
Job 1.1 | lfn1
Job 1.2 | lfn2
Job 1.3 | lfn3, lfn4
Job 2.1 | lfn1, lfn3
Job 2.1 | lfn2, lfn4
Job 3.1 | lfn1, lfn3
Job 3.2 | lfn2
ALICE File Catalogue (row repeated on the slide): lfn | guid | {se’s}
[Diagram components: Site; ALICE central services; ALICE Grid; Optimizer; Computing Agent; RB; CE; WN; VO-Box; LCG; User; User Job; ALICE catalogues; packman; SA; xrootd; GUID; LFC; SRM; MSS; SURL]
[Diagram labels: Submits job; Execs agent; Workload request; Registers output; File access]

23 Distributed analysis processing
[Flow, in slide order:]
File Catalogue query
User job (many events)
Data set (ESD’s, AOD’s)
Job Optimizer: Sub-job 1, Sub-job 2, … Sub-job n (grouped by SE files location)
Job Broker: submit to CE with closest SE
CE and SE processing (one per sub-job)
Output file 1, Output file 2, … Output file n
File merging job
Job output
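The flow on this slide (split the user job by where the files are stored, send each sub-job to the CE closest to that SE, then merge the outputs) can be sketched in a few lines. The version below is a toy under stated assumptions: each file has exactly one SE, "closeness" is a made-up cost table, and merging is a plain sum. The function names and the SE/CE labels are hypothetical and do not reflect the AliEn implementation.

```python
from collections import defaultdict


def split_by_se(file_locations):
    """Group logical file names by storage element: one sub-job per SE.

    file_locations: {lfn: se_name} (single replica per file assumed)
    """
    groups = defaultdict(list)
    for lfn, se in sorted(file_locations.items()):
        groups[se].append(lfn)
    return dict(groups)


def closest_ce(se, costs):
    """Pick the CE with the smallest cost to the given SE.

    costs: {ce_name: {se_name: cost}} (hypothetical closeness table)
    """
    candidates = {ce: c[se] for ce, c in costs.items() if se in c}
    return min(candidates, key=candidates.get)


def run_analysis(file_locations, costs, process):
    """Split, broker each sub-job to its closest CE, then merge the outputs."""
    partial_outputs = []
    for se, lfns in split_by_se(file_locations).items():
        ce = closest_ce(se, costs)
        partial_outputs.append(process(ce, lfns))
    return sum(partial_outputs)  # merging step, here reduced to a sum


files = {"lfn1": "SE_A", "lfn2": "SE_B", "lfn3": "SE_A", "lfn4": "SE_B"}
costs = {"CE_1": {"SE_A": 0, "SE_B": 5}, "CE_2": {"SE_A": 5, "SE_B": 0}}
print(run_analysis(files, costs, lambda ce, lfns: len(lfns)))
```

Grouping by SE before brokering keeps each sub-job next to its data, which is the point of the "closest SE" rule on the slide.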

24 Data Challenge
Last (!) exercise before data taking
Test of the system started with simulation
Up to 3600 jobs running in parallel
Next will be reconstruction and analysis

25 ALICE computing model
For pp similar to the other experiments
–Quasi-online data distribution and first reconstruction at T0
–Further reconstructions at T1’s
For AA different model
–Calibration, alignment, pilot reconstructions and partial data export during data taking
–Data distribution and first reconstruction at T0 in the four months after AA run (shutdown)
–Further reconstructions at T1’s
T0: First pass reconstruction, storage of RAW, calibration data and first-pass ESD’s
T1: Subsequent reconstructions and scheduled analysis, storage of a collective copy of RAW and one copy of data to be safely kept, disk replicas of ESD’s and AOD’s
T2: Simulation and end-user analysis, disk replicas of ESD’s and AOD’s

26 [Organisation chart]
Offline Board (Chair: Comp Coord)
Offline Coord. (Deputy PL): Offline Coordination; Resource planning; Relation with funding agencies; Relations with C-RRB
Production Environment Coord.: Production environment (simulation, reconstruction & analysis); Distributed computing environment; Database organisation
Framework & Infrastructure Coord.: Framework development (simulation, reconstruction & analysis); Persistency technology; Computing data challenges; Industrial joint projects; Tech. Tracking; Documentation
Simulation Coord.: Detector Simulation; Physics simulation; Physics validation; GEANT 4 integration; FLUKA integration; Radiation Studies; Geometrical modeler
Reconstruction & Physics Soft Coord.: Tracking; Detector reconstruction; Global reconstruction; Analysis tools; Analysis algorithms; Physics data challenges; Calibration & alignment algorithms
Other boxes: Core Computing and Software; Detector Projects; Software Projects; International Computing Board; Management Board; Regional Tiers; DAQ; HLT; LCG SC2, PEB, GDB, POB; EU Grid coord.; US Grid coord.

27 Conclusions
ALICE has followed a single line of evolution for eight years
Most of the initial choices have been validated by our experience
Some parts of the framework still have to be populated by the sub-detectors
Wish us good luck!

