Overview of the ALICE Computing Model. F. Carminati, LHCC Review of the Experiment Computing Needs, January 18, 2005.

Presentation transcript:

1 Overview of the ALICE Computing Model
F. Carminati, LHCC Review of the Experiment Computing Needs, January 18, 2005

2 Scope of the presentation
- Describe the current status of the ALICE Computing Model
- Describe the assumptions leading to the stated needs
- Express our requirements towards LCG and the FA's
- Give an overview of the future evolution of the ALICE Computing Project

3 Timeline
- December 9-10: the draft computing model and the projected needs are discussed at an ALICE workshop
- December 14: presentation to the ALICE Management Board
- December 17: documents delivered to LHCC
- January 18: this presentation

4 Related documents
Computing MoU
- Distributed to the Collaboration for feedback on October 1, 2004
- Provide the C-RRB with documents to be approved at its April 2005 meeting
- Subsequently distributed for signature
LCG TDR
- First draft April 11, good copy May 9
- June 15: final TDR to LHCC (LHCC meeting June 29-30)
- June 3: document ready for approval by the PEB on June 7
ALICE Computing TDR
- Elements of the early draft given to LHCC on December 17, 2004
- Draft presented during the ALICE/offline week in February 2005
- Approval during the ALICE/offline week in June 2005

5 Computing TDR layout
Parameters
- Data format, model and handling
- Analysis requirements and model
Computing framework
- Framework for simulation, reconstruction and analysis
- Choices, perspectives
Distributed computing and Grid
- T0, T1's, T2's, networks
- Distributed computing model, MW requirements
Project organisation and planning
- Computing organisation, plans, milestones
- Size and costs: CPU, disk, tape, network, services, people
Resources needed
- CPU, disk, tape, network, services
- Overview of pledged resources

6 Mandate of the LHCC review
In the context of the preparation of the Computing MoUs and TDRs, the LHC experiments have come forward with estimated computing capacity requirements in terms of disks, tapes, CPUs and networks for the Tier-0, Tier-1 and Tier-2 centres. The numbers vary in many cases (mostly upwards) from those submitted to the LHC Computing Review in 2001 [...] it is felt to be desirable at this stage to seek an informed, independent view on the reasonableness of the present estimates. [...] the task of this Review is thus to examine critically, in close discussion with the computing managements of the experiments, the current estimates and report on their validity in the light of the presently understood characteristics of the LHC experimental programme. The exercise will therefore not be a review of the underlying computing architecture.

7 Outline of the presentations
Outline of the computing model (F. Carminati)
- From detector to RAW data
- Framework & software management
- Simulation
- Reconstruction
- Condition infrastructure
- Analysis
- Grid middleware & distributed computing environment
- Project management & planning
- Data handling model and issues
Experience with the Data Challenge (L. Betev)
Computing needs from RAW to analysis (Y. Schutz)

8 TRG, DAQ, HLT architecture
[Architecture diagram] Trigger: CTP with LTU/TTC per detector, levels L0, L1a, L2, BUSY, rare/all. Readout: front-end electronics (FERO) over 262 DDLs into D-RORCs and Detector LDCs (with load balancing and EDM); HLT farm fed via 10 DDLs/H-RORCs (FEP), returning data through 10 HLT LDCs over 123 DDLs. Event fragments become sub-events on the LDCs and events on the GDCs via the Event Building Network, then event files go over the Storage Network to TDS/PDS. Approximate scale: 329 D-RORC, 175 Detector LDC, 50 GDC, 25 TDS, 5 DSS. DATE is the DAQ software framework.

9 General structure
[Diagram] Event fragments (equipment header + equipment payload, i.e. DDL header and data) arrive from the DDL/RORC; on the LDC they are assembled into sub-events (base header + header extension); on the GDC sub-events are assembled into events (base header).
Event fragments include:
- LHC clock at interaction time = orbit number + bunch crossing
- Trigger parameters
- Both LHC clock and trigger are distributed by the Central Trigger Processor to all detector readout
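To make the nesting concrete, here is a minimal sketch of how these headers and payloads relate; all field names and types are illustrative assumptions, not the actual DATE/DDL structures.

```cpp
#include <cstdint>
#include <vector>

// Illustrative sketch only: field names and widths are assumptions,
// not the real DATE/DDL header layout.
struct EquipmentHeader {          // prepended by the DDL/RORC
  uint32_t equipmentId;
  uint32_t payloadSize;           // size of the DDL payload in bytes
};

struct EventFragment {            // one DDL worth of data
  EquipmentHeader header;
  std::vector<uint8_t> payload;   // DDL header and data
};

struct BaseHeader {               // common to sub-events and events
  uint32_t eventSize;
  uint32_t orbitNumber;           // LHC clock at interaction time:
  uint16_t bunchCrossing;         //   orbit number + bunch crossing
  uint32_t triggerPattern;        // trigger parameters from the CTP
};

struct SubEvent {                 // built on the LDC
  BaseHeader header;              // base header + detector-specific extension
  std::vector<EventFragment> fragments;
};

struct Event {                    // built on the GDC
  BaseHeader header;
  std::vector<SubEvent> subEvents;
};
```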

10 HLT data readout
- Data sent by the HLT over DDL are event-built and recorded
- Decisions: accepted events are event-built according to the decision; for rejected events a trace of the decision is kept
- ESD produced by HLT: for accepted events, the ESD and any other HLT data are event-built with the raw data; for rejected events, if needed, all ESDs could be kept as a separate ESD stream
[Figure] Typical application of an HLT decision

11 Event building and data recording in GDCs
[Diagram] Sub-events (raw data, HLT data) and HLT decisions arrive over the Event Building Network at the GDC NIC; the DATE data banks feed the event builder and the ROOT recorder, and complete accepted events go to the Storage Network, registered in the Grid catalog (AliEn → gLite).
Event builder:
- In: sub-events
- Out: I/O vector, i.e. a set of pointer/size pairs
ROOT recorder:
- ROOT data format
- Possibly parallel streams
- CASTOR file system
- Interfaced to the Grid
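A minimal sketch of the "I/O vector" handed from the event builder to the recorder, i.e. a list of pointer/size pairs referencing the sub-event buffers; the names and the recording step are illustrative, not the DATE or ROOT recorder code.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Illustrative sketch: an "I/O vector" as produced by the event builder,
// i.e. a list of (pointer, size) pairs referencing sub-event buffers that
// the recorder can write out without copying them into one contiguous block.
struct IoChunk {
  const void* data;   // pointer into a DATE data bank
  std::size_t size;   // number of bytes to write from that pointer
};

using IoVector = std::vector<IoChunk>;

// Hypothetical recorder step: stream every chunk of the built event to a file.
void recordEvent(std::FILE* out, const IoVector& event) {
  for (const IoChunk& chunk : event) {
    std::fwrite(chunk.data, 1, chunk.size, out);
  }
}
```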

12 Offline framework
- AliRoot in development since 1998, entirely based on ROOT
- Used for the detector TDRs and the PPR
- Two packages to install (ROOT and AliRoot), plus the transport MCs
- Ported to several architectures (Linux IA32, IA64 and AMD, Mac OS X, Digital Tru64, SunOS...)
- Distributed development: over 50 developers and a single CVS repository
- Tight integration with DAQ (data recorder) and HLT (same code base)

13 AliRoot layout
[Diagram] AliRoot sits on top of ROOT and AliEn/gLite. The STEER module (AliSimulation, AliReconstruction, AliAnalysis, ESD) coordinates the detector modules (ITS, TPC, TRD, TOF, PHOS, EMCAL, RICH, MUON, PMD, FMD, CRT, START, ZDC, RALICE, STRUCT), the event generators (EVGEN: HIJING, MEVSIM, PYTHIA6, PDF, HBTP, HBTAN, ISAJET) and the transport codes (G3, G4, FLUKA) through the Virtual MC.

14 Software management
- Regular release schedule: major release every six months, minor release (tag) every month
- Emphasis on delivering production code: corrections, protections, code cleaning, geometry
- Nightly builds produce UML diagrams, code listings, coding-rule violations, build and test results; a single repository holds all the code
- No version management software (we have only two packages!)
- Advanced code tools under development (collaboration with IRST): aspect-oriented programming, smell detection, automated testing

15 Simulation
- Simulation performed with Geant3 until now
- The Virtual MonteCarlo interface insulates the ALICE code from the MonteCarlo used
- The new geometrical modeller is entering production
- The FLUKA interface is finishing validation; the Physics Data Challenge 2005 will be performed with FLUKA
- The Geant4 interface is designed and ready to be implemented: 4Q'05 (depending on manpower availability)
- Test-beam validation activity is ongoing
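The insulation provided by the Virtual MC can be sketched as an abstract transport interface with interchangeable back-ends; the class and function names below are illustrative assumptions, not the actual ROOT TVirtualMC API.

```cpp
#include <memory>
#include <string>

// Illustrative sketch of the Virtual MC idea: user code depends only on an
// abstract transport interface, while the concrete engine (Geant3, Geant4,
// FLUKA) is chosen at configuration time. Names are assumptions, not the
// real TVirtualMC API.
class VirtualTransport {
public:
  virtual ~VirtualTransport() = default;
  virtual void DefineGeometry() = 0;     // delegated to the geometrical modeller
  virtual void ProcessEvent() = 0;       // transport one event
};

class Geant3Transport : public VirtualTransport {
public:
  void DefineGeometry() override { /* build the geometry for G3 */ }
  void ProcessEvent() override   { /* call into the Geant3 kernel */ }
};

class FlukaTransport : public VirtualTransport {
public:
  void DefineGeometry() override { /* same geometry, exported to FLUKA */ }
  void ProcessEvent() override   { /* call into the FLUKA kernel */ }
};

// The simulation steering code never mentions a concrete engine.
std::unique_ptr<VirtualTransport> MakeTransport(const std::string& name) {
  if (name == "FLUKA") return std::make_unique<FlukaTransport>();
  return std::make_unique<Geant3Transport>();   // default, as used until now
}
```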

16 The Virtual MC
[Diagram] User code talks to the VMC and to the geometrical modeller; the VMC drives the G3, G4 or FLUKA transport, and the same user code serves reconstruction, visualisation and the generators.

17 HMPID: 5 GeV pions
[Figure] Comparison of Geant3 and FLUKA.

18 TGeo modeller

19 Reconstruction strategy
- Main challenge: reconstruction in the high-flux environment (occupancy in the TPC up to 40%) requires a new approach to tracking
- Basic principle: the maximum information approach (use everything you can, you will get the best)
- Algorithms and data structures optimized for fast access and usage of all relevant information: localize relevant information and keep it until it is needed

20 Tracking strategy – Primary tracks
Incremental process:
- Forward propagation towards the vertex: TPC → ITS
- Back propagation: ITS → TPC → TRD → TOF
- Refit inward: TOF → TRD → TPC → ITS
Continuous seeding: track-segment finding in all detectors.
Combinatorial tracking in the ITS:
- Weighted two-track χ² calculated
- Effective probability of cluster sharing
- Probability not to cross a given layer for secondary particles
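A schematic sketch of the three-pass scheme listed above; the Track and Detector interfaces and the propagation calls are hypothetical stand-ins, not the AliRoot tracker classes.

```cpp
#include <vector>

// Schematic sketch of the three-pass tracking described above.
// Track, Detector and the propagation calls are hypothetical stand-ins,
// not the actual AliRoot tracker interfaces.
struct Track { /* fitted parameters, covariance, attached clusters ... */ };

struct Detector {
  virtual ~Detector() = default;
  // Kalman-style update: propagate each track through this detector,
  // attach compatible clusters and refine the track parameters.
  virtual void PropagateTracks(std::vector<Track>& tracks, bool inward) = 0;
  virtual std::vector<Track> FindSeeds() = 0;   // continuous seeding
};

void ReconstructPrimaries(Detector& its, Detector& tpc,
                          Detector& trd, Detector& tof) {
  // Seeds are found in the TPC (in principle, in all detectors).
  std::vector<Track> tracks = tpc.FindSeeds();

  // Pass 1: forward propagation towards the vertex, TPC -> ITS.
  tpc.PropagateTracks(tracks, /*inward=*/true);
  its.PropagateTracks(tracks, /*inward=*/true);

  // Pass 2: back propagation, ITS -> TPC -> TRD -> TOF.
  its.PropagateTracks(tracks, /*inward=*/false);
  tpc.PropagateTracks(tracks, /*inward=*/false);
  trd.PropagateTracks(tracks, /*inward=*/false);
  tof.PropagateTracks(tracks, /*inward=*/false);

  // Pass 3: refit inward with all attached information, TOF -> TRD -> TPC -> ITS.
  tof.PropagateTracks(tracks, /*inward=*/true);
  trd.PropagateTracks(tracks, /*inward=*/true);
  tpc.PropagateTracks(tracks, /*inward=*/true);
  its.PropagateTracks(tracks, /*inward=*/true);
}
```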

21 Results – Tracking efficiency
[Figure] Tracking efficiency for TPC alone and for ITS+TPC+TOF+TRD.
CPU time on a PIV 3 GHz (dN/dy ~ 6000):
- TPC tracking: ~40 s
- TPC kink finder: ~10 s
- ITS tracking: ~40 s
- TRD tracking: ~200 s

22 Combined PID over ITS, TPC and TOF (kaons)
[Figure] Efficiency and contamination for ITS, TPC and TOF stand-alone and for the combined selection ITS & TPC & TOF (central Pb-Pb HIJING events).
The combined PID efficiency is not lower, and the contamination not higher, than those of the detectors stand-alone.
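A common way to realise such a combined PID is a Bayesian combination: multiply the per-detector response probabilities for each particle hypothesis, weight by the prior abundances and normalise. The sketch below illustrates that idea with generic inputs; it is not the AliRoot PID code.

```cpp
#include <array>
#include <vector>

// Illustrative Bayesian combination of PID information: multiply the
// per-detector response probabilities for each particle hypothesis,
// weight by prior abundances, and normalise. Inputs are hypothetical;
// this is not the AliRoot PID implementation.
constexpr int kNSpecies = 5;   // e, mu, pi, K, p

std::array<double, kNSpecies> CombinePid(
    const std::vector<std::array<double, kNSpecies>>& detectorResponses,
    const std::array<double, kNSpecies>& priors) {
  std::array<double, kNSpecies> posterior{};
  double norm = 0.0;
  for (int i = 0; i < kNSpecies; ++i) {
    double w = priors[i];
    for (const auto& r : detectorResponses) w *= r[i];  // ITS, TPC, TOF, ...
    posterior[i] = w;
    norm += w;
  }
  if (norm > 0.0)
    for (double& p : posterior) p /= norm;   // probabilities sum to 1
  return posterior;
}
```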

23 Collaboration with HLT
- Joint development of algorithms
- HLT algorithms can be "plug-and-played" in the offline reconstruction
- HLT ESDs are produced routinely during physics data challenges
- HLT and offline code sit in the same CVS repository
- HLT algorithms are used during computing data challenges to perform data monitoring (not the final architecture, but a good test of the algorithms)

24 Use of HLT for monitoring in CDCs
[Diagram] AliRoot simulation produces digits and raw data; the LDC/GDC event builder and alimdc write ROOT files to CASTOR and register them in AliEn; HLT algorithms run on the data for monitoring, producing ESDs and histograms.

25 Condition databases
- Information comes from heterogeneous sources
- All sources are periodically polled and ROOT files with condition information are created
- These files are published on the Grid and distributed as needed by the Grid DMS
- Files contain validity information and are identified via DMS metadata
- No need for a distributed DBMS
- Reuse of the existing Grid services
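A minimal sketch of the validity mechanism described above: each published condition file carries the run interval it is valid for, and a client picks the entry covering the current run. The class and field names are illustrative, not the actual ALICE condition-database interface.

```cpp
#include <optional>
#include <string>
#include <vector>

// Minimal sketch of condition-data validity: each published file carries the
// run interval it is valid for, and a client selects the entry covering the
// current run. Names are illustrative, not the actual ALICE CDB interface.
struct ConditionEntry {
  int firstRun;            // validity start (inclusive)
  int lastRun;             // validity end (inclusive)
  std::string gridLfn;     // logical file name of the ROOT file on the Grid
};

std::optional<ConditionEntry> FindCondition(
    const std::vector<ConditionEntry>& catalogue, int run) {
  for (const ConditionEntry& e : catalogue)
    if (run >= e.firstRun && run <= e.lastRun) return e;
  return std::nullopt;     // no valid condition object for this run
}
```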

26 External relations and DB connectivity
[Diagram] DAQ, Trigger, DCS, ECS, HLT and the DCDB deliver physics data and calibration files to the calibration procedures; AliEn → gLite provides the metadata and file store; AliRoot calibration classes access the files through an API (Application Program Interface). From the URs: source, volume, granularity, update frequency, access pattern, runtime environment and dependencies. Relations between the DBs are not final and not all are shown; a call for URs has been sent to the subdetectors.

27 Development of analysis
- Analysis Object Data designed for efficiency: contain only the data needed for a particular analysis
- Analysis à la PAW: ROOT plus at most a small library
- Work on the distributed infrastructure has been done by the ARDA project
- Batch analysis infrastructure: prototype published at the end of 2004 with AliEn
- Interactive analysis infrastructure: demonstration performed at the end of 2004 with AliEn → gLite
- Now waiting for the deployment of the gLite MW to analyse the data of PDC04
- Physics working groups are just starting now, so the timing is right to receive requirements and feedback
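As an illustration of "analysis à la PAW" with plain ROOT, a chain of ESD files can be cut on and histogrammed in a few lines; the tree name, file names and leaf names below are placeholders, not the actual ALICE ESD schema.

```cpp
// Illustration of "analysis à la PAW" with plain ROOT: chain a few files,
// apply a cut and draw a histogram. The tree name "esdTree" and the leaf
// names are placeholders, not the actual ALICE ESD schema.
#include "TChain.h"
#include "TH1F.h"

void simpleAnalysis() {
  TChain chain("esdTree");              // assumed tree name
  chain.Add("AliESDs_run1.root");       // placeholder file names
  chain.Add("AliESDs_run2.root");

  // Fill a pT histogram for tracks passing a quality cut (placeholder leaves).
  TH1F* hPt = new TH1F("hPt", "track p_{T};p_{T} (GeV/c);tracks", 100, 0., 10.);
  chain.Draw("Tracks.fPt >> hPt", "Tracks.fTPCncls > 70");
  hPt->Draw();
}
```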

28 [Diagram: Grid-enabled PROOF setup] The client authenticates through a forward proxy (rootd/proofd, Grid/ROOT authentication) against the Grid access-control service, retrieves the list of logical files (LFN + MSN) from the Grid file/metadata catalogue, and sends a booking request with the logical file names (TGrid UI/queue, booking DB); PROOF master and slave servers are started at sites A and B (only outgoing connectivity, slave ports mirrored on the master host, optional site gateway) and a "standard" PROOF session follows. New elements: Grid service interfaces, LCG PROOF steering for master setup; the PROOF setup is Grid-middleware independent.
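For context, a minimal sketch of what such an interactive PROOF session could look like from the user's side once the booking is done; the master URL, file locations, tree name and selector are placeholders, not part of the setup described above.

```cpp
// Minimal sketch of an interactive PROOF session from the user side.
// The master URL, file names and selector are placeholders.
#include "TChain.h"
#include "TProof.h"

void runProofAnalysis() {
  // Connect to a PROOF master (placeholder URL).
  TProof::Open("proof://master.example.org");

  // Build the chain of ESD files resolved from the Grid catalogue.
  TChain chain("esdTree");                 // assumed tree name
  chain.Add("root://se.example.org//data/AliESDs_001.root");

  // Run a user selector on the PROOF cluster instead of locally.
  chain.SetProof();
  chain.Process("MySelector.C+");          // placeholder selector
}
```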

29 Physics Data Challenges
We need:
- Simulated events to exercise physics reconstruction and analysis
- To exercise the code and the computing infrastructure to define the parameters of the computing model
- A serious evaluation of the Grid infrastructure
- To exercise the collaboration's readiness to take and analyse data
Physics Data Challenges are one of the major inputs for our Computing Model and our requirements on the Grid. See L. Betev's talk!

30 ALICE Physics Data Challenges
Period (milestone) | Fraction of the final capacity | Physics objective
06/01-12/01 | 1% | pp studies, reconstruction of TPC and ITS
06/02-12/02 | 5% | First test of the complete chain from simulation to reconstruction for the PPR; simple analysis tools; digits in ROOT format
01/04-06/04 | 10% | Complete chain used for trigger studies; prototype of the analysis tools; comparison with parameterised MonteCarlo; simulated raw data
05/05-07/05 (NEW) | TBD | Test of condition infrastructure and FLUKA; test of gLite and CASTOR; speed test of distributing data from CERN
01/06-06/06 | 20% | Test of the final system for reconstruction and analysis

31 The ALICE Grid strategy
[Timeline 2001-2005] Start → functionality + simulation (first production, distributed simulation) → interoperability + reconstruction (Physics Performance Report, mixing & reconstruction) → performance, scalability, standards + analysis (10% Data Challenge, analysis).
- There are millions of lines of code in open source (OS) dealing with Grid issues: why not use them to build the minimal Grid that does the job?
- Fast development of a prototype, can restart from scratch, etc.
- Hundreds of users and developers
- Immediate adoption of emerging standards
- AliEn by ALICE (5% of code developed, 95% imported) → gLite

32 ALICE requirements on middleware
- ALICE assumes that MW with the same quality and functionality that AliEn would have had two years from now will be deployable on the LCG computing infrastructure
- All users should work in a pervasive Grid environment
- This would be best achieved via a common project, and ALICE still hopes that the EGEE MW will provide this
- If this cannot be done via a common project, it could still be achieved by continuing the development of the AliEn-derived components of gLite, but then a few key developers should support ALICE
- Should this turn out to be impossible (but why?), the Computing Model would have to be changed: more human [O(20) FTE/y] and hardware [O(+25%)] resources would be needed for the analysis of the ALICE data

33 [Organisation chart] Core Computing and Software, headed by Offline Coordination (Offline Coord. / Deputy PL: resource planning, relations with funding agencies, relations with the C-RRB), organised in four coordination areas:
- Production Environment Coord.: production environment (simulation, reconstruction & analysis), distributed computing environment, database organisation
- Framework & Infrastructure Coord.: framework development (simulation, reconstruction & analysis), persistency technology, computing data challenges, industrial joint projects, tech. tracking, documentation
- Simulation Coord.: detector simulation, physics simulation, physics validation, GEANT4 integration, FLUKA integration, radiation studies, geometrical modeller
- Reconstruction & Physics Soft Coord.: tracking, detector reconstruction, global reconstruction, analysis tools, analysis algorithms, physics data challenges, calibration & alignment algorithms
Interfaces: Detector Projects, Software Projects, DAQ, HLT, Regional Tiers, International Computing Board, Management Board, Offline Board (chair: Computing Coordinator), LCG (SC2, PEB, GDB, POB), EU Grid coord., US Grid coord.

34 Core Computing staffing

35 Computing Project
[Chart] Components: Core Computing (Core Software, Infrastructure & Services, Offline Coordination, Central Support), Subdetector Software, Physics Analysis Software. Funding: M&O-A, Computing project, Detector projects, Physics WGs.

36 Offline activities in the other projects
[Diagram] Nested scopes: CERN Core Offline, Extended Core Offline, and offline activities in the other projects; staffing figures shown: 20 → ~15, 100 → ~500?, 10 → ~7.

37 [Chart of responsibilities]
- Offline Board Chair: F. Carminati
- Cosmic Ray Telescope (CRT): A. Fernández
- Electromagnetic Calorimeter (EMCAL): G. Odyniec, H. Gray
- Forward Multiplicity Detector (FMD): A. Maevskaya
- Inner Tracking System (ITS): R. Barbera, M. Masera
- Muon Spectrometer (MUON): A. DeFalco, G. Martinez
- Photon Spectrometer (PHOS): Y. Schutz
- Photon Multiplicity Detector (PMD): B. Nandi
- High Momentum Particle ID (HMPID): D. DiBari
- T0 Detector (START): A. Maevskaya
- Time of Flight (TOF): A. DeCaro, G. Valenti
- Time Projection Chamber (TPC): M. Kowalski, M. Ivanov
- Transition Radiation Detector (TRD): C. Blume, A. Sandoval
- V0 detector (VZERO): B. Cheynis
- Zero Degree Calorimeter (ZDC): E. Scomparin
- Detector Construction DB: W. Peryt
- ROOT: R. Brun, F. Rademakers
- Core Offline: P. Buncic, A. Morsch, F. Rademakers, K. Safarik
- Web & VMC: CEADEN
- EU Grid coordination: P. Cerello
- US Grid coordination: L. Pinsky

38 Workplan in 2005
- Development of the alignment & calibration framework
- Introduction of the new geometry modeller and transport MC
- Continued collaboration with DAQ and HLT
- Continued AliRoot evolution
- Development of the analysis environment
- Development of MetaData
- Development of visualisation
- Revision of detector geometry and simulation
- Migration to the new Grid software
- Physics and computing challenge 2005
- Organisation of computing resources
- Writing of the computing TDR

39 ALICE Offline timeline
[Timeline 2004-2006] PDC04 and its analysis, CDC04(?); design and development of new components; PDC05 preparation, CDC05, PDC05, Computing TDR ("we are here", in 2005); PDC06 preparation, PDC06; final development of AliRoot, AliRoot ready, first data taking preparation.

40 Assumptions
We assume the latest schedule for LHC (peak luminosity):
- 2007: 100 d pp, 5x10^6 s @ 5x10^32
- 2008: 200 d pp, 10^7 s @ 2x10^33; 20 d HI, 10^6 s @ 5x10^25
- 2009: 200 d pp, 10^7 s @ 2x10^33; 20 d HI, 10^6 s @ 5x10^26
- 2010: 200 d pp, 10^7 s @ 10^34; 20 d HI, 10^6 s @ 5x10^26
- Luminosity changes during the fill, the data rate remains constant (apart perhaps for pp due to pile-up)
We have changed our computing-resources ramp-up:
- Was: 20% @ 2007, 30% @ beginning of 2008, 100% @ end of 2008
- Is: 20% @ 2007, 40% @ 2008, 100% @ 2009
- In general the cost is reduced by -40%/y
We assume on average three reconstruction passes (a fair integral estimation).
For safety we store one copy of RAW at T0 and a second one distributed among all T1's.

41 Assumptions for the distributed computing model
We assume that there will be an operational Grid:
- This is a bold assumption in the sense explained before
- Technically possible, but it could be hindered politically
If not, we will still analyse the data (!), but:
- Less efficiency → more computers → more time and money
- More people for production → more money
We assume a cloud model rather than a strict hierarchy à la MONARC:
- T0: first reconstruction pass, storage of one copy of RAW, calibration and first-pass ESDs
- T1: subsequent reconstructions, simulation and analysis; storage of one collective copy of RAW and one copy of all data to be safely kept (including simulation); disk replicas of ESDs and AODs
- T2: simulation and analysis, disk replicas of ESDs and AODs

42 Data format
RAW:
- Lightweight ROOT format tested in data challenges
- No streaming (this might still change)
- One copy spread over the T1 centres and one at Tier-0
Reconstruction produces ESD (safely stored at T0 & T1's):
- Reconstructed objects (tracks, vertices, etc.)
- Early/detailed analysis
- One or more copies at T1 centres
ESD are filtered into AOD, several streams for different analyses:
- Analysis-specific reconstructed objects
- All streams at every T1 centre, many streams at T2 centres
TAG are short summaries for every event with the event reference:
- Externalisable pointers
- Summary information and event-level metadata
MC events are larger due to embedded debugging information.
We hope not to have to decide where data goes!
- Data should be replicated and moved by the Grid to maximise the use of resources
- The average replication factor on disk is 2
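To illustrate what the "externalisable pointers" of a TAG could contain, here is a minimal sketch; all field names are assumptions, not the actual ALICE TAG schema.

```cpp
#include <cstdint>
#include <string>

// Illustrative sketch of a TAG entry: per-event summary information plus an
// externalisable reference back to the ESD/AOD that contains the full event.
// Field names are assumptions, not the actual ALICE TAG schema.
struct EventTag {
  // Externalisable pointer: enough to locate the event on the Grid.
  std::string esdLfn;        // logical file name of the ESD file
  uint32_t    eventIndex;    // position of the event inside that file

  // Event-level summary/metadata used for selection before touching the ESD.
  uint32_t run;
  uint32_t triggerMask;
  uint16_t nChargedTracks;
  float    vertexZ;          // cm
};
```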

43 Metadata
- MetaData are essential for the selection of events
- We hope to be able to use the Grid file catalogue for one part of the MetaData: during the Data Challenge we used the AliEn file catalogue for storing part of the MetaData; however these are file-level MetaData
- We will need an additional catalogue for event-level MetaData; this can simply be the TAG catalogue with externalisable references
- We will take a decision in 2005, hoping that the Grid scenario will be clearer

44 Processing strategy
For pp, similar to the other experiments:
- Quasi-online first-pass reconstruction at T0, further reconstruction passes at T1's
- Quasi-online data distribution
For AA, a different model:
- Calibration, alignment and pilot reconstructions during data taking
- First reconstruction during the four months after the AA run (shutdown), second and third passes distributed at T1's
- Distribution of AA data during the four months after the AA run
- The problem of tape read-back is being considered
Again we assume a Grid that can optimise the workload.

45 T0/T1 complex
- Acquire and store RAW data and first-pass ESDs
- Perform pilot processing, calibration and alignment (quasi-online during data taking)
- Perform first-pass reconstruction: AA during the four months after data taking, pp quasi-online
- The question of latency in the distribution of data and conditions has to be studied in detail and prototyped
- A one-day disk buffer of AA data for possible network breakdowns

46 T1
Store a safe copy of data:
- A share of RAW from T0
- MC data from T2's and T1's
- ESD and AOD produced both locally and remotely
- Condition data
Store active data on disk.
Serve data to T2's and other T1's.
Perform second and third pass reconstruction for pp and AA:
- Over resident and possibly remote RAW
Perform analysis and simulation opportunistically:
- Again we assume that the Grid WMS will "smooth" the workload
Run official reconstruction passes on local RAW and ESD.

47 T2
Perform simulation and analysis:
- In a cloud model the locality of the users will not count much
- Target places for chaotic analysis
- Jobs managed by central production via the Grid
Store copies of ESDs, AODs and condition data on disk.
Store data at T1's.

48 Networking
- The most difficult resource to predict in the absence of a precise analysis model
- Net traffic T0 → T1 can be calculated, however the headroom to add is not clear; the service data challenges will help here
- Traffic T1 → T2 can also be calculated from the model, but it depends on Grid operation and the analysis model
- Traffic T1 ↔ T1 and T2 ↔ T2 depends also on the Grid's ability to use non-local files and on the size of the available disk cache; a valid model for this does not exist
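As a purely symbolic sketch of how the calculable part of the T0 → T1 traffic is usually estimated (no ALICE numbers implied): the average export bandwidth is the RAW volume to be shipped, divided by the export period, scaled by a headroom factor.

```latex
% Symbolic sketch of the T0 -> T1 export estimate; no ALICE numbers implied.
% V_RAW : RAW volume to be exported to the T1's
% T_exp : period over which the export must complete
% f_h   : headroom factor covering inefficiencies and backlog recovery
\[
  B_{T0 \to T1} \;=\; f_{h}\,\frac{V_{\mathrm{RAW}}}{T_{\mathrm{exp}}},
  \qquad f_{h} > 1 .
\]
```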

49 Uncertainties in the model
- No clear estimates of calibration and alignment needs
- No experience with analysis data-access patterns: the analysis of PDC04 was hampered by the lack of proper MW on LCG; we will probably see "real" patterns only after 2007!
- We never tried to "push out" the data from T0 at the required speed; this will be done in the LCG service challenges
- We are still uncertain about the event size, in particular the pile-up in pp; ESD and AOD are still evolving
- No firm decision on data streaming
We need to keep our options open!

50 Uncertainties in the resources
- Defining a model needs some preliminary estimate of the resources (which, understandably, need a model to be pledged!)
- We have identified six potential T1's: Lyon, CNAF, RAL, Nordic countries, FZK, NIKHEF
- Their respective contributions to ALICE will vary considerably from centre to centre, depending on the sharing with other experiments and the size of the ALICE community in the country
- Estimates for the T2's are even less certain at the moment

51 Open issues
- Balance of local vs remote processing at T1's: we assume the Grid will be clever enough to send a job to a free T1 even if the RAW is not resident there
- Balance of tape vs disk at T1's: will mostly affect analysis performance
- Storage of simulation: assumed to be at T1's; difficult to estimate the load on the network
- Ramp-up: our figures are calculated for a standard year, so we need to work out a ramp-up scenario with LCG
- T2's are supposed to fail over to T1's for simulation and analysis, but again we suppose the Grid does this!

52 Conclusions
- ALICE has made a number of technical choices for the computing framework since 1998 that have been validated by experience; the Offline development is on schedule, although contingency is scarce
- ALICE developed a Grid solution adequate for its needs that now has an uncertain future as a common project; this is a (non-technical) high-risk factor for ALICE computing
- While many unknowns still remain, ALICE has developed a computing model from which predictions of the needed resources can be derived with reasonable confidence
- The lack of supported Grid MW providing the functionality required for analysis (i.e. the functionality we have demonstrated in ARDA with AliEn → gLite) makes the development of an analysis model the weakest point

