Alice DC Status P. Cerello March 19th, 2004.

1 Alice DC Status P. Cerello March 19th, 2004

2 Summary Status of AliRoot Status of AliEn Physics Data Challenge
Conclusions 19 Mar 2004

3 AliEn AliRoot layout ROOT G3 G4 FLUKA AliRoot Virtual MC EVGEN STEER

4 AliRoot Current status
Major changes in the last year:
  New multi-file I/O finally in full production
  New coordinate system (and we survived!)
  New reconstruction and simulation “drivers”
  First attempt at the ESD and analysis framework
  Improvements in reconstruction and simulation
Clearly the system works well, but many changes are still to come:
  ESD: the philosophy is still evolving
  Introduction of FLUKA and new geometrical modeller
  Development of the analysis framework
  Raw data for all the detectors
  Introduction of the condition database infrastructure

5 Software Development Process
ALICE opted for a light core CERN offline team…
  Concentrate on framework, software distribution and maintenance
…plus some people from the collaboration
  GRID coordination (Torino), World Computing Model (Nantes), Detector Construction Database (Warsaw), Web and VMC (La Habana)
Close integration with physics!
  The ALICE Physics Coordinator is also a member of the offline team
A development cycle adapted to ALICE
  Developers work on the most important feature at any moment
  A stable production version exists
  Collective ownership of the code
  Flexible release cycle and simple packaging and installation
Micro-cycles happen continuously, macro-cycles 2-3 times per year
  Discussed & implemented at Off-line meetings and Code Reviews

6 The ALICE Approach (AliEn)
Standards are now emerging for the basic building blocks of a GRID
There are millions of lines of code in the OS domain dealing with these issues
Why not use these to build the minimal GRID that does the job?
  Fast development of a prototype, no problem in exploring new roads, restarting from scratch, etc.
  Hundreds of users and developers
  Immediate adoption of emerging standards
An example: AliEn by ALICE (5% of code developed, 95% imported)
[Diagram: AliEn architecture, from low level to high level, in three groups: External software, AliEn Core Components & services, Interfaces. Labels: Perl Core, Perl Modules, Libraries, DBI, DBD, RDBMS (MySQL), LDAP, V.O. Packages & Commands, File & Metadata Catalogue, SOAP/XML, CE, SE, Logger, Database Proxy, Authentication, RB, ADBI, Config Mgr, Package Mgr, User Interface, Web Portal, User Application, API (C/C++/perl), CLI, GUI, FS, (…)]

7 AliEn Timeline
Start
First production (distributed simulation)
Physics Performance Report (mixing & reconstruction)
10% Data Challenge (analysis)
Functionality + Simulation
Interoperability + Reconstruction
Performance, Scalability, Standards + Analysis

8 AliEn + ROOT (A)
USER provides: Analysis Macro, Input Files, Query for Input Data
new TAliEnAnalysis Object produces: List of Input Data + Locations
Job Splitting
  IO Object 1 for Site A, IO Object 2 for Site A, IO Object 1 for Site B, IO Object 1 for Site C
Job Submission
  Job Object 1 for Site A, Job Object 2 for Site A, Job Object 1 for Site B, Job Object 1 for Site C
Execution
Histogram Merging, Tree Chaining
Results
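The splitting and merging steps on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not the TAliEnAnalysis implementation: the function names, the file paths, and the limit of two files per job are all hypothetical, and histograms are modelled as plain lists of bin counts.

```python
from collections import defaultdict


def split_jobs_by_site(file_locations, max_files_per_job=2):
    """Group input files by the site holding them, then cut each group into jobs.

    file_locations: list of (logical file name, site) pairs (hypothetical input).
    """
    by_site = defaultdict(list)
    for lfn, site in file_locations:
        by_site[site].append(lfn)

    jobs = []
    for site in sorted(by_site):
        files = by_site[site]
        for start in range(0, len(files), max_files_per_job):
            jobs.append({"site": site, "inputs": files[start:start + max_files_per_job]})
    return jobs


def merge_histograms(partials):
    """Bin-wise sum of per-job histograms that share the same binning."""
    return [sum(bins) for bins in zip(*partials)]


# Hypothetical query result: three files at site A, one each at B and C.
file_locations = [
    ("/alice/sim/run1/001.root", "A"),
    ("/alice/sim/run1/002.root", "A"),
    ("/alice/sim/run1/003.root", "A"),
    ("/alice/sim/run1/004.root", "B"),
    ("/alice/sim/run1/005.root", "C"),
]
jobs = split_jobs_by_site(file_locations)
for job in jobs:
    print(job["site"], job["inputs"])
```

With this made-up input the split yields two jobs for site A and one each for sites B and C, the same shape as the job objects listed on the slide. Sending each job to the site that already holds its files avoids moving input data.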

9 PROOF of AliEn (B)
PROOF uses the AliEn Grid File Catalogue and Data Management to map LFNs to a chain of PFNs, and Workload Management to detect which nodes in a cluster can be used in a parallel session
The PROOF system allows:
  parallel analysis of objects in a set of files
  parallel execution of scripts on clusters of heterogeneous machines
“Nice! Now I can finally analyze my datasets on the Grid and produce a histogram. And it is fast too!”
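The mapping from logical file names (LFNs) to a chain of physical file names (PFNs) can be pictured as a catalogue lookup. The sketch below is a toy model only: the catalogue is a plain dictionary, the host names and paths are invented, and choosing the first replica is an assumption, not the AliEn or PROOF policy.

```python
# Hypothetical catalogue: each LFN maps to one or more replicas (PFNs).
catalogue = {
    "/alice/sim/run1/001.root": ["root://se.site-a.example//data/001.root"],
    "/alice/sim/run1/002.root": [
        "root://se.site-b.example//data/002.root",
        "root://se.site-a.example//data/002.root",
    ],
}


def resolve_chain(lfns, catalogue):
    """Return one PFN per LFN, in order, as the chain handed to the analysis."""
    chain = []
    for lfn in lfns:
        replicas = catalogue.get(lfn)
        if not replicas:
            raise KeyError(f"no replica registered for {lfn}")
        chain.append(replicas[0])  # assumption: take the first listed replica
    return chain


chain = resolve_chain(
    ["/alice/sim/run1/001.root", "/alice/sim/run1/002.root"], catalogue
)
print(chain)
```

The user only ever names LFNs; where the files physically live is resolved by the catalogue at submission time.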

10 ALICE Physics Data Challenges
Period (milestone) | Fraction of the final capacity (%) | Physics Objective
06/01-12/01 | 1% | pp studies, reconstruction of TPC and ITS
06/02-12/02 | 5% | First test of the complete chain from simulation to reconstruction for the PPR; simple analysis tools; digits in ROOT format
01/04-06/04 | 10% | Complete chain used for trigger studies; prototype of the analysis tools; comparison with parameterised Monte Carlo; simulated raw data
01/06-06/06 | 20% | Test of the final system for reconstruction and analysis

11 PDC 3 schema
Production of RAW
Shipment of RAW to CERN
Reconstruction of RAW in all T1’s
Analysis
[Diagram: CERN, Tier1 and Tier2 centres, linked by AliEn job control and data transfer]

12 Merging Signal-free event Mixed signal 19 Mar 2004

13 AliEn, Genius & EDG/LCG seen by ALICE
[Diagram labels: User submits jobs, Server, AliEn CE, LCG UI, AliEn CEs/SEs, LCG RB, LCG CEs/SEs, Catalog, Catalog, LCG LFN, LCG PFN = AliEn LFN]

14 AliEn – EDG Interface
Mar 11th, 2003: first AliRoot job, driven by AliEn, run on EDG
[Diagram labels: Server, Job submission, Interface Site, AliEn CE, EDG UI, AliEn SE, EDG RB, EDG Site, EDG CE, WN, AliEn, EDG SE, PFN, Status report, Replica Catalogue, Data Registration, LFN, Data Catalogue, LFN=PFN]

15 ALICE PDC-3 & LCG
All the production will be started via AliEn; the analysis will be done via Root/Proof/AliEn
LCG-2 will be one CE element of AliEn, which will seamlessly integrate LCG and non-LCG resources
  If LCG-2 works well, it will absorb a large number of jobs and will be used heavily
  If LCG-2 does not work well, AliEn will favour other resources and LCG-2 will be used less
  In all cases we will use LCG-2 as much as possible
We will not need to take any decision: the performance of the system will decide for us
The figure of merit will be
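The idea that "the performance of the system will decide for us" can be illustrated with a toy pull model, in which each resource takes jobs from a shared queue only as fast as it can process them. This is a sketch under stated assumptions, not AliEn's scheduler: the function, the resource names, and the throughput figures are all invented for illustration.

```python
def run_pull_queue(total_jobs, throughput, ticks):
    """Let each resource pull up to `throughput[name]` jobs per tick from one queue.

    Returns the number of jobs each resource took and the jobs left in the queue.
    """
    remaining = total_jobs
    done = {name: 0 for name in throughput}
    for _ in range(ticks):
        for name, rate in throughput.items():
            taken = min(rate, remaining)
            done[name] += taken
            remaining -= taken
        if remaining == 0:
            break
    return done, remaining


# Hypothetical rates (jobs per tick); a resource that works well takes more jobs.
healthy, left_healthy = run_pull_queue(100, {"AliEn native": 6, "LCG-2": 4}, ticks=10)
broken, left_broken = run_pull_queue(100, {"AliEn native": 6, "LCG-2": 0}, ticks=10)
print(healthy, left_healthy)
print(broken, left_broken)
```

No one assigns a quota to LCG-2 in this model: its share of the work follows from how many jobs it manages to pull, and it drops to zero when it stops working.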

16 AliEn & LCG: Data Challenge
[Diagram: a user submits jobs to the submission server, which dispatches them to AliEn CE/SEs and, through an AliEn CE acting as LCG UI, to the LCG RB and the LCG CE/SEs; each side has its own catalog]

17 AliEn – LCG Interface
Remote AliEn and AliRoot installation OK on all LCG-2 sites
Job management interface works with no real problem
No reliable SE available on the LCG production infrastructure
  Generated data is always moved to CERN CASTOR as soon as the job finishes, using AliEn tools (AIOd)
  An interface to LCG storage is available anyway, and it will be tested as soon as LCG provides storage support on the EIS testbed

18 Software Installation on LCG
Via LCG jobs
$VO_ALICE_SW_DIR/
  root/v /…
  geant3/v0-6/…
  aliroot/v4-01-Rev-00/…
  alien/…
  AliEn/…
[Diagram: installAlice.jdl and installAliEn.jdl sent from the LCG-UI to the LCG sites]

19 First Event Round on LCG
Columns: Submitted, OK, Aborted by LCG, “Zombi”, Aborted by AliEn, Still running
Friday batch: 480 157 5 201 117
Sunday batch: 250 149 1 100
OK: as reported by AliEn. Output transferred to CERN CASTOR and registered in the AliEn Data Catalogue.
Aborted by LCG: reported as “Aborted” by LB.
Zombi: lost contact between AliEn and the job. All due to server and gateway restarts; many probably finished correctly on LCG.
Aborted by AliEn: failed. Many due to server and gateway problems that have since been fixed.
Still running: as reported by AliEn on Sunday, Feb 29th, 5 p.m.
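A quick consistency check on the batch figures: in each row the first number (jobs submitted) equals the sum of the numbers that follow. The sketch below verifies that, and also computes the fraction of jobs reported OK, assuming that the second number in each row is the OK count; that column assignment is an assumption based on the header order.

```python
# Figures exactly as listed on the slide, in the order given.
batches = {
    "Friday": [480, 157, 5, 201, 117],
    "Sunday": [250, 149, 1, 100],
}


def outcomes_match_submitted(counts):
    """True when the outcome counts add up to the submitted total (first entry)."""
    submitted, *outcomes = counts
    return sum(outcomes) == submitted


def ok_fraction(counts):
    """Fraction of submitted jobs reported OK, assuming the second entry is OK."""
    return counts[1] / counts[0]


for name, counts in batches.items():
    print(name, outcomes_match_submitted(counts), round(ok_fraction(counts), 3))
```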

20 Short history
Jan 03: Requirements for ALICE PDC04 presented to PEB
End Dec 03: Announcement of LCG-2 by mid February 2004
Beg Jan 04: Decision to delay PDC04 by one month waiting for LCG-2
End Jan 04: LCG announces that there will be no SE in LCG-2
Beg Feb 04: The WAN resources allocated by LCG for data storage are insufficient/inadequate
Mid Feb 04: Development of an ALICE solution, developed in haste and working against all odds!
End Feb 04: IT has also come up with a solution responding to a CMS requirement
End Feb 04: Production started, new sites being added
Confusing that during all this time LCG-2 has been declared “ready for ALICE” on a day-by-day basis!
Beg Mar 04: CASTOR database has to be reinstalled (running on Linux 6.2!)
Beg Mar 04: CASTOR servers have to be reinstalled for security
Beg Mar 04: LCG RB works differently on the different centres. CNAF has to be switched on and off by hand, otherwise it “swallows” all the jobs!
Beg Mar 04: we are now getting close to 10 TB; 30 were promised by LCG on 1/1/04
Mid Mar 04: Files on the IT-provided pool are erased before being copied to tape(!)
18 Mar 04: restart production & insert

21 Snapshot on Mar 16th file:///C:/Documents%20and%20Settings/Piergiorgio%20Cerello/My%20Documents/Alice/AlienControls.htm

22 Data Challenge Statistics
First round, closed on Mar 16th 19 Mar 2004

23 Data Challenge Statistics
First round, closed on Mar 16th 19 Mar 2004

24 Data Challenge Statistics
First round, closed on Mar 16th 19 Mar 2004

25 DC Monitoring:
Monalisa: 19 Mar 2004

26 Snapshot on Mar 18th file:///C:/Documents%20and%20Settings/Piergiorgio%20Cerello/My%20Documents/Alice/AlienControls2.htm

27 Data Challenge Statistics
First+Second round, started on Mar 18th : jobs 19 Mar 2004

28 Data Challenge Statistics
First+Second round, started on Mar 18th : +1051, + 680 19 Mar 2004

29 Data Challenge Statistics
First+Second round, started on Mar 18th : +592, +476 19 Mar 2004

30 Present Status
AliEn native sites
  CERN, CNAF, Cyfronet, Catania, FZK, JINR, LBL, Lyon, OSC, Prague, Torino
LCG-2 sites
  CERN, CNAF, RAL: ok (up to 400 concurrent jobs)
  FZK: problems with installation, solved as of Mar 18th
  NIKHEF: old version of aliroot in $PATH, solved as of Mar 18th
  TAIWAN: intermittent problems (network?)
  Fermilab: “not an Alice site”
sites
  Installation (aliroot & AliEn) ok everywhere but Bo
  In production as of Mar 18th
  Ba, Ct, Fe, LNL, Pd, To: ok
  Bo-INGV, Pi: not seen by RB
  Bo, Rm: minor installation problems
  Mar 19th, 00:30: Ba 1, Ct 7, Fe 7, LNL 97, Pd 70, To 17 = 199 running jobs
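The running-job figures quoted for Mar 19th, 00:30 can be checked and broken down per site. The counts below are taken from the slide; the percentage shares are derived here and do not appear in the original.

```python
# Running jobs per site at Mar 19th, 00:30, as listed on the slide.
running = {"Ba": 1, "Ct": 7, "Fe": 7, "LNL": 97, "Pd": 70, "To": 17}

total = sum(running.values())
# Share of the running jobs held by each site, in percent (derived, not on the slide).
shares = {site: round(100 * n / total, 1) for site, n in running.items()}

print(total)
print(shares)
```

The total matches the 199 running jobs stated on the slide, with LNL and Pd together accounting for most of them.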

31 Double access @ CNAF
[Diagram: a user submits jobs to the submission server; the worker nodes (WN) at CNAF are reached both through the AliEn/CNAF CE/SE and, via an AliEn CE acting as LCG UI and the LCG RB, through the LCG/CNAF CE/SE]

32 Remarks
First GRID production with fully transparent common access to different middlewares (AliEn & LCG)
Significant improvement in the LCG stability (450/12 hours wrt. 450/2 months)
AliEn – LCG load is about 50-50
  Optimal situation: with respect to any other choice (AliEn only or LCG only) the availability of resources is doubled
There is room for improvement (on both sides), but the Data Challenge started well, although it is just at the beginning
  We hope for continued support from LCG
  And centres should provide us with the promised resources
AliEn already provides functionality for distributed analysis
  LCG/ARDA will improve it

33 Conclusions
ALICE has solutions that are evolving into a solid computing infrastructure
Major decisions have been taken and users have adopted them
Collaboration between physicists and computer scientists is excellent
The tight integration with ROOT allows a fast prototyping and development cycle
AliEn goes a long way toward providing a GRID solution adapted to HEP needs
  It allowed us to do large productions with very few people “in charge”
Many ALICE-developed solutions have a high potential to be adopted by other experiments and indeed are becoming “common solutions”

34 19 Mar 2004

35 AliEn
[Diagram: AliEn service architecture, with cardinalities (1, 1..n, 0..n) marked between components. Steps: 1. lookup, 2. authenticate, 3. register, 4. bind. Components: User Interface, API, DB Proxy, Factory, Auditing, DBD/RDBMS, Registry/Lookup/Config, V.O. directory, Authentication, Storage Element, Gatekeeper, Job Manager, Transfer Manager, File Transfer, Grid Monitoring, CE, Process Monitor, Transfer Broker, Job Broker, Job Optimizer, Transfer Optimizer, Catalogue Optimiser]

36 ARDA in a nutshell
“Long they laboured in the regions of Eä, which are vast beyond the thought of Elves and Men, until in the time appointed was made Arda...” - J.R.R. Tolkien, Valaquenta
ARDA RTAG
  Found AliEn “the most complete system among all considered” in Sep ‘03
  Suggested a “fast prototype” in 6 months
Six months went to calm the turmoil spurred by this report!
ARDA is now started as suggested by the report
  At least so we hope!
ARDA, if successful, will form the basis for the EGEE MW

37 AliEn++ (ARDA) 19 Mar 2004

38 ROOT, ALICE & LCG
LCG has brought support for ROOT and FLUKA
We will continue to develop our system
  Providing basic technology, e.g. VMC and geometrical modeller
… and we will try to collaborate with LCG wherever possible
  Possible convergence in the simulation area, collaboration on simple benchmarks
We have proposed to base LCG on ROOT and AliEn
  LCG established a client-provider relationship with ROOT, which is rapidly evolving
  Is now adopting AliEn via ARDA/EGEE
LCG decided to develop alternatives for some ROOT elements or hide them with interfaces
We expressed our worries
  No time to develop and deploy a new system
  Duplication and dispersion of efforts
  Divergence with the rest of HEP
We will keep looking for opportunities to collaborate
