Alice DC Status P. Cerello March 19th, 2004.


Summary
- Status of AliRoot
- Status of AliEn
- Physics Data Challenge
- Conclusions

AliRoot layout [diagram]: ROOT and the Virtual MC (interfacing G3, G4 and FLUKA) at the base; on top, the AliRoot framework with STEER, AliSimulation, AliReconstruction, AliAnalysis and the ESD; event generators (EVGEN: HIJING, PYTHIA6, MEVSIM, ISAJET, PDF); detector modules (ITS, TPC, TRD, TOF, PHOS, EMCAL, PMD, MUON, RICH, ZDC, FMD, CRT, START, STRUCT); plus HBTAN, HBTP, RALICE and the AliEn interface.

AliRoot Current Status
Major changes in the last year:
- New multi-file I/O finally in full production
- New coordinate system (and we survived!)
- New reconstruction and simulation "drivers"
- First attempt at the ESD and analysis framework
- Improvements in reconstruction and simulation
Clearly the system works well; however, many changes are still to come:
- ESD: the philosophy is still evolving
- Introduction of FLUKA and the new geometrical modeller
- Development of the analysis framework
- Raw data for all the detectors
- Introduction of the condition database infrastructure

Software Development Process
ALICE opted for a light core CERN offline team…
- Concentrates on framework, software distribution and maintenance
…plus some people from the collaboration:
- GRID coordination (Torino), World Computing Model (Nantes), Detector Construction Database (Warsaw), Web and VMC (La Habana)
Close integration with physics!
- The ALICE Physics Coordinator is also a member of the offline team
A development cycle adapted to ALICE:
- Developers work on the most important feature at any moment
- A stable production version exists
- Collective ownership of the code
- Flexible release cycle and simple packaging and installation
- Micro-cycles happen continuously, macro-cycles 2-3 times per year
- Discussed & implemented at off-line meetings and code reviews

The ALICE Approach (AliEn)
- Standards are now emerging for the basic building blocks of a GRID
- There are millions of lines of code in the OS domain dealing with these issues
- Why not use these to build the minimal GRID that does the job?
  - Fast development of a prototype, no problem in exploring new roads, restarting from scratch, etc.
  - Hundreds of users and developers
  - Immediate adoption of emerging standards
- An example: AliEn by ALICE (5% of the code developed, 95% imported)
[Architecture diagram: a Perl core (Perl modules, DBI/DBD) on top of external software (RDBMS (MySQL), LDAP, SOAP/XML, external libraries, FS); low-level AliEn core components & services (File & Metadata Catalogue, CE, SE, Logger, Database Proxy, Authentication, RB, Config Mgr, Package Mgr, V.O. commands & packages); high-level interfaces (User Interface, API (C/C++/Perl), CLI, GUI, Web Portal, ADBI, user applications)]

AliEn Timeline
- 2001: start; first production (distributed simulation): Functionality + Simulation
- 2002-2003: Physics Performance Report (mixing & reconstruction): Interoperability + Reconstruction
- 2004-2005: 10% Data Challenge (analysis): Performance, Scalability, Standards + Analysis

AliEn + ROOT [diagram]
- The USER provides an Analysis Macro and the Input Files
- A new TAliEnAnalysis object queries for the input data and produces a list of input data + locations
- Job Splitting: IO objects and job objects are created per site (e.g. Job Objects 1 and 2 for Site A, Job Object 1 for Site B, Job Object 1 for Site C)
- Job Submission and Execution
- Results are collected via Histogram Merging and Tree Chaining
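The splitting step above (query the catalogue, then cut the input list into per-site jobs) can be sketched in a few lines of Python. The catalogue contents, site names and `files_per_job` policy are invented for illustration; the real logic lives in AliEn's TAliEnAnalysis machinery:

```python
# Toy sketch of the AliEn analysis job-splitting step: input files are
# grouped by the site that hosts them, and one job object is created per
# group of files (so a site with many files gets several jobs, as in
# "Job Object 1/2 for Site A" on the slide).

def split_jobs(catalogue, files_per_job=2):
    """catalogue: LFN -> hosting site. Returns a list of per-site job dicts."""
    by_site = {}
    for lfn, site in catalogue.items():
        by_site.setdefault(site, []).append(lfn)
    jobs = []
    for site, lfns in sorted(by_site.items()):
        for i in range(0, len(lfns), files_per_job):
            jobs.append({"site": site, "inputs": lfns[i:i + files_per_job]})
    return jobs

# Hypothetical query result: LFN -> site holding the data
catalogue = {
    "/alice/sim/evt1.root": "SiteA",
    "/alice/sim/evt2.root": "SiteA",
    "/alice/sim/evt3.root": "SiteA",
    "/alice/sim/evt4.root": "SiteB",
    "/alice/sim/evt5.root": "SiteC",
}

jobs = split_jobs(catalogue)
# Site A's three files yield two jobs; Sites B and C get one job each
```

Each resulting job object carries only data local to its site, which is the point of the exercise: jobs travel to the data, not the other way around.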

PROOF of AliEn
PROOF uses AliEn:
- Grid File Catalogue and Data Management to map LFNs to a chain of PFNs
- Workload Management to detect which nodes in a cluster can be used in a parallel session
"Nice! Now I can finally analyze my datasets on the Grid and produce a histogram. And it is fast too!"
The PROOF system allows:
- parallel analysis of objects in a set of files
- parallel execution of scripts on clusters of heterogeneous machines
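The PROOF model (each worker processes a slice of the dataset in parallel and the partial results are merged at the end) can be imitated with stdlib Python. The "analysis" here is a trivial stand-in for a real ROOT macro, and the dataset names are invented:

```python
# Toy version of the PROOF pattern: process chunks of a dataset in
# parallel workers, each producing a partial histogram, then merge.
from concurrent.futures import ThreadPoolExecutor
from collections import Counter

def analyse(chunk):
    # Stand-in for running the analysis macro on one group of files:
    # here we just histogram the lengths of the file names.
    return Counter(len(name) for name in chunk)

def merge(partials):
    # Histogram merging, as done with the PROOF workers' partial results
    total = Counter()
    for h in partials:
        total += h
    return total

dataset = [f"evt{i}.root" for i in range(100)]
chunks = [dataset[i::4] for i in range(4)]  # slices for 4 parallel "workers"
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(analyse, chunks))
hist = merge(partials)
```

The design point PROOF adds on top of this sketch is that the workers are chosen by the workload management system and the chunks by the file catalogue, so data locality is exploited automatically.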

ALICE Physics Data Challenges
- 06/01-12/01 (1% of final capacity): pp studies; reconstruction of TPC and ITS
- 06/02-12/02 (5%): first test of the complete chain from simulation to reconstruction for the PPR; simple analysis tools; digits in ROOT format
- 01/04-06/04 (10%): complete chain used for trigger studies; prototype of the analysis tools; comparison with parameterised MonteCarlo; simulated raw data
- 01/06-06/06 (20%): test of the final system for reconstruction and analysis

PDC 3 schema [diagram]: RAW events are produced at Tier1 and Tier2 sites and shipped to CERN; RAW is reconstructed in all T1's; analysis runs at CERN. AliEn provides the job control and the data transfers between tiers.

Merging [figure]: a signal-free event is merged with signal to produce a mixed-signal event.

AliEn, Genius & EDG/LCG seen by ALICE [diagram]: a user submits jobs to the AliEn server; alongside the native AliEn CEs/SEs, one AliEn CE acts as an LCG UI and hands jobs to the LCG RB and on to the LCG CEs/SEs. The two catalogues are linked: an LCG LFN maps to an LCG PFN, with LCG PFN = AliEn LFN.

AliEn - EDG Interface
Mar 11th, 2003: first AliRoot job, driven by AliEn, run on EDG.
[Diagram]: jobs are submitted from the AliEn server to an interface site running an AliEn CE and SE on top of an EDG UI; the EDG RB schedules them on an EDG CE and its WNs, which report status back through AliEn. Output data is stored on the EDG SE and registered both in the EDG Replica Catalogue and in the AliEn Data Catalogue, with LFN = PFN.
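The double registration in the diagram (EDG Replica Catalogue plus AliEn Data Catalogue, with the "LFN = PFN" convention of using the same name on both sides) amounts to something like the following. The catalogue structures and the `srm://` naming are invented stand-ins, not the real APIs:

```python
# Toy model of the data-registration step of the AliEn-EDG interface:
# an output file is registered once in the EDG Replica Catalogue and
# once in the AliEn Data Catalogue, reusing the same logical name on
# both sides ("LFN = PFN" on the slide).

edg_replica_catalogue = {}   # EDG logical name -> list of physical replicas
alien_data_catalogue = {}    # AliEn LFN -> name recorded as its "PFN"

def register_output(lfn, edg_se, filename):
    # Physical location of the file on the EDG storage element
    replica = f"srm://{edg_se}/{filename}"
    edg_replica_catalogue.setdefault(lfn, []).append(replica)
    # LFN = PFN: the AliEn entry points at the same logical name
    alien_data_catalogue[lfn] = lfn
    return replica

replica = register_output("/alice/prod/run1/evt.root", "se.example.org", "evt.root")
```

With the shared name, a later AliEn catalogue lookup resolves directly to an entry the EDG replica machinery can expand into physical replicas.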

ALICE PDC-3 & LCG
- All the production will be started via AliEn; the analysis will be done via ROOT/PROOF/AliEn
- LCG-2 will be one CE element of AliEn, which will integrate LCG and non-LCG resources seamlessly
- If LCG-2 works well, it will suck in a large number of jobs, and it will be used heavily
- If LCG-2 does not work well, AliEn will privilege other resources, and it will be used less
- In all cases we will use LCG-2 as much as possible
- We will not need to take any decision: the performance of the system will decide for us
- The figure of merit will be
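The "performance will decide for us" policy can be illustrated with a toy dispatcher that gives each resource a share of the next batch proportional to its recent success rate; the weighting scheme and the history numbers are my own invention, not taken from the slides:

```python
# Toy illustration of performance-driven dispatch: resources that complete
# more of their recent jobs receive proportionally more of the next batch,
# so a healthy LCG-2 attracts jobs by itself, and an unhealthy one is
# quietly de-prioritised without any explicit decision.

def dispatch(batch_size, history):
    """history: resource -> (jobs_finished_ok, jobs_attempted)."""
    rates = {r: ok / max(tried, 1) for r, (ok, tried) in history.items()}
    total = sum(rates.values()) or 1.0
    shares = {r: int(batch_size * rate / total) for r, rate in rates.items()}
    # hand any rounding remainder to the best-performing resource
    best = max(rates, key=rates.get)
    shares[best] += batch_size - sum(shares.values())
    return shares

# Hypothetical recent history (ok, attempted) for the two kinds of resource
history = {"LCG-2": (149, 250), "AliEn-native": (450, 500)}
shares = dispatch(100, history)
```

Under these made-up numbers the better-performing native resources get the larger share, which is exactly the "system decides" behaviour the slide describes.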

AliEn & LCG: Data Challenge [diagram]: a user submits jobs to the AliEn server, which dispatches them both to native AliEn CE/SEs and, through an AliEn CE acting as an LCG UI, to the LCG RB and on to the LCG CE/SEs; both sets of resources are tracked in the catalogues.

AliEn - LCG Interface
- Remote AliEn and AliRoot installation OK on all LCG-2 sites
- The job management interface works with no real problem
- No reliable SE is available on the LCG production infrastructure: generated data is always moved to CERN CASTOR as soon as the job finishes, using AliEn tools (AIOd)
- An interface to LCG storage is available anyhow, and it will be tested as soon as LCG provides storage support on the EIS testbed

Software Installation on LCG
Via LCG jobs, submitted from an LCG UI to each LCG site (installAlice.sh with installAlice.jdl, installAliEn.sh with installAliEn.jdl), installing under $VO_ALICE_SW_DIR:
- root/v3-10-02/…
- geant3/v0-6/…
- aliroot/v4-01-Rev-00/…
- alien/… , AliEn/…
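An installation job description such as installAlice.jdl would look roughly like the fragment rendered below. The attribute names (Executable, InputSandbox, OutputSandbox, …) are standard EDG/LCG JDL, but the slide does not show the actual file, so the contents are an assumption:

```python
# Hedged sketch of an installation JDL (cf. installAlice.jdl): the real
# ALICE file is not reproduced on the slide, so the sandbox contents and
# output names here are guesses.
def install_jdl(script):
    fields = {
        "Executable": f'"{script}"',
        "StdOutput": '"install.out"',
        "StdError": '"install.err"',
        "InputSandbox": f'{{"{script}"}}',
        "OutputSandbox": '{"install.out", "install.err"}',
    }
    return "\n".join(f"{k} = {v};" for k, v in fields.items())

jdl = install_jdl("installAlice.sh")
```

Submitting one such job per LCG site, as the slide sketches, pre-installs the software tree under $VO_ALICE_SW_DIR so that subsequent production jobs find it locally.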

First Event Round on LCG
               Submitted   OK    Aborted by LCG   "Zombi"   Aborted by AliEn   Still running
Friday batch      480      157         5            201            117              -
Sunday batch      250      149         1             -              -              100
- OK: as reported by AliEn; output transferred to CERN CASTOR and registered in the AliEn Data Catalogue
- Aborted by LCG: reported as "Aborted" by the LB
- "Zombi": contact lost between AliEn and the job, all due to server and gateway restarts; many probably finished correctly on LCG
- Aborted by AliEn: failed; many due to server and gateway problems since fixed
- Still running: as reported by AliEn on Sunday, Feb 29th, 5 p.m.
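Reading the Friday batch back (with the five outcome columns taken as OK, aborted by LCG, "Zombi", aborted by AliEn, still running), the outcomes sum to the submitted total, which makes a quick consistency check:

```python
# Consistency check on the first-round Friday numbers quoted above.
friday = {
    "OK": 157,
    "Aborted by LCG": 5,
    "Zombi": 201,
    "Aborted by AliEn": 117,
    "Still running": 0,
}
submitted = 480
assert sum(friday.values()) == submitted  # every job is accounted for
ok_rate = friday["OK"] / submitted        # fraction confirmed OK by AliEn
```

Note that the "Zombi" jobs are not necessarily failures: as the slide says, many of them probably finished correctly on LCG, so the true success rate is likely well above the confirmed `ok_rate`.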

Short history
- Jan 03: requirements for ALICE PDC04 presented to the PEB
- End Dec 03: announcement of LCG-2 by mid-February 2004
- Beg Jan 04: decision to delay PDC04 by one month, waiting for LCG-2
- End Jan 04: LCG announces that there will be no SE in LCG-2
- Beg Feb 04: the WAN resources allocated by LCG for data storage are insufficient/inadequate
- Mid Feb 04: an ALICE solution is developed in haste, working against all odds!
- End Feb 04: IT has also come up with a solution, responding to a CMS requirement
- End Feb 04: production started, new sites being added
- Confusing that during all this time LCG-2 has been declared "ready for ALICE" on a day-by-day basis!
- Beg Mar 04: the CASTOR database has to be reinstalled (running on Linux 6.2!)
- Beg Mar 04: the CASTOR servers have to be reinstalled for security
- Beg Mar 04: the LCG RB works differently at the different centres; CNAF has to be switched on and off by hand, otherwise it "swallows" all the jobs!
- Beg Mar 04: we are now getting close to 10 TB; 30 were promised by LCG on 1/1/04
- Mid Mar 04: files on the IT-provided pool are erased before being copied to tape (!)
- 18 Mar 04: restart production & insert Grid.it

Snapshot on Mar 16th [screenshot]: file:///C:/Documents%20and%20Settings/Piergiorgio%20Cerello/My%20Documents/Alice/AlienControls.htm

Data Challenge Statistics [charts]: first round, closed on Mar 16th


DC Monitoring: http://alien.cern.ch - MonALISA: http://aliens3.cern.ch:8080

Snapshot on Mar 18th [screenshot]: file:///C:/Documents%20and%20Settings/Piergiorgio%20Cerello/My%20Documents/Alice/AlienControls2.htm

Data Challenge Statistics [charts]: first+second round, started on Mar 18th: +1713 jobs

Data Challenge Statistics [charts]: first+second round, started on Mar 18th: +1051, +680

Data Challenge Statistics [charts]: first+second round, started on Mar 18th: +592, +476

Present Status
AliEn native sites:
- CERN, CNAF, Cyfronet, Catania, FZK, JINR, LBL, Lyon, OSC, Prague, Torino
LCG-2 sites:
- CERN, CNAF, RAL: OK (up to 400 concurrent jobs)
- FZK: problems with installation, solved as of Mar 18th
- NIKHEF: old version of aliroot in $PATH, solved as of Mar 18th
- TAIWAN: intermittent problems (network?)
- Fermilab: "not an Alice site"
Grid.it sites:
- Installation (aliroot & AliEn) OK everywhere but Bo
- In production as of Mar 18th
- Ba, Ct, Fe, LNL, Pd, To: OK
- Bo-INGV, Pi: not seen by the RB
- Bo, Rm: minor installation problems
Mar 19th, 00:30: Ba 1, Ct 7, Fe 7, LNL 97, Pd 70, To 17 = 199 running jobs

Double access @ CNAF [diagram]: a user submits jobs to the AliEn server; the CNAF worker nodes are reachable both directly, through the native AliEn/CNAF CE/SE, and through an AliEn CE acting as an LCG UI, via the LCG RB and the LCG/CNAF CE/SE.

Remarks
- First GRID production with fully transparent common access to different middlewares (AliEn & LCG)
- Relevant improvement in LCG stability (450 jobs in 12 hours vs. 450 in 2 months)
- The AliEn-LCG load split is about 50-50
- Optimal situation: compared with any other choice (AliEn only or LCG only), the availability of resources is doubled
- There is room for improvement (on both sides), but the Data Challenge started well, although it is just at the beginning
- We hope for continued support from LCG, and the centres should provide us with the promised resources
- AliEn already provides functionality for distributed analysis; LCG/ARDA will improve it

Conclusions
- ALICE has solutions that are evolving into a solid computing infrastructure
- Major decisions have been taken and users have adopted them
- Collaboration between physicists and computer scientists is excellent
- The tight integration with ROOT allows a fast prototyping and development cycle
- AliEn goes a long way toward providing a GRID solution adapted to HEP needs; it allowed us to do large productions with very few people "in charge"
- Many ALICE-developed solutions have a high potential to be adopted by other experiments, and indeed are becoming "common solutions"


AliEn services [component diagram]: a User Interface or API authenticates and binds to the Database Proxy in front of the DBD/RDBMS backend; core services include the Job Manager, Job Broker and Job Optimizer, the Transfer Manager, Transfer Broker and Transfer Optimizer, and a Catalogue Optimiser; sites run a Gatekeeper, CE, Storage Element, Process Monitor and File Transfer services; supporting services include Registry/Lookup/Config, the V.O. directory, Authentication, Auditing, Grid Monitoring and a Factory, connected by the lookup, authenticate, register and bind steps, with the indicated 1, 1..n and 0..n multiplicities.

ARDA in a nutshell
"Long they laboured in the regions of Eä, which are vast beyond the thought of Elves and Men, until in the time appointed was made Arda…" - J.R.R. Tolkien, Valaquenta
The ARDA RTAG:
- Found AliEn "the most complete system among all considered" in Sep '03
- Suggested a "fast prototype" in 6 months
- Six months went to calming the turmoil spurred by this report!
- ARDA has now started as suggested by the report, or at least so we hope!
- ARDA, if successful, will form the basis for the EGEE MW

AliEn++ (ARDA) [diagram]

ROOT, ALICE & LCG
- LCG has brought support for ROOT and FLUKA
- We will continue to develop our system, providing basic technology, e.g. the VMC and the geometrical modeller
- …and we will try to collaborate with LCG wherever possible: possible convergence in the simulation area, collaboration on simple benchmarks
- We have proposed to base LCG on ROOT and AliEn:
  - LCG established a client-provider relationship with ROOT, which is rapidly evolving
  - It is now adopting AliEn via ARDA/EGEE
- LCG decided to develop alternatives for some ROOT elements, or to hide them behind interfaces; we expressed our worries:
  - No time to develop and deploy a new system
  - Duplication and dispersion of efforts
  - Divergence with the rest of HEP
- We will keep looking for opportunities to collaborate