F.Carminati BNL Seminar March 21, 2005 ALICE Computing Model

Slide 2: Offline framework
- AliRoot has been in development since 1998
  - Entirely based on ROOT
  - Used since the detector TDRs for all ALICE studies
- Two packages to install (ROOT and AliRoot), plus the MCs
- Ported to most common architectures: Linux IA32, IA64 and AMD, Mac OS X, Digital Tru64, SunOS…
- Distributed development
  - Over 50 developers and a single CVS repository
  - 2/3 of the code developed outside CERN
- Tight integration with DAQ (data recorder) and HLT (same code base)
- Wide use of abstract interfaces for modularity
- "Restricted" subset of C++ used for maximum portability

Slide 3: AliRoot layout (diagram)
- Foundation: ROOT, with AliEn/gLite providing Grid access
- AliRoot core (STEER): Virtual MC, AliSimulation, AliReconstruction, AliAnalysis, ESD
- Transport MCs: G3, G4, FLUKA
- Event generators: HIJING, MEVSIM, PYTHIA6, PDF, EVGEN, HBTP, HBTAN, ISAJET
- Detector and support modules: EMCAL, ZDC, ITS, PHOS, TRD, TOF, RICH, PMD, CRT, FMD, MUON, TPC, START, RALICE, STRUCT

Slide 4: Software management
- Regular release schedule
  - Major release every six months, minor release (tag) every month
- Emphasis on delivering production code
  - Corrections, protections, code cleaning, geometry
- Nightly builds produce UML diagrams, code listings, coding rule violations and build and test results; a single repository holds all the code
- No version management software needed (we have only two packages!)
- Advanced code tools under development (collaboration with IRST/Trento)
  - Smell detection (already under testing)
  - Aspect-oriented programming tools
  - Automated genetic testing

Slide 5: ALICE Detector Construction Database (DCDB)
- Specifically designed to aid detector construction in a distributed environment:
  - Sub-detector groups around the world work independently
  - All data are collected in a central repository and used to move components from one sub-detector group to another and during the integration and operation phase at CERN
- Multitude of user interfaces:
  - Web-based for humans
  - LabView, XML for laboratory equipment and other sources
  - ROOT for visualisation
- In production since 2002
- A very ambitious project with important spin-offs
  - Cable Database
  - Calibration Database

Slide 6: The Virtual MC (diagram)
User code (simulation, reconstruction, visualisation, generators) talks only to the VMC interface and the geometrical modeller; the actual particle transport is delegated to G3, G4 or FLUKA through dedicated transport implementations.
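
To make the Virtual MC idea on the slide concrete, here is a minimal, self-contained C++ sketch of the pattern: user code is written against one abstract transport interface, and the concrete engine (Geant3, Geant4 or FLUKA) is chosen in a single place at run time. The class and method names below are illustrative, not the actual ROOT TVirtualMC API.

```cpp
#include <iostream>
#include <memory>
#include <string>

// Abstract transport interface: all user code (simulation, geometry,
// hit scoring) is written against this class only.
class VirtualTransport {
public:
  virtual ~VirtualTransport() = default;
  virtual std::string Name() const = 0;
  virtual void ProcessEvent() = 0;   // transport one event through the geometry
};

// Concrete engines wrapping the real transport codes (illustrative stubs).
class Geant3Transport : public VirtualTransport {
public:
  std::string Name() const override { return "Geant3"; }
  void ProcessEvent() override { std::cout << "G3 transport\n"; }
};

class FlukaTransport : public VirtualTransport {
public:
  std::string Name() const override { return "FLUKA"; }
  void ProcessEvent() override { std::cout << "FLUKA transport\n"; }
};

// The engine is selected in one place (in AliRoot this role is played by the
// configuration macro); nothing else changes when switching transport codes.
std::unique_ptr<VirtualTransport> MakeTransport(const std::string& which) {
  if (which == "FLUKA") return std::make_unique<FlukaTransport>();
  return std::make_unique<Geant3Transport>();
}

int main() {
  auto mc = MakeTransport("FLUKA");
  std::cout << "Using " << mc->Name() << "\n";
  mc->ProcessEvent();
}
```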

Slide 7: TGeo modeller

Slide 8: Results (plots)
Geant3 vs. FLUKA comparison for GeV/c protons in a 60T field; HMPID response to 5 GeV pions.

Slide 9: ITS – SPD: cluster size (plot, preliminary)

Slide 10: Reconstruction strategy
- Main challenge: reconstruction in the high-flux environment (occupancy in the TPC up to 40%) requires a new approach to tracking
- Basic principle: the maximum-information approach
  - Use everything you can and you will get the best result
- Algorithms and data structures optimised for fast access and usage of all relevant information
  - Localise relevant information
  - Keep this information until it is needed

Slide 11: Tracking strategy – primary tracks
- Incremental process (see the sketch below)
  - Forward propagation towards the vertex: TPC → ITS
  - Back propagation: ITS → TPC → TRD → TOF
  - Refit inward: TOF → TRD → TPC → ITS
- Continuous seeding
  - Track segment finding in all detectors
- Combinatorial tracking in the ITS
  - Weighted two-track χ² calculated
  - Effective probability of cluster sharing
  - Probability for secondary particles not to cross a given layer
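
The following sketch only illustrates the control flow of the three passes listed above; the track and detector types are hypothetical stand-ins (in reality each step runs a Kalman-filter propagation with direction-dependent material corrections and cluster attachment).

```cpp
#include <vector>

// Hypothetical types, used only to show the order of the reconstruction passes.
struct Track { /* fitted parameters, covariance, attached clusters ... */ };

struct Detector {
  // Propagate each track through this detector and attach the best clusters.
  void PropagateAndAttach(std::vector<Track>& tracks) const { (void)tracks; }
};

void ReconstructPrimaries(const Detector& its, const Detector& tpc,
                          const Detector& trd, const Detector& tof,
                          std::vector<Track>& tracks) {
  // 1) Forward propagation towards the vertex: TPC -> ITS
  tpc.PropagateAndAttach(tracks);
  its.PropagateAndAttach(tracks);

  // 2) Back propagation outwards: ITS -> TPC -> TRD -> TOF
  its.PropagateAndAttach(tracks);
  tpc.PropagateAndAttach(tracks);
  trd.PropagateAndAttach(tracks);
  tof.PropagateAndAttach(tracks);

  // 3) Refit inward with the full information: TOF -> TRD -> TPC -> ITS
  tof.PropagateAndAttach(tracks);
  trd.PropagateAndAttach(tracks);
  tpc.PropagateAndAttach(tracks);
  its.PropagateAndAttach(tracks);
}

int main() {
  Detector its, tpc, trd, tof;
  std::vector<Track> tracks;   // seeds would come from the TPC in reality
  ReconstructPrimaries(its, tpc, trd, tof, tracks);
}
```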

Slide 12: Tracking & PID
Timing on a PIV 3 GHz machine (dN/dy = 6000):
- TPC tracking: ~40 s
- TPC kink finder: ~10 s
- ITS tracking: ~40 s
- TRD tracking: ~200 s
(Plots: efficiency and contamination for the TPC alone and for ITS+TPC+TOF+TRD, and combined ITS & TPC & TOF PID.)

Slide 13: Condition and alignment
- Heterogeneous information sources are periodically polled
- ROOT files with condition information are created
- These files are published on the Grid and distributed as needed by the Grid DMS
- Files contain validity information and are identified via DMS metadata
- No need for a distributed DBMS
- Reuse of the existing Grid services
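
As a hedged illustration of "files contain validity information and are identified via DMS metadata", here is a small sketch of a condition catalogue keyed by a path and a run range. The path style and file names are invented for the example; in the real system the payloads are ROOT files on the Grid and the run range lives in the file-catalogue metadata.

```cpp
#include <iostream>
#include <map>
#include <optional>
#include <string>

struct RunRange { int first; int last; };

// A condition object is identified by a path and is valid for a run range.
struct ConditionEntry {
  RunRange validity;
  std::string payloadFile;   // e.g. a logical file name in the Grid catalogue
};

class ConditionCatalogue {
  std::multimap<std::string, ConditionEntry> entries_;
public:
  void Publish(const std::string& path, ConditionEntry e) {
    entries_.emplace(path, std::move(e));
  }
  // Return the payload valid for the requested run, if any.
  std::optional<std::string> Find(const std::string& path, int run) const {
    auto range = entries_.equal_range(path);
    for (auto it = range.first; it != range.second; ++it)
      if (run >= it->second.validity.first && run <= it->second.validity.last)
        return it->second.payloadFile;
    return std::nullopt;
  }
};

int main() {
  ConditionCatalogue cdb;
  cdb.Publish("TPC/Calib/Pedestals", {{1000, 1999}, "lfn:/alice/cdb/tpc_ped_v1.root"});
  if (auto f = cdb.Find("TPC/Calib/Pedestals", 1234))
    std::cout << "run 1234 -> " << *f << "\n";
}
```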

Slide 14: External relations and DB connectivity (diagram)
- Information sources: DAQ, Trigger, DCS, ECS, HLT, physics data, DCDB
- Calibration procedures produce calibration files, kept in the AliEn → gLite metadata/file store and accessed by the AliRoot calibration classes through an API (Application Program Interface)
- From the user requirements: source, volume, granularity, update frequency, access pattern, runtime environment and dependencies
- A call for user requirements has been sent to the subdetectors
- Relations between the DBs are not final and not all are shown

Slide 15: Metadata
- Metadata are essential for the selection of events
- We hope to be able to use the Grid file catalogue for one part of the metadata
  - During the Data Challenge we used the AliEn file catalogue for storing part of the metadata
  - However, these are file-level metadata
- We will need additional event-level metadata
  - This can simply be the TAG catalogue with externalisable references
  - We are discussing this subject with STAR
- We will take a decision soon
  - We would prefer that the Grid scenario be clearer first
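
To illustrate what an event-level tag with "externalisable references" could look like, here is a small sketch. The field names are invented for the example and are not the actual ALICE tag schema: the point is that selection runs over a compact tag catalogue, and each tag carries a reference (file identifier plus entry number) to the full event.

```cpp
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// One tag per event: a few selection variables plus an externalisable
// reference (which file, which entry) so the full event can be fetched later.
struct EventTag {
  uint32_t    run;
  uint32_t    event;
  uint32_t    triggerMask;
  int         chargedMultiplicity;
  std::string fileGUID;   // Grid identifier of the ESD file (placeholder)
  int64_t     entry;      // entry number inside that file
};

int main() {
  std::vector<EventTag> tags = {
    {2001, 17, 0x1, 5200, "guid-aaaa", 17},
    {2001, 42, 0x2,  310, "guid-aaaa", 42},
  };
  // Event selection uses only the small tag catalogue; the selected events
  // are then read from the referenced files on the Grid.
  for (const auto& t : tags)
    if ((t.triggerMask & 0x1) && t.chargedMultiplicity > 1000)
      std::cout << "read entry " << t.entry << " from " << t.fileGUID << "\n";
}
```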

Slide 16: ALICE Computing Data Challenges (CDCs)
Date | MB/s | TB to MSS | Offline milestone
10/… | ? | ? | Rootification of raw data; raw data for TPC and ITS
9/… | ? | ? | Integration of single-detector HLT code; partial data replication to remote centres
5/2004 → 3/2005 (?) | 450 | ? | HLT prototype for more detectors; remote reconstruction of partial data streams; raw digits for barrel and MUON
5/… | ? | ? | Prototype of the final remote data replication (raw digits for all detectors)
5/… | 1250 if possible | ? | Final test (final system)

Slide 17: Use of HLT for monitoring in the CDCs (diagram)
Data flow: AliRoot simulation → digits → raw data → LDC → GDC (event builder) → alimdc → ROOT file → CASTOR, with registration in AliEn; HLT algorithms run on the stream and produce ESDs and monitoring histograms.

Slide 18: ALICE Physics Data Challenges
Period (milestone) | Fraction of the final capacity | Physics objective
06/01-12/01 | 1% | pp studies, reconstruction of TPC and ITS
06/02-12/02 | 5% | First test of the complete chain from simulation to reconstruction for the PPR; simple analysis tools; digits in ROOT format
01/04-06/04 | 10% | Complete chain used for trigger studies; prototype of the analysis tools; comparison with parameterised Monte Carlo; simulated raw data
05/05-07/05 | TBD | Test of condition infrastructure and FLUKA; to be combined with SDC 3; speed test of distributing data from CERN
01/06-06/06 | 20% | Test of the final system for reconstruction and analysis

Slide 19: PDC04 schema (diagram)
Production of RAW at CERN, T1 and T2 centres; shipment of RAW to CERN; reconstruction of RAW in all T1s; analysis. AliEn controls the jobs and the data transfers.

Slide 20: Phase 2 principle (diagram)
Signal-free underlying events are mixed with signal events.

Slide 21: Simplified view of the ALICE Grid with AliEn
- ALICE VO, central services: central task queue (job submission), file catalogue, configuration, accounting, user authentication
- ALICE VO, AliEn site services: computing element with workload management, job monitoring, storage element, storage volume manager, data transfer, cluster monitor
- Existing site components: local scheduler, disk and MSS
- The site services integrate the ALICE VO with the existing site infrastructure
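
The sketch below is only a conceptual illustration of the central-task-queue idea shown on the slide: jobs wait in one central queue with their requirements, and each site's computing element asks for work matching what it can offer. It is not AliEn code, and the storage-element names are invented.

```cpp
#include <deque>
#include <iostream>
#include <optional>
#include <string>

struct Job { int id; std::string requiredSE; };           // e.g. data locality
struct SiteOffer { std::string name; std::string closeSE; };

class CentralTaskQueue {
  std::deque<Job> queue_;
public:
  void Submit(Job j) { queue_.push_back(j); }
  // Hand out the first queued job whose requirements the site can satisfy.
  std::optional<Job> Match(const SiteOffer& site) {
    for (auto it = queue_.begin(); it != queue_.end(); ++it) {
      if (it->requiredSE.empty() || it->requiredSE == site.closeSE) {
        Job j = *it;
        queue_.erase(it);
        return j;
      }
    }
    return std::nullopt;
  }
};

int main() {
  CentralTaskQueue tq;
  tq.Submit({1, "CERN::CASTOR"});          // needs data close to CERN
  tq.Submit({2, ""});                       // no data constraint
  SiteOffer torino{"Torino", "Torino::SE"};
  if (auto j = tq.Match(torino))            // the site pulls a matching job
    std::cout << "Torino runs job " << j->id << "\n";
}
```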

Slide 22: Site services
- Unobtrusive, running entirely in user space:
  - Single user account
  - All authentication is already assured by the central services
  - Tuned to the existing site configuration: supports various schedulers and storage solutions
  - Runs on many Linux flavours and platforms (IA32, IA64, Opteron)
  - Automatic software installation and updates (both service and application)
- Scalable and modular: different services can be run on different nodes (in front of or behind firewalls) to preserve site security and integrity
- (Diagram) Load-balanced file transfer nodes (on HTAR) implement the CERN firewall solution for large-volume file transfers: only high ports (50K-55K) are open for parallel file transport between the AliEn data transfer nodes and the CERN intranet; the other AliEn services sit behind the firewall

Slide 23: AliEn central services hardware
- HP ProLiant DL380: AliEn proxy server, up to 2500 concurrent client connections
- HP ProLiant DL580: AliEn file catalogue, 9 million entries, 400K directories, 10 GB MySQL DB
- HP server rx2600: AliEn job services, 500K archived jobs
- HP ProLiant DL360: AliEn to CASTOR (MSS) interface
- HP ProLiant DL380: AliEn storage element volume manager, 4 million entries, 3 GB MySQL DB
- 1 TB SATA disk server: log files, application software storage

Slide 24: Phase 2 job structure
Task: simulate the event reconstruction and remote event storage. Completed September 2004.
- Central servers: master job submission, job optimizer (splitting into N sub-jobs), RB, file catalogue, process monitoring and control, SE
- Sub-jobs are dispatched to AliEn CEs and, through the AliEn-LCG interface, to the LCG RB and CEs
- Underlying-event input files are read from CERN CASTOR
- Output files are packed into a zip archive, with the primary copy on the local SEs and a backup copy in CERN CASTOR
- Outputs are registered in the AliEn file catalogue and in the LCG SE (LCG LFN = AliEn PFN) using edg(lcg) copy&register
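
As a rough illustration of the "job optimizer splits the master job into N sub-jobs" step, here is a hedged sketch. The real AliEn splitting is JDL-driven and far richer; the structures, field names and LFN below are invented for the example.

```cpp
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

// Invented, simplified structures: a master job is split into sub-jobs, each
// producing one zip archive that will later be registered in the file catalogue.
struct SubJob {
  int         index;
  int         firstEvent, lastEvent;
  std::string outputArchive;   // becomes an LFN after registration
};

std::vector<SubJob> SplitMasterJob(int nEvents, int eventsPerSubJob,
                                   const std::string& outputBase) {
  std::vector<SubJob> subs;
  for (int first = 0, i = 0; first < nEvents; first += eventsPerSubJob, ++i) {
    int last = std::min(first + eventsPerSubJob, nEvents) - 1;
    subs.push_back({i, first, last,
                    outputBase + "." + std::to_string(i) + ".zip"});
  }
  return subs;
}

int main() {
  for (const auto& s : SplitMasterJob(100, 30, "lfn:/alice/pdc04/run123/output"))
    std::cout << "sub-job " << s.index << ": events " << s.firstEvent << "-"
              << s.lastEvent << " -> " << s.outputArchive << "\n";
}
```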

Slide 25: Production history
- The ALICE repository holds the history of the entire DC; monitored parameters include:
  - Running and completed processes
  - Job status and error conditions
  - Network traffic
  - Site status, central services monitoring
  - …
- 7 GB of data, 24 million records with 1-minute granularity, analysed to improve Grid performance
- Statistics:
  - Jobs of ~6 hours each, 750 MSi2K hours in total
  - 9M entries in the AliEn file catalogue
  - 4M physical files at 20 AliEn SEs in centres world-wide
  - 30 TB stored in CERN CASTOR
  - 10 TB stored at remote AliEn SEs, plus a 10 TB backup at CERN
  - 200 TB of network transfer from CERN to the remote computing centres
  - Observed AliEn efficiency >90%
  - Observed LCG efficiency ~60% (see the GAG document)

Slide 26: Job repartition
- Jobs (AliEn/LCG): Phase 1 – 75/25%, Phase 2 – 89/11%
- More operational sites were added to the ALICE Grid as the PDC progressed
- 17 permanent sites (33 in total) under direct AliEn control, with additional resources through Grid federation (LCG)

Slide 27: Summary of PDC'04
- Computing resources
  - It took some effort to 'tune' the resources at the remote computing centres
  - The centres' response was very positive: more CPU and storage capacity was made available during the PDC
- Middleware
  - AliEn proved fully capable of executing high-complexity jobs and controlling large amounts of resources
  - The functionality needed for Phase 3 has been demonstrated, but cannot be used
  - The LCG MW proved adequate for Phase 1, but not for Phase 2 nor in a competitive environment
  - It cannot provide the additional functionality needed for Phase 3
- ALICE computing model validation:
  - AliRoot: all parts of the code successfully tested
  - Computing element configuration
  - The need for a high-functionality MSS was shown
  - The Phase 2 distributed data storage schema proved robust and fast
  - Data analysis could not be tested

Slide 28: Development of analysis
- Analysis Object Data designed for efficiency
  - Contain only the data needed for a particular analysis
- Analysis à la PAW
  - ROOT plus, at most, a small library (see the sketch below)
- Work on the distributed infrastructure has been done by the ARDA project
- Batch analysis infrastructure
  - Prototype published at the end of 2004 with AliEn
- Interactive analysis infrastructure
  - Demonstration performed at the end of 2004 with AliEn → gLite
- The physics working groups are just starting now, so the timing is right to receive requirements and feedback
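
A minimal ROOT macro in the "analysis à la PAW" spirit is sketched below. The file, tree and branch names ("aod.root", "aodTree", "trackPt", "trackEta") are placeholders, not the real ALICE data model; the ROOT calls themselves (TFile::Open, TTree::Draw) are standard.

```cpp
// analysis.C -- minimal "ROOT + at most a small library" style analysis.
#include "TCanvas.h"
#include "TFile.h"
#include "TH1F.h"
#include "TTree.h"

void analysis() {
  TFile* f = TFile::Open("aod.root");          // local or Grid-resident file
  if (!f || f->IsZombie()) return;
  TTree* t = static_cast<TTree*>(f->Get("aodTree"));
  if (!t) return;

  TH1F* hPt = new TH1F("hPt", "track p_{T};p_{T} (GeV/c);entries", 100, 0., 10.);
  // Fill directly from the tree with a simple selection, PAW-ntuple style.
  t->Draw("trackPt >> hPt", "abs(trackEta) < 0.9", "goff");

  TCanvas* c = new TCanvas("c", "pt");
  hPt->Draw();
  c->SaveAs("pt.png");
}
```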

Slide 29: PROOF on the Grid (diagram)
Components: client, PROOF master, PROOF slave servers at sites A and B, forward proxy, rootd/proofd, Grid/ROOT authentication, Grid access control service, TGrid UI/queue UI, slave registration/booking DB, Grid service interfaces, Grid file/metadata catalogue, LCG PROOF steering. Flow: the client retrieves the list of logical files (LFN + MSN) from the catalogue and sends a booking request with the logical file names; after master setup and slave registration a "standard" PROOF session runs, with slave ports mirrored on the master host, an optional site gateway, only outgoing connectivity required, and a PROOF setup that is independent of the Grid middleware.
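
From the user's side the multi-site machinery in the diagram stays hidden behind the normal PROOF interface. The sketch below shows that user-level view; the master URL, dataset LFN and selector name are placeholders, and it assumes a working PROOF installation with the Grid plugins configured.

```cpp
// proofAnalysis.C -- user-level view of an interactive PROOF analysis.
#include "TChain.h"
#include "TProof.h"

void proofAnalysis() {
  // Connect to a PROOF master (authentication and booking happen behind this call).
  TProof* p = TProof::Open("master.example.org");
  if (!p) return;

  TChain chain("esdTree");
  chain.Add("alien:///alice/sim/example/AliESDs.root");  // placeholder LFN
  chain.SetProof();                 // route the processing through PROOF
  chain.Process("MySelector.C+");   // the selector runs in parallel on the slaves
}
```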

Slide 30: Grid situation
History
- Jan '04: the AliEn developers are hired by EGEE and start working on the new MW
- May '04: a prototype derived from AliEn is offered to pilot users (ARDA, Biomed, …) under the gLite name
- Dec '04: the four experiments ask for this prototype to be deployed on a larger preproduction service and to become part of the EGEE release
- Jan '05: this is vetoed at management level; AliEn will not be common software
Current situation
- EGEE has vaguely promised to provide the same functionality as the AliEn-derived MW
  - But with a delay of at least 2-4 months on top of the one already accumulated
  - And even this will be just the beginning of the story: the different components will have to be field-tested in a real environment, and it took four years for AliEn
- All experiments have their own middleware
  - Ours is not maintained because our developers have been hired by EGEE
  - EGEE has formally vetoed any further work on AliEn or AliEn-derived software
  - LCG has allowed some support for ALICE, but the situation is far from clear
- gLite → gLate

Slide 31: ALICE computing model
- For pp, similar to the other experiments
  - Quasi-online data distribution and first reconstruction at T0
  - Further reconstruction passes at the T1s
- For AA, a different model
  - Calibration, alignment and pilot reconstructions during data taking
  - Data distribution and first reconstruction at T0 during the four months after the AA run (shutdown)
  - Second and third passes distributed at the T1s
- For safety, one copy of RAW kept at T0 and a second copy distributed among all T1s
- T0: first-pass reconstruction; storage of one copy of RAW, calibration data and first-pass ESDs
- T1: subsequent reconstructions and scheduled analysis; storage of the second, collective copy of RAW and one copy of all data to be safely kept (including simulation); disk replicas of ESDs and AODs
- T2: simulation and end-user analysis; disk replicas of ESDs and AODs
- The network load is very difficult to estimate

Slide 32: ALICE requirements on middleware
- One of the main uncertainties of the ALICE computing model comes from the Grid component
  - ALICE developed its computing model assuming that MW with the quality and functionality AliEn would have reached two years from now will be deployable on the LCG computing infrastructure
  - If not, we will still analyse the data (!), but
    - Less efficiency → more computers → more time and money
    - More people for production → more money
- To elaborate an alternative model we need to know:
  - The functionality of the MW developed by EGEE
  - The support we can count on from LCG
  - Our "political" margin of manoeuvre

Slide 33: Possible strategy
- If
  a) the basic services of the LCG/EGEE MW can be trusted at some level, and
  b) we can get some support to port the "higher functionality" MW onto these services,
  then we have a solution.
- If a) above is not true, but
  a) we have support for deploying the ARDA-tested, AliEn-derived gLite, and
  b) we do not have a political "veto",
  then we still have a solution.
- Otherwise we are in trouble.

Slide 34: ALICE offline timeline (diagram)
Sequence of activities: PDC04 and its analysis; design and development of new components; PDC05; PDC06 preparation and PDC06; final development of AliRoot; preparation for first data taking. Milestones marked on the timeline: CDC 04(?), PDC04, Computing TDR, PDC05, CDC 05, PDC06, AliRoot ready. A "we are here" marker shows the current position (March 2005).

Slide 35: Main parameters

Slide 36: Processing pattern

Slide 37: Conclusions
- Since 1998 ALICE has made a number of technical choices for the computing framework that have been validated by experience
  - The offline development is on schedule, although contingency is scarce
- The collaboration between physicists and computer scientists is excellent
- Tight integration with ROOT allows a fast prototyping and development cycle
- AliEn goes a long way towards providing a Grid solution adapted to HEP needs
  - However its evolution into a common project has been "stopped"
  - This is probably the largest single risk factor for ALICE computing
- Some ALICE-developed solutions have a high potential to be adopted by other experiments and are indeed becoming "common solutions"