ARDA – P. Cerello, INFN Torino – ARDA Workshop, June 23, 2004

Presentation transcript:

ARDA
P. Cerello, INFN Torino
ARDA Workshop, June 23, 2004

The ALICE Approach (AliEn)
- Roadmap: start with Functionality + Simulation (first production, distributed simulation), then Interoperability + Reconstruction (Physics Performance Report: mixing & reconstruction), then Performance, Scalability, Standards + Analysis (10% Data Challenge: analysis)
- There are millions of lines of open-source code dealing with Grid issues: why not use them to build the minimal Grid that does the job?
- Fast development of a prototype, which can be restarted from scratch if needed
- Hundreds of users and developers
- Immediate adoption of emerging standards
- AliEn by ALICE (5% of the code developed in-house, 95% imported)

AliEn activity for the PPR
- 32 sites configured, 5 of them providing mass storage capability
- 12 production rounds; 2428 jobs failed (10%), the remainder validated
- Up to 450 concurrent jobs
- 0.5 operators
- Sites on the map: Yerevan, Saclay, Lyon, Dubna, Capetown (ZA), Birmingham, Cagliari, NIKHEF, Catania, Bologna, Torino, Padova, IRB, Kolkata (India), OSU/OSC, LBL/NERSC, Merida, Bari, Nantes, Houston, RAL, CERN, Krakow, GSI, Budapest, Karlsruhe

PDC 3 schema: DO IT ALL ON THE GRID!
- Production of RAW at CERN and the Tier-1/Tier-2 centres
- Shipment of the RAW to CERN
- Reconstruction of the RAW in all T1's
- Analysis
- AliEn handles job control and data transfer between CERN and the Tier-1/Tier-2 centres

Simulation
- Small input with the interaction conditions
- Produces signal-free events
- Large distributed output (1 GB/event) with the simulated detector response
- Long execution time (10 hours/event)

Merging + Reconstruction
- Signal-free events are mixed with the signal
- Large distributed input (1 GB/event)
- Fairly large distributed output (100 MB/event, in 7 MB files) with the reconstructed events

Analysis
- Input: mixed signal and signal-free events
- Large distributed input (100 MB/event)
- Fairly small merged output, collected on a server

Phases of ALICE Physics Data Challenge 2004
- Phase 1: production of underlying events using heavy-ion MC generators. Status: 100% complete in about 2 months of active running. Basic statistics: ~1.3 million files, 26 TB data volume
- Phase 2: mixing of signal events in the underlying events. Status: about to begin
- Phase 3: analysis of signal + underlying events. Goal: to test the data analysis model of ALICE. Status: will begin in 2 months

DC2004 layout: a "Meta-Grid"
(Diagram: the AliEn server, with the master job queue and the catalogue, receives the submission; jobs go either directly to native AliEn CE/SEs or, through an AliEn CE sitting on an LCG UI, to the LCG RB and its CE/SEs, which have their own catalogue.)

Design strategy
- The "pull model" is well suited for implementing high-level submission systems, since it does not require knowledge about the periphery, which may be very complex (a sketch of the idea follows)
- Use AliEn as a single, general front-end: owned and shared resources are exploited transparently
- Minimize the points of contact and don't be invasive: no need to reimplement services, and no special services are required to run on remote CEs/WNs
- Make full use of the provided services (data catalogues, scheduling, monitoring…): let the Grids do their jobs (they should know how)
- Use high-level tools and APIs to access Grid resources: developers put a lot of abstraction effort into hiding the complexity and shielding the user from implementation changes
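To make the pull model concrete, here is a minimal, self-contained sketch (not AliEn code; all names such as Job, TaskQueue and SiteAgent are hypothetical): the central master queue holds jobs and knows nothing about the sites, while each site agent advertises its capabilities and pulls the next matching job when it has a free slot.

```python
# Minimal sketch of the "pull model": the central queue knows nothing about the
# periphery; each site agent advertises its capabilities and pulls matching work.
# All names (Job, TaskQueue, SiteAgent) are hypothetical, not the AliEn API.
import queue
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    job_id: int
    requirements: dict  # e.g. {"software": "AliRoot", "memory_mb": 2000}


def satisfies(capability, requirement) -> bool:
    """Numeric requirements are lower bounds; anything else must match exactly."""
    if isinstance(requirement, (int, float)) and isinstance(capability, (int, float)):
        return capability >= requirement
    return capability == requirement


class TaskQueue:
    """Central master job queue: it never pushes work, the agents come and pull it."""

    def __init__(self) -> None:
        self._jobs: "queue.Queue[Job]" = queue.Queue()

    def submit(self, job: Job) -> None:
        self._jobs.put(job)

    def pull(self, capabilities: dict) -> Optional[Job]:
        """Return the first waiting job the caller can run, or None."""
        skipped, matched = [], None
        while not self._jobs.empty():
            job = self._jobs.get_nowait()
            ok = all(satisfies(capabilities.get(k), v) for k, v in job.requirements.items())
            if matched is None and ok:
                matched = job
            else:
                skipped.append(job)
        for job in skipped:  # put back the jobs we did not take
            self._jobs.put(job)
        return matched


class SiteAgent:
    """Runs at a site (or on top of an LCG UI) and asks for work when it has free slots."""

    def __init__(self, name: str, capabilities: dict, master: TaskQueue) -> None:
        self.name, self.capabilities, self.master = name, capabilities, master

    def poll_once(self) -> None:
        job = self.master.pull(self.capabilities)
        print(f"{self.name}: " + (f"pulled job {job.job_id}" if job else "nothing to run"))


if __name__ == "__main__":
    master = TaskQueue()
    master.submit(Job(1, {"software": "AliRoot", "memory_mb": 1000}))
    SiteAgent("Torino", {"software": "AliRoot", "memory_mb": 4000}, master).poll_once()
```

The point of the design is that the master needs no model of the periphery: adding a new Grid just means running one more agent on top of its user interface.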

Implementation
- Manage LCG (& Grid.it) resources through a "gateway": an AliEn client (CE+SE) sitting on top of an LCG User Interface
- The whole of LCG computing is seen as a single, large AliEn CE associated with a single, large SE
- Job management interface: JDL translation (incl. InputData statements), JobID bookkeeping, command proxying (Submit, Cancel, Status query…)
- Data management interface: Put() and Get() implementation; the AliEn PFN is the LCG GUID (plus the SE host, to allow for optimisation); AliEn knows nothing about LCG replicas (but RLS/ROS do!) (a sketch of both interfaces follows)
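As a concrete illustration of the two interfaces, here is a small sketch (hypothetical helper names; the JDL attributes are the generic EDG/LCG ones, and the real gateway's translation is certainly more involved) that turns an AliEn-side job description into an LCG JDL string and encodes/decodes an AliEn PFN as the LCG GUID plus the SE host.

```python
# Hedged sketch of the gateway's two interfaces: JDL translation and PFN <-> GUID mapping.
# The real AliEn/LCG interface is more involved; names and encoding here are illustrative.

def translate_to_lcg_jdl(executable: str, arguments: str, input_lfns: list[str]) -> str:
    """Build an EDG/LCG-style JDL string from an AliEn-side job description."""
    input_data = ", ".join(f'"lfn:{lfn}"' for lfn in input_lfns)
    return "\n".join([
        "[",
        f'  Executable = "{executable}";',
        f'  Arguments = "{arguments}";',
        '  StdOutput = "stdout.log";',
        '  StdError = "stderr.log";',
        '  OutputSandbox = {"stdout.log", "stderr.log"};',
        f"  InputData = {{{input_data}}};",
        '  DataAccessProtocol = {"gsiftp"};',
        "]",
    ])


def alien_pfn(guid: str, se_host: str) -> str:
    """AliEn PFN = LCG GUID plus the SE host, to allow optimisation (illustrative encoding)."""
    return f"guid://{se_host}/{guid}"


def parse_alien_pfn(pfn: str) -> tuple[str, str]:
    """Recover (guid, se_host) from the encoded PFN."""
    rest = pfn.removeprefix("guid://")
    se_host, guid = rest.split("/", 1)
    return guid, se_host


if __name__ == "__main__":
    print(translate_to_lcg_jdl("aliroot.sh", "--event 42", ["/alice/sim/2004/event42.root"]))
    pfn = alien_pfn("38d2a1c0-aaaa-bbbb-cccc-1234567890ab", "lcgse01.cern.ch")
    print(pfn, parse_alien_pfn(pfn))
```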

Interfacing AliEn and LCG
(Diagram: at the interface site, an AliEn CE and AliEn SE sit on top of an LCG UI, talking to the AliEn server. Job submission goes through the LCG RB to an EDG CE and its WNs at the LCG site, which send status reports back. Data produced on the WN are stored on an LCG SE and registered in the LCG Replica Catalogue under a GUID, while the registration in the AliEn Data Catalogue maps the LFN to a PFN equal to that GUID.)

AliEn, Genius & EDG/LCG
- LCG-2 is one CE of AliEn, which integrates LCG and non-LCG resources
- If LCG-2 can run a large number of jobs, it will be used heavily
- If LCG-2 cannot do that, AliEn selects other resources, and it will be used less
(Diagram: the user submits jobs to the AliEn server; its catalogue stores the LCG GUID as the AliEn PFN, and the LCG catalogue resolves the GUID to an LCG PFN on the LCG CEs/SEs reached through the LCG RB; native AliEn CEs/SEs are used directly.)

DC2004 layout: a "Meta-Grid"
(The same Meta-Grid diagram shown earlier, repeated here.)

Double Access (CNAF & ct.infn.it)
(Diagram: the user submits jobs to the AliEn server; resources at CNAF and ct.infn.it can be reached both directly, as native AliEn CE/SEs, and through the LCG RB down to the WNs, via the AliEn CE sitting on the LCG UI.)

Software installation on LCG
- Both AliEn and AliRoot are installed via LCG jobs: do some checks, download the tarballs, uncompress, build the environment script and publish the relevant tags
- A single command is available to get the list of available sites, send the jobs everywhere and wait for completion; a full update on LCG-2 + GRID.IT (16 sites) takes ~30 minutes
- Manual intervention is still needed at a few sites (e.g. CERN/LSF)
- Ready for integration into the AliEn automatic installation system
- Misconfiguration of the experiment software shared area caused most of the trouble in the beginning
(Diagram: installAliEn.sh/installAliEn.jdl and installAlice.sh/installAlice.jdl jobs are sent from the LCG UI to NIKHEF, Taiwan, RAL, CNAF, TO.INFN, …; a sketch of such an installation round follows.)
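A minimal sketch of such an installation round, assuming JDL files like the ones named above and submission with edg-job-submit from the UI (the command's output format, the site list and the CE names are assumptions, which is why the sketch defaults to a dry run):

```python
# Sketch of pushing the AliEn / AliRoot installation jobs to every LCG / GRID.IT site
# from the LCG UI.  The submit command's output format and the CE endpoints are
# assumptions; by default nothing is actually submitted (dry_run=True).
import subprocess
from pathlib import Path

SITES = ["NIKHEF", "Taiwan", "RAL", "CNAF", "TO.INFN"]      # illustrative list
JDL_TEMPLATE = """[
  Executable = "installAliEn.sh";
  StdOutput = "install.out";
  StdError = "install.err";
  OutputSandbox = {{"install.out", "install.err"}};
  InputSandbox = {{"installAliEn.sh"}};
  Requirements = other.GlueCEUniqueID == "{ce}";
]"""


def submit_everywhere(site_to_ce: dict[str, str], dry_run: bool = True) -> dict[str, str]:
    """Write one JDL per site and submit it; return site -> job ID (or the command line)."""
    job_ids = {}
    for site, ce in site_to_ce.items():
        jdl_path = Path(f"install_{site}.jdl")
        jdl_path.write_text(JDL_TEMPLATE.format(ce=ce))
        cmd = ["edg-job-submit", str(jdl_path)]
        if dry_run:                                   # don't actually call the Grid here
            job_ids[site] = " ".join(cmd)
        else:
            out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
            job_ids[site] = out.strip().splitlines()[-1]   # assumed: last line is the job ID
    return job_ids


if __name__ == "__main__":
    # Hypothetical CE endpoints; the real ones come from the information system.
    ces = {s: f"ce.{s.lower()}.example:2119/jobmanager-lcgpbs-alice" for s in SITES}
    for site, info in submit_everywhere(ces).items():
        print(site, "->", info)
```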

Phase 1 results
- Number of jobs: Central 1 (long, 12 hours): 20 K; Peripheral 1 (medium, 6 hours): 20 K; Peripheral 2 to 5 (short, 1 to 3 hours): 16 K
- Number of files: 3.8 million in the AliEn file catalogue (no degradation in performance observed), 1.3 million in CERN Castor
- We did not use the LCG Data Management tools, as they were not mature or not deployed when we started Phase 1
- File size: 26 TB in total
- Resources provided by 27 active production centres, 12 of them workhorses
- CPU: 285 MSI-2K hours in total, of which LCG provided 67 MSI-2K hours (24%; could be 50% at top resource exploitation)
- Some exotic centres as proof of concept (e.g. in Pakistan), plus an Itanium farm in Houston

DC monitoring is helpful
(Monitoring view, with sites broken down into AliEn native sites, LCG-2 and GRID.IT.)

DC Monitoring
- Total CPU profile over the first phase of the DC
- Aiming for continuous running, which was not always possible due to resource constraints

June, 23 rd, 2004ARDA Workshop20 Running Job History

Groups Efficiency
Job execution, system and AliRoot efficiency per group (plot)

Phase 2 layout
- Phase 2, about to start: mixing of the underlying events with signal events (jets, muons, J/ψ) & reconstruction
- We plan to fully use the LCG DM tools, but we may have problems of storage availability at the local SEs
(Diagram: the user submits jobs to the AliEn server; output on the LCG CE/SEs and CERN Castor is copied and registered with edg-copy-and-register, and the LCG LFN, in the form lcg://host/, is stored as the AliEn PFN in the catalogue.)

Issues with Phase 2
- Phase 2 will generate lots (~1 M) of rather small (~7 MB) files, and it will start in a few days
- We are testing an AliEn plug-in that uses tar to bunch the small files together (sketched below)
- The space available on some LCG SEs seems very small… we might be able to run for just a few hours before having to switch them off
- Preparation of the LCG-2 JDL is more complicated, due to the use of the data management features
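To illustrate the idea behind the tar plug-in (this is not the actual AliEn plug-in; the 1 GB target size and the file names are made up), many ~7 MB files can be grouped into a few large archives before storage and registration, so the SEs and the catalogue see thousands of files instead of a million:

```python
# Sketch of bunching many small output files into tar archives before registration.
# Not the actual AliEn plug-in; the 1 GB target bundle size is an arbitrary example.
import tarfile
from pathlib import Path

TARGET_BUNDLE_BYTES = 1_000_000_000   # aim for ~1 GB archives instead of ~7 MB files


def bundle(files: list[Path], out_dir: Path) -> list[Path]:
    """Group `files` into tar archives of roughly TARGET_BUNDLE_BYTES each."""
    out_dir.mkdir(parents=True, exist_ok=True)
    archives, current, current_size = [], [], 0

    def flush():
        nonlocal current, current_size
        if not current:
            return
        archive = out_dir / f"bundle_{len(archives):05d}.tar"
        with tarfile.open(archive, "w") as tar:
            for f in current:
                tar.add(f, arcname=f.name)
        archives.append(archive)
        current, current_size = [], 0

    for f in files:
        size = f.stat().st_size
        if current and current_size + size > TARGET_BUNDLE_BYTES:
            flush()
        current.append(f)
        current_size += size
    flush()
    return archives


if __name__ == "__main__":
    outputs = sorted(Path("phase2_output").glob("*.root"))   # hypothetical directory
    for archive in bundle(outputs, Path("phase2_bundles")):
        print("would register", archive, "in the catalogue")
```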

Phase 3 layout
- Phase 3, foreseen in two months: analysis of signal + underlying events, to test the data analysis model of ALICE
- Need to think about merging the output coming from LCG
- ARDA availability would simplify Phase 3!
(Diagram: a user query goes to the AliEn server; the catalogues resolve it into groups of LFNs (lfn 1…8) distributed over AliEn CE/SEs and LCG CE/SEs, where the analysis runs.)

Considerations I
- The AliEn command-line tools are OK for DC running and resource control
- Quick feedback from the CE and WN through AliEn proved essential for early spotting of problems
- Jobs are constantly monitored through a heartbeat
- Failure thresholds are in place to close problematic CEs automatically; the limitation for the LCG interface sites is that this amounts to a global "switching-off" of LCG (a sketch of such a threshold follows)
- Status descriptors for every stage of the job execution are essential for problem spotting and debugging
- Centralized and compact AliEn master services allow for fast updates and the implementation of new features
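The automatic closing of problematic CEs can be illustrated with a small sketch (window size and threshold are arbitrary examples, not AliEn's actual values): each CE keeps the outcomes of its recent jobs, and once the failure rate over the window exceeds the limit it is closed for new jobs. Since the whole of LCG appears as one CE behind the gateway, tripping this threshold switches off LCG globally.

```python
# Illustrative failure-threshold logic for closing a problematic CE automatically.
# Window size and threshold values are arbitrary examples, not AliEn's.
from collections import deque


class CEHealth:
    def __init__(self, name: str, window: int = 50, max_failure_rate: float = 0.5):
        self.name = name
        self.results = deque(maxlen=window)   # True = job OK, False = job failed
        self.max_failure_rate = max_failure_rate
        self.open = True

    def record(self, success: bool) -> None:
        self.results.append(success)
        if len(self.results) == self.results.maxlen:
            failure_rate = self.results.count(False) / len(self.results)
            if failure_rate > self.max_failure_rate:
                self.open = False
                print(f"closing CE {self.name}: failure rate {failure_rate:.0%}")


if __name__ == "__main__":
    # The LCG gateway looks like a single CE, so closing it switches off all of LCG.
    lcg = CEHealth("LCG-gateway", window=10, max_failure_rate=0.5)
    for outcome in [True, False, False, True, False, False, False, True, False, False]:
        lcg.record(outcome)
    print("LCG open for new jobs?", lcg.open)
```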

Considerations II
- LCG-2 provides many cycles, but required continuous effort and interventions (from both ALICE and LCG)
- Instabilities came in particular from the LCG site configurations
- The LCG SE is still very "fluid", so we may expect further instabilities
- Analysis will hopefully be run with ARDA
- There is a big difference between pledged and available resources, and we learned this the hard way!

Considerations III
- The concept of AliEn as a meta-grid works well, across three Grids, and this is a success in itself
- With the "keyhole" approach some things become awkward (e.g. monitoring!)
- We are confident that we will reach the objectives

AliEn + ROOT / PROOF -> ARDA
(Analysis flow diagram: the analysis macro provides the input files; the user's query for input data produces a list of input data with their locations; a new TAliEnAnalysis object groups them into I/O objects per site (e.g. I/O objects 1 and 2 for Site A, I/O object 1 for Site B and Site C) and builds the corresponding job objects; job splitting is followed by job submission and execution; the results are collected by histogram merging and tree chaining. A sketch of this split/merge flow follows.)
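The same split/execute/merge flow can be mimicked in a few lines (hypothetical data structures, not the TAliEnAnalysis API, which is a ROOT/C++ class): the list of input data with locations is split into per-site job groups, each "job" fills a partial histogram, and the partial results are merged at the end.

```python
# Sketch of the split / execute / merge flow of the AliEn+ROOT analysis prototype.
# Data structures are hypothetical stand-ins for the real ROOT objects.
from collections import Counter, defaultdict

# Result of the "query for input data": (LFN, site where a replica lives).
input_data = [
    ("lfn:/alice/sim/evt001.root", "SiteA"),
    ("lfn:/alice/sim/evt002.root", "SiteA"),
    ("lfn:/alice/sim/evt003.root", "SiteB"),
    ("lfn:/alice/sim/evt004.root", "SiteC"),
]


def split_by_site(data, max_files_per_job=2):
    """Group the input list into per-site I/O objects, then into job objects."""
    per_site = defaultdict(list)
    for lfn, site in data:
        per_site[site].append(lfn)
    jobs = []
    for site, lfns in per_site.items():
        for i in range(0, len(lfns), max_files_per_job):
            jobs.append((site, lfns[i:i + max_files_per_job]))
    return jobs


def run_job(site, lfns):
    """Pretend to run the analysis macro on one group of files; return a 'histogram'."""
    hist = Counter()
    for _ in lfns:
        hist["n_events"] += 1          # stand-in for filling real histograms
    return hist


def merge(histograms):
    """Histogram merging step: add up the partial results."""
    total = Counter()
    for h in histograms:
        total.update(h)
    return total


if __name__ == "__main__":
    jobs = split_by_site(input_data)
    partial = [run_job(site, lfns) for site, lfns in jobs]
    print(len(jobs), "jobs; merged result:", dict(merge(partial)))
```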

Considerations IV
- Workload management: there is a limit to the amount of resources that can be served by a single "master queue"/RB, related to the job duration. Several instances, or a two-level hierarchy?
- Data management: minimize data transfers (application data model); the functionality must be available, but the system must prevent misuse!
- Associate the content of an SE with a data-catalogue instance local to the SE, in addition to the global "VO catalogue"? (a sketch of such a two-level lookup follows)
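The proposed two-level catalogue might look like the sketch below (purely illustrative names and layout): the global VO catalogue only records which SEs hold a replica of an LFN, while each SE keeps a local catalogue mapping the LFN to the physical file it holds; a lookup first asks the global catalogue and then the chosen SE.

```python
# Sketch of the proposed two-level lookup: a global VO catalogue plus one
# catalogue local to each SE.  All names and the data layout are illustrative.

class LocalSECatalogue:
    """Knows only the content of its own SE: LFN -> physical path on this SE."""
    def __init__(self, se_name: str):
        self.se_name = se_name
        self._pfns: dict[str, str] = {}

    def register(self, lfn: str, physical_path: str) -> None:
        self._pfns[lfn] = physical_path

    def locate(self, lfn: str) -> str:
        return self._pfns[lfn]


class GlobalVOCatalogue:
    """Knows, for every LFN, only which SEs hold a replica."""
    def __init__(self):
        self._replicas: dict[str, list[LocalSECatalogue]] = {}

    def add_replica(self, lfn: str, se: LocalSECatalogue) -> None:
        self._replicas.setdefault(lfn, []).append(se)

    def resolve(self, lfn: str) -> tuple[str, str]:
        """Pick an SE (here simply the first), then ask its local catalogue for the path."""
        se = self._replicas[lfn][0]
        return se.se_name, se.locate(lfn)


if __name__ == "__main__":
    se_torino = LocalSECatalogue("SE-Torino")
    se_torino.register("lfn:/alice/sim/evt001.root", "/castor/pool1/evt001.root")
    vo = GlobalVOCatalogue()
    vo.add_replica("lfn:/alice/sim/evt001.root", se_torino)
    print(vo.resolve("lfn:/alice/sim/evt001.root"))
```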

Considerations V
- A long-term, high-level requirement for the analysis: "local" Grid services on a laptop?
- Reproduce part of the Grid environment locally: development of "Grid-compliant" algorithms on a disconnected node, with the WM & DM short-circuited
- Development of analysis algorithms with quick iterations, using the same code/tools that will run on the Grid

Considerations VI
- ARDA: multi-user distributed analysis, with or without interactivity
- An AliEn/ROOT prototype is available for batch mode; gLite/PROOF shall make it available in interactive mode
- LCG-2 "as it is" is not suitable for analysis
- LCG-x backward compatibility is welcome for as long as possible, but the evolution to gLite should give priority to the implementation of new functionality for interactive analysis in a web-service-based design

Conclusions
- The DC is progressing with the help of all parties involved
- AliEn is fundamental, both as a "Grid" and as a "Meta-Grid" to LCG
- LCG support & performance is good; LCG-2 shows some instabilities, but we are really pushing it!
- Heavy testing of the LCG SE may bring more surprises, as large parts of it are new
- LCG-2 really looks and feels like a first-generation Grid: very good for getting experience, but a fresh start is needed to build a production-grade system
- We hope to test the first ARDA prototype in late summer/autumn


ALICE Physics Data Challenges

Period (milestone) | Fraction of the final capacity (%) | Physics objective
06/01-12/01 | 1% | pp studies, reconstruction of TPC and ITS
06/02-12/02 | 5% | First test of the complete chain from simulation to reconstruction for the PPR; simple analysis tools; digits in ROOT format
01/04-06/04 | 10% | Complete chain used for trigger studies; prototype of the analysis tools; comparison with parameterised MonteCarlo; simulated raw data
05/05-07/05 (NEW) | TBD | Refinement of jet studies; test of new infrastructure and MW; TBD
01/06-06/06 | 20% | Test of the final system for reconstruction and analysis

Why a new DC?
- We cannot stay 18 months without testing our "production capabilities"
- In particular we have to maintain the readiness of: the code (AliRoot + MW), the ALICE distributed computing facilities, the LCG infrastructure, and the human "production machinery"
- Getting all the partners into "production mode" was a non-negligible effort
- We have to plan the size and physics objectives of this data challenge carefully

ALICE Offline Timeline
(Timeline: PDC04 and the PDC04 analysis; design and development of new components; pre-challenge '06 and PDC06 preparation; PDC06; final development of AliRoot; first data-taking preparation; milestones for "PDC04 AliRoot ready" and "PDC06 AliRoot ready".)

Storage problems
- During most of last year, 30 TB of staging at CERN were in the plans for ALICE: this is what we requested
- When we started the DC we only had 2-3 TB
- LCG has been very efficient in finding additional disks
- However, we had to "throttle" production, because otherwise we would have stopped anyway
- And we had to be very "creative" in swapping disk servers around