ALICE and LCG Stefano Bagnasco I.N.F.N. Torino

ALICE and LCG
Stefano Bagnasco, I.N.F.N. Torino
Tier-2 Referee Meeting, Torino, November 28, 2005
www.eu-egee.org
EGEE is a project funded by the European Union under contract IST-2003-508833

What do we need to do?
- Configure, submit and track jobs: user interface with massive production support, job DB (production and user), job monitoring
- Install software on sites: package managers
- Distribute and execute (possibly interactive) jobs: Workload Management System (broker, L&B, ...), Computing Element software, Information Services, interactive analysis jobs
- Store and catalogue data: data catalogues (file, replica, metadata, local, ...), Storage Element software
- Move data around: file transfer services and schedulers
- Access data files: I/O services, prestaging services
- Monitor all that stuff: monitoring infrastructure, sensors, presentation
...and on top of that: enforce security!

The purveyors

AliEn
- "ALIce ENvironment": started as early as 2000, using as much existing, open-source, standard technology as possible (<5% "native" code)
- Core services: Task Queue (pull model, see the sketch below), Data & Metadata Catalogue, Job & Transfer optimizers, Monitoring (MonALISA from Caltech)
- Stand-alone site services: CE, SE, FTD, network proxy (ClusterMonitor), monitoring producers, PackMan, ...
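
To make the pull model behind the Task Queue concrete, here is a minimal Python sketch of the idea; it is an illustration only, not the actual AliEn implementation (a database-backed service), and the requirement strings are invented for the example.

    from dataclasses import dataclass, field

    @dataclass
    class Job:
        job_id: int
        command: list                                      # executable + arguments
        requirements: set = field(default_factory=set)     # e.g. {"AliRoot::v4-03"}

    class TaskQueue:
        """Central queue of waiting jobs: sites *pull* work they are able to run."""
        def __init__(self):
            self.waiting = []

        def submit(self, job):
            self.waiting.append(job)

        def request_job(self, site_capabilities):
            """Return the first waiting job whose requirements the site satisfies."""
            for job in self.waiting:
                if job.requirements <= set(site_capabilities):
                    self.waiting.remove(job)
                    return job
            return None             # nothing matches: the site stays idle, no job fails

    # Example: a site offering AliRoot v4-03 asks for work
    tq = TaskQueue()
    tq.submit(Job(1, ["aliroot", "-b", "-q", "sim.C"], {"AliRoot::v4-03"}))
    print(tq.request_job({"AliRoot::v4-03", "se:Torino"}).job_id)    # -> 1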

Towards integration
- Several Grid infrastructures are available: LCG, INFNGRID, possibly others, maybe in the U.S.
- Lots of resources but, in principle, different middleware stacks
- The pull model is well suited for implementing higher-level submission systems, since it does not require knowledge about the periphery, which may be very complex:
"A Grid is a system that [...] coordinates resources that are not subject to centralized control [...] using standard, open, general-purpose protocols and interfaces [...] to deliver nontrivial qualities of service."
I. Foster, "What is the Grid? A Three Point Checklist", Grid Today (2002)

Boundary conditions
- Many tools are not yet industrial-strength, or do not exist at all
- ALICE will use a single catalogue, not one per Grid: lots of data are already registered there
- LCG requires access through the "official" common software stack (WMS, monitoring)
- AliEn/gLite integration: a sociologist's dream, but a physicist's nightmare
- Some services are interoperable, some are a bad case of a square peg in a round hole

The first version – "Interface sites"
The philosophy: a whole Grid (namely, a Resource Broker) is seen by the central server as a single AliEn CE, and the whole Grid storage is seen as a single, large AliEn SE.
[Diagram: an interface machine (grid012.to.infn.it) runs the AliEn CE and SE alongside an LCG UI; jobs are submitted through the LCG RB to the EDG CEs of the LCG sites and report status back, while data are registered both in the AliEn Data Catalogue and in the LCG Replica Catalogue, with the files held on the LCG SEs.]
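
As a rough illustration of how such an interface could be wired up (not the actual AliEn code), the sketch below wraps LCG submission and status queries; edg-job-submit and edg-job-status are the standard EDG/LCG-2 user commands of the time, while the AliEn-side pieces are left as hypothetical comments.

    import subprocess
    import tempfile

    def submit_to_rb(jdl_text):
        """Write an LCG JDL file and submit it through the Resource Broker.
        Returns the job identifier printed by edg-job-submit."""
        with tempfile.NamedTemporaryFile("w", suffix=".jdl", delete=False) as f:
            f.write(jdl_text)
            jdl_path = f.name
        out = subprocess.run(["edg-job-submit", jdl_path],
                             capture_output=True, text=True, check=True)
        # The RB prints the job id as an https://... line in its output
        return [l for l in out.stdout.splitlines() if l.startswith("https://")][-1]

    def rb_status(lcg_job_id):
        """Query the Logging & Bookkeeping service for the job status."""
        out = subprocess.run(["edg-job-status", lcg_job_id],
                             capture_output=True, text=True, check=True)
        return out.stdout

    # Interface CE main loop (the AliEn side is sketched with a hypothetical client):
    # while True:
    #     job = alien_task_queue.fetch_job()            # pull the next waiting job
    #     rb_id = submit_to_rb(make_wrapper_jdl(job))   # wrap it as an LCG job
    #     alien_task_queue.report(job.job_id, "SUBMITTED", rb_id)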

PDC 2004
- Phase 1: production of RAW and shipment to CERN; large output files (up to 1 GB/event in ~25 files)
- 1a: central events (long jobs, large files)
- 1b: peripheral events (short jobs, smaller files)
- Alice::Torino::LCG: interface to INFNGRID
- Alice::CERN::LCG: interface to LCG-2

Lessons learned
- LCG Workload Management proved itself slow but reasonably robust: we never lost a job because of RB failures
- The remote-site configuration was the major source of problems on the LCG side, and it got worse with the use of LCG storage
- Software management tools were (and still are) rudimentary
- Tier-1s often have tighter security restrictions and other idiosyncrasies; investigating and fixing problems is hard and time-consuming
- The most difficult part of the management is monitoring LCG through a "keyhole": only integrated information is available natively (MonALISA for AliEn, GridICE for LCG)
- Some safety mechanisms are too coarse for this approach (queue blocking): anti-blackhole mechanisms are needed, based on run-time statistics such as the CPUTime/WallClockTime ratio (see the sketch below), or the problem can be removed altogether by using Job Agents
- For shorter jobs, submission time (and thus the performance of the interface system) can limit the number of jobs, but the system is trivially scalable
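
As an illustration of the anti-blackhole idea mentioned above, the sketch below flags worker nodes whose jobs keep "finishing" with a negligible CPUTime/WallClockTime ratio; the threshold and the bookkeeping are assumptions for the example, not the values actually used in production.

    def cpu_efficiency(cpu_time_s, wall_time_s):
        """CPUTime / WallClockTime ratio of a finished job (0.0 if wall time is zero)."""
        return cpu_time_s / wall_time_s if wall_time_s > 0 else 0.0

    def flag_black_holes(job_records, min_eff=0.05, min_jobs=5):
        """Return worker nodes whose recent jobs all finish with a negligible
        CPU/wall-clock ratio, a typical black-hole signature.
        job_records: list of (worker_node, cpu_time_s, wall_time_s) tuples."""
        per_node = {}
        for node, cpu, wall in job_records:
            per_node.setdefault(node, []).append(cpu_efficiency(cpu, wall))
        return [node for node, effs in per_node.items()
                if len(effs) >= min_jobs and max(effs) < min_eff]

    # Example: wn17 "completed" six jobs with almost no CPU time -> candidate black hole
    records = [("wn17", 1.0, 300.0)] * 6 + [("wn03", 2500.0, 3600.0)] * 3
    print(flag_black_holes(records))    # -> ['wn17']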

Recent developments
- JobAgents (following the LHCb example): create a safe playpen before starting the real job (see the sketch below)
- Distributed data catalogue: the core catalogue is a StorageIndex (to be compatible with the RB), while local catalogues may have different flavours (including LFC)
- Tighter integration with ROOT: xrootd replaces aiod and gLite I/O (and will run on top of SRM/DPM/...), PROOF for interactive analysis
- Better security: Apache service containers, all communications secure (still need some logging), LCG-compatible user authentication
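
A rough Python sketch of the JobAgent cycle described above (the Task Queue client and the job object are hypothetical): the agent builds an isolated playpen, only then asks the central Task Queue for work matching the node, runs it there, and cleans up afterwards.

    import shutil
    import subprocess
    import tempfile

    def run_job_agent(task_queue, site_capabilities):
        """One JobAgent cycle: build a safe playpen, pull a matching job, run it.
        task_queue is a hypothetical client of the central Task Queue."""
        playpen = tempfile.mkdtemp(prefix="jobagent-")        # isolated scratch area
        try:
            job = task_queue.request_job(site_capabilities)   # pull model: ask for work
            if job is None:
                return                                        # nothing matches, exit quietly
            for lfn in job.input_files:
                task_queue.stage_in(lfn, playpen)             # fetch inputs into the playpen
            result = subprocess.run(job.command, cwd=playpen,
                                    capture_output=True, text=True)
            task_queue.report(job.job_id,
                              "DONE" if result.returncode == 0 else "ERROR",
                              log=result.stdout + result.stderr)
        finally:
            shutil.rmtree(playpen, ignore_errors=True)        # leave the worker node clean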

The VO-Box
- New concept: VO-specific services, also called "edge services"
- "Thin" and "virtual" services: minimise the impact on site operations, minimise duplication of functionality and code
- Fast-changing service in a production environment: "extreme programming" paradigm (release early, release often), tune the services in the production environment
- Don't waste CPUs (and manpower): merge Service Challenge and Data Challenge efforts

The VO-Box (cont'd)
- Interface services: "CE", interface between the TQ and the RB; "SE", interface to SRM (via xrootd) and LFC; "FTD", interface to FTS
- ...and some added functionality: ClusterMonitor, to proxy communications between the WNs and the core services (reducing the need for outbound connectivity, sketched below); PackMan, for software management (no common replacement currently exists); agent monitoring
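
The ClusterMonitor proxying idea can be illustrated with a generic forwarding service (a sketch of the concept only, not the real ClusterMonitor protocol): worker nodes talk to the VO-Box, which is the only machine that needs outbound connectivity to the central services. The central endpoint and the port are invented for the example.

    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib import request

    CENTRAL_SERVICES = "http://alien.cern.ch:8080"    # hypothetical central endpoint

    class ForwardingHandler(BaseHTTPRequestHandler):
        """Relay worker-node reports to the central services and pass back the reply."""
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            payload = self.rfile.read(length)
            upstream = request.urlopen(CENTRAL_SERVICES + self.path, data=payload)
            body = upstream.read()
            self.send_response(upstream.status)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        # Worker nodes send their reports to the VO-Box (port invented for the example)
        # instead of contacting the central services directly.
        HTTPServer(("", 8084), ForwardingHandler).serve_forever()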

Current version: site VO-Box
The philosophy: a set of interface services is run on a dedicated machine (the VO-Box) at each site, (mostly) providing common interfaces to the underlying site services.
[Diagram: the central TQ submits through the LCG RB to the EDG CE of each LCG site; the VO-Box hosts the interface CE and SE, PackMan and the configuration service; JobAgents on the WNs request their configuration from the VO-Box; LFNs are registered in the AliEn File Catalogue and PFNs in the local LFC, with files stored on the LCG SE.]

Data Access model
[Diagram: each LCG site runs a VO-Box, a CE and an SRM-based SE (e.g. DPM) fronted by xrootd; JobAgents on the WNs access data through xrootd, inter-site transfers are handled by the FTS server over GridFTP, and files are resolved through the AliEn File Catalogue with local PFNs kept in LFC.]
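
A minimal sketch of the data path implied by the picture above (the catalogue client is hypothetical; xrdcp is the standard xrootd copy tool): the JobAgent resolves an LFN to a physical replica and moves data through xrootd.

    import subprocess

    def fetch_input(catalogue, lfn, dest_dir):
        """Resolve an LFN to an xrootd URL via the (hypothetical) catalogue client
        and copy it into the job's working directory with xrdcp."""
        pfn = catalogue.lookup(lfn)      # e.g. "root://se01.to.infn.it//alice/sim/..."
        subprocess.run(["xrdcp", pfn, dest_dir + "/"], check=True)

    def register_output(catalogue, local_path, lfn, se_url):
        """Write a produced file to the site SE through xrootd and register both the
        LFN (central catalogue) and the PFN (local catalogue)."""
        pfn = se_url.rstrip("/") + "/" + lfn.lstrip("/")
        subprocess.run(["xrdcp", local_path, pfn], check=True)
        catalogue.register(lfn, pfn)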

What do we do?
- Configure, submit and track jobs: user interface with massive production support, job DB (production and user), job monitoring
- Install software on sites: package managers
- Distribute and execute jobs: Workload Management System (broker, L&B, ...), Computing Element software, Information Services, interactive analysis jobs
- Store and catalogue data: data catalogues (file, replica, metadata, local, ...), Storage Element software
- Move data around: file transfer services and schedulers
- Access data files: I/O services, prestaging services
- Monitor all that stuff: monitoring infrastructure, sensors, presentation
...and on top of that: enforce security!
(Overlaid on the slide, each area is labelled COMMON, MIXED or "?" according to who provides it, with MonALISA covering monitoring and PROOF the interactive analysis.)

LCG as seen from ALICE

PDC2005 (and LCG SC3)
- Better monitoring: can gather info from single sites without any special trick
- Better use of local site services, at the expense of some abstraction
- Locally optimised JobAgent submission prevents JobAgent floods
- But things can still be improved...

Future developments
- Submission service: it was a geographic service and is now local, but it can be "virtualized"; or would it be better to have a broker "plugin" pulling directly from the TQ?
- Interactive analysis: needs very efficient storage access, and much work is still to be done (ROOT → PROOF → xrootd → SRM → DPM), while minimizing the number of VO-specific services
- Very fast development in a nearly-production environment: Torino is currently the "development" site for the VO-Box and site services
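
As a hint of what the analysis end of the ROOT → PROOF → xrootd → SRM → DPM chain looks like from the user side, here is a minimal sketch assuming PyROOT and an xrootd-enabled ROOT build; the SE host, file path and tree name are illustrative, not real catalogue entries.

    import ROOT  # PyROOT; assumes a ROOT build with xrootd support

    # Open a catalogued file directly through the xrootd door of the site SE.
    # TFile.Open understands root:// URLs natively; the URL below is illustrative.
    f = ROOT.TFile.Open("root://se01.to.infn.it//alice/sim/2005/run123/AliESDs.root")
    tree = f.Get("esdTree")          # hypothetical tree name
    print("entries:", tree.GetEntries())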