Quattor installation and use feedback from CNAF/T1
LCG Operation Workshop, 25 May 2005
Andrea Chierici – INFN CNAF

Introduction
Location: INFN-CNAF, Bologna (Italy)
Computing facility for the INFN HNEP community
  – Participating in the LCG, EGEE and INFNGRID projects
Multi-experiment Tier1
  – LHC experiments
  – VIRGO
  – CDF
  – BABAR
  – AMS, MAGIC, ARGO, ...

Computing facility
800 Linux boxes with different hardware
  – CPUs: PIII, Xeon, Opteron
  – HDs: SCSI, EIDE, SATA (connected in many ways)
  – OSs: Red Hat 7.3 (phasing out), SLC
Different targets
  – LCG farm
  – BaBar and CDF own farms
      – Special requirements

Looking for an installation tool
All this heterogeneity requires a robust installation tool
  – The DataGrid collaboration produced lcfg as an interim solution
  – Quattor is the final product of the WP4 effort
      – Took the basic ideas from lcfg
      – Improved architecture
      – Reliable
      – Supported mainly by CERN and, to some extent, by other sites (LAL, NIKHEF, CNAF, …)

Quattor within
Since release 1.0.0, this is the one and only tool used to install new nodes
  – Different “layers” of installation are used:
      – Vanilla for general-purpose nodes (e.g. BaBar nodes)
      – Quattor-gdb for LCG machines (LCG CE, WN, UI, etc.)
Soon: support for different hardware architectures (i386, x86_64)
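
The idea of installation "layers" can be illustrated with a minimal Python sketch, in which an LCG-specific layer is merged on top of a general-purpose one. This is not how Quattor or pan templates are implemented; the function `merge_layers`, the profile dictionaries and every package and service name below are hypothetical and only serve to show the layering principle.

```python
from copy import deepcopy
from typing import Any, Dict


def merge_layers(base: Dict[str, Any], overlay: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new profile: `base` with `overlay` applied on top.

    Nested dictionaries are merged recursively; any other value in the
    overlay replaces the value from the base layer.
    """
    result = deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_layers(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result


# Hypothetical "vanilla" layer shared by all general-purpose nodes.
VANILLA = {
    "packages": {"openssh": "installed", "ntp": "installed"},
    "services": {"ntpd": "on"},
}

# Hypothetical extra layer for an LCG worker node.
LCG_WN = {
    "packages": {"lcg-wn-metapackage": "installed"},
    "services": {"ntpd": "on", "batch-client": "on"},
}

if __name__ == "__main__":
    profile = merge_layers(VANILLA, LCG_WN)
    for section, entries in profile.items():
        print(section, sorted(entries))
```

A general-purpose node would use `VANILLA` alone, while a grid node would use the merged profile, so changes to the common layer reach both kinds of node.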

Components used
grub, nfs, ldconf, accounts, authconfig, afs, ntp, chkconfig, altlogrotate, globuscfg, cmnconfig, rm, dirperm, filecopy, profile, edglcg, rgmaclient, gridmapdir

Advantages
After a slow start phase, now fully in production
Easily portable to different OS versions (rpm based)
All nodes are installed with kickstart, so there are no hardware support problems (compared to lcfg)
Idempotent node configuration
  – We are sure the configuration is exactly the one we want (compared to lcfg)
Very robust and scalable (by its architecture), based on standard protocols and programming languages (http, tftp, dhcp, xml, perl)
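
What "idempotent" means for a configuration step can be shown with a small Python sketch: running the step any number of times leaves the node in the same state as running it once. Quattor components are written in Perl, so this is only an illustration of the property, and the helper `ensure_line`, the file name and the server name are assumptions made for the example.

```python
import os
import tempfile


def ensure_line(path: str, line: str) -> bool:
    """Make sure `line` is present in the file at `path`.

    Returns True if the file had to be changed and False if it was
    already in the desired state, so repeated runs are harmless.
    """
    content = ""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            content = fh.read()
    if line in content.splitlines():
        return False  # desired state already reached: do nothing
    with open(path, "a", encoding="utf-8") as fh:
        if content and not content.endswith("\n"):
            fh.write("\n")  # keep the new entry on its own line
        fh.write(line + "\n")
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conf = os.path.join(tmp, "ntp.conf")  # hypothetical config file
        for attempt in (1, 2, 3):
            changed = ensure_line(conf, "server ntp.example.org")
            print(f"run {attempt}: changed={changed}")
```

Only the first run modifies the file; later runs detect that the desired state is already in place and leave it untouched.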

What is missing
Kickstart generation is incomplete
  – Partitioning is hard-coded
  – Impossible to specify an ad hoc post-installation script
Organization of templates in subdirectories
  – Hundreds of templates to manage
  – Cal has done some work on this (subversion based), but it is not included in the standard release
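
To make the two kickstart gaps concrete, here is a minimal Python sketch of what a parameterized generator could look like: the partition layout and the post-installation script are passed in per node instead of being fixed. It is not Quattor's kickstart generator; the `Partition` class and both functions are hypothetical. Only the `clearpart`, `part` and `%post` keywords are standard kickstart syntax.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Partition:
    mountpoint: str                 # e.g. "/", "/boot" or "swap"
    size_mb: int                    # minimum size in megabytes
    fstype: Optional[str] = None    # left unset for swap
    grow: bool = False              # let the partition fill free space


def render_partitioning(partitions: List[Partition]) -> str:
    """Render kickstart partitioning directives from a per-node layout."""
    lines = ["clearpart --all --initlabel"]
    for p in partitions:
        fields = ["part", p.mountpoint]
        if p.fstype:
            fields.append(f"--fstype={p.fstype}")
        fields.append(f"--size={p.size_mb}")
        if p.grow:
            fields.append("--grow")
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"


def render_post(script: str) -> str:
    """Wrap a node-specific shell script in a %post section.

    Newer Anaconda releases also expect a closing %end line, which is
    left out of this sketch.
    """
    return "%post\n" + script.rstrip("\n") + "\n"


if __name__ == "__main__":
    layout = [
        Partition("/boot", 100, "ext3"),
        Partition("swap", 1024),
        Partition("/", 1, "ext3", grow=True),
    ]
    print(render_partitioning(layout) + render_post("echo 'node installed'"))
```

With a generator of this shape, a disk server and a worker node could share the same template and differ only in the layout and script they pass in.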

Problems (1)
Best-effort development and support
Released version is outdated: several improvements have been made and a new tag is required
Very slow to compile many templates
  – More than 2 hours to upgrade the 600 nodes of the LCG farm now managed (even to add a single rpm)
      – Already fixed (now it takes just a few minutes)
Quite hard to “grasp”
  – Documentation is clear but not sufficient
  – Development of new components requires OO Perl

Problems (2)
LCG does not support Quattor directly
For LCG deployment we depend on Cal
  – If he wins the lottery, right now no one is able to take over his job
  – yaim is the only tool “blessed” by LCG
      – CERN uses yaim on top of Quattor
      – We did not like the mixed solution and decided to use Cal’s release (it is always possible to change)
  – Experiencing some problems with VOMS and GridICE configuration

Manpower required
Currently 1 FTE to manage all the installations
The starting phase is quite hard
  – Learn a little pan
  – Understand the structure of the templates
Once in production, less effort may be needed, depending on the updates required

Conclusions
Quattor rocks
The TOOL used on our farms
  – Immediately integrated into our pre-existing installation infrastructure (based on lcfg)
  – Can be implemented with just 1 server
  – Suggested when the number of nodes exceeds 50 or when dealing with different farms
Some concerns about the future
  – LCG
  – gLite
  – Generic support is “CERN-dependent”