1 Update at RAL and in the Quattor community Ian Collier - RAL Tier1 HEPiX Fall 2010, Cornell

2 Quattor Update - Contents
Quattor at RAL: Status; Plans
Quattor Community Developments: 10th Quattor Workshop; Process & Involvement

3 Quattor at RAL - Status: Clusters
In May 2009, after examining options and alternatives, the decision was made to adopt Quattor as the Fabric Management System for the RAL Tier1 - a challenge to apply to a running Tier1 without downtime.
In September 2009 we deployed our new SL5 batch service, along with a new batch server managed by Quattor.
Early this year we deployed the first SL4 disk servers managed by Quattor, and later all new 64-bit SL5 disk servers.
At the same time we have been migrating all of our grid services and core services to Quattor management, and finally we are working on the Castor servers.
So we achieved rather a lot, and the sky did not fall. The Tier1 is still running, we learned a lot, and we ended up attending many workshops and meetings - even hosting a workshop, which was not part of the original plan!

4 Quattor at RAL - Status: Clusters
Switched to the SL5 batch farm just 14 months ago; now fully committed to Quattor and reaping the benefits.
Closed the SL4 service in August and reinstalled the remainder, so all 700 or so worker nodes are now managed by Quattor, saving roughly 1/3-1/2 FTE on batch farm management.
Since the start of the year all new disk servers have been tested and then deployed with Quattor.
About to migrate all older disk servers to Quattor - we need to upgrade to the latest 64-bit Castor, which is much easier this way.
Very carefully testing that we really can preserve data, even for some of the older RAID cards etc.

5 Quattor at RAL - Status: Core servers
Using the Quattor Working Group (QWG) framework for all supported grid service nodes: BDIIs, CREAM CEs, VO boxes, etc.
RAL is the first Tier 1 using QWG - our use cases are sometimes a bit different, which has taken a bit of work...
For machine types not yet supported by QWG (LFC, FTS...), we configure the OS and standard site parts with Quattor and then use YAIM for the gLite parts.
Contributing back to QWG and seeing really significant benefits in the shared effort.
Core non-grid systems are also managed with Quattor: the Quattor server itself, plus repository servers, our helpdesk, squids, Nagios servers, etc.
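To give a flavour of what this looks like (a minimal, illustrative sketch only - the template paths, variable names and hostnames below are invented for the example, not RAL's actual profiles), a QWG-managed grid service node is described by a short Pan object template that pulls in the shared machine-type configuration and overrides a few site-specific values:

    # Illustrative Pan object profile for a QWG-managed service node
    object template profile_grid-bdii01;

    # Site-wide variables, normally defined once in shared site templates
    variable SITE_NAME ?= "EXAMPLE-SITE";

    # Pull in the community-maintained machine-type template for this service
    include { "machine-types/bdii" };

    # Node-specific overrides on top of the shared configuration
    "/system/network/hostname" = "grid-bdii01";
    "/system/network/domainname" = "example.org";

Most of the site-specific work then lives in a handful of shared site templates rather than in per-node profiles, which is where the shared-effort benefit comes from.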

6 Quattor at RAL - Status: Castor
Not yet using any of the configuration components developed at CERN.
The Castor payload is already delivered on disk servers by Quattor.
Configured new Castor tape servers, which went into production in the summer.
Currently testing a new Castor instance for non-LHC data at RAL: installing the OS, all software including Castor, and the standard site configuration with Quattor, then using existing Puppet manifests to configure the Castor services - very similar to the Quattor+YAIM approach used for unsupported gLite machine types.
Investigating migration to full Quattor management next year. Some very positive discussions with the Castor team at CERN about both sites converging on a compatible set of configurations.
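As a sketch of this split between payload and configuration (again illustrative - the machine-type template, package names and versions are placeholders, not the real RAL profiles), the Pan profile for such a server simply declares the Castor RPMs and the Puppet client alongside the standard site configuration, while the Castor service configuration files themselves are left to the existing Puppet manifests:

    object template profile_castor-disk01;

    # Standard OS and site configuration via an illustrative local machine type
    include { "machine-types/castor-diskserver" };

    # Deliver the Castor payload as packages; names and versions are placeholders
    "/software/packages" = pkg_repl("castor-rfio-server", "2.1.9-1", "x86_64");
    "/software/packages" = pkg_repl("castor-gridftp-dsi", "2.1.9-1", "x86_64");

    # Puppet client installed by Quattor; Castor service configuration is then
    # applied from the existing Puppet manifests rather than Quattor components
    "/software/packages" = pkg_repl("puppet", "0.25.5-1", "noarch");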

7 Quattor at RAL - Plans
Upgrade the remaining disk servers.
Continue support for the Castor hybrid approach, using the existing Puppet for configuration files.
Deploy Sindes to secure communication and allow distribution of sensitive information.
Improve the inventory database that we use to ‘feed’ Quattor.
Still planning to trial Aquilon, an SQL-based configuration database.
Begin support for RHEL to allow management of database servers.

8 Quattor Community - 10th Workshop
A very successful workshop hosted at RAL in October.
There was a lot of discussion in (almost) all sessions - a true workshop, not just a series of presentations and site reports (in fact, for the first time we had no site reports).
A strong sense of a community gathering to work together on things.
It is useful to think about what allowed this (more on this later).

9 Quattor Community - 10th Workshop
Last year I reported on the bid for 7th Framework Programme funding (QUEST). We did not get the money :(
However, the process of preparing the bid concentrated our minds: we built a detailed view of what we would like to do to improve Quattor - and how we might get it done - and we developed a much more involved team.
This is perhaps one factor (the main one?) behind the very active and interactive workshop.

10 Quattor Community - 10th Workshop: Some Highlights/Actions
RAL and CERN to collaborate on improving Sindes - especially its usability.
Considerable work on building Nagios configurations inside the QWG framework.
Work to improve the main websites and establish better outreach - for example, it would be nice if a search for configuration management tools turned up something about Quattor.
Interesting use of Quattor in the StratusLab project - see Michel’s talk.
Oh, and Nikhef are experimenting with QWG - so RAL is no longer the only Tier1 using the framework.