Martin Bly RAL Tier1/A RAL Tier1/A Site Report HEPiX-HEPNT Vancouver, October 2003

Martin Bly RAL Tier1/A Contents
–GRID stuff: clusters and interfaces
–Hardware and utilisation
–Software and utilities

Martin Bly RAL Tier1/A Layout

Martin Bly RAL Tier1/A EDG Status
EDG 2.0.x deployed on the production test-bed since early September. Provides:
–EDG R-GMA info catalogue
–RLS for lhcb, biom, eo, wpsix, tutor and BaBar
EDG 2.1 deployed on the development test-bed; VOMS integration work is underway. May prove useful for small GridPP experiments (e.g. NA48, MICE and MINOS).
The EDG 2.0 gatekeeper provides a gateway into the main CSF production farm, giving access for some BaBar and ATLAS work; it is being prepared for forthcoming D0 production via SAMGrid.
Along with IN2P3, CSFUI provides the main UI for EDG.
Many WP3 and WP5 mini test-beds.
Further Grid integration into the production farm will come via LCG, not EDG.

Martin Bly RAL Tier1/A LCG Integration
LCG-0 mini test-bed deployed in March; LCG-1 test-bed deployed in July; LCG-1 upgraded to LCG1-1_0_1 in August/September. Consists of:
–Lcgwest regional GIIS
–RB, CE, SE, UI, BDII, PROXY, 5 * WN (WN = 2 * 1 GHz / 1 GB RAM; SE = 540 GB)
Soon need to make important decisions about how much hardware to deploy into LCG, driven by what the Experiment Board wants. Issues:
–Installation and configuration are still difficult for non-experts.
–Documentation is still thin in many places.
–Support is often very helpful, but answers are not always forthcoming for some problems.
–Not everything works all of the time.
Beginning to discuss internally how to interoperate with the production farm. A sketch of querying the LCG information system is shown below.
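The slide does not go into how the LCG-1 components are queried; purely as an illustration, here is a minimal sketch of asking a BDII for the Compute Elements it publishes. The hostname is hypothetical, and port 2170, the "o=grid" base and the GLUE attribute names follow the usual LCG conventions rather than anything stated in this talk.

```python
# Minimal sketch: query an LCG BDII over LDAP and list published Compute Elements.
# The hostname is hypothetical; port 2170, base "o=grid" and the GlueCE attributes
# follow the usual LCG conventions and are assumptions, not details from this talk.
import ldap

BDII_URL = "ldap://lcgbdii.example.ac.uk:2170"   # hypothetical host
BASE_DN = "o=grid"

def list_compute_elements():
    conn = ldap.initialize(BDII_URL)
    conn.simple_bind_s()  # anonymous bind is the norm for a BDII
    results = conn.search_s(BASE_DN, ldap.SCOPE_SUBTREE,
                            "(objectClass=GlueCE)",
                            ["GlueCEUniqueID", "GlueCEInfoTotalCPUs"])
    for dn, attrs in results:
        ce_id = attrs.get("GlueCEUniqueID", [b"?"])[0].decode()
        cpus = attrs.get("GlueCEInfoTotalCPUs", [b"?"])[0].decode()
        print(ce_id, cpus)
    conn.unbind_s()

if __name__ == "__main__":
    list_compute_elements()
```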

Martin Bly RAL Tier1/A SRB Service for CMS
SDSC Storage Resource Broker (SRB): MCAT for the whole of CMS production. Consists of enterprise-class Oracle servers and a "thin" MCAT Oracle client.
–SRB interface into the Datastore.
–SRB-enabled disk server to handle data imports.
–SRB clients on disk servers for data moving.
Needed some work to deploy; very good support from the developers. The ADS interface has been integrated into the main SDSC SRB source. A considerable learning experience for the Datastore team (and CMS)! A sketch of the client-side data moving is shown below.
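SRB is normally driven through the Scommands client tools; as a rough sketch only, the kind of wrapper a disk server might use for data moving could look like the following. The collection and file paths are hypothetical, and the sketch assumes an SRB session environment (~/.srb/.MdasEnv) is already configured for the MCAT.

```python
# Rough sketch of driving SRB data moves via the Scommands client tools,
# as a disk server might for imports.  The collection and file names are
# hypothetical; Sinit/Scd/Sput/Sls/Sexit are the standard SRB client
# commands and assume ~/.srb/.MdasEnv already points at the MCAT.
import subprocess

def srb(*args):
    """Run one Scommand and raise if it fails."""
    subprocess.run(list(args), check=True)

def import_file(local_path, srb_collection):
    srb("Sinit")                          # start an SRB session
    try:
        srb("Scd", srb_collection)        # move to the target collection
        srb("Sput", local_path)           # copy the local file into SRB
        srb("Sls")                        # list the collection to confirm
    finally:
        srb("Sexit")                      # always close the session

if __name__ == "__main__":
    import_file("/data/import/run123.dat", "/home/cms/imports")   # hypothetical paths
```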

Martin Bly RAL Tier1/A P4 Xeon Experiences
Disappointing performance with gcc: hoped for a 2.66 GHz P4 : 1.4 GHz PIII ratio of 1.5.
Can obtain more by exploiting hyper-threading, but Linux CPU scheduling causes difficulties (ping-pong effects).
Performance is better with the Intel compiler. Efforts to run the O(1) scheduler were unsuccessful. CPU accounting now depends on the number of jobs running. Beginning to look closely at Opteron solutions. A sketch of detecting hyper-threading on a node is shown below.
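Because hyper-threading makes a dual-CPU node look like four CPUs, scheduling and CPU accounting need to distinguish physical from logical processors. A minimal sketch of doing that from /proc/cpuinfo (using the field names exposed by HT-aware Linux kernels) is:

```python
# Sketch: count logical vs physical CPUs on a hyper-threaded node by parsing
# /proc/cpuinfo.  "physical id" appears on HT-aware kernels; on kernels without
# the field every logical CPU is simply counted as physical.
def cpu_counts(path="/proc/cpuinfo"):
    logical = 0
    physical_ids = set()
    with open(path) as f:
        for line in f:
            if line.startswith("processor"):
                logical += 1
            elif line.startswith("physical id"):
                physical_ids.add(line.split(":", 1)[1].strip())
    physical = len(physical_ids) or logical
    return logical, physical

if __name__ == "__main__":
    logical, physical = cpu_counts()
    print("logical CPUs: %d, physical CPUs: %d" % (logical, physical))
    if logical > physical:
        print("hyper-threading enabled: scale per-job CPU accounting accordingly")
```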

Martin Bly RAL Tier1/A Datastore Upgrade
STK 9310 robot, 6000 slots:
–IBM 3590 drives being phased out (10 GB, 10 MB/s)
–STK 9940B drives in production (200 GB, 30 MB/s)
4 IBM 610+ servers with two FC connections and Gbit networking on PCI-X:
–9940 drives FC-connected via 2 switches for redundancy
–SCSI RAID 5 disk with hot spare for 1.2 TB of cache space

Martin Bly RAL Tier1/A Switch_1Switch_2 RS6000 fsc0fsc1 fsc0 9940B fsc1fsc0fsc1fsc rmt1 rmt4rmt3rmt2 rmt5-8 AAAAAAAA STK 9310 “Powder Horn” Gbit network 1.2TB

Martin Bly RAL Tier1/A Operating Systems
Red Hat 6.2 closed at the end of August (BaBar build-box).
Red Hat 7.2:
–BaBar 7.2 service migrated to Red Hat 7.3 during October.
–Residual `bulk' batch service closing soon.
–Three front-ends for BaBar.
Red Hat 7.3:
–Now the main workhorse for LHC experiments and BaBar batch work.
–`Bulk' service opening soon.
–Three front-ends.
–LCG-1.
Need to start looking at what to do next (Fedora, Debian, RH ES/AS, …). Need to deploy Red Hat Advanced Server.

Martin Bly RAL Tier1/A Next Procurement
Based on the experiments' expected demand profiles (as best they can estimate them). Exact numbers are still being finalised, but approximately:
–250 dual-processor CPU nodes
–70 TB of available disk
–100 TB of tape

Martin Bly RAL Tier1/A CPU Requirements (KSI2K)

Martin Bly RAL Tier1/A New Helpdesk
Need to deploy a new helpdesk (previously had Remedy). Wanted:
–Web based.
–Free, open source.
–Multiple queues and personalities.
Looked at Bugzilla, OTRS and RequestTracker; finally selected RequestTracker. Available for other Tier 2 sites and other GridPP projects if needed.

Martin Bly RAL Tier1/A YUMIT: RPM Monitoring
Hundreds of nodes on the farm; need to make sure RPMs are up to date. Wanted a lightweight solution until full fabric-management tools are deployed. Package written by Steve Traylen:
–yum installed on all systems.
–Nightly comparison against the yum database, uploaded to a MySQL server.
–Simple web-based display utility in Perl.
A sketch of this kind of nightly check is shown below.
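YUMIT itself is Steve Traylen's package and is not reproduced here; purely as an illustration of the idea, a nightly check might look like the sketch below. The database host, credentials and table layout are hypothetical.

```python
# Minimal sketch of a YUMIT-style nightly check: ask yum which installed RPMs
# have pending updates and upload the result to a central MySQL database.
# The database host, credentials and table layout are hypothetical.
import socket
import subprocess
import MySQLdb   # classic Python MySQL driver

def pending_updates():
    """Return (package, available_version) pairs from `yum check-update`."""
    # yum exits with status 100 when updates are available, so don't use check=True
    out = subprocess.run(["yum", "-q", "check-update"],
                         capture_output=True, text=True).stdout
    updates = []
    for line in out.splitlines():
        parts = line.split()
        if len(parts) >= 2 and "." in parts[0]:      # "name.arch  version  repo"
            updates.append((parts[0], parts[1]))
    return updates

def upload(updates):
    conn = MySQLdb.connect(host="yumit-db.example.ac.uk",   # hypothetical host
                           user="yumit", passwd="secret", db="yumit")
    cur = conn.cursor()
    host = socket.getfqdn()
    cur.execute("DELETE FROM pending WHERE host = %s", (host,))
    cur.executemany("INSERT INTO pending (host, package, version) VALUES (%s, %s, %s)",
                    [(host, pkg, ver) for pkg, ver in updates])
    conn.commit()
    conn.close()

if __name__ == "__main__":
    upload(pending_updates())
```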

Martin Bly RAL Tier1/A Exception Monitoring: Nagios
Already have an exception-handling system (CERN's SURE coupled with the commercial Automate). Looking at alternatives: no firm plans yet, but currently evaluating Nagios. A sketch of a Nagios-style check plugin is shown below.
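Nagios drives its monitoring through small plugins that report state via their exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and a single line of output. As an illustration of how site-specific checks could be added, here is a minimal sketch of a disk-usage plugin; the filesystem path and thresholds are examples, not RAL's actual checks.

```python
#!/usr/bin/env python
# Minimal sketch of a Nagios plugin: Nagios interprets the exit code
# (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN) and displays the single output line.
# The filesystem and thresholds below are examples, not RAL's actual checks.
import os
import sys

PATH = "/stage"          # hypothetical disk-cache filesystem
WARN, CRIT = 85.0, 95.0  # percent-used thresholds

def main():
    try:
        st = os.statvfs(PATH)
    except OSError as err:
        print("DISK UNKNOWN - %s: %s" % (PATH, err))
        return 3
    used_pct = 100.0 * (1.0 - float(st.f_bavail) / st.f_blocks)
    msg = "%s %.1f%% used" % (PATH, used_pct)
    if used_pct >= CRIT:
        print("DISK CRITICAL - " + msg)
        return 2
    if used_pct >= WARN:
        print("DISK WARNING - " + msg)
        return 1
    print("DISK OK - " + msg)
    return 0

if __name__ == "__main__":
    sys.exit(main())
```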

Martin Bly RAL Tier1/A Summary: Outstanding Issues
Many new developments and new services deployed this year. We have to run many distinct services, for example Fermi Linux, RH 7.2/7.3, EDG test-beds, LCG, CMS DC03, SRB, etc. Waiting to hear when the experiments want LCG in volume. The Pentium 4 processor is performing poorly. Red Hat's changing policy is a major concern.