Tier1 – Andrew Sansum, GRIDPP 10, June 2004

Production Service for HEP (PPARC)

GRIDPP (2001-2004):
– "GridPP will enable testing of a prototype Grid of significant scale, providing resources for the LHC experiments ALICE, ATLAS, CMS and LHCb, the US-based experiments BaBar, CDF and D0, and lattice theorists from UKQCD"
– Tier1 provides access to large-scale compute resources for experiments
– Tier A service for BaBar physics analysis
– LHC data challenges
– Support for a wide range of prototype PP GRID software (e.g. Certificate Auth)
– Close involvement in the European DataGrid project EDG (many testbeds)

GRIDPP2 (2004-2007):
– "From Prototype to Production"
– Close engagement in the LCG project, preparing for LHC startup
– Continue to provide the Tier A centre for BaBar
– EGEE resource and member of the EGEE testbed
– Ramp-up to a production-quality GRID service
– Gradually move to GRID-only access

Tier1 in GRIDPP2 (2004-2007)

The Tier-1 Centre will provide GRIDPP2 with a large computing resource of a scale and quality that can be categorised as an LCG Regional Computing Centre.
– January 2004: GRIDPP2 confirms RAL to host the Tier1 service
– GRIDPP2 to commence September 2004
– Tier1 hardware budget: £2.3M over 3 years
– Staff: increase from 12.1 to 13.5 FTE (+3 CCLRC) by September

So What Exactly is a Tier1?

The Tier1 will differentiate itself from the Tier2s by:
– Providing data management at high QoS, able to host primary/master copies of data
– Providing state-of-the-art network bandwidth
– Contributing to collaborative services/core infrastructure
– Providing high-quality technical support
– Responding rapidly to service faults
– Being able to make long-term service commitments

Tier1 Staffing

– Manage: project/planning/policy/finance..
– Disks: servers/filesystems
– CPU: farms/farm systems
– Tapes: robot and interfaces
– Core: critical systems (Oracle/MySQL/AFS/home/monitoring)..
– Operations: machine rooms/tape ops/interventions…
– Network: site infrastructure/Tier1 LAN
– Support: experiments and their services
– Deploy: Tier1 and UK GRID interfaces
– Hardware: fix systems/hardware support
– Effort per area: 1 FTE / 2 FTE / 1.5 FTE / 2.5 FTE

Current Tier1 Hardware

CPU:
– 350 dual-processor Intel nodes (PIII and Xeon), mainly rack mounts
– About 400 KSI2K

Disk service – mainly a "standard" configuration:
– Dual-processor server
– Dual-channel SCSI interconnect
– External IDE/SCSI RAID arrays (Accusys and Infortrend)
– ATA drives (mainly Maxtor)
– About 80TB of disk
– Cheap and (fairly) cheerful

Tape service:
– STK Powderhorn 9310 silo with 9940B drives
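As a rough cross-check of the CPU figures above, the short sketch below back-calculates the average per-CPU rating implied by "350 dual-processor nodes, about 400 KSI2K". The ~570 SI2K per CPU it prints is an inference from those two slide numbers, not a figure quoted in the talk.

```python
# Rough capacity arithmetic for the current farm (illustrative only).
# Inputs are the two numbers quoted on the slide.
nodes = 350
cpus_per_node = 2
aggregate_ksi2k = 400

total_cpus = nodes * cpus_per_node
si2k_per_cpu = aggregate_ksi2k * 1000 / total_cpus  # implied average rating

print(f"{total_cpus} CPUs -> about {si2k_per_cpu:.0f} SI2K per CPU on average")
```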

Network

[Diagram: current site network – firewall and site router connecting to SuperJanet and the rest of the site; a site-routable network with production and test subnets/VLANs holding servers and worker nodes; a separate test network (e.g. MBNG).]

Network

[Diagram: Tier1 network – firewall and site router connecting to SuperJanet and the rest of the site; the Tier1 network split into production and test VLANs, each holding servers and worker nodes, plus the test network (e.g. MBNG).]

UKlight

– Connection to RAL in September
– Funded to end 2005, after which it probably merges with SuperJanet 5
– 2.5Gb/s now, 10Gb/s from 2006
– Effectively a dedicated lightpath to CERN
– Probably not for Tier1 production, but suitable for LCG data challenges etc., building experience for the SuperJanet upgrade
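To give a feel for what the two quoted line rates mean for data challenges, the sketch below times the transfer of a 1 TB dataset. The 1 TB size is an assumed example, not a figure from the talk, and the calculation ignores protocol overhead, assuming the full line rate is achieved.

```python
# Illustrative only: time to move an assumed 1 TB dataset over the
# UKlight link at the two rates quoted on the slide.
dataset_tb = 1.0
dataset_bits = dataset_tb * 1e12 * 8  # decimal terabytes to bits

for gbps in (2.5, 10.0):
    seconds = dataset_bits / (gbps * 1e9)
    print(f"{gbps:>4} Gb/s: {seconds / 3600:.1f} hours per TB")
```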

New Hardware – Arrives 7th June

CPU capacity (500 KSI2K):
– 256 dual-processor 2.8GHz Xeons
– 2/4GB memory
– 120GB HDA

Disk capacity (140TB):
– Infortrend EonStor SATA/SCSI RAID arrays
– 16 × 250GB Western Digital SATA drives per array
– Two arrays per server
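A back-of-envelope check of the disk delivery, using only the figures on the slide: how much raw capacity each array holds and roughly how many arrays and servers the 140TB corresponds to. RAID parity and filesystem overhead are not quoted on the slide and are ignored here, so the usable figure would be lower.

```python
# Raw-capacity arithmetic for the new Infortrend delivery (illustrative).
drives_per_array = 16
drive_gb = 250
arrays_per_server = 2
total_tb = 140

raw_tb_per_array = drives_per_array * drive_gb / 1000
arrays_needed = total_tb / raw_tb_per_array
servers_needed = arrays_needed / arrays_per_server

print(f"{raw_tb_per_array:.0f} TB raw per array")
print(f"~{arrays_needed:.0f} arrays on ~{servers_needed:.1f} servers for {total_tb} TB raw")
```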

Planned Ramp-up

Next Delivery

Need it in production by the end of the year:
– Original schedule of a December delivery seems late
– Will have to start very soon
– Less chance for testing / new technology

Exact proportions not agreed, but …
– 400 KSI2K of CPU
– 160TB disk
– 120TB tape??
– Network infrastructure?
– Core servers (H/A??)
– Red Hat?

Long-range plan needs reviewing – also need long-range experiment requirements

GRIDPP10 June 2004Tier1A13 CPU Capacity

GRIDPP10 June 2004Tier1A14 Tier1 Disk Capacity (TB)

Forthcoming Challenges

– Simplify the service – less "duplication"
– Improve storage management
– Deploy new fabric management
– Red Hat Enterprise 3 upgrade
– Network upgrade/reconfigure????
– Another procurement/install
– Meet the challenge of LCG – professionalism
– LCG data challenges …

Clean up the Spaghetti Diagram

– Simplify interfaces: fewer GRIDs
– "More is not always better"
– How to phase out the "Classic" service..

Storage: Plus and Minus

– ATA and SATA drives: 2.5% failure per annum – OK
– External RAID arrays: good architecture, choose well
– SCSI interconnect: surprisingly unreliable – change
– Ext2 filesystem: OK, but need a journal – XFS?
– Linux O/S: move to Enterprise 3
– NFS/Xrootd/http/gridftp/bbftp/srb/…: must have SRM
– No SAN: need a SAN (Fibre Channel or iSCSI …)
– No management layer: need virtualisation/dCache..
– No HSM: ????
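The 2.5% per-annum drive failure rate quoted above translates into a steady stream of interventions once multiplied across a large fleet. The sketch below makes that concrete; the 500-drive fleet size is an assumption chosen for illustration, not a number from the talk, and independent failures are assumed.

```python
# Illustrative reliability arithmetic from the 2.5% per-drive annual
# failure rate on the slide. Fleet size is a hypothetical example value.
annual_failure_rate = 0.025
fleet_size = 500          # assumed number of ATA/SATA drives
drives_per_array = 16     # matches the new Infortrend arrays

expected_failures = annual_failure_rate * fleet_size
# Probability that at least one drive in a 16-drive array fails within a
# year, assuming independent failures - one reason RAID protection matters.
p_array_hit = 1 - (1 - annual_failure_rate) ** drives_per_array

print(f"Expected drive failures per year across the fleet: {expected_failures:.1f}")
print(f"P(at least one failure per 16-drive array per year): {p_array_hit:.0%}")
```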

Fabric Management

Currently run:
– Kickstart – cascading config files
– SURE – exception monitoring
– Automate – automatic interventions

Running out of steam with the old systems …
– "Only" 800 systems – but many, many flavours
– Evaluating Quattor – no obvious alternatives – probably deploy
– Less convinced by LEMON – a bit early – running Nagios in parallel
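For readers unfamiliar with exception monitoring, the minimal sketch below shows the kind of per-node check such tooling runs and the exceptions it raises. It is not code from SURE, Automate, Nagios or Quattor; the thresholds and the choice of checks are illustrative assumptions.

```python
# A minimal sketch of a fabric-monitoring exception check on one node.
# Thresholds and checks are illustrative, not taken from the real tools.
import os
import shutil

DISK_WARN = 0.90   # warn when a filesystem is more than 90% full
LOAD_WARN = 8.0    # warn when the 5-minute load average exceeds this

def check_node(paths=("/",)):
    exceptions = []
    for path in paths:
        usage = shutil.disk_usage(path)
        frac = usage.used / usage.total
        if frac > DISK_WARN:
            exceptions.append(f"{path} is {frac:.0%} full")
    load5 = os.getloadavg()[1]
    if load5 > LOAD_WARN:
        exceptions.append(f"5-minute load average is {load5:.1f}")
    return exceptions

if __name__ == "__main__":
    for problem in check_node():
        print("EXCEPTION:", problem)
```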

Conclusions

– After several years of relative stability, we must start re-engineering many Tier1 components.
– Must start to rationalise – support a limited set of interfaces, operating systems, testbeds … simplify so we can do less, better.
– LCG is becoming a big driver:
  – Service commitments
  – Increase resilience and availability
  – Data challenges and the move to steady state
– Major reality check in 2007!