SouthGrid Status Pete Gronbech: 12th March 2008 GridPP 20 Dublin


UK Tier 2 reported CPU – Historical View

UK Tier 2 reported CPU – Feb 2008 View

SouthGrid Sites Accounting as reported by APEL

RAL PPD 600KSI2K 158TB
–SL4 CE installed, with some teething problems.
–80TB of storage, plus a further 78TB loaned to RAL Tier 1.
–SRMv2.2 upgrade on the dCache SE proved very tricky; space tokens not yet defined.
–Hardware upgrade purchased but not yet installed; some kit installed in the Atlas Centre due to power/cooling issues in R1.
–Two new sysadmins started during the last month.

Status at Cambridge 391KSI2K 43TB
–32 Intel Woodcrest servers, giving 128 CPU cores, equivalent to 358 KSI2K.
–June 2007 storage upgrade of 40TB, running DPM on SL4 64-bit.
–Plans to double storage and update CPUs.
–Condor is being used as the batch system.
–SAM availability high.
–Lots of work by Graeme and Santanu to get verified for ATLAS production, but recent problems with long jobs failing; now working with LHCb to solve issues.
–Problems with accounting: we still don't believe that the work done at Cambridge is reported correctly.
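
As a quick sanity check on the Cambridge capacity figures quoted above, the short sketch below works through the cores-to-KSI2K arithmetic. It is illustrative only: the per-core factor is simply derived from the numbers on this slide, not an independent benchmark, and the dual-socket dual-core assumption for Woodcrest servers is ours.

```python
# Rough arithmetic behind the Cambridge capacity figures quoted above.
# Assumption: each Woodcrest server is a dual-socket, dual-core box
# (2 x 2 = 4 cores), which matches 32 servers -> 128 cores.
servers = 32
cores_per_server = 4
total_cores = servers * cores_per_server        # 128 cores

ksi2k_total = 358.0                             # figure quoted on the slide
ksi2k_per_core = ksi2k_total / total_cores      # roughly 2.8 KSI2K per core

print(f"{total_cores} cores, {ksi2k_per_core:.2f} KSI2K/core")
```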

Birmingham Status – BaBar Cluster 76KSI2K ~10TB-50TB
–Had been unstable, mainly because of failing disks; very few (<20 out of 120) healthy worker nodes left.
–Many workers died during two shutdowns (no power to motherboards?).
–Very time consuming to maintain.
–Recently purchased 4 twin Viglen quad-core workers; two will go to the grid (2 twin quad-core nodes = 3 racks with 120 nodes!).
–BaBar cluster withdrawn from the Grid, as effort is better spent getting new resources online.

Birmingham Status – Atlas (grid) Farm
–Added 12 local workers to the grid: 20 workers in total -> 40 job slots.
–Will provide 60 job slots after the local twin boxes are installed (see the slot-counting sketch below).
–Upgraded to SL4; installed with kickstart / Cfengine, maintained with Cfengine.
–VOs: alice atlas babar biomed calice camont cms dteam fusion gridpp hone ilc lhcb ngs.ac.uk ops vo.southgrid.ac.uk zeus.
–Several broken CPU fans are being replaced.
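
As an illustrative cross-check of the worker and job-slot counts quoted above, something along the lines of the sketch below could be run on the farm's batch server. It assumes a Torque/PBS batch system (a Torque server is mentioned on the eScience-cluster slide later, so this is plausible but not stated for this farm) and simple parsing of `pbsnodes -a`; it is not Birmingham's actual tooling.

```python
# Count batch workers and job slots by parsing `pbsnodes -a` output.
# Illustrative sketch only: assumes a Torque/PBS batch system whose
# output contains one "np = N" line per node.
import subprocess

def count_slots():
    out = subprocess.run(["pbsnodes", "-a"], capture_output=True, text=True).stdout
    nodes, slots = 0, 0
    for line in out.splitlines():
        line = line.strip()
        if line.startswith("np = "):
            nodes += 1
            slots += int(line.split("=", 1)[1])
    return nodes, slots

if __name__ == "__main__":
    nodes, slots = count_slots()
    print(f"{nodes} workers providing {slots} job slots")
```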

Birmingham Status – Grid Storage
–One DPM SL3 head node with 10TB attached; mainly dedicated to Atlas – no use by Alice yet, but the latest SL4 DPM provides the xrootd access needed by Alice.
–Have just bought an extra 40TB.
–Upgrade strategy: the current DPM head node will be migrated to a new SL4 server, then a DPM pool node will be deployed on the new DPM head node.
–Performance issues with deleting files on ext3 filesystems were observed -> should we move to XFS? (see the timing sketch below)
–SRMv2.2 published, with a 3TB space token reservation for Atlas.
–Latest SRMv2.2 clients (not in gLite yet) installed on the BlueBear UI but not on PP desktops.
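
To make the ext3-versus-XFS question concrete, a minimal timing probe like the one below could be run once on a scratch directory of each filesystem and the results compared. It is illustrative only: the mount point and file count are placeholders, and real DPM pool deletion patterns will differ.

```python
# Time how long it takes to create and then delete many small files in a
# given directory, as a rough probe of filesystem unlink performance
# (e.g. run once on an ext3 mount and once on an XFS mount and compare).
import os
import time
import tempfile

def unlink_benchmark(base_dir, n_files=10000, size_bytes=4096):
    work = tempfile.mkdtemp(dir=base_dir)
    payload = b"x" * size_bytes
    for i in range(n_files):
        with open(os.path.join(work, f"f{i:06d}"), "wb") as fh:
            fh.write(payload)
    start = time.time()
    for name in os.listdir(work):
        os.unlink(os.path.join(work, name))
    elapsed = time.time() - start
    os.rmdir(work)
    return elapsed

if __name__ == "__main__":
    # Placeholder mount point; substitute the pool filesystem under test.
    print("delete time: %.1f s" % unlink_benchmark("/storage/scratch"))
```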

Birmingham Status – eScience Cluster
–31 nodes (servers included), each with 2 Xeon 3.06GHz CPUs and 2GB of RAM, hosted by IS.
–All on a private network apart from one NAT node; Torque server on the private network.
–Connected to the grid via an SL4 CE in Physics – more testing needed.
–Serves as a model for gLite deployment on the BlueBear cluster -> the installation assumes no root access to the workers and uses the user tarball method.
–Aimed to have it passing SAM tests by GridPP20, but may not meet that target, delayed by the security challenge and by helping to set up Atlas on BlueBear.
–Software area is not large enough to meet the Atlas 100GB requirement :( – a simple check is sketched below.
–~150 cores will be allocated to the Grid on BlueBear.
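
One simple way to keep an eye on the software-area problem mentioned above is to compare the free space against the experiment requirement. The sketch below is illustrative: the 100GB figure comes from this slide, but the mount point is a hypothetical placeholder, not the real Birmingham path.

```python
# Check whether the shared VO software area has enough free space for the
# Atlas requirement quoted on this slide (100 GB). Illustrative sketch:
# the mount point below is a placeholder, not the real Birmingham path.
import os

SOFTWARE_AREA = "/exports/vo-software"   # hypothetical path
REQUIRED_GB = 100                        # Atlas software-area requirement

def free_gb(path):
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize / 1e9

if __name__ == "__main__":
    free = free_gb(SOFTWARE_AREA)
    status = "OK" if free >= REQUIRED_GB else "TOO SMALL"
    print(f"{SOFTWARE_AREA}: {free:.0f} GB free, need {REQUIRED_GB} GB -> {status}")
```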

Bristol Update
–Bristol is pleased to report that, after considerable hard work, LCG on the Bristol University HPC is running well, and the accounting is now showing the promised much higher-spec CPU usage: http://www3.egee.cesga.es/gridsite/accounting/CESGA/tree_egee.php?ExecutingSite=UKI-SOUTHGRID-BRIS-HEP
–That purple credit goes to Jon Wakelin & Yves Coppens.
–Work will start soon on testing StoRM on SL4, in preparation for replacing DPM access from the HPC with StoRM. DPM will remain in use for the smaller cluster.
–50TB of storage (GPFS) will be ready for PP by 1 Sept at the latest.
–The above CE & SE are still on 'proof-of-concept' borrowed hardware. Purchases for a new CE/SE/MON & NAT are pending, and we would also like to replace the older CE/SE/UI & WNs (depends on funding).

Site status and future plans – Oxford 510KSI2K 102TB
–Two sub-clusters: (2004) GHz CPUs running SL3, to be upgraded ASAP; (2007) Intel quads, running SL4.
–New kit installed last Sept. in the local computer room is performing very well so far.
–Need to move the 4 grid racks to the new Computer Room at Begbroke Science Park before the end of March.

Oxford
–Routine updates have brought us to the level required for CCRC08, and our storage has space tokens configured, allowing us to take part in CCRC and FDR successfully.
–We have been maintaining two parallel services, one with SL4 workers and one with SL3 to support VOs that are still migrating. We've been working with Zeus and now have them running on the SL4 system, so the SL3 one is due for imminent retirement. Overall it has been useful to maintain the two clusters rather than moving to SL4 in one go.
–We've been delivering large amounts of work to the LHC VOs. In periods where there hasn't been much LHC work available we've been delivering time to the fusion VO, as part of our efforts to bring in resources from non-PP sites such as JET.
–Oxford is one of the two sites supporting the vo.southgrid.ac.uk regional VO; so far only really test work, but we have some potentially interested users who we're hoping to introduce to the grid.
–On a technical note, Oxford's main CE (t2ce03.physics.ox.ac.uk) and site BDII (t2bdii01.physics.ox.ac.uk) are running on VMware Server virtual machines (see the query sketch below). This allows good use of the hardware and a clean separation of services, and seems to be working very well.
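
For reference, a service like the Oxford site BDII mentioned above can be queried directly to see what it publishes, including the space-token descriptions configured for CCRC08. The sketch below is illustrative only: the standard BDII port (2170) and GLUE 1.x schema are assumed, and the `mds-vo-name` site string is a guess, not confirmed by the slide.

```python
# Query a site BDII over LDAP and list the space-token descriptions it
# publishes (GlueVOInfoTag attributes in the GLUE 1.x schema).
# Illustrative sketch: the port, search base and site name are assumptions.
import subprocess

BDII_HOST = "t2bdii01.physics.ox.ac.uk"                   # from the slide above
SEARCH_BASE = "mds-vo-name=UKI-SOUTHGRID-OX-HEP,o=grid"   # assumed site string

def space_tokens():
    cmd = [
        "ldapsearch", "-x", "-LLL",
        "-H", f"ldap://{BDII_HOST}:2170",
        "-b", SEARCH_BASE,
        "objectClass=GlueVOInfo", "GlueVOInfoTag",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True).stdout
    return sorted({line.split(":", 1)[1].strip()
                   for line in out.splitlines()
                   if line.startswith("GlueVOInfoTag:")})

if __name__ == "__main__":
    for tag in space_tokens():
        print(tag)
```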

EFDA JET 242KSI2K 1.5TB
–Cluster upgraded at the end of November 2007 with 80 Sun Fire X2200 servers with Opteron 2218 CPUs.
–Worker nodes upgraded to SL4.
–Has provided a valuable contribution to the ATLAS VO.

SouthGrid… Issues?
–How can SouthGrid become more pro-active with VOs (Atlas)? Alice is very specific with its VOBOX; CMS requires PhEDEx, but RALPPD may be able to provide the interface for SouthGrid. Zeus and Fusion are strongly supported.
–NGS integration: Oxford has become an affiliate and Birmingham is passing the conformance tests.
–The SouthGrid regional VO will be used to bring local groups to the grid.
–Considering the importance of accounting, do we need independent cross-checks (see the sketch below)? Manpower issues supporting APEL?
–Bham PPS nodes are broken -> PPS service suspended :( What strategy should SouthGrid adopt (PPS needs to do 64-bit testing)?
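
As a concrete form the accounting cross-check could take, something like the sketch below compares locally derived CPU totals with the figures published in the portal. It is purely illustrative: the CSV file names, their columns and the 5% tolerance are hypothetical placeholders, not an existing SouthGrid or APEL tool.

```python
# Compare locally derived monthly CPU-time totals with the numbers
# published by the accounting portal. Purely illustrative: the CSV file
# names, their columns and the tolerance are hypothetical placeholders.
import csv

TOLERANCE = 0.05   # flag months that disagree by more than 5%

def load(path):
    with open(path, newline="") as fh:
        return {row["month"]: float(row["ksi2k_hours"]) for row in csv.DictReader(fh)}

def cross_check(local_csv="local_batch_totals.csv", portal_csv="apel_portal_totals.csv"):
    local, portal = load(local_csv), load(portal_csv)
    for month in sorted(local):
        pub = portal.get(month)
        if pub is None:
            print(f"{month}: missing from portal")
            continue
        diff = abs(local[month] - pub) / max(local[month], 1e-9)
        flag = "MISMATCH" if diff > TOLERANCE else "ok"
        print(f"{month}: local={local[month]:.0f} portal={pub:.0f} ({diff:.1%}) {flag}")

if __name__ == "__main__":
    cross_check()
```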

SouthGrid Summary
Big improvements at:
–Oxford
–Bristol
–JET
Expansion expected shortly at:
–RAL PPD
–Birmingham
Working hard to solve problems with exploiting the resources at Cambridge. It's sometimes an uphill struggle, but the top is getting closer.