UK Status for SC3. Jeremy Coles, GridPP Production Manager. Service Challenge Meeting, Taipei, 26th April 2005.


GridPP sites
[Map slide showing GridPP sites: the Tier-1, the Tier-2s (ScotGrid, NorthGrid, SouthGrid, London Tier-2), and the SC3 sites for LHCb, ATLAS and CMS.]

General preparation
- Mailing list established.
- Reviewed dCache usage and issues at a recent GridPP storage workshop.
- SC3 tutorial at Imperial College planned for early May (13th?).
- Small, semi-related piece of work on putting SE gridftp logs and dCache billing information into R-GMA; some preliminary work exists (a hedged sketch of the general idea follows this list).
- Tier-2s generally well engaged with the experiments at their respective sites.
- SRM status reviewed at the biweekly update calls.
- DPM being evaluated by the GridPP Storage Group, who already support dCache deployment at the Tier-2s.
- SC3 weekly call to start in mid-May.
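The gridftp-log/billing-to-R-GMA work mentioned above is not described further in these slides. The following is only a minimal illustrative sketch, in Python, of the general idea of turning transfer-log lines into structured records that a monitoring publisher could consume. The "key=value" log format, field names and example values are assumptions for illustration, not the real dCache billing format or the R-GMA API.

```python
# Illustrative sketch only: parse hypothetical transfer-log lines into
# structured records suitable for publishing to a monitoring system.
# The "key=value" format and field names below are assumptions and do not
# reflect the real dCache billing log or the R-GMA API.

import re
from datetime import datetime

LINE_RE = re.compile(r"(\w+)=(\S+)")

def parse_transfer_line(line):
    """Turn a 'key=value ...' log line into a dict of typed fields."""
    fields = dict(LINE_RE.findall(line))
    return {
        "timestamp": datetime.strptime(fields["date"], "%Y-%m-%dT%H:%M:%S"),
        "client": fields.get("client", "unknown"),
        "bytes": int(fields.get("bytes", 0)),
        "seconds": float(fields.get("seconds", 0.0)),
    }

def records(logfile):
    """Yield one record per well-formed line, skipping anything unparsable."""
    with open(logfile) as fh:
        for line in fh:
            try:
                yield parse_transfer_line(line)
            except (KeyError, ValueError):
                continue  # ignore malformed lines

if __name__ == "__main__":
    # Example line in the assumed (hypothetical) format:
    demo = "date=2005-04-20T12:00:00 client=se01.example.org bytes=1073741824 seconds=12.5"
    print(parse_transfer_line(demo))
```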

Tier-1A Andrew Sansum

SRM Status
- Already running a production dCache SRM (CMS, ATLAS; LHCb soon).
- Deployed a second SRM for SC2, but only tested gridftp (Radiant did not support SRM).
- During SC3, intend to use the production SRM with existing hardware and phase out the special SC SRM.
  – Hope to dual attach, but need a fix from DESY to support multiple interfaces on the gridftp service: the FTP protocol returned the IP for the data channel on the wrong interface (can we bind the gridFTP doors?), and pool nodes tried to contact the external interfaces of the gridFTP nodes. (An illustrative binding sketch follows this slide.)
  – Developing a "plan B" to inject SC UKLight traffic into the production network.
- Tape service tested during SC2, but stability problems on Radiant hampered measurements. The tape SRM is working fine, but bandwidth to tape is still too low (as expected). A new disk cache for the tape service has been ordered and is expected in production by early June; we expect to meet the target rate.
- During recent tests the ADS was accepting data over the network at about 100 MB/s. There was a mix of disks on the servers, but over the 4 servers they were doing about 300 MB/s (split between reads and writes). The tape drives ran at between 5 and 20 MB/s depending on which disk was in use.
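The dual-attachment issue above (the data channel being advertised on the wrong interface) is left open on the slide. Purely as an illustration of the underlying idea, the sketch below shows, in generic Python, how a data-moving service can be pinned to one interface by binding its listening socket to that interface's address and advertising only that address. This is not dCache or gridFTP configuration; whether the gridFTP doors expose an equivalent option is exactly the question the slide raises.

```python
# Minimal, generic illustration of "binding a door to one interface":
# listen on a specific local address rather than on all interfaces, so the
# address handed out for the data channel is the intended one.
# This is NOT dCache/gridFTP configuration, just the underlying socket idea.

import socket

def open_data_door(bind_addr: str, port: int) -> socket.socket:
    """Listen only on bind_addr (e.g. the service-challenge-facing address),
    not on 0.0.0.0, so clients are never pointed at the other interface."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((bind_addr, port))          # pin to one interface
    srv.listen(5)
    return srv

def advertise(srv: socket.socket) -> str:
    """The address/port a control channel would hand back to the client
    (in FTP terms, what would go into the passive-mode reply)."""
    host, port = srv.getsockname()
    return f"{host}:{port}"

if __name__ == "__main__":
    # "127.0.0.1" stands in for the dedicated interface address.
    door = open_data_door("127.0.0.1", 2811)
    print("data connections will be accepted and advertised on", advertise(door))
    door.close()
```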

Network/Other
- The existing UKLight T0-T1 network (2x1 Gbit/s) is expected to remain for SC3. Some tuning problems during SC2; will complete a "health check" before the end of May.
- Will need to introduce a light-weight "firewall" during May. May re-engineer the RAL network to avoid the dual-attachment problems.
- Have little idea what middleware requirements there are or what the experiments will do. Awaiting information from SC planning (this meeting?).
- The T1 managed 80 MB/s on the 2x1 Gb/s links; we continue to suspect the problem is down to the link. Likely to be delivering from 5-8 servers at CERN to 8-16 endpoints at RAL. At the RAL end, selection of the endpoint hosts will be based on local CPU load (not load balancing on the aggregate).
- Current planning assumptions: the throughput challenge starts 1st July. Target 150 MB/s (real data rate) T0->T1; the network load is higher than this, say 1.5 Gb/s plus headroom (see the worked numbers after this slide).
- T1-T2 rates: need to move/upload data via RAL-Lancaster, RAL-IC and possibly RAL-Edinburgh. Need more planning data on T1-T2 rates; it is understood that LCG would accept any rate. For what scale of MC production are the experiments aiming?
- Tier-2s will likely connect with the T1 via the existing production network over SJ4.
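To make the "network load is higher than 150 MB/s" remark concrete, here is the simple conversion the slide implies, written as a small Python calculation. The 20% headroom factor is an assumption for illustration, not a number from the slides.

```python
# Back-of-the-envelope check of the slide's planning assumption:
# a sustained 150 MB/s of payload needs noticeably more than 1.2 Gb/s of
# raw link capacity once protocol overhead and headroom are allowed for.
# The 20% headroom factor below is an illustrative assumption.

def required_link_gbps(payload_MB_per_s: float, headroom: float = 0.20) -> float:
    bits_per_s = payload_MB_per_s * 8 * 1e6      # MB/s -> b/s (decimal units)
    return bits_per_s * (1 + headroom) / 1e9     # -> Gb/s including headroom

if __name__ == "__main__":
    target = 150.0                               # MB/s real data rate, T0 -> T1
    print(f"raw payload rate : {target * 8 / 1000:.2f} Gb/s")          # 1.20 Gb/s
    print(f"with 20% headroom: {required_link_gbps(target):.2f} Gb/s") # ~1.44 Gb/s
```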

Edinburgh (LHCb) Phil Clark

Edinburgh (ScotGrid)
HARDWARE
- GridPP frontend machines deployed; 1 TB of storage currently attached to a classic SE.
- Prototype dCache SRM installed and tested. Waiting for a staff replacement to continue work this week.
- Limited CPU available, so waiting for more details about requirements from LHCb.
- Disk server: IBM xSeries 440 with eight 1.9 GHz Xeon processors and 32 GB RAM; Gb copper ethernet to a Gb 3com switch.
PROBLEMS/ISSUES
1) LHCb hardware requirements not yet known.
2) LHCb software installations required for SC3 not clear.
3) Unlikely to be connected to UKLight for SC3.

Networking
- Current configuration: SE -> Gbit ethernet (copper) -> ScotGrid switch -> Gb ethernet -> SRIF switch -> 2x10 Gb/s -> second SRIF switch -> dedicated EastMAN connection to JANET.
- Expected bottleneck is the I/O.
- Unlikely to use UKLight.
[Network diagram: worker nodes, ACF router, ScotGrid switch, Kings Buildings, GbE links.]

Imperial College (CMS) Dave Colling

Imperial College Setup
- The London MAN (LMN) is connected to the JANET backbone. LMN connectivity is 2.5 Gb/s.
- There are 3 Points of Presence (PoPs) on the LMN, of which Imperial is one. Each PoP is connected at 1 Gb/s.
- Traffic from the Imperial campus goes through 2 firewalls (with the campus divided up geographically so that traffic is split between them). These have never really been stressed, but are unlikely to give a sustained throughput greater than ~400 Mb/s.
- IC has Gb/s fibre and switches from the firewall to the machine room. The disk servers all have Gb/s connections (a mixture of fibre and copper).
- Network information is available here:

Imperial College
SOFTWARE
- Site upgrading to the latest LCG release.
- dCache deployment starts next week.
- Good UK support for the required experiment software installations, but not much information yet on what this will be!
PROBLEMS/ISSUES
- Unlikely that IC will be ready to use UKLight, so the production network is expected to be used.
- The production network is likely to be limited by the firewall.
- CMS requirements unclear: how much disk space should be dedicated to this challenge? Any additional equipment will need to be ordered this month!
- The site is asking about any assurances of the security of the dCache code; they want to view the source code!

Lancaster (ATLAS) Brian Davies

Current hardware status

Planned site setup
1. LightPath and terminal end-box installed.
2. Still require some hardware for our internal network topology.
3. Increase in storage to ~84 TB, possibly ~92 TB with a working resilient dCache from the CE.

Network status
1. Currently the data storage element is connected to the university backbone via a 100 Mb/s link.
2. No plan for transfers via the university network, but planning on the assumption that the production network is a possible backup.
3. Plan is for traffic to go over the UKLight connection.
4. T1 = RAL; T2 = UK NorthGrid; T2 site = Lancaster.

Current software status
- LCG 2_4_0 deployed.
- dCache and SRM testbed on two nodes.
  – Will deploy onto the production SE when the SE is fully installed.
  – Need to optimise the dCache installation onto the SE.
- Currently evaluating what software and services are needed overall, and their deployment, for sites at T0/T1 but in particular T2.
- Designing the system so that no changes are needed for the transition from the throughput phase to the service phase.
  – Except possibly iperf and/or transfers of "fake data" as a backup for the connection and bandwidth tests in the throughput phase (a minimal throughput-test sketch follows this slide).
  – T1 transitions may require T2 changes.
- Problems mainly consist of hardware/software procurement and implementation!
  – dCache deployment.
  – Experiment software/services/interfaces.
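The slide above mentions iperf and "fake data" transfers as fallback bandwidth tests. As an illustration of what a memory-to-memory test measures, here is a minimal, self-contained Python sketch that streams dummy data over a TCP socket and reports the achieved rate. It is only a stand-in for a real tool such as iperf; the buffer size, port number and duration are arbitrary assumptions, and a real test would run the two ends on separate hosts.

```python
# Minimal memory-to-memory throughput test (a stand-in for iperf, for
# illustration only): a sender streams dummy data to a receiver for a few
# seconds and the receiver reports the achieved rate. Buffer size, port and
# duration are arbitrary choices, not values from the slides.

import socket
import threading
import time

PORT = 5201            # arbitrary test port
BUF = 1 << 20          # 1 MiB send/receive buffer
DURATION = 3.0         # seconds of "fake data"

def receiver(ready: threading.Event):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", PORT))
    srv.listen(1)
    ready.set()                          # tell the sender we are listening
    conn, _ = srv.accept()
    total, start = 0, time.time()
    while True:
        chunk = conn.recv(BUF)
        if not chunk:                    # sender closed the connection
            break
        total += len(chunk)
    elapsed = time.time() - start
    print(f"received {total / 1e6:.1f} MB in {elapsed:.2f} s "
          f"-> {total * 8 / elapsed / 1e6:.0f} Mb/s")
    conn.close()
    srv.close()

def sender():
    payload = b"\0" * BUF                # dummy data straight from memory
    s = socket.create_connection(("127.0.0.1", PORT))
    end = time.time() + DURATION
    while time.time() < end:
        s.sendall(payload)
    s.close()

if __name__ == "__main__":
    ready = threading.Event()
    t = threading.Thread(target=receiver, args=(ready,))
    t.start()
    ready.wait()
    sender()
    t.join()
```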

Lancs/ATLAS SC3 Plan

Task | Start Date | End Date | Resource
Demonstrate sustained data transfer (T0-T1-T2) | | |
Integrate Don Quixote tools into SC infrastructure | Mon 27/06/05 | Fri 16/09/05 | BD, ATLAS
Provision of end-to-end conn. (T0-T1) | | |
Test basic data movement (CERN-UK) | Mon 04/07/05 | Fri 29/07/05 | BD, MD, ATLAS, RAL
Review of bottlenecks and required actions | Mon 01/08/05 | Fri 16/09/05 | BD, ATLAS, RAL
ATLAS Service Challenge 3 (Service Phase) | Mon 19/09/05 | Fri 18/11/05 | BD, ATLAS, RAL
Review of bottlenecks and required actions | Mon 21/11/05 | Fri 27/01/06 | BD, ATLAS, RAL
Optimisation of network | Mon 30/01/06 | Fri 31/03/06 | BD, MD, BG, NP, RAL
Test of data transfer rates (CERN-UK) | Mon 03/04/06 | Fri 28/04/06 | BD, MD, ATLAS, RAL
Review of bottlenecks and required actions | Mon 01/05/06 | Fri 26/05/06 | BD, ATLAS, RAL
Optimisation of network | Mon 29/05/06 | Fri 30/06/06 | BD, MD, BG, NP, RAL
Test of data transfer rates (CERN-UK) | Mon 03/07/06 | Fri 28/07/06 | BD, MD, ATLAS, RAL
Review of bottlenecks and required actions | Mon 31/07/06 | Fri 15/09/06 | BD, ATLAS, RAL

Plan organised through the end of SC3 into SC4.

Lancs/ATLAS SC3 Plan (continued)

Task | Start Date | End Date | Resource
Optimisation of network | Mon 18/09/06 | Fri 13/10/06 | BD, MD, BG, NP, RAL
Test of data transfer rates (CERN-UK) | Mon 16/10/06 | Fri 01/12/06 | BD, MD, ATLAS, RAL
Provision of end-to-end conn. (T1-T2) | | |
Integrate Don Quixote tools into SC infrastructure at LAN | Mon 19/09/05 | Fri 30/09/05 | BD
Provision of memory-to-memory conn. (RAL-LAN) | Tue 29/03/05 | Fri 13/05/05 | UKERNA, BD, BG, NP, RAL
Provision and commission of LAN h/w | Tue 29/03/05 | Fri 10/06/05 | BD, BG, NP
Installation of LAN dCache SRM | Mon 13/06/05 | Fri 01/07/05 | MD, BD
Test basic data movement (RAL-LAN) | Mon 04/07/05 | Fri 29/07/05 | BD, MD, ATLAS, RAL
Review of bottlenecks and required actions | Mon 01/08/05 | Fri 16/09/05 | BD
[SC3 – Service Phase] | | |
Review of bottlenecks and required actions | Mon 21/11/05 | Fri 27/01/06 | BD
Optimisation of network | Mon 30/01/06 | Fri 31/03/06 | BD, MD, BG, NP
Test of data transfer rates (RAL-LAN) | Mon 03/04/06 | Fri 28/04/06 | BD, MD
Review of bottlenecks and required actions | Mon 01/05/06 | Fri 26/05/06 | BD

Lancs/ATLAS SC3 Plan (continued)

Task | Start Date | End Date | Resource
Optimisation of network | Mon 29/05/06 | Fri 30/06/06 | BD, MD, BG, NP
Test of data transfer rates (RAL-LAN) | Mon 03/07/06 | Fri 28/07/06 | BD, MD
Review of bottlenecks and required actions | Mon 31/07/06 | Fri 15/09/06 | BD
Optimisation of network | Mon 18/09/06 | Fri 13/10/06 | BD, MD, BG, NP
Test of data transfer rates (RAL-LAN) | Mon 16/10/06 | Fri 01/12/06 | BD, MD

High-level timeline – for discussion

Task | Start Date | End Date
Review dCache issues from SC2 and T2 test installations | 13/04/05 | 14/04/05
Basic infrastructure deployment | 15/04/05 | 27/05/05
Reconfirm experiment hardware and software requirements | 25/04/05 | 29/04/05
Procure any additional hardware required | 02/05/05 | 27/05/05
Training for additional site services | 13/05/05 |
Deploy and test dCache/SRM at T2s for SC | 02/05/05 | 18/05/05
Deploy and test experiment-specific software for Phase 2? | ASAP | ???
Tune and test T1-T0 with SC hardware | 09/05/05 | 10/06/05
Network tests T1-T2s (Mem-Mem, Disk-Disk) | 16/05/05 | 18/06/05
Review of bottlenecks and other issues | 18/06/05 | 29/06/05
SC3 throughput phase | 01/07/05 | 31/08/05
SC3 service phase | 01/09/05 | …

Summary
- All UK participating sites have started work for SC3.
- Lack of clear requirements is causing some problems in planning and thus deployment. What are the critical success factors for SC3? What service level is expected of Tier-2s? "We don't know what we want, but we'll know when we have it!"
- Lower-than-expected T1-T2 data rates are pushing GridPP towards using the current production network across SJ4 (UKLight may be available for Lancaster).
- dCache deployment is progressing at a moderate pace. Still need to resolve the T1 dual-attachment problem.
- Some areas are still unclear, such as what extra services will be deployed at the participating sites (N.B. the next LCG release is on 1st July). This is becoming clearer, but a list (with the responsible group) would help.
- Several open questions about how network monitoring will be performed during SC3. What are the requirements on sites?