GridPP Deployment Status GridPP14 Jeremy Coles 6th September 2005.

Overview
1. The main changes over the last two months
2. Trends in basic EGEE metrics
3. Utilisation and efficiency
4. Deployment priorities
5. Brief look at service challenges
6. Summary

Old vs new SFT (Site Functional Tests): see Piotr Nyczyk’s mail to LCG-ROLLOUT of 21st July

Old vs new SFT
1. Change in critical tests
2. Change in impact of test order
3. Tests are run more regularly
4. THINGS NOW LOOK MUCH MORE STABLE!

The new SFTs are used to populate regional weekly views

… and monthly views (averaged over 24 hours). The variations need to be understood. Possible causes: sites with large farms upgrading? Loss of the Tier-1 scheduler?

GridPP is still the largest contributor of resources

UK job slots have increased by >20% in the last few months

Apart from the additions at CERN, this is one of the largest recent increases

Contribution to EGEE CPU resources therefore remains good at ~20%

This has translated into GridPP taking an average of about 20% of the work recently

This reflects the fact that our sites remain at least as stable as the EGEE average

A reminder of the “gstat metric” basis:

Status  Value     Description                                       Example
na      0         No status available
ok      10        Normal status                                     No problems
info    20        Useful information                                Storage over 90% full
note    30        Important information                             GridIce tests are failing
warn    40        Subject may fail soon                             Blank values or wrong format in configuration
error   50        Subject has failed and the problem is localised   A query failed (e.g. no CPU information found)
crit    60        Subject has failed and the problem is fatal
maint   excluded  Subject is under maintenance                      Scheduled downtime at site
off     excluded  Subject has monitoring switched off               Site is undertaking work that would trigger alerts

Gstat metric = (10*#ok + 20*#info + 30*#note + 40*#warn + 50*#error + 60*#crit) / (#sites - (#maint + #off))
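The formula can be sketched directly in code. This is a minimal illustration, assuming each site reports exactly one of the status strings from the table; the function name `gstat_metric` is hypothetical and not part of gstat itself.

```python
# Per-status weights from the status table. "na" scores 0 but still counts
# as a site; "maint" and "off" sites are dropped from the denominator.
WEIGHTS = {"na": 0, "ok": 10, "info": 20, "note": 30,
           "warn": 40, "error": 50, "crit": 60}
EXCLUDED = {"maint", "off"}


def gstat_metric(statuses):
    """Compute the gstat metric from a list of per-site status strings."""
    counted = [s for s in statuses if s not in EXCLUDED]
    if not counted:
        raise ValueError("every site is in maintenance or has monitoring off")
    # WEIGHTS[s] raises KeyError on an unknown status rather than guessing.
    return sum(WEIGHTS[s] for s in counted) / len(counted)


# Eight healthy sites, one warning, one error, one in scheduled downtime.
sites = ["ok"] * 8 + ["warn", "error", "maint"]
print(gstat_metric(sites))
```

The example evaluates (8*10 + 40 + 50) / (11 - 1) = 17.0; a region where every counted site is "ok" scores exactly 10.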

Occupancy averaged 55% in August (26% over the period since June 2004)
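The slide does not say how occupancy was averaged. As a sketch, assuming occupancy means job slots in use divided by job slots available, sampled at equally spaced times, two plausible averages are shown below; both function names are hypothetical. They agree when farm size is constant but diverge when large and small farms are mixed.

```python
def average_occupancy(samples):
    """Mean fraction of job slots in use across equally spaced snapshots.

    samples: list of (slots_used, slots_available) pairs. Snapshots with
    no available slots are skipped.
    """
    ratios = [used / available for used, available in samples if available > 0]
    if not ratios:
        raise ValueError("no snapshots with available slots")
    return sum(ratios) / len(ratios)


def slot_weighted_occupancy(samples):
    """Total slots used over total slots available (larger farms count more)."""
    available = sum(a for _, a in samples)
    if available <= 0:
        raise ValueError("no available slots")
    return sum(u for u, _ in samples) / available


constant = [(50, 100), (60, 100)]   # same farm size in every snapshot
mixed = [(10, 100), (900, 1000)]    # a small and a large farm
print(average_occupancy(constant), slot_weighted_occupancy(constant))
print(average_occupancy(mixed), slot_weighted_occupancy(mixed))
```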

Several sites have been running full for July/August. The plot below is for the Tier-1 in August

August was the busiest month for the Tier-1 as evidenced by the total KSI2K delivered (KSI2K*CPUMonths)
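KSI2K*CPU-months combines how long CPUs were busy with how powerful they are (rated in thousands of SpecInt2000). A minimal sketch, assuming each job's CPU time is scaled by the KSI2K rating of the CPU that ran it and that one CPU-month is 30 days; both the 30-day convention and the function name are assumptions, not the Tier-1's documented accounting rule.

```python
# Assumed convention: one CPU-month = 30 days of CPU time.
SECONDS_PER_CPU_MONTH = 30 * 24 * 3600


def ksi2k_cpu_months(jobs):
    """Sum of CPU time scaled by the KSI2K rating of the CPU that ran it.

    jobs: iterable of (cpu_seconds, cpu_rating_ksi2k) pairs.
    """
    return sum(cpu * rating for cpu, rating in jobs) / SECONDS_PER_CPU_MONTH


# Two CPUs busy for a full 30 days: one rated 0.5 KSI2K, one rated 1.5 KSI2K.
month = SECONDS_PER_CPU_MONTH
print(ksi2k_cpu_months([(month, 0.5), (month, 1.5)]))
```

The example delivers 2.0 KSI2K*CPU-months, so a faster CPU contributes proportionally more work for the same busy time.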

There has been a Tier-1 investigation into job efficiency (CPU time / elapsed time) over the year
– Low efficiencies reduce utilisation (in terms of CPU time provided)
– These were produced by global performance problems on LCG SEs, coupled with problems in the logging and book-keeping services
– Approximately 400 KSI2K*CPU-months per month from February to June, about 50% of total capacity
– Farm occupancy (job slots used) has increased
– Efficiency is >1 if a job runs more than one CPU-intensive process
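The per-job efficiency definition can be written out as a small helper. This is an illustrative sketch; `job_efficiency` is a hypothetical name, and the inputs are assumed to be the CPU time and wall-clock time recorded for a single job.

```python
def job_efficiency(cpu_seconds, elapsed_seconds):
    """CPU time divided by elapsed (wall-clock) time for one job."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return cpu_seconds / elapsed_seconds


# A job that spends half its wall time waiting on a storage element.
print(job_efficiency(1800, 3600))
# A job running two CPU-intensive processes for the whole hour exceeds 1.
print(job_efficiency(7200, 3600))
```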

Specific weighted job efficiencies for ATLAS in July
– Straight-line structures show jobs which ran for a period of time before blocking on an external resource and eventually being killed by an elapsed-time limit
– Clusters at low efficiency probably show performance problems on external storage elements
– Many of the problems seen here are NOW FIXED
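The slide does not state the weighting used. One natural choice, assumed here, is to weight each job's efficiency by its elapsed time, which reduces to total CPU time over total elapsed time. The sketch below (hypothetical function names) contrasts it with a plain mean: a long job that blocked on external storage dominates the weighted figure.

```python
def weighted_efficiency(jobs):
    """Elapsed-time-weighted mean efficiency for (cpu_seconds, elapsed_seconds) pairs.

    Weighting each job's efficiency (cpu / elapsed) by its elapsed time
    simplifies to total CPU time divided by total elapsed time.
    """
    jobs = list(jobs)
    total_elapsed = sum(elapsed for _, elapsed in jobs)
    if total_elapsed <= 0:
        raise ValueError("total elapsed time must be positive")
    return sum(cpu for cpu, _ in jobs) / total_elapsed


def unweighted_efficiency(jobs):
    """Plain mean of per-job efficiencies, for comparison."""
    jobs = list(jobs)
    return sum(cpu / elapsed for cpu, elapsed in jobs) / len(jobs)


# One long job that blocked on storage, one short job that ran cleanly.
jobs = [(600, 36000), (3500, 3600)]
print(weighted_efficiency(jobs))    # pulled down by the long blocked job
print(unweighted_efficiency(jobs))
```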

We have seen a good general response to deployment

SRMs and data migration
– dCache/DPM: we have most experience with dCache-SRM but are gaining knowledge of DPM
– The mailing list remains active: join and review the archives BEFORE attempting an installation so that we can support you better
– There is now a GridPP wiki, which brings us on to …
Links to all areas mentioned can be found on the deployment links page:

Our support model needs to be developed. Components in the current picture:
– UKI ROC ticket tracking system (Footprints)
– Site A
– GGUS
– Regional service 1
– Tier-1 helpdesk (Remedy)
– Grid-Ireland helpdesk (Remedy)
– GOSC (Footprints)
– CIC-on-duty
– Users
– Experiments/VOs
– Savannah (bug tracking)
– Site administrators
– LCG-ROLLOUT
– TB-SUPPORT

Other areas (examples)
Technical
– Implications of the LCG Baseline Services Group findings
– Procurement and deployment of more resources while maintaining a steady service
General
– PPARC signs the LCG MoU shortly; this commits all sites to a certain basic level of service (Tier-2s: 72-hour response)
– The operations workshop at Culham (near RAL) later this month
– A training course for GridPP sysadmins to help prepare sites for SC4 and the increasing service demands
– A UK support workshop for users and sysadmins?

Service Challenge 3 enters a new phase
Phase 1 (throughput tests), July 2005
– dCache-SRM working at all sites
– Tier-1 managed rates (on UKLIGHT) of up to 650 Mb/s to CERN, similar to SC2 rates
– Edinburgh: 10 TB of data transferred. Sustained rates of Mb/s
– Imperial: rates reached Mb/s
– Lancaster: 958 GB (978 files) over 8 days (~27 Mb/s sustained)
Phase 2 (service phase) from 1st September 2005
– The experiments will use the SC3 infrastructure for testing their models and for production
– Experiment (basic functionality) test jobs are being developed (to run as part of the SFTs) to check sites
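Transfer figures like the Lancaster result can be sanity-checked with simple arithmetic. The sketch below uses hypothetical helper names and assumes decimal units (1 GB = 10^9 bytes) by default. Spread evenly over the full eight days, 958 GB averages only about 11 Mb/s, so the quoted ~27 Mb/s presumably refers to the periods when transfers were actually running; the slide does not say which.

```python
SECONDS_PER_DAY = 86_400


def _to_bytes(gigabytes, binary):
    """Convert GB to bytes: 10**9 per GB by default, 2**30 if binary=True."""
    return gigabytes * (2**30 if binary else 10**9)


def average_rate_mbps(gigabytes, days, binary=False):
    """Average rate in Mb/s if the volume is spread evenly over `days`."""
    return _to_bytes(gigabytes, binary) * 8 / (days * SECONDS_PER_DAY) / 1e6


def hours_at_rate(gigabytes, rate_mbps, binary=False):
    """Hours of continuous transfer needed to move `gigabytes` at `rate_mbps`."""
    return _to_bytes(gigabytes, binary) * 8 / (rate_mbps * 1e6) / 3600


print(round(average_rate_mbps(958, 8), 1))  # whole-window average
print(round(hours_at_rate(958, 27), 1))     # time needed at ~27 Mb/s
```

The first line prints 11.1 and the second 78.8, i.e. roughly 79 hours of transfer at 27 Mb/s would move the full 958 GB.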

Service Challenge 4 will affect all sites: start preparing! SC4 consists of a Setup Phase starting on 1st April 2006, during which a number of throughput tests will be performed, followed by a Service Phase from 1st May 2006 until 30th September 2006. All service components for SC4 need to be delivered ready for production by 31st January 2006, and final testing and integration of components and services must be completed by 31st March 2006. More details in the panel discussion later today.
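The SC4 dates above can be captured as a simple milestone list, which makes the lead time for sites explicit. This is a sketch for planning purposes only; the milestone dates come from the slide, while the `check_schedule` helper is hypothetical.

```python
from datetime import date

# SC4 milestones as stated on the slide, in intended order.
SC4_MILESTONES = [
    ("Service components ready for production", date(2006, 1, 31)),
    ("Final testing and integration complete", date(2006, 3, 31)),
    ("Setup Phase starts (throughput tests)", date(2006, 4, 1)),
    ("Service Phase starts", date(2006, 5, 1)),
    ("Service Phase ends", date(2006, 9, 30)),
]


def check_schedule(milestones):
    """Raise if milestones are out of order; return (from, to, days) gaps."""
    gaps = []
    for (name_a, a), (name_b, b) in zip(milestones, milestones[1:]):
        if b < a:
            raise ValueError(f"{name_b!r} falls before {name_a!r}")
        gaps.append((name_a, name_b, (b - a).days))
    return gaps


for start, end, days in check_schedule(SC4_MILESTONES):
    print(f"{start} -> {end}: {days} days")

# Lead time from this meeting (6 September 2005) to the first deadline.
print((SC4_MILESTONES[0][1] - date(2005, 9, 6)).days)
```

From this meeting, sites have 147 days until the 31st January 2006 deadline, and the Service Phase itself spans 152 days.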

Summary
1. We have seen changes in SFTs
2. GridPP remains a major contributor to LCG/EGEE resources
3. Use of resources is increasing; there were concerns about efficiency
4. Sites did well with the upgrade during a vacation period
5. Two major deployment tasks: support and SRM implementations
6. Service Challenge 3 enters the “Service Phase”; SC4 planning starts