GridPP Deployment Status, GridPP14, Jeremy Coles, 6th September 2005


1 GridPP Deployment Status GridPP14 Jeremy Coles J.Coles@rl.ac.uk 6th September 2005

2 Overview
1. The main changes over the last two months
2. Trends in basic EGEE metrics
3. Utilisation and efficiency
4. Deployment priorities
5. Brief look at service challenges
6. Summary

3 Old vs new SFT
http://goc.grid.sinica.edu.tw/gocwiki/Site_Functional_Tests
See Piotr Nyczyk’s mail to LCG-ROLLOUT, 21st July

4 Old vs new SFT
1. Change in critical tests
2. Change in impact of test order
3. Tests are run more regularly
4. THINGS NOW LOOK MUCH MORE STABLE!

5 The new SFTs are used to populate regional weekly views

6 … and monthly views
–The variations need to be understood (avg. 24hrs)
–Sites with large farms upgrading?
–Tier-1 scheduler lost

7 GridPP is still the largest contributor of resources

8 UK job slots have increased by >20% in last few months

9 Next to CERN additions this is one of the major recent increases

10 Contribution to EGEE CPU resources therefore remains good at ~20%

11 This has translated into GridPP taking an average of about 20% of the work recently

12 Which reflects the fact that our sites remain at least as stable as the EGEE average

13 A reminder of the “gstat metric” basis

Status  Description                                            Example
0       na or no status available
10      ok or normal status                                    No problems
20      info or useful information                             Storage over 90% full
30      note or important information                          GridIce tests are failing
40      warn or subject may fail soon                          Blank values or wrong format in configuration
50      error or subject has failed and problem is localised   A query failed (e.g. no cpu information found)
60      crit or subject has failed and problem is fatal
        maint or subject is under maintenance                  Scheduled downtime at site
        off or subject has monitoring off                      Site is undertaking work that would trigger alerts

Gstat metric = ((#ok sites)*10 + (#info sites)*20 + (#note sites)*30 + (#warn sites)*40 + (#error sites)*50 + (#crit sites)*60) / (#sites – (#maint + #off))
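
The formula on this slide can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the gstat implementation: the function name `gstat_metric`, the list-of-strings input and the handling of an empty denominator are assumptions made for the example. The weights and the exclusion of `maint` and `off` sites from the denominator come from the slide.

```python
from collections import Counter

# Weights per status, as listed in the slide's table.
STATUS_WEIGHTS = {"ok": 10, "info": 20, "note": 30, "warn": 40, "error": 50, "crit": 60}
# Sites in these states are subtracted from the denominator.
EXCLUDED = ("maint", "off")


def gstat_metric(site_statuses):
    """Return the gstat metric for a list of per-site status strings.

    Hypothetical helper: statuses without a weight (e.g. "na") add nothing
    to the numerator but still count towards #sites, as in the slide formula.
    Returns None when every site is in maintenance or has monitoring off
    (an assumption; the slide does not cover this case).
    """
    counts = Counter(site_statuses)
    weighted_sum = sum(weight * counts[status] for status, weight in STATUS_WEIGHTS.items())
    denominator = len(site_statuses) - sum(counts[status] for status in EXCLUDED)
    if denominator <= 0:
        return None
    return weighted_sum / denominator


if __name__ == "__main__":
    # Three healthy sites, one warning, one in scheduled downtime.
    print(gstat_metric(["ok", "ok", "ok", "warn", "maint"]))
```

With this definition a region where every monitored site is `ok` scores 10, and the value rises as sites move into more severe states.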

14 Occupancy averages at 55% for August (26% for period from June 04)

15 Several sites have been running full for July/August. The plot below is for the Tier-1 in August

16 August was the busiest month for the Tier-1 as evidenced by the total KSI2K delivered (KSI2K*CPUMonths)

17 There has been a Tier-1 investigation into job efficiency over the year (CPU time/Elapsed time)
–Low efficiencies impact utilisation (in terms of CPU time provided)
–Caused by global performance problems on LCG SEs, coupled with problems in logging and book-keeping services
–Approximately 400 KSI2K*CPUmonths per month Feb-June – about 50% of total capacity
–Farm occupancy (job slots used) has increased
–Efficiency can be >1 if a job runs more than one CPU-intensive process
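
As a minimal sketch of the efficiency measure named on this slide (CPU time divided by elapsed time), the Python below computes it per job and for a batch of jobs. The function names and the aggregate definition (total CPU time over total elapsed time) are assumptions for illustration; the slides do not say how the Tier-1 investigation weighted its figures.

```python
def job_efficiency(cpu_seconds, elapsed_seconds):
    """Efficiency of one job: CPU time / elapsed (wall-clock) time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return cpu_seconds / elapsed_seconds


def aggregate_efficiency(jobs):
    """Total CPU time / total elapsed time over (cpu_seconds, elapsed_seconds) pairs.

    Assumed aggregate: this is the per-job efficiency weighted by elapsed
    time, chosen here only as one plausible definition.
    """
    jobs = list(jobs)
    total_cpu = sum(cpu for cpu, _ in jobs)
    total_elapsed = sum(elapsed for _, elapsed in jobs)
    return job_efficiency(total_cpu, total_elapsed)


if __name__ == "__main__":
    # A job blocked on external storage for half of its run.
    print(job_efficiency(1800, 3600))
    # A job running two CPU-intensive processes in one slot exceeds 1.
    print(job_efficiency(7200, 3600))
    print(aggregate_efficiency([(1800, 3600), (3600, 3600)]))
```

A job that waits on a storage element accumulates elapsed time without CPU time, which is how the storage problems described above pull the ratio down.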

18 Specific weighted job efficiencies for ATLAS in July
–Straight line structures show jobs which ran for a period of time before blocking on an external resource and eventually being killed by an elapsed time limit
–Clusters at low efficiency probably show performance problems on external storage elements
–Many problems seen here are NOW FIXED

19 We have seen a good general response to 2.6.0 deployment

20 SRMs and data migration – dCache/DPM
–We have most experience with dCache-SRM but are gaining knowledge of DPM
–The mailing list remains active – join and review the archives BEFORE attempting an installation so that we can support you better
–There is now a GridPP wiki, which brings us on to …
Links to all areas mentioned can be found on the deployment links page: http://www.gridpp.ac.uk/deployment/links.html

21 Our support model needs to be developed
[Diagram labels: UKI ROC ticket tracking system (Footprints); Site A; GGUS; Regional service 1; Tier-1 helpdesk (Remedy); Grid-Ireland helpdesk (Remedy); GOSC (Footprints); CIC-on-duty; Users; Experiments/VOs; Savannah – bug tracking; Site administrators; LCG-ROLLOUT; TB-SUPPORT]

22 Other areas (examples)
Technical
–Implications of LCG Baseline Services Group findings
–Procurement and deployment of more resources while maintaining a steady service
General
–PPARC signs the LCG MoU shortly – this commits all sites to a certain basic level of service (Tier-2s 72hrs response)
–The operations workshop at Culham (near RAL) later this month: http://egee.in2p3.fr/events/UKI/
–A training course for GridPP sysadmins to help prepare sites for SC4 and the increasing service demands (PPARC signs an LCG MoU soon!)
–A UK support workshop for users and sysadmins?

23 Service Challenge 3 enters a new phase
Phase 1 (throughput tests) – July 2005
–dCache-SRM working at all sites
–Tier-1 managed rates (on UKLIGHT) up to 650 Mb/s to CERN. This is similar to SC2 rates.
–Edinburgh – 10TB data transferred. Sustained rates of 220-250 Mb/s
–Imperial – Rates reached 400-480 Mb/s
–Lancaster – 958GB (978 files) over 8 days (~27 Mb/s sustained)
Phase 2 (service phase) from 1st September 2005
–The experiments will use the SC3 infrastructure for testing their models and production
–Experiment (basic functionality) test jobs are being developed (to run as part of the SFTs) to check sites
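
To relate the data volumes and rates quoted for Phase 1, the sketch below converts between bytes moved, time taken and average rate. It assumes decimal units (1 GB = 10^9 bytes, 1 Mb = 10^6 bits) and that "Mb/s" on the slide means megabits per second; both function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def average_rate_mbps(bytes_transferred, seconds):
    """Average rate in megabits per second over the whole time window."""
    return bytes_transferred * 8 / seconds / 1e6


def transfer_days(bytes_transferred, rate_mbps):
    """Days needed to move the given volume at a constant rate (Mb/s)."""
    return bytes_transferred * 8 / (rate_mbps * 1e6) / SECONDS_PER_DAY


if __name__ == "__main__":
    # Lancaster figures from the slide: 958 GB over 8 days.
    print(round(average_rate_mbps(958e9, 8 * SECONDS_PER_DAY), 2))
    # Edinburgh figures from the slide: 10 TB at 220-250 Mb/s.
    print(round(transfer_days(10e12, 250), 2), round(transfer_days(10e12, 220), 2))
```

Under these assumptions the Lancaster volume averaged over the full 8 days comes to roughly 11 Mb/s, lower than the ~27 Mb/s sustained figure on the slide; the slide does not state how the sustained rate was measured, so the two numbers need not describe the same interval.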

24 Service Challenge 4 will affect all sites – start preparing!
–SC4 consists of a Setup Phase starting on 1st April 2006, during which a number of Throughput tests will be performed, followed by a Service Phase from 1st May 2006 until the 30th September 2006
–All service components for SC4 need to be delivered ready for production by the 31st January 2006
–Final testing and integration of components and services must be completed by 31st March 2006
… more details in the panel discussion later today.

25 Summary
1. We have seen changes in SFTs
2. GridPP remains a major contributor to LCG/EGEE resources
3. Use of resources is increasing – there were concerns about efficiency
4. Sites did well with the upgrade during a vacation period
5. Two major deployment tasks – support & SRM implementations
6. Service Challenge 3 enters the “Service Phase”. SC4 planning starts

