GridPP Status Report David Britton, 15/Sep/09

2 Introduction Since the last Oversight: The UK has continued to be a major contributor to wLCG. A focus on resilience and disaster management (GridPP22). The UK infrastructure has been validated by STEP09. Moved the Tier-1 to R89. Procured significant new hardware. Adapted to developments in the LHC schedule; the EGI+ proposals; and the UK funding constraints. Issues from the last Oversight: “Other Experiments.” EGI/NGI/NGS etc. CASTOR. OPN network. To be covered by the Project Manager: Project Milestones/Deliverables. Project Risks. Project Finances.

3 WLCG: Largest scientific Grid in the world. September 2009: >315,000 KSI2K. Worldwide: 288 sites in 55 countries – 190,000 CPUs. In the UKI: 22 sites and about 19,000 CPUs. 15/Sep/09

4 UK CPU Contribution Same picture if non-LHC VOs included 15/Sep/09

5 UK Site Contributions, 2007 onwards (three successive periods): NorthGrid 34% – 22% – 15%; London 28% – 25% – 32%; ScotGrid 18% – 17% – 22%; Tier-1 13% – 15% – 13%; SouthGrid 7% – 16% – 13%; GridIreland 0% – 6% – 5%. All areas of the UK make valuable contributions. “Other VOs” used 16% of the CPU time this year. 8/Sep/09

6 UK Site Contributions: Non-LHC VOs 15/Sep/09 All regions supported the “Other VOs”. The top-12 “Other VOs” include many disciplines.

7 Tier-2 Resources 1/Apr/09 The Tier-2s have delivered (Brunel is currently installing 600TB of disk). Accounting error: 230TB delivered.

8 Tier-2 Performance Resource-weighted averages 8/Sep/09 The Tier-2s have improved and are performing well.
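
A resource-weighted average weights each site's figure by its installed capacity, so larger sites count for more. The short sketch below illustrates the calculation; the site names, capacities (in HEPSPEC06) and availability values are hypothetical, not taken from the GridPP accounting.

```python
# Minimal sketch of a resource-weighted average, as used for the Tier-2
# availability/reliability figures. All numbers below are illustrative only.

def weighted_average(values, weights):
    """Return the mean of `values`, weighted by `weights`."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("total weight must be non-zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Hypothetical Tier-2 sites: (availability fraction, installed capacity in HEPSPEC06).
sites = {
    "SiteA": (0.97, 12000),
    "SiteB": (0.91, 4000),
    "SiteC": (0.99, 8000),
}

availabilities = [a for a, _ in sites.values()]
capacities = [c for _, c in sites.values()]

# A large, reliable site pulls the weighted figure up more than a small one does.
print(f"Unweighted mean availability:        {sum(availabilities) / len(availabilities):.3f}")
print(f"Resource-weighted mean availability: {weighted_average(availabilities, capacities):.3f}")
```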

9 Service Resilience GridPP23 Agenda 15/Sep/09 A sustained push was made on improving service resilience at all levels. Many improvements were made at many sites and, ultimately, STEP09 demonstrated that the UK Grid was ready for data (see later slide). Disaster management processes were developed and are regularly engaged (see later slide).

10 STEP09 UK Highlights –RAL was the best ATLAS Tier-1 after the BNL ATLAS-only Tier-1. –Glasgow ran more jobs than any of the ATLAS Tier-2 sites throughout the world. –Tier-2 sites made good contributions and were tuning (not fire-fighting) during STEP09 and subsequent testing. –Quote: “The responsiveness of RAL to CMS during STEP09 was in stark contrast to many other Tier-1s.” –CMS noted the tape performance at RAL was very good, as was the CPU efficiency (CASTOR worked well). –Many (if not all) of the metrics for the experiments were met, and in some cases significantly exceeded, at RAL during STEP09. 15/Sep/09

11 STEP09: RAL Operations Overview Generally very smooth operation: –Most service systems relatively unloaded, with plenty of spare capacity. –Calm atmosphere. Daytime “production team” monitored the service. Only one callout; most of the team even took two days off-site for a department meeting! –Very good liaison with VOs and a good idea of what was going on. In regular informal contact with UK representatives. –Some problems with CASTOR tape migration (3 days) on the ATLAS instance, but all handled satisfactorily and fixed. Did not visibly impact the experiments. Robot broke down for several hours (a stuck handbot led to all drives being de-configured in CASTOR). Caught up quickly. Very useful exercise – learned a lot, but very reassuring. –More at:

12 STEP09: RAL Batch Service Farm typically running >2000 jobs. By 9th June at equilibrium (ATLAS 42%, CMS 18%, ALICE 3%, LHCb 20%). Problem 1: ATLAS job submission exceeded 32K files on the CE – see the hole on the 9th. We thought ATLAS had paused, so it took time to spot. Problem 2: Fair shares not honoured, as aggressive ALICE submission beat ATLAS to job starts. Need more ATLAS jobs in the queue faster; manually cap ALICE. Fixed by 9th June – see the decrease in (red) ALICE work. Problem 3: Occupancy initially poor (around 90%). Short on memory (2GB/core but ATLAS jobs needed 3GB vmem). Gradually increased the MAUI memory over-commit to 50%; occupancy rose to 98%.
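
The memory arithmetic behind Problem 3 can be sketched as follows. The node size, job mix and the way the over-commit is applied are illustrative assumptions (the real farm ran a mixed workload and used a MAUI scheduler setting, so the observed occupancy figures on the slide differ from this single-job-type example).

```python
# Sketch of why over-committing memory raises batch-farm occupancy when jobs
# request more virtual memory than the physical RAM available per core.
# Illustrative numbers only: not the actual RAL worker-node configuration.

def schedulable_jobs(cores, ram_per_core_gb, job_vmem_gb, overcommit=0.0):
    """Jobs that fit on one node, limited by cores and (over-committed) memory."""
    effective_ram_gb = cores * ram_per_core_gb * (1.0 + overcommit)
    memory_limited = int(effective_ram_gb // job_vmem_gb)
    return min(cores, memory_limited)

CORES = 8            # assumed cores per worker node
RAM_PER_CORE = 2.0   # GB of physical RAM per core (figure from the slide)
JOB_VMEM = 3.0       # GB of virtual memory requested per ATLAS job (from the slide)

for overcommit in (0.0, 0.25, 0.50):
    jobs = schedulable_jobs(CORES, RAM_PER_CORE, JOB_VMEM, overcommit)
    print(f"over-commit {overcommit:4.0%}: {jobs} jobs/node, occupancy {jobs / CORES:.0%}")
```

With no over-commit only five 3GB jobs fit on an assumed 8-core, 16GB node; a 50% over-commit lets all eight cores be used, which is the effect described above.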

13 Data Transfers 15/Sep/09 RAL achieved the highest average input and output data rates of any Tier-1.

14 OPN Resilience 15/Sep/09

15 Current Issues: R89 In the end, hand-over to STFC was delayed from Dec to Apr 09. Hardware was delayed but we were (almost) rescued by the LHC schedule change. Minor (?) issues remain with R89 (aircon trips; water-proof membrane?) (GridPP22). 15/Sep/09

16 Tier-1 Hardware The FY2008 hardware procurement had to await the acceptance of R89. The CPU is tested, accepted, and being deployed (14,000 HEPSPEC06 to add to the current 19,000). The disk procurement (2PB to add to the existing 1.9PB) was split into two halves (different disks and controllers, to mitigate against acceptance problems). This has proved sensible, as one batch has demonstrated ejection issues. One half of the disk is being deployed; progress is being made on the other half and the best guess is deployment by the end of November. A second SL8500 tape robot is available. The FY09 hardware procurement is underway. 15/Sep/09

17 Disaster Management A four-stage disaster management process was established at the Tier-1 earlier this year as part of our focus on resilience and disaster management. Designed to be used regularly so that the process is familiar; this means a low threshold to trigger Stage-1 “disasters”. At Stage-3, the process formally involves stakeholders outside the Tier-1, including GridPP management. This has now happened several times, including: –R89 aircon trip –R89 water leak –Disk procurement problem –Swine flu planning. The process is still being honed, but I believe it is very useful. 15/Sep/09
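
A minimal sketch of the kind of staged escalation described above; the stage meanings and notification lists are assumptions for illustration only, not the actual Tier-1 procedure (the source only states that there are four stages, that Stage-1 has a low trigger threshold, and that Stage-3 formally involves stakeholders outside the Tier-1, including GridPP management).

```python
# Illustrative sketch of a four-stage escalation, loosely modelled on the process
# described above. Stage meanings and contact lists are assumptions, not the
# actual Tier-1 disaster-management procedure.
from enum import IntEnum

class Stage(IntEnum):
    STAGE_1 = 1   # low trigger threshold: routine incidents, handled inside the Tier-1
    STAGE_2 = 2   # wider Tier-1 coordination (assumed)
    STAGE_3 = 3   # formally involves stakeholders outside the Tier-1
    STAGE_4 = 4   # full disaster response (assumed)

def notify_list(stage):
    """Return who is informed at each stage (hypothetical group names)."""
    recipients = ["tier1-on-call"]
    if stage >= Stage.STAGE_2:
        recipients.append("tier1-management")
    if stage >= Stage.STAGE_3:
        recipients.append("gridpp-management")   # external stakeholders join here
    if stage >= Stage.STAGE_4:
        recipients.append("stfc-management")
    return recipients

print(notify_list(Stage.STAGE_3))
# ['tier1-on-call', 'tier1-management', 'gridpp-management']
```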

18 EGI/NGI [Diagram: EGI – coordinating body in Amsterdam; NGIs – national initiatives in member countries; the UK NGI, with GridPP and the NGS.] Involves STFC, EPSRC and JISC (at least) in the UK. EGI is vital to GridPP but it is not GridPP’s core business to run an e-science infrastructure for the whole of the UK: seek a middle ground. 15/Sep/09

19 EU Landscape [Diagram: EGI; EMI (gLite, Unicore, ARC); Heavy Users SSC; SSC (Roscoe); other SSCs.] UK involvement via the UK NGI with global tasks such as GOCDB, security, dissemination, training.... UK involvement with APEL, GridSite? UK involvement with Ganga? UK involvement: FTS/LFC support post at RAL? 15/Sep/09

20 User Support Help pages. GridPP23 talks. User survey at RAL

21 Actions OPN – Detailed document provided. Cost is covered by existing GridPP hardware funds. Propose to proceed immediately to provision. Other Experiments – Usage shown on Slide-6. Allocation Policy is on the UserBoard web-pages: EGI/NGI/NGS – Paper provided. GridPP/UK has established potential links with all the structural units and is engaged in the developments. CASTOR – Paper provided. The version used during STEP09 worked well beyond the levels needed, without becoming an issue. 15/Sep/09

22 Current Issues Operational: Timing of CASTOR upgrade. Shake-down issues with R89. Problem with 50% of current disk purchase. High Level: Hardware planning – lack of clarity on approved global resources. Hardware pledges – financial constraints and the 2010 pledges. GridPP4 – lack of information on scope, process or timing against a backdrop of severe financial problems within STFC. 15/Sep/09

23 Key issue in the next six months To receive a sustained flow of data from CERN and to meet all the experiment expectations associated with custodial storage; data reprocessing; data distribution; and analysis. Requires: A resilient OPN network. Stable operation of CASTOR storage. Tier-1 hardware and services. Tier-1 to Tier-2 networking. Tier-2 hardware and services. Help, support, deployment and operations. That is, the UK Particle Physics Grid. 15/Sep/09 The milestones necessary to meet these requirements have been met (with the possible exception of the first) and the entire system validated with STEP09. We believe the UK is ready. We know that problems will arise and have focused on resilience to reduce the incidence of these, and on disaster management to handle those that do occur.

24 The End

25 Schedule It is foreseen that the LHC will be ready for beam by mid-November. Before that: all sectors powered separately to operating energy; dry runs of many accelerator systems (from Spring) – injection, extraction, RF, collimators, controls; full machine checkout before taking beam; beam tests – TI8 (June), TI2 (July), TI2 and TI8 interleaved (September); injection tests (late October).

26 1/Apr/09