GridPP Deployment Status Steve Traylen 28th October 2004 GOSC Face to Face, NESC, UK
Contents
Middleware components of the GridPP Production System
Status of the current operational Grid
Future plans and challenges
Summary
GridPP 2 – From Prototype to Production
The LHC: the physics driver
1 Megabyte (1MB) – a digital photo
1 Gigabyte (1GB) = 1000MB – a DVD movie
1 Terabyte (1TB) = 1000GB – world annual book production
1 Petabyte (1PB) = 1000TB – annual production of one LHC experiment
1 Exabyte (1EB) = 1000PB – world annual information production
40 million collisions per second; after filtering, ~100 collisions of interest remain per second; 1-10 Megabytes of data digitised for each collision, a recording rate of the order of a Gigabyte per second; the collisions recorded each year amount to ~10 Petabytes of data.
[Figure: the four LHC experiments – ALICE, ATLAS, CMS, LHCb]
The UK response: GridPP
GridPP – A UK Computing Grid for Particle Physics
19 UK Universities, CCLRC (RAL & Daresbury) and CERN
Funded by the Particle Physics and Astronomy Research Council (PPARC)
GridPP1 – Sept 2001, £17m, "From Web to Grid"
GridPP2 – Sept 2004, £16(+1)m, "From Prototype to Production"
Current context of GridPP
Our grid is working …
NorthGrid: Daresbury, Lancaster, Liverpool, Manchester, Sheffield
SouthGrid: Birmingham, Bristol, Cambridge, Oxford, RAL PPD, Warwick
ScotGrid: Durham, Edinburgh, Glasgow
LondonGrid: Brunel, Imperial, QMUL, RHUL, UCL
… and is part of LCG
Resources are being used for data challenges.
Within the UK we have some VO/experiment Memoranda of Understanding in place.
The Tier-2 structure is working well.
Scale
GridPP prototype Grid:
> 1,000 CPUs – 500 CPUs at the Tier-1 at RAL, > 500 CPUs at 11 sites across the UK organised in 4 Regional Tier-2s
> 500 TB of storage
> 800 simultaneous jobs
Integrated with the international LHC Computing Grid (LCG):
> 5,000 CPUs, > 4,000 TB of storage, > 85 sites around the world, > 4,000 simultaneous jobs, monitored via the Grid Operations Centre (RAL)
[Monitoring table: total/free CPUs, running/waiting jobs, available/used TB, max/average CPU; hyperthreading enabled on some sites]
Operational status (October)
VOs active
Who is directly involved?
General:
  1     Production manager             In place and engaged
  1     Applications expert            Identified but not formally engaged
  2     Tier-1 / deployment expert     In place and fully engaged
  4     Tier-2 coordinators            In place and functioning well
  0.5   VO management                  Will be part time but not yet in place
  9.0   Hardware support               Posts allocated but not yet filled
Specialist:
  1     Data and storage management    Existing expert
  1     Work load management           Existing expert
  1     Security officer               Not yet recruited
  1     Networking                     Starting in September
Past upgrade experience at RAL: previously, utilisation of new resources grew steadily over weeks or months.
Tier-1 update – July 2004 hardware upgrade: with the Grid we see a much more rapid utilisation of newly deployed resources.
The infrastructure developed in EDG/GridPP1
User Interface (UI): job submission via JDL; Python (default) and Java GUI clients; APIs in C++, Java and Python
Resource Broker: built on the C++ Condor matchmaking libraries, with Condor-G for submission
Logging & Bookkeeping: MySQL DB stores job state information
Computing Element: Gatekeeper with PBS scheduler in front of the batch workers
Storage Element: GridFTP server over NFS, tape and Castor
Replica catalogue: one per VO (or equivalent)
Information system: Berkeley Database Information Index (BDII)
AA server (VOMS)
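The UI-to-RB chain above can be pictured with a small submission sketch. This is a minimal illustration rather than GridPP code: it assumes an LCG/EDG User Interface with edg-job-submit installed, a valid grid proxy, and membership of a VO (dteam is used as a placeholder); the JDL attributes shown are the common ones and should be checked against the installed UI.

    import subprocess

    # A minimal JDL (Job Description Language) file: the Resource Broker
    # matchmakes the Requirements expression against what sites publish.
    JDL = '''
    Executable    = "/bin/hostname";
    StdOutput     = "std.out";
    StdError      = "std.err";
    OutputSandbox = {"std.out", "std.err"};
    Requirements  = other.GlueCEPolicyMaxCPUTime >= 60;
    '''

    def submit(jdl_text, vo="dteam", jdl_path="hostname.jdl"):
        """Write the JDL to disk and hand it to a Resource Broker via the UI tools."""
        with open(jdl_path, "w") as f:
            f.write(jdl_text)
        # edg-job-submit prints a job identifier (an https:// URL) on success;
        # that identifier is what the Logging & Bookkeeping service tracks.
        return subprocess.call(["edg-job-submit", "--vo", vo, jdl_path])

    if __name__ == "__main__":
        submit(JDL)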
Common Grid Components
LCG uses middleware common to other Grid projects:
– VDT (v1.1.14): Globus Gatekeeper, Globus MDS, GlueCE information provider.
This core is also used by NGS, Grid3 and NorduGrid.
Preserving this core increases the chances of inter-grid interoperability.
Extra Grid Components
LCG extends VDT with fixes and the deployment of other grid services, but only where there is a shortfall or performance issue with the existing middleware.
Most are grid-wide services for LCG rather than extra components for sites to install:
– this minimises conflicts between grids
– though this is not always true – see later.
LCG PBS JobManager
Motivation:
– The standard Globus JobManager starts one perl process per job, queued or running; one user can easily overwhelm a Gatekeeper.
– It also assumes a shared /home file system is present: not scalable to 1000s of nodes, and NFS is a single point of failure.
– The Resource Broker must poll jobs individually.
LCG PBS JobManager
Solution:
– The LCG jobmanager stages files to the batch worker with scp and GridFTP (see the sketch below).
– This creates new problems though: it is even harder to debug and there is more to go wrong.
– MPI jobs are more difficult, though an rsync workaround exists.
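As an illustration of the staging idea only (not the actual LCG jobmanager code), the sketch below copies a job's input to a batch worker with scp and pulls the output back, avoiding any reliance on a shared /home; the host and path names are invented for the example.

    import subprocess

    # Illustrative only: mimics what the LCG PBS jobmanager does internally,
    # staging files to the worker instead of relying on /home shared over NFS.
    WORKER = "worker01.example.ac.uk"   # hypothetical batch worker
    JOB_DIR = "/tmp/job_12345"          # hypothetical per-job scratch directory

    def stage_in(local_file):
        """Copy an input file to the worker's scratch area with scp."""
        subprocess.check_call(["scp", local_file, "%s:%s/" % (WORKER, JOB_DIR)])

    def stage_out(remote_file, local_dir="."):
        """Copy an output file back from the worker once the job has finished."""
        subprocess.check_call(
            ["scp", "%s:%s/%s" % (WORKER, JOB_DIR, remote_file), local_dir])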
LCG PBS JobManager
Solution:
– The JobManager starts up a GridMonitor on the gatekeeper; currently one GridMonitor is started per Resource Broker.
– The Resource Broker communicates with the monitor instead of polling jobs individually.
– Moving this to one GridMonitor per user is possible.
Currently deployed at almost all GridPP sites.
Storage in LCG
Currently there are three active solutions:
– GridFTP servers, the so-called ClassicSE
– SRM interfaces at CERN, IHEP (Russia), DESY and RAL (this week)
– edg-se – only one, as a front end to the Atlas datastore tape system at RAL.
The edg-rm and lcg-* commands abstract the end user from these interfaces.
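A small sketch of how the user is shielded from these back ends, assuming the LCG-2 lcg_util tools are installed; the VO, SE host name and logical file names below are placeholders, and the flags shown should be checked against the deployed release.

    import subprocess

    def copy_and_register(local_path, se_host, lfn, vo="dteam"):
        """Copy a local file to a Storage Element and register it in the
        replica catalogue under a logical file name, using lcg-cr.
        Whether the SE is a ClassicSE, an SRM or an edg-se is hidden from the user."""
        subprocess.check_call([
            "lcg-cr", "--vo", vo,
            "-d", se_host,              # destination Storage Element
            "-l", "lfn:" + lfn,         # logical file name in the catalogue
            "file:" + local_path,
        ])

    def fetch(lfn, local_path, vo="dteam"):
        """Retrieve a replica by logical file name with lcg-cp."""
        subprocess.check_call(
            ["lcg-cp", "--vo", vo, "lfn:" + lfn, "file:" + local_path])

    # Example (placeholder names):
    # copy_and_register("/tmp/data.root", "se.example.ac.uk", "/grid/dteam/data.root")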
Storage – SRM
SRM = Storage Resource Manager.
Motivation:
– Sites need to move files around and reorganise data dynamically.
– The end user wants/requires a consistent name space for their files.
– End users also want to be able to reserve space.
SRM will in time be the preferred solution supported within LCG.
SRM Deployment
The current storage solution for LCG is dCache with an SRM interface, produced by DESY and FNAL.
It is deployed at RAL in a test state and is moving into production, initially for the CMS experiment.
The expectation is that dCache with SRM will provide a solution for many sites – Edinburgh, Manchester and Oxford are all keen to deploy.
SRM/dCache at RAL
Resource Broker
Allows selection of, and submission to, sites based on what they publish into the information system.
Queues are published with:
– queue lengths
– software available
– authorised VOs or individual DNs.
The RB can query the replica catalogue in order to run at a site holding a particular file (see the sketch below).
Three RBs are deployed in the UK.
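A hedged sketch of the matchmaking side: the Requirements and Rank expressions below use Glue attributes commonly published by sites (maximum queue CPU time, free CPUs), and edg-job-list-match asks the RB which queues match without actually submitting. The exact attribute names should be checked against the deployed information providers.

    import subprocess

    # Requirements selects acceptable queues; Rank orders them (here: most free CPUs).
    MATCH_JDL = '''
    Executable   = "/bin/hostname";
    Requirements = other.GlueCEPolicyMaxCPUTime >= 720;
    Rank         = other.GlueCEStateFreeCPUs;
    '''

    def list_matches(jdl_text, vo="dteam", jdl_path="match.jdl"):
        """Ask the Resource Broker which CE queues match this job, without submitting."""
        with open(jdl_path, "w") as f:
            f.write(jdl_text)
        subprocess.check_call(["edg-job-list-match", "--vo", vo, jdl_path])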
L&B
L&B = Logging and Bookkeeping Service.
Jobs publish their Grid state to L&B:
– either by calling commands installed on the batch worker,
– or by GridFTPing the job wrapper back.
The second requires no software on the batch workers, but the first gives better feedback.
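From the user side, the state accumulated in L&B is normally read back with edg-job-status; a minimal sketch, assuming the UI tools are installed and the job identifier returned at submission time is to hand.

    import subprocess

    def job_status(job_id):
        """Print the Logging & Bookkeeping state history for one job.
        job_id is the https:// identifier returned by edg-job-submit."""
        subprocess.check_call(["edg-job-status", job_id])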
Application Installation with LCG
Currently a sub-VO of software managers owns an NFS-mounted space:
– the software area is managed by jobs
– the software is validated in the process
– they then drop a status file onto the file system, and this tag is published by the site.
With the RB:
– end users match jobs to tagged sites (see the sketch below)
– SW managers install SW at non-tagged sites.
This is being extended to allow DTEAM to install grid client SW on the WNs.
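Once a software manager has published a tag, end users steer jobs to tagged sites with a Requirements clause. In the fragment below the tag name is a made-up example; the Glue attribute is the one conventionally used for published software tags.

    # Illustrative JDL fragment (held as a Python string): only sites that
    # publish the hypothetical tag VO-cms-EXAMPLE-1_0_0 will match.
    TAGGED_SITE_REQ = (
        'Requirements = Member("VO-cms-EXAMPLE-1_0_0", '
        'other.GlueHostApplicationSoftwareRunTimeEnvironment);'
    )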
R-GMA
Developed by GridPP within both EDG and now EGEE.
Takes the role of a grid-enabled SQL database.
Example applications include CMS and D0 publishing their job bookkeeping.
Can also be used to transport the Glue values, allowing SQL lookups of Glue.
R-GMA is deployed at most UK HEP sites. RAL currently runs the single instance of the R-GMA registry.
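The relational model can be pictured with ordinary SQL. The statements below illustrate the producer/consumer pattern R-GMA supports (publish rows, then query them grid-wide) rather than a real schema, so the table and column names are examples only.

    # Illustrative only: R-GMA behaves like a grid-wide SQL database, with
    # producers publishing rows and consumers issuing SELECTs.  The table and
    # column names here are invented, not the actual R-GMA/Glue schema.
    PUBLISH = (
        "INSERT INTO JobStatus (jobId, site, state) "
        "VALUES ('job-0001', 'RAL-Tier1', 'Running')"
    )
    QUERY = "SELECT site, COUNT(*) FROM JobStatus GROUP BY site"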
Next LCG Release
LCG 2_3_0 is due now:
– built entirely on SL3 (a RHEL3 clone); RH73 is still an option
– many stability improvements
– addition of an accounting solution
– easier addition of VOs
– addition of dCache/SRM
– and lots more…
This release will last into next year – potentially the last release before gLite components appear.
There are still challenges
– Middleware validation
– Meeting experiment requirements with the Grid
– Distributed file (and sub-file) management
– Experiment software distribution
– Production accounting
– Encouraging an open sharing of resources
– Security
– Smoothing deployment and service upgrades
Middleware validation
Is starting to be addressed through a Certification and Testing testbed.
[Diagram: development & integration (unit & functional testing, dev tag) → certification testing (basic functionality tests, C&T and site suites, certification matrix, release candidate tag) → application integration (certified release tag) → deployment preparation (deployment release tag) → pre-production → production (production tag); applications from JRA1, the HEP experiments, bio-med and others feed into the certification and testing stages.]
RAL is involved with both the JRA1 and Pre-Production systems.
Software distribution
ATLAS Data Challenge to validate the world-wide computing model.
Packaging, distribution and installation:
– Scale: one release build takes 10 hours and produces 2.5 GB of files
– Complexity: 500 packages, millions of lines of code, 100s of developers and 1000s of users
– The ATLAS collaboration is widely distributed: 140 institutes, all wanting to use the software
– Needs push-button easy installation.
[Diagram: data flow from physics models and detector simulation (Monte Carlo truth and raw data, MC reconstruction, MC event summary data and event tags) and from data acquisition (trigger system, level-3 trigger, raw data reconstruction, ESD and event tags, calibration and run-conditions data). Step 1: Monte Carlo data challenges; Step 2: real data.]
Summary
The Large Hadron Collider data volumes make Grid computing a necessity.
GridPP1, with EDG, developed a successful Grid prototype.
GridPP members have played a critical role in most areas – security, work load management, information systems, monitoring & operations.
GridPP involvement continues with the Enabling Grids for E-sciencE (EGEE) project – driving the federation of Grids.
As we move towards a full production service we face many challenges in areas such as deployment, accounting and true open sharing of resources.
Useful links
GridPP and LCG: GridPP collaboration; Grid Operations Centre (inc. maps); The LHC Computing Grid
Others: PPARC; The EGEE project; The European Data Grid final review