Tony Doyle GridPP – Making the Grid Work for the Science, ATSE e-Science Visit, Edinburgh, 20 April 2004

Contents

Context:
1. General (yesterday)
2. Process (today)
3. Operations (tomorrow)

Start where Steve left off yesterday; end up where Andrew begins tomorrow:
- How does the Grid work?
- Performance indicators
- Why was the "failure rate" ~20%?
- Software process
- External dependencies
- Managing a distributed project
- Is GridPP a Grid? What is the Grid anyway? (from a particle physics perspective)
- Demo

How Does the Grid Work?

0. Web user interface... or CLI
1. Authentication: grid-proxy-init
2. Job submission: edg-job-submit
3. Monitoring and control: edg-job-status, edg-job-cancel, edg-job-get-output
4. Data publication and replication: globus-url-copy, RLS
5. Resource scheduling (use of Mass Storage Systems): JDL, sandboxes, storage elements
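To make the lifecycle concrete, here is a minimal sketch of one job round-trip using the commands above. It assumes a valid Grid certificate is already installed; the JDL contents and file names are illustrative, not taken from the slides.

    # 1. Authenticate: create a short-lived proxy from the user's certificate
    grid-proxy-init

    # Describe the job in a JDL file (illustrative contents)
    cat > hello.jdl <<'EOF'
    Executable    = "/bin/echo";
    Arguments     = "Hello Grid";
    StdOutput     = "hello.out";
    StdError      = "hello.err";
    OutputSandbox = {"hello.out", "hello.err"};
    EOF

    # 2. Submit: the Resource Broker matches the job to a Compute Element
    edg-job-submit -o jobid.txt hello.jdl

    # 3. Monitor the job, then retrieve the output sandbox on completion
    edg-job-status -i jobid.txt
    edg-job-get-output -i jobid.txt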

Job Submission (behind the scenes)

[Flattened architecture diagram] The user runs grid-proxy-init, then the User Interface (UI) submits a JDL job description, with its input "sandbox", to the Resource Broker. The Broker authenticates and authorises the user, queries the Information Service for Storage Element (SE) and Compute Element (CE) status and the Replica Catalogue for dataset locations, and hands the expanded JDL (input sandbox plus broker info) to the Job Submission Service, which dispatches the job to a Compute Element as Globus RSL. Job status events are published to the Logging & Book-keeping service, which the UI can query; the output "sandbox" is returned to the user on completion.
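The matching itself is driven by the JDL. Below is a hedged sketch of the kind of job description the Broker expands against CE and SE information; the attribute values are illustrative, and the Glue schema names are those used by the EDG/LCG information systems.

    Executable     = "run_analysis.sh";
    InputSandbox   = {"run_analysis.sh", "cuts.conf"};
    OutputSandbox  = {"analysis.log", "histos.root"};
    Requirements   = other.GlueCEPolicyMaxWallClockTime > 720;
    Rank           = other.GlueCEStateFreeCPUs;

The Requirements expression restricts which Compute Elements are eligible; Rank orders the survivors, here preferring sites with the most free CPUs.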

How do I Authorize?

[Flattened diagram] Each Virtual Organisation publishes its membership (e.g. entries such as CN=Tony Doyle, CN=Steven Hawking or CN=Homer Simpson under ou=People, ou=Testbed1, etc., in VO directories like o=testbed or o=xyz, dc=eu-datagrid, dc=org) in an LDAP "authorization directory". The mkgridmap tool combines these VO directories with local users and a ban list to generate the site's grid-mapfile, against which the subject name of a user's authentication certificate is matched.
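A hedged sketch of the two files involved; the LDAP URL, certificate subjects and account names below are illustrative assumptions rather than real site configuration.

    # mkgridmap configuration (illustrative): pull VO members from the
    # LDAP "authorization directory" and merge locally defined users
    group ldap://grid-vo.eu-datagrid.org/ou=People,o=testbed,dc=eu-datagrid,dc=org .testbed
    gmf_local /etc/grid-security/grid-mapfile-local

    # Resulting grid-mapfile entries: certificate subject -> local account
    "/O=Grid/O=UKHEP/CN=Tony Doyle"     .testbed
    "/O=Grid/O=UKHEP/CN=Steven Hawking" .testbed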

UK Certificate Authority and Virtual Organisation Membership

1. UK e-Science Certificate Authority now used in the application testbed
2. Particle physics "users" engaged from many institutes
3. UK participating in 6 of the 9 EDG Virtual Organisations

Performance Indicators (as measured by end users)

Conclusion: prototype performance, but with quality-assurance mechanisms built in.

Why was the "failure rate" ~20%?

Component testing, e.g. Resource Broker stress tests (LCG):
- The RB never crashed: it ran without problems under load for several days in a row, with 20 streams of 100 jobs each (the typical ~2% error rate was still present).
- In a job storm of 50 streams of 20 jobs each, 50% of the streams ran out of connections between UI and RB (a configuration parameter, but subject to machine constraints); the remaining 50% finished normally (2% error rate).
- The time between job submission and return of the command (acceptance by the RB) was 3.5 seconds, independent of the number of streams.

Problems are end-to-end: e.g. a site advertisement communicated via class ads to all sites (including e.g. CNAF) results in the RB sending application jobs (e.g. AliEn for ALICE) to a "black hole". These are recorded as "failures", although the application corrects for them via re-submission.

The other "problem" is the incorporation of added functionality. This was largely resolved by adherence to a software process coupled to the testbed structure, and improved significantly within LCG (leading to EGEE).

[Layer diagram: I. Experiment Layer, II. Application Middleware, III. Grid Middleware, IV. Facilities and Fabrics]
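A stress test of this shape can be reproduced with a simple submission loop. The sketch below assumes the EDG CLI shown earlier and an existing storm.jdl; the stream and job counts mirror the 50 x 20 job storm described above.

    #!/bin/sh
    # Illustrative "job storm": 50 parallel streams of 20 jobs each
    STREAMS=50
    JOBS=20
    for s in $(seq 1 $STREAMS); do
      (
        for j in $(seq 1 $JOBS); do
          # Append each returned job ID to a per-stream file
          edg-job-submit -o stream$s.ids storm.jdl
        done
      ) &
    done
    wait
    echo "Submitted $((STREAMS * JOBS)) jobs"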

DataGrid Release Milestones

EU Review (2.1.13); Evaluations (2.0.12).

Features (2.1.13) [0.5 Mloc]:
- Reasonable stability and reliability
- VOMS incorporated
- Bug fixes for all services

Features (2.0.12):
- R-GMA replaced MDS
- Refactored workload management
- Interactive, MPI and checkpointed jobs
- Replica Location Service
- Web Service SE

Stabilisation time on the application testbed is typically a few months.

Software Process

Infrastructure:
- Adopt the same set of tools, standards and procedures
- Adopt commonly used open-source or commercial software when easily available
- Avoid "do it yourself" solutions
- Avoid commercial software, since it may give licensing problems

Common services and infrastructure; tools, templates, training; general QA, tests, integration, release; similar ways of working (process).

[Diagram: the SPI infrastructure underpins the LCG Application Area (POOL, SEAL, PI, SIMU) and LCG grid software applications (LHC experiments, projects, etc.)]

SPI Services Overview

Provide general services needed by each project:
- CVS repository, web site, software library
- Mailing lists, bug reports, task management, collaborative facilities
- External software

Provide solutions specific to the software development phases (specifications, analysis and design, coding, testing, build systems, release, deployment and installation, documentation, quality assurance):
- Tools, templates, policies, support, documentation, examples
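As a concrete example of the general services, a developer would typically fetch a project from the shared CVS repository; the server and module names here are hypothetical.

    # Check out a project from the shared SPI CVS service
    export CVSROOT=:pserver:anonymous@cvs.example.org:/cvs/lcg
    cvs login
    cvs checkout MyProject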

External Software

We install software needed by particle-physics projects: open-source and public-domain libraries and tools, such as:
- Compilers (icc, ecc)
- HEP-made packages
- Scientific libraries (GSL)
- General tools (Python)
- Test tools (CppUnit, QMTest)
- Database software (MySQL, MySQL++)
- Documentation generators (LXR, Doxygen)
- XML parsers (Xerces-C)

There are currently 50 different packages, plus others under evaluation, in more than 300 installations. The LCG projects propose what to install, in agreement with LHC needs; the platforms are decided by the Architects Forum:
- Linux Red Hat 7.3 with the compilers gcc 3.2 (rh73_gcc32), icc 7.1 (rh73_icc71) and ecc 7.1 (rh73_ecc71)
- Windows with Visual Studio .NET 7.1 (win32_vc7)

How Is the Process Applied? Middleware Validation: From Testbed to Production

[Flattened flowchart] Work packages (WPs) add unit-tested code to the repository; a 24x7 build system runs nightly builds and automated tests and produces tagged packages. The integration team assembles release candidates and runs overall release tests on a development testbed (~15 CPUs), alongside individual WP tests, yielding tagged releases. A tagged release selected for certification goes to the test group's certification testbed (~40 CPUs) for grid certification, with problem reports fed back and fixed. A certified release selected for deployment, with sign-off from the applications' representatives, becomes a certified public release for use by the applications on the application testbed (~1000 CPUs), and then by production users.

The process also covers test frameworks, test support, test policies, test documentation and test platforms/compilers.
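The nightly build-and-test step might look like the following cron-driven sketch; the schedule, repository path and tagging convention are assumptions for illustration.

    # Hypothetical crontab entry: nightly build and automated tests at 02:00
    0 2 * * * /opt/build/nightly-build.sh

    #!/bin/sh
    # nightly-build.sh (illustrative): rebuild from the repository, run the
    # automated test suite, and tag the packages only if everything passes
    set -e
    cvs -d /cvs/edg checkout middleware
    cd middleware
    make clean all
    make test                      # unit and regression tests
    cvs tag "nightly-$(date +%Y%m%d)"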

The UK Testbed

e.g. ScotGrid: Glasgow, Edinburgh and Durham

- Glasgow (EDG 1.4; CE, SE and 59 worker nodes): farm worker nodes on a private network with outbound NAT in place; 100,000 jobs completed (900,000 CPU hours); 34 dual blade servers and a 5 TB FastT500 being integrated now (next door); resources shared between LHC, CDF and bioinformatics.
- Edinburgh (EDG 2.1 data-management testbed; CE, SE and MON): 24 TB FastT700 and an 8-way server, with a data-storage focus.
- Durham: 40-node farm.

All being integrated into LCG-2.
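Outbound NAT for worker nodes on a private network can be arranged on the farm gateway roughly as follows; the interface name and address range are assumptions, not the actual site configuration.

    # On the gateway (illustrative): forward and masquerade traffic from
    # the private worker-node network out of the public interface
    echo 1 > /proc/sys/net/ipv4/ip_forward
    iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -o eth0 -j MASQUERADE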

Managing a Distributed Project: GridPP1 Project Status

76% of the 190 GridPP1 tasks have been successfully completed.

Is GridPP a Grid? What is "The Grid" Anyway?

A Grid:
1. coordinates resources that are not subject to centralized control...
2. ...using standard, open, general-purpose protocols and interfaces...
3. ...to deliver nontrivial qualities of service.

For GridPP:
1. YES. This is why development and maintenance of a UK-EU-US testbed is important.
2. YES. Globus/Condor-G/EDG meet this requirement; common experiment application layers are also important here.
3. NOT YET. The experiments define whether this is true: currently only ~100,000 jobs have been submitted via the testbed, compared with internal component tests of up to 10,000 jobs per day. Next step: the outcome of LCG-2 deployment, this year.

What is The Grid Anyway? A Particle Physics Perspective

The Grid is:
- not hype, but surrounded by it
- a working prototype running on testbed(s)
- about seamless discovery of PC resources around the world
- using evolving standards for interoperation
- the basis for particle-physics computing in the 21st century
- not (yet) as transparent as end-users want it to be

The Grid: Demonstrations

Demos were used to establish that, for example, the two LHC multi-purpose detector collaborations can run jobs on an international Grid, using a common Grid infrastructure with secure Grid access. This doesn't mean that the Grid works in production mode (yet); it is, however, significant.


Achievements I

1. Dedicated people actively developing a Grid
2. All with personal certificates
3. Using the largest UK Grid testbed (16 sites and hundreds of servers)
4. Deployed within an EU-wide programme
5. Linked to worldwide Grid testbeds

Achievements II

6. Grid deployment programme functioning: the basis for LHC computing
7. Active Tier-1/A production centre meeting international requirements
8. Latent Tier-2 resources being incorporated
9. Significant middleware development programme
10. All particle-physics applications using the Grid testbed (open approach)

The Challenges Ahead I, II and III: Scale, Complexity, UK Requirements

Covered by Steve yesterday.

The Challenges Ahead IV: Work Group Computing

The Challenges Ahead V: Events.. to Files.. to Events

[Flattened data-flow diagram] Each event is stored at several levels of detail: RAW, ESD (Event Summary Data), AOD (Analysis Object Data) and TAG. These are held as data files across the tiers: Tier-0 (international), Tier-1 (national), Tier-2 (regional) and Tier-3 (local), with an "interesting events list" selecting individual events (event 1, event 2, event 3, ...) back out of the files. VOMS-enhanced Grid certificates are used to access the databases via metadata. Non-trivial..
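For the VOMS-enhanced certificates mentioned above, a user extends the plain proxy of grid-proxy-init with VO membership attributes. A minimal sketch, assuming the VOMS client tools are installed and an 'atlas' VO is configured:

    # Create a proxy carrying VOMS attributes for the named VO,
    # then inspect the attached attribute certificate
    voms-proxy-init --voms atlas
    voms-proxy-info --all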

The Challenges Ahead VI: Software Distribution

The ATLAS Data Challenge (DC2) runs this year to validate the worldwide computing model. Packaging, distribution and installation are challenging:
- Scale: one release build takes 10 hours and produces 2.5 GB of files
- Complexity: 500 packages, Mloc, 100s of developers and 1000s of users
- The ATLAS collaboration is widely distributed: 140 institutes, all wanting to use the software
- It needs 'push-button' easy installation

[Flattened data-flow diagram] Step 1, Monte Carlo data challenges: physics models feed detector simulation, producing Monte Carlo truth data and MC raw data; MC raw-data reconstruction yields MC event summary data and MC event tags. Step 2, real data: the trigger system (data acquisition, level-3 trigger) produces raw data and trigger tags; raw-data reconstruction, using calibration data and run conditions, yields event summary data (ESD) and event tags.

The Challenges Ahead VII: Distributed Analysis

Complex workflow... LCG/ARDA development:
1. AliEn (the ALICE Grid) provided a pre-Grid implementation [Perl scripts]
2. ARDA provides a framework for particle-physics application middleware

Next Steps

From prototype to production:
- a UK particle-physics Grid equivalent to 20,000 1 GHz personal computers by 2007
- available for day-to-day use by particle physicists
- a web portal for other e-scientists

GridPP will support Enabling Grids for E-science in Europe (EGEE) [start-up meeting today]:
- to integrate national and international Grids, and Grids from different scientific disciplines
- particle physics is a pilot project

2007: the Large Hadron Collider goes live.