Tony Doyle GridPP – Making the Grid Work for the Science, ATSE e-Science Visit, Edinburgh, 20 April 2004.

1 Tony Doyle a.doyle@physics.gla.ac.uk GridPP – Making the Grid Work for the Science, ATSE e-Science Visit, Edinburgh, 20 April 2004

2 Tony Doyle - University of Glasgow
Contents
Context
1. General (yesterday)
2. Process (today)
3. Operations (tomorrow)
Start where Steve left off yesterday.. End up where Andrew begins tomorrow..
– How does the Grid Work?
– Performance Indicators
– Why was the “failure rate” ~20%?
– Software Process
– External dependencies
– Managing a distributed project..
– Is GridPP a Grid? What is the Grid anyway? (from PP perspective)
– Demo..

3 How Does the Grid Work?
0. Web User Interface… or CLI
1. Authentication: grid-proxy-init
2. Job submission: edg-job-submit
3. Monitoring and control: edg-job-status, edg-job-cancel, edg-job-get-output
4. Data publication and replication: globus-url-copy, RLS
5. Resource scheduling – use of Mass Storage Systems: JDL, sandboxes, storage elements
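The command-line steps listed on this slide can be sketched as a short sequence. The Python below only assembles a job description and the command lines; it does not run anything. The JDL attribute names (Executable, Arguments, StdOutput, StdError, OutputSandbox) are assumed to follow the EDG job description language, and the helper names, the file name hello.jdl and the job identifier are hypothetical placeholders, not values from the talk.

```python
def make_jdl(executable, arguments, stdout_name, stderr_name):
    """Return a minimal JDL text (assumed attribute set, not exhaustive)."""
    sandbox = ", ".join('"%s"' % name for name in (stdout_name, stderr_name))
    lines = [
        'Executable = "%s";' % executable,
        'Arguments = "%s";' % arguments,
        'StdOutput = "%s";' % stdout_name,
        'StdError = "%s";' % stderr_name,
        "OutputSandbox = {%s};" % sandbox,
    ]
    return "\n".join(lines)


def command_sequence(jdl_file, job_id):
    """List the CLI steps from the slide, in order, without running them."""
    return [
        ["grid-proxy-init"],               # 1. authentication
        ["edg-job-submit", jdl_file],      # 2. job submission
        ["edg-job-status", job_id],        # 3. monitoring
        ["edg-job-get-output", job_id],    # 3. retrieve the output sandbox
    ]


# Hypothetical example values, for illustration only.
jdl_text = make_jdl("/bin/hostname", "-f", "std.out", "std.err")
steps = command_sequence("hello.jdl", "JOB_ID_PLACEHOLDER")
```

Keeping the commands as argument lists rather than shell strings means they could later be handed to a process runner unchanged, once a real job identifier is returned by the submit step.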

4 Job Submission (behind the scenes)
[Diagram] Components: UI, Resource Broker, Job Submission Service, Logging & Book-keeping, Information Service, Replica Catalogue, Storage Element, Compute Element, Authorisation & Authentication
Flow labels: grid-proxy-init; JDL; Job Submit Event; Input “sandbox”; Expanded JDL; SE & CE info; DataSets info; Publish; Input “sandbox” + Broker Info; Globus RSL; Job Query; Job Status; Output “sandbox”

5 How do I Authorize?
[Diagram] Labels: Authentication Certificate; VO Directory; “Authorization Directory”; mkgridmap; grid-mapfile; local users; ban list
Directory entries shown: o=testbed, dc=eu-datagrid, dc=org; o=xyz, dc=eu-datagrid, dc=org; ou=People, ou=Testbed1, ou=???; CN=Steven Hawking, CN=Tony Doyle, CN=Homer Simpson
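The role of the grid-mapfile in this picture can be illustrated with a small sketch. It assumes the usual grid-mapfile line format of a quoted certificate subject followed by a local account name, and it is not the mkgridmap implementation. The subject strings, the account name gridpp001 and the choice of which member is on the ban list are all hypothetical.

```python
def build_gridmap(vo_members, ban_list, local_account):
    """Map every VO member who is not banned to a local account.

    Each returned line has the form: "<certificate subject>" <local account>
    """
    banned = set(ban_list)
    return [
        '"%s" %s' % (subject, local_account)
        for subject in sorted(vo_members)
        if subject not in banned
    ]


# Hypothetical subjects and account name, for illustration only.
members = [
    "/O=Grid/O=Example/CN=Tony Doyle",
    "/O=Grid/O=Example/CN=Homer Simpson",
]
ban = ["/O=Grid/O=Example/CN=Homer Simpson"]
entries = build_gridmap(members, ban, "gridpp001")
```

The point of the sketch is the separation of concerns on the slide: membership comes from the VO directory, while the site keeps control through its own local accounts and ban list.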

6 UK Certificate Authority and Virtual Organisation membership
1. UK e-Science Certificate Authority now used in application testbed
2. PP “users” engaged from many institutes
3. UK participating in 6 of 9 EDG Virtual Organisations

7 Performance indicators (as measured by end users)
Conclusion: prototype performance, but with quality assurance mechanisms built-in

8 Why was the “failure rate” ~20%?
Component Testing, e.g. RB Stress Tests (LCG)
RB never crashed; ran without problems at load for several days in a row
20 streams with 100 jobs each (typical error rate ~2% still present)
RB stress test in a job storm of 50 streams, 20 jobs each:
– 50% of the streams ran out of connections between UI and RB (configuration parameter, but machine constraints)
– Remaining 50% of streams finished normally (2% error rate)
– Time between job-submit and return of the command (acceptance by the RB) is 3.5 seconds (independent of number of streams)
PROBLEMS ARE END-TO-END: e.g. site advertisement communicated via class ads to all sites (inc. e.g. CNAF) results in RB sending application jobs (e.g. AliEn for ALICE) to a “black hole”; these are recorded as “failures” (application corrects for these via re-submission)
OTHER “PROBLEM” IS INCORPORATION OF ADDED FUNCTIONALITY
– ~Resolved by adherence to software process coupled to testbed structure… improved significantly within LCG (leading to EGEE)
[Diagram] Layers: I. Experiment Layer; II. Application Middleware; III. Grid Middleware; IV. Facilities and Fabrics
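The stress-test figures on this slide can be turned into job counts with a little arithmetic. The sketch below uses only the numbers quoted above (streams, jobs per stream, ~2% error rate, 50% of streams lost); the function name and the assumption that a lost stream contributes none of its jobs are illustrative simplifications.

```python
def stress_test_counts(streams, jobs_per_stream, error_percent, lost_stream_percent=0):
    """Return (total jobs, jobs in surviving streams, expected failed jobs)."""
    total_jobs = streams * jobs_per_stream
    lost_streams = streams * lost_stream_percent // 100
    surviving_jobs = (streams - lost_streams) * jobs_per_stream
    expected_failures = surviving_jobs * error_percent / 100
    return total_jobs, surviving_jobs, expected_failures


# Sustained load: 20 streams of 100 jobs, ~2% error rate.
steady = stress_test_counts(20, 100, 2)

# Job storm: 50 streams of 20 jobs, half the streams ran out of connections.
storm = stress_test_counts(50, 20, 2, lost_stream_percent=50)
```

Under these assumptions the sustained test corresponds to 2000 jobs with about 40 failures, while the job storm corresponds to 1000 jobs of which 500 sit in surviving streams, with about 10 failures among them. Neither figure comes close to 20%, which is consistent with the slide's point that the observed failure rate comes from end-to-end problems and not from the RB component itself.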

9 DataGrid Release Milestones
EU Review (2.1.13); Evaluations (2.0.12)
Features (2.1.13) [0.5Mloc]
– Reasonable stability, reliability
– VOMS incorporated
– Bug fixes for all services
Features (2.0.12)
– R-GMA replaced MDS
– Refactored workload mgt.
– Interactive, MPI, chkpt. jobs
– Replica Location Service
– Web Service SE
Stabilisation time on application testbed typically a few months

10 Software Process Infrastructure
– Adopt the same set of tools, standards and procedures
– Adopt commonly used open-source or commercial software when easily available
– Avoid “do it yourself solutions”
– Avoid commercial software, since it may give licensing problems
Common services and infrastructure
Tools, templates, training
General QA, tests, integration, release
Similar ways of working (process)
[Diagram] Labels: LCG Application Area; POOL, SEAL, PI, SIMU; LCG grid software applications (LHC experiments, projects, etc); SPI Infrastructure

11 SPI Services Overview
Provide General Services needed by each project
– CVS repository, Web Site, Software Library
– Mailing Lists, Bug Reports, Task Management, Collaborative Facilities
Provide solutions specific to the Software Development phases
– Tools, Templates, Policies, Support, Documentation, Examples
[Diagram] Software Development: Coding, Analysis and Design, Development, Release, Specifications, Testing, Build systems, Deployment and Installation, Documentation, Quality Assurance
[Diagram] General Services: CVS service, Collaborative Facilities, Task Management, Mailing Lists, Web Portal, External Software

12 External Software
We install software needed by Particle Physics projects
Open Source and Public Domain software (libraries and tools) like:
– Compilers (icc, ecc)
– HEP made packages
– Scientific libraries (GSL)
– General tools (python)
– Test tools (cppunit, qmtest)
– Database software (mysql, mysql++)
– Documentation generators (lxr, doxygen)
– XML parsers (XercesC)
There are currently 50 different packages, plus others under evaluation, for more than 300 installations
The LCG projects propose what to install in agreement with LHC needs
The platforms are decided by the Architect Forum
– Linux RedHat 7.3 with the compilers gcc 3.2 (rh73_gcc32), icc 7.1 (rh73_icc71), ecc 7.1 (rh73_ecc71)
– Windows Visual Studio .NET 7.1 (win32_vc7)

13 How is the process applied? Middleware Validation: From Testbed to Production
[Diagram] Stage labels: Unit Test, Build, Integration, Certification, Production
Actors: WPs, Build system, Integration Team, Test Group, Apps. Representatives, Users
Testbeds: Development Testbed ~15 CPU; Certification Testbed ~40 CPU; Application Testbed ~1000 CPU
Steps: add unit tested code to repository; run nightly build & auto. tests; individual WP tests; overall release tests; Grid certification; Application Certification; problem reports; fix problems
Release labels: Tagged package; Release candidates; Tagged Releases; Certified Releases
Tagged release selected for certification
Certified release selected for deployment
Certified public release for use by apps. (24x7)
Process to: Test frameworks, Test support, Test policies, Test documentation, Test platforms/compilers
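The promotion of a release from testbed to testbed can be sketched as a simple gate sequence. This is an illustrative reading of the diagram, not the project's actual tooling: the pairing of each testbed with a release status, the starting status and the function name are assumptions made for the example.

```python
# Assumed pairing of each testbed with the status a release gains by passing it.
STAGES = [
    ("development testbed", "tagged release"),
    ("certification testbed", "certified release"),
    ("application testbed", "certified public release"),
]


def promote(test_results):
    """Return the furthest release status reached, stopping at the first failure.

    test_results maps a testbed name to True (tests passed) or False.
    A missing entry is treated as not yet passed.
    """
    status = "release candidate"
    for testbed, status_if_passed in STAGES:
        if not test_results.get(testbed, False):
            break  # problems go back to the developers to be fixed
        status = status_if_passed
    return status


# Hypothetical run: passes development tests, fails certification.
example = promote({"development testbed": True, "certification testbed": False})
```

In this sketch a failure at any stage leaves the release at its previous status, mirroring the "problem reports" and "fix problems" loop in the diagram.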

14 The UK Testbed

15 e.g. ScotGrid: Glasgow, Edinburgh and Durham
Glasgow farm: WNs on a private network with outbound NAT in place
100,000 jobs completed (900,000 CPU hours)
34 dual blade servers and 5TB FastT500 being integrated now (next door)
Shared resources (LHC, CDF and Bioinformatics)
Edinburgh: 24TB FastT700 and 8-way server: data storage focus
Durham: 40 node farm
All being integrated into LCG-2
[Diagram] Labels: ScotGRID, EDG 1.4, CE, SE, 59 x WN; EDG 2.1 Data Management Testbed, CE, SE, MON; CDF, LHC, BIO

16 Managing a Distributed Project: GridPP1 Project Status?
76% of the 190 GridPP1 tasks have been successfully completed

17 Is GridPP a Grid?
What is “The Grid” Anyway?
1. Coordinates resources that are not subject to centralized control
2. … using standard, open, general-purpose protocols and interfaces
3. … to deliver nontrivial qualities of service
Is GridPP a Grid?
1. YES. This is why development and maintenance of a UK-EU-US testbed is important
2. YES... Globus/CondorG/EDG meet this requirement. Common experiment application layers are also important here.
3. NO(T YET)… Experiments define whether this is true: currently only ~100,000 jobs submitted via the testbed, c.f. internal component tests of up to 10,000 jobs per day. Next step: LCG-2 deployment outcome… this year
http://www-fp.mcs.anl.gov/~foster/Articles/WhatIsTheGrid.pdf

18 What is The Grid Anyway? From Particle Physics Perspective
The Grid is:
– not hype, but surrounded by it
– a working prototype running on testbed(s)…
– about seamless discovery of PC resources around the world
– using evolving standards for interoperation
– the basis for particle physics computing in the 21st Century
– not (yet) as transparent as end-users want it to be

19 The Grid: Demonstrations
http://www.gridpp.ac.uk/demos/
Demos used to establish that e.g. the two LHC multi-purpose detector collaborations can run jobs on an International Grid
Use common Grid infrastructure with secure Grid access
But this doesn’t mean that the Grid works in production mode (yet)
This is, however, significant

20 What is the GridPP1 Project Status?
76% of the 190 GridPP1 tasks have been successfully completed

21 Achievements I
1. Dedicated people actively developing a Grid
2. All with personal certificates
3. Using the largest UK grid testbed (16 sites and hundreds of servers)
4. Deployed within EU-wide programme
5. Linked to Worldwide Grid testbeds

22 Achievements II
6. Grid Deployment Programme Functioning The Basis for LHC Computing
7. Active Tier-1/A Production Centre meeting International Requirements
8. Latent Tier-2 resources being incorporated
9. Significant middleware development programme
10. All PP applications using the Grid testbed (open approach)

23 The Challenges Ahead I, II, III: Scale, Complexity, UK Requirements
Covered by Steve yesterday..

24 The Challenges Ahead IV: Work Group Computing

25 The Challenges Ahead V: Events.. to Files.. to Events
[Diagram] Tiers: Tier-0 (International), Tier-1 (National), Tier-2 (Regional), Tier-3 (Local)
Data types: RAW, ESD, AOD, TAG, each held in data files; Event 1, Event 2, Event 3; “Interesting Events List”
VOMS-enhanced Grid certificates to access databases via metadata
Non-Trivial..

26 The Challenges Ahead VI: software distribution
ATLAS Data Challenge (DC2) this year to validate world-wide computing model
Packaging, distribution and installation:
Scale: one release build takes 10 hours and produces 2.5 GB of files
Complexity: 500 packages, Mloc, 100s of developers and 1000s of users
– ATLAS collaboration is widely distributed: 140 institutes, all wanting to use the software
– needs ‘push-button’ easy installation..
[Diagram] Step 1: Monte Carlo Data Challenges. Labels: Physics Models, Monte Carlo Truth Data, Detector Simulation, MC Raw Data, Reconstruction, MC Event Summary Data, MC Event Tags
[Diagram] Step 2: Real Data. Labels: Trigger System, Data Acquisition, Level 3 trigger, Trigger Tags, Run Conditions, Calibration Data, Raw Data, Reconstruction, Event Summary Data (ESD), Event Tags

27 The Challenges Ahead VII: distributed analysis
Complex workflow… LCG/ARDA Development
1. AliEn (ALICE Grid) provided a pre-Grid implementation [Perl scripts]
2. ARDA provides a framework for PP application middleware

28 Next steps
From prototype to production
– UK particle physics grid equivalent to 20,000 1GHz personal computers by 2007
– available for day-to-day use by particle physicists
– web portal for other e-scientists
GridPP will support Enabling Grids for E-science in Europe (EGEE) [startup meeting today]
– to integrate national and international grids, and grids from different scientific disciplines
– particle physics is a pilot project
2007 – Large Hadron Collider goes live

