DataGrid is a project funded by the European Commission under contract IST-2000-25182. GridPP, Edinburgh, 4 Feb 2004: WP8 HEP Applications, final project evaluation.


Slide 1: WP8 HEP Applications: final project evaluation of EDG middleware, and summary of workpackage achievements
F. Harris (Oxford/CERN), GridPP, Edinburgh, 4 Feb 2004
DataGrid is a project funded by the European Commission under contract IST-2000-25182.

Slide 2: Outline
- Overview of objectives and achievements
- Achievements by the experiments and by the experiment-independent team
- Lessons learned
- Summary of the exploitation of WP8 work and future HEP applications activity in LCG/EGEE
- Concluding comments
- Questions and discussion

Slide 3: Overview of objectives and achievements

Objectives:
- Continue work in the ATF for internal EDG architecture understanding and development
- Reconstitute the Application Working Group (AWG), with the principal aim of developing use cases and defining a common high-level application layer
- Work with the LCG/GAG (Grid Applications Group) on further refinement of HEP requirements
- Develop tutorials and documentation for the user community
- Evaluate Testbed 2, and integrate it into experiment tests as appropriate
- Liaise with LCG regarding EDG/LCG integration and the development of LCG-1, for example access to the LCG infrastructure for experiments using EDG software
- Prepare deliverable D8.4, ready for Dec

Achievements:
- Active participation by WP8; walkthroughs of HEP use cases helped to clarify interfacing problems
- The AWG has worked very well: extension of use cases covering key areas in biomedicine and Earth sciences, and the basis of the first proposal for common application work in EGEE
- The group produced a new HEPCAL use case document concentrating on 'chaotic' use of the grid by thousands of individual users, and further refined the original HEPCAL document
- WP8 played a substantial role in course design, implementation and delivery

Slide 4: Overview of objectives and achievements (continued)

Objectives:
- Evaluate the EDG Application Testbed, and integrate it into experiment tests as appropriate
- Liaise with LCG regarding EDG/LCG integration and the development of the LCG service
- Continue work with experiments on data challenges throughout the year

Achievements:
- Further successful evaluation of 1.4.n throughout the summer; evaluation of 2.0 since October 20
- Very active cooperation: EIPs helped test EDG components on the LCG Certification Testbed prior to the LCG-1 start in late September, and ran stress tests on LCG-1
- All six experiments conducted data challenges of different scales throughout 2003 on EDG- or LCG-flavoured testbeds, in an evolutionary manner

Slide 5: Comments on experiment work
- Experiments are living in an international multi-grid world, using other grids (US Grid3, NorduGrid, ...) in addition to EDG and LCG, and have coped with the interfacing problems
  - The DataTAG project is very important for inter-operability (GLUE schema)
- EDG software has been used in a number of grids:
  - the EDG Application Testbed
  - the LCG service (LCG-1, evolving to LCG-2)
  - the Italian Grid.it (identical to the LCG-1 release)
- Having two running experiments (in addition to the four LHC experiments) involved in the evaluations has proved very positive:
  - BaBar and the work on Grid.it
  - D0 and the work on the EDG Application Testbed

Slide 6: Evolution in the use of the EDG Application Testbed and the LCG service (and Grid.it)
- Mar-June 2003: EDG 1.4 evolved, with production use by ATLAS, CMS and LHCb at efficiencies ranging from 40% to 90% (higher in self-managed configurations)
  - LHCb: 300K events, Mar/Apr
  - ATLAS: 500K events, May
  - CMS: 2M events in the LCG-0 configuration, over the summer
- Sep 2003: LCG-1 service opened (and Grid.it installed the LCG-1 release)
  - Used EDG 2.0 job and data management and partitioned MDS (+ VDT)
  - All experiments installed software on LCG-1 and accomplished positive tests
  - ATLAS, ALICE, BaBar and CMS performed 'production' runs on LCG-1/Grid.it
- Oct 2003: EDG 2.0 released on the Application Testbed (Oct 20)
  - R-GMA instead of 'partitioned' MDS
  - D0 and CMS evaluated the 'monitoring' features of R-GMA
  - D0 did 'production' running
- December 2003: move to EDG 2.1 on the EDG Application Testbed
  - Fixing known problems
  - Enhanced data management
- Feb 2004: LCG-2 service release
  - Enhanced data management
  - Will include R-GMA for job and network monitoring
  - All experiments will use it for their data challenges

Slide 7: ALICE evaluation on LCG-1 and Grid.it (September-November 2003)
- Summary
  - Significant improvement in stability with respect to the tests in Spring 2003
  - Simulation job (one event per job) characteristics:
    - 80K primary particles generated
    - 12 hours on a 1 GHz CPU
    - 2 GB of data generated, in 8 files
  - Performance was generally a step function for batches (either close to 0% or close to 100%); with long jobs and multiple files, very sensitive to overall system stability (good days and bad days!)
  - Jobs were sensitive to the space available on worker nodes
- [Plot: performance by job batch, Sep-Nov]
- Projected load on LCG-1 during the ALICE Data Challenge (starting Feb 2004), when LCG-2 will be used (the arithmetic is checked in the sketch below):
  - 10^4 events (1 event/job)
  - Submit 1 job every 3 minutes (20 jobs/hour; 480 jobs/day)
  - Run 240 jobs in parallel
  - Generate 1 TB of output per day
  - Test LCG mass storage
  - Parallel data analysis (AliEN/PROOF) including LCG
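The projected-load figures hang together arithmetically. The short Python sketch below is not from the original slide; it only reproduces the calculation from the numbers quoted above.

```python
# Back-of-the-envelope check of the projected ALICE load figures quoted above.
# All input numbers come from the slide; the script only reproduces the arithmetic.

jobs_per_hour = 60 / 3             # one job submitted every 3 minutes
jobs_per_day = jobs_per_hour * 24  # 480 jobs/day
job_length_h = 12                  # ~12 h on a 1 GHz CPU per event
output_per_job_gb = 2              # ~2 GB in 8 files per event
total_events = 10_000              # 10^4 events, one event per job

parallel_jobs = jobs_per_day * job_length_h / 24        # steady-state concurrency
output_per_day_tb = jobs_per_day * output_per_job_gb / 1000
challenge_days = total_events / jobs_per_day

print(f"{jobs_per_hour:.0f} jobs/h, {jobs_per_day:.0f} jobs/day")
print(f"~{parallel_jobs:.0f} jobs running in parallel")
print(f"~{output_per_day_tb:.2f} TB of output per day")
print(f"~{challenge_days:.0f} days to produce 10^4 events")
```

Running it gives 20 jobs/h, 480 jobs/day, roughly 240 concurrent jobs, just under 1 TB of output per day, and about three weeks to produce the 10^4 events, consistent with the figures on the slide.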

Slide 8: ATLAS: evolution from 1.4 to LCG-1, and the new DC2 production system
- Use of EDG (modified for RH 7.3)
  - In May, reconstructed 500K events in 250 jobs with 85% first-pass efficiency
  - With a privately managed configuration of 7 sites in Italy, Lyon and Cambridge
- LCG-1 (+ Grid.it) production
  - Simulated events in 150 jobs of 200 events each (the jobs required ~20 hours each, with efficiency ~80%)
- LCG-2 plans
  - Start around April
- Main features of the new DC2 system (sketched below)
  - Common production database for all of ATLAS
  - Common ATLAS supervisor run by all facilities/managers
  - Common data management system a la Magda
  - Executors developed by middleware experts (LCG, NorduGrid, US)
  - Final verification of the data done by the supervisor
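The DC2 production model described above (one common production database, a common supervisor, pluggable per-grid executors, and final verification by the supervisor) can be illustrated with a minimal Python sketch. The class and method names below are illustrative assumptions, not the actual ATLAS DC2 code.

```python
# Illustrative sketch of the DC2-style supervisor/executor split described above.
# ProductionDB, Executor and Supervisor are hypothetical names, not ATLAS classes.

class ProductionDB:
    """Common production database: the single source of job definitions."""
    def __init__(self, jobs):
        self.jobs = list(jobs)          # e.g. [{"id": 1, "events": 200}, ...]
        self.results = {}

    def next_job(self):
        return self.jobs.pop(0) if self.jobs else None

    def record(self, job_id, status):
        self.results[job_id] = status


class Executor:
    """One executor per grid flavour (LCG, NorduGrid, US grids)."""
    def __init__(self, grid_name):
        self.grid_name = grid_name

    def run(self, job):
        # In reality: translate the job into the grid's own job language and submit it.
        print(f"submitting job {job['id']} ({job['events']} events) to {self.grid_name}")
        return {"job": job, "output_ok": True}


class Supervisor:
    """Common supervisor: pulls jobs from the DB, dispatches them, verifies output."""
    def __init__(self, db, executors):
        self.db, self.executors = db, executors

    def run(self):
        i = 0
        while (job := self.db.next_job()) is not None:
            result = self.executors[i % len(self.executors)].run(job)
            # Final verification of the data is the supervisor's responsibility.
            self.db.record(job["id"], "done" if result["output_ok"] else "failed")
            i += 1


if __name__ == "__main__":
    db = ProductionDB([{"id": n, "events": 200} for n in range(1, 4)])
    Supervisor(db, [Executor("LCG"), Executor("NorduGrid")]).run()
    print(db.results)
```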

Slide 9: CMS work on LCG-0 and LCG-1
- LCG-0 (summer 2003)
  - Components from VDT and EDG
  - GLUE schemas and information providers (DataTAG)
  - VOMS
  - RLS
  - Monitoring: GridICE by DataTAG
  - R-GMA (for specific monitoring functions)
- 14 sites configured and managed by CMS
- Physics data produced
  - 500K Pythia events: 2000 jobs of ~8 hours
  - 1.5M CMSIM events: 6000 jobs of ~10 hours
- Comments on performance
  - Substantial improvement in efficiency compared to the first EDG stress test (~80%)
  - Networking and site configuration were problems, as was the first version of RLS
- LCG-1
  - Ran for 9 days on LCG-1 (20-23/12/2003, 7-12/1/2004)
  - In total 600,000 events (30-40 hour jobs) were produced (ORCA), using UIs in Padova and Bari
  - Sites used were mainly in Italy and Spain
  - Efficiency around 75% over the Christmas holidays (pretty pleased!)
  - Used the GENIUS portal
- LCG-2
  - Will be used within the data challenge starting in March

Slide 10: LHCb results and status on EDG and LCG
- Tests on the EDG 1.4 application testbed (Feb 2003):
  - Standard LHCb production tasks, 300K events produced
  - ~35% success rate (testbed support running down)
  - Software installation done by the running job
- EDG 2.0 tests (November 2003):
  - Simple test jobs as well as standard LHCb production tasks: 200 events per job, ~10 hours of CPU
  - Submission of the jobs either to the EDG RB, or directly to a CE with the CE status information obtained from the CE GRIS server (see the sketch after this list)
  - This significantly increased the stability of the system, up to a 90% success rate
- LCG tests (now):
  - Software installation/validation either centralised or in a special LHCb job
  - The basic functionality tests are OK: LHCb production jobs with the output stored on an SE and registered in the RLS catalogue, and subsequent analysis jobs with input taken from the previously stored files
  - Scalability tests with large numbers of jobs are ready to be run on the LCG-2 platform
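The two submission paths above can be summarised in a minimal Python sketch: prefer a CE that its information server reports as having free capacity, otherwise fall back to the Resource Broker. The helper functions, the CE identifier format and the command-line options are assumptions, not LHCb's actual tooling; only the overall decision logic comes from the slide.

```python
# Minimal sketch of the two LHCb submission paths described above: via the EDG
# Resource Broker, or directly to a chosen CE after checking its GRIS status.
# Helper names, CE identifier format and CLI options are assumptions.

import subprocess

def free_cpus_on_ce(ce_host):
    """Hypothetical GRIS query: a real implementation would query the CE's local
    LDAP information server and parse its attributes; here we return a placeholder."""
    return 42  # placeholder value

def submit_via_rb(jdl_file):
    """Hand the job to the EDG Resource Broker and let it do the matchmaking."""
    # edg-job-submit is the EDG user-interface command; options omitted here.
    subprocess.run(["edg-job-submit", jdl_file], check=True)

def submit_direct(ce_host, jdl_file):
    """Bypass the broker and target a specific CE (identifier format illustrative)."""
    subprocess.run(["edg-job-submit", "-r", f"{ce_host}:2119/jobmanager-pbs-short",
                    jdl_file], check=True)

def submit(jdl_file, candidate_ces, min_free=5):
    """Prefer a CE the GRIS reports as having free slots; otherwise use the RB."""
    for ce in candidate_ces:
        if free_cpus_on_ce(ce) >= min_free:
            submit_direct(ce, jdl_file)
            return
    submit_via_rb(jdl_file)
```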

Slide 11: BaBar work on EDG and Grid.it
- Integrating grid tools in an experiment running since 1999
  - Hope to gain access to more resources via the grid paradigm (e.g. Grid.it)
  - Have used grid tools for data distribution (SLAC to IN2P3)
- Strategy for first integration
  - Created a 'simulation' RPM to be installed at sites
  - Data output stored on the closest SE
  - Data copied to a Tier-1 or SLAC using edg-copy
  - Log files returned via the sandbox
- Scheme first tested with EDG on 5 Italian sites
- Operation on Grid.it with the LCG-1 release
  - RB at CNAF
  - Farms at 8 Italian sites
  - 1-week test with ~500 jobs
  - 95% success at Ferrara (the site with the central DB)
  - 60% success elsewhere; 33% failures due to network saturation caused by simultaneous requests to the remote Objectivity database (looking at distributed solutions)
- Positive experience with use of the GENIUS portal
- Analysis applications have been successfully tested on the EDG Application Testbed
  - Input through the sandbox
  - Executable taken from the close SE
  - Read Objectivity or ROOT data from a local server
  - Produce an n-tuple and store it on the close SE
  - Send back the output log file through the sandbox

Slide 12: D0 evaluations on the EDG Application Testbed
- Ran part of the D0 Run 2 re-processing on EDG resources
  - Job requirements:
    - Input file is 700 MB (2800 events)
    - 1 event costs 25 GHz.s (Pentium series assumed)
    - A job takes 19.4 hours on a 1 GHz machine
  - Frequent software updates, so RPMs are not used (see the sketch after this list):
    - Compressed tar archives are registered in RLS as grid files
    - Jobs retrieve and install them before starting execution
  - R-GMA is used for monitoring:
    - Allows users and programs to publish information for inspection by other users, and for archiving in a production database
    - Found invaluable for matching output files to jobs in real production, where some jobs fail
- Results
  - Found the EDG software generally satisfactory for the task (with caveats)
    - Had to modify some EDG RM (replica management) commands due to their (Globus) dependence on inbound IP connectivity; should be fixed in 2.1.x
    - Used the 'Classic' SE software while waiting for the mass storage developments
    - Very sensitive to R-GMA instabilities; recently good progress with R-GMA, and can run at ~90% efficiency when R-GMA is up
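A minimal sketch of the tarball-based installation step described above: the job pulls a compressed archive that was registered in RLS, unpacks it into its working directory, and then runs the payload. The fetch helper, the command it invokes, the LFN and the executable name are hypothetical placeholders for the EDG replica-management tooling and the D0 software, not the real commands.

```python
# Sketch of a D0-style job wrapper: fetch the experiment software as a tar
# archive registered in RLS, unpack it locally, then run the payload.
# `fetch_grid_file`, "my-replica-get", the LFN and "d0reco" are hypothetical.

import subprocess
import tarfile
from pathlib import Path

def fetch_grid_file(lfn: str, dest: Path) -> Path:
    """Placeholder: resolve the logical file name via RLS and copy a replica to
    `dest`; the real job would call the EDG data-management commands instead."""
    subprocess.run(["my-replica-get", lfn, str(dest)], check=True)  # hypothetical command
    return dest

def install_software(lfn: str, workdir: Path) -> None:
    tarball = fetch_grid_file(lfn, workdir / "d0sw.tar.gz")
    with tarfile.open(tarball) as tar:
        tar.extractall(workdir)   # unpack the release into the job's scratch area

def run_payload(workdir: Path, input_file: str) -> None:
    # ~2800 events at ~25 GHz.s per event is roughly 19.4 h on a 1 GHz CPU (per the slide)
    subprocess.run([str(workdir / "bin" / "d0reco"), input_file], check=True)

if __name__ == "__main__":
    work = Path("scratch")
    work.mkdir(exist_ok=True)
    install_software("lfn:/grid/dzero/software/d0reco-vX.tar.gz", work)  # illustrative LFN
    run_payload(work, "input-700MB.raw")
```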

Slide 13: Summary of middleware evaluations
- Workload management
  - Tests have shown that the software is more robust and scalable
    - Stress tests were successful with up to 1600 jobs in multiple streams, with efficiencies over 90%
    - Problems with new sites during the tests: VOs not set up properly (though the site accepted the job)
- Data management
  - Has worked well with respect to functionality and scalability (~100K files registered in ongoing tests)
    - Tests so far with only 1 LRC per VO implemented
  - Performance needs enhancement
    - Registrations and simple queries can take up to 10 seconds
  - We have lost (with GDMP) the bulk transfer functions
  - Some functions needed inbound IP connectivity (Globus); D0 had to program around this (problem since fixed)

Slide 14: Summary of middleware evaluations (2)
- Information system
  - Partitioned MDS has worked well for LCG, following on from work accomplished within EDG (BDII work) (limit of ~100 sites?)
  - R-GMA work is very promising for 'life after MDS', but needs 'hardening'
    - Ongoing work uses D0 requirements for testing; how will this continue in the LCG/EGEE framework?
- Mass storage support (crucial for data challenges)
  - We await an 'accepted' uniform interface to disk and tape systems
    - A solution is coming this week(?) with the SRM/GFAL software
    - WP5 have made an important contribution to the development of the SRM interface
  - EDG 2.0 had mass storage access to CERN (Castor) and RAL (ADS)
    - The 'Classic SE' has been a useful fallback (a GridFTP server) while waiting for commissioning of the developments

Slide 15: Other important grid system management issues
- Site certification
  - There must be an official, standard procedure as part of the release
  - Tests are intimately tied to what is published in the information service, so the relevant information must be checked
  - Tests should be run on a regular basis, and failing sites should be taken out automatically
- Site configuration
  - The configuration of the software is very complex to do manually
  - The only way to check it has been to run jobs, pray, and then look at the errors
  - Could middleware providers not expose fewer parameters, supply defaults, and also check parameters at run time?
- Space management and publishing
  - Running out of space on SEs and WNs is still a problem; jobs need to check availability before running (a minimal check is sketched below)
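As a simple illustration of the last point, a job wrapper can refuse to start its payload if the worker node's scratch area is too small. This is a generic sketch, not part of any WP8 tooling; the scratch path and the 5 GB threshold are arbitrary examples.

```python
# Minimal pre-flight space check on a worker node, illustrating the point above.
# The scratch path and the 5 GB threshold are arbitrary example values.

import shutil
import sys

def enough_space(path: str, required_gb: float) -> bool:
    free_gb = shutil.disk_usage(path).free / 1e9
    print(f"{free_gb:.1f} GB free under {path}, need {required_gb} GB")
    return free_gb >= required_gb

if __name__ == "__main__":
    if not enough_space("/tmp", required_gb=5.0):
        sys.exit("not enough scratch space on this worker node; aborting before the payload runs")
    # ... launch the real job here ...
```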

Slide 16: The deliverables, plus 'extra' outputs from WP8
- The formal EU deliverables
  - D8.1: the original HEP requirements document
  - D8.2: 'Evaluation by experiments after 1st year'
  - D8.3: 'Evaluation by experiments after 2nd year'
  - D8.4: 'Evaluation after 3rd year'
- Extra key documents (being used as input to EGEE)
  - HEPCAL use cases, May 2002 (revised Oct 2003)
  - AWG recommendations for middleware (June 2003)
  - AWG enhanced use cases (for Biomed, ESA), Sep 2003
  - HEPCAL2 use cases for analysis (several WP8 people)
- Generic HEP test suite used by EDG/LCG
- Interfacing of 6 experiment systems to the middleware
- Ongoing consultancy from the 'loose cannons' to all applications

Slide 17: Main lessons learned
- Globally, the HEP applications feel it would have been 'better' to start with simpler prototype software and develop incrementally in functionality and complexity, in close working relations with the middleware developers
- Applications should have played a larger role in the architecture and in defining interfaces (so we could all learn together!); the 'average' user interfaces are at too low a level, and defining high-level APIs from the beginning would have been very useful
- The formation of task forces (applications + middleware) was a very important step midway through the project
- The 'loose cannons' (a team of 5) were crucial to all developments; they worked across experiments and comprised all of the funded effort of WP8
- The information system is the nerve centre of the grid; we look to the R-GMA developments for a long-term solution to the scaling problems (LCG think partitioned MDS could work up to ~100 sites)
- We look to SRM/GFAL as the solution to uniform mass storage interfacing
- Application software installation must be flexible and not coupled to grid software installation (experiments are working with LCG on this; see the work of LHCb, ALICE, D0, ...)
- We observe that site configuration, site certification and space management on SEs and WNs are still outstanding problems, and they account for most of the inefficiencies (down times)

Slide 18: Exploitation of the work of WP8, and future HEP applications work in LCG/EGEE
- All experiments have exploited the EDG middleware using WP8 effort
- The HEPCAL and AWG documents are essential inputs to the future LCG/EGEE work
- Future developments will take place in the context of the LCG/EGEE infrastructure
  - There will be a natural carrying over of experience and personnel within the HEP community
- Experiments will use LCG/EGEE for data challenges in 2004
  - ALICE now; CMS from 1 March; ATLAS and LHCb from April
- In parallel, within the ARDA project, the middleware (including EDG components) will be 'hardened' and evaluated on a development testbed
- The NA4 activity in EGEE will include dedicated people for interfacing the middleware to the experiments (8 people at CERN, plus others distributed in the community)

Slide 19: Concluding comments
- Over the past 3 years the HEP community has moved from an R&D approach to the grid paradigm, to the use of grid services in physics production systems in world-wide configurations
- Experiments are using several managed grids (LCG/EGEE, US grids, NorduGrid), so inter-operability is crucial
- Existing software can be used in production services, with parallel 'hardening' taking advantage of the lessons learned (the ARDA project in LCG/EGEE)
- Thanks:
  - To the members of WP8 for all their efforts over 3 years
  - And to all the other WPs (middleware, testbed, networking, project office) for their full support and cooperation

Slide 20: Questions and discussion