HEP Applications Evaluation of the EDG Testbed and Middleware
Stephen Burke (EDG HEP Applications – WP8)
CHEP 2003, 24-28 March 2003
DataGrid is a project funded by the European Union.

Talk Outline
- The organisation and mission of EDG/WP8
- Overview of the evolution of the EDG Applications Testbed
- Overview of the Task Force activities with the HEP experiments and their accomplishments
- Use case analysis mapping HEPCAL to EDG
- Main lessons learned and recommendations for future projects
- A forward look to EDG release 2 and EDG/LCG co-working

Authors
- I. Augustin, F. Carminati, J. Closier, E. van Herwijnen – CERN
- J.J. Blaising, D. Boutigny, A. Tsaregorodtsev – CNRS (France)
- K. Bos, J. Templon – NIKHEF (Holland)
- S. Burke, F. Harris – PPARC (UK)
- R. Barbera, P. Capiluppi, P. Cerello, L. Perini, M. Reale, S. Resconi, A. Sciabà, M. Sitta – INFN (Italy)
- O. Smirnova – Lund (Sweden)

EDG Structure
- Six middleware work areas:
  - Job submission/control
  - Data management
  - Information and monitoring
  - Fabric management
  - Mass storage
  - Networking
- Three-year project, 2001-2003
- Current release is 1.4; release 2.0 expected in May
- Three application groups:
  - HEP (WP8)
  - Earth observation
  - Biomedical
- Testbed operation

Mission and Organisation of WP8
- To capture the requirements of the experiments, to assist in interfacing experiment software to EDG middleware, to evaluate functionality and performance, to give feedback to the middleware developers, and to report to the EU
  - Also involved in generic testing, architecture, education etc.
- Five "Loose Cannons": full-time people helping all the experiments (they were key members of the Task Forces for ATLAS and CMS)
- 2-3 representatives from each experiment
  - ALICE, ATLAS, CMS, LHCb
  - BaBar, D0 (since Sep 2002)
- The most recent report, "Evaluation of DataGrid Application Testbed during Project Year 2" (EU deliverable D8.3), is available now
  - This talk summarises the key points of the report

The EDG Applications Testbed
- The testbed has been running continuously since November 2001
- Five core sites: CERN, CNAF, Lyon, NIKHEF, RAL
  - Now growing rapidly: currently around 15 sites, 900 CPUs and 10 TB of disk in Storage Elements (plus local storage), with Mass Storage Systems at CERN, Lyon, RAL and SARA
- Key dates:
  - Feb 2002, release 1.1: basic functionality
  - April 2002, release 1.2: first production release, used for ATLAS tests in August
  - Nov 2002 - Feb 2003, releases 1.3/1.4: bug fixes incorporating a new Globus version, much improved stability; used for the CMS stress test, and recently for ALICE and LHCb production tests
  - May 2003, release 2.0 expected, with major new functionality; the application testbed is expected to merge with LCG 1 (the LHC Computing Grid testbed)

Middleware in Testbed 1
- Basic Globus services: GridFTP, MDS, Replica Catalog, job manager
- Job submission: submit a job script to a Resource Broker, which dispatches the job to a suitable site using matchmaking information published in MDS; uses Condor-G
- Data management: tools to copy files with GridFTP and register them in the Replica Catalogs, interfaced with the job submission tools so that jobs can be steered to sites with access to their input files
- Fabric management: allows automated configuration and update of Grid clusters
- VO management: VO membership lists are held in LDAP servers, and users are mapped to anonymous VO-based pool accounts
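To make the job submission model concrete, here is a minimal sketch of the kind of JDL (Job Description Language) file a user would hand to the Resource Broker with edg-job-submit. The attribute names follow EDG 1.x JDL conventions, but the file names, the catalogue URL and the Requirements/Rank expressions are invented for illustration.

    # hypothetical-job.jdl (illustrative values only)
    Executable     = "simulate.sh";
    Arguments      = "--events 100";
    StdOutput      = "std.out";
    StdError       = "std.err";
    InputSandbox   = {"simulate.sh", "config.dat"};
    OutputSandbox  = {"std.out", "std.err"};
    InputData      = {"LF:higgs-sample-001.dat"};    # logical file, resolved via the catalogue
    ReplicaCatalog = "ldap://rc.example.org:9011/lc=Collection,rc=Catalog";
    Requirements   = other.OpSys == "RH 6.2";        # matched against MDS information
    Rank           = other.FreeCPUs;

The broker matchmakes the Requirements expression against the information published in MDS, and the InputData list is what lets it steer the job to a site holding replicas of the input files.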

Résumé of experiment Data Challenge use of EDG (see the experiment talks elsewhere at CHEP)
- ATLAS were first, in August 2002; the aim was to repeat part of the Data Challenge. Two serious problems were found, which were fixed in 1.3
- CMS stress test production, Nov-Dec 2002 (250k events in 21 days): found more problems in the areas of job submission and RC handling, leading to 1.4.x
- ALICE started on Mar 4: production of 5,000 central Pb-Pb events (9 TB, 40,000 output files, 120k CPU hours)
  - Progressing with similar efficiency levels to CMS
  - About 5% done by Mar 14
  - Pull architecture
- LHCb started mid-Feb
  - ~70k events for physics
  - Like ALICE, using a pull architecture
- BaBar/D0
  - Have so far done small-scale tests
  - Larger scale planned with EDG 2
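As a quick sanity check, the ALICE figures above imply the following per-event quantities (simple arithmetic on the quoted totals, with Python used just as a calculator):

    # Per-event figures implied by the ALICE production numbers above.
    events, data_tb, files, cpu_hours = 5_000, 9, 40_000, 120_000

    print(data_tb * 1e6 / events, "MB per event")      # ~1800 MB
    print(files / events, "files per event")           # 8 files
    print(cpu_hours / events, "CPU hours per event")   # 24 CPU hours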

Use Case Analysis
- EDG release 1.4 has been evaluated against the HEPCAL Use Cases
- Of the 43 Use Cases:
  - 6 are fully implemented
  - 12 are largely satisfied, but with some restrictions or complications
  - 9 are partially implemented, but have significant missing features
  - 16 are not implemented
- The missing functionality is mainly in:
  - Virtual data (not considered by EDG)
  - Authorisation, job control and optimisation (expected for release 2)
  - Metadata catalogues (some support from the middleware; usage needs discussion between the experiments and the developers)
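Expressed as fractions of the 43 use cases (a trivial tally of the numbers above):

    # HEPCAL use-case coverage of EDG 1.4, from the figures above.
    coverage = {"fully": 6, "largely": 12, "partially": 9, "not implemented": 16}
    total = sum(coverage.values())  # 43
    for status, n in coverage.items():
        print(f"{status}: {n}/{total} ({100 * n / total:.0f}%)")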

Lessons Learnt - General
Many problems and limitations were found, but there was also a lot of progress, and we have a very good relationship with the middleware and testbed groups.
- Having real users on an operating testbed at a fairly large scale is vital: many problems emerged which had not been seen in local testing
- Problems with configuration are at least as important as bugs; integrating the middleware into a working system takes as long as writing it!
- Grids need different ways of thinking by users and system managers: a job must be able to run anywhere it lands, and since sites are not uniform, jobs should make as few demands as possible

Job Submission
- Limitations with the versions of Globus and Condor-G used:
  - 512 concurrent jobs per Resource Broker
  - Maximum submission rate of 1000 jobs/hour
  - Maximum of 20 concurrent users
- We worked with multiple brokers and within the rate limits (see the sketch below)
- Can be very sensitive to poor or incorrect information from the Information Providers, or to propagation delays
  - Resource discovery may not work
  - Resource ranking algorithms are error prone
  - "Black holes": all jobs go to RAL
- The job submission chain is complex and fragile
- Single jobs only: no splitting, dependencies or checkpointing
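Working within these limits amounts to client-side pacing: stay below the per-broker rate ceiling and spread load over more than one broker. A minimal sketch of such a throttle; the broker host names and the submit callback are hypothetical, and the rate limit is the one quoted above:

    import itertools, time

    BROKERS = ["rb1.example.org", "rb2.example.org"]  # hypothetical broker hosts
    MAX_JOBS_PER_HOUR = 1000                          # per-broker ceiling quoted above

    def submit_all(jobs, submit):
        """Round-robin jobs over brokers, pacing below the per-broker rate limit."""
        interval = 3600 / MAX_JOBS_PER_HOUR / len(BROKERS)
        brokers = itertools.cycle(BROKERS)
        for job in jobs:
            submit(next(brokers), job)  # e.g. a wrapper around edg-job-submit
            time.sleep(interval)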

Information Systems and Monitoring
- It has not been possible to arrive at a stable, hierarchical, dynamic system based on MDS
  - The system jammed up as the query rate increased, and hence could not give reliable information to clients such as the Resource Broker
  - Workarounds were used (a static list of sites, then a fixed-database LDAP backend); this works, but is not really satisfactory. The new R-GMA software is due in release 2
- Monitoring/debugging information is limited so far
- We need to develop a set of monitoring tools covering all aspects of the system
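MDS is LDAP-based, so the queries that jammed the system are ordinary LDAP searches. A minimal sketch of such a query using the python-ldap package; the host name is hypothetical, the port and base DN follow the usual Globus MDS conventions, and the filter and attribute name are illustrative:

    import ldap  # python-ldap package

    con = ldap.initialize("ldap://mds.example.org:2135")  # hypothetical MDS server
    results = con.search_s(
        "mds-vo-name=local, o=grid",        # conventional MDS base DN
        ldap.SCOPE_SUBTREE,
        "(objectClass=ComputingElement)",   # illustrative filter
    )
    for dn, attrs in results:
        print(dn, attrs.get("FreeCPUs"))    # illustrative attribute

Many clients issuing searches like this at once is exactly the load that caused the server to jam, hence the static-list workaround.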

Data Management
- Replica Catalog
  - Jammed with many concurrent accesses
  - With long file names (~ bytes) there was a practical limit of ~2000 entries
  - Hard to use more than one catalogue in the current system
  - No consistency checking against the disk content
  - Single point of failure
  - A new distributed catalogue (RLS) comes in release 2
- Replica Management
  - Copying was error prone with large (1-2 GB) files (~90% efficiency)
  - Fault tolerance is important: error conditions should leave things in a consistent state
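The consistency point is essentially a two-step transaction: copy the file, then register the replica, and undo the copy if registration fails, so that the catalogue never references a file that is not there and no unregistered orphan is left on disk. A minimal sketch with hypothetical helper callbacks:

    def replicate(lfn, source, dest, copy, register, delete):
        """Copy a file and register the new replica; roll the copy back
        if registration fails, keeping catalogue and disk consistent."""
        copy(source, dest)        # e.g. a GridFTP transfer
        try:
            register(lfn, dest)   # e.g. a Replica Catalog entry
        except Exception:
            delete(dest)          # roll back: no orphan replica on disk
            raise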

Use of Mass Storage (CASTOR, HPSS)
- EDG uses GridFTP for file replication, while CASTOR and HPSS use RFIO
- Interim solution:
  - Disk file names have a static mapping to the MSS
  - The Replica Management commands can stage files to and from the MSS
  - No disk space management
- Good enough for now, but a better long-term solution is needed
  - A GridFTP interface to CASTOR is now available
  - The EDG MSS solution (the Storage Element) will be available in release 2
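A static mapping means the MSS path is derived from the disk path by pure string substitution, with no space management or dynamic placement decisions. A minimal sketch (the path layout is invented):

    # Hypothetical static mapping between an SE disk path and an MSS path.
    DISK_PREFIX = "/flatfiles/atlas"
    MSS_PREFIX  = "/castor/cern.ch/grid/atlas"

    def to_mss(disk_path: str) -> str:
        """Translate a disk file name to its fixed MSS counterpart."""
        assert disk_path.startswith(DISK_PREFIX)
        return MSS_PREFIX + disk_path[len(DISK_PREFIX):]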

Virtual Organisation (VO) Management
- The current system works fairly well, but has many limitations
- The VO servers are a single point of failure
- No authorisation/accounting/auditing/…
  - EDG has no security work package! (but there is a security group)
  - New software (VOMS/LCAS) comes in release 2
- The experiments will also need to gain experience of how a VO should be run
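The pool-account mapping described earlier is conventionally realised through a grid-mapfile: each line maps a certificate subject to an account, and a leading dot on the account name requests allocation from a pool. A minimal sketch generating one from a VO membership list (the DNs are invented; the dot prefix is the standard pool-account syntax):

    # Build grid-mapfile lines from a VO membership list (illustrative DNs).
    members = [
        "/O=Grid/O=CERN/OU=cern.ch/CN=Some User",
        "/C=UK/O=eScience/OU=RAL/CN=Another User",
    ]
    with open("grid-mapfile", "w") as f:
        for dn in members:
            f.write(f'"{dn}" .atlas\n')  # ".atlas": assign an atlas pool account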

User View of the Testbed
- Site configuration is very complex: there is usually one way to get it right and many ways to be wrong! LCFG (the fabric management system) is a big help in ensuring uniform configuration, but can't be used at all sites
- Services should fail gracefully when they hit resource limits; the Grid must be robust against failures and misconfiguration
- We need a user support system. Currently this is mainly done via the integration team mailing list, which is not ideal!
- Many HEP experiments (and, at the moment, the EDG middleware itself) require outbound IP connectivity from worker nodes, but many farms these days are "Internet Free Zones". Various solutions are possible; discussion is needed

Other Issues
- Documentation: EDG has thousands of pages of documentation, but it can be very hard to find what you want
- Development of user-friendly portals to services
  - Several projects are underway, but there is no standard approach yet
- How do we distribute the experiments' application software?
- Interoperability: is it possible to use multiple Grids operated by different organisations?
- Scaling: can we make a working Grid with hundreds of sites?

A Forward Look to EDG 2/LCG 1
- Release 2 will have major new functionality:
  - New Globus/Condor releases via VDT, including the GLUE schema
    - The use of VDT + GLUE is a major step forward in US/Europe interoperability
  - The Resource Broker re-engineered for improved stability and throughput
  - The new R-GMA information and monitoring system, replacing MDS
  - A new Storage Element manager
  - A new Replica Manager/Optimiser with a distributed Replica Catalogue
- From July 2003 the EDG Application Testbed will be synonymous with the LCG-1 prototype
- EDG/WP8 are now working together with LCG in several areas: requirements, testing, and interfacing the experiments to the LCG middleware (EDG 2 + VDT)

Summary & Future Work
- The past year has seen major progress in the experiments' use of EDG middleware for physics production on an expanding testbed: pioneering tests by ATLAS, real production by CMS, and now ALICE and LHCb
  - WP8 has formed the vital bridge between users and middleware, both through generic testing and through involvement in the Task Forces
  - It has also been a key factor in the move from an R&D culture to a production culture
- There is strong interest from running experiments (BaBar and D0) in using EDG middleware
- We have had excellent working relations with the Testbed and Middleware groups in EDG, and this is continuing into LCG
- We foresee intense testing of the LCG middleware, combining efforts from LCG, EDG and the experiments, together with user support and education activities
- Acknowledgements
  - Thanks to the EU and our national funding agencies for their support of this work