Published by Teresa Henderson. Modified over 9 years ago.
1
EU 2nd Year Review – 04-05 Feb. 2003 – Title – n° 1
WP8: Progress and testbed evaluation
F Harris (Oxford/CERN) (WP8 coordinator), f.harris@cern.ch
2
Outline of the presentation
- Overview of the objectives for the 2nd project year, and the corresponding achievements
- Activities of funded and unfunded effort
- Ongoing work on use cases
- Data Challenge work with Atlas and CMS
- Comments on the key points of work in the other 4 WP8 experiments
- The organisation for D8.3 ‘Testbed assessment for HEP applications’
- The planning for the 3rd project year, and some associated issues
- QUESTIONS
3
The objectives for the 2nd project year, and the corresponding achievements

OBJECTIVES
- Use and exploitation of Testbed1
- Validation of releases + feedback
- Participation in the Architecture group (ATF), and the elaboration of use cases

ACHIEVEMENTS
- Babar and D0 have joined the 4 LHC experiments, and NA48 will soon join. 5 experiments have used the applications testbed. All WP8 experiments have continued to develop their distributed computing infrastructure in Europe and the USA
- Both the EIPs and the experiments have given continual feedback to middleware from both generic and experiment-specific evaluations
- ATF is very active and executes regular ‘scenario playing’ reviews. Use case documents have been produced and will develop in the context of EDG/LCG
4
Overview of objectives for the 2nd project year, and the corresponding achievements

OBJECTIVES
- Design of a common middleware layer for WP8 experiments
- Use of EDG middleware in experiment Data Challenges (DCs)
- Development of tutorials and documentation for the user community

ACHIEVEMENTS
- This has moved into the LHC Computing Grid (LCG) project
- Atlas and then CMS experiments have achieved significant pioneering work in the use of EDG middleware for DCs, and in producing detailed evaluations
- WP8 has played a substantial role in course design, implementation and delivery
5
Activities of funded and unfunded effort
- WP8 used 51 funded man-months instead of the projected 43.5 (January - November)
- Complemented with 350 unfunded man-months from experiments, which have largely concentrated on experiment-specific activities
- The EIPs (Experiment Independent Persons) have been involved in
  - Functionality* and stress testing
  - Middleware debugging campaigns*
  - Configuration and testing of Storage Elements and Virtual Organisations*
  - Data Challenges of the ATLAS and CMS experiments
  - Organisation of WP8
  - Integration Team* and Architectural Task Force

* Activities unforeseen in original mandate
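The effort figures quoted on this slide can be put side by side with a little arithmetic. The sketch below is only illustrative: the three input numbers come from the slide, while the variable names and the derived ratios are our own and do not appear in the presentation.

```python
# Effort figures as quoted on the slide (man-months, January - November).
funded_used = 51.0
funded_projected = 43.5
unfunded = 350.0

# Derived quantities (not stated in the presentation; computed here for illustration).
overrun = funded_used - funded_projected              # extra funded man-months
overrun_pct = 100.0 * overrun / funded_projected      # overrun relative to projection
unfunded_share = unfunded / (unfunded + funded_used)  # fraction of total effort that was unfunded

print(f"Funded overrun: {overrun:.1f} man-months ({overrun_pct:.1f}%)")
print(f"Unfunded share of total effort: {unfunded_share:.1%}")
```

On these numbers the funded overrun is 7.5 man-months, and the unfunded contribution from the experiments makes up the large majority of the total effort.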
6
Ongoing work on use cases
- ‘Common Use Cases for A HEP Common Application Layer’ (HEPCAL) (Document produced for LCG; chaired and largely manned by WP8 people, and only possible thanks to WP8 experience)

  Category                                                                                      Use cases
  General (authorisation, login, browse resources)                                              4
  Data Management (metadata and data operations)                                                19
  Job Management (submission, control, monitoring, errors, resource estimation, job splitting…) 16
  VO Management (resource reservation, user rights, software publishing…)                       4

  EDG 1.4.3 satisfies use cases for a basic system (authorisation/authentication, data handling, job submission). EDG 2.0 will satisfy more advanced data handling (e.g. metadata) and HEP data transformation. There are other areas for discussion, e.g. virtual data and experiment s/w publishing
- This work will continue within EDG and LCG
- In ATF there is regular scenario playing for use cases to check existing and future design
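As a quick check of how the HEPCAL use cases listed on this slide are distributed, the per-category counts can be tallied. This is a minimal sketch: the counts are taken from the slide, while the dictionary name and the percentage breakdown are illustrative additions.

```python
# Use-case counts per category, as listed on the slide.
use_cases = {
    "General": 4,
    "Data Management": 19,
    "Job Management": 16,
    "VO Management": 4,
}

total = sum(use_cases.values())

# Share of each category in the total (derived here, not stated on the slide).
shares = {name: count / total for name, count in use_cases.items()}

for name, count in use_cases.items():
    print(f"{name:16s} {count:3d}  ({shares[name]:.0%})")
print(f"{'Total':16s} {total:3d}")
```

Data and job management together account for most of the listed use cases, which matches the emphasis of the Data Challenge evaluations later in the talk.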
7
Overview of data challenge work

ATLAS (pioneers!)
- Specific Goals
  - Compare results with those obtained without Grid in previous months for ~100 ‘long’ detector simulation jobs
  - Make a prioritised list of recommendations to EDG for bug-fixes and future developments in an evaluation report
- Organization
  - Joint Atlas/EDG/LCG effort
- Resources used (and functions)
  - Sites: (CERN, RAL, Lyon, Nikhef, CNAF) + (Karlsruhe)
  - Several UIs: Milan, CERN, Cambridge
  - RB: CERN (shared)
  - RC: Originally shared with CMS. Finally a separate one at CNAF

CMS
- Specific Goals
  - Aim for as many simulated events as possible for physics analysis, with 1000’s of ‘short’ event generation and ‘long’ detector simulation jobs, using the full production system
  - Measure performance, efficiencies and reasons for job failures to give detailed feedback to middleware in a detailed report
- Organization
  - This was a joint effort involving CMS, EDG, EDT and LCG people
- Resources used (and functions)
  - Sites: (CERN, RAL, Lyon, Nikhef, CNAF) + (Legnaro, Padova, Ecol. Poly, I.C)
  - Several UIs: CNAF, Padova, Ecol. P, I.C
  - Several RBs: CNAF (CMS), CNAF (shared), CERN (CMS), IC (CMS+Babar)
  - RC: Originally shared with Atlas. Finally a separate one at CNAF
8
History: relating applications work to TB versions

Version   Date
1.1.2     27 Feb 2002
1.1.3     02 Apr 2002
1.1.4     04 Apr 2002
1.2.a1    11 Apr 2002
1.2.b1    31 May 2002
1.2.0     12 Aug 2002
1.2.1     04 Sep 2002
1.2.2     09 Sep 2002
1.2.3     25 Oct 2002
1.3.0     08 Nov 2002
1.3.1     19 Nov 2002
1.3.2     20 Nov 2002
1.3.3     21 Nov 2002
1.3.4     25 Nov 2002
1.4.0     06 Dec 2002
1.4.1     07 Jan 2003
1.4.2     09 Jan 2003
1.4.3     14 Jan 2003

Annotations on the timeline:
- RC Changes; Mixed Globus 2.0/2.2; RB/JSS Upgrade
- Known Problems: GASS Cache Coherency; Race Conditions in Gatekeeper; Unstable MDS
- Successes: Improved MDS Stability; FTP Transfers OK. Known Problems: Interactions with RC
- Real Use by Applications! Limitations: Resource Exhaustion; Size of Logical Collections
- Successes: Matchmaking/Job Mgt.; Basic Data Mgt. Known Problems: High Rate Submissions; Long FTP Transfers
- ATLAS commence phase1 tests
- CMS start stress tests Nov 30 which continue till Dec 20
- Problems with long jobs; Instability in MDS; Long file transfers unreliable
- CMS and Atlas evaluate 1.4.3
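The release history on this slide lends itself to a simple cadence calculation. The following is a minimal sketch, assuming the version/date pairs read off the slide are correct; the helper names (`parse_date`, `release_gaps`) are hypothetical and are not part of any EDG tool.

```python
from datetime import date

# Version/date pairs as they appear on the slide (DDMonYYYY).
RELEASES = [
    ("1.1.2", "27Feb2002"), ("1.1.3", "02Apr2002"), ("1.1.4", "04Apr2002"),
    ("1.2.a1", "11Apr2002"), ("1.2.b1", "31May2002"), ("1.2.0", "12Aug2002"),
    ("1.2.1", "04Sep2002"), ("1.2.2", "09Sep2002"), ("1.2.3", "25Oct2002"),
    ("1.3.0", "08Nov2002"), ("1.3.1", "19Nov2002"), ("1.3.2", "20Nov2002"),
    ("1.3.3", "21Nov2002"), ("1.3.4", "25Nov2002"), ("1.4.0", "06Dec2002"),
    ("1.4.1", "07Jan2003"), ("1.4.2", "09Jan2003"), ("1.4.3", "14Jan2003"),
]

# Explicit month table so parsing does not depend on the system locale.
MONTHS = {m: i + 1 for i, m in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"])}


def parse_date(text):
    """Parse a 'DDMonYYYY' string such as '27Feb2002' into a date."""
    return date(int(text[5:]), MONTHS[text[2:5]], int(text[:2]))


def release_gaps(releases):
    """Return (version, days since the previous release) for each release after the first."""
    dates = [parse_date(d) for _, d in releases]
    return [(releases[i][0], (dates[i] - dates[i - 1]).days)
            for i in range(1, len(releases))]


gaps = release_gaps(RELEASES)
total_days = (parse_date(RELEASES[-1][1]) - parse_date(RELEASES[0][1])).days
longest = max(gaps, key=lambda g: g[1])

print(f"{len(RELEASES)} releases over {total_days} days")
print(f"Longest wait: {longest[1]} days before {longest[0]}")
```

The gaps make the pattern on the slide explicit: long intervals before the 1.2.x series, then a rapid sequence of releases in November 2002 and January 2003 while the Atlas and CMS tests were running.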
9
Atlas evaluations (August and Dec/Jan) (DETAILED PAPER IN PREPARATION)
- RESULTS (see Atlas jobs in DEMO tomorrow)
  - Atlas software was used in the EDG Grid environment
  - Several hundred simulation jobs of length 4-24 hours were executed, data was replicated using grid tools
  - Results of simulation agreed with ‘non-Grid’ runs
- OBSERVATIONS
  - Good interaction with EDG middleware providers and with WP6/8
  - With a substantial effort it was possible to perform the jobs
  - Showed up bugs and performance limitations (fixed or to be fixed in EDG 2.0)
    - WP1: Many ‘Long Jobs’ failed (now much better)
    - WP2: Replication Tools were difficult to use and unreliable
    - WP3: Information Service based on MDS gave poor performance (affected WP1)
    - WP4: We need to separate out application and system software installations (fixed in 1.4.3)
  - We need EDG 2.0 release for use in large scale data challenges
- RECOMMENDATIONS (see combined ATLAS/CMS recommendations…)
10
CMS production components interfaced to EDG middleware (more details in DEMO)
[Diagram. Components shown: UI running IMPALA/BOSS (CMS production tools on UI: job creation, job submission and monitoring); RefDB; BOSS DB; Workload Management System; Replica Manager; several CEs with WNs (CMS software, RPM-based, installed on CEs/WNs); several SEs. Arrow labels: parameters, JDL, data registration, job output filtering, runtime monitoring, input data location. Legend: push data or info; pull info.]
11
Main results and observations from CMS work (detailed doc in preparation)
- RESULTS
  - Could distribute and run CMS s/w in EDG environment
  - Generated ~250K events for physics with ~10,000 jobs in 3 week period
- OBSERVATIONS
  - Were able to quickly add new sites to provide extra resources
  - Fast turnaround in bug fixing and installing new software
  - Test was labour intensive (since software was developing and the overall system was fragile)
    - WP1: At the start there were serious problems with long jobs, recently improved
    - WP2: Replication Tools were difficult to use and not reliable, and the performance of the Replica Catalogue was unsatisfactory
    - WP3: The Information System based on MDS performed poorly with increasing query rate
    - The system is sensitive to hardware faults and site/system mis-configuration
    - The user tools for fault diagnosis are limited
  - EDG 2.0 should fix the major problems (see talks by R Jones and E Laure), providing a system suitable for full integration in distributed production
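The headline CMS numbers on this slide imply some rough average rates. The sketch below treats the approximate figures (~250K events, ~10,000 jobs, 3 weeks) as exact, which is an assumption made only for illustration. The derived averages are not quoted in the presentation, and because the test mixed ‘short’ and ‘long’ jobs, a per-job average is indicative at best.

```python
# Approximate figures from the slide, treated as exact for this estimate.
events = 250_000
jobs = 10_000
days = 3 * 7  # "3 week period"

# Derived averages (illustrative only).
events_per_job = events / jobs
jobs_per_day = jobs / days
events_per_day = events / days

print(f"Average events per job: {events_per_job:.0f}")
print(f"Average jobs per day:   {jobs_per_day:.0f}")
print(f"Average events per day: {events_per_day:.0f}")
```

An average of several hundred job submissions per day gives a sense of the load under which the high-rate submission and MDS query-rate problems noted above were observed.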
12
CMS event production in December 2002 using EDG software and applications TB
[Plot: number of events vs. time]
http://cmsdoc.cern.ch/cms/production/www/html/general/index.html
13
CMS/EDG Summary of Stress Test: Preliminary Analysis
[Charts. Panel titles: Short jobs; Long jobs; After Stress Test, Jan 03]
14
EDG reasons for failure (categories)
Preliminary analysis of pre-Xmas (1.4.0)
[Chart of failure categories]
15
Joint recommendations from Atlas/CMS work
- There are essential developments (see EDG 2.0) needed in
  - Data Management (robustness and functionality)
  - Information Systems (robustness and scalability)
  - Workload Management (scalability for high rates, batch submissions, output file specification)
  - Mass Storage Support (gridified support due in EDG 2.0)
- We must maintain and strengthen joint Experiment/EDG work in the evaluation of system components AND the architecture (both will need to evolve; GRID developments are R/D)
  - Once middleware providers have done their ‘unit tests’ the applications must work with them in the areas of:
    - Performance evaluation for the user with increasing rates of job submission and data handling, and an expanding TB configuration
    - Streamlining procedures for feedback to middleware providers
- EDG should provide site validation and monitoring procedures
- EDG should provide good user tools for fault detection and diagnosis (what is job status? why did it fail? …)
16
Some key points of work in the other experiments
- ALICE
  - Developed scripts for the installation of ALICE software on EDG/CEs
  - Developed a WEB interface to automatically submit jobs to the testbed and evaluate its "efficiency" (currently in use)
  - Current development of the AliEn/EDG interface (included effort from DataTAG)
    - Able to send jobs to EDG via AliEn
    - Currently completing the tests for registering/accessing data on/from both catalogues (AliEn and EDG), which is required for the interoperability
- LHCb
  - Consolidation of basic job submission capability demonstrated at EU review, and at the opening of National E-science Center, Edinburgh, 25 April
  - Made RPMs for LHCb environment
  - Included DataGrid in new LHCb distributed production system (DIRAC) and demonstrated that short DataGrid jobs can be submitted and managed via DIRAC
17
Some key points of work in the other experiments (continued)
- Babar
  - Deployment of the BaBar VO:
    - VO and RC at Manchester, RB at IC
    - CE/SE/WN at SLAC, In2p3, RAL and Ferrara
  - Deployment and adaptation of EDG software at SLAC (the EDG scripts had to be modified for the WN inside the Internet Free Zone)
  - Successfully tested BaBar analysis and simulation jobs within the EDG framework
  - Next step is to run full scale analysis on the Grid
- D0
  - A D0 replica catalogue and VO server have been set up at Nikhef
  - A 124 CPU farm has been successfully used with EDG s/w
  - D0 support was added to the official EDG release, and several sites now support D0 jobs and have installed the RPMs
  - Will try the newer release (and true Grid production) when RH 7.2 support appears
18
The key content for D8.3 ‘Testbed assessment for HEP applications’
- ‘Datagrid as HEP production environment’
  - Detailed evaluations of Atlas and CMS Task Forces
  - Evaluations by other LHC experiments (Alice, LHCb)
  - Evaluations from non-LHC experiments (Babar, D0)
- Mapping of evaluations to the ‘common use cases’
  - General use cases
  - Data management
  - Job management
  - VO management
- Summary of lessons learned for future EDG development, and statement of priorities for the experiments
19
The planning for the 3rd project year, and associated issues
- PLANNING
  - Continue work with experiments using the successful Task Force Model for Data Challenges
  - Complete D8.3 for end March 2003 (based on release 1.4.3)
  - Continue architecture work in ATF, and participate in LCG use case/architecture activities
  - Evaluate EDG 2.0 software, and port it to experiment software environments for use in the data challenges
  - Complete D8.4 by Dec 2003 (based on release 2.x)
- SOME IMPORTANT ISSUES
  - Must organise detailed test sessions involving experiments and the providers of middleware for information systems, data management and mass storage handling in the context of moving to EDG 2.0
  - We look for improved diagnostic information from middleware in case of problems
  - WP8 will work increasingly with experiments rather than in generic testing, which will be taken up by the WP6 Testing Group
  - We must relate EDG/WP8 work to the use by experiments of the forthcoming LCG Prototype, in terms of software, hardware and user support
  - We should re-activate the inter-application WG (8+9+10)