
1 SA3 Report
Markus Schulz, EGEE-II SA3 Activity Leader, IT Department, CERN
1st EU Review of EGEE-II, CERN, 15-16 May 2007
EGEE-II INFSO-RI-031688, Enabling Grids for E-sciencE (www.eu-egee.org, www.glite.org)

2 Outline
(Enabling Grids for E-sciencE, EGEE-II INFSO-RI-031688, 1st EU Review, May 15-16, 2007)
Activity Goals
Main Achievements
Status
– Integration and Release Management
– Testing
– Interoperability
– Porting
Issues for SA3
Future Plans
Summary

3 SA3 in Numbers
EGEE-II budget
Manpower: 12 partners, 9 countries, 30 FTE

4 Activity Goals
Manage the process of building middleware distributions
– Integrate middleware components from a variety of sources
– Define acceptance criteria for components
– Test and certify middleware
  – Ensure reliability, robustness, scalability, security and usability
– Decouple middleware distributions from middleware development
– Software selection and priorities to be set by the TCG
SA3 is a new activity
– Its tasks were covered by SA1 and JRA1 during EGEE

5 Tasks
Integration and Packaging
Testing and Certification
– Functional and Stress Testing
– Security and Vulnerability Testing
– Operation of the Certification and Testing Test Beds
– Project Testing Coordination
Debugging, Analysis, Support
Interoperation
Requirements capture, support for porting, and contribution to standardization
Details of the resource allocation can be found in the Execution Plan.

6 Achievements
Integrated release of LCG-2.7 and gLite-1.5
– Different build systems
– Different configuration management
– Different, overlapping functionality
– Different processes
  – LCG-2 process tailored to production
  – gLite process tailored to rapid development
Released on May 4th
– 4 days later than planned
[Figure: timeline 2004-2006 in which LCG-2 (prototyping, then product) and gLite (product) converge into gLite 3.0]

7 Achievements II
Introduced a new software lifecycle process
– Based on experience with the gLite lifecycle and LCG practice
– No “big bang” releases
  – Components are updated independently
– Component updates delivered to PPS on a weekly basis
  – Every second week to production
– Acceptance criteria for new components defined
– Clear link between component versions, Patches and Bugs
  – Semi-automatic production of release notes
– Clear prioritization by stakeholders
  – TCG for medium-term and EMT for short-term goals
– Documented in MSA3.2
– In use since July 2006
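Because every Patch is linked to a component version and to the Bugs it fixes, release notes can be generated largely from tracker data. A minimal sketch, assuming hypothetical `Patch` and `Bug` records (the field names are illustrative, not the real tracker schema):

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Bug:
    bug_id: int
    summary: str


@dataclass
class Patch:
    # Hypothetical record: one Patch ships one component version and fixes some Bugs.
    patch_id: int
    component: str
    version: str
    bugs: list[Bug] = field(default_factory=list)


def release_notes(patches: list[Patch]) -> str:
    """Render plain-text release notes: one entry per Patch, then the Bugs it fixes."""
    lines: list[str] = []
    # Sort by component name so the notes read the same way from one update to the next.
    for patch in sorted(patches, key=lambda p: p.component):
        lines.append(f"{patch.component} {patch.version} (Patch #{patch.patch_id})")
        for bug in patch.bugs:
            lines.append(f"  - Bug #{bug.bug_id}: {bug.summary}")
    return "\n".join(lines)


if __name__ == "__main__":
    update = [
        Patch(2, "WMS", "3.1.1", [Bug(10, "proxy renewal fails")]),  # hypothetical data
        Patch(1, "BDII", "3.0.2"),
    ]
    print(release_notes(update))
```

The "semi" in semi-automatic would then be a human editing the generated text, for example to add upgrade instructions that the tracker does not hold.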

8 Achievements III
Test strategy, process, framework and external testbeds
– SAM framework for automated testing (SA1 product)
– Central repository for tests
– Formal follow-up on test development
– More test cases (greater depth)
  – Distributed approach
  – Tests mostly developed by partners
– Formal process of Patch certification
– Extended test beds
  – 8 sites, roughly 100 nodes
  – External partners cover additional deployment scenarios
– Extensive use of virtualized test beds
– Introduced the concept of “Experimental Services”
  – Massive scalability tests can’t be conducted on test infrastructures

9 Status: Integration and Release Management
Preproduction ensures user input and large-scale testing
TCG prioritization is driven by users, sites, developers, and operations; short-term planning via EMT

10 Basic Concepts
Two distinct entities are tracked by the process: Problems and Solutions
– Problems = Bugs
– Solutions = Bug Fixes = Patches
– New features are tracked as “Enhancements”
  – A missing feature is a Problem
The process defines for these entities:
– States and conditions for state transitions
– Roles and responsibilities of actors
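States, transition conditions and roles together form a small state machine. A minimal sketch follows; the state names loosely follow the process slides (integration, certification, PPS, production, rejection), while the role names and the exact transition table are hypothetical and not the process documented in MSA3.2:

```python
# Hypothetical transition table: (from_state, to_state) -> role allowed to make the move.
# States loosely follow the slides; role names are illustrative assumptions.
TRANSITIONS = {
    ("Integration", "Certification"): "integrator",
    ("Integration", "Rejected"): "integrator",
    ("Certification", "PPS"): "certifier",
    ("Certification", "Rejected"): "certifier",
    ("PPS", "Production"): "release_manager",
    ("PPS", "Rejected"): "release_manager",
}


class TransitionError(Exception):
    """Raised when a Patch state change is not allowed."""


def advance(state: str, target: str, role: str) -> str:
    """Return the new state if `role` may move a Patch from `state` to `target`."""
    required = TRANSITIONS.get((state, target))
    if required is None:
        raise TransitionError(f"{state} -> {target} is not a legal transition")
    if role != required:
        raise TransitionError(f"{state} -> {target} requires {required!r}, not {role!r}")
    return target


if __name__ == "__main__":
    state = "Integration"
    steps = [("Certification", "integrator"), ("PPS", "certifier"), ("Production", "release_manager")]
    for target, role in steps:
        state = advance(state, target, role)
    print(state)  # Production
```

Keeping the rules in one table means the same data can drive both enforcement and the documentation of who is responsible for each step.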

11 Releases and Updates
A software release is a set of packages (baseline)
– These packages are continuously updated by Patches
The baseline contains a core
– Changes to the core break the backward compatibility of the release
  – At the software level rather than at the service level
– Changes to the core require a new release
– VDT-1.6, globus4, + SL4 as reference OS == gLite-3.1
– VDT-1.2, globus2, + SL3 as reference OS == gLite-3.0
All Patches that have passed the Preproduction state by a given date form an update to the release
– No fix or enhancement has to wait for other components
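The update rule (every Patch that has passed Preproduction by a cutoff date joins the next update, while changes to the core need a new release) can be sketched as a simple filter. The `Patch` fields, the `CORE_COMPONENTS` labels and the dates are hypothetical:

```python
from datetime import date
from typing import Iterable, List, NamedTuple, Optional, Set, Tuple

# Hypothetical labels for components that belong to the release core
# (for gLite-3.1 the slides name VDT-1.6, globus4 and SL4 as reference OS).
CORE_COMPONENTS = {"VDT", "globus", "OS"}


class Patch(NamedTuple):
    patch_id: int
    component: str
    passed_pps: Optional[date]  # None while the Patch has not yet passed Preproduction


def form_update(
    patches: Iterable[Patch], cutoff: date, released: Set[int]
) -> Tuple[List[int], List[int]]:
    """Split eligible Patches into (update, needs_new_release).

    A Patch is eligible if it passed PPS on or before `cutoff` and was not part
    of an earlier update. Core changes cannot go into an update of this release.
    """
    update: List[int] = []
    needs_new_release: List[int] = []
    for p in patches:
        if p.passed_pps is None or p.passed_pps > cutoff or p.patch_id in released:
            continue
        if p.component in CORE_COMPONENTS:
            needs_new_release.append(p.patch_id)
        else:
            update.append(p.patch_id)
    return sorted(update), sorted(needs_new_release)
```

Since eligibility depends only on each Patch's own date, a fix that is ready never waits for an unrelated component; a Patch rejected in PPS simply never gets a `passed_pps` date.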

12 Simplified Process
[Diagram: actors TCG, SA3 and Software Providers (JRA1, VDT, ...), covering new components and major changes. Labelled flows: requests changes; endorses SA3 proposals; negotiates with providers; proposal for TCG; starts change by Bug creation]

13 Simplified Process
Prioritization: EMT twice a week, TCG every second week
Software Providers (JRA1, VDT, ...) deliver to SA3 Integration and SA3 Configuration
SA3 Test Process: continuous, with several Bugs and Patches in parallel
– Bug and Patch processing
– Installation tests
– Functional tests
– Patch-specific tests
– Scalability tests
– Tests on external testbeds
– Rejected Patches leave the flow
Once a week, Patches that pass certification move to the PPS
– SA1 updates and operates the PPS; users test and can reject Patches
Every second week, Patches are moved to Production
– SA1 updates and operates the Production Service
The SA3 Release Manager coordinates the process
Experimental Services use the production service; users run stress tests
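The weekly PPS and fortnightly Production cadence can be modelled with integer week numbers to see when a certified Patch reaches users. This is a sketch under assumptions the slides do not state: a Patch certified in week w enters PPS in week w + 1, must spend at least one week in PPS, and Production updates happen only in even-numbered weeks:

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def schedule(certified_week: int, min_pps_weeks: int = 1) -> Tuple[int, int]:
    """Return (PPS week, Production week) for a Patch certified in `certified_week`.

    Assumptions (not from the slides): PPS is updated at the start of every week,
    Production only in even-numbered weeks, and a Patch must stay in PPS for at
    least `min_pps_weeks` weeks before it can go to Production.
    """
    pps_week = certified_week + 1
    earliest = pps_week + min_pps_weeks
    prod_week = earliest if earliest % 2 == 0 else earliest + 1
    return pps_week, prod_week


def production_updates(certified: Dict[int, int]) -> Dict[int, List[int]]:
    """Group Patch ids (mapped to their certification week) by Production week."""
    updates: Dict[int, List[int]] = defaultdict(list)
    for patch_id, week in sorted(certified.items()):
        _, prod_week = schedule(week)
        updates[prod_week].append(patch_id)
    return dict(updates)


if __name__ == "__main__":
    print(production_updates({1: 3, 2: 4, 3: 5}))  # {6: [1, 2], 8: [3]}
```

Under these assumptions a Patch reaches Production two or three weeks after certification, and Patches certified in adjacent weeks are batched into one production update.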

14 Usage
The process has been in active use since July 2006
Produced 23 updates to the production system
– 26 since May 2006
Processed 269 patches
– Addressing 835 Bugs

15 Statistics
[Chart: Patch statistics for LCG-2.7, gLite-3.0 and gLite-3.1]
The ratio between Patches in Configuration/Certification and those in PPS indicates that the change rate is above what SA3 can handle
YAIM Patches are due to the merger
Patch counts reflect the activity in an area

16 Statistics
We have to manage a very large number of different node types

17 Configuration Management
gLite-1 configuration based on XML and Python
LCG-2 configuration based on key-value pairs + bash
– YAIM
Site administrators preferred YAIM (result of a survey)
– Wrappers for gLite components
– Process started to move to single-layer configuration
  – FTS, WN, UI, and WMS are already in single-layer mode
Installation tools
– APT for (semi-)automatic RPM updates
– RPM lists for other tools
– Tarballs for UIs and WNs
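The key-value style that site administrators preferred can be illustrated by reading a file of bash assignments, as YAIM does with its site configuration file (site-info.def). YAIM itself sources the file with bash; the Python sketch below only handles plain `NAME=value` lines and `$VAR`/`${VAR}` references, and the variable names and values in the example are illustrative assumptions:

```python
from __future__ import annotations

import re
from string import Template

# A plain bash assignment: NAME=value (no spaces around '=').
ASSIGNMENT = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def parse_site_info(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, expanding $VAR / ${VAR} from earlier keys."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = ASSIGNMENT.match(line)
        if match is None:
            continue  # skip anything that is not a plain assignment (functions, ifs, ...)
        key, value = match.groups()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            quote, value = value[0], value[1:-1]
            if quote == "'":
                values[key] = value  # bash does not expand inside single quotes
                continue
        # Unknown variables are left untouched because safe_substitute does not raise.
        values[key] = Template(value).safe_substitute(values)
    return values
```

Given `MY_DOMAIN=example.org` followed by `CE_HOST=ce01.$MY_DOMAIN`, the result maps `CE_HOST` to `ce01.example.org`. Unlike bash, an undefined variable stays literally in the value, which makes typos easy to spot.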

18 Build Systems
Currently 3 systems in use
LCG build system for legacy components
– To be phased out during the year
gLite build system
– Used for the gLite-3.0 branch
ETICS
– Used for the gLite-3.1 branch
– Migration to ETICS started in early August
  – Requires a large fraction of SA3 integration resources
– Will be finished around August 2007

19 Testing
Test plans and process documented in MSA3.5
Test strategy
– Multi-level tests (from simple quick tests to stress tests)
  – Abort as early as possible
– As many steps in parallel as possible
  – Component by component: install, configure, functional tests, first Patch certification
  – Requires many temporary testbeds; we use virtualization (Xen based) to save time and resources
– Automate as much testing as possible
  – But first ensure coverage
– First local, then external testbeds
– Moved towards testing components against a “Baseline Release”
  – Required significant reorganization of the testbed operation
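The strategy above combines two ideas: run cheap levels before expensive ones and abort at the first failing level, while running independent tests within a level in parallel. A minimal sketch, assuming each test is a hypothetical zero-argument function returning `True` on success:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, NamedTuple, Tuple

Check = Callable[[], bool]


class Level(NamedTuple):
    name: str
    tests: List[Check]  # independent tests that may run in parallel


def run_levels(levels: List[Level]) -> List[Tuple[str, List[str]]]:
    """Run levels cheapest-first; tests within a level run concurrently.

    Returns (level name, names of failed tests) for every level that ran and
    stops after the first level with a failure, so expensive stages are skipped.
    """
    results: List[Tuple[str, List[str]]] = []
    with ThreadPoolExecutor() as pool:
        for level in levels:
            outcomes = list(pool.map(lambda test: test(), level.tests))
            failed = [t.__name__ for t, ok in zip(level.tests, outcomes) if not ok]
            results.append((level.name, failed))
            if failed:
                break  # abort as early as possible
    return results
```

A level ordering such as installation, then functional tests, then stress tests means a broken installation is reported within minutes instead of after a long stress run.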

20 Testing Framework
We have chosen SAM as our testing framework
– Maintained and used by SA1
– Several tests can be used in both certification and production
– Tests need very little modification
  – The concept is compatible with testing in ETICS, so porting is easy
– Provides web-based, customizable views and history

21 Test Status
Test development mainly by partners
– Partners signed up for tests
– Progress monitored and documented every 2 weeks
– Steady progress
New class of tests: security testing
– Done by Poznan
  – Code reviews (VOMS and R-GMA)
  – Penetration tests
  – Independent testbed
Interoperability tests
– Not yet integrated into the test process

22 Test Status
Component | Available tests | Responsible | SAM
BLAH | 5 tests for basic functionality | INFN | no
Batch systems | Torque has been tested and scripts developed; partners should extend these tests to other batch systems | GRNET (Torque), INFN (LSF), PIC (Condor), CESGA (SGE) | no
CE | 19 SAM tests | CERN | yes
gLite CE | 19 SAM tests & manual test result page | CERN | yes
DGAS | 5 tests | INFN | no
DPM | 41 tests | CERN | no
FTS | 7 tests | CERN | yes
Information System | 1 basic test & GIS mon & performance and scalability tests | INFN (until April 2007), CERN | no
LB | 4 functionality tests | University of Brussels | no
LFC | 2 SAM tests, 2 API tests, LFC performance test page | CERN, LAL | yes, no
MyProxy | 1 SAM test | CERN | yes

23 Test Status
Component | Available tests | Responsible | SAM
RGMA | RGMA test results page, source code auditing | TCD, PSNC | no
RB | The old edg-tests suite is used occasionally | CERN | no
SE | 3 SAM tests | CERN | yes
SRM v.2 | S2 test suite, SRM2 test in DPM test suite | CERN | yes
UI | Extensive test suite testing all commands listed in the LCG User Guide (30+) | CERN | no
WN | Most of the UI tests are also applicable to the WN | CERN | no
VOMS | 28 tests, VOMS source code auditing | CERN, PSNC | yes
WMS | Tests for bulk submission, interactive jobs and parametric jobs; gLite version of edg-tests; WMS test results page | CERN, IMPERIAL (since April 2007), CSIC (WMProxy) | partly

24 Test Beds
Virtual testbeds for individual testers (about 5)
Dynamically allocated test nodes (> 50 nodes)
Central certification testbed

25 Test Beds
External testbeds linked to the certification testbed
– CESGA (SGE)
– PIC (Condor)
– GRNET (Torque)
– UCY (Torque)
– INFN (LSF)
– LAL (DPM, LFC)
– DESY (dCache)
Standalone testbeds
– Poznan (Security)
– IMPERIAL (WMS)
– TCD (Porting)
Setup and coordination took a long time; the last site joined at the end of 2006.

26 Interoperability
OSG
– In production for almost 10 months
  – Used extensively by CMS
– Interoperability testbed in preparation
ARC
– Problem has been analysed in depth
– Plan documented in MSA3.4
– First prototype exists
  – Still a long way to go
  – ARC’s focus is on the CREAM CE

27 Interoperability
UNICORE
– Problem has been analysed in depth
  – Very complex
  – Minimal overlap between concepts
– Plan documented in MSA3.3
– First components exist
  – Slower progress than expected in the plan
  – Proof-of-principle tests have been successful
NAREGI
– Close contact during 2006
– NAREGI demonstrated a first set of interoperable tools

28 Interoperability
GIN info
– Links the information systems of many grid infrastructures
  – The maps of the grid infrastructures are based on the GIN-BDII
[Diagram: per-grid providers (EGEE, OSG, NDGF, NAREGI, TeraGrid, PRAGMA) built on a Generic Information Provider feed the GIN BDII, alongside an ARC BDII; the data sources are EGEE, OSG and NDGF sites and the NAREGI, TeraGrid and PRAGMA grids]
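The GIN-BDII idea (one aggregated view fed by one provider per grid) can be illustrated with a small in-memory sketch. It does not speak LDAP or the GLUE schema as a real BDII does; the provider functions, record fields and site names are hypothetical:

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Provider = Callable[[], List[Record]]


def aggregate(providers: Dict[str, Provider]) -> List[Record]:
    """Merge records from per-grid providers into one list tagged with the grid name."""
    merged: List[Record] = []
    for grid, provider in sorted(providers.items()):
        try:
            records = provider()
        except Exception:
            # One unreachable grid should not empty the whole view; skip it.
            continue
        merged.extend({**record, "grid": grid} for record in records)
    return merged
```

Isolating each provider behind the same interface is what lets grids with very different native information systems appear side by side on a single map.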

29 Interoperability
SA3 is participating actively in the GLUE standardization process
The process has moved to OGF
– An SA3 member is co-chairing the working group

30 Porting
Main partners are TCD and Poznan
Problems with porting
– Software dependencies and interdependencies
  – Addressed by the “Plan for gLite restructuring”
– Up to now only “post-release” porting
  – Difficult to follow the change rate
  – Other platforms have to be supported at release time
– TCD is moving to ETICS
  – Better support for concurrent multi-platform builds and tests
  – https://twiki.cern.ch/twiki/bin/view/EGEE/PortingWithEtics

31 Porting
Status table at TCD:
– http://cagraidsvr06.cs.tcd.ie/autobuild

32 Issues
Slow start of the activity
– New activity
– Recruitment took 3 months to complete
– Several partners required training
Merging 2 middleware stacks, tool sets and processes
– While keeping changes flowing to production
– Very difficult; done under high pressure from the applications
Introducing change while supporting a production service
– More than 200 individual updates
– How to handle major changes such as the move to ETICS?
– The current resource level is adequate to support a steady state

33 Issues
Testing
– Partner contributions started slowly
  – Testbeds needed more than 6 months (hiring, hardware procurement, etc.)
– Most tests still originate from the local team
– Introduced more frequent communication
  – Phone conferences
  – Formal follow-up on status
Interoperability
– Underestimated the complexity of UNICORE interoperation
  – Plan to be reviewed at next month’s meeting
– ARC struggled with some technical issues
  – But mainly a partner issue
  – Plan to be reviewed at the next All Hands meeting

34 Issues
Move to ETICS
– Will be very beneficial when achieved
– Significant upfront investment and training
– ETICS is now maturing quickly
  – But the relative timing of the two projects was problematic

35 Plans
Complete the move to the ETICS build system
– A significant investment for the next 6 months
Move install and configuration tests to ETICS
Automate more test cases with SAM
Move to single-layer, component-centric configuration tools (component YAIM)
– Well underway, in certification
Support at least 2 additional platforms for all releases
– To be defined by the TCG
– Can be restricted to some components (UIs, WN)
Contribute to the “gLite Restructuring Plan”

36 Summary
SA3 got off the ground
Integrated LCG-2.7 and gLite-1.5
Defined and implemented the Software Life Cycle process
– Component-based updates work! (269 patches since June)
Test process defined and implemented
– Many additional tests
– Common framework with SA1 (SAM)
– External testbeds to cover deployment scenarios
Move to ETICS is well underway
– Will improve portability
Interoperation made visible progress
– OSG interoperation used on a daily basis

