EGEE is a project funded by the European Union under contract IST

“ARDA input for the testing coordination meeting”
Massimo Lamanna
CERN, 7 September
cern.ch/lcg
LCG ARDA status Massimo Lamanna

TOC

ARDA in a nutshell
Testing meeting
ARDA and testing
Main Risk
Examples
Summary
ARDA in a nutshell

ARDA is an LCG project whose main activity is to enable LHC analysis on the grid
ARDA is coherently contributing to EGEE NA4 (using the entire CERN NA4-HEP resource)
Use the grid software as it matures (EGEE project)
ARDA should be the key player in the evolution from LCG2 to the EGEE infrastructure
Provide early and continuous feedback (guarantee the software is what the experiments expect/need)
Use the last years' experience/components both from Grid projects (LCG, VDT, EDG) and from the experiments' middleware/tools (Alien, Dirac, GAE, Octopus, Ganga, Dial, …)
Help in adapting/interfacing (direct help within the experiments)

Every experiment has different implementations of the standard services, but these are:
Used mainly in production environments
  – Few expert users
  – Coordinated update and read actions

ARDA
Interface with the EGEE middleware
Verify such components in analysis environments (and help them evolve towards these)
  – Many users (robustness might be an issue)
  – Concurrent “read” actions (performance will be more and more an issue)

One prototype per experiment
A Common Application Layer might emerge in future
ARDA emphasis is to enable each of the experiments to do its job
About 2 FTEs per prototype

Provide a forum for discussion
Comparison of results/experience/ideas
Interaction with other projects
…

Experiment interfaces:
Piergiorgio Cerello (ALICE)
David Adams (ATLAS)
Lucia Silvestris (CMS)
Ulrik Egede (LHCb)
The experiment interfaces agree the work plan with the ARDA project leader and coordinate the activity on the experiment side (users)
Status

ALICE
Grid activity:
  – Use of the gLite testbed
  – Access system to gLite services being developed (demo available in June 2004)
  – This is the key layer that allows the ALICE software to use their prototype effectively (evolution of their 2003 system, using gLite and PROOF)
Other contributions:
  – The access system is a generic piece of software (plans to be used in ATLAS and metadata access)
  – Tests of the metadata capabilities of the gLite file catalogue
  – ALICE IO (aiod) gLite IO

ATLAS
Grid activity:
  – Use of the gLite testbed
  – DIAL on gLite OK (evolution of the DIAL demo)
  – ATHENA to gLite OK. Ready to expose to test users.
  – First skeleton of high-level services
  – Detailed studies of the Don Quijote system (ATLAS data management system)
Other contributions:
  – Detailed collaboration on the AMI database (performance, support for the Oracle implementation, …)
  – Production and Combined Test Beam contributions

CMS
The CMS system within ARDA is not yet 100% defined
Grid and other contributions:
  – Use of the gLite testbed
  – Successful ORCA job submission to gLite. Investigating with the package manager
  – Access to files directly from CASTOR
  – gLite file catalog tests
  – RefDB studies and evolution

LHCb
Grid activity:
  – Use of the gLite testbed
  – “Regular” DaVinci jobs onto gLite exposed to test users (outside the ARDA team)
  – DaVinci jobs from Ganga to gLite
Other contributions:
  – GANGA release management and software process (CVS, …)
  – Contributions to DIRAC
  – Metadata catalogue (performance tests also in Taiwan)
Questions for this meeting

Test case definitions:
  – NA4 and JRA1 have been doing some work on this. Can we use a common format and definition?
  – Are common templates feasible? Would people prefer to use/manage their own?
  – Can we work together on definitions of test suites for the various gLite components?

Common CVS and test RPMs:
  – Is there any interest in using a common CVS repository for test cases and test case definitions? Will other teams maintain a separate CVS?
  – Should we try to build RPMs containing test scripts from all teams, or do people want to produce their own independent tests and package them as they like?
  – Application-specific code should not be part of this test code, only middleware-specific test code.

Tools and frameworks (questions for all to answer):
  – JRA1 will present what they have evaluated so far and where they are
  – We would like to know if anyone else is using anything, or if there is interest in using a common framework to manage testing.
  – Should we try to standardize/agree on a common output format for tests, to present the results of all testing in a common framework?
  – Perhaps the different testing activities should all remain fairly independent, but if we agree on a reporting format then many test suites from the different activities could be run within one framework
  – Explore solutions/tools for asynchronous testing

How do we work together?
  – What are people's expectations? Requirements, desires?
  – ARDA in particular: how do they see themselves working with us?

Languages:
  – Propose no enforcement (it would not work)
  – The test team will endeavour to use WSDL definitions and write Python scripts, but we will not always have WSDL; sometimes only a CLI or C/C++ APIs
  – The output format for reporting is where we could come up with a common definition

Test reporting and results:
  – Do we want a common reporting format? What do others have already?
  – JRA1 should propose a common reporting/coverage format

Test writing:
  – Do we want to try to divide the work, i.e. different teams look at different types of tests?
  – Are we happy to have different teams doing overlapping work?
  – Prioritization of test suites
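
The question about a common reporting format can be made concrete with a minimal sketch. It assumes, purely for illustration, that each team emits one JSON object per test result and that a shared framework merges them. The field names, the status vocabulary and the example component name are hypothetical; none of them has been agreed by NA4 or JRA1.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass

# Hypothetical outcome vocabulary; the real one would have to be agreed.
ALLOWED_STATUSES = {"pass", "fail", "error", "skipped"}


@dataclass
class ResultRecord:
    """One test outcome in the (assumed) common reporting format."""
    team: str          # e.g. "NA4" or "JRA1"
    component: str     # middleware component under test (illustrative name)
    test_id: str       # identifier chosen by the team that owns the test
    status: str        # one of ALLOWED_STATUSES
    duration_s: float = 0.0
    message: str = ""

    def __post_init__(self):
        if self.status not in ALLOWED_STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")


def dump_results(results):
    """Serialise results as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in results)


def load_results(text):
    """Parse JSON Lines back into records, skipping blank lines."""
    return [ResultRecord(**json.loads(line))
            for line in text.splitlines() if line.strip()]


def summarise(results):
    """Count outcomes per component, whichever team produced them."""
    summary = {}
    for r in results:
        summary.setdefault(r.component, Counter())[r.status] += 1
    return summary


if __name__ == "__main__":
    na4 = [ResultRecord("NA4", "file-catalogue", "fc-001", "pass", 1.5)]
    jra1 = [ResultRecord("JRA1", "file-catalogue", "fc-lookup", "fail", 0.2, "timeout")]
    merged = load_results(dump_results(na4) + "\n" + dump_results(jra1))
    for component, counts in summarise(merged).items():
        print(component, dict(counts))
```

With a format of this kind the test suites themselves can stay independent, written in whatever language each team prefers; only the emitted records need to match.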
ARDA and testing

Disclaimer: LCG and NA4-HEP oriented
  – Other points of view (Biomed, Generic) will be presented by our colleagues
  – Frequently our views are remarkably homogeneous (NA4 in Catania)

ARDA is *not* doing testing (in the sense of JRA1 testing)
  – Big effort to create “examples” which could be given to physicists as a starting point for realistic data access
  – Output: snapshot(s) available
  – Clearly the software lives within a concrete infrastructure, which should perform as well
  – Also experiment components are “used under stress in controlled conditions” (for example: )

ARDA is very interested in testing for many reasons:
  – ARDA is there to help to have gLite accepted by the LHC experiments
  – Effective testing in JRA1 is a key ingredient (through the whole chain), but it is a “development” activity
  – ARDA is part of the NA4 activity: the NA4 testing team is a good opportunity to streamline some activities
  – Coordination is needed (this meeting is just perfect!)
Main risk (NA4 testing)

Clearly, the risks are for EGEE, not for NA4 itself

The NA4 testing team is a good opportunity to:
  – Streamline some activities, complementing/profiting from existing effort
  – Strengthen important areas which might be weak

Coordination is needed
  – This meeting is just perfect to avoid repetition (obvious)
  – Coordination rhymes with collaboration…

Strong suggestion to go for common, existing, widely used solutions!

Evaluate the cost/benefit of different options
  – Not all weak areas can be covered in the next 18 months!
  – On the other hand, what we build should possibly last longer than that!
Maintainability issues (well) beyond EGEE phase 1

Important goal
  – The goal (of EGEE, not only of ARDA) is to go for a great middleware running on a dependable large infrastructure serving many user communities (LHC, Biomed + many others)

Non-goals
  – Overly complex formal structures can be a black hole of resources (wasted resources)
  – Even worse: we should not generate effort to enable more (coordinated) effort on testing
Examples (1)

NA4 requirements
  – HEPCAL documents (Use Cases) are the starting point
  – Users are/will be the key!
  – ARDA shares requirements with the rest of NA4
  – But different requirements are being emphasized: e.g. *automatic* parallel submission is *not* a requirement for us; this functionality is taken care of within the experiments’ frameworks (cf. DIAL, GANGA, …)

Test bed philosophy: data access example
  – The SE we got at the beginning was no good at all…
  – … because we had no access to the experiment data store
  – What could be desirable for testing is not interesting for us
  – Now we are happily testing data access from the experiments’ repositories
Examples (2)

Testing infrastructure
Different requirements on n_sites and n_CPU(site)
Not all sites are equivalent for ARDA
  – Complexity in a testbed is OK
  – But where are the experiment data?
  – Testing and ARDA are complementary here
N_CPU cannot be 2
  – The users should be exposed to resources ~ a normal CERN LSF user
  – ARDA specific

Testing infrastructure
ARDA uses the prototype (excellent experience so far)
Move towards (pre)production services as soon as possible
A synthetic testbed is not interesting for us
The JRA1 testbed should not be replicated
All the available effort (hardware, system admin know-how and effort) should be focused on offering a playground for (early) users
Summary

ARDA is *not* doing testing

NA4 testing team has a great potential
Complement/streamline experience from usage more than from formal translation of requirements into test suites

Testing activities should be coordinated
Coordination within NA4
Coordination across the project

(Long term) issues
  – Maintainability (well beyond phase 1)
  – Standard practices and tools (EGEE SA1 (LCG-GDA), EGEE JRA2 (LCG SPI))

What is the best for EGEE?