1
"The ARDA project: Grid analysis prototypes of the LHC experiments"
Massimo Lamanna, ARDA Project Leader
Massimo.Lamanna@cern.ch
RAL, 13 May 2004
EGEE is a project funded by the European Union under contract IST-2003-508833
www.eu-egee.org  http://cern.ch/arda  cern.ch/lcg
2
Contents
ARDA project: mandate and organisation
ARDA activities during 2004: general pattern, LHCb, CMS, ATLAS, ALICE
Conclusions and outlook
3
ARDA working group recommendations: our starting point
New service decomposition
Strong influence of AliEn, the Grid system developed by the ALICE experiment and used by a wide scientific community (not only HEP): role of experience and of existing technology
Web service framework
Interfacing to existing middleware to enable its use in the experiment frameworks
Early deployment of (a series of) prototypes to ensure functionality and coherence
[Diagram: EGEE middleware and the ARDA project]
4
EGEE and LCG
Strong links have already been established between EDG and LCG; this will continue in the scope of EGEE.
The core infrastructure of the LCG and EGEE grids will be operated as a single service and will grow out of the LCG service:
LCG includes many US and Asian partners
EGEE includes other sciences
A substantial part of the infrastructure is common to both, with parallel production lines as well:
LCG-2: 2004 data challenges
Pre-production prototype (EGEE WS middleware): ARDA playground for the LHC experiments
[Diagram: timeline VDT/EDG, LCG-1, LCG-2, EGEE-1, EGEE-2, with ARDA alongside]
5
End-to-end prototypes: why?
Provide fast feedback to the EGEE middleware development team:
Avoid uncoordinated evolution of the middleware
Ensure coherence between user expectations and the final product
Get the experiments ready to benefit from the new middleware as soon as possible:
Frequent snapshots of the middleware available
Expose the experiments (and the community in charge of the deployment) to the current evolution of the whole system
Experiment systems are very complex and still evolving; move forward towards new-generation real systems (analysis!)
Prototypes should be exercised with realistic workloads and conditions:
No academic exercises or synthetic demonstrations
LHC experiment users absolutely required here (EGEE pilot application)
A lot of work (experience and useful software) has gone into the current experiment data challenges: a concrete starting point.
Adapt/complete/refactor the existing systems: we do not need another system!
6
End-to-end prototypes: how?
The initial prototype will have a reduced scope:
Component selection for the first prototype; experiment components not used in the first prototype are not ruled out (and the used/selected ones might be replaced later on)
Not all use cases/operation modes will be supported
Every experiment has a production system (with multiple backends, like PBS, LCG, G2003, NorduGrid, ...); we focus on end-user analysis on an EGEE-middleware-based infrastructure
Adapt/complete/refactor the existing experiment (sub)systems: a collaborative effort, not a parallel development
Attract and involve users: many users are absolutely required
Informal use cases are still being defined, e.g. (a minimal sketch of this workflow follows this list):
A physicist selects a data sample (from the current Data Challenges)
Starting from an example/template, (s)he prepares a job to scan the data
The job is split into sub-jobs and dispatched to the Grid; some error recovery is performed automatically, and the results are merged back into a single output
The output (histograms, ntuples) is returned together with simple information on the job-end status
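The following is a minimal, self-contained Python sketch of the split/dispatch/merge workflow described in the use case above. All function names are illustrative and the Grid submission is only simulated; this is not the ARDA or experiment implementation.

```python
import random

def split(sample_files, files_per_subjob=10):
    """Split the selected data sample into lists of input files, one per sub-job."""
    return [sample_files[i:i + files_per_subjob]
            for i in range(0, len(sample_files), files_per_subjob)]

def submit(inputs):
    """Simulated Grid submission of one sub-job; pretend ~20% of sub-jobs fail."""
    ok = random.random() > 0.2
    output = {"entries": len(inputs)} if ok else None
    return ok, output

def run(sample_files):
    outputs, status = [], []
    for inputs in split(sample_files):
        for attempt in range(3):          # crude automatic error recovery: retry
            ok, out = submit(inputs)
            if ok:
                outputs.append(out)
                break
        status.append("ok" if ok else "failed")
    merged = {"entries": sum(o["entries"] for o in outputs)}   # merge step
    return merged, status                 # merged output plus simple job-end status

print(run(["file_%03d.root" % i for i in range(95)]))
```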
7
ARDA @ Regional Centres
"Deployability" is a key factor in middleware success.
A few Regional Centres will have the responsibility of providing early installations for ARDA:
Understand "deployability" issues
Extend the ARDA test bed: the ARDA test bed will be the next step after the most complex "EGEE middleware" test bed
Stress and performance tests could ideally be located outside CERN; this also holds for experiment-specific components (e.g. a metadata catalogue)
Leverage the Regional Centres' local know-how: database technologies, web services, ...
Pilot sites might enlarge the available resources and give fundamental feedback in terms of "deployability", complementing the EGEE SA1 activity (EGEE/LCG operations):
Running ARDA pilot installations
Experiment data available where the experiment prototype is deployed
8
Coordination and forum activities
The coordination activities flow naturally from the fact that ARDA will be open to provide demonstration benches. Since it is neither necessary nor possible for all projects to be hosted inside the ARDA experiment prototypes, some coordination is needed to ensure that new technologies can be exposed to the relevant community, in a transparent process.
ARDA should organise a set of regular meetings (one per quarter?) to discuss results, problems and new/alternative solutions, and possibly agree on a coherent programme of work. The ARDA project leader organises this activity, which will be truly distributed and led by the active partners.
ARDA is embedded in EGEE NA4, namely NA4-HEP.
Special relation with the LCG GAG (the LCG forum for Grid requirements and use cases): the experiment representatives coincide with the EGEE NA4 experiment representatives, and ARDA will channel this information to the appropriate recipients.
Meetings and workshops:
ARDA workshop (January 2004 at CERN; open; over 150 participants)
ARDA workshop (June 21-23 at CERN; by invitation): "The first 30 days of EGEE middleware"
NA4 meeting in mid July (NA4/JRA1 and NA4/SA1 sessions foreseen; organised by M. Lamanna and F. Harris)
ARDA workshop (September 2004?; open)
9
People
ARDA team (covering the four experiments LHCb, CMS, ATLAS and ALICE, with members from Russia and Taiwan): Massimo Lamanna, Birger Koblitz, Dietrich Liko, Frederik Orellana, Derek Feichtinger, Andreas Peters, Julia Andreeva, Juha Herrala, Andrew Maier, Kuba Moscicki, Andrey Demichev, Viktor Pose, Wei-Long Ueng, Tao-Sheng Chen
Experiment interfaces: Piergiorgio Cerello (ALICE), David Adams (ATLAS), Lucia Silvestris (CMS), Ulrik Egede (LHCb)
10
Example of activity
The existing systems are the starting point: every experiment has different implementations of the standard services, used mainly in production environments, with few expert users and coordinated update and read actions.
ARDA interfaces these with the EGEE middleware and verifies (and helps evolve) such components for analysis environments: many users (robustness) and concurrent "read" actions (performance).
One prototype per experiment; a common application layer might emerge in the future. The ARDA emphasis is to enable each of the experiments to do its job.
Milestones:
1.x.1   May 2004         E2E x prototype definition agreed with the experiment
1.x.2   September 2004   E2E x prototype using basic EGEE middleware
1.x.3   November 2004    E2E x prototype with improved functionality
1.x     December 2004    E2E prototype for experiment x, capable of analysis
2.x     December 2005    E2E prototype for experiment x, capable of analysis and production
The first milestones are due very soon; the work has already started.
11
LHCb
The LHCb system within ARDA uses GANGA as its principal component (see the next slide).
The LHCb/GANGA plan: enable physicists (via GANGA) to analyse the data being produced during 2004 for their studies. This naturally matches the ARDA mandate: have the prototype where the LHCb data will be the key.
At the beginning, the emphasis will be on validating the tool, focusing on usability and on validation of the splitting and merging functionality for user jobs.
The DIRAC system (the LHCb Grid system, used mainly in production so far) could be a useful playground to understand the detailed behaviour of some components, like the file catalogue.
12
GANGA: Gaudi/Athena aNd Grid Alliance
Gaudi/Athena are the LHCb/ATLAS frameworks (Athena uses Gaudi as a foundation).
A single "desktop" for a variety of tasks:
Help configure and submit analysis jobs
Keep track of what users have done, completely hiding all technicalities (Resource Broker, LSF, PBS, DIRAC, Condor)
Job registry stored locally or in the roaming profile
Automate the configure/submit/monitor procedures
Provide a palette of possible choices and specialised plug-ins (pre-defined application configurations, batch/grid systems, etc.)
A friendly user interface (CLI/GUI) is essential:
GUI wizard interface: helps users explore new capabilities and browse the job registry
Scripting/command-line interface: automate frequent tasks; a Python shell embedded into the GANGA GUI
A minimal sketch of this style of job configuration and submission follows below.
[Architecture diagram: the GANGA UI sits between the GAUDI program (job options, algorithms, instrumentation) and the collective and resource Grid services (bookkeeping service, workload manager, file catalogue, SE/CE, profile service, monitor), returning histograms, monitoring information and results.]
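Below is a minimal Python sketch in the spirit of the GANGA job model described above: configure a job, submit it to a pluggable backend, and record it in a local job registry. The class and attribute names are illustrative assumptions, not the actual GANGA API.

```python
import itertools
import json
from dataclasses import dataclass, field, asdict

_ids = itertools.count(1)

@dataclass
class Job:
    application: str = "DaVinci"         # e.g. a Gaudi-based analysis application
    options: str = "myAnalysis.opts"     # job options file
    backend: str = "DIRAC"               # or "LSF", "PBS", "Condor", "ResourceBroker"
    inputdata: list = field(default_factory=list)
    id: int = field(default_factory=lambda: next(_ids))

    def submit(self, registry="jobs.json"):
        # A real implementation would hand the job to the chosen backend plug-in;
        # here we only persist it to a local registry to illustrate the bookkeeping.
        try:
            with open(registry) as f:
                jobs = json.load(f)
        except FileNotFoundError:
            jobs = []
        jobs.append(asdict(self))
        with open(registry, "w") as f:
            json.dump(jobs, f, indent=2)
        print(f"job {self.id} submitted to {self.backend}")

j = Job(inputdata=["lfn:/lhcb/dc04/example.dst"])   # placeholder logical file name
j.submit()
```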
13
ARDA contribution to GANGA
While waiting for the EGEE middleware, we developed an interface to Condor:
Use of Condor DAGMan for the splitting/merging and error-recovery capabilities (a sketch of such a DAG is shown below)
Design and development: command-line interface; future evolution of GANGA
Release management: software process and integration; testing, tagging policies, etc.
Infrastructure: installation, packaging, etc.
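As an illustration of the DAGMan-based approach mentioned above, the sketch below generates a DAG in which N independent analysis sub-jobs are followed by one merge job, with each sub-job retried on failure. The Condor submit files (*.sub) are assumed to exist; this is not the actual GANGA/Condor interface code.

```python
# Write a Condor DAGMan description for a split/merge analysis with retries.
n_subjobs = 5

with open("analysis.dag", "w") as dag:
    for i in range(n_subjobs):
        dag.write(f"JOB subjob{i} subjob{i}.sub\n")   # one Condor job per sub-job
        dag.write(f"RETRY subjob{i} 2\n")             # simple error recovery
    dag.write("JOB merge merge.sub\n")
    parents = " ".join(f"subjob{i}" for i in range(n_subjobs))
    dag.write(f"PARENT {parents} CHILD merge\n")      # merge runs after all sub-jobs

# The DAG is then submitted with: condor_submit_dag analysis.dag
```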
14
LHCb metadata catalogue
Used in production (for large productions).
A Web Service layer is being developed (main developers in the UK), with an Oracle backend.
ARDA contributes testing focused on analysis usage: robustness and performance under high concurrency (read mode); a minimal sketch of such a concurrency test is shown below.
[Plot: measured network rate vs. number of concurrent clients]
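Below is a minimal Python sketch of the kind of concurrency measurement described above: the aggregate data rate seen by an increasing number of concurrent read-only clients. The catalogue URL is a placeholder, not the real LHCb service endpoint.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

CATALOGUE_URL = "http://catalogue.example.org/query?dataset=dc04"  # placeholder

def one_client(n_requests=20):
    """Issue repeated read-only queries and return the number of bytes received."""
    received = 0
    for _ in range(n_requests):
        with urllib.request.urlopen(CATALOGUE_URL) as resp:
            received += len(resp.read())
    return received

for n_clients in (1, 5, 10, 20, 50):
    start = time.time()
    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        total_bytes = sum(pool.map(lambda _: one_client(), range(n_clients)))
    elapsed = time.time() - start
    print(f"{n_clients:3d} clients: {total_bytes / elapsed / 1e6:.2f} MB/s aggregate")
```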
15
CERN/Taiwan tests
Clone the bookkeeping DB in Taiwan, install the Web Service layer, and run performance tests:
Bookkeeping server performance tests: database I/O sensor
Web and XML-RPC service performance tests: CPU load, network send/receive sensor, process time
Client host performance tests: CPU load, network send/receive sensor, process time
A sketch of a per-call timing probe is shown below.
[Diagram: virtual users drive clients against the bookkeeping servers (Oracle DB backend) at CERN and Taiwan, with CPU-load, network and process-time monitors on the client, the service layer and the database.]
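The following is a sketch of a per-call timing probe for an XML-RPC bookkeeping service of the kind described above. The server URL and the query method are placeholders; the real service interface and the CPU/network sensors of the actual tests are not reproduced here.

```python
import statistics
import time
import xmlrpc.client

SERVER_URL = "http://bookkeeping.example.org:8080/RPC2"   # placeholder
proxy = xmlrpc.client.ServerProxy(SERVER_URL)

timings = []
for _ in range(100):
    start = time.time()
    proxy.getJobs("DC04", 100)          # hypothetical query method
    timings.append(time.time() - start)

print(f"mean {statistics.mean(timings) * 1e3:.1f} ms, "
      f"median {statistics.median(timings) * 1e3:.1f} ms, "
      f"max {max(timings) * 1e3:.1f} ms per call")
```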
16
CMS
The CMS system within ARDA is still under discussion.
Providing easy access to (and possibly sharing of) data for CMS users is a key issue.
RefDB is the bookkeeping engine used to plan and steer production across the different phases (simulation, reconstruction and, to some degree, the analysis phase). It contains all the necessary information except the physical file locations (RLS) and the information related to the transfer management system (TMDB).
The actual mechanism to provide these data to analysis users is under discussion.
Performance measurements are underway (with a philosophy similar to the LHCb metadata catalogue measurements).
[Diagram (RefDB in CMS DC04): RefDB passes reconstruction instructions to McRunjob, which submits reconstruction jobs to the T0 worker nodes; reconstructed data flow to the GDB castor pool (tapes) and to the export buffers via the transfer agent, with RLS and TMDB updated; RefDB checks what has arrived and records summaries of successful jobs.]
17
ATLAS
The ATLAS system within ARDA has been agreed (it is being finalised).
ATLAS has a complex strategy for distributed analysis, addressing different areas with specific projects (fast response, user-driven analysis, massive production, etc.; see http://www.usatlas.bnl.gov/ADA/).
The starting point is the DIAL system.
The AMI metadata catalogue is a key component: MySQL as a backend, a genuine web server implementation, with robustness and performance tests from ARDA.
In the start-up phase, ARDA also provided some help in developing the ATLAS production tools.
18
What is DIAL? (Distributed Interactive Analysis of Large datasets)
19
AMI studies in ARDA
[Diagram: user, SOAP proxy, metadata catalogue (MySQL)]
AMI is the ATLAS metadata catalogue; it contains file metadata (simulation/reconstruction version) but does not contain physical file names.
Many problems are still open:
Large network-traffic overhead due to schema-independent tables
The SOAP proxy is supposed to provide DB access, but Web Services are "stateless" (no automatic handles for the concepts of session, transaction, etc.): 1 query = 1 (full) response, and large queries might crash the server (a sketch of client-side paging as a workaround is shown below). Should the proxy re-implement all of the database functionality?
Good collaboration is in place with ATLAS Grenoble.
The behaviour was studied using many concurrent clients.
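Below is a sketch of the client-side paging idea mentioned above: instead of one huge stateless query whose full response may overload the server, the result set is fetched in bounded chunks. The soap_query function is only a stand-in that simulates the SOAP round trip; it is not the actual AMI proxy interface.

```python
def soap_query(selection, limit, offset):
    """Stand-in for the SOAP round trip; simulates a 1230-row result set."""
    simulated_rows = [{"file": f"evgen_{i:05d}", "selection": selection}
                      for i in range(1230)]
    return simulated_rows[offset:offset + limit]

def fetch_all(selection, page_size=500):
    """Fetch a potentially large result set in pages of bounded size."""
    rows, offset = [], 0
    while True:
        page = soap_query(selection, limit=page_size, offset=offset)
        rows.extend(page)
        if len(page) < page_size:      # last (possibly empty) page reached
            return rows
        offset += page_size

print(len(fetch_all("recoVersion='8.0.5'")))   # hypothetical selection string
```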
20
ALICE: Grid-enabled PROOF
[Diagram: SuperComputing 2003 (SC2003) demo; a user session connects to the PROOF master server, which reaches PROOF slaves at sites A, B and C through TcpRouter services.]
Strategy: ALICE/ARDA will evolve the analysis system presented by ALICE at SuperComputing 2003, using the new EGEE middleware (at SC2003, AliEn was used).
Activity on PROOF: robustness and error recovery.
21
ALICE-ARDA prototype improvements
At SC2003 the setup was heavily tied to the middleware services:
Somewhat "inflexible" configuration
No chance to use PROOF on federated grids like LCG in AliEn
The TcpRouter service needs incoming connectivity at each site
Libraries cannot be distributed using the standard rootd functionality
Improvement ideas (a toy illustration of the forwarding idea follows below):
Distribute another daemon with ROOT which replaces the need for a TcpRouter service
Connect each slave proofd/rootd via this daemon to two central proofd/rootd master multiplexer daemons, which run together with the PROOF master
Use Grid functionality for daemon start-up and booking policies, through a plug-in interface from ROOT
Put PROOF/ROOT on top of the Grid services
Improve dynamic configuration and error recovery
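The following toy Python sketch illustrates only the forwarding idea behind such a daemon: a site-local process accepts connections from local slaves and opens outbound connections towards a central endpoint, so that no incoming connectivity is required at the site. Host names and ports are placeholders, and this is not the ROOT/PROOF daemon itself.

```python
import socket
import threading

MASTER = ("proofmaster.example.org", 1093)   # placeholder central multiplexer endpoint
LISTEN = ("0.0.0.0", 2093)                   # where local slaves connect

def pipe(src, dst):
    """Copy bytes from one socket to the other until EOF, then close the pair."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass
    finally:
        dst.close()
        src.close()

def handle(slave_conn):
    """Open an outbound connection towards the master and relay traffic both ways."""
    master_conn = socket.create_connection(MASTER)
    threading.Thread(target=pipe, args=(slave_conn, master_conn), daemon=True).start()
    threading.Thread(target=pipe, args=(master_conn, slave_conn), daemon=True).start()

server = socket.socket()
server.bind(LISTEN)
server.listen(5)
while True:
    conn, _ = server.accept()
    handle(conn)
```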
22
ALICE-ARDA improved system
The remote PROOF slaves look like local PROOF slaves on the master machine.
The booking service is also usable on local clusters.
[Diagram: the PROOF master reaches the PROOF slave servers through proxy proofd/rootd daemons, with the Grid services providing the booking.]
A minimal sketch of how a PROOF session is driven from the user side follows below.
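To make the user-side picture concrete, here is a minimal PyROOT sketch of driving an analysis through PROOF, as in the setup described above. It requires a ROOT installation with PROOF support; the master host, input file, tree name and selector are placeholders, not the actual ALICE configuration.

```python
import ROOT

# Connect to the PROOF master; in the Grid-enabled setup the remote slaves
# appear behind this single entry point.
proof = ROOT.TProof.Open("proofmaster.example.org")

chain = ROOT.TChain("esdTree")                          # placeholder tree name
chain.Add("root://se.example.org//alice/run001.root")   # placeholder input file
chain.SetProof()                                        # route processing through PROOF

chain.Process("MySelector.C+")                          # user-supplied TSelector
```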
23
Conclusions and outlook
ARDA is starting:
Main tool: experiment prototypes for analysis
Detailed project plan being prepared
Good feedback from the LHC experiments
Good collaboration with EGEE NA4
Good collaboration with the Regional Centres; more help is needed
We look forward to contributing to the success of EGEE by helping the EGEE middleware deliver a fully functional solution.
The ARDA main focus is to collaborate with the LHC experiments to set up the end-to-end prototypes.
Aggressive schedule: the first milestone for the end-to-end prototypes is December 2004.