1
gLite and Condor: present and future
Claudio Grandi (INFN – Bologna)
Condor Week, June 28th, 2006, Milano, Italy
EGEE-II INFSO-RI-031688, Enabling Grids for E-sciencE, www.eu-egee.org, www.glite.org
2
Enabling Grids for E-sciencE EGEE-II INFSO-RI-031688 Claudio Grandi - Condor Week - Milano, 28 June 2006
Outline
– The EGEE Project
– The gLite Middleware and the software process
– gLite and Condor
– Summary
3
The EGEE project
EGEE
– 1 April 2004 to 31 March 2006
– 71 partners in 27 countries, federated in regional Grids
EGEE-II
– 1 April 2006 to 31 March 2008
– 91 partners in 32 countries
– 13 Federations
Objectives
– Large-scale, production-quality infrastructure for e-Science
– Attracting new resources and users from industry as well as science
– Improving and maintaining “gLite” Grid middleware
4
Related EU projects
5
EGEE Infrastructure
Steady growth of infrastructure and usage
Improved reliability of sites (typically over 80%)
Operational and support procedures in place
Interoperability and interoperation with related projects world-wide
[Figure: number of sites and CPUs, and jobs per day, January 2005 to April 2006]
6
EGEE Middleware: gLite
gLite is a comprehensive middleware stack
– Developed according to a well-defined process, tested and documented
Service Oriented Architecture
– Lightweight services
– Allow for multiple interoperable implementations
– Easily and quickly deployable
Use existing services where possible
– Condor, EDG, Globus, LCG, ...
Portable(?)
– SL3-32, in future SL4-32 and SL4-64, IPv6
Security
– Considered for both applications and deployment sites
Performance/Scalability & Resilience/Fault Tolerance
– Comparable to deployed infrastructure
Co-existence with other deployed infrastructure
– e.g. interoperability with OSG and NAREGI
Open source (Apache) license
Important role of the Design Team in the definition of the functionalities
– With significant contributions by the Globus and Condor teams
[Figure: timeline 2004 to 2006: LCG-2 (product) and gLite (prototyping, then product) converging into gLite 3.0 in 2006]
7
gLite Software Process
[Flow diagram]
JRA1 Development: software, error fixing
SA3 Integration: deployment packages, integration tests; installation guide, release notes, etc.
SA3 Testing & Certification: testbed deployment, functional tests
SA1 Pre-Production: pre-production deployment, scalability tests
SA1 Production Infrastructure: release
Transitions labelled: Pass / Fail, Problem, Serious problem, Directives
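The stage sequence in the diagram can be read as a small state machine. The sketch below is illustrative only: it assumes a strictly linear pipeline and assumes that any failure sends the release candidate back to JRA1 for error fixing; the real process also has "Problem" and "Serious problem" paths that are not modelled here.

```python
from enum import Enum
from typing import List, Sequence

class Stage(Enum):
    DEVELOPMENT = "JRA1 Development"
    INTEGRATION = "SA3 Integration"
    CERTIFICATION = "SA3 Testing & Certification"
    PRE_PRODUCTION = "SA1 Pre-Production"
    PRODUCTION = "SA1 Production Infrastructure"

# Assumed linear forward order of the pipeline
ORDER = [Stage.DEVELOPMENT, Stage.INTEGRATION, Stage.CERTIFICATION,
         Stage.PRE_PRODUCTION, Stage.PRODUCTION]

def next_stage(current: Stage, passed: bool) -> Stage:
    """Advance one stage on Pass; on Fail send the candidate back for error fixing."""
    if not passed:
        return Stage.DEVELOPMENT  # assumption: every failure returns to JRA1
    if current is Stage.PRODUCTION:
        return current  # already released
    return ORDER[ORDER.index(current) + 1]

def run_release(outcomes: Sequence[bool]) -> List[Stage]:
    """Replay a sequence of pass/fail outcomes and return the stages visited."""
    stage = Stage.DEVELOPMENT
    history = [stage]
    for passed in outcomes:
        stage = next_stage(stage, passed)
        history.append(stage)
        if stage is Stage.PRODUCTION:
            break
    return history

# A candidate that fails certification once, then goes all the way to release
path = run_release([True, True, False, True, True, True, True])
print(" -> ".join(s.name for s in path))
```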
8
gLite Software Process
Technical Coordination Group (TCG)
– gathers & prioritizes user requirements
– from HEP, biomed, (industry), sites
– gLite development is client-driven!
Software from EGEE-JRA1 and other projects
SA3 Integration Team
– Ensures components are deployable and work
– Currently 224 modules
– Deployment Modules implement high-level gLite node types (WMS, CE, R-GMA Server, VOMS Server, FTS, etc.)
– Build system now spun off into the ETICS project (Jan 2006)
SA3 Certification Team
– Merge of the JRA1 testing and SA1 certification teams
– Dedicated testbed
– Develop test suites
– Test release candidates and patches
SA1 Pre-Production System
– First exposure of the middleware to the users
9
Middleware structure
Applications have access both to Higher-Level Grid Services and to Foundation Grid Middleware
Higher-Level Grid Services are meant to help users build their computing infrastructure, but should not be mandatory
Foundation Grid Middleware will be deployed on the EGEE infrastructure
– Must be complete and robust
– Should allow interoperation with other major grid infrastructures
– Should not assume the use of Higher-Level Grid Services
[Figure: layered view, top to bottom: Applications; Higher-Level Grid Services (Workload Management, Replica Management, Visualization, Workflow, Grid Economies, ...); Foundation Grid Middleware (Security model and infrastructure, Computing (CE) and Storage Elements (SE), Accounting, Information and Monitoring)]
10
gLite and Condor
History stretches back to DataGrid WP1 and Condor-G
– Provided language for expressing job description
– Proper framework for match-making (“new” classads)
– Execute jobs on GRAM-accessible resources, via Condor-G
– Provide L&B (or accounting) information about jobs
– Act as community match-maker and local job information database
Present: EGEE/EGEE-II and Condor
– EGEE Design Team includes reps from MW providers (AliEn, Condor, Globus, ...)
– Wisconsin is one of the development prototype sites
  – Uses: Condor pool as backend; Globus RLS
– We use the VDT distribution of Condor and Globus
The Collaboration Continues!
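ClassAd match-making is bilateral: the job's Requirements are evaluated against each resource ad, the resource's Requirements against the job ad, and the job's Rank orders the survivors. The sketch below mimics that idea with Python dictionaries and lambdas instead of the real ClassAd language; the attribute names (FreeCPUs, MemoryMB, EstimatedResponseTime, VO) and CE names are hypothetical.

```python
from typing import Any, Dict, List, Optional

Ad = Dict[str, Any]

def matches(job: Ad, resource: Ad) -> bool:
    # Bilateral match: each side's Requirements is evaluated against the other ad
    return job["Requirements"](resource) and resource["Requirements"](job)

def matchmake(job: Ad, resources: List[Ad]) -> Optional[Ad]:
    candidates = [r for r in resources if matches(job, r)]
    if not candidates:
        return None
    # Among matching resources, prefer the one with the highest job Rank
    return max(candidates, key=job["Rank"])

job = {
    "VO": "cms",
    "Requirements": lambda r: r["FreeCPUs"] > 0 and r["MemoryMB"] >= 1024,
    "Rank": lambda r: -r["EstimatedResponseTime"],  # shorter expected wait is better
}
ces = [
    {"Name": "ce01", "FreeCPUs": 10, "MemoryMB": 2048, "EstimatedResponseTime": 600,
     "Requirements": lambda j: j["VO"] in ("cms", "atlas")},
    {"Name": "ce02", "FreeCPUs": 4, "MemoryMB": 4096, "EstimatedResponseTime": 120,
     "Requirements": lambda j: j["VO"] == "cms"},
    {"Name": "ce03", "FreeCPUs": 0, "MemoryMB": 8192, "EstimatedResponseTime": 0,
     "Requirements": lambda j: True},
]
best = matchmake(job, ces)
print(best["Name"])  # ce02
```

Note that ce03 is rejected by the job's Requirements even though it would rank best, which is exactly why Requirements is checked before Rank.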
11
Condor-C in gLite WMS
Extend the practice of reliable job transfer
Extend the guarantees of once-and-only-once execution
[Diagram label: Condor-C]
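One common way to keep a job from being queued twice when an acknowledgement is lost is to fix a job id before the first attempt and make the receiver idempotent on that id. The sketch below shows only that idea; it does not reproduce Condor-C's actual transfer protocol, and `RemoteQueue` and `transfer` are hypothetical names.

```python
import uuid
from typing import Dict

class RemoteQueue:
    """Hypothetical receiving side (e.g. a remote scheduler) keyed by job id."""

    def __init__(self) -> None:
        self.jobs: Dict[str, str] = {}
        self.lose_next_ack = False  # simulate a network failure after the job is stored

    def submit(self, job_id: str, payload: str) -> bool:
        # Idempotent: a retry carrying an id we already hold does not enqueue a second copy
        self.jobs.setdefault(job_id, payload)
        if self.lose_next_ack:
            self.lose_next_ack = False
            return False  # job is queued, but the sender never sees the acknowledgement
        return True

def transfer(queue: RemoteQueue, payload: str, max_attempts: int = 3) -> str:
    # The id is chosen once, before the first attempt, and reused on every retry
    job_id = str(uuid.uuid4())
    for _ in range(max_attempts):
        if queue.submit(job_id, payload):
            return job_id
    raise RuntimeError(f"no acknowledgement for {job_id} after {max_attempts} attempts")

q = RemoteQueue()
q.lose_next_ack = True
jid = transfer(q, "Executable = /bin/hostname")
print(len(q.jobs))  # 1: the retry did not create a duplicate
```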
12
Condor-C in gLite CE
Need a set of Condor-C daemons per {submitting node / user DN / user VO} triplet
– Run as the VO user, submit jobs to the batch system via a sudo service
One set of daemons switching UID via glexec/LCMAPS
Apart from that, it’s bug fixing (ongoing at a steady rate)
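The cost of the per-triplet model is that the number of daemon sets on the CE grows with the number of distinct {submitting node, user DN, user VO} combinations. The hypothetical sketch below only counts daemon sets; `CondorCGateway`, `DaemonSet` and the host and DN strings are illustrative, not gLite components.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# (submitting node, user DN, user VO): one Condor-C daemon set per distinct triplet
Triplet = Tuple[str, str, str]

@dataclass
class DaemonSet:
    """Hypothetical stand-in for a set of Condor-C daemons running as one VO user."""
    triplet: Triplet
    jobs: List[str] = field(default_factory=list)

class CondorCGateway:
    def __init__(self) -> None:
        self.daemon_sets: Dict[Triplet, DaemonSet] = {}

    def submit(self, node: str, dn: str, vo: str, job: str) -> DaemonSet:
        key = (node, dn, vo)
        ds = self.daemon_sets.get(key)
        if ds is None:
            # First job for this triplet: a new daemon set must be started
            ds = DaemonSet(key)
            self.daemon_sets[key] = ds
        ds.jobs.append(job)
        return ds

gw = CondorCGateway()
gw.submit("wms1.example.org", "/C=IT/O=INFN/CN=Alice", "cms", "job-1")
gw.submit("wms1.example.org", "/C=IT/O=INFN/CN=Alice", "cms", "job-2")
gw.submit("wms2.example.org", "/C=IT/O=INFN/CN=Alice", "cms", "job-3")
gw.submit("wms1.example.org", "/C=IT/O=INFN/CN=Bob", "atlas", "job-4")
print(len(gw.daemon_sets))  # 3
```

With a single daemon set that switches UID per job via glexec/LCMAPS, the same four submissions would need only one set.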
13
Condor batch system - accounting
Currently the Condor batch system is contacted directly by Condor-C on the CE
– bypasses the BLAH layer
Missing services provided by BLAH
– in particular the log files used by the accounting system
Need to develop BLAH plug-ins for the Condor LRMS!
Accounting from the LRMS:
– defined a standard structure of information based on the GGF-UR specifications
– defined a framework in which grid-specific information is added by a plug-in that depends on the infrastructure
  – the LRMS-dependent layer is independent of the grid!
– Collaboration with the Condor team for both the local Condor information and the interface to the grid system (OSG vs EGEE)
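The split between an LRMS-dependent layer and a grid-specific plug-in can be sketched as follows. The record fields are only loosely inspired by the GGF Usage Record and are illustrative; the input keys are standard Condor job ClassAd attribute names, but how they reach the parser (the log format) is assumed. The two plug-ins and their field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class UsageRecord:
    """Grid-independent part, loosely inspired by GGF-UR fields (names illustrative)."""
    local_job_id: str
    local_user: str
    wall_duration_s: int
    cpu_duration_s: int
    grid: Dict[str, str] = field(default_factory=dict)  # filled in by a grid plug-in

def from_condor(entry: Dict[str, str]) -> UsageRecord:
    # LRMS-dependent layer: reads only batch-system attributes, nothing grid-specific
    return UsageRecord(
        local_job_id=f"{entry['ClusterId']}.{entry['ProcId']}",
        local_user=entry["Owner"],
        wall_duration_s=int(float(entry["RemoteWallClockTime"])),
        cpu_duration_s=int(float(entry["RemoteUserCpu"]) + float(entry["RemoteSysCpu"])),
    )

GridPlugin = Callable[[UsageRecord, Dict[str, str]], None]

def egee_plugin(rec: UsageRecord, info: Dict[str, str]) -> None:
    # Hypothetical EGEE-side fields
    rec.grid.update(user_dn=info["dn"], vo=info["vo"], ce_id=info["ce_id"])

def osg_plugin(rec: UsageRecord, info: Dict[str, str]) -> None:
    # Hypothetical OSG-side fields
    rec.grid.update(user_dn=info["dn"], vo=info["vo"], site=info["site"])

def account(entry: Dict[str, str], info: Dict[str, str], plugin: GridPlugin) -> UsageRecord:
    rec = from_condor(entry)  # same code path for every grid
    plugin(rec, info)         # only this step differs between EGEE and OSG
    return rec

entry = {"ClusterId": "42", "ProcId": "0", "Owner": "cms001",
         "RemoteWallClockTime": "3600.0", "RemoteUserCpu": "3000.0", "RemoteSysCpu": "100.0"}
info = {"dn": "/C=IT/O=INFN/CN=Alice", "vo": "cms",
        "ce_id": "ce01.example.org:2119/jobmanager-condor-cms", "site": "EXAMPLE-SITE"}
rec = account(entry, info, egee_plugin)
print(rec.local_job_id, rec.cpu_duration_s, rec.grid["vo"])  # 42.0 3100 cms
```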
14
ICE-CREAM
CREAM: lightweight web service Computing Element
– WSDL interface
  – Working in the GGF-BES group to agree on a standard
– C++ CLI allows direct submission
– Fast notification of job status changes via CEMon
– Improved security
  – no “fork-scheduler”
– Will support bulk jobs on the CE
  – optimization of staging of input sandboxes for jobs with shared files
ICE: Interface to Cream Environment
– being integrated in WMS for submissions to CREAM
Plan to expose to users soon in a preview system
– no plans yet for the release
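The gain from staging shared input files once per bulk submission, rather than once per job, can be illustrated with a simple de-duplication pass. This is a sketch under the assumption that two sandbox entries with the same name are the same file; it is not CREAM's actual staging logic, and the file names are hypothetical.

```python
from typing import Dict, List

def plan_staging(bulk: Dict[str, List[str]]) -> List[str]:
    """Return the distinct input files of a bulk submission, each listed once,
    in first-seen order (files are assumed identical when their names match)."""
    seen = set()
    plan: List[str] = []
    for files in bulk.values():
        for name in files:
            if name not in seen:
                seen.add(name)
                plan.append(name)
    return plan

bulk = {
    "job-1": ["analysis.py", "calib.db", "run1.cfg"],
    "job-2": ["analysis.py", "calib.db", "run2.cfg"],
    "job-3": ["analysis.py", "calib.db", "run3.cfg"],
}
naive = sum(len(files) for files in bulk.values())  # one transfer per job per file
plan = plan_staging(bulk)
print(naive, len(plan))  # 9 5
```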
15
ICE-CREAM and Condor
After match-making, the Job Controller submits to ICE or to Condor depending on the flavour of the matching CE
Plan to deploy CREAM and Condor-C on the same server
– show both interfaces to clients
Condor is still the only way to submit DAGs (via DAGMan)
[Figure: WMS internals: NS/WMProxy, file list, WM with helpers, match-maker (MM), JA and Job Controller (JC); JC submits either through ICE (Submitter, Job Status Handler) to CREAM/CEMon, or through Condor-C to a BLAH CE]
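The Job Controller's choice of submission path can be pictured as a dispatch on the matched CE's flavour, with DAGs restricted to the Condor path. The flavour labels, function names and the rule of rejecting a DAG matched to a CREAM-only CE are assumptions made for this sketch.

```python
from typing import Callable, Dict

Job = Dict[str, str]
CE = Dict[str, str]

def submit_via_ice(job: Job, ce: CE) -> str:
    return f"ICE -> CREAM {ce['id']}: {job['id']}"

def submit_via_condor(job: Job, ce: CE) -> str:
    return f"Condor-C -> {ce['id']}: {job['id']}"

# Hypothetical flavour labels mapped to submission paths
DISPATCH: Dict[str, Callable[[Job, CE], str]] = {
    "cream": submit_via_ice,
    "condor-c": submit_via_condor,
}

def job_controller_submit(job: Job, ce: CE) -> str:
    if job.get("type") == "dag" and ce["flavour"] != "condor-c":
        # DAGs can only be handled through Condor (DAGMan)
        raise ValueError("DAG jobs can only be submitted through Condor (DAGMan)")
    handler = DISPATCH.get(ce["flavour"])
    if handler is None:
        raise ValueError(f"unknown CE flavour: {ce['flavour']}")
    return handler(job, ce)

print(job_controller_submit({"id": "j1"}, {"id": "cream01", "flavour": "cream"}))
print(job_controller_submit({"id": "j2", "type": "dag"}, {"id": "ce02", "flavour": "condor-c"}))
```

Deploying CREAM and Condor-C on the same server would let one CE expose both flavours, so the dispatch key would become a per-job choice rather than a property of the whole CE.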
16
glexec on WN
glexec is used by the CE middleware to change the identity of a process
– was developed for the CREAM CE
– based on the Apache HTTP suexec code base
– uses LCAS and LCMAPS for enforcement and mapping
– library-based implementation in gLite 3.0
Several VOs submit ‘pilot’ jobs with (essentially) a single identity for all of the VO
– e.g. using the Condor Glide-in
– The ‘checkered’ placeholder then gets user jobs in ‘some’ way and executes them with the placeholder’s identity
– The site does not ‘see’ the original submitter
Allowing the VO pilot job to run glexec on the WN could ‘recover’ the user identity
– needs development to make the credential acquisition process work across the network, so that there can be a site-central policy engine
– needs a clear evaluation of the security impact of trusting the VO pilot to perform the correct operation
– To support OSG: write an LCMAPS plug-in for GUMS and implement an interface to the GT4 WS AuthZ
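The two policy steps glexec relies on can be sketched as an authorization check (the LCAS role) followed by an account mapping (the LCMAPS role), here with a simple pool-account lease. This is a hypothetical model of the decision only: no real LCAS/LCMAPS API is used, no UID is actually switched, and the DNs, VO names and accounts are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

Account = Tuple[str, int]  # (local user name, uid)

@dataclass(frozen=True)
class Credential:
    dn: str
    vo: str

@dataclass
class SitePolicy:
    banned_dns: Set[str]
    allowed_vos: Set[str]
    pool_accounts: Dict[str, List[Account]]                  # VO -> pool accounts
    leases: Dict[str, Account] = field(default_factory=dict)  # DN -> leased account

def authorize(policy: SitePolicy, cred: Credential) -> bool:
    # LCAS-like step: is this user allowed at this site at all?
    return cred.dn not in policy.banned_dns and cred.vo in policy.allowed_vos

def map_account(policy: SitePolicy, cred: Credential) -> Optional[Account]:
    # LCMAPS-like step: reuse an existing lease, else lease a free pool account of the VO
    if cred.dn in policy.leases:
        return policy.leases[cred.dn]
    in_use = set(policy.leases.values())
    for acct in policy.pool_accounts.get(cred.vo, []):
        if acct not in in_use:
            policy.leases[cred.dn] = acct
            return acct
    return None  # pool exhausted

def glexec_decide(policy: SitePolicy, cred: Credential) -> Optional[Account]:
    """Return the account the payload would run as, or None if glexec must refuse."""
    if not authorize(policy, cred):
        return None
    return map_account(policy, cred)

policy = SitePolicy(
    banned_dns={"/C=XX/O=Example/CN=Mallory"},
    allowed_vos={"cms"},
    pool_accounts={"cms": [("cms001", 5001), ("cms002", 5002)]},
)
print(glexec_decide(policy, Credential("/C=IT/O=INFN/CN=Alice", "cms")))  # ('cms001', 5001)
```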
17
Current mode: job submission in the gLite-CE
VO Scheduler: Condor-C or CREAM & BLAHP
VO scheduler on the head node changes to the end-user’s identity (i.e. to the job owner in the VO job source)
On change, site policies are checked
Job on the batch queue has ‘proper’ identity
Of course, also ‘classic’ submissions and proper uid changes by Condor-C & BLAHP on the head node
[Diagram legend: submitting user’s identity & job; VO identity/process or VO placeholder manager; site-managed and trusted services]
18
Job submission in a glexec-on-WN scenario
Proper uid changes by Condor-C & BLAHP on the head node SHOULD REMAIN THE DEFAULT
VO scheduler submits a placeholder job to the batch system, and the VO ‘placeholder job’ submitter is responsible for the placeholder behaviour
– this might be a specific role in the VO, or a locally registered ‘badged’ user at each site
The placeholder job is subject to the normal site policies for jobs
The placeholder obtains the true user job, and presents the user credentials and the job (executable name) to the site to request a decision
On success: the site will set the uid/gid of the new user’s job
On failure: glexec will return with an error, and the placeholder job can terminate or obtain another job
[Diagram label: VO scheduler on the node]
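The placeholder's behaviour described above can be sketched as a loop: fetch a user job, present the user's DN and executable to the site, then run or skip the job depending on the decision. Everything here is hypothetical: `site_decide` stands in for the glexec call, the site rule is invented, and the loop only records the decision instead of actually changing uid/gid.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Optional, Tuple

Job = Dict[str, str]

def run_placeholder(queue: Deque[Job],
                    site_decide: Callable[[str, str], Optional[int]],
                    max_jobs: int = 10) -> List[Tuple[str, str]]:
    """Hypothetical pilot loop: pull user jobs, ask the site for a decision, act on it."""
    log: List[Tuple[str, str]] = []
    for _ in range(max_jobs):
        if not queue:
            break  # nothing left to run: the placeholder terminates
        job = queue.popleft()  # obtain the true user job from the VO scheduler
        uid = site_decide(job["dn"], job["executable"])  # present credentials + executable
        if uid is None:
            log.append((job["id"], "refused"))  # glexec error: obtain another job
            continue
        log.append((job["id"], f"run as uid {uid}"))  # the site set the uid for this payload
    return log

ALLOWED = {"/C=IT/O=INFN/CN=Alice": 5001}

def site_decide(dn: str, executable: str) -> Optional[int]:
    # Hypothetical site policy: known DN and executables only under /opt/vo/
    if dn in ALLOWED and executable.startswith("/opt/vo/"):
        return ALLOWED[dn]
    return None

jobs = deque([
    {"id": "u1", "dn": "/C=IT/O=INFN/CN=Alice", "executable": "/opt/vo/reco"},
    {"id": "u2", "dn": "/C=XX/O=Example/CN=Eve", "executable": "/opt/vo/reco"},
    {"id": "u3", "dn": "/C=IT/O=INFN/CN=Alice", "executable": "/tmp/x"},
])
log = run_placeholder(jobs, site_decide)
for entry in log:
    print(entry)
```

Keeping the decision on the site side (`site_decide`) rather than inside the pilot mirrors the slide's point that the site, not the VO, sets the uid/gid of each user job.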
19
Summary
Contributions from the Condor team to the EGEE effort
– Through the design team, prototyping, product (and ETICS)
The Condor link to OSG is very important to EGEE
Grid middleware cannot be developed separately
– Open communication channels
– Effective exchange of ideas, requirements, solutions and technologies
– Early detection of differences and disagreements
Attempt to develop/modify components in a cooperative manner
– e.g. accounting, ICE/CREAM, glexec/LCMAPS
20
EGEE’06 Conference
EGEE’06 – Capitalising on e-infrastructures
– Demos
– Related Projects
– Industry
– International community
25-29 September 2006, Geneva, Switzerland
http://www.cern.ch/egee-intranet/conferences/EGEE06
21
www.glite.org