EGEE-II INFSO-RI-031688 Enabling Grids for E-sciencE EGEE and gLite are registered trademarks EGEE Middleware reengineering.


1 EGEE Middleware reengineering
Claudio Grandi – JRA1 Activity Manager - INFN
EGEE-II 1st EU Review (CERN), 15-16 May 2007
EGEE-II INFSO-RI-031688. Enabling Grids for E-sciencE. EGEE and gLite are registered trademarks.

2 Outline
Activity goals and organization
Consolidation of gLite 3.0 - New software process
Preparation of gLite 3.1 - Migration to ETICS
Highlights
Standardization
Relations to industry and other projects
JRA1 All-Hands meeting series
Future plans

3 JRA1 activity goals
Continue to support and evolve the gLite open source implementation of application-independent grid middleware
– Application-independent foundation services
  • Deployed at all sites connected to the infrastructure
  • Partly based on common Grid tools such as Condor and Globus (from the Virtual Data Toolkit, VDT)
– Set of higher-level services working on top of the foundation
  • Deployed on-demand at specific sites
– Follow a Service Oriented Architecture
  • Mostly based on web-services. Aim to comply with WS-I specifications
Activity targeted to the support of the Production System (PS)
– Gradually deploy new components on the PS, support and maintain them
  • Prompt fixing of bugs and support to the Global Grid User Support (GGUS)
  • Stability, scalability, manageability
  • Work in Technical Coordination Group Task Forces
– Further evolve the middleware stack
  • Facilitating interoperability with other infrastructures
  • Addressing user needs
  • Attention to emerging standards

4 JRA1 in Numbers
EGEE-II Budget
Manpower: 11 partners, 9 countries, 51.5 FTE

5 JRA1 Groups
Security (J. White, UH.HIP): INFN, SWITCH, UH.HIP, FOM, UvA, UiB, KTH
Resource Access, Accounting, and Brokering (F. Giacomini, INFN): INFN, Datamat S.p.A.
Logging, bookkeeping, and provenance (A. Krenek, CESNET): CESNET
Data Management (G. McCance, CERN): CERN
Information and Monitoring (S. Fisher, CCLRC): CCLRC
US: Univ. Chicago and Univ. Southern California (GLOBUS), Wisconsin Madison Univ. (VDT and Condor)

6 gLite 3.0 release
Convergence of LCG 2.7.0 and gLite 1.5.0 in spring 2006 and release of gLite 3.0
– Continuity on the production infrastructure ensured usability by applications
– Initial focus on the new Job Management components
  • Thorough testing and optimization together with the applications
Reorganization of the work according to the new process
– EGEE Technical Coordination Group (TCG) and Task Forces
– EGEE SA3 Activity for integration and certification
– Engineering Management Team (EMT) for release coordination
– “Continuous release process”
  • No big-bang releases!
[Timeline figure, 2004-2006. Labels: LCG-2, gLite, prototyping, product, gLite 3.0]

7 TCG and EMT
The EGEE Technical Coordination Group (TCG) defines the priorities for middleware development and certification
– Members from LHC experiments and other EGEE-NA4 applications, and from EGEE Technical activities
– Collects requirements from the applications
  • Started from the applications requirement list
  • Recently added security and sites requests
– Prioritizes the requirements
– Approves the JRA1 and SA3 work plans
The Engineering Management Team (EMT) coordinates the production of gLite releases
– Members from SA3, JRA1, SA1 and VDT
– Decides on the release schedule of patches and components
– Follows critical bugs individually
– Works according to TCG directives

8 Bugs and patches
New software process formally adopted in June 2006
4889 bugs or feature requests have been submitted since the beginning of gLite (3 years)
– including gLite 3.1 build failures
– 1198 are still being investigated or have been postponed
237 patches have been provided for gLite 3.0 and 31 for gLite 3.1
– 109 patches for gLite 3.0 are deployed in production
Data collected on 2/5/07
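To put the bug and patch counts above in proportion, the following is a minimal sketch that derives two percentages from the figures quoted on the slide. The variable and function names are illustrative and are not taken from any gLite tool.

```python
# Figures quoted on the slide (data collected on 2/5/07).
total_reports = 4889        # bugs or feature requests since the beginning of gLite
open_reports = 1198         # still being investigated or postponed
patches_30 = 237            # patches provided for gLite 3.0
patches_30_deployed = 109   # gLite 3.0 patches deployed in production


def share(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


open_share = share(open_reports, total_reports)
deployed_share = share(patches_30_deployed, patches_30)

print(f"Open or postponed reports: {open_share}%")
print(f"gLite 3.0 patches in production: {deployed_share}%")
```

Roughly a quarter of all reports were still open or postponed, and a little under half of the gLite 3.0 patches had reached production at the time the data was collected.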

9 Experimental services
Performance of some critical components needed improvements
– Stability of the Workload Management System (WMS) and the Logging & Bookkeeping (LB)
– Stability and performance of the gLite Computing Element (gLiteCE)
The normal certification process could not help
– Needed testing at the production scale
Introduced the Experimental Services
– Instances of the services attached to the production infrastructure
– Maintained by SA1 and SA3
– JRA1 patches are installed immediately (before the certification)
– Testing done by selected application users
– Process controlled by the EMT
Rapid improvement of the components
– WMS and LB now ready for production (see later)
– gLiteCE improving slowly but constantly
– Next will be the new CREAM Computing Element

10 Preview testbed
New components put in stand-by due to the prioritization of the certification and deployment activities
– Difficult to improve these components if not used by real users
Preview test-bed created by JRA1 to expose to users those components not yet considered for certification
– To get feedback from users and site managers
– Resources provided by JRA1 partners also in the SA1 activity without compromising their commitment in SA1
Slow start
– Problems identifying resources
– Best-effort support
A few tests by the users
– But also experimental services started
Components available:
– CREAM Computing Element
– Job Provenance (JP) System
– G-PBox Policy management
– glexec for pilot jobs support
Sites:
– CNAF: UI, gLiteCE + 1 WN, BDII, DPM-SE, WMS (+ GPBOX tests)
– Padova: 1 CREAM-CE + 4 WNs (1 WN is also gLiteCE)
– Prague: LB with JP, 1 gLiteCE + 1 WN
– RAL: 1 R-GMA registry
– Helsinki: 1 gLiteCE + 2 WN with glexec on WNs
– NIKHEF: will have 1 CREAM CE + ? WNs with glexec on WNs

11 gLite 3.1 release
The gLite 3.1 release is mainly intended to support new platforms widely deployed on the infrastructure
– In particular Scientific Linux 4 (SL4) and 64-bit architectures
– Implied the migration to a new version of the Virtual Data Toolkit (VDT), including Globus Toolkit 4 (GT4)
At the same time we adopted the new build system produced by the ETICS project
– More flexible, in particular in addressing multiple platform support
  • It would be impossible using the old gLite build system
– Includes an integrated testing framework
Looks like a “big-bang” release, but it is not!
– Only a small number of new functionalities
– gLite 3.1 components will be released to production at different times, according to their certification
  • Worker Node being released, User Interface coming in a few weeks
  • Full release foreseen to be completed in a couple of months
– Backward compatibility is provided, also to allow coexistence of sites with old and new versions of the software

12 gLite restructuring
Requirements on gLite are increasing due to:
– An increased number of sites, with different levels of gLite expertise
– An increased number of applications, with diverse needs
– A demand for more supported platforms
Long term maintenance of gLite needs to be assured
– Standard and/or commercial solutions are not yet widely available
– Support of legacy components and dependencies on external packages made the gLite stack grow too complex
In January 2007 the project decided it was necessary to invest effort to address the long term sustainability of the middleware
– Cleanup of code-base and dependencies for gLite 3.1 is now being pursued with high priority, at the expense of adding new functionality
– Adoption of the build tools from the ETICS project will provide better maintainability of the stack in the long term
  • This is already pursued as part of the gLite 3.1 release process
The gLite restructuring process coexists with the current activities needed to support the applications on the production infrastructure

13 Interoperability with Shibboleth
Shibboleth is software that implements a federation of campus infrastructures
Developed by Internet2
Allows Single Sign On for web-based resources
Based on SAML (Security Assertion Markup Language)
Interoperability with gLite (new in EGEE-II)
– Specific for EGEE-II infrastructure
  • No replacement for X.509 certificates, etc...
– Home institution of the user is the Identity Provider (IdP)
– Attributes both from home institution and the VO

14 Data Management
Migration to version 2.2 of the Storage Resource Manager (SRM) interface of the gLite Storage Element (DPM) and client tools
– Support for e.g. storage classes, storage space management including space reservation, access control lists, pre-staging and file-pinning
– Interoperability has been difficult to achieve but several back-ends exist:
  • Castor, Enstore, HPSS, dCache (TSM), Storm (GPFS), DPM, DRM/BestMan
– 97 instances of DPM deployed on the infrastructure supporting 135 VOs
Stability and scalability improvements on the File Transfer System (FTS) and the LCG File Catalogue (LFC)
– Sustained and reliable data export rates of over 1 GB/s with FTS
– 110 instances of the LFC catalogue deployed on the infrastructure
Modularity: catalogs and transfer protocols supported as plug-ins
Review of file catalog: LFC chosen; Fireman supported until needed
Work progressing on the Encrypted Data Storage (EDS):
– Migrating from the legacy Fireman+gLiteIO to the new LFC+GFAL system
– Incorporating the Shamir Secret Sharing System in the Hydra key-store for key-splitting support
Storage Accounting
– External tools from SA1, LCG and National Grid Initiatives
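The key-splitting idea mentioned for the Hydra key-store can be illustrated with a generic sketch of Shamir secret sharing. This is not Hydra's implementation: the function names `split_secret` and `recover_secret`, the choice of prime, and the representation of shares as `(x, y)` pairs are all assumptions made for illustration. The secret is the constant term of a random polynomial of degree `threshold - 1` over a prime field, each share is one point on that polynomial, and any `threshold` shares recover the secret by Lagrange interpolation at zero.

```python
import secrets

# Assumed field modulus: the Mersenne prime 2**127 - 1.
# The secret must be smaller than this value.
PRIME = 2**127 - 1


def split_secret(secret, threshold, num_shares):
    """Split an integer secret into num_shares points; any `threshold` recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in [0, PRIME)")
    if not 1 <= threshold <= num_shares:
        raise ValueError("need 1 <= threshold <= num_shares")
    # Polynomial coefficients: constant term is the secret, the rest are random.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares):
    """Recover the secret from at least `threshold` distinct shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME        # factor (0 - xj)
                den = (den * (xi - xj)) % PRIME  # factor (xi - xj)
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


shares = split_secret(123456789, threshold=3, num_shares=5)
recovered = recover_secret(shares[:3])
```

With a threshold of 3 out of 5, the key could be spread over five key-store instances and remain recoverable if any two of them are unavailable, while fewer than three shares reveal nothing about the secret.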

15 Computing Element
BLAH (Batch Local ASCII Helper) is the interface to the local resource manager of the gLiteCE and CREAM
– Entry point for EGEE information, accounting and authorization (glexec)
In production now:
– LCG-CE (GT2 GRAM)
  • Not ported to GT4
  • Robust, about 14 jobs/min
Coming:
– gLite-CE (GT4 GRAM + Condor-C)
  • Deployed (GT2 version) but still needs thorough testing and tuning
  • Now close to requested robustness, about 18 jobs/min
– CREAM (WS-I based interface)
  • Now on the preview test-bed
  • OGF-BES compliant (demo @SC06)
  • Almost 50 jobs/min
Possible deployment scenarios
– GT4 → BLAH submissions?
[Architecture diagram. Labels: WMS, other clients; Batch system; Information Providers; bdII, R-GMA; CEMon; Authorization; glexec + LCAS/LCMAPS; BLAH; Information System; Accounting; APEL/DGAS; Monitoring; gLiteCE; CREAM; GT4 pre-WS jobmanager; GT4 WS]
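The per-minute submission rates quoted above are easier to compare with daily workload figures once converted to jobs per day. The sketch below does that conversion under the assumption that each rate could be sustained for a full 24 hours, which the slide does not claim. The dictionary and function names are illustrative.

```python
MINUTES_PER_DAY = 24 * 60

# Approximate rates quoted on the slide, in jobs per minute.
ce_rates_per_min = {
    "LCG-CE": 14,
    "gLite-CE": 18,
    "CREAM": 50,  # the slide says "almost 50", so this is an upper estimate
}


def jobs_per_day(rate_per_min):
    """Convert a jobs/min rate to jobs/day, assuming it is sustained all day."""
    return rate_per_min * MINUTES_PER_DAY


daily = {name: jobs_per_day(rate) for name, rate in ce_rates_per_min.items()}

for name, value in daily.items():
    print(f"{name}: about {value} jobs/day")
```

Under that assumption a single LCG-CE would take roughly 20,000 jobs per day, a gLite-CE roughly 26,000, and a CREAM CE up to about 72,000.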

16 Workload Management
Workload Management System
– New features available since first gLite 3.0 release but service was not stable
  • Web Service interface, proxy renewal, compound jobs handling, automatic resubmission, etc...
– Experimental service now at production quality. Acceptance test:
  • run one week at 15000 jobs/day without manual intervention with 0.3% of jobs in non-final state
Logging and Bookkeeping
– Recent modifications to support non-DAG collections
– Tests show scalability up to 1 Million jobs/day
New WMS and L&B going to production now
[Plot annotations: 27000 jobs/day; load-limiter prevented submission; small number of jobs not assigned to CEs; job states grouped as Job in WMS, Job on CE, Final states]
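The WMS acceptance test above can be turned into concrete numbers. The sketch below assumes the 0.3% figure is an upper bound on jobs left in a non-final state, although the slide may equally be reporting the observed value, and it uses hypothetical function names. Integer arithmetic avoids floating-point rounding at the threshold.

```python
DAYS = 7                      # "one week"
JOBS_PER_DAY = 15000          # submission rate quoted on the slide
MAX_NON_FINAL_PER_MILLE = 3   # 0.3%, treated here as an upper bound (assumption)


def total_jobs(days=DAYS, jobs_per_day=JOBS_PER_DAY):
    """Number of jobs submitted over the whole test."""
    return days * jobs_per_day


def max_non_final(total):
    """Largest number of non-final jobs that still stays within 0.3%."""
    return total * MAX_NON_FINAL_PER_MILLE // 1000


def passes_acceptance(total, non_final, manual_interventions):
    """True if there were no manual interventions and at most 0.3% non-final jobs."""
    within_limit = non_final * 1000 <= total * MAX_NON_FINAL_PER_MILLE
    return manual_interventions == 0 and within_limit


submitted = total_jobs()
limit = max_non_final(submitted)
print(f"{submitted} jobs in the test, at most {limit} may be non-final")
```

Read this way, the test covers 105,000 jobs and tolerates at most 315 of them remaining in a non-final state.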

17 Other development
Attribute-based authorization: Virtual Organization Membership System (VOMS) and VOMS-Admin
– Added support of generic attributes
Support for pilot jobs
– glexec and LCAS/LCMAPS modified
– Interoperate with Open Science Grid (OSG)
Distributed Grid Accounting System (DGAS)
– Deployed on INFNGrid and tested for robustness and scalability
Relational-Grid Monitoring Architecture (R-GMA)
– Re-factoring to implement the new R-GMA design
– More robust (about 3000 producers active on the PS)
Service Discovery (SD)
– Input to OGF-SAGA

18 Standardization activities
Grid Interoperability Now (GIN) at OGF
– pragmatic approach: balance between application needs vs. technology push
Standards adoption
– Security:
  • Supporting international standards (mainly from OGSA-Auth) either directly or through plug-ins
  • Interoperability with Shibboleth
– Data Management:
  • Standard interfaces for SE (SRM 2.2)
  • Bulk data transfer (OGF-OGSA-DMI)
– Workload Management:
  • SOAP, WSDL and WS-I used to access services; building JDL to JSDL translator
  • OGSA-BES and JSDL demo at SC’06 for CREAM and WMProxy
– Resource description:
  • GLUE-WG started at OGF; information exchange between GLUE and CIM under study
– Information systems:
  • Specification defined within OGF-INFOD
– Accounting:
  • gLite extends OGF-UR specifications; work on OGF-RUS
– Networking:
  • NPM compliant with GGF-NM
– Applications:
  • OGF-SAGA APIs
Input to proposed standards
– VOMS Attribute Certificate
– CREAM contribution to OGF-BES
– FTS contribution to OGF-DMI

19 Relations to industry
JRA1 participates in the EGEE-II Industry Task Force
– presentations of gLite at the EGEE Industry Day events and other industry related events
Security requirements of the French Centre National d'Etudes Spatiales (CNES)
– Positive assessment of the gLite security model after in-depth discussions
  • Will address the open points during the second year of the project
Collaboration with the CERN Openlab
– Use of market-based scheduling system for computing resources with HP representatives
Activities with Platform just started
– Optimization of the use of the LSF batch system in EGEE sites
– Improve the communication mechanisms between Grid clients and batch systems

20 JRA1 All-Hands meetings
Regular JRA1 plenary meetings
– Two days, about 3 times/year
Community-building events for a group distributed over different countries
Good chance to interact with related activities and projects
– SA3, ETICS, but also GSVG, GILDA, MWSG
Meetings:
– July 2006: University of West Bohemia, Pilsen, CZ
– November 2006: Cosener’s House, Abingdon, UK
– March 2007: University of Catania, Catania, IT
– June 2007: HIP / CSC, Helsinki, FI

21 Plans
Provide support to the production infrastructure
– bug fixing, GGUS support, TCG Task Forces (about 50% of total effort)
Complete the release of the gLite 3.1 components
– Including full adoption of ETICS, support for SL4, migration to VDT 1.6.0
  • Foreseen to be completed by Q2’07
– Support for 64-bit and other platforms will follow according to TCG decisions
  • 64-bit support on User Interface and Worker Node is foreseen by Q3’07
Complete the gLite restructuring activity together with SA3
  • Foreseen to be completed by Q4’07
Use experimental services to improve the certification process of new components
  • gLiteCE foreseen to be ready by Q2’07, CREAM by Q4’07
Address service manageability issues from site managers
– common format for log files of services
– hooks for local/global monitoring
– improved service management interface
– standardized error messages
  • Continuous activity. First improvements in the gLite 3.1 components
Support the preview test-bed

22 Summary
The gLite 3 middleware stack is in production
– Addressing stability and performance problems as they appear
– New software process adopted
– New components and functionalities added as needed but without affecting the stability of the infrastructure
– The introduction of a preview test-bed and of experimental services optimized the process to get new components in production
We are addressing the long term sustainability of gLite
– New ETICS build tools adopted for the gLite 3.1 components
– Working on support of SL4 and 64-bit
– Now restructuring the gLite 3 code-base to improve its maintainability and portability
Addressing the needs of the applications while improving the adherence to international standards
– Correct balance controlled by the Technical Coordination Group
