WLCG Middleware Status Report
Markus Schulz, LCG Deployment
16 February 2009
Markus.Schulz@cern.ch
Overview
- The three WLCG middleware stacks:
  - ARC (NDGF): most sites in northern Europe, ~10% of WLCG CPUs
  - OSG: most North American sites, >25% of WLCG CPUs
  - gLite: used by the EGEE infrastructure
- Summary and issues
ARC Middleware Status
Michael Grønager, Project Director, NDGF
LCG-LHCC Mini Review, CERN, Geneva, 16 February 2009
WLCG sites with ARC
- Tier-1: NDGF
- Tier-2s: Finnish, Norwegian, Slovenian, Swedish
- Tier-3s: Danish, Norwegian, Swedish, Swiss
ARC Status – Current Version
- Current stable release: 0.6.5, “Earth Quake” (December)
- Improved cache scalability: ARC supports caching of files used by several jobs. This boosts performance for, e.g., analysis, but scalability issues were detected on large clusters. ARC 0.6.5 allows this load to be split across several file servers.
- Optional patch replacing Globus MDS with a new solution, EGIIS, which includes BDII. This is deployed at most NDGF-related sites.
- Minor issue with LFC fixed
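The cache-splitting feature above can be sketched with a simple hash-based scheme. This is an illustrative sketch under my own assumptions (server names and the mapping function are invented), not ARC's actual implementation:

```python
import hashlib

# Hypothetical sketch of splitting a shared job-input cache across several
# file servers, in the spirit of the ARC 0.6.5 feature described above.
# Server names and the hashing scheme are invented for illustration.
CACHE_SERVERS = ["cache01.example.org", "cache02.example.org", "cache03.example.org"]

def cache_server_for(url: str) -> str:
    """Deterministically map an input-file URL to one cache server, so every
    job that needs the same file is directed to the same server's cache."""
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return CACHE_SERVERS[int(digest, 16) % len(CACHE_SERVERS)]

# The same file always lands on the same server; different files spread out.
server = cache_server_for("srm://se.example.org/atlas/data/file001.root")
```

Because the mapping is deterministic, jobs sharing input files still benefit from each other's cached copies, while the I/O load no longer concentrates on a single file server.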
ARC Status – Next Version
- Next stable release: “Fastelavn”* (February)
- Further scalability improvements:
  - Support for sharing the load across multiple file system servers
  - Support for distributing multiple up- and downloaders across multiple machines
  - These new features make ARC ready for running production on large clusters of 5000+ cores
- MDS fully replaced by EGIIS and BDII
- Optional publishing of GLUE 1.3 alongside the ARC schema (currently in testing, e.g. at NDGF-BENEDICT-T3)
- KnowARC features starting to appear: optional module for OGF BES submission, based on a new and more modular code base
* a.k.a. Mardi Gras
ARC Future
- The production release of ARC (sometimes called ARC classic) will continue to evolve.
- More and more components will be integrated from, e.g., the KnowARC project. KnowARC development adds new service interfaces that adhere to standards such as GLUE2, BES and JSDL; these will be incorporated into the production release of ARC.
- There will be no “migrations”, but a gradual incorporation of the new components into the stable branch, like OGF BES in “Fastelavn”.
- ARC components will be included in UMD, and ARC now supports building on ETICS.
Status of OSG Middleware for WLCG
Ruth Pordes, OSG Executive Director
Alain Roy, OSG Software Coordinator
LHCC Mini-Review, 16 February 2009
OSG Middleware Scope & Status
- OSG provides packages for Compute Elements, Storage Elements, VO managers, the Worker-Node Client and the User Client.
- OSG middleware is tested to allow applications to interoperate across OSG and EGEE (and NDGF); WLCG users are thus able to use the multiple grids transparently.
- OSG V1.0 was stable during data taking, cosmic runs and the ramped-up simulation production and analysis in the second half of 2008.
Progress over the last 6 months
- Bestman/xrootd Storage Elements now installed at several Tier-2s.
- Bestman (+ NFS/Lustre/Hadoop) installed at Tier-3s and a couple of Tier-2s.
- Addition of the WLCG client utilities (LFC, lcg_utils) enables use of the OSG Client without installing both the OSG and EGEE client packages.
- Roll-out of joint gLite/VO-services/Globus common interfaces and protocols in the security components. Significant testing effort across the projects, including SCAS/LCAS, glexec and GUMS.
- EGEE packages continue to be included in the OSG software stack: VOMS/VOMS-Admin, glexec, edg-mkgridmap.
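The security components named above (edg-mkgridmap, glexec, GUMS) all revolve around mapping a user's certificate DN to a local account. A minimal sketch of reading a grid-mapfile, the standard quoted-DN-then-account format that such tools produce and consume; the DNs and accounts below are invented:

```python
# Hedged sketch of the DN -> local-account mapping consulted by tools like
# glexec. A '.'-prefixed account conventionally denotes a pool account.
def parse_gridmap(text: str) -> dict:
    """Return {certificate DN: local account} from grid-mapfile text."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The DN is quoted and may contain spaces; the account follows it.
        dn, _, account = line.rpartition(" ")
        mapping[dn.strip('"')] = account
    return mapping

example = '''
# generated by edg-mkgridmap (illustrative content)
"/DC=org/DC=example/CN=Jane Analyst" .atlas
"/DC=org/DC=example/CN=Pilot Factory" pilot01
'''
accounts = parse_gridmap(example)
```

With multi-user pilot jobs, glexec performs this kind of mapping on the worker node itself, so the payload runs under the end user's account rather than the pilot's.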
Software Tools Group
- Part of the new OSG project structure in FY09. Led by Alain Roy and Mine Altunay.
- Central hub for all software projects and plans. Aims to ensure stakeholders’ needs are met from planning to deployment. Single point of contact for software providers.
- Inputs: user/VO/site requirements; software providers’ timelines and plans.
- Outputs: plans for software stack evolution; point of contact with the EGEE EMT and gLite.
External Software Provision
OSG, US ATLAS and US CMS are working closely with software development groups on:
- Timely deployment of new versions of dCache and Bestman for WLCG needs.
- Evolution of the identity systems (looking at backends to Shibboleth, Kerberos) and compatibility.
- Condor changes to support scalability in the number of jobs.
- Internet2/ESnet for deployment of the perfSONAR network monitoring tools.
- Gratia accounting, the OIM operations database and tools.
- Use of xrootd.
OSG and US ATLAS are working on generalization of PanDA for other users.
OSG and US CMS are working on generalization of the glide-in WMS for other users.
OSG support for gLite underpinnings
We continue to supply a subset of the VDT as RPMs:
- Condor
- Globus
- MyProxy
- GSI OpenSSH
- GPT
Current Work
- Major focus is on better support for incremental upgrades, roll-back and forward compatibility. Includes a redesign of the packaging to improve native packaging.
- Debian 5 support for LIGO.
- Software upgrades only if really needed; not yet looking at Globus 4.2.
- Interoperability:
  - Testing compatibility of CREAM with the OSG Client stack
  - Ensuring that availability, reliability, installed capacity and accounting software and services all report correctly from OSG to EGEE and to WLCG
Currently Supported Platforms
Linux (32- and 64-bit):
- RHEL 3, RHEL 4, RHEL 5
- Debian 4
- ROCKS 3
- SuSE Linux 9 (64-bit only)
- Scientific Linux 3, Scientific Linux 4
Other:
- Mac OS X 10.4 (client only)
- AIX 5.3 (limited support)
Concerns (nothing new)
- Need to continue to ensure modularity and separation of EGEE services and WLCG, to enable OSG to contribute and peer effectively.
- Need WLCG to work with the OSG middleware activities as closely as with the EGEE middleware activities. We are all trying hard here!
- Interoperability activities will become more challenging in an EGI era, where the number of independent software stacks may grow or diverge. OSG is committed to working with EGI partners in these areas.
- OSG is pleased to contribute to the Infrastructure Policy Group. These are pragmatic activities for understanding commonalities and differences.
- OSG remains nervous about the potential of the OGF standards to be really successful.
gLite
- The current release is gLite 3.1. It is updated almost every week (30+ updates/year). Its purpose is to provide a stable platform for production grid usage.
- It covers: data management, workload management, the information system, AAA.
- Distributed lifecycle: tools and formal processes link teams and tasks and monitor progress.
- Large code base (~1.6 million lines of code).
Most Active Areas
- Workload management (access to computing resources):
  - Support for multi-user pilot jobs, used by the experiments’ frameworks: DIRAC, PanDA, AliEn
  - Move to the next OS platform: SL5
- Continuous evolution of the other components: FTS, DPM, LFC, ...
Workload Management
- The LCG-RB has been phased out.
- WMS 3.1 on SL4 is a major update (accumulating patches from more than 8 months). Certified; will be released to production in the next weeks.
  - Can handle >30K jobs/day
  - Better support for bulk submission
  - Almost ready to support the CREAM-CE: ICE is integrated, but needs more testing
- Support for multi-user pilot jobs: SCAS and glexec on the WNs are late. Now under stress testing.
  - Still issues with memory management
  - Fails at a 0.03% rate: not good enough for an authorization system
  - Scales to >10 Hz (OK for most sites)
  - A pilot service will start during the next week
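The bulk submission mentioned above is expressed in the gLite WMS through JDL job collections: several jobs described in one file and handed to the WMS in a single operation. A minimal sketch follows; the executables and file names are placeholders, and attribute spellings follow the gLite WMS JDL conventions:

```
[
  Type = "collection";
  Nodes = {
    [
      Executable = "/bin/hostname";
      StdOutput = "node1.out";
      StdError = "node1.err";
      OutputSandbox = {"node1.out", "node1.err"};
    ],
    [
      Executable = "/bin/date";
      StdOutput = "node2.out";
      StdError = "node2.err";
      OutputSandbox = {"node2.out", "node2.err"};
    ]
  };
]
```

A collection like this would typically be submitted with glite-wms-job-submit; the WMS then tracks the nodes under a single parent identifier, which is what makes high daily job rates practical for the experiment frameworks.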
Computing Resource Access (CE)
- LCG-CE, in production at all EGEE sites:
  - Legacy service, introduced at the end of 2002
  - Has been improved over the years to handle 50 users and 4K jobs
  - Good enough for production use, but might be problematic for analysis tasks
- CREAM-CE:
  - New architecture: Web Service interface, supports the BES standard, parameter passing to batch systems, scalability!
  - First version released to production 8 months ago; 13 instances in production + 13 in the PPS; used by ALICE
  - New version with many bug fixes in final certification
Scientific Linux 5
- The SL5 Worker Node pilot phase has come to an end; experiments encountered no major problems. A new formal release is being prepared and will arrive in production soon.
- Other activities: multi-compiler support, support for multiple versions, improved rollback support.
- Long term: support for the new information system schema (GLUE-2); introduction of the first components of the new EGEE Authorization Framework (policy management system).
Issues and Outlook
- EGEE-III ends early in 2010. The new environment for middleware support is under discussion, with less CERN involvement in integration and release management. Will the new entities be up and running in time?
- gLite Consortium: discussions on a formal agreement are taking place; this is required to organize support for the gLite middleware.
- A Unified Middleware Distribution is forming (ARC + gLite + UNICORE), moving towards standards-based middleware.
- WLCG has a wider scope; maintaining interoperability might become more difficult.
Summary
- All three middleware stacks provide stable production environments, are aware of scalability issues and have addressed most of them.
- All three stacks interoperate with each other and work on improving interoperability and interoperation.
- OSG actively supports pilot jobs (glexec/GUMS); gLite will soon (glexec/SCAS).
- The middleware stacks still evolve: major changes have been introduced into the production system successfully, without interrupting the service.
- The transition from EGEE-III to EGI, UMD and the gLite consortium will be challenging.