Published by Nickolas Henry. Modified over 9 years ago.
1
Status of gLite-3.0 deployment and uptake
Ian Bird, CERN IT
LCG-LHCC Referees Meeting, 29th January 2007
2
October 7, 2005 Ian.Bird@cern.ch LHCC Referees Meeting; January 29th 2007
Deployment History
gLite-3.0 was delivered in May 2006
  Rapidly deployed to Tier 1 sites
  After 1st update (3.0.1) full deployment across EGEE
Two full update releases (3.0.1, 3.0.2) delivered in June and August
Change to incremental updates (move away from big-bang releases)
  12 updates to 3.0.2, rapidly deployed by all sites
Anticipate major releases only for major changes: e.g. 3.1.0 will be SLC4
  But even then will avoid major functional or behavioural changes
3
Comments
No real distinction between what was LCG-2.7 and gLite-3.0
  Mainly evolved versions of existing services
  Some “new” services: they existed before but are now in use (e.g. VOMS)
The introduction of gLite-3.0 was not disruptive to the production service
  Although it did cost effort from the sites!
Most gLite-3.0 services are deployed only in EGEE; exceptions are:
  FTS, deployed also in US Tier 1s and NDGF
  WMS/LB, used by CMS to submit work across EGEE and OSG
  VOMS
4
Status of major components
VOMS:
  VOMS service in full production; old LDAP-based VO services stopped
VOMS roles and groups:
  FTS already supports roles and groups
  DPM supports roles, groups and ACLs
  dCache 1.7 supports roles, groups at disk-pool level; ACLs mid-year
  Castor: no real estimate yet
Job priorities:
  VOViews and batch system support being tested now; supported by gLite WMS
R-GMA:
  Used as back-end of APEL (accounting)
  Used as monitoring transport mechanism … and hence used by dashboards
5
Status – 2
FTS:
  Used by all experiments; deployed at Tier 0 and all Tier 1s
  Rapid cycle of fixes for issues found in Service Challenges and ongoing use
  Most major issues (e.g. fat clients) have been addressed
  Version 2.0: support for SRM v2.2
LFC:
  In production, used by ATLAS and LHCb
  Deployed as both central and local file catalogues
  Major issues addressed:
    Python API problems
    Bulk queries (can now achieve 300 Hz)
GFAL/lcg-utils: main SRM clients
  Used by all experiments
  Updated to support SRM v2.2
6
Status – 3
WMS/LB:
  ATLAS and CMS rely on gLite WMS functionality, particularly the job collections
    CMS can still use LCG-RB for MC production; not an option for ATLAS
  LHCb and ALICE use LCG-RB or gLite WMS as basic submission tool
  Major testing effort with CMS (and ATLAS participation) in Q3 2006 to get WMS to a state where it could be used in CSA06
    Testing showed that rates of ~26k jobs/day are feasible from a single node in quiet conditions
    Extended testing now shows that memory consumption limits this to ~10k jobs/day
    In CSA06 CMS achieved workloads of 5-8k jobs/day on each of 2 WMS nodes, but limited by bottlenecks in CMS components (later fixed)
    In use by ATLAS MC production since July at rates up to 4k jobs/day on a single WMS node
  Major issues now are:
    Reliability of service is not adequate for service managers or production managers
    Starting second phase of testing now, with full ATLAS & CMS participation
    ATLAS more sensitive to reliability issues in their production
    Aim is reliable operation (equivalent to LCG RB stability) with 50K jobs/day in Q2 2007 and 100-150K jobs/day on <10 nodes by end of 2007
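The WMS targets above can be put next to the measured single-node rate with a little arithmetic. The following is a rough sketch only, not part of the original slides: it assumes load is split evenly across WMS nodes, reads "<10 nodes" as at most 9, and uses hypothetical helper names (`per_node_rate`, `nodes_needed`).

```python
# Figures quoted on the slide, in jobs per day.
MEMORY_LIMITED_RATE = 10_000          # ~10k/day single-node limit from extended testing
TARGET_Q2_2007 = 50_000               # aggregate target for Q2 2007
TARGET_END_2007 = (100_000, 150_000)  # aggregate target range for end of 2007
MAX_NODES = 9                         # assumption: "<10 nodes" taken as at most 9


def per_node_rate(total_jobs_per_day: int, nodes: int) -> float:
    """Average jobs/day each node must sustain, assuming an even split."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return total_jobs_per_day / nodes


def nodes_needed(total_jobs_per_day: int, rate_per_node: int) -> int:
    """Smallest number of nodes that covers the total at a given per-node rate."""
    if rate_per_node <= 0:
        raise ValueError("rate_per_node must be positive")
    return -(-total_jobs_per_day // rate_per_node)  # integer ceiling division


if __name__ == "__main__":
    for target in TARGET_END_2007:
        print(target, "jobs/day on", MAX_NODES, "nodes needs",
              round(per_node_rate(target, MAX_NODES)), "jobs/day per node")
        print(target, "jobs/day at the current limit needs",
              nodes_needed(target, MEMORY_LIMITED_RATE), "nodes")
```

Under these assumptions, staying at the ~10k jobs/day limit would take 10 to 15 nodes for the end-of-2007 range, so meeting it on fewer than 10 nodes implies a higher sustained per-node rate than extended testing has shown so far.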
7
Status – 4
CE:
  Not widely deployed, but testing has shown it to be now reasonably reliable
    There were a number of problems initially
  BUT: Condor bug limited the number of jobs in a batch system to 100 (!)
    Now have a fix for this; serious scale testing is starting
  Phasing out the LCG CE is not so urgent:
    Experiments want stability
    gLite CE brings limited additional functionality (passing of job resource requirements to batch system)
  Need to avoid porting LCG-CE to SLC4 if possible
8
Summary
gLite-3.0 is the production middleware on EGEE
  Some services used elsewhere also
All services are in production use
  Testing efforts for WMS/LB and CE still needed to get to desired performance and reliability levels