The CREAM CE: When can the LCG-CE be replaced?
Nick Thackray, CERN
WLCG Grid Deployment Board, CERN, 12 November 2008
Current status in production grid
The CREAM CE has been in the production middleware stack since 15 October. However:
- The production version of the WMS cannot successfully submit jobs to the CREAM CE.
  - But it could match to it, creating a black hole for jobs, so special YAIM configuration is needed (available in production).
- Proxy renewal relies on 1000s of ports on the WN being opened to incoming IP traffic, which is not exactly popular with the sites.
Nevertheless, ALICE can use it...
- There have been several requests for sites (particularly ALICE tier-1 sites) to install a CREAM CE; only KIT have done it so far.
- This is an ideal opportunity for both ALICE and sites to get experience with the new CE.
CE replacement kick-off criteria (1)
List of suggested criteria to be met before the transition plan goes ahead:
- The CREAM CE should provide equivalent functionality and performance to the LCG CE. One exception: no ability for users to fork processes on the CE! (showstopper??)
- The sum of all CREAM CE production instances must have processed a large number of jobs (>1M?) and shown no major problems for several (3?) months.
- The following are taken from the EGEE TMB document* specifying criteria for the CREAM CE (August 2007). As the CREAM CE code has been significantly rewritten since then, should we run the tests again?
  - Performance
    - 5000 simultaneous jobs per CREAM CE node
    - 50 user/role/submission node combinations supported on a single CREAM CE node
  - Reliability
    - Job failure rates in normal operations due to the CREAM CE <0.5%
    - Job failures due to restart of CREAM CE services or reboot <0.5%
    - 5 days unattended running with performance on day 5 equal to that on day 1
  - Batch systems integrated by default
    - LSF
    - PBS-Torque/Maui
    - Sun Grid Engine
    - Condor
  - The process for integrating the CREAM CE with other batch systems must be fully documented.
*
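The numeric criteria on this slide could be checked mechanically once measurements exist. The following is only a minimal sketch: the `CreamMetrics` fields and the `unmet_criteria` helper are hypothetical names introduced for illustration, and the job-count and duration thresholds (>1M jobs, 3 months) are the tentative values the slide itself marks with question marks.

```python
from dataclasses import dataclass


@dataclass
class CreamMetrics:
    """Hypothetical summary of measurements for CREAM CE production instances."""
    total_jobs: int                       # jobs processed by all production instances
    months_without_major_problems: float  # length of the trouble-free period
    max_simultaneous_jobs: int            # sustained on a single CREAM CE node
    user_role_combinations: int           # user/role/submission node combinations per node
    failed_jobs_normal: int               # failures due to the CREAM CE in normal operation
    failed_jobs_restart: int              # failures due to service restart or reboot


def failure_rate(failed: int, total: int) -> float:
    """Return the fraction of failed jobs, or 0.0 if no jobs ran."""
    return failed / total if total else 0.0


def unmet_criteria(m: CreamMetrics) -> list[str]:
    """List the numeric kick-off criteria from the slide that are not met."""
    unmet = []
    if m.total_jobs <= 1_000_000:            # ">1M?" is tentative in the source
        unmet.append("more than 1M jobs processed")
    if m.months_without_major_problems < 3:  # "3?" is tentative in the source
        unmet.append("3 months without major problems")
    if m.max_simultaneous_jobs < 5000:
        unmet.append("5000 simultaneous jobs per node")
    if m.user_role_combinations < 50:
        unmet.append("50 user/role/submission node combinations per node")
    if failure_rate(m.failed_jobs_normal, m.total_jobs) >= 0.005:
        unmet.append("normal-operation failure rate below 0.5%")
    if failure_rate(m.failed_jobs_restart, m.total_jobs) >= 0.005:
        unmet.append("restart/reboot failure rate below 0.5%")
    return unmet
```

Calling `unmet_criteria` on a filled-in `CreamMetrics` returns an empty list when every numeric threshold is met. The non-numeric criteria (batch system integration, documentation, the 5-day unattended run) are deliberately left out of this sketch.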
CE replacement kick-off criteria (2)
- WMS-ICE submission to CREAM is available in production and no significant bugs are outstanding.
- The ICE / CREAM job submission chain should be able to meet the above performance criteria and otherwise perform at least as well as WMS / LCG CE (current PPS pilot testing is showing problems here).
- Condor-G submission to CREAM is available in production and no significant bugs are outstanding. [Condor team say several weeks for this to be ready]
- Information publishing by CREAM is accurate and complete (?) (for example, sub-clusters, etc.).
- MPI works on CREAM.
- There is a clear plan, with agreed timelines for implementation, for migration of CREAM away from gJAF (to LCMAPS?).
- A support plan is in place to help VOs to migrate to CREAM (part of transition plan).
- The CREAM CE can be configured using a version of YAIM that is in production, with no workarounds needed.
- Monitoring probes are available for the CREAM CE. This set of tests must provide at least the same coverage as the current set for the LCG CE.
- The parameter-passing mechanism (to BLAH) must work for submission through the WMS (and also for direct job submission?). [Is this a showstopper?]
LCG-CE to CREAM transition plan
Work in progress, but will include:
- Phased approach, no big bangs
- Large sites running both LCG CE and CREAM CE in parallel for some period
- Transition for tier-1 sites will be carefully managed (plans requested, tracked at weekly grid operations meeting, etc.)
- Service retirement procedure will be followed for the LCG-CE, giving clear timelines for suspension of support, suspension of critical bug fixing and final obsoletion of the service.
What about LHC start-up…?
- All very well and good, but would the experiments want to transition to a new CE after LHC start-up next year?
- If not, then this really means that the WMS-ICE / CREAM CE combo must be ready and in place in time for CCRC ’09.
  - Would it need to be just at the tier-1s?
  - Larger tier-2s as well??
  - All sites???
- Watch this space...