TCG Discussion on CE Strategy & SL4 Move

Slides:



Advertisements
Similar presentations
EGEE-II INFSO-RI Enabling Grids for E-sciencE The gLite middleware distribution OSG Consortium Meeting Seattle,
Advertisements

LHCC Comprehensive Review – September WLCG Commissioning Schedule Still an ambitious programme ahead Still an ambitious programme ahead Timely testing.
CERN - IT Department CH-1211 Genève 23 Switzerland t LCG Deployment GridPP 18, Glasgow, 21 st March 2007 Tony Cass Leader, Fabric Infrastructure.
OSG Middleware Roadmap Rob Gardner University of Chicago OSG / EGEE Operations Workshop CERN June 19-20, 2006.
A DΙgital Library Infrastructure on Grid EΝabled Technology ETICS Usage in DILIGENT Pedro Andrade
Glite/ARC interoperability Christian Ulrik Søttrup EGEE-II/(KnowARC)
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Middleware Deployment and Support in EGEE.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks JRA1 summary Claudio Grandi EGEE-II JRA1.
Evolution of Grid Projects and what that means for WLCG Ian Bird, CERN WLCG Workshop, New York 19 th May 2012.
INFSO-RI Enabling Grids for E-sciencE SA1 and gLite: Test, Certification and Pre-production Nick Thackray SA1, CERN.
INFSO-RI Enabling Grids for E-sciencE Strategy for gLite multi-platform support Author:Eamonn Kenny Meeting:SA3 All Hands Meeting.
GLite – An Outsider’s View Stephen Burke RAL. January 31 st 2005gLite overview Introduction A personal view of the current situation –Asked to be provocative!
LCG EGEE is a project funded by the European Union under contract IST LCG PEB, 7 th June 2004 Prototype Middleware Status Update Frédéric Hemmer.
Stefano Belforte INFN Trieste 1 Middleware February 14, 2007 Resource Broker, gLite etc. CMS vs. middleware.
VO Box Issues Summary of concerns expressed following publication of Jeff’s slides Ian Bird GDB, Bologna, 12 Oct 2005 (not necessarily the opinion of)
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Batch System Integration Update Jan Just.
EGEE-II INFSO-RI Enabling Grids for E-sciencE gLite and Condor present and future Claudio Grandi (INFN – Bologna)
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Update Authorization Service Christoph Witzig,
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks SA3 partner collaboration tasks & process.
LCG WLCG Accounting: Update, Issues, and Plans John Gordon RAL Management Board, 19 December 2006.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks The LCG interface Stefano BAGNASCO INFN Torino.
Criteria for Deploying gLite WMS and CE Ian Bird CERN IT LCG MB 6 th March 2007.
Configuration & Management for Joachim Flammer Integration Team EGEE is a project funded by the European Union under contract IST JRA1 all-hands-meeting,
WP1 Status and plans Francesco Prelz, Massimo Sgaravatto 4 th EDG Project Conference Paris, March 6 th, 2002.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks ROCs Top 5 Middleware Issues Daniele Cesini,
INFSO-RI Enabling Grids for E-sciencE gLite Test and Certification Effort Nick Thackray CERN.
Status of gLite-3.0 deployment and uptake Ian Bird CERN IT LCG-LHCC Referees Meeting 29 th January 2007.
INFSO-RI Enabling Grids for E-sciencE File Transfer Software and Service SC3 Gavin McCance – JRA1 Data Management Cluster Service.
The EPIKH Project (Exchange Programme to advance e-Infrastructure Know-How) gLite Grid Introduction Salma Saber Electronic.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks CREAM: current status and next steps EGEE-JRA1.
ALICE WLCG operations report Maarten Litmaath CERN IT-SDC ALICE T1-T2 Workshop Torino Feb 23, 2015 v1.2.
EGEE is a project funded by the European Union under contract IST Padova report Massimo Sgaravatto On behalf of the INFN Padova JRA1 Group.
CREAM Status and plans Massimo Sgaravatto – INFN Padova
EGEE-II INFSO-RI Enabling Grids for E-sciencE Status of INFN middleware in gLite Claudio Grandi INFNGrid EB CNAF,
INFSO-RI Enabling Grids for E-sciencE CREAM, WMS integration and possible deployment scenarios Massimo Sgaravatto – INFN Padova.
Status of the SL5 migration ALICE TF Meeting
Resource access in the EGEE project Massimo Sgaravatto INFN Padova
OpenSSL and Java 7 vs. 512-bit proxy keys
gLite: status and perspectives
Claudio Grandi – JRA1 Activity Manager INFN and CERN
The EMT Oliver Keeble, SA3 CERN.
LCG Service Challenge: Planning and Milestones
Tamas Kiss University Of Westminster
Claudio Grandi - JRA1 Activity Manager - INFN
The gLite middleware distribution
Andreas Unterkircher CERN Grid Deployment
Summary on PPS-pilot activity on CREAM CE
Claudio Grandi (INFN and CERN)
Distributed Job Submission in a Dynamic Virtual Environment
gLite Middleware Status
Slides contributed by EGEE Team
Update on Plan for KISTI-GSDC
The CREAM CE: When can the LCG-CE be replaced?
Introduction to Grid Technology
SRM2 Migration Strategy
Testing for patch certification
Olof Bärring LCG-LHCC Review, 22nd September 2008
glexec/SCAS pilot service
Short update on the latest gLite status
ETICS Services Management
DPM releases and platforms status
Cristina del Cano Novales STFC - RAL
Francesco Giacomini – INFN JRA1 All-Hands Nikhef, February 2008
How to Successfully Implement an Agile Project
Pierre Girard ATLAS Visit
WLCG Collaboration Workshop: Outlook for 2009 – 2010
UMD 2 Decommissioning Status
UMD 2 Decommissioning Status
Transfer to Operations
gLite The EGEE Middleware Distribution
Presentation transcript:

TCG Discussion on CE Strategy & SL4 Move Discussed last Friday at the TCG

Overview Material shown at the TCG meeting With some comments Summary of the result of the discussion

Introduction Some terminology Component releases rather than big-bangs 3.0 means built on SL3, VDT1.2.x (GT2) gcc-3.2 Build system: gLite (and old lcg) 3.1 means built on SL4, VDT1.3.x (GT4) gcc-3.4, Java-1.5 and ETICS build system Irrespective of actual code version!!! Exceptions will be WMS/CE which is refactored code for 3.1 There is insufficient effort to build, test, certify and run every possible/desirable combination   3.0: SL3, VDT1.2/GT2 3.1: SL4, VDT1.3/GT4 lcg CE yes Runs on SL3 node or run SL3 binary on SL4 (not built with GT4 or VDT1.3) gLite CE

Status LCG-CE: gLite-CE: CREAM: WS-GRAM: Production CE, many improvements in last 4 years to make suitable for an operational environment gLite-CE: Not really tested at all seriously yet – all effort has been on the WMS Based on WMS experience CE testing will take at least 3 months And is more complex since need several instances who all react rapidly to patches etc. Not so easy to rapidly cycle fixes. CREAM: Unknown status – therefore not on the horizon for production use for at least 1 year (optimistic(!) based on past experience) WS-GRAM: Unknown status – OSG will do testing of it – timescale unknown

CE Universe – who can talk to what? Condor strategy is to connect to any CE Why do we need to reproduce that? – Why not just stay with Condor-G? WMS Condor-G GRAM ? Condor-G ICE WS-GRAM LCG-CE Condor_C CREAM blah LRMS

Other problems LCG-RB vs gLite WMS: VO Boxes: DGAS: 3.0 code base: 3 months effort to make useable for applications; BUT still many outstanding operational issues: Sandbox management – solution of LCG-RB has not been implemented Too many processes – memory use – frequent restarts Filling of disk partitions; need for log files to be pruned by hand frequently Gang-matching breaks WMS It is NOT REALLY ready for production use! 3.1 code base: Much has changed vs 3.0! First tests show 30% failures! Will it require an additional 3 months of testing??? VO Boxes: Based on LCG-CE; functions must be ported to SL4 (and repckaged) Experiment VOBox sw must be ported also DGAS: DGAS is added to LCG-CE since we have not gLite CE yet … requires 2 versions of DGAS - SL4/GT4 + version on SL3 for lcg-ce

Proposed Strategy Strategy for CE: (for the foreseeable future) CE must allow the application interface based on Condor-G: Condor is integral part of interoperation strategy, must keep some constant for apps (e.g. like that in NL) Phase out LCG-CE by July 2007 no more globus-job-run or GRAM Sites that support VOs that need GRAM-like access need to maintain LCG-CE and gLite-CE together while apps migrate to use Condor-G submission We will only support LCG-CE until end June.   No effort goes into LCG-CE Will be frozen, all new effort goes into gLite-CE (functionality) What to do with LCG-RB? Handling jobs - WMS 3.0 is on same level as LCG-RB But: cannot remove LCG-RB until outstanding management issues resolved End-of-life: April 2007? Will not be migrated to SL4

*) The code version depends on test results Independently when they Timeline Jan Dec Nov Feb Mar Apr May Jun UI/WN SL4&VDT1.3 WMS * gLite-CE * LCG-CE support ends VO Box Fall Back SL3 code on VDT1.2 on SL4 SL3 code VDT1.2&SL4 *) The code version depends on test results Build with ETICS FTS, DPM, LFC,…… Move to SL4&VDT1.3 Independently when they are ready

TCG’s Strategy Overall the proposed plan was accepted The LCG-CE will be supported at least until end of June Minor adjustments for accounting, but otherwise ‘stable’ Further steps depned on the quality of the gLite-CE Sites can add the GT4-pre-webservice-gram standard job managers to the gLite-CE We will not provide certification for this at the moment Future decisions depend on several factors Applications, LCG-CE’s future, etc… Sites who are interested in this should become part of the PPS SA3 will be initially not directly involved in this. Practical aspects will be discussed between NIKHEF SA3 and TCG This will become only relevant towards the end of life of the LCG-CE CREAM-CE JRA1 should investigate to add GT4 WebServices Gram JRA1 should continue to work with OGF on standard interfaces Policy: EGEE should support multiple interfaces on the CE As far as required by apps and feasible

Update on SL4 ETICS build is in progress Alberto is helping with the transfer Discussed right now at the JRA1 all hands No tests yet with native ports Until they start any time estimation is very imprecise SL3 WNs running on SL4 are ready and going to PPS (today or tomorrow) Additional nodes will follow as needed (hopefully this is not needed)