WLCG ‘Weekly’ Service Report ~~~ WLCG Management Board, 22nd July 2008.



Introduction
This ‘weekly’ report covers two weeks (MB summer schedule).
Last week (7 to 12 July):
  Tuesday: MB F2F
  Wednesday: GDB, C-RSG
  Friday: OB
This week (14 to 20 July):
  Monday: CMS CRUZET3 cosmic ray run finished
Notes from the daily meetings can be found from:
(Some additional info from CERN C5 reports and other sources)

C-RSG
All reviewers have had one or more meetings with their experiments and are filling in a common (but adaptable) template leading to the 2009 resource requirements.
Executive summary per experiment:
  ALICE: Exchanges and a teleconference have taken place and ALICE have completed their template, but follow-up is needed.
  ATLAS: Only one reviewer was available. The first iteration of the template is done but only partially completed. The second reviewer is now active.
  CMS: Template fully complete (it matches the CMS computing model directly). HI running is not being reviewed at this time (it is separately funded outside of CERN).
  LHCb: Full information has been given to enable the template to be adapted and completed.
The group notes that they will have to renormalise the resulting experiment numbers to a common set of assumptions on the LHC running conditions.
The plan is to report on the scrutiny of the validity of the 2009 resource requests in August. The CSO has agreed that these can already be made public, though more detail may be added for the November C-RRB.
In future years there may be a C-RRB in the summer to review the Scrutiny Group reports for the following year, given the need to start hardware procurements well in advance of need.
The group also had a report on the results of the Common Computing Readiness Challenge at its fourth meeting.
The group will meet in August to finalise the 2009 reports, then decide the date for one or more autumn meetings once they see how the LHC is performing, bearing in mind that they have to make their final report to the C-RRB meeting of 11 November.

OB
The OB heard an LCG project status report from I. Bird, a CCRC’08 post-mortem report from myself including a SWOT analysis, and a report on procedural progress of the C-RSG from myself.
The weaknesses are seen as:
  Some of the services, including but not limited to storage / data management, are still not sufficiently robust.
  Communication is still an issue / concern. This requires work / attention from everybody; it is not a one-way flow.
  Not all activities (e.g. reprocessing, chaotic end-user analysis) were fully demonstrated even in May, nor was there sufficient overlap between all experiments (and all activities).
The main threat perceived by the WLCG management is that of falling back from reliable service mode into “fire-fighting” at the first sign of serious problems.
However, a consistent message is being given that experiments, sites and WLCG are ‘more or less’ ready for the expected 2008 data taking, although constant attention will be needed at all levels.

Site Reports (1/2)
CNAF:
  10 July: submitted a post-mortem on the recent power and network switch problems. Full services reported running by 19 July.
BNL:
  7 July: the primary link to TRIUMF failed due to an outage in the Seattle area, and the failover to the secondary link via the CERN OPN did not come up. Worked around by turning off the primary interface at BNL or TRIUMF, but a proper solution is still being worked on.
  9 July: a storage server network connection failure took some time to solve and required changing various components. It left some ATLAS files inaccessible.
  14 July: the inaccessible file problem was understood and put down to a problem introduced by dCache patch level 8. Files which for some reason failed to transfer out of BNL were pinned by dCache. An SRM transfer first tries to pin files and gives up when it cannot. Other access methods work. The workaround is to periodically look for such pinned files and unpin them. There is no long-term solution yet. Sites have been alerted, but the problem is probably now being seen at IN2P3 after its P8 upgrade.
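The BNL workaround (periodically look for pinned files whose transfer failed and unpin them) can be sketched roughly as below. This is only an illustration of the sweep logic, not the script used at BNL: the `PinRecord` type, the `max_age` threshold and the `unpin` callback are all hypothetical, and no real dCache or SRM call is shown.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Iterable, List


@dataclass
class PinRecord:
    """Hypothetical description of one pinned file (not a dCache type)."""
    path: str
    pinned_at: datetime
    transfer_failed: bool


def find_stale_pins(pins: Iterable[PinRecord], now: datetime,
                    max_age: timedelta) -> List[PinRecord]:
    """Select pins left behind by failed transfers and held for at least max_age.

    The age threshold is an assumption added here so that pins belonging to
    transfers still in progress are left alone.
    """
    return [p for p in pins
            if p.transfer_failed and now - p.pinned_at >= max_age]


def sweep(pins: Iterable[PinRecord], now: datetime, max_age: timedelta,
          unpin: Callable[[str], None]) -> List[str]:
    """Unpin every stale file via the supplied callback; return the paths released."""
    released = []
    for record in find_stale_pins(pins, now, max_age):
        unpin(record.path)  # site-specific action, supplied by the caller
        released.append(record.path)
    return released


if __name__ == "__main__":
    now = datetime(2008, 7, 14, 12, 0)
    example = [
        PinRecord("/example/stuck-file", now - timedelta(hours=5), True),
        PinRecord("/example/active-file", now - timedelta(minutes=5), True),
    ]
    print(sweep(example, now, timedelta(hours=1), unpin=lambda path: None))
```

Run from a scheduler at a fixed interval, a sweep like this would keep SRM transfers from giving up on files that an earlier failed transfer left pinned.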

Site Reports (2/2)
FZK: 19 July at about 19:20 a major network router failed. Almost all services were affected. Some services were up again on Sunday but some are still degraded or unavailable (as of 13:00 Monday). In particular, some dCache pool nodes are not yet available. We are working on it and a post-mortem analysis will follow.
General: 17 July GGUS conducted the first service verification of the Tier 1 site operator alarm ticket procedure. Failures of the procedure at NDGF and CERN are understood and being fixed.

Experiment reports (1/3)
LHCb: DC06 simulation is running smoothly under DIRAC3, but reconstruction and stripping tests are still ongoing, so there is no official date yet for the start of DC06.
ALICE: Production was hit by myproxy problems; see PM at
Working on integration of CREAM-CE with AliEn.

Experiment reports (2/3)
CMS: The CRUZET 3 cosmics run took place from 7 to 14 July. Quite a good experience, more mature in terms of data handling in general. Reconstruction submissions to all Tier 1 sites are ongoing. CMS are preparing for the next global cosmic exercise in the second half of August, but expect cosmics data tests weekly on Wednesdays and Thursdays.
Work has been finalized on the P5->CERN transfer system; a repacker replay has been running since 17 July, redoing the repack for the CRUZET-3 data.
Plans: Next Monday CMS will start more replays with some T0 real prompt reco testing. CMS expect a centrally triggered big transfer load of many CSA07 MC datasets to CMS T2s, as a needed step to complete the migration of user analysis to T2 sites. Each T2 should expect to be asked to host a fraction of ~30 TB of those datasets.
CMS have a CASTOR directory of 2.3 * 10**6 files of 160 KB each, which are webcam dumps and have gone to tape. They are looking at deleting them and stopping fresh ones.
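As a rough check on the scale of the webcam dumps, the quoted file count and file size can be multiplied out. The sketch below assumes every file is exactly 160 KB and uses decimal units (1 GB = 10**6 KB); neither assumption is stated in the report, and the function name is made up for illustration.

```python
def total_size_gb(n_files: int, file_size_kb: int) -> float:
    """Total volume in GB, assuming decimal units (1 GB = 10**6 KB)."""
    return n_files * file_size_kb / 1_000_000


# Figures quoted in the report: 2.3 * 10**6 files of 160 KB.
webcam_dump_gb = total_size_gb(2_300_000, 160)
print(f"{webcam_dump_gb:.0f} GB")
```

Under these assumptions the directory holds a few hundred GB in total, so the main cost is probably the number of small files written to tape more than the data volume.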

Experiment reports (3/3)
ATLAS: CERN CASTORATLAS upgraded to on 14 July to avoid a fatal data size overflow problem.
ATLAS taking cosmics with test triggers resulted in some very large datasets being (successfully) distributed to BNL.
ATLAS are now running cosmics at weekends. On 20 July the ATLAS CERN site services got stuck, but the resulting T0 to T1 catch-up when the services were restarted on Monday morning reached an impressive 2.5 GB/sec. Clearly more process monitoring alarms are needed.
ATLAS workflow management bookkeeping needs process-level access to their elog instance (via an elog API call), and this is about to be made available after a security analysis.

Summary
Solid progress on many fronts