Northgrid Status Alessandra Forti Gridpp21 Swansea 4 September 2008.



Layout
General status
General news
Site news
VOMS and sysadmin repos
Conclusions

General Status (1)

Site        Middleware  OS   SRM 2.2 / Space Tokens  SRM brand      Average availability
Lancaster   gLite 3.1   SL4  yes                     DPM            90%
Liverpool   gLite 3.1   SL4  yes                     dCache -> DPM  91%
Manchester  gLite 3.1   SL4  yes                     dCache/DPM     93%
Sheffield   gLite 3.1   SL4  yes                     DPM            96%

[Columns CPU (kSI2K), Storage (TB) and Used Storage (TB): values not present in the transcript]

General Status (2)

General Status (3)

General news
Manpower changes:
  – Liverpool: the GridPP post will start in the last week of September. The 0.75 FTE for 3 years has been converted to 1 FTE for 2 years.
  – Manchester: the EGEE deputy coordinator will start on 1 November.
Technical Board meetings:
  – Frequency increased from one per quarter to one per month.
NorthGrid and ATLAS:
  – It seems to be the only UK region supplying people for ATLAS shifts.
  – Good level of ATLAS production.
NorthGrid VO used by local groups in Manchester.

Lancaster news
Not much to report (it seems!)
All the data have been moved from dCache to DPM and dCache has been decommissioned.
  – There wasn't much to move.
There have been a few problems with power cuts.
A new cluster with 126 job slots and 100 TB of storage is on the way.
  – There have been some delays.
  – The old cluster will remain. Setting up two CEs.
Recently had accounting problems caused by a Tomcat update.
  – Needed to be removed and reinstalled.
Most of the errors reported for Lancaster in the monitoring pages are due to external sources.
  – They should be flagged directly in the monitoring system.

Liverpool news
dCache grievances:
  – Ease of dCache maintenance is a big issue; the initial installation was painful and every single update we've done since has broken something. dCache is just way too complicated for what we need from an SE and we don't have the time or manpower to justify it.
Moving from dCache to DPM:
  – A test DPM instance has already been installed; completing the move is waiting on the arrival of the new hardware.
  – 54 TB should be added in the near future.
Working to use the University cluster.
Minimum availability was 83%, due to the gLite/dCache upgrade, network configuration problems and the university DNS server.
Also had some problems with SAM tests due to the university firewall.
  – It is difficult to remove a service from the SAM tests once it has been entered in the GOCDB. The procedure is contorted.

Manchester news
dCache upgrade grievances:
  – The resilience manager no longer started.
  – The maximum number of jobs before it started to time out was only 200.
Problems were eventually resolved thanks to some serious digging by the developers, who got direct access to the system.
  – It turned out that a static parameter hadn't been changed in the configuration files for the resilience manager.
Resilience is incompatible with space tokens anyway.
DPM instance with 6 TB installed for ATLAS production.
  – Eventually new storage will be added to DPM.
DPM will be dedicated to ATLAS.
  – dCache on the WNs for all the other VOs that don't have as many requirements.
ATLAS split Manchester into two sites in their configuration.
  – This massively improved the efficiency in production.
Minimum availability was 79%, due to the dCache upgrade and collateral problems.

Sheffield news
Problems with the university DNS.
Bought new hardware for the services (SE, CE and Mon box).
  – Spent July tuning them, and this has affected the availability (still 90%).
Already increased storage space to 13 TB.
  – This is online.
A further 16 TB are on the way.
  – The hardware is there but the fans are missing.
CPU increased from 182 to 300 kSI2k.
Very good productivity for ATLAS.
Availability was never below 90% in the past 6 months.

VOMS and Repos
VOMS:
  – The skipcacheck option has been enabled on the GridPP VOMS.
  – This should avoid future problems for users at CA rollover.
Sergey is also testing a new VOMS version.
New YUM repository has been enabled on the (other face of EGEE-SA1) www.gridpp.ac.uk
  – EGEE-SA1 is now distributing system monitoring and management tools (following on from the WLCG monitoring WG work with Nagios). There is a single repository for this (monitoring clients + servers, messaging clients + servers). This will eventually also include user-donated system management tools (e.g. FTSMon, WMSMon) that are approved by the EGEE Operations Automation Team.
Manchester people are also using the UKI-NORTHGRID-MAN-HEP svn repository.

Conclusions
Storage looking good:
  – All the sites have SRM 2.2 and space tokens enabled.
  – dCache has been relegated to a lesser role (or completely eliminated), which should increase stability.
  – All sites are bidding for additional storage or have already bought it.
  – Manchester's numerous problems with dCache and with the ATLAS way of representing it have been solved.
The sites are really active in ATLAS and the level of productivity is high.
Just in time.

Additional slide

Region     March     April     May       June      July      August    Total       Percent
LondonT2   374,992   188,528   247,166   687,534   709,447   842,656   3,050,…     …%
NorthGrid  682,418   968,329   602,501   552,569   547,586   426,727   3,780,…     …%
ScotGrid   201,081   215,512    84,543   501,927   228,452   353,552   1,585,…     …%
SouthGrid  654,322   583,317   414,119   404,081   330,235   477,366   2,863,…     …%
Tier1A     447,145   585,081   571,793   891,354   228,291   113,531   2,837,…     …%
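
The total and percent columns are cut off in the transcript, but they can be recomputed from the monthly figures. The sketch below is a minimal illustration, not part of the original slides: the function name `totals_and_shares` is made up, the units are not stated on the slide, and the percentage is assumed to be each region's share of the sum over the five rows shown (the original slide may have used a different denominator).

```python
# Monthly figures (March to August) copied from the table above.
# Units are not stated on the slide.
usage = {
    "LondonT2":  [374_992, 188_528, 247_166, 687_534, 709_447, 842_656],
    "NorthGrid": [682_418, 968_329, 602_501, 552_569, 547_586, 426_727],
    "ScotGrid":  [201_081, 215_512, 84_543, 501_927, 228_452, 353_552],
    "SouthGrid": [654_322, 583_317, 414_119, 404_081, 330_235, 477_366],
    "Tier1A":    [447_145, 585_081, 571_793, 891_354, 228_291, 113_531],
}


def totals_and_shares(usage):
    """Return per-region six-month totals and percentage shares.

    Assumption: the share is taken relative to the sum of the rows
    passed in, which may differ from the denominator used on the slide.
    """
    totals = {region: sum(months) for region, months in usage.items()}
    grand_total = sum(totals.values())
    shares = {region: 100.0 * t / grand_total for region, t in totals.items()}
    return totals, shares


totals, shares = totals_and_shares(usage)
for region in usage:
    print(f"{region:<10} {totals[region]:>10,} {shares[region]:6.1f}%")
```

The recomputed totals begin with the same leading digits as the truncated totals in the table (for example 3,780,130 for NorthGrid), which suggests the monthly columns were extracted intact.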