Liverpool HEP - Site Report June 2008 Robert Fay, John Bland

Staff Status

One member of staff left in the past year: Paul Trepka, left March 2008
Two full-time HEP system administrators: John Bland, Robert Fay
One full-time Grid administrator currently being hired
*Closing date for applications was Friday 13th; 15 applications received
One part-time hardware technician: Dave Muskett

Current Hardware

Desktops
~100 desktops: Scientific Linux 4.3, Windows XP
Minimum spec of 2GHz x86, 1GB RAM + TFT monitor

Laptops
~60 laptops: mixed architectures, specs and OSes

Batch Farm
Software repository (0.7TB), storage (1.3TB)
Old batch queue has 10 SL3 dual 800MHz P3s with 1GB RAM
medium and short queues consist of 40 SL4 MAP-2 nodes (3GHz P4s)
5 interactive nodes (dual Xeon 2.4GHz)
Using Torque/PBS (illustrative queue sketch after this slide)
Used for general analysis jobs
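Not on the original slide: a minimal sketch of how short and medium Torque/PBS queues of the kind mentioned above might be defined with qmgr. The queue names follow the slide, but the walltime limits and default queue are illustrative assumptions, not the site's actual settings.

    # Illustrative Torque/PBS queue setup (limits are assumed, not Liverpool's real values)
    qmgr -c "create queue short queue_type=execution"
    qmgr -c "set queue short resources_max.walltime = 01:00:00"
    qmgr -c "set queue short enabled = true"
    qmgr -c "set queue short started = true"
    qmgr -c "create queue medium queue_type=execution"
    qmgr -c "set queue medium resources_max.walltime = 24:00:00"
    qmgr -c "set queue medium enabled = true"
    qmgr -c "set queue medium started = true"
    qmgr -c "set server default_queue = medium"

Users would then submit with, for example, qsub -q short analysis.sh.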

Current Hardware – continued

Matrix
1 dual 2.40GHz Xeon, 1GB RAM
6TB RAID array
Used for CDF batch analysis and data storage

HEP Servers
*4 core servers
User file store + bulk storage via NFS (Samba front end for Windows; illustrative config after this slide)
Web (Apache), mail (Sendmail) and database (MySQL) servers
User authentication via NIS (+Samba for Windows)
Dual Xeon 2.40GHz shell server and ssh server
Core servers have a failover spare
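Not on the original slide: a minimal sketch of the Samba front end described above, re-exporting the NFS-served user file store to Windows desktops with the same Unix/NIS accounts; the workgroup, share name, path and group are illustrative assumptions only.

    # Illustrative /etc/samba/smb.conf fragment (names and paths are assumed)
    [global]
        workgroup = HEP
        # accounts mirror the NIS/Unix users
        security = user
    [hepusers]
        comment = HEP user file store (NFS to Linux, Samba to Windows)
        path = /hepstore/users
        read only = no
        valid users = @hep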

Current Hardware – continued

LCG Servers
CE, SE upgraded to new hardware:
CE now 8-core Xeon 2GHz, 8GB RAM
SE now 4-core Xeon 2.33GHz, 8GB RAM, RAID 10 array
CE, SE, UI all SL4, gLite 3.1
MON still SL3, gLite 3.0
BDII SL4, gLite 3.0

Current Hardware – continued

MAP2 Cluster
24-rack (960-node) Dell PowerEdge 650 cluster
4 racks (280 nodes) shared with other departments
Each node has a 3GHz P4, 1GB RAM, 120GB local storage
19 racks (680 nodes) primarily for LCG jobs
(5 racks currently allocated for local ATLAS/T2K/Cockcroft batch processing)
1 rack (40 nodes) for general-purpose local batch processing
Front-end machines for ATLAS, T2K, Cockcroft
Each rack has two 24-port gigabit switches
All racks connected into VLANs via Force10 managed switch

Storage

RAID
All file stores are using at least RAID5; newer servers use RAID6.
All RAID arrays use 3ware 7xxx/9xxx controllers on Scientific Linux 4.3.
Arrays monitored with 3ware 3DM2 software.

File stores
New user and critical software store, RAID6+HS, 2.25TB
~10TB of general-purpose hepstores for bulk storage
1.4TB + 0.7TB batchstore + batchsoft for the batch farm cluster
1.4TB hepdata for backups
37TB RAID6 for LCG storage element

Storage (continued)

3ware Problems! (a minimal syslog check for these messages is sketched after this slide)
3w-9xxx: scsi0: WARNING: (0x06:0x0037): Character ioctl (0x108) timed out, resetting card.
3w-9xxx: scsi0: ERROR: (0x06:0x001F): Microcontroller not ready during reset sequence.
3w-9xxx: scsi0: AEN: ERROR: (0x04:0x005F): Cache synchronization failed; some data lost:unit=0.
Leads to total loss of data access until the system is rebooted.
Sometimes leads to data corruption at array level.
Seen under iozone load, under normal production load, and on drive failure.
Anyone else seen this?
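Not on the original slide: a minimal sketch of a cron-run script that scans the syslog for these 3ware driver messages so they are noticed before access to the array is lost; the log path, pattern and exit codes are assumptions for illustration.

    #!/usr/bin/env python
    # check_3ware_log.py - sketch: scan syslog for 3ware driver warnings/errors.
    # Assumes SL-style logging to /var/log/messages; run from cron so output is mailed.
    import re
    import sys

    LOGFILE = "/var/log/messages"   # assumed location
    PATTERN = re.compile(r"3w-(9xxx|xxxx).*(WARNING|ERROR|AEN)")

    def main():
        try:
            log = open(LOGFILE)
        except IOError, err:        # Python 2.x as shipped with SL3/SL4
            print >> sys.stderr, "cannot read %s: %s" % (LOGFILE, err)
            return 1
        hits = [line.rstrip() for line in log if PATTERN.search(line)]
        log.close()
        if hits:
            print "3ware controller messages found:"
            for line in hits:
                print line
            return 2                # non-zero so a wrapper or Nagios can raise an alert
        return 0

    if __name__ == "__main__":
        sys.exit(main())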

Network Topology

[Diagram: Force10 gigabit switch at the core, linking the WAN (via the firewall), the LCG servers, MAP2, offices and servers; 2Gb VLAN and 1Gb links]

Network (continued)

Core Force10 E600 managed switch
Now have 450 gigabit ports (240 at line rate)
Used as central departmental switch, using VLANs
Increased bandwidth to WAN using link aggregation, to 2-3Gbit/s (illustrative bonding sketch after this slide)
Increased departmental backbone to 2Gbit/s
Added departmental firewall/gateway
Network intrusion monitoring with snort
Most office PCs and laptops are on an internal private network
Building network infrastructure is creaking - needs rewiring; old cheap hubs and switches need replacing
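Not on the original slide: link aggregation of this kind is usually configured on the switch, but where a Linux server needs a 2Gbit/s trunk the standard bonding driver in 802.3ad (LACP) mode can be used. The SL4-style configuration below is an illustrative sketch; the interface names and addresses are assumptions, and the real aggregation at Liverpool may well live entirely on the Force10.

    # /etc/modprobe.conf (load the bonding driver in 802.3ad mode)
    alias bond0 bonding
    options bond0 mode=802.3ad miimon=100

    # /etc/sysconfig/network-scripts/ifcfg-bond0 (address is an assumed example)
    DEVICE=bond0
    BOOTPROTO=none
    ONBOOT=yes
    IPADDR=192.168.1.10
    NETMASK=255.255.255.0

    # /etc/sysconfig/network-scripts/ifcfg-eth0 (repeat for eth1)
    DEVICE=eth0
    BOOTPROTO=none
    ONBOOT=yes
    MASTER=bond0
    SLAVE=yes

The matching switch ports then need to be placed in an LACP port channel.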

Security & Monitoring

Security
Logwatch (looking to develop filters to reduce noise)
University firewall + local firewall + network monitoring (snort)
Secure server room with swipe-card access

Monitoring
Core network traffic usage monitored with ntop and cacti (all traffic to be monitored after network upgrade)
Use sysstat on core servers for recording system statistics
Rolling out system monitoring on all servers and worker nodes, using SNMP, Ganglia, Cacti and Nagios (illustrative Nagios check after this slide)
Hardware temperature monitors on water-cooled racks, to be supplemented by software monitoring on nodes via SNMP
Still investigating other environment monitoring solutions
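Not on the original slide: a rough illustration of an SNMP-based Nagios check of the kind being rolled out, reading a worker node's 1-minute load average with the stock check_snmp plugin; the host name, community string, thresholds and the generic-service template are assumptions for illustration.

    # Illustrative Nagios object configuration (host, community and thresholds are assumed)
    define command{
        command_name    check_snmp_load1
        # UCD-SNMP laLoadInt.1 returns the 1-minute load average x100
        command_line    $USER1$/check_snmp -H $HOSTADDRESS$ -C public -o .1.3.6.1.4.1.2021.10.1.5.1 -w 800 -c 1600
        }

    define service{
        use                     generic-service
        host_name               map2-node001
        service_description     Load (SNMP)
        check_command           check_snmp_load1
        }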

System Management

Puppet used for configuration management (illustrative manifest after this slide)
dotProject used for general helpdesk
RT integrated with Nagios for system management
-Nagios automatically creates/updates tickets on acknowledgement
-Each RT ticket serves as a record for an individual system
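Not on the original slide: a minimal sketch of the sort of Puppet manifest used for configuration management, keeping a package, its config file and its service consistent across nodes; the class name, file source and paths are illustrative assumptions, not Liverpool's actual manifests.

    # Illustrative Puppet class (names and paths are assumed)
    class ntp {
        package { "ntp":
            ensure => installed,
        }
        file { "/etc/ntp.conf":
            # fileserver path depends on Puppet version and module layout
            source  => "puppet:///modules/ntp/ntp.conf",
            require => Package["ntp"],
        }
        service { "ntpd":
            ensure    => running,
            enable    => true,
            subscribe => File["/etc/ntp.conf"],
        }
    }

Nodes then simply include the class, e.g. node "map2-node001" { include ntp } in site.pp.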

Plans

Additional storage for the Grid
GridPP3 funded
Will be approx. 60? TB
May switch from dCache to DPM

Upgrades to local batch farm
Plans to purchase several multi-core (most likely 8-core) nodes

Collaboration with local Computing Services Department
Share of their newly commissioned multi-core cluster available