
1 Winnie Lacesso Bristol Site Report June 2009

2 Staff & Users
Departmental Physics / Networks: JP Melot, Neil Laws (Microsoft); Rhys Morris (Astrophysics support)
Particle Physics: Winnie Lacesso, Rhys Morris (0.2)
About 40 PP staff & students
Desktops: fewer than 10 (Linux, MS Windows, 2 x iMac)
Laptops: about 40, mainly Mac (~16), Windows XP (~15), Linux (SL4/5, Fedora Core)
Staff changes:
Yves Coppens (SouthGrid Technical Support) has left
Jon Wakelin (0.5 Particle Physics support: GPFS, StoRM) has left
Dr Bob Cregan has joined as HPC Storage Admin and will help with StoRM & GPFS

3 Servers
About 10 non-LCG servers (down from 20): consolidated/retired 10 in 1 year!
Win2003: file serving (480 GB); considering a Unix/Samba replacement
Win2K AFS (IBM Transarc 3.6) (230 GB): a Unix replacement server is ready, but there has been no time to migrate, and the Win2K server keeps working...
Most servers run SL4/5: NFS (1, down from 5), PBS batch (3), compute (~3), Subversion/elog, MediaWiki, infrastructure (web, DHCP, Kickstart)

4 UBristol HPC: PP usage
Job slots on the SL4 HPC cluster (2 GB RAM/core): up from 30 to as many as 90
Not yet using the SL5 HPC cluster (only 1 GB RAM/core)
Jon W was instrumental in getting the CE & SE up and running!

5 RAID Grief, SCSI Aggro
The DPM has 2 RAID arrays attached. The 16-bay array slid into a broken/faulty state after commissioning and 2 years of work: months of grief and debugging.
    Aug 5 10:30:17 lcgse01 kernel: SCSI error : return code = 0x10000
    Aug 5 10:30:17 lcgse01 kernel: end_request: I/O error, dev sdf, sector 787223
    Aug 5 10:30:17 lcgse01 kernel: Buffer I/O error on device sdf1, logical block 98395
    Aug 5 10:30:17 lcgse01 kernel: lost page write due to I/O error on sdf1
    Aug 5 10:30:37 lcgse01 kernel: scsi1:0:2:0: Attempting to abort cmd ebdd0e00: 0x28 0x0 0x89 0xbf
    Aug 5 10:30:37 lcgse01 kernel: scsi1: At time of recovery, card was not paused
    Aug 5 10:30:37 lcgse01 kernel: >>>>>>>>>>>>>>>>>> Dump Card State Begins<<<<<<<<<<<<<<<<<
    Aug 5 10:30:37 lcgse01 kernel: scsi1: Dumping Card State at program address 0x26 Mode 0x33
    Aug 5 10:30:37 lcgse01 kernel: Card was paused
Replaced the SCSI controller (swapped Adaptec for LSI): no difference.
The vendor agreed and sent a replacement in Dec 2008; installed Jan 2009.
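When an array degrades like this, it helps to see which block device the kernel keeps complaining about. Below is a minimal sketch, not the tool used at Bristol, that counts the device-naming I/O error messages in kernel log lines like the ones quoted above. It assumes the syslog layout shown on the slide and sdX-style device names; the function name count_io_errors is hypothetical. Partition names (sdf1) are folded into their disk (sdf), and the count is of log messages, not distinct hardware faults, since one failed write produces several messages.

```python
import re
from collections import Counter

# Assumption: kernel messages use the wording quoted on the slide and
# devices have sdX / sdXN style names (matched by \w+).
PATTERNS = (
    re.compile(r"end_request: I/O error, dev (\w+), sector \d+"),
    re.compile(r"Buffer I/O error on device (\w+), logical block \d+"),
    re.compile(r"lost page write due to I/O error on (\w+)"),
)


def count_io_errors(lines):
    """Return a Counter mapping disk name to the number of I/O error messages."""
    counts = Counter()
    for line in lines:
        for pattern in PATTERNS:
            match = pattern.search(line)
            if match:
                # Fold a partition (e.g. sdf1) into its whole disk (sdf).
                disk = re.sub(r"\d+$", "", match.group(1))
                counts[disk] += 1
                break  # each log line is one message; count it once
    return counts


if __name__ == "__main__":
    sample = [
        "Aug 5 10:30:17 lcgse01 kernel: SCSI error : return code = 0x10000",
        "Aug 5 10:30:17 lcgse01 kernel: end_request: I/O error, dev sdf, sector 787223",
        "Aug 5 10:30:17 lcgse01 kernel: Buffer I/O error on device sdf1, logical block 98395",
        "Aug 5 10:30:17 lcgse01 kernel: lost page write due to I/O error on sdf1",
    ]
    print(count_io_errors(sample))  # prints Counter({'sdf': 3})
```

In practice the lines would come from the kernel log file rather than a literal list; a disk whose count keeps climbing after a controller swap points at the array itself rather than the controller.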

6 Shoulder that Load!

7 StoRM SE, GPFS
New hardware for the HPC CE & StoRM SE, plus a GridFTP server & a new MON box (syslog, Nagios, etc.): X7DBU, Xeon E5405, 2 GB RAM/core
HPC CE working well except for GPFS timeouts, which cause patchy OPS SAM test failures
Problems with StoRM: GPFS multiclustering not yet working; RFIO permission problems (ACLs?). We thought Jon left it in working order, but apparently not...
New Storage Admin (Bob Cregan) will help get GPFS multiclustering working
Good performance on the new hardware!

8 Security
User laptops frequently go offsite (home, CERN, RAL), then come back and reconnect to the internal network. No (detected) incidents so far, even from users with root/admin access on their laptops.
One laptop lost: a student forgot a bag at a bus stop, and it was gone on return. Fortunately, the USB backup disk was kept in a different location. Moral of the story: carry the USB backup disk separately from the laptop.
Ongoing scary ssh-linux incident: no intrusions detected here so far

9 Issues
Upcoming/pending work:
Ongoing: new servers replacing old ones (servers waiting)
VMs will replace the existing web/svn/elog/wiki server, the existing SL3 MON, and probably others
Recent/ongoing problems:
UPS needs rearranging: some important servers are not on UPS
Workload has increased considerably since Yves & Jon left
A/C failure in May 2009: A/C being replaced (before it gets too hot, we hope)

