TCD Site Report
Stuart Kenny*, Stephen Childs, Brian Coghlan, Geoff Quigley

TCD
Two roles:
  –UKI grid site
  –Grid-Ireland operations centre
18 sites centrally managed by operations team (8 members, soon to be 7)
Responsible for TCD site and Grid-Ireland central services
Quattor deployed and managed
  –Extensive use of Xen VMs

Hardware
Dell 2950 gateway host [16GB DRAM + 6TB RAID6]
  –Xen host (CE, UI, R-GMA MON, test WNs)
Dell 2950 SE host [16GB DRAM + 6TB RAID6]
96 x Dell 1950 WNs [16GB DRAM + 500GB]
  –50 x U/G lab Condor pool WNs
8 x Dell 2950 central server hosts [16GB DRAM + 16TB RAID6]
  –host01: webserver + rt
  –host02: repository
  –host03: VOMS, myproxy, gLite WMS
  –host04: BDII, R-GMA, WMS
  –host05: monitoring server, oracle server
  –host06: portal servers
  –host07: datamgt servers
  –host08: alternate middleware
8 x Dell 2950 redundant central server hosts [16GB DRAM + 16TB RAID6]
1 GbE networking, with 3 x 10 GbE uplinks

Storage
TCD already had some
  –Dell PowerEdge 2950 (2xQuad Xeon)
  –Dell MD1000 (SAS - JBOD)
After procurement data store has total
  –8x Dell PE2950
  –30x MD1000, each with 15x 1TB disks
     ~11.6 TiB each after RAID6 and XFS format (~348 TiB in total)
  –Dell Blade Chassis with 8x M600 blades
  –Dell tape library (24x Ultrium 4 tapes)
  –HP ExDS9100 with 4 capacity blocks of 82x 1TB disks and 4 blades
     ~233 TiB total available for NFS/http export
Storage Workshop - Geoff Quigley Thurs 13:50

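The MD1000 capacity figures can be checked with a short calculation. The sketch below assumes that RAID6 costs two disks' worth of parity per 15-disk shelf and that disks are sold in decimal terabytes; the 11.6 TiB per-shelf figure after XFS formatting is taken from the slide, not derived.

```python
TB = 10**12   # decimal terabyte, as disks are sold
TiB = 2**40   # binary tebibyte, as the slide reports


def raid6_usable_tib(disks: int, disk_tb: float) -> float:
    """Usable capacity of one RAID6 array in TiB, before filesystem overhead."""
    data_disks = disks - 2  # RAID6 spends two disks' worth of space on parity
    return data_disks * disk_tb * TB / TiB


per_shelf_raw = raid6_usable_tib(15, 1.0)  # capacity before XFS formatting
per_shelf_quoted = 11.6                    # figure quoted on the slide, after XFS
total_quoted = 30 * per_shelf_quoted       # across all 30 MD1000 shelves

print(f"RAID6 per shelf, before XFS: {per_shelf_raw:.2f} TiB")
print(f"Total across 30 shelves:     {total_quoted:.0f} TiB")
```

The gap between the pre-format figure and the quoted 11.6 TiB is consistent with a small filesystem overhead, and 30 shelves at 11.6 TiB gives the quoted ~348 TiB.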
Infrastructure
Room needed upgrade
  –Another cooler
  –UPS maxed out
New high-current AC circuits added
2x 3kVA UPS per rack acquired for Dell equipment
ExDS has 4x 16A 3Ø - 2 on room UPS, 2 raw
10 GbE to move data!
Storage Workshop - Geoff Quigley Thurs 13:50

Redundant Operations Centre
Aim is to keep up-to-date replicas of core server VMs to allow failover in case of network or hardware failures
Design decisions
  –Replicate storage “underneath” Xen VMs
  –Replicate at block level: avoid need for service-specific replication policies
  –Manual failover initially

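The appeal of block-level replication is that the replicator never needs to know what service lives inside the VM. The slides do not name the replication tool, so the following is only a toy illustration on file-like objects, with a hypothetical `sync_blocks` helper and an arbitrary block size; a production setup would replicate at the block-device layer rather than in Python.

```python
import io

BLOCK_SIZE = 4096  # illustrative block size, not taken from the slides


def sync_blocks(primary, replica, block_size=BLOCK_SIZE):
    """Copy only the blocks that differ from primary to replica.

    Both arguments are seekable binary file-like objects (for example,
    VM disk images). Returns the number of blocks that were rewritten.
    """
    copied = 0
    offset = 0
    primary.seek(0)
    while True:
        block = primary.read(block_size)
        if not block:
            break
        replica.seek(offset)
        if replica.read(len(block)) != block:
            replica.seek(offset)
            replica.write(block)
            copied += 1
        offset += len(block)
    replica.truncate(offset)  # drop any stale tail on the replica
    return copied


# Two in-memory "disk images": the second block differs, the third is missing.
primary_image = io.BytesIO(b"A" * 4096 + b"B" * 4096 + b"C" * 100)
replica_image = io.BytesIO(b"A" * 4096 + b"X" * 4096)
blocks_copied = sync_blocks(primary_image, replica_image)
print(f"blocks rewritten: {blocks_copied}")
```

Because the comparison works on raw blocks, the same loop would serve a VOMS server or a web server image equally, which is the point of avoiding service-specific replication policies.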
Monitoring
A lot of work recently on monitoring configuration
  –Want to configure as much as possible from common Quattor templates
Nagios
  –Submitting local WLCG grid probes for G-I VOs
Lemon
Ganglia
Also used
  –Weathermap
  –Cacti
  –ASI (Security Day talk)
  –…

Grid-Ireland Setup
[Diagram. Labels: Monitoring server (EGEE SAM, GI SAM, Quattor templates); Site admins (Get site status, Issue alarms); TCD Site Nagios; NSCA; Lemon Agent; Lemon Host Check; Nagios NRPE; gridui; GI Sites]

Lemon-Nagios Integration
Lemon service added
Additional lemon metrics added to hosts
Cron executes lemon-host-check
  –Output sent to nagios via nsca
Exception results in Lemon service failure

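The cron-to-NSCA step can be sketched as follows. The tab-separated line is the default input format read by the `send_nsca` client; the exit-status convention of lemon-host-check, the service name "Lemon", the host name and the config file path are all assumptions made for illustration, not details taken from the slides.

```python
import subprocess

# Standard Nagios plugin return codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def result_from_check(exit_status: int, message: str) -> tuple[int, str]:
    """Map an assumed lemon-host-check exit status to a Nagios state.

    Assumption: a non-zero status means a Lemon exception was raised.
    """
    if exit_status == 0:
        return OK, message or "no exceptions"
    return CRITICAL, message or "lemon exception raised"


def nsca_line(host: str, service: str, code: int, output: str) -> str:
    """Format one passive service check result for send_nsca's stdin."""
    # Tabs and newlines delimit fields and records, so flatten the message.
    clean = " ".join(output.split())
    return f"{host}\t{service}\t{code}\t{clean}\n"


def submit(line: str, nagios_host: str,
           config: str = "/etc/nagios/send_nsca.cfg") -> None:
    """Pipe a result line to send_nsca (needs the NSCA client installed).

    The config path is a placeholder; adjust it for the local installation.
    """
    subprocess.run(["send_nsca", "-H", nagios_host, "-c", config],
                   input=line, text=True, check=True)


# Example: a failed host check becomes a CRITICAL result for the Lemon service.
code, msg = result_from_check(1, "exception:\tload too high")
line = nsca_line("wn001.example.org", "Lemon", code, msg)
print(repr(line))
```

Calling `submit(line, "nagios.example.org")` from the cron job would then deliver the result to the Nagios server as a passive check.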
Monitoring - Weathermap
Active Security
Existing Grid security activities focused on prevention
  –Authentication, authorization
Active security focused on
  –Detection
  –Reaction
3 components
  –Security monitoring
  –Alert Analysis
  –Control Engine
Security Day - Stuart Kenny Wed 10:15

Active Security - Report