Published by Monica Simpson. Modified over 8 years ago.
1
RALPP Site Report
HEP Sys Man, 11th May 2012
Rob Harper
2
My talk will be...
Where we’re at now
Our new stuff, including
– GridPP purchases
– DRI networking kit
Benchmarking and hyperthreads
Virtual machine infrastructure
Managing configuration and stuff: cfEngine vs Puppet
Future stuff
3
RALPP For Dummies
Part of SouthGrid
Staff
– Chris Brew (part)
– Rob Harper (part)
One cluster serving Tier 2 (85%) and Tier 3 (15%), managed by Torque/Maui
dCache storage
4
RALPP CPU
5
Cluster is currently nominally:
2,872 job slots
26,409 HS06
Where available, hyperthreads are used to provide job slots equal to 150% of physical cores
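As a rough illustration of the 150% rule, the sketch below derives a node's job slots from its physical core count and works out the average HS06 per slot from the cluster totals on this slide. The function names are hypothetical, and rounding down for odd core counts is an assumption rather than anything stated in the talk.

```python
import math


def job_slots(physical_cores: int, hyperthreaded: bool, factor: float = 1.5) -> int:
    """Job slots for one node: factor * physical cores when hyperthreads are
    available, otherwise one slot per physical core.
    Rounding down is an assumption, not something stated in the talk."""
    if hyperthreaded:
        return math.floor(physical_cores * factor)
    return physical_cores


def hs06_per_slot(total_hs06: float, total_slots: int) -> float:
    """Average HS06 per job slot."""
    return total_hs06 / total_slots


# Cluster totals quoted on the slide.
cluster_average = hs06_per_slot(26409, 2872)
print(f"{cluster_average:.2f} HS06 per job slot")
```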
6
RALPP Storage TB
7
RALPP Storage
1,060 TB in production
Soon to be 1,260 TB
8
New Stuff: GridPP Purchases
CPU:
– 9 * Viglen/Supermicro Twin²
  Intel E5645 based
  48 GB / node
  Using hyperthreads => 648 job slots, 6208 HS06
Disk:
– 5 * Viglen/Supermicro 24 bay storage nodes
  => 200 TB of disk pool
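The purchase figures can be cross-checked with a little arithmetic. The sketch below assumes four nodes per Twin² chassis and two six-core E5645 CPUs per node; neither number is stated on the slide, but together with the 1.5x hyperthread factor they reproduce the quoted 648 job slots. The storage check uses the 1,060 TB in production quoted earlier in the talk.

```python
# Assumptions (not stated on the slide): chassis and socket layout.
CHASSIS = 9
NODES_PER_CHASSIS = 4      # assumption: four nodes per Twin² chassis
SOCKETS_PER_NODE = 2       # assumption: dual-socket nodes
CORES_PER_SOCKET = 6       # Intel E5645 is a six-core part
HT_FACTOR = 1.5            # job slots per physical core, from the talk

nodes = CHASSIS * NODES_PER_CHASSIS
physical_cores = nodes * SOCKETS_PER_NODE * CORES_PER_SOCKET
new_job_slots = int(physical_cores * HT_FACTOR)

# Figures quoted on the slides.
NEW_HS06 = 6208
RAM_PER_NODE_GB = 48
NEW_DISK_TB = 200
STORAGE_NODES = 5
CURRENT_STORAGE_TB = 1060

hs06_per_new_slot = NEW_HS06 / new_job_slots
ram_per_slot_gb = RAM_PER_NODE_GB / (new_job_slots / nodes)
disk_per_storage_node_tb = NEW_DISK_TB / STORAGE_NODES
storage_after_tb = CURRENT_STORAGE_TB + NEW_DISK_TB

print(f"{nodes} nodes, {physical_cores} physical cores, {new_job_slots} job slots")
print(f"{hs06_per_new_slot:.2f} HS06 per slot, {ram_per_slot_gb:.2f} GB RAM per slot")
print(f"{disk_per_storage_node_tb:.0f} TB per storage node, {storage_after_tb} TB total")
```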
9
New Stuff: Networking
DRI money bought us:
– 5 * Force10 S4810 switches
– A heap of 10Gb NICs for older disk pool nodes
– A heap of 10Gb cables
Coming soon: a much reconfigured network...
10
New Network Layout
11
Benchmarking & Hyperthreads
We ran the HS06 benchmark on a heap of nodes with varying numbers of concurrent benchmark jobs
Going past the number of physical cores did give us some gains
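One way to read such a sweep is to express the total HS06 at each level of concurrency relative to the score at one job per physical core. The sketch below is a hypothetical helper, and the numbers passed to it are placeholders chosen only to exercise the function; they are not RALPP measurements.

```python
from typing import Dict


def relative_gain(total_hs06: Dict[int, float], physical_cores: int) -> Dict[int, float]:
    """Map each concurrency level to its total HS06 divided by the total
    measured when running exactly one benchmark job per physical core."""
    baseline = total_hs06[physical_cores]
    return {jobs: score / baseline for jobs, score in sorted(total_hs06.items())}


# Placeholder values only, NOT measurements from the talk.
placeholder_scores = {12: 100.0, 18: 120.0, 24: 125.0}
gains = relative_gain(placeholder_scores, physical_cores=12)

for jobs, gain in gains.items():
    print(f"{jobs} concurrent jobs: {gain:.2f}x the one-job-per-core score")
```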
12
Benchmarking & Hyperthreads
So we committed 1.5 * physical cores as job slots for some nodes and ran real jobs
No significant drop in efficiency
More work done
Many details on SouthGrid blog at http://bit.ly/Iu7BfS
13
Virtual Machines
Current set-up:
– Xen VMs spread between a couple of servers
– Local storage, nothing clever
Currently in test:
– Cluster running Hyper-V
  Yes, we’ll be running Linux VMs on Windows
– EqualLogic storage
  iSCSI
  Mirroring, etc.
14
Configuration Management
Already much discussed yesterday, but here’s our perspective...
We currently rely on cfEngine v2
This is not supported natively on SL6 (or at all)
Main options seem to be:
– Crowbar in legacy cfEngine
– cfEngine v3 – will need configs rewritten
– Switch to Puppet – will need configs rewritten
15
Puppet
Puppet seems to be a strong choice
Particularly as other Tier 2s are coming to the same decision
We have not got far yet
We have a working Puppet Master with some basic manifests set up
We have an SL6 client for test purposes
Planning to use Puppet for SL6 hosts as we set them up – leaving SL5 kit on cfEngine
16
Puppet
Our cfEngine config relies massively on EditFiles functionality
Puppet does not have this
– Can run scripts to do edits
– Can use modules (e.g. iptables) that do the work for you
We need to learn to think in a different way to take advantage of Puppet
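If edits are done by scripts, each script needs to be safe to run repeatedly, since a configuration management agent will run it on every pass. The sketch below shows one such idempotent edit: it appends a line to a file only when that exact line is missing. The helper name, the file and the example line are all hypothetical; this is not how RALPP's configuration works, only an illustration of the idea.

```python
import tempfile
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Append `line` to `path` unless an identical line is already present.
    Returns True if the file was changed, False if it was left alone."""
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True


# Demonstration on a throwaway file (hypothetical content).
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "example.conf"
    first_run = ensure_line(target, "option = value")
    second_run = ensure_line(target, "option = value")
    final_content = target.read_text()

print(first_run, second_run)
```

The second call changes nothing, which is the property that makes the script safe to trigger on every agent run.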
17
Things to come...
Getting network configuration updated
Start deploying VMs in Hyper-V
Getting Puppet configuration management running properly
Start using SL6 as a standard install for services where we have no reason not to
Improved monitoring