Farming
Andrea Chierici
CNAF Review 2015
Current situation
Computing resources
04/05/2015 CNAF Review 2015, Andrea Chierici
160K HS-06
Very stable, with only a few hardware failures on older nodes
The whole farm had to be rebooted twice since December (two critical kernel upgrades plus glibc)
– Smooth procedure, no service interruption
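A farm-wide reboot without service interruption is normally done as a rolling procedure: a slice of the worker nodes is drained, rebooted and reopened before the next slice is touched. The sketch below is a minimal illustration under that assumption, not CNAF's actual tooling; it only plans the batches, leaves out the drain/reopen steps themselves (in LSF, for instance, `badmin hclose` and `badmin hopen`), and the host names and the 25% limit are hypothetical.

```python
from typing import List


def plan_rolling_reboot(nodes: List[str], max_offline_fraction: float) -> List[List[str]]:
    """Split worker nodes into consecutive reboot batches.

    Each batch is drained (closed to new jobs), rebooted once its running jobs
    have finished, and reopened before the next batch starts, so at most
    max_offline_fraction of the farm is out of production at any time.
    """
    if not 0 < max_offline_fraction <= 1:
        raise ValueError("max_offline_fraction must be in (0, 1]")
    batch_size = max(1, int(len(nodes) * max_offline_fraction))
    return [nodes[i:i + batch_size] for i in range(0, len(nodes), batch_size)]


if __name__ == "__main__":
    # Hypothetical host names; at most 25% of the farm offline per step.
    farm = [f"wn-{n:03d}" for n in range(10)]
    for step, batch in enumerate(plan_rolling_reboot(farm, 0.25), start=1):
        print(f"step {step}: drain and reboot {', '.join(batch)}")
```

With 10 nodes and a 25% limit this yields five batches of two nodes each; a smaller fraction trades a longer overall procedure for more capacity kept online.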
CPU tender
2014 tender (30K HS-06) still not installed
– Procedure completed in October 2014, contract signed in March 2015 (?!)
– Will be installed shortly
– 2014 pledged resources still guaranteed
2015 tender targeted at blade solutions (Lenovo best bidder)
– Machines should be available during the summer
We will be able to decommission very old computing nodes and hopefully improve our PUE
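PUE (Power Usage Effectiveness) is total facility power divided by the power drawn by the IT equipment alone. The short sketch below computes it; the kW figures are hypothetical, chosen only to show when retiring old nodes actually moves the metric, and are not CNAF measurements.

```python
def pue(it_kw: float, overhead_kw: float) -> float:
    """Power Usage Effectiveness: total facility power divided by IT equipment power.

    overhead_kw covers everything that is not IT load (cooling, power distribution
    losses, lighting, ...).
    """
    if it_kw <= 0:
        raise ValueError("IT load must be positive")
    return (it_kw + overhead_kw) / it_kw


# Hypothetical figures, not CNAF measurements.
before = pue(it_kw=1000.0, overhead_kw=500.0)  # 1.5
after = pue(it_kw=800.0, overhead_kw=350.0)    # 1.4375
print(f"PUE before: {before:.4f}, after: {after:.4f}")
```

Decommissioning old nodes lowers the IT load, but the ratio only improves if the non-IT overhead shrinks by a larger fraction (here IT load -20%, overhead -30%); with the overhead reduced to 400 kW instead, the PUE would stay at 1.5 even though total consumption drops.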
Software
Renewed the LSF contract with Platform/IBM for the whole INFN for the next 4 years
Virtualization infrastructure migrated to the oVirt software platform
– Machine management can be delegated to single users or groups (e.g. user support)
– The infrastructure can be interfaced with other software tools (e.g. Foreman)
Provisioning
Driving the migration of the provisioning software used by the whole of CNAF
Puppet and Foreman adopted in place of Quattor (phase-out this year)
– Long process, requiring the rewriting of many templates
– New hardware infrastructure to be installed
– Already able to install, configure and maintain WNs and UIs (the 2014 and 2015 tenders will be installed with Puppet)
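In a Puppet/Foreman setup, Foreman typically acts as Puppet's external node classifier (ENC): Puppet runs the classifier with the node name as its argument and reads back, as YAML, the classes and environment that host should receive. The standalone script below is a minimal sketch of that idea only; it is not part of Foreman and not CNAF's configuration, and the host-name patterns and class names are hypothetical.

```python
import re
import sys

# Hypothetical host-name conventions and Puppet class names, for illustration only.
ROLE_RULES = [
    (re.compile(r"^wn-"), ["profile::base", "profile::worker_node"]),
    (re.compile(r"^ui-"), ["profile::base", "profile::user_interface"]),
]
DEFAULT_CLASSES = ["profile::base"]


def classify(hostname: str, environment: str = "production") -> dict:
    """Map a host name to the Puppet classes and environment it should receive."""
    for pattern, classes in ROLE_RULES:
        if pattern.match(hostname):
            return {"classes": list(classes), "environment": environment}
    return {"classes": list(DEFAULT_CLASSES), "environment": environment}


def to_enc_yaml(node: dict) -> str:
    """Render the classification in the YAML layout Puppet expects from an ENC."""
    lines = ["---", "classes:"]
    lines += [f"  - {name}" for name in node["classes"]]
    lines.append(f"environment: {node['environment']}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Puppet calls an ENC with the node name as its only argument.
    host = sys.argv[1] if len(sys.argv) > 1 else "wn-001"
    sys.stdout.write(to_enc_yaml(classify(host)))
```

Keeping the role mapping in one central place is what lets a new tender's worker nodes be installed by naming them consistently, instead of editing per-host templates.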
Multicore support
INFN-T1 fully supports multicore and high-memory (HIMEM) jobs
Dynamic partitioning activated on August 1st
Enabled on a subset of farm racks (up to 45K HS-06)
Production quality, tunable
Currently used by ATLAS and CMS
Accounting data properly delivered to APEL
Solution implemented in-house, exportable
– CERN interested, evaluating it
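The core decision in any dynamic partitioning scheme is which worker node to stop filling with single-core jobs so that a multicore slot can open up. The sketch below shows one simple, hypothetical policy (drain the node that needs the fewest cores released); it is not the INFN-T1 in-house implementation, and the node names are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WorkerNode:
    name: str
    total_cores: int
    busy_cores: int
    draining: bool = False  # True: no new single-core jobs are dispatched here

    @property
    def free_cores(self) -> int:
        return self.total_cores - self.busy_cores


def select_node_to_drain(nodes: List[WorkerNode], cores_needed: int) -> Optional[WorkerNode]:
    """Reserve one node for a waiting multicore job (hypothetical policy).

    Returns None if a node already has enough free cores, if one is already
    being drained, or if no node is large enough. Otherwise it marks the node
    with the most free cores as draining, since it needs the fewest cores to be
    released and so leaves the fewest cores idle while it empties.
    """
    eligible = [n for n in nodes if n.total_cores >= cores_needed]
    if not eligible or any(n.free_cores >= cores_needed or n.draining for n in eligible):
        return None
    best = max(eligible, key=lambda n: n.free_cores)
    best.draining = True
    return best
```

A production scheme also has to decide how large the multicore share may grow and when drained nodes go back to single-core work; the sketch covers only the selection step.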
New activities and strategies
Testing low power solutions
We started to investigate some promising low-power solutions developed by hardware manufacturers
All solutions tested adopt the Intel Atom processor, based on the x86 architecture
– Transparent to users
– If experiment code uses recent instruction-set extensions, performance may decrease significantly
The results are encouraging
– TCO lower than standard solutions (after 3 years), estimate not based on tender prices
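Whether experiment code will slow down on such nodes depends on which instruction-set extensions its binaries were built for. On Linux the extensions a CPU advertises are listed in the `flags` line of /proc/cpuinfo, so a quick check can be scripted as in the minimal sketch below; the list of required extensions is a hypothetical example, not taken from any experiment's build settings.

```python
from typing import Iterable, List, Set


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Return the instruction-set flags of the first processor listed in /proc/cpuinfo."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def missing_extensions(cpuinfo_text: str, required: Iterable[str]) -> List[str]:
    """List the required extensions that the CPU does not advertise."""
    return sorted(set(required) - cpu_flags(cpuinfo_text))


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except FileNotFoundError:
        text = ""  # not a Linux host
    # Hypothetical set of extensions an experiment binary might have been compiled for.
    print(missing_extensions(text, ["sse4_2", "avx", "avx2"]))
```

An empty list means the node advertises everything requested; any extension reported as missing points to code paths that will either fall back to slower generic code or fail to run.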
Our testing units
HP Moonshot with m350 cards
– 4 motherboards on each blade, no storage
Supermicro microserver
– Form factor does not fit our requirements
In contact with hardware vendors in order to test the latest solutions in-house
– HP analysed our WNs in order to determine the best storage solution, and is providing us with a new Moonshot with 10 m350 cards and external iSCSI storage
– Supermicro MicroBlade: each blade carries 4 motherboards and 4 disks; less compact, but with built-in storage
Avoton performance
Changing batch system
LSF is very robust and reliable (but expensive!)
Lately HTCondor has gained the attention of the LHC community
– Proved to scale better than Slurm
– Interesting features (e.g. adding nodes without reconfiguring)
Within 3 years we plan to move to HTCondor
– Several pieces of internal software, mainly for accounting, to deal with
– Implies partitioning the farm during the transition
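The "adding nodes without reconfiguring" feature comes from HTCondor's architecture: the startd on each execute node periodically sends a machine ClassAd to the collector, and the negotiator matches job requirements against those ads, so a new node joins the pool by advertising itself. The toy model below illustrates only that idea in plain Python; it does not use the HTCondor bindings, the class and node names are hypothetical, and only `Cpus` and `Memory` mirror real machine ClassAd attributes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToyCollector:
    """Toy model of HTCondor's collector: execute nodes register themselves."""
    machine_ads: Dict[str, dict] = field(default_factory=dict)

    def advertise(self, name: str, ad: dict) -> None:
        # Called by the node itself; no central host list has to be edited.
        self.machine_ads[name] = ad

    def match(self, requirements: Callable[[dict], bool]) -> List[str]:
        # Simplified matchmaking: machines whose ad satisfies the job requirements.
        return sorted(name for name, ad in self.machine_ads.items() if requirements(ad))


collector = ToyCollector()
collector.advertise("wn-001", {"Cpus": 8, "Memory": 16000})
collector.advertise("wn-002", {"Cpus": 1, "Memory": 2000})
# A node added later simply advertises itself and becomes matchable at once.
collector.advertise("wn-003", {"Cpus": 16, "Memory": 64000})
print(collector.match(lambda ad: ad["Cpus"] >= 8))  # ['wn-001', 'wn-003']
```

Real matchmaking is two-sided (machine ads carry their own Requirements and Rank expressions too); the one-sided filter here is a deliberate simplification.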
Clouds
Dynamic partitioning of the farm
– A subset of the WNs can be dynamically detached from LSF and assigned to a Cloud Controller
Pilot service providing the interactive access requested by some experiments, ready soon
– Extending the computing resources using external data centers
Interest in Indigo, focus on WP4
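Detaching worker nodes from the batch system and handing them to a cloud controller needs some bookkeeping so that the batch farm never shrinks below a safe size. The sketch below is a minimal, hypothetical version of that bookkeeping; the class name, the `min_batch` floor and the node names are assumptions, and the actual drain and cloud-registration steps are only indicated in comments.

```python
from typing import Iterable, List, Set


class FarmPartition:
    """Hypothetical bookkeeping of worker nodes shared between LSF and a cloud controller."""

    def __init__(self, nodes: Iterable[str], min_batch: int) -> None:
        self.batch: Set[str] = set(nodes)
        self.cloud: Set[str] = set()
        self.min_batch = min_batch  # nodes that must always stay in the batch system

    def detach_to_cloud(self, count: int) -> List[str]:
        """Move up to `count` nodes to the cloud, never dropping below min_batch."""
        movable = max(0, min(count, len(self.batch) - self.min_batch))
        moved = sorted(self.batch)[:movable]
        for name in moved:
            # In practice: close the host in the batch system, wait for its jobs
            # to end, then register it with the cloud controller.
            self.batch.remove(name)
            self.cloud.add(name)
        return moved

    def return_to_batch(self, names: Iterable[str]) -> None:
        """Give cloud nodes back to the batch system."""
        for name in names:
            if name in self.cloud:
                self.cloud.remove(name)
                self.batch.add(name)


farm = FarmPartition([f"wn-{i}" for i in range(5)], min_batch=3)
print(farm.detach_to_cloud(4))  # ['wn-0', 'wn-1']
```

Asking for four nodes moves only two, because the floor of three batch nodes wins; the same structure would also describe the planned split of the farm between LSF and HTCondor during the transition.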