Towards energy-efficient HPC: HP Apollo 8000 at Cyfronet, Part I
Patryk Lasoń, Marek Magryś
ACC Cyfronet AGH-UST
- established in 1973
- part of AGH University of Science and Technology in Krakow, PL
- provides free computing resources for scientific institutions
- centre of competence in HPC and Grid Computing
- IT service management expertise (ITIL, ISO 20k)
- member of PIONIER, operator of the Krakow MAN
- home for Zeus
International projects
PL-Grid Consortium
- Consortium creation – January 2007
  - a response to requirements from Polish scientists and to ongoing Grid activities in Europe (EGEE, EGI_DS)
- Aim: significant extension of the amount of computing resources provided to the scientific community (start of the PL-Grid Programme)
- Development based on:
  - projects funded by the European Regional Development Fund as part of the Innovative Economy Programme
  - close international collaboration (EGI, ...)
  - previous projects (FP5, FP6, FP7, EDA, ...)
- National network infrastructure available: PIONIER
- National project computing resources: Top500 list
- Polish scientific communities: ~75% of highly rated Polish publications come from 5 communities
- PL-Grid Consortium members: 5 Polish High Performance Computing centres, representing the communities, coordinated by ACC Cyfronet AGH
PL-Grid infrastructure
- Polish national IT infrastructure supporting e-Science
  - based upon resources of the most powerful academic resource centres
  - compatible and interoperable with the European Grid
  - offering grid and cloud computing paradigms
  - coordinated by Cyfronet
- Benefits for users
  - one infrastructure instead of 5 separate compute centres
  - unified access to software, compute and storage resources
  - non-trivial quality of service
- Challenges
  - unified monitoring, accounting, security
  - creating an environment of cooperation rather than competition
- Federation – the key to success
PLGrid Core project
- Competence Centre in the Field of Distributed Computing Grid Infrastructures
- Budget: 104 949 901.16 PLN total, including 89 207 415.99 PLN of funding from the EC
- Duration: 01.01.2014 – 30.11.2015
- Project Coordinator: Academic Computer Centre CYFRONET AGH
The main objective of the project is to support the development of ACC Cyfronet AGH as a specialized competence centre in the field of distributed computing infrastructures, with particular emphasis on grid technologies, cloud computing and infrastructures supporting computations on big data.
PLGrid Core project – services
- Basic infrastructure services
  - uniform access to distributed data
  - PaaS Cloud for scientists
  - applications maintenance environment of the MapReduce type
- End-user services
  - technologies and environments implementing the Open Science paradigm
  - computing environment for interactive processing of scientific data
  - platform for development and execution of large-scale applications organized in a workflow
  - automatic selection of scientific literature
  - environment supporting data farming mass computations
HPC at Cyfronet – systems deployed 2007–2013 (timeline): Baribal, Panda, Zeus, Zeus vSMP, Mars, Zeus FPGA, Zeus GPU, Platon U3
Zeus
- over 1300 servers
  - HP BL2x220c blades
  - HP BL685c fat nodes (64 cores, 256 GB)
  - HP BL490c vSMP nodes (up to 768 cores, 6 TB)
  - HP SL390s GPGPU nodes (2x and 8x GPU)
- InfiniBand QDR (Mellanox + QLogic)
- >3 PB of disk storage (Lustre + GPFS)
- Scientific Linux 6, Torque/Moab
Zeus – statistics
- 2400 registered users
- >2000 jobs running simultaneously
- >22000 jobs per day
- 96 000 000 computing hours in 2013
- jobs lasting from minutes to weeks
- jobs from 1 core to 4000 cores
(a quick back-of-the-envelope check of these figures follows below)
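These figures imply consistently high utilization. A minimal sketch of the arithmetic, assuming the 96 000 000 hours are core-hours (everything else is taken from the slide):

```python
# Back-of-the-envelope check of the Zeus 2013 statistics quoted above.
core_hours_2013 = 96_000_000       # from the slide (assumed to be core-hours)
hours_in_year = 365 * 24           # 8760

avg_busy_cores = core_hours_2013 / hours_in_year
print(f"Average cores busy around the clock: {avg_busy_cores:,.0f}")   # ~11,000

jobs_per_day = 22_000              # ">22000 jobs per day" from the slide
avg_core_hours_per_job = core_hours_2013 / (jobs_per_day * 365)
print(f"Average core-hours per job: {avg_core_hours_per_job:.1f}")     # ~12
# Consistent with a mix of many short, small jobs and fewer long, highly parallel ones.
```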
Cooling: rack diagram showing the cold aisle (20°C) and the hot aisle (40°C)
Why upgrade?
- Jobs growing
- Users hate queuing
- New users, new requirements
- Technology moving forward
- Power bill staying the same
New building
Requirements
- Petascale system
- Lowest TCO
- Energy efficient
- Dense
- Good MTBF
- Hardware: core count, memory size, network topology, storage
Cooling
- CRAC only: PUE 2.0
- CRAC + precision cooling: PUE 1.6
- Precision cooling + chilled water: PUE 1.4
- PUE 1.0 – ?
(the PUE arithmetic is illustrated below)
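PUE (Power Usage Effectiveness) is simply total facility power divided by the power delivered to IT equipment, so the list above maps directly to how much extra energy the cooling plant burns. A minimal sketch of the ratio; the kW figures below are invented purely to illustrate the arithmetic, only the PUE values come from the slide:

```python
# PUE (Power Usage Effectiveness) = total facility power / IT equipment power.

def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Return the PUE ratio for a given facility load and IT load."""
    return total_facility_kw / it_equipment_kw

# For an illustrative 1000 kW of IT load under the scenarios listed above:
print(pue(2000.0, 1000.0))   # 2.0  - plain CRAC units
print(pue(1600.0, 1000.0))   # 1.6  - CRAC + precision cooling
print(pue(1400.0, 1000.0))   # 1.4  - precision cooling + chilled water
print(pue(1050.0, 1000.0))   # 1.05 - warm-water cooling (the level pursued here)
```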
Direct Liquid Cooling!
- up to 1000x more efficient heat exchange than air
- less energy needed to move the coolant
- hardware can handle it: CPUs ~70°C, memory ~80°C
- hard to cool 100% of the hardware with liquid: network switches, PSUs
(a simple heat-balance sketch follows below)
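The coolant-moving argument can be made concrete with a simple heat balance. This is only a sketch: the 680 kW figure is the total system power quoted later in the deck, and the 10°C coolant temperature rise is an assumed, illustrative value:

```python
# Rough estimate of the warm-water flow needed to carry away the system's heat.
# Q = m_dot * c_p * delta_T   ->   m_dot = Q / (c_p * delta_T)

heat_load_w = 680_000.0    # W; total system power quoted later in the deck
c_p_water = 4186.0         # J/(kg*K), specific heat of water
delta_t = 10.0             # K; assumed coolant temperature rise (illustrative)

mass_flow = heat_load_w / (c_p_water * delta_t)   # kg/s
print(f"Required water flow: ~{mass_flow:.1f} kg/s (~{mass_flow:.0f} L/s)")
# ~16 kg/s of water carries the whole heat load; air has c_p ~1005 J/(kg*K)
# and ~1.2 kg/m^3 density, so moving the same heat with air takes far more
# blower energy than pumping water.
```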
MTBF
- the less movement the better: fewer pumps, fewer fans, fewer HDDs
- example: pump MTBF 50 000 hrs, fan MTBF 50 000 hrs, 1800-node system MTBF ~7 hrs
(see the sketch below for the arithmetic)
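The ~7 h figure follows from adding the failure rates of independent components: with constant failure rates, the system MTBF is roughly the component MTBF divided by the number of components. A sketch of that calculation; the split into one pump and three fans per node is an assumption chosen to reproduce the slide's number:

```python
# System MTBF from component MTBFs, assuming independent components with
# constant (exponential) failure rates: failure rates add up.

def system_mtbf(component_mtbfs_hours):
    """Combined MTBF (hours) of many independent components."""
    total_failure_rate = sum(1.0 / m for m in component_mtbfs_hours)
    return 1.0 / total_failure_rate

nodes = 1800
pump_mtbf = 50_000.0   # hours, from the slide
fan_mtbf = 50_000.0    # hours, from the slide
pumps_per_node = 1     # assumption: 1 pump + 3 fans per node reproduces
fans_per_node = 3      # the ~7 h system figure quoted above

components = ([pump_mtbf] * pumps_per_node + [fan_mtbf] * fans_per_node) * nodes
print(f"System MTBF: {system_mtbf(components):.1f} hours")   # ~6.9 hours
```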
The topology
- Service island: service nodes, storage nodes
- 3 computing islands, 576 computing nodes each
- Core InfiniBand switches
It should count
- max job size ~10k cores
- fastest CPUs, but compatible with old codes
- two sockets are enough
- CPUs, not accelerators
- newest memory, and more of it than before
- fast interconnect: still InfiniBand, but no need for a full CBB fat tree
The hard part
- public institution, public tender
- strict requirements:
  - 1.65 PFLOPS, max. 1728 servers
  - 128 GB DDR4 per node
  - warm-water cooling, no pumps inside nodes
  - InfiniBand topology
  - compute + cooling, dry-cooler only
- criteria: price, power, space
And the winner is… HP Apollo 8000
- most energy efficient
- the only solution with 100% warm-water cooling
- least floor space needed
- lowest TCO
Even more Apollo
- focuses also on the '1' in PUE!
- power distribution
- fewer fans
- detailed monitoring, 'energy to solution'
- safer maintenance
- fewer cables
- prefabricated piping
- simplified management
System configuration
- 1.65 PFLOPS (would place within the first 30 positions of the current Top500 list)
- 1728 nodes, Intel Haswell E5-2680v3
- 41472 cores, 13824 per island
- 216 TB of DDR4 RAM
- PUE ~1.05, 680 kW total power
- 15 racks, 12.99 m²
- system ready for non-disruptive upgrade
- Scientific Linux 6 or 7
(a quick cross-check of these numbers follows below)
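The headline numbers are easy to cross-check from the per-node figures. A sketch only: the 2.5 GHz base clock and 16 double-precision FLOPs per cycle are standard E5-2680 v3 (AVX2 + FMA) values, not taken from the slide:

```python
# Sanity check of the Prometheus configuration quoted above.
nodes = 1728
sockets_per_node = 2
cores_per_socket = 12          # Intel Xeon E5-2680 v3 (Haswell)
base_clock_ghz = 2.5           # standard base clock for this CPU
dp_flops_per_cycle = 16        # AVX2 + FMA: 2 FMA units x 4 doubles x 2 ops

cores = nodes * sockets_per_node * cores_per_socket
print(cores)                                           # 41472

peak_pflops = cores * base_clock_ghz * dp_flops_per_cycle / 1e6
print(f"{peak_pflops:.2f} PFLOPS theoretical peak")    # ~1.66 PFLOPS, matching the 1.65 quoted

ram_tb = nodes * 128 / 1024                            # 128 GB DDR4 per node
print(f"{ram_tb:.0f} TB of RAM")                       # 216 TB
```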
Prometheus
- created human
- gave fire to the people
- accelerated innovation
- defeated Zeus
Deployment plan
- contract signed on 20.10.2014
- installation of the primary loop started on 12.11.2014
- first delivery (service island) expected on 24.11.2014
- Apollo piping should arrive before Christmas
- main delivery in January
- installation and acceptance in February
- production work from Q2 2015
Future plans
- benchmarking and Top500 submission
- evaluation of Scientific Linux 7
- moving users from the previous system
- tuning of applications
- energy-aware scheduling
- first experience presented at HP-CAST 24
prometheus@cyfronet.pl
More information www.cyfronet.krakow.pl/en www.plgrid.pl/en