Towards energy efficient HPC: HP Apollo 8000 at Cyfronet, Part I
Patryk Lasoń, Marek Magryś
ACC Cyfronet AGH-UST
- established in 1973
- part of the AGH University of Science and Technology in Krakow, PL
- provides free computing resources for scientific institutions
- centre of competence in HPC and Grid Computing
- IT service management expertise (ITIL, ISO 20k)
- member of PIONIER
- operator of the Krakow MAN
- home for Zeus
International projects
PL-Grid Consortium
- consortium created in January 2007
- a response to the requirements of Polish scientists and to ongoing Grid activities in Europe (EGEE, EGI_DS)
- aim: a significant extension of the computing resources provided to the scientific community (start of the PL-Grid Programme)
- development based on:
  - projects funded by the European Regional Development Fund as part of the Innovative Economy Programme
  - close international collaboration (EGI, ...)
  - previous projects (FP5, FP6, FP7, EDA, ...)
- National Network Infrastructure available: PIONIER National Project
- computing resources: Top500 list
- Polish scientific communities: ~75% of highly rated Polish publications come from 5 communities
- PL-Grid Consortium members: 5 Polish High Performance Computing centres, representing the communities, coordinated by ACC Cyfronet AGH
PL-Grid infrastructure
- Polish national IT infrastructure supporting e-Science
  - based upon the resources of the most powerful academic resource centres
  - compatible and interoperable with the European Grid
  - offering the grid and cloud computing paradigms
  - coordinated by Cyfronet
- benefits for users:
  - one infrastructure instead of 5 separate compute centres
  - unified access to software, compute and storage resources
  - non-trivial quality of service
- challenges:
  - unified monitoring, accounting, security
  - creating an environment of cooperation rather than competition
- federation: the key to success
PLGrid Core project
- Competence Centre in the Field of Distributed Computing Grid Infrastructures
- budget: total ,16 PLN, including funding from the EC: ,99 PLN
- duration: –
- project coordinator: Academic Computer Centre CYFRONET AGH
- main objective: to support the development of ACC Cyfronet AGH as a specialized competence centre in the field of distributed computing infrastructures, with particular emphasis on grid technologies, cloud computing and infrastructures supporting computations on big data
PLGrid Core project – services
- basic infrastructure services:
  - uniform access to distributed data
  - PaaS Cloud for scientists
  - applications maintenance environment of the MapReduce type
- end-user services:
  - technologies and environments implementing the Open Science paradigm
  - computing environment for interactive processing of scientific data
  - platform for development and execution of large-scale applications organized in workflows
  - automatic selection of scientific literature
  - environment supporting data farming mass computations
HPC at Cyfronet
- Baribal
- Panda
- Zeus vSMP
- Mars
- Zeus FPGA
- Zeus GPU
- Zeus 2013
- Platon U3
Zeus
- over 1300 servers:
  - HP BL2x220c blades
  - HP BL685c fat nodes (64 cores, 256 GB)
  - HP BL490c vSMP nodes (up to 768 cores, 6 TB)
  - HP SL390s GPGPU nodes (2x, 8x)
- InfiniBand QDR (Mellanox + QLogic)
- >3 PB of disk storage (Lustre + GPFS)
- Scientific Linux 6, Torque/Moab
Zeus – statistics
- 2400 registered users
- >2000 jobs running simultaneously
- >22000 jobs per day
- computing hours in 2013
- jobs lasting from minutes to weeks
- jobs from 1 core to 4000 cores
Cooling
[diagram: rack cooling with a cold aisle (20°C) and a hot aisle (40°C)]
Why upgrade?
- jobs growing
- users hate queuing
- new users, new requirements
- technology moving forward
- power bill staying the same
New building
Requirements
- petascale system
- lowest TCO
- energy efficient
- dense
- good MTBF
- hardware:
  - core count
  - memory size
  - network topology
  - storage
Cooling
- PUE 2.0: CRAC
- PUE 1.6: CRAC + precision cooling
- PUE 1.4: precision cooling + chilled water
- PUE 1.0: ?
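PUE (Power Usage Effectiveness) is the ratio of total facility power to the power delivered to the IT equipment, so 1.0 means no cooling or distribution overhead at all. A minimal sketch of the arithmetic, using the ~1.05 PUE and 680 kW figures quoted later in this deck and assuming the 680 kW is the total facility power:

    # PUE = total facility power / IT equipment power
    pue = 1.05                              # target PUE of the new system
    p_facility_kw = 680.0                   # assumed here to be total facility power
    p_it_kw = p_facility_kw / pue           # ~648 kW actually delivered to the servers
    overhead_kw = p_facility_kw - p_it_kw   # ~32 kW spent on cooling and power distribution
    print(f"IT power: {p_it_kw:.0f} kW, overhead: {overhead_kw:.0f} kW")

Compare this with a PUE 2.0 CRAC room, where every kilowatt of compute costs roughly another kilowatt of cooling.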
Direct Liquid Cooling!
- up to 1000x more efficient heat exchange than with air
- less energy needed to move the coolant
- hardware can handle high temperatures:
  - CPUs ~70°C
  - memory ~80°C
- hard to cool 100% of the hardware with liquid:
  - network switches
  - PSUs
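A rough feel for why moving water is so much cheaper than moving air: the sketch below compares the coolant flow needed to carry away the heat of a single node. The 700 W node power and 10 K coolant temperature rise are illustrative assumptions, not figures from the slides.

    # Coolant flow needed to remove P watts with a temperature rise dT: m_dot = P / (c_p * dT)
    p_node = 700.0                          # W per node (assumed)
    dt = 10.0                               # K temperature rise of the coolant (assumed)

    cp_water, rho_water = 4186.0, 1000.0    # J/(kg*K), kg/m^3
    cp_air, rho_air = 1005.0, 1.2           # J/(kg*K), kg/m^3

    flow_water = p_node / (cp_water * dt) / rho_water * 60e3   # ~1 litre of water per minute
    flow_air = p_node / (cp_air * dt) / rho_air * 60e3         # ~3500 litres of air per minute
    print(f"water: {flow_water:.1f} l/min, air: {flow_air:.0f} l/min")

Three orders of magnitude less volume to move per node is what shrinks the fan and pump power, and with ~70-80°C tolerable at the components, the loop can run on warm water and a dry cooler alone.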
MTBF
- the less movement the better:
  - fewer pumps
  - fewer fans
  - fewer HDDs
- example:
  - pump MTBF: hrs
  - fan MTBF: hrs
  - 1800-node system MTBF: 7 hrs
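The arithmetic behind a whole-system MTBF of a few hours: assuming independent failures, component failure rates add up, and the system MTBF is the reciprocal of the sum. The per-node component counts and MTBF values below are illustrative assumptions (the slide's own figures are not reproduced here); they only show how 1800 nodes full of moving parts end up with a system MTBF on the order of hours.

    # System MTBF assuming independent component failures:
    #   lambda_system = sum(nodes * count_i / mtbf_i),  mtbf_system = 1 / lambda_system
    nodes = 1800
    components = {                 # name: (count per node, MTBF in hours) -- assumed values
        "pump": (1, 50_000.0),
        "fan":  (6, 50_000.0),
        "hdd":  (1, 1_000_000.0),
    }
    failure_rate = sum(nodes * count / mtbf for count, mtbf in components.values())
    print(f"system MTBF: ~{1.0 / failure_rate:.1f} hours")   # a handful of hours

Removing the pumps, fans and local disks from the nodes is therefore the single biggest lever on the failure rate.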
The topology
- service island: service nodes and storage nodes
- 3 computing islands, 576 computing nodes each
- islands connected through core IB switches
It should count
- max job size ~10k cores
- fastest CPUs, but compatible with old codes
- two sockets are enough
- CPUs, not accelerators
- newest memory, and more of it than before
- fast interconnect: still InfiniBand, but no need for a full CBB fat tree
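One way to read the fat-tree decision: with 576 nodes per island and an assumed 24 cores per two-socket node (the core count is an assumption, not stated on the slide), even the largest expected job fits inside a single island, so full bisection bandwidth between islands is rarely needed.

    # Does a maximum-size job fit within one computing island?
    nodes_per_island = 576
    cores_per_node = 24            # assumed: two sockets x 12 cores
    cores_per_island = nodes_per_island * cores_per_node   # 13,824 cores
    max_job_cores = 10_000
    print(cores_per_island, max_job_cores <= cores_per_island)   # 13824 True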
The hard part
- public institution, public tender
- strict requirements:
  - 1.65 PFLOPS, max servers
  - 128 GB DDR4 per node
  - warm water cooling, no pumps inside nodes
  - InfiniBand topology
  - compute + cooling, dry-cooler only
- criteria: price, power, space
And the winner is… HP Apollo 8000
- most energy efficient
- the only solution with 100% warm water cooling
- least floor space needed
- lowest TCO
Even more Apollo
- focuses also on the ‘1’ in PUE!
- power distribution
- fewer fans
- detailed monitoring: ‘energy to solution’
- safer maintenance
- fewer cables
- prefabricated piping
- simplified management
System configuration
- 1.65 PFLOPS (within the first 30 of the current Top500)
- 1728 nodes, Intel Haswell E5-2680v CPUs, cores per island
- 216 TB DDR4 RAM
- PUE ~1.05, 680 kW total power
- 15 racks, m² of floor space
- system ready for non-disruptive upgrade
- Scientific Linux 6 or 7
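A back-of-the-envelope check of the 1.65 PFLOPS figure. The per-CPU parameters below (12-core E5-2680 v3 at 2.5 GHz, 16 double-precision FLOP per cycle per core with AVX2 FMA) are assumptions, since the exact core counts did not survive on the slide.

    # Theoretical peak = nodes * sockets * cores * clock * FLOP/cycle
    nodes = 1728
    sockets_per_node = 2
    cores_per_socket = 12          # assumed: E5-2680 v3
    clock_hz = 2.5e9               # assumed base clock
    flops_per_cycle = 16           # assumed: AVX2 FMA, double precision
    peak = nodes * sockets_per_node * cores_per_socket * clock_hz * flops_per_cycle
    print(f"theoretical peak: {peak / 1e15:.2f} PFLOPS")   # ~1.66 PFLOPS, in line with the 1.65 quoted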
Prometheus
- created humankind
- gave fire to the people
- accelerated innovation
- defeated Zeus
Deployment plan
- contract signed on
- installation of the primary loop started on
- first delivery (service island) expected on
- Apollo piping should arrive before Christmas
- main delivery in January
- installation and acceptance in February
- production work from Q2 2015
Future plans
- benchmarking and Top500 submission
- evaluation of Scientific Linux 7
- moving users from the previous system
- tuning of applications
- energy-aware scheduling
- first experience presented at HP-CAST 24
More information