Further expansion of HPE's largest warm-water-cooled Apollo 8000 system: "Prometheus" at Cyfronet
Patryk Lasoń, Marek Magryś
ACC Cyfronet AGH-UST
- established in 1973
- part of AGH University of Science and Technology in Krakow, Poland
- provides free computing resources for scientific institutions
- centre of competence in HPC and Grid Computing
- IT service management expertise (ITIL, ISO 20k)
- member of the PIONIER consortium, operator of the Krakow MAN
- home for supercomputers
PL-Grid infrastructure
- Polish national IT infrastructure supporting e-Science
- based upon the resources of the most powerful academic resource centres
- compatible and interoperable with the European Grid
- offering grid and cloud computing paradigms
- coordinated by Cyfronet
Benefits for users:
- unified infrastructure from 5 separate compute centres
- unified access to software, compute and storage resources
- non-trivial quality of service
Challenges:
- unified monitoring, accounting, security
- creating an environment of cooperation rather than competition
Federation is the key to success.
PLGrid Core project
- Competence Centre in the Field of Distributed Computing Grid Infrastructures
- Duration: 01.01.2014 – 31.11.2015
- Project Coordinator: Academic Computer Centre CYFRONET AGH
- Main objective: to support the development of ACC Cyfronet AGH as a specialized competence centre in the field of distributed computing infrastructures, with particular emphasis on grid technologies, cloud computing and infrastructures supporting computations on big data.
ZEUS 374 TFLOPS #269 on Top500
Zeus usage
New building 5 MW, UPS + diesel
Prometheus: 2.4 PFLOPS, #49 on Top500
Prometheus – Phase 1
- installed in Q2 2015
- HP Apollo 8000
- 13 m², 15 racks (3 CDU, 12 compute)
- 1.65 PFLOPS
- 1728 nodes, Intel Haswell E5-2680v3
- 41472 cores, 13824 per island
- 216 TB DDR4 RAM
- N+1/N+N redundancy
Prometheus – Phase 2
- installed in Q4 2015: 4th island
  - 432 regular nodes (2 CPUs, 128 GB RAM)
  - 72 nodes with GPGPUs (2x NVIDIA Tesla K40 XL)
- 2.4 PFLOPS total peak performance (Rpeak): 2140 TFLOPS in CPUs, 256 TFLOPS in GPUs (a quick cross-check of the CPU figures is sketched below)
- 2232 nodes, 53568 CPU cores, 279 TB RAM
- <850 kW power draw (including cooling)
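As a rough cross-check of the quoted CPU Rpeak values, a minimal sketch, assuming the E5-2680v3's nominal 2.5 GHz clock and 16 double-precision FLOPs per cycle per core (two AVX2 FMA units); these assumptions are not stated on the slides:

```python
# Back-of-the-envelope Rpeak check for the Prometheus CPU partition.
# Assumed (not from the slides): 2.5 GHz nominal clock for the Xeon E5-2680v3
# and 16 double-precision FLOPs/cycle/core (two AVX2 FMA units).
CLOCK_HZ = 2.5e9
FLOPS_PER_CYCLE = 16

def cpu_rpeak_tflops(cores: int) -> float:
    """Theoretical CPU peak in TFLOPS for a given core count."""
    return cores * CLOCK_HZ * FLOPS_PER_CYCLE / 1e12

print(cpu_rpeak_tflops(41472))   # Phase 1: ~1659 TFLOPS, i.e. the quoted 1.65 PFLOPS
print(cpu_rpeak_tflops(53568))   # Phase 2: ~2143 TFLOPS, close to the quoted 2140 TFLOPS
```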
Prometheus storage
- diskless compute nodes
- separate procurement for storage
- Lustre on top of DDN hardware
- two filesystems:
  - Scratch: 120 GB/s, 5 PB usable space
  - Archive: 60 GB/s, 5 PB usable space
- HSM-ready
- NFS for home directories and software
Prometheus: IB fabric
[Fabric diagram: core IB switches connecting three compute islands of 576 CPU nodes each, a fourth compute island with 432 CPU nodes and 72 GPU nodes, and a service island with service and storage nodes]
Why liquid cooling?
- water: up to 1000x more efficient heat exchange than air
- less energy needed to move the coolant (a rough water-vs-air flow comparison is sketched below)
- hardware (CPUs, DIMMs) can handle ~80°C
- challenge: cool 100% of the hardware with liquid, including network switches and PSUs
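To illustrate the coolant-transport side of the argument, a minimal sketch comparing the volumetric flow of water and air needed to carry the same heat load, using Q = ρ·c_p·V̇·ΔT; the 800 kW load and 10 K temperature rise are illustrative assumptions, not figures from the slides:

```python
# Illustrative comparison (assumed numbers): coolant volume flow needed to
# absorb a heat load Q at a temperature rise dT, from Q = rho * cp * Vdot * dT.
def volume_flow_m3s(q_watts: float, rho_kg_m3: float, cp_j_kgk: float, dt_k: float) -> float:
    """Volumetric flow [m^3/s] required to absorb q_watts at a dT temperature rise."""
    return q_watts / (rho_kg_m3 * cp_j_kgk * dt_k)

Q, DT = 800e3, 10.0                                   # assumed 800 kW load, 10 K rise
water = volume_flow_m3s(Q, 998.0, 4186.0, DT)         # ~0.019 m^3/s (~19 L/s)
air   = volume_flow_m3s(Q, 1.2, 1005.0, DT)           # ~66 m^3/s
print(f"water: {water:.3f} m^3/s, air: {air:.1f} m^3/s, ratio ~{air / water:.0f}x")
```

The ratio of several thousand reflects water's far higher volumetric heat capacity, which is why far less pumping effort is needed per kilowatt removed.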
What about MTBF?
- the less movement the better
- examples of moving parts: pumps, fans, HDDs
- pump MTBF: 50 000 h
- fan MTBF: 50 000 h
- 2300-node system MTBF: ~5 h (see the series-system calculation sketched below)
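The ~5 h figure follows from a simple series-system model: with independent, exponentially distributed failures, the system failure rate is the sum of the component rates. A minimal sketch, assuming (hypothetically) one pump and three fans per node; the per-node part count is an assumption, not from the slides:

```python
# Series-system MTBF with independent, exponentially distributed failures:
# lambda_system = sum(lambda_i), so MTBF_system = 1 / sum(count_i / MTBF_i).
def system_mtbf_hours(part_counts: dict, part_mtbf_hours: dict) -> float:
    """MTBF of a system in which any single part failure counts as a system failure."""
    failure_rate = sum(n / part_mtbf_hours[name] for name, n in part_counts.items())
    return 1.0 / failure_rate

NODES = 2300
# Assumed (hypothetical) per-node moving parts: 1 pump and 3 fans, each 50 000 h MTBF.
counts = {"pump": 1 * NODES, "fan": 3 * NODES}
mtbf = {"pump": 50_000.0, "fan": 50_000.0}
print(system_mtbf_hours(counts, mtbf))   # ~5.4 h, matching the ~5 h quoted on the slide
```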
Why Apollo 8000?
- most energy efficient
- the only solution with 100% warm-water cooling
- highest density
- lowest TCO
Even more Apollo
- focuses also on the '1' in PUE
- dry node maintenance
- power distribution
- fewer fans
- detailed 'energy to solution' monitoring
- fewer cables
- prefabricated piping
- simplified management
Secondary loop
System software
- CentOS 7
- boot to RAM over IB, image distribution over HTTP
- the whole machine boots up in 10 minutes with just 1 boot server
- hostname/IP generator based on a MAC collector (an illustrative sketch follows below)
- data automatically collected from APM and iLO
- graphical monitoring of power, temperature and network traffic
- SNMP data source, GUI allows easy problem location
- now synced with SLURM
- spectacular iLO LED blinking system developed for the official launch
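As an illustration of MAC-based hostname/IP generation, a minimal sketch; the hostname pattern ("p0001"...) and subnet are hypothetical and not the scheme actually used on Prometheus:

```python
# Hypothetical sketch of a MAC-collector-driven hostname/IP generator.
# The hostname pattern and subnet below are illustrative assumptions only.
import ipaddress

def assign_nodes(macs: list, subnet: str = "10.1.0.0/16", prefix: str = "p") -> dict:
    """Map collected MAC addresses to stable hostnames and IPs, in collection order."""
    hosts = ipaddress.ip_network(subnet).hosts()
    table = {}
    for idx, (mac, ip) in enumerate(zip(macs, hosts), start=1):
        table[mac.lower()] = (f"{prefix}{idx:04d}", str(ip))
    return table

collected = ["3C:A8:2A:00:00:01", "3C:A8:2A:00:00:02"]   # e.g. output of the MAC collector
for mac, (host, ip) in assign_nodes(collected).items():
    print(mac, host, ip)      # could then feed DHCP/DNS and the boot server configuration
```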
HPL: power usage
HPL: water temperature
Real application performance
- Prometheus vs. Zeus: 4x difference core-to-core in theory
- storage system (scratch) 10x faster
- more time to focus on the most popular codes:
  - COSMOS++ – 4.4x
  - Quantum Espresso – 5.6x
  - ADF – 6x
  - widely used QC code with a name derived from a famous mathematician – 2x
Future plans
- continue to move users from the previous system
- add a few large-memory nodes
- further improvements of the monitoring tools
- detailed energy and temperature monitoring
- energy-aware scheduling
- collect the annual energy use and PUE (see the sketch below)
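For the annual PUE figure, a minimal sketch of the calculation, assuming metered annual energy totals for the whole facility and for the IT equipment alone; the numbers are illustrative, not measured Prometheus data:

```python
# PUE = total facility energy / IT equipment energy, over the same metering period.
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness for a given period, e.g. one year."""
    return total_facility_kwh / it_equipment_kwh

# Illustrative numbers only (not measured Prometheus data):
print(round(pue(total_facility_kwh=5_600_000, it_equipment_kwh=5_200_000), 3))  # ~1.077
```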