www.montblanc-project.eu. This project and the research leading to these results have received funding from the European Community's Seventh Framework Programme.

Presentation transcript:

This project and the research leading to these results have received funding from the European Community's Seventh Framework Programme [FP7/ ] under grant agreement n°
Exploiting mobile phone technology to build energy efficient supercomputers: the Mont Blanc project
Simon McIntosh-Smith, University of Bristol, UK
Disclaimer: Speaking for myself... All references to unavailable products are speculative, taken from web sources. No commitment from ARM, Samsung, TI, Nvidia, Bull, or others is implied.

Motivation: from MegaFLOPS to ExaFLOPS
High Performance Computing (HPC) systems are now a common part of large-scale research infrastructure
These systems have been growing larger and larger, and now themselves can consume considerable energy
Today we have systems capable of performing O(10^15) floating-point operations per second, and we're working towards O(10^18)
Power consumption has grown by two orders of magnitude in the last two decades, from hundreds of kW in the '90s to tens of MW today
This is starting to cause PROBLEMS!

Power consumption growing in the Top500
[Graph: Top500 system power consumption over time, growing by roughly 3x to 5x every 5 years (annotations include 3.13x and 5.04x in 5 years)]
Graph courtesy of Erich Strohmaier, Lawrence Berkeley National Laboratory

Providing power is not the problem
We can provide tens of MW if they're really needed…

[Chart: kWh cost for medium-size industries (€, 2011)]
The HPC power wall isn't due to power availability
We could provide 500 MW to an HPC facility
It's the electricity bill at the end of the month! Also CO2…
1 MW ≈ 1 M EUR / year
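As a rough sanity check on that rule of thumb, here is a minimal sketch assuming an illustrative industrial electricity price of about 0.11 EUR/kWh (an assumption for this example; the slide's chart shows 2011 tariffs but not this exact number):

```python
# Rough sanity check of the "1 MW ~ 1 M EUR / year" rule of thumb.
# ASSUMPTION: industrial electricity at ~0.11 EUR/kWh (illustrative; not a
# figure quoted on the slide).
power_mw = 1.0                     # sustained facility draw, in MW
price_eur_per_kwh = 0.11           # assumed tariff
hours_per_year = 24 * 365          # 8760 hours

energy_kwh = power_mw * 1000 * hours_per_year        # kWh consumed per year
annual_cost_meur = energy_kwh * price_eur_per_kwh / 1e6

print(f"{energy_kwh / 1e6:.2f} GWh/year -> ~{annual_cost_meur:.2f} M EUR/year")
# 8.76 GWh/year -> ~0.96 M EUR/year, i.e. roughly 1 M EUR per MW per year
```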

The problem … and the opportunity
Europe represents ~40% of the HPC market, yet it does not have HPC technology of its own
Nobody yet knows how to build a sustainable EFLOPS supercomputer
Power defines performance
Consensus is that getting there will be revolutionary, not evolutionary
Europe is very strong in embedded computing, the most energy-efficient computing technology today

Mont-Blanc project goals
To develop a European Exascale approach, leveraging commodity and embedded power-efficient technology
Supported by two EU projects so far, 25.5 M€ total:
FP7 Objective ICT Exascale computing, software and simulation: 3-year IP project (October 2011 to September 2014), total budget 14.5 M€ (8.1 M€ EC contribution)
FP7 Objective ICT Exascale computing platforms, software and applications: 3-year IP project (October 2013 to September 2016), total budget 11.5 M€ (8.0 M€ EC contribution)

Mont-Blanc 2 consortium

First, vector processors dominated HPC
1st Top500 list (June 1993) dominated by traditional HPC (vector) architectures
[Pie chart of system share by architecture: Cray (vector) 41%, MasPar (SIMD) 11%, Convex/HP (vector) 5%, …]
Fujitsu Wind Tunnel is # , with 170 GFLOPS

Then, commodity displaced special purpose
ASCI Red, Sandia, 1997: 1 TFLOPS from ~9,000 x 200 MHz Intel Pentium Pro processors; upgraded to Pentium II Xeon in 1999, 3.1 TFLOPS
ASCI White, LLNL, 2001: 7.3 TFLOPS from ~8,000 x 375 MHz IBM Power 3 processors
Transition from vector parallelism to message-passing programming models

The killer microprocessors
Microprocessors killed the vector supercomputers
They were not faster, but they were significantly cheaper (and more energy efficient)
You needed 10 microprocessors to achieve the performance of 1 vector CPU, but this was still a win!
[Chart: MFLOPS over time for Cray-1 and Cray-C90, NEC SX4 and SX5, Alpha EV4 and EV5, Intel Pentium, IBM P2SC, HP PA]

The killer mobile processors™
Microprocessors killed the vector supercomputers: they were not faster, but they were significantly cheaper and greener
History may be about to repeat itself…
Mobile processors are not faster…
…but they are significantly cheaper (and greener?)
[Chart: MFLOPS over time for Alpha, Intel, AMD, NVIDIA Tegra and Samsung Exynos processors, up to a 4-core ARMv8 at 1.5 GHz]

Mont-Blanc roadmap to Exascale
The goal is for Europe to become a serious force in the HPC Exascale race
Mont-Blanc NG system by 2017: 200 PFLOPS on 10 MW (integrated ARM + accelerator)
Mont-Blanc EX system by ~2020: 1 EFLOPS on 20 MW
[Roadmap diagram: FLOPS and GFLOPS/W over time, progressing from ARM multicore, to ARM multicore + discrete GPU, to integrated ARM + accelerator]
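The power envelopes on this roadmap imply specific energy-efficiency targets; the short sketch below derives them from the slide's numbers (the GFLOPS/W values are computed here rather than quoted from the slide):

```python
# Energy efficiency implied by the Mont-Blanc roadmap targets above.
targets = {
    "Mont-Blanc NG (~2017)": (200e15, 10e6),   # 200 PFLOPS on 10 MW
    "Mont-Blanc EX (~2020)": (1e18, 20e6),     # 1 EFLOPS on 20 MW
}

for name, (flops, watts) in targets.items():
    gflops_per_watt = flops / watts / 1e9
    print(f"{name}: {gflops_per_watt:.0f} GFLOPS/W")
# Mont-Blanc NG: 20 GFLOPS/W; Mont-Blanc EX: 50 GFLOPS/W
```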

Tibidabo: the first ARM HPC multicore cluster
Proof of concept: it is possible to deploy a cluster of smartphone processors, and it enables software stack development
Q7 carrier board: 2 x Cortex-A9, 2 GFLOPS, 1 GbE / MbE, 7 Watts, 0.3 GFLOPS/W
Q7 Tegra 2 module: 2 x 1 GHz, 2 GFLOPS, 5 Watts (?), 0.4 GFLOPS/W
1U rackable blade: 8 nodes, 16 GFLOPS, 65 Watts, 0.25 GFLOPS/W
2 racks: 32 blade containers, 256 nodes, 512 cores, 9 x 48-port 1GbE switches, 512 GFLOPS, 3.4 kW, 0.15 GFLOPS/W

Samsung-based Mont Blanc Daughter Card
Compute card based on the Samsung Exynos 5 Dual mobile phone chip
Each daughter card is a full HPC compute node: (CPU + GPU) + DRAM + NAND storage + Ethernet NIC

Prototype architecture
Exynos 5 compute card: 1 x Samsung Exynos 5 Dual (2 x 1.7 GHz CPU + 1 x Mali T604 GPU), GFLOPS (peak), ~10 Watts, 3.2 GFLOPS/W (peak)
Carrier blade: 15 x compute cards, 485 GFLOPS, 1 GbE to 10 GbE, 200 Watts (?), 2.4 GFLOPS/W
7U blade chassis: 9 x carrier blades, 135 x compute cards, 4.3 TFLOPS, 2 kW, 2.2 GFLOPS/W
1 rack: 4 x blade chassis, 36 blades, 540 compute cards, 2 x 36-port 10GbE switches, 8-port 40GbE uplink (80 Gb/s), 17.2 TFLOPS (peak), 8.2 kW, 2.1 GFLOPS/W (peak)
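To see how the packaging hierarchy above multiplies out, here is a small sketch rolling the per-card figures up to a rack; the per-card peak of ~32 GFLOPS is an inferred assumption (3.2 GFLOPS/W x ~10 W), since the card's exact GFLOPS figure is missing from the transcript:

```python
# Roll-up of the Mont-Blanc prototype packaging hierarchy described above.
# ASSUMPTION: per-card peak inferred as ~32 GFLOPS from 3.2 GFLOPS/W x ~10 W;
# the exact per-card figure is missing from the transcript.
card_gflops = 3.2 * 10            # ~32 GFLOPS peak per Exynos 5 compute card (inferred)
card_watts = 10.0                 # ~10 W per compute card

cards_per_blade = 15              # carrier blade
blades_per_chassis = 9            # 7U blade chassis
chassis_per_rack = 4              # rack

cards_per_rack = cards_per_blade * blades_per_chassis * chassis_per_rack  # 540
rack_tflops = cards_per_rack * card_gflops / 1000
rack_kw = cards_per_rack * card_watts / 1000      # compute cards only

print(f"{cards_per_rack} cards/rack, ~{rack_tflops:.1f} TFLOPS peak, ~{rack_kw:.1f} kW (cards only)")
print(f"~{card_gflops / card_watts:.1f} GFLOPS/W at card level, before switch/chassis overheads")
```

The card-level roll-up (~17.3 TFLOPS, ~5.4 kW) is consistent with the slide's 17.2 TFLOPS rack peak; the slide's lower rack-level figure of 2.1 GFLOPS/W at 8.2 kW presumably reflects the added power of switches and other shared infrastructure.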

Conclusions
The bigger the science, the larger the supporting infrastructure!
Our HPC systems have grown to tens of MW to process all the data we are generating
The processors developed for the mobile device space might yield a significant jump in energy efficiency for computation
The Mont Blanc project is building on Europe's strengths in mobile technology to exploit the opportunity arising in energy-dominated HPC