HPC Trends and Challenges in Climate Research

Prof. Dr. Thomas Ludwig
German Climate Computing Centre (DKRZ) & University of Hamburg
Hamburg, Germany
A First Cost-Benefit Analysis

HP-CAST 18, Hamburg | © DKRZ

[Diagram: more money !!! buys more FLOPS and more bytes (factor 1000 !), which are hoped to yield more science ?! and more knowledge and insight ?! (factor 100 ? factor 50 ? factor 10 ?)]
What's going on in HPC?
Evolution Curve

A factor of 1000 in performance every 11 years.
[Chart: peak performance over time, from Zuse's Z3 at 0.3 FLOPS (1941) up through KFLOPS, MFLOPS, GFLOPS, TFLOPS, and PFLOPS, extrapolated to 1 EFLOPS and on to 1 ZFLOPS around 2030 (?), following the factor-1000-every-11-years trend.]
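The trend curve can be checked with a few lines of arithmetic. A small sketch, assuming (for illustration) that the petaflop milestone fell around 2008; the factor-1000-per-11-years rule is from the slides:

```python
# Back-of-the-envelope check of the rule of thumb:
# a factor of 1000 in peak performance every 11 years.
annual_factor = 1000 ** (1 / 11)   # ~1.87x per year, i.e. ~87% annual growth

# Assumed milestone: ~1 PFLOPS was first reached around 2008.
pflops_year = 2008
eflops_year = pflops_year + 11     # 1 EFLOPS around 2019
zflops_year = eflops_year + 11     # 1 ZFLOPS around 2030, as on the chart

print(round(annual_factor, 2), eflops_year, zflops_year)  # 1.87 2019 2030
```

The projected zettaflop year lands on the chart's "2030 (?)" almost exactly.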
What's going on in climate research?
The Big Bang of Climate Science

1956
(Picture stolen from Rory Kelly, NCAR)
GCRM – Global Cloud Resolving Model

Finally: cloud computing.
DKRZ today and tomorrow
Mission

DKRZ: to provide high-performance computing platforms, sophisticated and high-capacity data management, and superior service for premium climate science.
Compute System

– IBM Power6, installed 2009
– Peak performance: 158 TFLOPS; Linpack: 115 TFLOPS
– Rank 98 in the Nov. 2011 TOP500 list
– 264 IBM Power6 compute nodes
– More than 26 TB main memory
– 6+ PB disk space

[Photo: machine room, showing air conditioning, the compute system, and the disk drives]
Data Archive

– 7 Sun StorageTek SL8500 tape libraries
– Slots for tapes
– 78 tape drives
– 100+ petabytes total capacity
Visualization System

– HP ProLiant DL370 G6
– HP XW 9400, HP ProLiant DL585
– HP Remote Graphics
– 10 GbE connection to the GPFS disk storage
Present, Future, and Exa

                      DKRZ 2012    scaled to Exa    Exascale design
  Linpack             110 TFLOPS   1 EFLOPS         1 EFLOPS
  Main memory         26 TB        260 PB           32-64 PB
  Disk space          7 PB         70 EB            0.5-1 EB
  Tape library        100 PB       1 ZB             ?
  Memory-to-disk      30 GB/s      300 TB/s         50 GB/s
  Disk-to-tape        3 GB/s       30 TB/s          ?
  Application-to-disk too slow

("scaled to Exa": DKRZ 2012 with every figure multiplied by the same factor; "Exascale design": a system as proposed in the exascale documents.)
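The extrapolated figures above are plain linear scaling: every DKRZ 2012 value is multiplied by roughly 10,000, the factor that takes 110 TFLOPS Linpack to about 1 EFLOPS while holding all byte-per-FLOPS ratios fixed. A minimal sketch of that arithmetic:

```python
# Scale every DKRZ 2012 figure by ~10,000, the factor that takes
# 110 TFLOPS Linpack to roughly 1 EFLOPS with all ratios held fixed.
factor = 10_000

dkrz_2012 = {                     # slide values, in base units
    "linpack_flops":   110e12,    # 110 TFLOPS
    "memory_bytes":     26e12,    # 26 TB
    "disk_bytes":        7e15,    # 7 PB
    "tape_bytes":      100e15,    # 100 PB
    "mem_to_disk_bps":  30e9,     # 30 GB/s
    "disk_to_tape_bps":  3e9,     # 3 GB/s
}

scaled = {name: value * factor for name, value in dkrz_2012.items()}

print(scaled["memory_bytes"] / 1e15,   # 260.0 (PB)
      scaled["tape_bytes"] / 1e21)     # 1.0   (ZB)
```

The point of the comparison is that a balanced exaflop version of DKRZ would need hundreds of petabytes of memory and terabytes per second of I/O, far beyond what the proposed exascale designs offer.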
Observations

– A "regular" Exascale system as proposed in the documents will not fit typical memory- and I/O-intensive climate applications.
– Currently, even Petascale systems are not a success story for climate research.
Is there a revolution in HPC?
Exascale
Exascale Revolution

– From GFLOPS to TFLOPS was a revolution
  – From vector machines to massively parallel systems with message passing
  – Required new algorithms and software
– From TFLOPS to PFLOPS was more of an evolution
  – MPI/OpenMP/C/C++/Fortran: just more of it
– From PFLOPS to EFLOPS will again be a revolution
  – Clock rate stays constant; the number of cores increases
  – Energy efficiency via special accelerator hardware
  – Resilience will be a big problem
  – ...
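The "constant clock, more cores" point can be quantified with a quick sketch; the per-core figures below (2 GHz, 8 double-precision FLOPS per cycle) are assumed typical values, not numbers from the slides:

```python
# With clock rates flat, exascale throughput has to come from parallelism.
clock_hz = 2.0e9          # assumed ~2 GHz core clock
flops_per_cycle = 8       # assumed SIMD width (8 DP FLOPS per cycle)
per_core_flops = clock_hz * flops_per_cycle      # 16 GFLOPS peak per core

cores_needed = 1e18 / per_core_flops             # cores for 1 EFLOPS peak
print(f"{cores_needed:.2e}")                     # 6.25e+07 cores
```

Tens of millions of cores is why exascale demands new, massively concurrent algorithms rather than "just more of" the petascale software stack.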
Exascale Challenges

– Hardware challenge
  – Billions of components (the K computer already has around 800 racks !)
  – Energy consumption
  – Size of the system memory
  – System resilience
– Co-design challenge
  – Simple rule-of-thumb design choices must be replaced by concepts of co-design to balance design parameters
  – Optimize trade-offs among applications, software, and hardware
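Why system resilience becomes the dominant hardware worry at this scale can be seen with one line of arithmetic; both inputs below are assumed for illustration:

```python
# Resilience back-of-the-envelope: with independent failures,
# system MTBF = per-node MTBF / node count.
node_mtbf_hours = 5 * 365 * 24        # assume one node failure per 5 years
nodes = 100_000                       # assumed exascale-class node count

system_mtbf_minutes = node_mtbf_hours / nodes * 60
print(round(system_mtbf_minutes, 1))  # ~26.3 minutes between failures
```

A failure every half hour means traditional whole-application checkpoint/restart stops being viable; fault tolerance has to move into the software stack itself.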
Exascale Challenges

– The applied mathematics challenge
  – It is not enough to just make the grid finer
  – Reproducibility of results is also an issue
– The algorithmic challenge
  – New multicore-friendly algorithms
  – Adaptive load balancing
  – Multiple-precision software
  – Auto-tuning
  – Integrated fault tolerance
  – Energy-efficient algorithms
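One of the listed items, multiple-precision software, can be illustrated by classic mixed-precision iterative refinement: solve cheaply in low precision, then correct using residuals computed in full precision. A self-contained sketch (float32 is emulated via `struct`; the 2x2 system is made up for illustration):

```python
import struct

def f32(x):
    """Round a Python float (float64) to the nearest IEEE float32."""
    return struct.unpack("f", struct.pack("f", x))[0]

def solve2x2_low(A, b):
    """Cramer's rule with every intermediate rounded to float32."""
    det = f32(f32(A[0][0] * A[1][1]) - f32(A[0][1] * A[1][0]))
    return [f32(f32(f32(b[0] * A[1][1]) - f32(b[1] * A[0][1])) / det),
            f32(f32(f32(A[0][0] * b[1]) - f32(A[1][0] * b[0])) / det)]

A = [[4.0, 1.0], [1.0, 3.0]]          # made-up well-conditioned system
b = [1.0, 2.0]                        # exact solution: (1/11, 7/11)

x = solve2x2_low(A, b)                # cheap low-precision first guess
for _ in range(3):                    # refinement loop
    # residual computed in full float64 precision
    r = [b[i] - sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
    d = solve2x2_low(A, r)            # cheap low-precision correction
    x = [x[i] + d[i] for i in range(2)]

print(abs(x[0] - 1/11) < 1e-12, abs(x[1] - 7/11) < 1e-12)  # True True
```

The appeal for energy-constrained exascale hardware is that the expensive factorization runs in the cheap, fast precision while the final accuracy matches the high precision.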
Exascale Challenges

– The computer science challenge
  – Programming models
  – I/O
  – Tools for debugging and performance analysis
– The educational challenge
  – Who will be able to successfully work on these issues?
  – Who will teach these people?
  – How many teachers?
Exascale Observations

Good news: we have time until 2018/19.
Bad news: we need many more people !
Is there a revolution in climate research?
Big Data

Wrong fortress, folks !!
The Fourth Paradigm

– Refer to Jim Gray et al.: "The Fourth Paradigm – Data-Intensive Scientific Discovery"
– "Data are not consumed by the ideas and innovations they spark but are an endless fuel for creativity"
  ("Harnessing the Power of Digital Data for Science and Society", Interagency Working Group on Digital Data, January 2009)
Climate Research and Data

– Climate research was and is dependent on data
  – Keep data for the long term in order to validate models in the future
  – Keep all model variables, because we do not know what will be necessary in future analyses
  – Use a fine global grid in order to later select local regions
  – ...
– DKRZ will archive around 8 PB on tape in 2012
  – This is all output data of simulation runs
Example: CMIP

– The 5th Coupled Model Intercomparison Project
  – Provides key input for the next IPCC report
  – ~20 modeling centers around the world (DKRZ being one of the biggest)
  – Produces tens of petabytes of output data from ~60 experiments ("born-digital" data)
– Data are produced without all applications being known beforehand, and these data are intended for interdisciplinary use by as-yet-unknown researchers
Data Life Cycle Management

– Data creation
  – Model runs in HPC centers (little I, big O)
– Data evaluation
  – Processing, visualization, sharing, quality control
– Data archival
  – Documentation, long-term preservation
– Data dissemination
  – GUI-based catalogue and data access
  – API-based data access
  – Data federations
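The "API-based data access" step can be pictured with a toy catalogue. Everything below (entry fields, experiment names, sizes, the `search` helper) is invented for illustration; real climate data federations are far richer than this sketch:

```python
# Hypothetical minimal catalogue for the data-dissemination step:
# programmatic discovery of archived experiment output.
catalogue = [
    {"experiment": "rcp45", "variable": "tas", "model": "MPI-ESM", "size_tb": 12},
    {"experiment": "rcp85", "variable": "tas", "model": "MPI-ESM", "size_tb": 14},
    {"experiment": "rcp85", "variable": "pr",  "model": "MPI-ESM", "size_tb": 9},
]

def search(**criteria):
    """Return all catalogue entries matching every given key=value pair."""
    return [entry for entry in catalogue
            if all(entry.get(k) == v for k, v in criteria.items())]

hits = search(experiment="rcp85")
print(len(hits), sum(entry["size_tb"] for entry in hits))  # 2 23
```

The point of such an interface is exactly the one made on the CMIP slide: future, as-yet-unknown researchers must be able to find and fetch data whose applications were not known when it was produced.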
Example Workflow: CMIP

– Data creation
  – German contribution produced at DKRZ
  – Used half a year of its compute capacity
– Data archival
  – 600 TB raw + 60 TB with quality control
– Data dissemination
  – Will be heavily used during the next 5 years
– Expected CMIP6 data volume: factor higher
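Even the archival step alone shows why "Application-to-disk: too slow" matters. Rough arithmetic, assuming the data streams at the sustained rates quoted on the earlier DKRZ 2012 slide:

```python
# How long does it take to move the German CMIP5 contribution
# (600 TB raw + 60 TB quality-controlled) through the storage hierarchy?
volume_bytes = (600 + 60) * 1e12              # 660 TB

to_tape_days = volume_bytes / 3e9 / 86400     # at 3 GB/s disk-to-tape
to_disk_hours = volume_bytes / 30e9 / 3600    # at 30 GB/s memory-to-disk
print(round(to_tape_days, 1), round(to_disk_hours, 1))  # 2.5 6.1
```

Days of pure streaming time for a single project's output, before any reprocessing or dissemination, is why the deck calls for solutions now rather than at exascale.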
Big Data Observations

Bad news: we need more people !
Very bad news: we need solutions NOW !
From coevolution to corevolution?
Coevolution

– Wikipedia on technical coevolution: "Computer software and hardware can be considered as two separate components but tied intrinsically by coevolution"
– E.g. numerical weather prediction (NWP)
  – Started in 1950
  – Hardware: ENIAC (385 FLOPS)
  – Brainware: John von Neumann (applied mathematics)
– E.g. climate models
  – Started in 1956 by Norman Phillips
Corevolution

Wikipedia on technical corevolution: nothing.

[Diagram: computer scientists and climate researchers joining forces on data-intensive pre-Exascale HPC; DKRZ will join the army]
DKRZ's Role

Come to see us at ISC'12, booth #140, and learn more about
– Data life cycle management
– Massive data I/O
Observations and Conclusions
Co-Whatsoever

– We need co-design, coevolution, corevolution
– Future sustained development will heavily depend on intensified cooperation between hardware, software, and application experts
– Access to human resources will limit our growth
A Final Cost-Benefit Analysis

Richardson's vision in 1916: "Perhaps some day in the dim future it will be possible to advance computations faster than the weather advances and at a cost less than the saving to mankind due to the information gained." (Weather Prediction by Numerical Process)