Program Analysis and Tuning
The German High Performance Computing Centre for Climate and Earth System Research
Panagiotis Adamidis
Climate Simulation
We use a computer model of the climate system – a computer program which simulates an abstract model (a mathematical representation) of the climate system – reproducing the relevant features based on
– theoretical principles (e.g. laws of nature)
– observed relationships
"Blizzard" – IBM Power6 System
– Peak performance: 158 TeraFlop/s (158 trillion floating point operations per second)
– 264 IBM Power6 nodes
– 16 dual-core CPUs per node (8,448 compute cores in total)
– more than 20 TeraByte of memory
– 7,000 TeraByte of disk space by 2011
– InfiniBand network: 7.6 TeraByte/s (aggregated)
[Figure: the high performance computing system "Blizzard" at DKRZ – compute nodes (orange), InfiniBand switch (red), disks (green)]
The Hybrid World
[Diagram: message passing between nodes, OpenMP threads within each node]
Parallel Compiler
Why can't I just say f90 –Parallel mycode.f and have everything work fine? Two things stand in the way:
– Logical dependencies
– Data dependencies
Multiprocessor – Shared Memory
[Diagram: several CPUs connected through a network to shared memory modules]
Concepts – Shared Memory Directives
[Diagram: fork–join execution – a single process runs as a master thread; at a parallel region it forks a team of threads, which join back into the master thread afterwards]
Amdahl's Law
Processes and Threads
[Diagram: message passing between processes vs. OpenMP threads within a process]
Bottlenecks of Massively Parallel Computing Systems
– Memory bandwidth
– Communication network
– Idle processors
Memory Hierarchy
Registers → L1/L2/L3 cache → main memory
Data Movement
Data Movement in Parallel Systems
The World of MPI
[Diagram: distributed memory – each CPU has its own memory module; CPUs communicate over a network]
Motivation
Goal: improve the efficiency of a parallel program running on high performance computers.
Typical workflow:
– Development of a parallel program
– Measurement and runtime analysis of the code
– Optimization of the code
Performance Engineering
Profiling
– Summarizes performance data per process/thread during execution
– "Statistical" analysis
Tracing
– Records performance data with a timestamp per process/thread
– e.g. MPI messages
Optimization
– Compilers cannot automatically optimize everything
– Optimization is not just finding the right compiler flag
– Major algorithmic changes are often necessary