Introduction to Parallel Computing: MPI, OpenMP and Hybrid Programming


Luka Filipovic, UoM/MREN, lukaf@ac.me
The HP-SEE initiative is co-funded by the European Commission under the FP7 Research Infrastructures contract no. 261499.

Agenda
- Parallel computing
- Terminology: HPC, performance, speedup
- Parallel computing and memory models
- Programming models and Flynn's taxonomy
- MPI vs OpenMP
- Hybrid programming
- Library stack
- Hybrid programming in practice
- Examples and exercises

Overview: Parallel Computing
Traditionally, software has been written for serial computation:
- to be run on a single computer having a single Central Processing Unit (CPU);
- a problem is broken into a discrete series of instructions;
- instructions are executed one after another;
- only one instruction may execute at any moment in time.

What is parallel computing?
- Parallel computing is the simultaneous execution of the same task (split up and specially adapted) on multiple processors in order to obtain results faster.
- The process of solving a problem can usually be divided into smaller tasks, which may be carried out simultaneously with some coordination. [from Wikipedia]

What is High Performance Computing (HPC)?
- High-performance computing (HPC) uses supercomputers and computer clusters to solve advanced computation problems. [from Wikipedia]
- It involves not only hardware, but software and people as well.
- HPC includes a collection of powerful:
  - hardware systems
  - software tools
  - programming languages
  - parallel programming paradigms
  which make previously unfeasible calculations possible.

Measures of performance
- How fast can I crunch numbers on my CPU?
- How much data can I store?
- How fast can I move the data around?
  - from CPUs to memory; from CPUs to disk; from CPUs to/on different machines
  - among computers (networks): commodity 1 Gb/s; high-speed 10, 20 and now 40 Gb/s
  - within the computer:
    - CPU – memory: 10–100 Gb/s
    - CPU – disks: 50–100 MB/s, up to 1000 MB/s

Parallel performance
The speedup of a parallel application is

    Speedup(p) = Time(1) / Time(p)

where:
- Time(1) = execution time for a single processor
- Time(p) = execution time using p parallel processors
If Speedup(p) = p, we have perfect speedup (also called linear scaling).
Speedup compares the performance of an application with itself on one and on p processors. It is more useful to compare:
- the execution time of the best serial application on 1 processor, vs.
- the execution time of the best parallel algorithm on p processors.
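As a concrete illustration (not part of the original slides), a minimal sketch of how Time(p) is often measured in practice: the parallel region is timed with MPI_Wtime and the slowest rank determines the result. The work() routine and the barrier-based setup are hypothetical placeholders for the real application.

    #include <mpi.h>
    #include <stdio.h>

    /* Hypothetical compute kernel standing in for the real application work. */
    static void work(void) { /* ... */ }

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Barrier(MPI_COMM_WORLD);       /* start all ranks together        */
        double t0 = MPI_Wtime();
        work();                            /* the region being parallelised   */
        double local = MPI_Wtime() - t0;

        double tp;                         /* Time(p): slowest rank dominates */
        MPI_Reduce(&local, &tp, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("Time(p) = %.3f s; divide Time(1) by this to get the speedup\n", tp);

        MPI_Finalize();
        return 0;
    }

Running the same program with 1 and with p processes gives Time(1) and Time(p), from which Speedup(p) follows directly.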

Speedup (figure slide; the speedup graph is not included in this transcript)

Superlinear speedup?
Can we observe superlinear speedup, i.e. Speedup(p) > p? Yes, we can, typically because of:
- A bad "baseline" for Time(1): the old serial code has not been updated with the same optimizations.
- A shrinking problem size per processor: the working set may fit into small, fast memory (cache), so the total time decreases because memory optimization tricks come into play.

Limits to speedup
All parallel programs contain:
- parallel sections
- serial sections
Serial sections limit the speedup, as do:
- lack of perfect parallelism in the application or algorithm
- imperfect load balancing (some processors have more work than others)
- the cost of communication
- the cost of contention for shared resources, e.g. the memory bus or I/O
- synchronization time
Understanding why an application is not scaling linearly helps to find ways of improving its performance on parallel computers.

Amdahl's law (1)
Let S be the fraction of the application's work that is done serially; then P = 1 - S is the fraction done in parallel.
What is the maximum speedup for N processors? Assuming the parallel part scales perfectly,

    Speedup(N) = 1 / (S + (1 - S)/N) <= 1/S

Even if the parallel part scales perfectly, we are limited by the sequential portion of the code!
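As a small worked illustration (not from the original slides), the snippet below tabulates the Amdahl bound for a few serial fractions and processor counts; the chosen values of S and N are arbitrary examples.

    #include <stdio.h>

    /* Amdahl's law: speedup bound for serial fraction s on n processors. */
    static double amdahl(double s, int n)
    {
        return 1.0 / (s + (1.0 - s) / n);
    }

    int main(void)
    {
        const double serial[] = { 0.01, 0.05, 0.10 };
        const int    procs[]  = { 2, 8, 64, 1024 };

        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 4; j++)
                printf("S=%.2f  N=%4d  speedup <= %6.2f\n",
                       serial[i], procs[j], amdahl(serial[i], procs[j]));
            printf("S=%.2f  N->inf speedup <= %6.2f\n\n",
                   serial[i], 1.0 / serial[i]);
        }
        return 0;
    }

For S = 0.05, for example, the speedup can never exceed 20 no matter how many processors are used.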

Amdahl's law (2)
The presence of a serial part of the code is quite limiting in practice: a serial fraction of only 5% already caps the achievable speedup at 20. Note that Amdahl's law is relevant only if the serial fraction is independent of the problem size.

Effective parallel performance (figure slide; the chart is not included in this transcript)

How to run applications faster?
There are 3 ways to improve performance:
- work harder
- work smarter
- get help
The analogy in computer science:
- use faster hardware
- optimize the algorithms and techniques used to solve computational tasks
- use multiple computers to solve a particular task
All 3 strategies can be used simultaneously!

High performance problem (illustration slide: Formula 1 pit stop, picture from http://www.f1nutter.co.uk/tech/pitstop.php)

Analysis of a parallel solution (pit-stop analogy):
- Functional decomposition: different people executing different tasks.
- Domain decomposition: different people executing the same task, each on a different part of the problem (see the sketch below).
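A minimal sketch of domain decomposition (not from the original slides): every MPI rank applies the same operation, a partial sum, to its own block of the index range. The global problem size N and the summed expression are hypothetical.

    #include <mpi.h>
    #include <stdio.h>

    #define N 1000000                /* hypothetical global problem size */

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Domain decomposition: each rank owns a contiguous block of indices. */
        int chunk = N / size;
        int lo = rank * chunk;
        int hi = (rank == size - 1) ? N : lo + chunk;

        double local = 0.0;
        for (int i = lo; i < hi; i++)
            local += 1.0 / (i + 1);           /* same task, different data   */

        double total;
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum = %f (computed by %d ranks)\n", total, size);

        MPI_Finalize();
        return 0;
    }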

HPC parallel computers
The simplest and most useful way to classify modern parallel computers is by their memory model: how do CPUs view, and how can they access, the available memory?
- Shared memory
- Distributed memory

Shared vs. distributed memory
Distributed memory:
- Each processor has its own local memory.
- Processors must exchange data via message passing (see the sketch below).
- "Multi-computers"
Shared memory:
- Single address space: all processors have access to a pool of shared memory.
- "Multi-processors"
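To make the distributed-memory case concrete (this example is not in the original slides): because the processes have separate address spaces, the only way for rank 1 to see a value computed by rank 0 is an explicit message. The payload and message tag are hypothetical; run with at least two MPI processes.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double value = 0.0;
        if (rank == 0) {
            value = 3.14;              /* data lives only in rank 0's memory */
            MPI_Send(&value, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %f via message passing\n", value);
        }

        MPI_Finalize();
        return 0;
    }

On a shared-memory machine, by contrast, another thread could simply read the same variable directly, as the OpenMP example after the next slide shows.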

Shared memory: UMA vs. NUMA
- Uniform memory access (UMA): each processor has uniform access to memory. Such machines are also known as symmetric multiprocessors (SMP).
- Non-uniform memory access (NUMA): the time for a memory access depends on the location of the data; local access is faster than non-local access.
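Shared memory is what OpenMP targets: all threads see the same data, so no explicit data movement is needed. A minimal sketch (not from the original slides) with a hypothetical array size; compile with OpenMP enabled (e.g. -fopenmp):

    #include <omp.h>
    #include <stdio.h>

    #define N 1000000                   /* hypothetical problem size */

    int main(void)
    {
        static double a[N];
        double sum = 0.0;

        /* All threads work on the same array in a single shared address space. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++) {
            a[i] = 1.0 / (i + 1);
            sum += a[i];
        }

        printf("sum = %f (up to %d threads)\n", sum, omp_get_max_threads());
        return 0;
    }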

Clusters: distributed memory
Independent machines combined into a unified system through software and networking.

Hybrid architecture
All modern clusters have a hybrid architecture: many-core CPUs make each node a small SMP (shared-memory) system, while the nodes communicate with one another as a distributed-memory machine (see the sketch below).
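As a preview of the hybrid MPI+OpenMP model covered later (not part of this slide), a minimal sketch: typically one MPI process per node with several OpenMP threads inside each process. The requested threading level and the output format are illustrative choices; compile with an MPI wrapper plus OpenMP support (e.g. mpicc -fopenmp).

    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        /* Ask MPI for thread support: OpenMP threads live inside each MPI process. */
        int provided;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Distributed memory between ranks, shared memory among a rank's threads. */
        #pragma omp parallel
        {
            printf("MPI rank %d of %d, OpenMP thread %d of %d\n",
                   rank, size, omp_get_thread_num(), omp_get_num_threads());
        }

        MPI_Finalize();
        return 0;
    }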