1 ISCM-10 Taub Computing Center High Performance Computing for Computational Mechanics Moshe Goldberg March 29, 2001.

2 High Performance Computing for CM
Agenda:
1) Overview
2) Alternative Architectures
3) Message Passing
4) “Shared Memory”
5) Case Study

3 1) High Performance Computing: Overview

4 Some Important Points
* Understanding HPC concepts
* Why should programmers care about the architecture?
* Do compilers make the right choices?
* Nowadays, there are alternatives

5 Trends in computer development
* Speed of calculation is steadily increasing
* Memory speed may not keep pace with high calculation speeds
* Workstations are approaching the speeds of specialized, highly efficient designs
* Are we approaching the limit set by the speed of light?
* To get an answer faster, we must perform calculations in parallel

6 Some HPC concepts
* HPC
* HPF / Fortran90
* cc-NUMA
* Compiler directives
* OpenMP
* Message passing
* PVM/MPI
* Beowulf

14 2) Alternative Architectures

15 Source: IDC, 2001

16 Source: IDC, 2001

18 IUCC (Machba) computers, Mar 2001
* Cray J cpu, memory 4 GB (500 megawords)
* Origin cpu (R12000, 400 MHz), 28.7 GB total memory
* PC cluster, 64 cpu (Pentium III, 550 MHz), total memory 9 GB

19 Chris Hempel, hpc.utexas.edu

21 Chris Hempel, hpc.utexas.edu

22 Symmetric Multiprocessors (SMP)
(figure: several CPUs sharing one memory over a memory bus)
Examples: SGI Power Challenge, Cray J90/T90

23 Distributed Parallel Computing
(figure: each CPU with its own local memory)
Examples: IBM SP2, Beowulf

27 3) Message Passing

28 MPI commands -- examples
      call MPI_SEND(sum,1,MPI_REAL,ito,itag,
     &              MPI_COMM_WORLD,ierror)
      call MPI_RECV(sum,1,MPI_REAL,ifrom,itag,
     &              MPI_COMM_WORLD,istatus,ierror)
(istatus must be declared as integer istatus(MPI_STATUS_SIZE))

29 Some basic MPI functions
Setup: mpi_init, mpi_finalize
Environment: mpi_comm_size, mpi_comm_rank
Communication: mpi_send, mpi_recv
Synchronization: mpi_barrier

30 Other important MPI functions
Asynchronous communication: mpi_isend, mpi_irecv, mpi_iprobe, mpi_wait, mpi_test
Collective communication: mpi_barrier, mpi_bcast, mpi_gather, mpi_scatter, mpi_reduce, mpi_allreduce
Derived data types: mpi_type_contiguous, mpi_type_vector, mpi_type_indexed, mpi_pack, mpi_type_commit, mpi_type_free
Creating communicators: mpi_comm_dup, mpi_comm_split, mpi_intercomm_create, mpi_comm_free

31 4) “Shared Memory”

32 Fortran directives -- examples
CRAY:
CMIC$ DO ALL
      do i=1,n
        a(i)=i
      enddo
SGI:
C$DOACROSS
      do i=1,n
        a(i)=i
      enddo
OpenMP:
C$OMP parallel do
      do i=1,n
        a(i)=i
      enddo
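The OpenMP summary states that C versions exist, so a C counterpart of the OpenMP loop may help. This sketch assumes a C99 compiler, and the function name `fill_index` is hypothetical. If OpenMP is not enabled (in GCC, via `-fopenmp`), the compiler ignores the pragma and the loop runs serially with the same result.

```c
/* Hypothetical helper: C analogue of the Fortran loop a(i)=i, i=1..n.
   C arrays start at 0, so element a[i-1] receives the value i. */
void fill_index(double *a, int n)
{
    /* Iterations are divided among the threads; the loop index i
       is private to each thread, while a and n are shared. */
    #pragma omp parallel for
    for (int i = 1; i <= n; i++)
        a[i - 1] = (double)i;
}
```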

33 OpenMP Summary
* OpenMP standard: first published Oct 1997
* Directives
* Run-time Library Routines
* Environment Variables
* Versions for f77, f90, c, c++

34 OpenMP Summary: Parallel Do Directive
c$omp parallel do private(I) shared(a)
      do I=1,n
        a(I)= I+1
      enddo
c$omp end parallel do
The closing c$omp end parallel do directive is optional.
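Here is the same loop in C, as a sketch with a hypothetical function name. It differs from the slide in two ways:

* `default(none)` is added, which makes the compiler reject any variable whose sharing attribute is not listed.
* There is no `private` clause for the loop index. The index of a loop under a `parallel for` (or `parallel do`) directive is private automatically.

```c
/* Hypothetical helper: C analogue of a(I) = I + 1, I = 1..n.
   default(none) forces every variable used in the region to be listed. */
void add_one(int *a, int n)
{
    #pragma omp parallel for default(none) shared(a, n)
    for (int i = 1; i <= n; i++)
        a[i - 1] = i + 1;   /* Fortran a(I) maps to C a[I-1] */
}
```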

35 OpenMP Summary: Defining a Parallel Region - Individual Do Loops
c$omp parallel shared(a,b)
c$omp do private(j)
      do j=1,n
        a(j)=j
      enddo
c$omp end do nowait
c$omp do private(k)
      do k=1,n
        b(k)=k
      enddo
c$omp end do
c$omp end parallel
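A C sketch of the same parallel region follows; the function name is hypothetical. The team of threads is created once and shares both loops. `nowait` removes the barrier after the first loop. That is safe here only because the second loop writes `b` and never reads `a`. The implicit barrier at the end of the parallel region still waits for both loops to finish.

```c
/* Hypothetical helper: one parallel region containing two work-shared loops. */
void fill_two(int *a, int *b, int n)
{
    #pragma omp parallel shared(a, b, n)
    {
        /* No barrier after this loop: the next loop does not depend on a. */
        #pragma omp for nowait
        for (int j = 0; j < n; j++)
            a[j] = j + 1;

        #pragma omp for
        for (int k = 0; k < n; k++)
            b[k] = k + 1;
    } /* implicit barrier: all threads finish both loops here */
}
```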

36 OpenMP Summary: Parallel Do Directive - Clauses
* shared
* private
* default(private|shared|none)
* reduction({operator|intrinsic}:var)
* if(scalar_logical_expression)
* ordered
* copyin(var)
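The following C sketch illustrates two of these clauses:

* `reduction(+:sum)` gives each thread a private partial sum and combines the partial sums when the loop ends.
* `if(n > 1000)` runs the loop serially for short loops, where creating threads would cost more than it saves.

The threshold 1000 and the function name are assumptions for illustration.

```c
/* Hypothetical helper: 1^2 + 2^2 + ... + n^2.
   Each thread accumulates a private copy of sum; the copies are added
   into the shared sum at the end of the loop. The threshold 1000 is an
   arbitrary illustrative value. */
long long sum_squares(int n)
{
    long long sum = 0;
    #pragma omp parallel for reduction(+:sum) if(n > 1000)
    for (int i = 1; i <= n; i++)
        sum += (long long)i * i;
    return sum;
}
```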

37 OpenMP Summary: Run-Time Library Routines
Execution environment:
* omp_set_num_threads
* omp_get_num_threads
* omp_get_max_threads
* omp_get_thread_num
* omp_get_num_procs
* omp_set_dynamic/omp_get_dynamic
* omp_set_nested/omp_get_nested
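This sketch uses the execution-environment routines. It is guarded by the standard `_OPENMP` macro, so it also builds without OpenMP, in which case it reports one thread. The function name is hypothetical. `omp_set_num_threads` overrides `OMP_NUM_THREADS` for later parallel regions. `omp_get_num_threads` returns 1 outside a parallel region, so the sketch queries it inside one.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: returns the size of the team that actually ran
   a parallel region after requesting `requested` threads. */
int count_team_threads(int requested)
{
    int team = 1;
#ifdef _OPENMP
    omp_set_num_threads(requested);       /* request a team size */
    #pragma omp parallel
    {
        /* Only thread 0 writes team, so there is no race. */
        if (omp_get_thread_num() == 0)
            team = omp_get_num_threads();
    }
#else
    (void)requested;                      /* serial build: one thread */
#endif
    return team;
}
```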

38 OpenMP Summary: Run-Time Library Routines
Lock routines:
* omp_init_lock
* omp_destroy_lock
* omp_set_lock
* omp_unset_lock
* omp_test_lock
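This sketch uses the lock routines to protect a shared counter. When OpenMP is not enabled, the `#else` branch supplies serial stand-ins with the same names. These stubs are written here for illustration and are not taken from any library. For a plain counter, `reduction(+:count)` would normally be the better choice; the lock is there only to show the routines in use.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Illustrative serial stand-ins so the sketch builds without OpenMP. */
typedef int omp_lock_t;
static void omp_init_lock(omp_lock_t *l)    { *l = 0; }
static void omp_destroy_lock(omp_lock_t *l) { (void)l; }
static void omp_set_lock(omp_lock_t *l)     { *l = 1; }
static void omp_unset_lock(omp_lock_t *l)   { *l = 0; }
#endif

/* Hypothetical helper: counts the negative entries of x[0..n-1].
   The lock lets only one thread at a time update count. */
int count_negative(const double *x, int n)
{
    int count = 0;
    omp_lock_t lock;
    omp_init_lock(&lock);
    #pragma omp parallel for shared(count, lock, x, n)
    for (int i = 0; i < n; i++) {
        if (x[i] < 0.0) {
            omp_set_lock(&lock);     /* enter critical update */
            count++;
            omp_unset_lock(&lock);   /* leave critical update */
        }
    }
    omp_destroy_lock(&lock);
    return count;
}
```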

39 OpenMP Summary: Environment Variables
* OMP_NUM_THREADS
* OMP_DYNAMIC
* OMP_NESTED

40 RISC memory levels: single CPU
(figure: CPU, cache, main memory)

42 RISC memory levels: multiple CPUs
(figure: CPU 0 with Cache 0 and CPU 1 with Cache 1, both connected to main memory)

45 A sample program
      subroutine xmult (x1,x2,y1,y2,z1,z2,n)
      real x1(n),x2(n),y1(n),y2(n),z1(n),z2(n)
      real a,b,c,d
      do i=1,n
        a=x1(i)*x2(i); b=y1(i)*y2(i)
        c=x1(i)*y2(i); d=x2(i)*y1(i)
        z1(i)=a-b; z2(i)=c+d
      enddo
      end
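Below is a C transcription of the serial routine. It keeps the Fortran argument order and, like the original, declares the temporaries `a, b, c, d` at routine level. Reading (x1, y1) and (x2, y2) as real and imaginary parts, z1 and z2 are the real and imaginary parts of the complex product (x1 + i y1)(x2 + i y2). The transcription itself is an illustrative sketch, not the author's code.

```c
/* C transcription of subroutine xmult; all arrays have n elements.
   z1 + i*z2 = (x1 + i*y1) * (x2 + i*y2), element by element. */
void xmult(const float *x1, const float *x2, const float *y1,
           const float *y2, float *z1, float *z2, int n)
{
    float a, b, c, d;   /* temporaries, declared at routine level */
    for (int i = 0; i < n; i++) {
        a = x1[i] * x2[i];
        b = y1[i] * y2[i];
        c = x1[i] * y2[i];
        d = x2[i] * y1[i];
        z1[i] = a - b;  /* real part */
        z2[i] = c + d;  /* imaginary part */
    }
}
```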

46 A sample program
      subroutine xmult (x1,x2,y1,y2,z1,z2,n)
      real x1(n),x2(n),y1(n),y2(n),z1(n),z2(n)
      real a,b,c,d
c$omp parallel do
      do i=1,n
        a=x1(i)*x2(i); b=y1(i)*y2(i)
        c=x1(i)*y2(i); d=x2(i)*y1(i)
        z1(i)=a-b; z2(i)=c+d
      enddo
      end

48 A sample program
Run on Technion Origin2000
Vector length = 1,000,000; loop repeated 50 times
Compiler optimization: low (-O1)
(table: elapsed time in sec vs. number of threads, compiled without and with parallelization)
Is this running in parallel? WHY NOT?

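49 A sample program
c$omp parallel do
      do i=1,n
        a=x1(i)*x2(i); b=y1(i)*y2(i)
        c=x1(i)*y2(i); d=x2(i)*y1(i)
        z1(i)=a-b; z2(i)=c+d
      enddo
Is this running in parallel? WHY NOT?
Answer: by default, variables a,b,c,d are SHARED. All threads therefore write to the same four memory locations: the results are subject to a race, and the data is passed back and forth between the CPUs' caches.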

50 A sample program
(table: elapsed time in sec vs. number of threads, compiled without and with parallelization)
Solution: define a,b,c,d as PRIVATE:
c$omp parallel do private(a,b,c,d)
This is now running in parallel
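The corrected parallel loop in C is sketched below with a hypothetical function name. Because `a, b, c, d` are declared outside the loop, they would be shared by default, just as in the Fortran version, so they are listed as `private`. Declaring them inside the loop body instead would make them private automatically, which is the more common C idiom.

```c
/* Hypothetical parallel version of the xmult transcription.
   private(a,b,c,d) gives each thread its own copies of the temporaries. */
void xmult_omp(const float *x1, const float *x2, const float *y1,
               const float *y2, float *z1, float *z2, int n)
{
    float a, b, c, d;
    #pragma omp parallel for private(a, b, c, d)
    for (int i = 0; i < n; i++) {
        a = x1[i] * x2[i];
        b = y1[i] * y2[i];
        c = x1[i] * y2[i];
        d = x2[i] * y1[i];
        z1[i] = a - b;
        z2[i] = c + d;
    }
}
```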

51 5) Case Study

52 HPC in the Technion
* SGI Origin cpu (R10000) MHz, total memory GB
* PC cluster (Red Hat Linux 6.1), 6 cpu (Pentium II, 400 MHz), memory MB/cpu

53 Fluent test case: stability of a subsonic turbulent jet
Source: Viktoria Suponitsky, Faculty of Aerospace Engineering, Technion

55 Fluent test case
10 time steps, 20 iterations per time step
Reading "Case25unstead.cas"
quadrilateral cells, zone 1, binary.
2D interior faces, zone 9, binary.
50 2D wall faces, zone 3, binary.
2D pressure-inlet faces, zone 7, binary.
50 2D pressure-outlet faces, zone 5, binary.
50 2D pressure-outlet faces, zone 6, binary.
50 2D velocity-inlet faces, zone 2, binary.
2D axis faces, zone 4, binary.
nodes, binary.
node flags, binary.

58 Fluent test case
SMP command: fluent 2d -t8 -psmpi -g < inp
Host spawning Node 0 on machine "parix".
ID    Comm.  Hostname  O.S.  PID  Mach ID  HW ID  Name
host  net    parix     irix                       Fluent Host
n7    smpi   parix     irix                       Fluent Node
n6    smpi   parix     irix                       Fluent Node
n5    smpi   parix     irix                       Fluent Node
n4    smpi   parix     irix                       Fluent Node
n3    smpi   parix     irix                       Fluent Node
n2    smpi   parix     irix                       Fluent Node
n1    smpi   parix     irix                       Fluent Node
n0*   smpi   parix     irix                       Fluent Node

59 Fluent test case
Cluster command: fluent 2d -cnf=clinux1,clinux2,clinux3,clinux4,clinux5,clinux6 -t6 -pnet -g < inp
Node 0 spawning Node 5 on machine "clinux6".
ID    Comm.  Hostname  O.S.      PID  Mach ID  HW ID  Name
n5    net    clinux6   linux-ia                       Fluent Node
n4    net    clinux5   linux-ia                       Fluent Node
n3    net    clinux4   linux-ia                       Fluent Node
n2    net    clinux3   linux-ia                       Fluent Node
n1    net    clinux2   linux-ia                       Fluent Node
host  net    clinux1   linux-ia                       Fluent Host
n0*   net    clinux1   linux-ia                       Fluent Node

62 TOP500 (November 2, 2000)

63 TOP500 (November 2, 2000)