COMPUTING HW REQUIREMENTS
Enzo Papandrea, 06/11/2003

GEOFIT - MTR
With Geofit, the measurements from a full orbit are processed simultaneously.
A Geofit where P, T and the VMR of H2O and O3 are retrieved simultaneously increases the computing time.

TIME OF SIMULATIONS
Computing time of the sequential algorithm. We made some simulations with an AlphaServer ES45, CPU 1 GHz (T_S = sequential time):
H2O: T_S = 1h 30m
O3:  T_S = 4h 40m
PT:  T_S = 9h 48m
MTR: T_S = 10h 30m
To reduce the time of the simulations we propose a parallel system.

PARALLELIZATION
The first step will be to parallelize the loop that computes the forward model, because:
1. It is the most time-consuming part of the code.
2. The computation of the forward model for one sequence is independent from the computation for another sequence, so the processors have to communicate data only at the beginning and at the end of the forward model. A sketch of how the sequences could be split among processes is shown below.
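As an illustration only (not the actual DFCI code), a minimal Fortran 90 sketch of a block distribution of the forward-model loop over processes; n_seq, n_proc, my_rank and the value 72 are placeholders, and forward_model is a hypothetical routine name.

PROGRAM split_sequences
  ! Sketch: block distribution of forward-model computations over processes.
  ! n_seq, n_proc and my_rank are placeholders (my_rank would be an MPI rank).
  IMPLICIT NONE
  INTEGER :: n_seq, n_proc, my_rank, chunk, first, last, iseq

  n_seq   = 72          ! number of sequences in one orbit (placeholder value)
  n_proc  = 8           ! number of CPUs
  my_rank = 0           ! rank of this process, 0 .. n_proc-1

  chunk = (n_seq + n_proc - 1) / n_proc        ! sequences per process, rounded up
  first = my_rank * chunk + 1
  last  = MIN((my_rank + 1) * chunk, n_seq)

  DO iseq = first, last
     ! here each process would call the forward model for its own sequences,
     ! e.g. CALL forward_model(iseq)  (hypothetical routine name)
     PRINT *, 'process', my_rank, 'handles sequence', iseq
  ENDDO
END PROGRAM split_sequences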

PARALLEL TIME
The parallel time (T_P) is the sequential time divided by the number of CPUs.
Example: for a system with 8 CPUs, if the algorithm is completely parallel,
T_P = T_S / 8 = 12.5% of the sequential time.
This is the best improvement we can reach with 8 CPUs.

FORWARD MODEL PARALLELIZATION (H2O)
If we parallelize only the forward model, we can estimate the simulation time with 8 CPUs:
T_forward (3 iterations) = 45m (sum of the times spent computing the forward model)
T_P = T_forward / #CPU = 45m / 8 = 6m (time of the parallelized code)
T = (T_S - T_forward) + T_P = (1h 30m - 45m) + 6m = 51m = 56% of T_S
(total time: time of the code that remains sequential plus time of the parallelized code)
A small sketch of this estimate is given below.
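As a check of the arithmetic, a minimal Fortran 90 sketch of this estimate (the timings are the ones quoted above; the program itself is not part of the DFCI code):

PROGRAM estimate_h2o
  ! Estimate of the H2O run time when only the forward model is parallelized.
  IMPLICIT NONE
  REAL :: t_s, t_fwd, t_p, t_tot
  INTEGER :: ncpu

  t_s   = 90.0            ! sequential time, minutes (1h 30m)
  t_fwd = 45.0            ! forward-model time, minutes (3 iterations)
  ncpu  = 8

  t_p   = t_fwd / ncpu                  ! parallelized part
  t_tot = (t_s - t_fwd) + t_p           ! sequential remainder + parallel part

  PRINT *, 'T_P   =', t_p,   'minutes'          ! ~5.6m (quoted as 6m)
  PRINT *, 'T     =', t_tot, 'minutes'          ! ~50.6m (quoted as 51m)
  PRINT *, 'T/T_S =', 100.0 * t_tot / t_s, '%'  ! ~56%
END PROGRAM estimate_h2o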

FORWARD MODEL PARALLELIZATION (continued)
PT:  T_forward (2 iterations) = 9h 33m,  T_P = 1h 12m,  T = 1h 26m = 15%
MTR: T_forward (2 iterations) = 10h 30m, T_P = 1h 11m,  T = 2h 11m = 20%
O3:  T_forward (2 iterations) = 4h 10m,  T_P = 30m,     T = 60m = 21%

MEMORY CLASSIFICATION
In order to run a parallel code we need appropriate hardware, which can be classified by memory:
Shared memory: each processor (P) can see the whole memory (M).
Local memory: each processor can see only its own memory; to exchange data we need a network.

OPEN-MP VS MPI
OpenMP (shared-memory systems):
- compiler directives
- parallelism is not visible to the programmer (the compiler is responsible for the parallelism)
- easy to do
- small improvements in performance
MPI (local-memory systems):
- calls to library routines; the header file mpif.h contains the definitions of the MPI constants, types and functions
- parallelism is visible to the programmer
- difficult to do
- large improvements in performance
(A minimal MPI program skeleton using mpif.h is sketched below.)
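A minimal sketch of an MPI Fortran program, only to show where mpif.h and the basic calls enter; it is not part of the DFCI code:

PROGRAM mpi_skeleton
  IMPLICIT NONE
  INCLUDE 'mpif.h'                      ! MPI constants, types and functions
  INTEGER :: err, rank, nproc

  CALL MPI_INIT(err)                    ! start the MPI environment
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank, err)   ! rank of this process
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD, nproc, err)  ! total number of processes
  PRINT *, 'P:', rank, ' of ', nproc, ' processes'
  CALL MPI_FINALIZE(err)                ! shut down MPI
END PROGRAM mpi_skeleton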

OPEN-MP EXAMPLE

PROGRAM Matrix
  IMPLICIT NONE
  INTEGER (KIND=4) :: i, j
  INTEGER (KIND=4), PARAMETER :: n = 1000
  INTEGER (KIND=4) :: a(n,n)
!$OMP PARALLEL DO &
!$OMP PRIVATE(i,j) &
!$OMP SHARED(a)
  DO j = 1, n
     DO i = 1, n
        a(i,j) = i + j
     ENDDO
  ENDDO
!$OMP END PARALLEL DO
END PROGRAM Matrix

Compilation:
f90 name_program         (compiled this way, the compiler treats the instructions beginning with !$ as comments)
f90 -omp name_program    (with the -omp flag the compiler reads these instructions)
setenv OMP_NUM_THREADS 2 (sets the number of threads used at run time)

MPI EXAMPLE
POINT-TO-POINT COMMUNICATION: SEND and RECEIVE

MPI_SEND(buf, count, type, dest, tag, comm, ierr)
MPI_RECV(buf, count, type, source, tag, comm, status, ierr)

BUF     array of type TYPE
COUNT   number of elements of BUF to be sent (or received)
TYPE    MPI type of BUF
DEST    rank of the destination process (SOURCE: rank of the sending process)
TAG     number identifying the message
COMM    communicator of the sender and receiver
STATUS  array containing the communication status
IERR    error code (if IERR = 0 no error occurred)

A small send/receive sketch follows.
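A minimal send/receive sketch in Fortran (a toy example, not DFCI code), assuming it is run with at least two processes:

PROGRAM sendrecv
  IMPLICIT NONE
  INCLUDE 'mpif.h'
  INTEGER :: err, rank, status(MPI_STATUS_SIZE)
  REAL (KIND=4) :: x

  CALL MPI_INIT(err)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank, err)

  IF (rank .EQ. 0) THEN
     x = 24.
     ! process 0 sends one real to process 1, with tag 10
     CALL MPI_SEND(x, 1, MPI_REAL, 1, 10, MPI_COMM_WORLD, err)
  ELSE IF (rank .EQ. 1) THEN
     ! process 1 receives one real from process 0, with tag 10
     CALL MPI_RECV(x, 1, MPI_REAL, 0, 10, MPI_COMM_WORLD, status, err)
     PRINT *, 'P:', rank, ' received ', x
  END IF

  CALL MPI_FINALIZE(err)
END PROGRAM sendrecv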

MPI EXAMPLE/1
BROADCAST (ONE-TO-ALL COMMUNICATION): THE SAME DATA ARE SENT FROM THE ROOT PROCESS TO ALL THE OTHER PROCESSES IN THE COMMUNICATOR.
(Diagram: before the call only the root process holds the data A0; after the call every process in the communicator holds A0.)

MPI COMMUNICATOR
IN MPI IT IS POSSIBLE TO DIVIDE THE TOTAL NUMBER OF PROCESSES INTO GROUPS, CALLED COMMUNICATORS.
THE COMMUNICATOR THAT INCLUDES ALL THE PROCESSES IS CALLED MPI_COMM_WORLD.
(A sketch of how a communicator can be divided is given below.)
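As an illustration of creating such groups (not something used in the slides' own code), a minimal sketch using MPI_COMM_SPLIT to divide MPI_COMM_WORLD into two communicators, one for even ranks and one for odd ranks:

PROGRAM split_comm
  IMPLICIT NONE
  INCLUDE 'mpif.h'
  INTEGER :: err, rank, color, newcomm, newrank

  CALL MPI_INIT(err)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank, err)

  color = MOD(rank, 2)                       ! 0 for even ranks, 1 for odd ranks
  ! processes with the same color end up in the same new communicator
  CALL MPI_COMM_SPLIT(MPI_COMM_WORLD, color, rank, newcomm, err)
  CALL MPI_COMM_RANK(newcomm, newrank, err)  ! rank inside the new group

  PRINT *, 'P:', rank, ' has rank ', newrank, ' in group ', color

  CALL MPI_FINALIZE(err)
END PROGRAM split_comm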

BROADCAST EXAMPLE

PROGRAM Broadcast
  IMPLICIT NONE
  INCLUDE 'mpif.h'
  REAL (KIND=4) :: buffer
  INTEGER (KIND=4) :: err, rank, size

  CALL MPI_INIT(err)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank, err)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD, size, err)

  IF (rank .EQ. 5) buffer = 24.
  ! process 5 sends its real variable buffer to all the processes
  ! in the communicator MPI_COMM_WORLD
  CALL MPI_BCAST(buffer, 1, MPI_REAL, 5, MPI_COMM_WORLD, err)
  PRINT *, "P:", rank, " after broadcast buffer is ", buffer

  CALL MPI_FINALIZE(err)
END PROGRAM Broadcast

Output (8 processes):
P:1 after broadcast buffer is 24.
P:3 after broadcast buffer is 24.
P:4 after broadcast buffer is 24.
P:0 after broadcast buffer is 24.
P:5 after broadcast buffer is 24.
P:6 after broadcast buffer is 24.
P:7 after broadcast buffer is 24.
P:2 after broadcast buffer is 24.

OTHER COLLECTIVE COMMUNICATIONS
ALLGATHER: DIFFERENT DATA ARE SENT FROM EACH PROCESS TO ALL THE OTHER PROCESSES IN THE COMMUNICATOR.
SCATTER: DIFFERENT DATA ARE SENT FROM THE ROOT PROCESS TO ALL THE OTHER PROCESSES IN THE COMMUNICATOR.
GATHER: THE OPPOSITE OF SCATTER.
(Diagrams: in allgather every process ends up with A0, B0, C0, D0; in scatter the root's A0, A1, A2, A3 are distributed one block per process.)
A small scatter/gather sketch is given below.
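A minimal scatter/gather sketch in Fortran (a toy example, not DFCI code): the root distributes one integer to each process, each process modifies its value, and the root gathers the results back.

PROGRAM scatter_gather
  IMPLICIT NONE
  INCLUDE 'mpif.h'
  INTEGER, PARAMETER :: root = 0
  INTEGER :: err, rank, nproc, i, myval
  INTEGER, ALLOCATABLE :: sendbuf(:), recvbuf(:)

  CALL MPI_INIT(err)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank, err)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD, nproc, err)

  ALLOCATE(sendbuf(nproc), recvbuf(nproc))
  IF (rank .EQ. root) sendbuf = (/ (10*i, i = 1, nproc) /)   ! data to distribute

  ! one element of sendbuf goes to each process
  CALL MPI_SCATTER(sendbuf, 1, MPI_INTEGER, myval, 1, MPI_INTEGER, &
                   root, MPI_COMM_WORLD, err)
  myval = myval + rank                                       ! local work
  ! the root collects one element back from each process
  CALL MPI_GATHER(myval, 1, MPI_INTEGER, recvbuf, 1, MPI_INTEGER, &
                  root, MPI_COMM_WORLD, err)

  IF (rank .EQ. root) PRINT *, 'gathered: ', recvbuf

  CALL MPI_FINALIZE(err)
END PROGRAM scatter_gather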

LINUX CLUSTER
We have a Linux cluster with 8 nodes; each node has:
- CPU Intel P4, 2.8 GHz, Front Side Bus 800 MHz
- 2 GByte RAM at 333 MHz
- Hard disk 40 GByte
- 1 LAN switch (network)

CONCLUSIONS
AlphaServer with 2 CPUs (shared memory):
- very expensive (~ ,00 €)
- limited number of CPUs
Linux cluster (local memory):
- cheap (~900,00 €/node)
- unlimited number of CPUs
- in the past only 32-bit architectures: 2^(32-1) bytes = 2 GByte = 2 · 2^30 bytes of addressable memory
- now 64-bit architectures: 2^(64-1) bytes = 8 Exabyte = 8 · 2^60 bytes
For readability and simplicity of the code we would like to use Fortran 90.