CSCI-455/552 Introduction to High Performance Computing Lecture 6

Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers, 2nd Edition, by B. Wilkinson & M. Allen, © 2004 Pearson Education Inc. All rights reserved.

2.2 Evaluating Parallel Programs

2.3 Sequential and Parallel Execution Time

Sequential execution time, t_s: estimate by counting the computational steps of the best sequential algorithm.

Parallel execution time, t_p: in addition to the number of computational steps, t_comp, we need to estimate the communication overhead, t_comm:

t_p = t_comp + t_comm

2.4 Computational Time

Count the number of computational steps. When more than one process is executed simultaneously, count the computational steps of the most complex process.

Generally a function of n and p, i.e.

t_comp = f(n, p)

Often the computation time is broken down into parts, so that

t_comp = t_comp1 + t_comp2 + t_comp3 + …

Analysis is usually done assuming that all processors are identical and operate at the same speed.
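For example (an illustration consistent with this model, not on the slide): to sum n numbers on p processors, each processor might first add its n/p local numbers (t_comp1 ≈ n/p), and the partial sums might then be combined in a tree taking t_comp2 ≈ log2 p addition steps, giving t_comp ≈ n/p + log2 p.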

2.5 Communication Time

Many factors, including network structure and network contention. As a first approximation, use

t_comm = t_startup + n * t_data

t_startup is the startup time, or message latency: essentially the time to send a message with no data. It is assumed to be constant.

t_data is the transmission time to send one data word, also assumed constant; there are n data words.
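With illustrative numbers (assumed, not from the slides): if t_startup = 1000 step-times and t_data = 50 step-times, sending a single 100-word message costs t_comm = 1000 + 100 * 50 = 6000 step-times. The startup term dominates only for short messages, which is why sending fewer, larger messages is generally cheaper than many small ones.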

2.6

The final communication time, t_comm, is the summation of the communication times of all sequential messages from a process, i.e.

t_comm = t_comm1 + t_comm2 + t_comm3 + …

Typically the communication patterns of all processes are the same and are assumed to take place together, so that only one process need be considered.

Both the startup and data transmission times, t_startup and t_data, are measured in units of one computational step, so that t_comp and t_comm can be added together to obtain the parallel execution time, t_p.
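For example (an illustration using the model above): if a master process gathers a one-word partial result from each of p − 1 other processes in sequence, the total is t_comm = (p − 1)(t_startup + t_data).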

2.7 Idealized Communication Time

[Figure in the original slides: idealized communication time plotted against the number of data words, n.]

2.8 Benchmark Factors

With t_s, t_comp, and t_comm, we can establish the speedup factor and the computation/communication ratio for a particular algorithm/implementation:

Speedup factor: S(p) = t_s / t_p = t_s / (t_comp + t_comm)

Computation/communication ratio: t_comp / t_comm

Both are functions of the number of processors, p, and the number of data elements, n. They give an indication of the scalability of the parallel solution with increasing number of processors and problem size. The computation/communication ratio in particular highlights the effect of communication with increasing problem size and system size.
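As a concrete illustration (a minimal sketch, not from the slides: the summation model t_comp = n/p + log2 p from the earlier example and all cost parameters are assumed for illustration), the model can be evaluated numerically to see how speedup and the computation/communication ratio behave as p grows:

    /* speedup_model.c: evaluate t_p = t_comp + t_comm for summing
       n numbers on p processors (illustrative model and parameters). */
    #include <stdio.h>
    #include <math.h>

    int main(void) {
        double n = 1e6;           /* number of data elements */
        double t_startup = 1000;  /* startup time, in computational steps (assumed) */
        double t_data = 50;       /* time per data word, in computational steps (assumed) */
        double t_s = n;           /* best sequential algorithm: about n additions */

        for (int p = 2; p <= 1024; p *= 2) {
            double t_comp = n / p + log2((double)p);                /* local adds + tree combine */
            double t_comm = log2((double)p) * (t_startup + t_data); /* one word per combine step */
            double t_p = t_comp + t_comm;
            printf("p = %4d  speedup = %7.2f  comp/comm ratio = %8.2f\n",
                   p, t_s / t_p, t_comp / t_comm);
        }
        return 0;
    }

Compile with, e.g., cc speedup_model.c -lm. With these assumed parameters the speedup falls increasingly short of p as the communication term comes to dominate, exactly the behavior the computation/communication ratio is meant to expose.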

Debugging and Evaluating Parallel Programs Empirically

Visualization Tools

Programs can be watched as they are executed in a space-time diagram (or process-time diagram):

[Figure in the original slides: space-time diagram of process activity.]

Implementations of visualization tools are available for MPI. An example is the Upshot program visualization system.

Evaluating Programs Empirically: Measuring Execution Time

To measure the execution time between point L1 and point L2 in the code, we might have a construction such as:

    time_t t1, t2;
    double elapsed_time;

    L1: time(&t1);                   /* start timer */

        /* ... section of code being timed ... */

    L2: time(&t2);                   /* stop timer */

    elapsed_time = difftime(t2, t1); /* elapsed_time = t2 - t1, in seconds */
    printf("Elapsed time = %5.2f seconds\n", elapsed_time);

This requires <time.h> and <stdio.h>. MPI provides the routine MPI_Wtime() for returning the wall-clock time (in seconds).
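For higher resolution than time(), which typically measures whole seconds, the same measurement can be made with MPI_Wtime() (a minimal self-contained sketch; the timed region is a placeholder):

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        MPI_Init(&argc, &argv);

        double start = MPI_Wtime();           /* start timer */

        /* ... section of code being timed ... */

        double elapsed = MPI_Wtime() - start; /* elapsed wall-clock time, seconds */
        printf("Elapsed time = %f seconds\n", elapsed);

        MPI_Finalize();
        return 0;
    }

Note that MPI_Wtime() returns wall-clock (elapsed) time rather than CPU time, which is appropriate here because communication delays must be included in the parallel execution time.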