Slide-1 University of Maryland Five Common Defect Types in Parallel Computing
Prepared for Applied Parallel Computing, Prof. Alan Edelman
Taiga Nakamura, University of Maryland

Slide-2 University of Maryland Introduction
Debugging and testing parallel programs is hard
–What kinds of mistakes do programmers make?
–How to prevent, or effectively find and fix, defects?
Hypothesis: knowing about common defects will reduce time spent debugging
Here: five common defect types in parallel programming (from last year’s classes)
–These examples are in C/MPI
–Suspect similar defect types in UPC, CAF, F/MPI
Your feedback is solicited (by both us and UMD)!

Slide-3 University of Maryland Defect 1: Language Usage
Example:

  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char **argv) {
      FILE *fp;
      int status;
      status = MPI_Init(NULL, NULL);
      if (status != MPI_SUCCESS) { return -1; }
      fp = fopen(...);
      if (fp == NULL) { return -1; }
      ...
      fclose(fp);
      MPI_Finalize();
      return 0;
  }

(MPI_Init(NULL, NULL) is valid in MPI 2.0 only. In MPI 1.1, it had to be MPI_Init(&argc, &argv).)
MPI_Finalize must be called by all processors in every execution path; the early returns above skip it.
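A minimal corrected sketch of the same pattern, assuming MPI 2.0 or later and a hypothetical input file name, in which MPI_Finalize is reached on every path after a successful MPI_Init:

  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char **argv) {
      FILE *fp;
      int rc = 1;                      /* exit code; assume failure until the work succeeds */

      if (MPI_Init(&argc, &argv) != MPI_SUCCESS) {
          return -1;                   /* MPI never started, so there is nothing to finalize */
      }

      fp = fopen("input.dat", "r");    /* "input.dat" is a hypothetical file name */
      if (fp != NULL) {
          /* ... do the real work here ... */
          fclose(fp);
          rc = 0;
      }

      MPI_Finalize();                  /* reached on every path after a successful MPI_Init */
      return rc;
  }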

Slide-4 University of Maryland Use of Language Features
Advanced language features are not necessarily used
–Try to understand a few basic language features thoroughly
[Chart: MPI keywords used in Conjugate Gradient in C/C++ (15 students): 24 functions, 8 constants]

Slide-5 University of Maryland Defect 1: Language Usage
Erroneous use of parallel language features
–E.g., inconsistent data types between send and recv, usage of memory copy functions in UPC
Simple mistakes in understanding
–Very common in novices
Compile-time defects can be found easily
–Wrong number or type of parameters, etc.
Some defects surface only under specific conditions
–e.g., number of processors, value of input, hardware/software environment
Advice:
–Check unfamiliar language features carefully (see the sketch below)
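As a hedged illustration of the first bullet, a send and its matching receive should agree on the MPI datatype and on an element count (not a byte count); the buffer size and tag here are assumptions, not from the slides:

  #include <mpi.h>
  #define N 100

  int main(int argc, char **argv) {
      double buf[N];
      int rank, i;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      if (rank == 0) {
          for (i = 0; i < N; i++) buf[i] = (double)i;
          /* Send N elements of type MPI_DOUBLE ... */
          MPI_Send(buf, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
      } else if (rank == 1) {
          MPI_Status st;
          /* ... and receive with the same type and a count large enough to hold the message.
             Using MPI_FLOAT or N*sizeof(double) here would be the kind of mismatch the slide means. */
          MPI_Recv(buf, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &st);
      }

      MPI_Finalize();
      return 0;
  }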

Slide-6 University of Maryland Defect 2: Space Decomposition
Example: Game of Life
–Loop boundaries must be changed (there are other approaches too)

Serial version:
  /* Main loop */
  /* ... */
  for (y = 0; y < ysize; y++) {
      for (x = 0; x < xsize; x++) {
          c = count(buffer, xsize, ysize, x, y);
          /* update buffer... */
      }
  }

Parallel version:
  MPI_Comm_size(MPI_COMM_WORLD, &np);
  ysize /= np;   /* ysize may not be divisible by np */
  /* Main loop */
  /* ... */
  for (y = 0; y < ysize; y++) {
      for (x = 0; x < xsize; x++) {
          c = count(buffer, xsize, ysize, x, y);
          /* update buffer... */
      }
  }
  /* MPI_Send, MPI_Recv */

[Diagram: the xsize × ysize board is decomposed into per-processor strips; boundary rows are exchanged with neighbors via send/recv]
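One possible way to handle the "ysize may not be divisible by np" annotation above is to give the remainder rows to the first few ranks. A small sketch under assumed variable names and an assumed example value of ysize (not from the slides):

  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char **argv) {
      int np, rank, ysize = 1000;   /* ysize is an assumed example value */

      MPI_Init(&argc, &argv);
      MPI_Comm_size(MPI_COMM_WORLD, &np);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      int base = ysize / np;
      int rem  = ysize % np;
      /* The first (ysize % np) ranks each take one extra row. */
      int my_ysize = base + (rank < rem ? 1 : 0);               /* rows owned by this rank */
      int my_start = rank * base + (rank < rem ? rank : rem);   /* global index of first owned row */

      printf("rank %d owns %d rows starting at global row %d\n", rank, my_ysize, my_start);
      /* ... allocate my_ysize (+ ghost) rows and loop y = 0..my_ysize-1 locally ... */

      MPI_Finalize();
      return 0;
  }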

Slide-7 University of Maryland Defect 2: Space Decomposition
Incorrect mapping between the problem space and the program memory space
Mapping in the parallel version can be different from that in the serial version
–Array origin is different in every processor
–Additional memory space for communication can complicate the mapping logic
Symptoms:
–Segmentation fault (if an array index is out of range)
–Incorrect or slightly incorrect output

Slide-8 University of Maryland Defect 3: Side-Effects of Parallelization
Example: approximation of pi

Serial version:
  srand(time(NULL));
  for (i = 0; i < n; i++) {
      double x = rand() / (double)RAND_MAX;
      double y = rand() / (double)RAND_MAX;
      if (x*x + y*y < 1) ++k;
  }
  return k/(double)n;

Parallel version:
  int np;
  status = MPI_Comm_size(MPI_COMM_WORLD, &np);
  ...
  srand(time(NULL));
  for (i = 0; i < n; i += np) {
      double x = rand() / (double)RAND_MAX;
      double y = rand() / (double)RAND_MAX;
      if (x*x + y*y < 1) ++k;
  }
  status = MPI_Reduce(... MPI_SUM ...);
  ...
  return k/(double)n;

1: All processors might use the same pseudo-random sequence, spoiling independence
2: Hidden serialization in rand() causes a performance bottleneck
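A minimal sketch of one way to avoid the identical-sequence problem, by folding the rank into the seed. The seeding formula and sample count n are assumptions, and rand() itself may still serialize on some systems; a production code would use a parallel RNG library:

  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>
  #include <mpi.h>

  int main(int argc, char **argv) {
      int rank, np;
      long i, n = 1000000, k = 0;   /* n is an assumed sample count */

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &np);

      /* Give every rank a different seed so the streams are not identical. */
      srand((unsigned)time(NULL) + 1234u * (unsigned)rank);

      for (i = rank; i < n; i += np) {
          double x = rand() / (double)RAND_MAX;
          double y = rand() / (double)RAND_MAX;
          if (x * x + y * y < 1.0) ++k;
      }

      long total = 0;
      MPI_Reduce(&k, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
      if (rank == 0) {
          printf("pi is approximately %f\n", 4.0 * (double)total / (double)n);
      }
      MPI_Finalize();
      return 0;
  }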

Slide-9 University of Maryland Defect 3: Side-Effect of Parallelization
Example: File I/O

  FILE *fp = fopen(….);
  if (fp != NULL) {
      while (…) {
          fscanf(…);
      }
      fclose(fp);
  }

The filesystem may cause a performance bottleneck if all processors access the same file simultaneously
(Schedule I/O carefully, or let a “master” processor do all I/O)
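A hedged sketch of the "master does all I/O" option mentioned above, with a hypothetical file name and an assumed data size: rank 0 reads the file once and broadcasts the values to the other ranks instead of every rank opening the file.

  #include <stdio.h>
  #include <mpi.h>

  #define N 1024   /* assumed number of values in the input file */

  int main(int argc, char **argv) {
      int rank;
      double data[N];

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      if (rank == 0) {
          /* Only the master touches the filesystem. */
          FILE *fp = fopen("input.dat", "r");   /* hypothetical file name */
          if (fp != NULL) {
              for (int i = 0; i < N; i++) {
                  if (fscanf(fp, "%lf", &data[i]) != 1) { data[i] = 0.0; }
              }
              fclose(fp);
          }
      }

      /* Everyone else gets the data over the network instead of re-reading the file. */
      MPI_Bcast(data, N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
      /* ... all ranks now compute with data[] ... */

      MPI_Finalize();
      return 0;
  }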

Slide-10 University of Maryland Defect 3: Side-Effect of Parallelization
Typical parallel programs contain only a few parallel primitives
–The rest of the code is a sequential program running in parallel
Ordinary serial constructs can cause correctness/performance defects when they are accessed in parallel contexts
Advice:
–Don’t just focus on the parallel code
–Check that the serial code works on one processor, but remember that the defect may surface only in a parallel context

Slide-11 University of Maryland Defect 4: Performance
Example: load balancing

  myN = N / (numProc - 1);
  if (myRank != 0) {
      for (i = 0; i < myN; i++) {
          if (…) { ++myHits; }
      }
  }
  MPI_Reduce(&myHits, &totalHits, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

The rank 0 “master” processor is just waiting while the other “worker” processors execute the loop
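A minimal sketch of one fix, letting rank 0 take a share of the iterations as well via a cyclic distribution. N, the hit test, and the variable names are assumed placeholders, not the slide's actual workload:

  #include <stdio.h>
  #include <mpi.h>

  #define N 1000000   /* assumed total number of iterations */

  int main(int argc, char **argv) {
      int rank, np, myHits = 0, totalHits = 0;
      long i;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &np);

      /* Cyclic distribution over all np ranks, including rank 0,
         so no processor sits idle while the others compute. */
      for (i = rank; i < N; i += np) {
          if (i % 2 == 0) {   /* stand-in for the real hit test */
              ++myHits;
          }
      }

      MPI_Reduce(&myHits, &totalHits, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
      if (rank == 0) {
          printf("total hits: %d\n", totalHits);
      }

      MPI_Finalize();
      return 0;
  }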

Slide-12 University of Maryland Defect 4: Performance
Example: scheduling

  if (rank != 0) {
      MPI_Send(board[y1], Xsize, MPI_CHAR, rank-1, tag, MPI_COMM_WORLD);
      MPI_Recv(board[y1-1], Xsize, MPI_CHAR, rank-1, tag, MPI_COMM_WORLD, &status);
  }
  if (rank != (size-1)) {
      MPI_Recv(board[y2+1], Xsize, MPI_CHAR, rank+1, tag, MPI_COMM_WORLD, &status);
      MPI_Send(board[y2], Xsize, MPI_CHAR, rank+1, tag, MPI_COMM_WORLD);
  }

[Diagram: per-rank strips of rows y1..y2 on an xsize-wide board, exchanging boundary rows with neighbors via send/recv]

Communication requires O(size) time (a “correct” solution takes O(1)):
  #1 Send → #0 Recv → #0 Send → #1 Recv
  #2 Send → #1 Recv → #1 Send → #2 Recv
  #3 Send → #2 Recv → #2 Send → #3 Recv
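One standard way to avoid the serialized chain above is to pair each exchange with MPI_Sendrecv, so all neighbor exchanges proceed concurrently. A self-contained sketch with assumed board dimensions and ghost-row layout (the slide's own board, y1, y2, and Xsize are not reproduced exactly):

  #include <string.h>
  #include <mpi.h>

  #define XSIZE 64
  #define ROWS  16   /* assumed interior rows per rank; rows 0 and ROWS+1 are ghost rows */

  int main(int argc, char **argv) {
      char board[ROWS + 2][XSIZE];
      int rank, size, tag = 0;
      MPI_Status status;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      memset(board, 0, sizeof board);

      int up   = (rank != 0)        ? rank - 1 : MPI_PROC_NULL;  /* no-op at the top edge */
      int down = (rank != size - 1) ? rank + 1 : MPI_PROC_NULL;  /* no-op at the bottom edge */

      /* Each pairwise exchange completes in one step, so the whole boundary exchange
         takes O(1) communication phases rather than a chain of O(size). */
      MPI_Sendrecv(board[1], XSIZE, MPI_CHAR, up, tag,
                   board[ROWS + 1], XSIZE, MPI_CHAR, down, tag,
                   MPI_COMM_WORLD, &status);
      MPI_Sendrecv(board[ROWS], XSIZE, MPI_CHAR, down, tag,
                   board[0], XSIZE, MPI_CHAR, up, tag,
                   MPI_COMM_WORLD, &status);

      MPI_Finalize();
      return 0;
  }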

Slide-13 University of Maryland Defect 4: Performance
Scalability problem because processors are not working in parallel
–The program output itself is correct
–Perfect parallelization is often difficult: you need to evaluate whether the execution speed is acceptable
Symptoms: sub-linear scalability, performance much less than expected (e.g., most time spent waiting), unbalanced amount of computation
–Load balancing may depend on input data
Advice:
–Make sure all processors are “working” in parallel
–A profiling tool might help

Slide-14 University of Maryland Defect 5: Synchronization
Example: deadlock

  MPI_Recv(board[y2+1], Xsize, MPI_CHAR, rank+1, tag, MPI_COMM_WORLD, &status);
  MPI_Send(board[y1], Xsize, MPI_CHAR, rank-1, tag, MPI_COMM_WORLD);
  MPI_Recv(board[y1-1], Xsize, MPI_CHAR, rank-1, tag, MPI_COMM_WORLD, &status);
  MPI_Send(board[y2], Xsize, MPI_CHAR, rank+1, tag, MPI_COMM_WORLD);

[Diagram: per-rank strips of rows y1..y2 exchanging boundary rows with neighbors via send/recv]

Obvious example of deadlock (can’t avoid noticing this):
  #0 Recv → deadlock
  #1 Recv → deadlock
  #2 Recv → deadlock

Slide-15 University of Maryland Defect 5: Synchronization
Example: deadlock

  MPI_Send(board[y1], Xsize, MPI_CHAR, rank-1, tag, MPI_COMM_WORLD);
  MPI_Recv(board[y2+1], Xsize, MPI_CHAR, rank+1, tag, MPI_COMM_WORLD, &status);
  MPI_Send(board[y2], Xsize, MPI_CHAR, rank+1, tag, MPI_COMM_WORLD);
  MPI_Recv(board[y1-1], Xsize, MPI_CHAR, rank-1, tag, MPI_COMM_WORLD, &status);

[Diagram: per-rank strips of rows y1..y2 exchanging boundary rows with neighbors via send/recv]

This may work, but it can cause deadlock with some implementations and parameters:
  #0 Send → deadlock if MPI_Send is blocking
  #1 Send → deadlock if MPI_Send is blocking
  #2 Send → deadlock if MPI_Send is blocking

A “correct” solution could be to (1) alternate the order of send and recv, (2) use MPI_Bsend with sufficient buffer size, (3) use MPI_Sendrecv, or (4) use MPI_Isend/recv (see
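A sketch of option (1) above: ordering the calls by rank parity so that every blocking MPI_Send has a matching MPI_Recv already posted on the other side. The board layout and sizes are assumptions, not the slide's actual code:

  #include <string.h>
  #include <mpi.h>

  #define XSIZE 64
  #define ROWS  16   /* assumed interior rows per rank; rows 0 and ROWS+1 are ghost rows */

  int main(int argc, char **argv) {
      char board[ROWS + 2][XSIZE];
      int rank, size, tag = 0;
      MPI_Status status;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      memset(board, 0, sizeof board);

      /* Even ranks send first, odd ranks receive first, so no pair of
         neighbors can block on sends to each other at the same time. */
      if (rank % 2 == 0) {
          if (rank > 0)        MPI_Send(board[1],    XSIZE, MPI_CHAR, rank-1, tag, MPI_COMM_WORLD);
          if (rank < size - 1) MPI_Send(board[ROWS], XSIZE, MPI_CHAR, rank+1, tag, MPI_COMM_WORLD);
          if (rank > 0)        MPI_Recv(board[0],      XSIZE, MPI_CHAR, rank-1, tag, MPI_COMM_WORLD, &status);
          if (rank < size - 1) MPI_Recv(board[ROWS+1], XSIZE, MPI_CHAR, rank+1, tag, MPI_COMM_WORLD, &status);
      } else {
          if (rank > 0)        MPI_Recv(board[0],      XSIZE, MPI_CHAR, rank-1, tag, MPI_COMM_WORLD, &status);
          if (rank < size - 1) MPI_Recv(board[ROWS+1], XSIZE, MPI_CHAR, rank+1, tag, MPI_COMM_WORLD, &status);
          if (rank > 0)        MPI_Send(board[1],    XSIZE, MPI_CHAR, rank-1, tag, MPI_COMM_WORLD);
          if (rank < size - 1) MPI_Send(board[ROWS], XSIZE, MPI_CHAR, rank+1, tag, MPI_COMM_WORLD);
      }

      MPI_Finalize();
      return 0;
  }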

Slide-16 University of Maryland Defect 5: Synchronization
Example: barriers

  for (…) {
      MPI_Isend(board[y1], Xsize, MPI_CHAR, rank-1, tag, MPI_COMM_WORLD, &request);
      MPI_Recv(board[y2+1], Xsize, MPI_CHAR, rank+1, tag, MPI_COMM_WORLD, &status);
      MPI_Isend(board[y2], Xsize, MPI_CHAR, rank+1, tag, MPI_COMM_WORLD, &request);
      MPI_Recv(board[y1-1], Xsize, MPI_CHAR, rank-1, tag, MPI_COMM_WORLD, &status);
  }

[Diagram: per-rank strips of rows y1..y2 exchanging boundary rows with neighbors via send/recv]

Synchronization (e.g., MPI_Barrier) is needed at each iteration
(But too many barriers can cause a performance problem)
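A hedged sketch of one way to make each iteration safe without a full barrier: keep both nonblocking-send requests and complete them with MPI_Waitall before the buffers are reused. The board layout, loop count, and neighbor handling are assumptions, not the slide's code:

  #include <string.h>
  #include <mpi.h>

  #define XSIZE 64
  #define ROWS  16     /* assumed interior rows per rank; rows 0 and ROWS+1 are ghost rows */
  #define STEPS 100    /* assumed number of iterations */

  int main(int argc, char **argv) {
      char board[ROWS + 2][XSIZE];
      int rank, size, tag = 0, step;
      MPI_Request req[2];
      MPI_Status status;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      memset(board, 0, sizeof board);

      int up   = (rank != 0)        ? rank - 1 : MPI_PROC_NULL;
      int down = (rank != size - 1) ? rank + 1 : MPI_PROC_NULL;

      for (step = 0; step < STEPS; step++) {
          /* Post nonblocking sends of the boundary rows, keeping both requests. */
          MPI_Isend(board[1],    XSIZE, MPI_CHAR, up,   tag, MPI_COMM_WORLD, &req[0]);
          MPI_Isend(board[ROWS], XSIZE, MPI_CHAR, down, tag, MPI_COMM_WORLD, &req[1]);

          /* Receive the neighbors' boundary rows into the ghost rows. */
          MPI_Recv(board[ROWS + 1], XSIZE, MPI_CHAR, down, tag, MPI_COMM_WORLD, &status);
          MPI_Recv(board[0],        XSIZE, MPI_CHAR, up,   tag, MPI_COMM_WORLD, &status);

          /* Complete the sends before the next iteration reuses board[1] and board[ROWS];
             this per-neighbor completion replaces a global MPI_Barrier. */
          MPI_Waitall(2, req, MPI_STATUSES_IGNORE);

          /* ... update the interior rows 1..ROWS here ... */
      }

      MPI_Finalize();
      return 0;
  }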

Slide-17 University of Maryland Defect 5: Synchronization
Well-known defect types in parallel programming: races, deadlocks
–Some defects can be very subtle
–Use of asynchronous (non-blocking) communication can lead to more synchronization defects
Symptom: program hangs, incorrect/non-deterministic output
–This particular example derives from insufficient understanding of the language specification
Advice:
–Make sure that all communications are correctly coordinated

Slide-18 University of Maryland Summary
This is a first cut at understanding common defects in parallel programming.