Parallel Programming
Parallel Processing
(Diagram: examples of parallel processing — Microsoft Word running Editor, Backup, SpellCheck, and GrammarCheck concurrently; Matrix Multiply.)
Parallel Programming

// Child: print every prime in [begin, end)
void Factor::child(int begin, int end) {
    int val, i;
    for (val = begin; val < end; val++) {
        for (i = 2; i <= val/2; i++)
            if (val % i == 0) break;
        if (i > val/2)
            cout << "Factor:" << val << endl;
    }
    exit(0);
}

// Main: spawn children, then wait for them to finish
cout << "Run Factor " << total << ":" << numChild << endl;
Factor factor;
// Spawn children
for (i = 0; i < numChild; i++) {
    if (fork() == 0)
        factor.child(begin, begin + range);
    begin += range;   // next child starts where this range ended
}
// Wait for children to finish
// (each wait() collects one child; loop over wait() to collect them all)
wait(&stat);
cout << "All Children Done" << endl;
SpeedUp
Amdahl’s Law
A speedup consists of:
  - A section that will be sped up: parallelized code
  - A section that will not be sped up: sequential code
The speedup factor is limited by the section that will not be sped up.
(Diagram: before/after bars — the sequential section cannot be made faster; the parallel section is made 2 times faster.)
Amdahl’s Law: Worked Example
Assume you want a program to run 2 times faster. The program currently runs in 100 seconds, and you decide to use parallel processing.
1. Determine the time before improvement: oldTime = 100 seconds
2. Calculate newTime, the execution time after improvement: twice as fast => newTime = 50 seconds
3. Calculate remainder, the amount of oldTime that will not be changed by the improvement: the part of the program that is not parallel: 10% => remainder = 10 seconds
4. Calculate affectedTime, the amount of oldTime that will be affected by the improvement: affectedTime = oldTime - remainder = 90 seconds
5. Apply Amdahl’s Law: newTime = (affectedTime / rateOfChange) + remainder
   50 = (90 / R) + 10
6. Solve for rateOfChange (R):
   40 = 90 / R
   40R = 90
   R = 90/40 = 2.25
Solution: if the parallel section is sped up 2.25 times (e.g., by 2.25 processors under ideal scaling), the program will run twice as fast.
Parallel Processing Goals
The program must be correct
The program must be fast
Load balancing: the work must be split evenly between workers
Strong versus Weak Scaling
Assume a problem with execution time M, run with parallel processing on N processors
Strong scaling: the problem size stays fixed; the new execution time is M/N
Weak scaling: the problem size grows in proportion to N; the execution time stays M
OpenMP
Creates threads (instead of processes)
The compiler forks/waits for you; insert #pragma directives instead of programming forks
(Diagram: repeated fork/wait pattern — the main thread forks Child 1 and Child 2, waits for them, then forks again.)
OpenMP Features: Example SQRT.cpp code
Inserts forks and waits for you
Compile with the library: g++ -fopenmp prg.cpp -o prg
Useful features:
  Get thread #: int tid = omp_get_thread_num();
  Get total # of threads: int numThreads = omp_get_num_threads();

#include <omp.h>   // for OpenMP

#pragma omp parallel for
for (local = begin; local < end; local++) {
    double root = sqrt((double) local);
    cout << local << ":" << root << " ";
    int localint = (int) local;
    if ((localint % 10) == 0)
        cout << local << ":" << root << " " << endl;
}
More on OpenMP
The number of threads is set as an operating system environment variable. At the Linux prompt on a lab machine:
  $ export OMP_NUM_THREADS=8
Threads can share memory or have private memory:
  #pragma omp parallel shared(a,b,c,chunk) private(i)
Threads can be given static or dynamic chunk sizes:
  #pragma omp for schedule(dynamic,chunk) nowait
More information at: https://computing.llnl.gov/tutorials/openMP/#Directives
Conclusion
There are many things to play with:
How much faster are parallel programs versus single-process programs?
How does load balancing the children affect performance?
How does a different number of children perform?
How do forked processes compare with OpenMP?
Have fun with this! Learn about the system.