Parallel Computing Project (OpenMP Using Linux for Parallel Applications)
Summer 2008 Group Project
Instructor: Prof. Nagi Mekhiel
August 12th, 2008
Ravi Illapani, Kyunghee Ko, Lixiang Zhang
OpenMP Parallel Computing Solution Stack
Recall Basic Idea of OpenMP
- The program generated by the compiler is executed by multiple threads
- One thread per processor or core
- Each thread performs part of the work
- Parallel parts are executed by multiple threads
- Sequential parts are executed by a single thread
- Dependences in parallel parts require synchronization between threads
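The points above can be sketched in a few lines of C. This is a minimal illustration, not code from the project: the helper names `count_team_threads` and `square_all` are hypothetical. When the file is built without OpenMP support the pragmas are ignored and everything runs on one thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count how many threads execute a parallel region. */
int count_team_threads(void)
{
    int count = 0;
    #pragma omp parallel
    {
        /* Every thread of the team runs this block once. The atomic
           directive synchronizes the update of the shared counter. */
        #pragma omp atomic
        count++;
    }
    /* Sequential part again: only the initial thread continues here. */
    return count;
}

/* Hypothetical helper: square each element. The iterations are
   independent, so no synchronization is needed inside the loop. */
void square_all(const int *in, int *out, int n)
{
    assert(in != NULL && out != NULL && n >= 0);
    /* The loop iterations are divided among the threads of the team. */
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        out[i] = in[i] * in[i];
}
```

Calling `count_team_threads()` returns 1 in a sequential build and the team size chosen by the runtime in an OpenMP build.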
Recall Basic Idea: How OpenMP Works
- The user must decide what is parallel in the program
- The user makes any changes needed to the original source code, e.g. to remove dependences in parts that should run in parallel
- The user inserts directives telling the compiler how statements are to be executed:
  - which parts of the program are parallel
  - how to assign code in parallel regions to threads
  - the data-sharing attributes: shared, private, threadprivate, ...
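As a small sketch of how data-sharing attributes are written, the loop below declares every variable's attribute explicitly. The function name `sum_squared` and its arguments are made up for illustration; `threadprivate` is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: c[i] = (a[i] + b[i])^2 for all i. */
void sum_squared(const double *a, const double *b, double *c, int n)
{
    assert(a != NULL && b != NULL && c != NULL && n >= 0);
    double tmp; /* scratch value; each thread needs its own copy */

    /* default(none) forces every variable used in the region to be listed:
       - a, b, c, n are shared: all threads see the same pointers and size
       - tmp is private: each thread gets its own uninitialized copy
       - the loop variable i is private automatically */
    #pragma omp parallel for default(none) shared(a, b, c, n) private(tmp)
    for (int i = 0; i < n; i++) {
        tmp = a[i] + b[i];
        c[i] = tmp * tmp;
    }
}
```

If `tmp` were left shared, the code would still compile, but threads could overwrite each other's intermediate value.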
How the User Interacts with the Compiler
- The compiler generates explicit threaded code
- It shields the user from many details of the multithreaded code
- The compiler figures out the details of the code each thread needs to execute
- The compiler does not check that the programmer's directives are correct!
- The programmer must make sure the required synchronization is inserted
- The result is a multithreaded object program
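The following sketch shows the kind of synchronization the programmer has to supply. The function `count_even` is a hypothetical example. Removing the `atomic` directive would still compile without any diagnostic, yet the shared counter could then be updated by several threads at once and lose increments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: count the even values in v[0..n-1]. */
int count_even(const int *v, int n)
{
    assert(v != NULL && n >= 0);
    int hits = 0; /* shared by all threads of the team */

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        if (v[i] % 2 == 0) {
            /* Required synchronization: make the increment of the
               shared counter indivisible. The compiler does not warn
               if this directive is forgotten. */
            #pragma omp atomic
            hits++;
        }
    }
    return hits;
}
```

A `reduction(+:hits)` clause on the loop would be an alternative way to make the same update safe.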
OpenMP Compilers and Platforms
- Intel C++ and Fortran Compilers from Intel: Intel IA32 Linux/Windows systems, Intel Itanium-based Linux/Windows systems
- Fujitsu/Lahey Fortran, C and C++: Intel Linux systems, Fujitsu Solaris systems
- HP: HP-UX PA-RISC/Itanium, HP Tru64 Unix (Fortran/C/C++)
- IBM XL Fortran and C from IBM: IBM AIX systems
- Guide Fortran and C/C++ from Intel's KAI Software Lab: Intel Linux/Windows systems
- PGF77 / PGF90 Compilers from The Portland Group (PGI): Intel Linux/Solaris/Windows/NT systems
- Freeware: Omni, OdinMP, OMPi, OpenUH, ...
Check information at
Structure of a Compiler
- Front End: read in the source program, ensure that it is error-free, build the intermediate representation (IR)
- Middle End: analyze and optimize the program as much as possible; “lower” the IR to a machine-like form
- Back End: determine the layout of program data in memory; generate object code for the target architecture and optimize it
OpenMP Implementation
OpenMP Implementation (cont'd)
- If the program is compiled sequentially, OpenMP comments and pragmas are ignored
- If the code is compiled for parallel execution, the comments and/or pragmas are read and drive the translation into a parallel program
- Ideally, one source serves both the sequential and the parallel program (a big maintenance plus)
- Usually this is accomplished by choosing a specific compiler option
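A single source file that serves both builds might look like the sketch below. The function names are hypothetical. The `_OPENMP` macro is defined by OpenMP-enabled compilers, and with GCC the relevant compiler option is `-fopenmp`; other compilers use their own flag.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h> /* runtime routines exist only in an OpenMP build */
#endif

/* Returns 1 if this file was compiled for parallel execution, else 0. */
int compiled_with_openmp(void)
{
#ifdef _OPENMP
    return 1;
#else
    return 0;
#endif
}

/* Upper bound on the threads a parallel region may use. */
int available_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1; /* sequential build: the pragmas below are ignored */
#endif
}

/* Sum 1..n. The same loop runs serially or in parallel depending on
   how the file was compiled. */
long triangular(int n)
{
    assert(n >= 0);
    long total = 0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 1; i <= n; i++)
        total += i;
    return total;
}
```

Building once without and once with the OpenMP option gives two programs from the same source, and both return the same value from `triangular`.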
OpenMP Implementation (cont'd)
An OpenMP implementation:
- Transforms OpenMP programs into multithreaded code
- Figures out the details of the work to be performed by each thread
- Arranges storage for the different kinds of data (shared, private, ...) and performs their initialization
- Manages threads: creates, suspends, wakes up and terminates them
- Implements thread synchronization
Implementation-Defined Issues
- OpenMP leaves some issues to the implementation:
  - the default number of threads
  - the default schedule and the default for schedule(runtime)
  - the number of threads used to execute nested parallel regions
  - the behaviour in case of thread exhaustion
  - and many others
- Despite many similarities, each implementation is a little different from all others
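Some of these defaults can be inspected at run time. The sketch below assumes an OpenMP 3.0 or later runtime for `omp_get_schedule`; the struct `omp_defaults` and the function `query_defaults` are hypothetical names. In a sequential build it reports a single thread.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical record of a few implementation-defined defaults. */
typedef struct {
    int max_threads; /* threads a parallel region may use by default */
    int sched_kind;  /* numeric value of the omp_sched_t for schedule(runtime) */
    int chunk;       /* chunk size that goes with sched_kind */
} omp_defaults;

void query_defaults(omp_defaults *out)
{
    assert(out != NULL);
    /* Values reported when the file is compiled without OpenMP. */
    out->max_threads = 1;
    out->sched_kind = 0;
    out->chunk = 0;
#ifdef _OPENMP
    omp_sched_t kind;
    int chunk;
    out->max_threads = omp_get_max_threads();
    omp_get_schedule(&kind, &chunk); /* what schedule(runtime) will use */
    out->sched_kind = (int)kind;
    out->chunk = chunk;
#endif
}
```

The environment variables `OMP_NUM_THREADS` and `OMP_SCHEDULE` override these defaults, which is one way to get the same behaviour from different implementations.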
Butterfly Effect
The butterfly effect is a phrase that encapsulates the more technical notion of sensitive dependence on initial conditions in chaos theory. Small variations in the initial conditions of a dynamical system may produce large variations in the long-term behavior of the system.
As the butterfly effect suggests, a small change to the simulation parameters gave us totally different results.
System Overview
The classical model assumes a magnetic pendulum that is attracted by three magnets, each with a distinct color. The magnets are located underneath the pendulum on a circle centered at the pendulum mount point. They are strong enough to attract the pendulum so that it will not come to rest in the center position.
System Overview (cont'd)
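Beeman Integration Algorithm
The formula used to compute the positions at time t + Δt is:
$$x(t+\Delta t) = x(t) + v(t)\,\Delta t + \frac{2}{3}\,a(t)\,\Delta t^{2} - \frac{1}{6}\,a(t-\Delta t)\,\Delta t^{2}$$
and this is the formula used to update the velocities:
$$v(t+\Delta t) = v(t) + \frac{1}{3}\,a(t+\Delta t)\,\Delta t + \frac{5}{6}\,a(t)\,\Delta t - \frac{1}{6}\,a(t-\Delta t)\,\Delta t$$
Here x is the position, v the velocity and a the acceleration.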
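One integration step can be sketched in C as follows. This is a one-dimensional illustration under stated assumptions, not the project's code: it uses the standard form of Beeman's algorithm (position from a(t) and a(t − Δt), velocity from a(t + Δt), a(t) and a(t − Δt)), and the names `beeman_state`, `accel_fn`, `beeman_step` and `constant_accel` are hypothetical. The pendulum itself moves in two dimensions, so a real simulation would apply the same update to each coordinate.

```c
#include <assert.h>
#include <stddef.h>

/* State carried from one step to the next (one coordinate). */
typedef struct {
    double x;      /* position at time t */
    double v;      /* velocity at time t */
    double a;      /* acceleration at time t */
    double a_prev; /* acceleration at time t - dt */
} beeman_state;

/* Hypothetical force model: acceleration from position and velocity. */
typedef double (*accel_fn)(double x, double v);

/* Demo model used for checking: constant acceleration of 2.0. */
double constant_accel(double x, double v)
{
    (void)x;
    (void)v;
    return 2.0;
}

/* Advance the state by one time step dt. */
void beeman_step(beeman_state *s, accel_fn accel, double dt)
{
    assert(s != NULL && accel != NULL && dt > 0.0);

    /* x(t+dt) = x + v*dt + (2/3)*a*dt^2 - (1/6)*a_prev*dt^2 */
    double x_new = s->x + s->v * dt
                 + (4.0 * s->a - s->a_prev) * dt * dt / 6.0;

    /* a(t+dt) at the new position. Assumption: for velocity-dependent
       forces such as friction, the old velocity is used as an estimate. */
    double a_new = accel(x_new, s->v);

    /* v(t+dt) = v + (1/3)*a_new*dt + (5/6)*a*dt - (1/6)*a_prev*dt */
    double v_new = s->v + (2.0 * a_new + 5.0 * s->a - s->a_prev) * dt / 6.0;

    s->a_prev = s->a;
    s->a = a_new;
    s->x = x_new;
    s->v = v_new;
}
```

For constant acceleration the two formulas reduce to x + v·Δt + ½·a·Δt² and v + a·Δt, which gives a simple sanity check for an implementation.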
Simulation Results
Exp 1
- Single core vs. dual core
- Performance with respect to the number of threads
- Serial vs. parallel
- 32 tests were conducted
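A comparison of this kind needs a way to run the same work with a chosen number of threads and time it. The sketch below shows one possible harness, assuming a stand-in workload. The function `timed_sum` is hypothetical and does not reproduce the pendulum simulation or its measurements.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical harness: run a stand-in workload with the requested
   number of threads, store the elapsed seconds, return the result. */
long timed_sum(int nthreads, long n, double *seconds)
{
    assert(nthreads >= 1 && n >= 0 && seconds != NULL);
    long total = 0;

#ifdef _OPENMP
    double t0 = omp_get_wtime(); /* wall-clock time */
#else
    clock_t t0 = clock();        /* CPU time in a sequential build */
#endif

    /* num_threads requests the team size for this region only. */
    #pragma omp parallel for num_threads(nthreads) reduction(+:total)
    for (long i = 1; i <= n; i++)
        total += i % 7;

#ifdef _OPENMP
    *seconds = omp_get_wtime() - t0;
#else
    *seconds = (double)(clock() - t0) / CLOCKS_PER_SEC;
#endif
    return total;
}
```

Calling `timed_sum` with 1 thread and then with 2 threads on the same `n` should return identical sums, so any difference between the runs is in the elapsed time only.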
Exp 2
- Simulation with a varying number of magnets
- Simulation of the behavior of the pendulum
- 5 tests were conducted
Exp 3
In this experiment, we simulate the pendulum in a field of 2 magnets with varying values of the friction and gravitation forces. A total of 63 simulations were run.
Exp 4
In this experiment, we simulate the pendulum in a field of 3 magnets with varying values of the friction and gravitation forces. A total of 63 simulations were run.
Exp 5
In this experiment, we simulate the pendulum in a field of 8 magnets with varying values of the friction and gravitation forces. A total of 26 simulations were run.
Conclusion
- Even when the hardware is available, effective programming is required to maximize code efficiency
- Complex simulations can be performed faster on a parallel architecture
- OpenMP helps!
- Simple: everybody can learn it in 11 weeks
- Not so simple: don't stop learning! Keep learning it for better performance