Parallel Programming
Parallel Processing
(Diagram: examples of parallel processing — Microsoft Word running Editor, Backup, SpellCheck, and GrammarCheck concurrently; Matrix Multiply.)
Parallel Programming

// Child: print every prime in [begin, end)
void Factor::child(int begin, int end) {
    int val, i;
    for (val = begin; val < end; val++) {
        for (i = 2; i <= val/2; i++)
            if (val % i == 0) break;
        if (i > val/2)
            cout << "Factor:" << val << endl;
    }
    exit(0);
}

// Main: spawn children, then wait for them to finish
cout << "Run Factor " << total << ":" << numChild << endl;
Factor factor;
// Spawn children
for (i = 0; i < numChild; i++) {
    if (fork() == 0)
        factor.child(begin, begin + range);
    begin += range;   // next child starts where this range ended
}
// Wait for children to finish
// (each wait() collects one child; loop over wait() to collect them all)
wait(&stat);
cout << "All Children Done" << endl;
SpeedUp
Amdahl’s Law
A speedup consists of:
  - A section that will be sped up: parallelized code
  - A section that will not be sped up: sequential code
The speedup factor is limited by the section that will not be sped up.
(Diagram: before/after bars — the sequential section cannot be made faster; the parallel section is made 2 times faster.)
Amdahl’s Law: Worked Example
Assume you want a program to run 2 times faster. The program currently runs in 100 seconds, and you decide to use parallel processing.
1. Determine the time before improvement: oldTime = 100 seconds
2. Calculate newTime, the execution time after improvement: twice as fast => newTime = 50 seconds
3. Calculate remainder, the amount of oldTime that will not be changed by the improvement: the part of the program that is not parallel: 10% => remainder = 10 seconds
4. Calculate affectedTime, the amount of oldTime that will be affected by the improvement: affectedTime = oldTime - remainder = 90 seconds
5. Apply Amdahl’s Law: newTime = (affectedTime / rateOfChange) + remainder
   50 = (90 / R) + 10
6. Solve for rateOfChange (R):
   40 = 90 / R
   40R = 90
   R = 90/40 = 2.25
Solution: if the parallel section is sped up 2.25 times (e.g., by 2.25 processors under ideal scaling), the program will run twice as fast.
Parallel Processing Goals
The program must be correct
The program must be fast
Load balancing: the work must be split evenly between workers
Strong versus Weak Scaling
Assume a problem with execution time M, run with parallel processing on N processors
Strong scaling: the problem size stays fixed; the new execution time is M/N
Weak scaling: the problem size grows in proportion to N; the execution time stays M
OpenMP
Creates threads (instead of processes)
The compiler forks/waits for you; insert #pragma directives instead of programming forks
(Diagram: repeated fork/wait pattern — the main thread forks Child 1 and Child 2, waits for them, then forks again.)
OpenMP Features: Example SQRT.cpp code
Inserts forks and waits for you
Compile with the library: g++ -fopenmp prg.cpp -o prg
Useful features:
  Get thread #: int tid = omp_get_thread_num();
  Get total # of threads: int numThreads = omp_get_num_threads();

#include <omp.h>   // for OpenMP

#pragma omp parallel for
for (local = begin; local < end; local++) {
    double root = sqrt((double) local);
    cout << local << ":" << root << " ";
    int localint = (int) local;
    if ((localint % 10) == 0)
        cout << local << ":" << root << " " << endl;
}
More on OpenMP
The number of threads is set as an operating system environment variable. At the Linux prompt on a lab machine:
  $ export OMP_NUM_THREADS=8
Threads can share memory or have private memory:
  #pragma omp parallel shared(a,b,c,chunk) private(i)
Threads can be given static or dynamic chunk sizes:
  #pragma omp for schedule(dynamic,chunk) nowait
More information at: https://computing.llnl.gov/tutorials/openMP/#Directives
Conclusion
There are many things to play with:
How much faster are parallel programs versus single-process programs?
How does load balancing the children affect performance?
How does a different number of children perform?
How do forked processes compare with OpenMP?
Have fun with this! Learn about the system.