cc Compiler Parallelization Options CSE 260 Mini-project Fall 2001 John Kerwin

Background The Sun Workshop 6.1 cc compiler does not support OpenMP, but it does provide multiprocessing (MP) options similar to those in OpenMP. For more information: Chapter 4 of the C User's Guide discusses how the compiler can parallelize Sun ANSI/ISO C code, and the slides from "Application Tuning on Sun Systems" by Ruud van der Pas contain a lot of useful information about compiler options.

Three Ways to Enable Compiler Parallelization
-xautopar  Automatic parallelization
–Just compile and run on a multiprocessor.
–Use a command like "setenv PARALLEL 8" to set the number of processors at runtime.
-xexplicitpar  Explicit parallelization only
–Use pragmas similar to those used in OpenMP to guide the compiler.
-xparallel  Automatic and explicit parallelization
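As a concrete illustration, the three modes might be invoked as follows (a minimal sketch: the source file name prog.c and the csh usage are assumptions, and -xO3 is shown on every line only because the next slide states it is required for -xautopar):

cc -xO3 -xautopar     -o prog prog.c   # automatic parallelization only
cc -xO3 -xexplicitpar -o prog prog.c   # explicit MP taskloop pragmas only
cc -xO3 -xparallel    -o prog prog.c   # both automatic and explicit
setenv PARALLEL 8                      # csh: request 8 threads at runtime
./prog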

-xautopar
Requires -xO3 or higher optimization.
Includes -xdepend: -xdepend analyzes loops for inter-iteration data dependencies and restructures them if possible to allow different iterations of the loop to be executed in parallel.
-xautopar analyzes every loop in the program and generates parallel code for parallelizable loops.
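A loop like the one below is the kind of candidate -xautopar targets (an illustrative sketch; the file name, arrays, and function are assumptions, not taken from the slides):

/* vecadd.c (hypothetical) -- compile with: cc -c -xO3 -xautopar -xloopinfo vecadd.c
 * Every iteration writes a distinct element of c and only reads a and b,
 * so -xdepend can prove the iterations independent and -xautopar can
 * distribute them across threads. */
#define N 1000000
double a[N], b[N], c[N];

void add_vectors(void)
{
    int i;
    for (i = 0; i < N; i++)
        c[i] = a[i] + b[i];
}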

How Automatic Parallelization Works
At the beginning of the program, the master thread spawns slave threads to execute the parallel code. The slave threads wait idly until the master thread encounters a parallelizable loop that is profitable to execute in parallel. When it encounters one, different iterations of the loop are assigned to the slave threads, and all the threads synchronize at a barrier at the end of the loop.

How Automatic Parallelization Works (continued)
The compiler uses an estimate of each loop's granularity (the number of iterations versus the overhead of distributing work to threads and synchronizing) to decide whether executing the loop in parallel is profitable. If the granularity cannot be determined at compile time, the compiler generates both a serial and a parallel version of the loop, and at runtime the parallel version is called only if the number of iterations justifies the overhead.
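Conceptually, the code generated for a loop whose trip count is unknown at compile time behaves like the sketch below (purely illustrative C, not the compiler's actual interface; the threshold constant and the helper function are hypothetical stand-ins):

#define PROFITABLE_TRIP_COUNT 1000  /* hypothetical threshold; the real value is chosen by the compiler */

/* Stand-in for the compiler-generated parallel version; in reality the
 * iterations are divided among the master and slave threads and followed
 * by a barrier.  Written serially here only so the sketch compiles. */
static void scale_parallel(double *x, int n, double s)
{
    int i;
    for (i = 0; i < n; i++)
        x[i] *= s;
}

void scale(double *x, int n, double s)
{
    if (n >= PROFITABLE_TRIP_COUNT) {
        scale_parallel(x, n, s);          /* trip count justifies the overhead */
    } else {
        int i;
        for (i = 0; i < n; i++)           /* serial version */
            x[i] *= s;
    }
}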

How Effective is -xautopar?
Success or failure with -xautopar depends on
–Type of application
–Coding style
–Quality of the compiler
The compiler may not be able to parallelize the loops in the most efficient manner. This can happen if:
–The data dependency analysis is unable to determine whether or not it is safe to parallelize a loop.
–The granularity is not high enough because the compiler lacks the information needed to parallelize the loop at the highest possible level.
Use the -xloopinfo option to print parallelization messages showing which loops were parallelized.
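A typical case where the dependency analysis must give up is pointer aliasing (an illustrative sketch, not from the slides): if the compiler cannot prove that dst and src never overlap, a write to dst[i] might feed a later read of src[i], so it typically has to assume a dependency and leave the loop serial.

/* Safe to parallelize only if dst and src do not overlap -- something the
 * compiler generally cannot prove for arbitrary pointer arguments. */
void copy_forward(double *dst, const double *src, int n)
{
    int i;
    for (i = 0; i < n; i++)
        dst[i] = src[i];
}

Compiling such a file with the -xloopinfo option (e.g. cc -c -xO3 -xautopar -xloopinfo file.c, where file.c is a placeholder name) prints a message for each loop saying whether it was parallelized and, if not, why.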

-xexplicitpar
This is where explicit parallelization through pragmas comes into the picture. -xexplicitpar allows the programmer to insert pragmas into the code that tell the compiler how to parallelize particular loops. The programmer is responsible for ensuring that the pragmas are used correctly; otherwise the results are undefined.
–Use -xvpara to print compiler warnings about potentially misused pragmas.

Examples of Some Pragmas Similar to OpenMP Pragmas
Static Scheduling: all the iterations of the loop are uniformly distributed among the participating processors.

#pragma MP taskloop schedtype(static)
for (i = 1; i < N-1; i++) {
    ...
}

similar to

#pragma omp for schedule(static)

Examples of Some Pragmas Similar to OpenMP Pragmas (continued)
Dynamic Scheduling with a Specified chunk_size

#pragma MP taskloop schedtype(self(120))

similar to

#pragma omp for schedule(dynamic, 120)

Guided Dynamic Scheduling with a Minimum chunk_size

#pragma MP taskloop schedtype(gss(10))

similar to

#pragma omp for schedule(guided, 10)
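Putting one of these directives into a complete loop might look like the following minimal sketch (the file name, array, and loop body are illustrative assumptions; only the pragma syntax and the compiler flags come from the slides):

/* scale.c (hypothetical) -- compile with:
 *   cc -c -xO3 -xexplicitpar -xvpara -xloopinfo scale.c */
#define N 100000
double a[N];

void scale_all(double s)
{
    int i;
    /* guided scheduling with a minimum chunk of 10 iterations */
#pragma MP taskloop schedtype(gss(10))
    for (i = 0; i < N; i++)
        a[i] *= s;
}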

Speedup Using Static, Dynamic, and Guided MP Pragmas with 8 Processors

Speedup from MPI, Pthreads, and Sun MP Programs with 8 Processors

Time Spent Converting Serial Code to Parallel Code

Coming Soon: OpenMP
OpenMP is supported in the Workshop 6.2 C compiler.

#include <omp.h>
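A minimal OpenMP counterpart of the explicit-pragma example above might then look like this sketch (it reuses the hypothetical array and the guided schedule from the earlier example; "parallel for" is used because the slides' "omp for" form requires an enclosing parallel region):

#include <omp.h>

#define N 100000
double a[N];

void scale_all(double s)
{
    int i;
    /* OpenMP counterpart of "#pragma MP taskloop schedtype(gss(10))" */
#pragma omp parallel for schedule(guided, 10)
    for (i = 0; i < N; i++)
        a[i] *= s;
}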