Exercise problems for students taking the Programming Parallel Computers course. Janusz Kowalik, Piotr Arlukowicz, Tadeusz Puzniakowski. Informatics Institute, Gdansk University. October 8-26, 2012.

General comments. For all problems, students should develop and run sequential programs on one processor and test specific numeric cases for comparison with their parallel code results. Estimate speedups and efficiencies. Problems 1-6: C/MPI. Problems 7-9: C/OpenMP.

Problem 1, Version 1. Design and implement an MPI/C program for the matrix/vector product.
1. Given are: a cluster consisting of p = 4 networked processors, a square n = 16 (16 x 16) matrix called A, and a vector x.
2. Write a sequential code for the matrix/vector product. Generate a matrix and a vector with integer components.
3. Initially A and x are located on process 0.
4. Divide A into 4 row strips, each with 4 rows.
5. Move x and one strip each to processes 1, 2, and 3.
6. Let each process compute a part of the product vector y.
7. Assemble the product vector on process 0 and let process 0 print the final result vector y.
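The per-process work in Problem 1, Version 1 can be sketched as a plain C kernel (names are illustrative, not from the assignment); the MPI glue that moves the strips and x between processes is indicated only in comments:

```c
#include <stddef.h>

/* Multiply one row strip of A (rows x n, stored row-major) by x,
 * writing the corresponding rows of the product into y_part. */
void strip_matvec(const int *a_strip, const int *x, int *y_part,
                  size_t rows, size_t n)
{
    for (size_t i = 0; i < rows; i++) {
        int sum = 0;
        for (size_t j = 0; j < n; j++)
            sum += a_strip[i * n + j] * x[j];  /* dot product of row i with x */
        y_part[i] = sum;
    }
}

/* MPI glue (sketch): process 0 does MPI_Send of x and one strip to each of
 * processes 1..3; every process calls strip_matvec on its strip; workers
 * MPI_Send their y_part back, and process 0 assembles and prints y. */
```

With n = 16 and p = 4, each call handles rows = 4 and n = 16, producing four elements of y per process.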

Parallel matrix/vector multiply: partitioning the problem Ax = y. Each strip of A has 4 rows; each process calculates a part (four elements) of y.

Matrix/vector product, Version 2. Make matrix A and vector x available to all processes as global variables. Each process calculates a partial product by multiplying columns of A by the corresponding elements of x. Process 0 adds the partial results.

Matrix-vector product. Write the two different programs and check that they produce the same results for the same data. Increase the matrix and vector size to n = 400 and compare the parallel compute times. Which version is faster? Why?

Comment on a Fortran alternative. In Fortran, two-dimensional matrices are stored in memory by columns. We would therefore prefer decomposing the matrix by columns and having each process produce a column strip. This algorithm is different from the Version 1 algorithm used for C++; in the C++ Version 1 we could use dot products.

Problem 2.

Parallel Monte Carlo method for calculating π.

Monte Carlo computation of π: count the pairs of random numbers (x, y) that satisfy the inequality x² + y² ≤ r², with r = 1.

Monte Carlo algorithm

The task: implement the following parallel algorithm.
1. Process 0 generates 2,000·p random uniformly distributed numbers between 0 and 1, where p is the number of processors in the cluster.
2. It sends 2,000 numbers to each of the processes 1, 2, ..., p-1.
3. Every process checks pairs of numbers and counts the pairs satisfying the test.
4. This count is sent to process 0 for computing the sum and calculating an approximation to π.
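The counting step each process performs can be sketched as follows (function name and sample count are illustrative); the library generator is used as the exercise suggests, and with MPI the per-process counts would be combined with MPI_Reduce:

```c
#include <stdlib.h>

/* Count pairs of uniform [0,1] numbers (x, y) with x*x + y*y <= 1. */
long count_hits(long pairs, unsigned seed)
{
    srand(seed);  /* C library generator, per the exercise's comments */
    long hits = 0;
    for (long i = 0; i < pairs; i++) {
        double x = (double)rand() / RAND_MAX;
        double y = (double)rand() / RAND_MAX;
        if (x * x + y * y <= 1.0)
            hits++;
    }
    return hits;
}

/* pi is approximated as 4.0 * total_hits / total_pairs, where total_hits
 * is the MPI_Reduce sum of each process's count. */
```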

Comments. For generating random numbers, use the library routines available in C or C++. Before writing an MPI/C++ parallel code, write and test a sequential code. All processes execute the same code on different data; this is called Single Program, Multiple Data (SPMD).

Continued. Another version is also possible: one process is dedicated to generating random numbers and sending them one by one to the other, worker processes. Worker processes check each pair and accumulate their results. After all pairs are checked, the master process gathers the partial results using MPI_Reduce and calculates the final approximation. This version suffers from a large number of communications.

Problem 3. Definite integral of a one-dimensional function. Input: a, b, f(x). Output: the integral value. The method used is the trapezoidal rule. Implement this parallel algorithm for a = -2, b = 2 and n = 512.

Parallel integration program.

Comments. The final collection of partial results should be done using MPI_Reduce. Assuming that we have p processes, the subintervals are:
process 0: [a, a + (n/p)h]
process 1: [a + (n/p)h, a + 2(n/p)h]
...
process p-1: [a + (p-1)(n/p)h, b]

Comments. In your program, every process computes its local integration interval using its rank. Make the variables a, b, n available to all processes; they are global variables. All processes use the simple trapezoidal rule for computing the approximate integral.
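The rule itself can be sketched as a plain C function (the integrand below is an assumed example; the original slide's f(x) did not survive extraction). Each MPI process would call it on its own subinterval with n/p steps, and MPI_Reduce would sum the partial integrals:

```c
/* Composite trapezoidal rule on [a, b] with n subintervals of width h. */
double trapezoid(double (*f)(double), double a, double b, long n)
{
    double h = (b - a) / n;
    double sum = 0.5 * (f(a) + f(b));  /* endpoints get weight 1/2 */
    for (long i = 1; i < n; i++)
        sum += f(a + i * h);           /* interior points get weight 1 */
    return sum * h;
}

/* Assumed example integrand for testing; exact integral over [-2, 2] is 16/3. */
static double fsq(double x) { return x * x; }
```

Process k would pass a + k*(n/p)*h and a + (k+1)*(n/p)*h as its local bounds, with n/p steps.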

Problem 4. Dot product. Definition: for two vectors x and y of the same size n, the dot product is x·y = x_1 y_1 + x_2 y_2 + ... + x_n y_n.
1. Write a sequential program for computing the dot product.
2. Assume n = 1,000. Generate two vectors x and y and test the sequential program.

Dot product. Parallel program.
1. Given the number of processes p, the vectors x and y are divided into p parts, each containing n/p components.
2. Block mapping assigns to process k the block of n/p consecutive components of x starting at index kn/p.

Dot product, continued.
3. Use your sequential program for computing parts of the dot product in the parallel program.
4. Use MPI_Reduce to sum up all partial results. Assume the root process is process 0.
5. Print the result.

The initial location of x and y is process 0. Send both vectors to all other processes. Each process (including 0) calculates a partial dot product over a different set of x and y indices. In general, process k starts with the index kn/p and adds n/p products. The rank k = my_rank characterizes every process, and a value such as kn/p is called local: every process has a different value of kn/p. Variables that are the same for all processes are called global.
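The per-process work above reduces to a partial sum over a block of indices (the function name is illustrative); MPI_Reduce on process 0 then adds the p partial results:

```c
#include <stddef.h>

/* Partial dot product over len components starting at index start.
 * Process k would call this with start = k*n/p and len = n/p. */
long partial_dot(const int *x, const int *y, size_t start, size_t len)
{
    long sum = 0;
    for (size_t i = start; i < start + len; i++)
        sum += (long)x[i] * y[i];
    return sum;
}
```

The block partial sums add up to the full dot product, which is exactly what MPI_Reduce with MPI_SUM computes across ranks.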

Problem 5. Simpson’s rule for integration. Simpson’s rule is similar to the trapezoidal rule but it is more accurate. To approximate the integral over a subinterval it uses the two endpoints plus the midpoint, and a second-order curve passing through these three points. Two points define a trapezoid; three points define a parabola.

Problem 5, continued. Notice the similarity to the trapezoidal rule. Simpson’s rule is more accurate for many functions f(x), but it requires more computation.

Simpson’s rule programming problem. Write a sequential program implementing Simpson’s rule for integration. Test it for a = -2, b = 2, n = 1024. Then write a parallel C/MPI program for two processes running on two processors: process 0 and process 1. Make process 0 calculate the integral using the trapezoidal rule and process 1 using Simpson’s rule. Compare the results. How can you show experimentally that Simpson’s rule is more accurate?
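A sequential sketch of the rule (the integrand is an assumed example, since the slide's f(x) was lost): the composite form covers each pair of subintervals with a parabola through its endpoints and midpoint, so n must be even.

```c
/* Composite Simpson's rule on [a, b] with n (even) subintervals. */
double simpson(double (*f)(double), double a, double b, long n)
{
    double h = (b - a) / n;
    double sum = f(a) + f(b);
    for (long i = 1; i < n; i++)
        sum += (i % 2 ? 4.0 : 2.0) * f(a + i * h);  /* 4 at midpoints, 2 at joins */
    return sum * h / 3.0;
}

/* Assumed example integrand; Simpson's rule is exact for polynomials up to
 * degree 3, so the result should match 16/3 to rounding error. */
static double fpoly(double x) { return x * x * x + x * x; }
```

In the two-process exercise, process 1 would run this while process 0 runs the trapezoidal rule on the same a, b, n; comparing both against the exact integral of a known f shows Simpson's smaller error.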

Problem 6. Design and run a C/MPI program for solving a set of linear algebraic equations using the Jacobi iterative method. The test set should have at least 16 linear equations. The communicator should include at least four processors. Choose or create equations with a dominant diagonal. Your MPI code should use the MPI_Barrier function for synchronizing the parallel computation. To verify the solution, write and run a sequential code for the same problem. Attach a full computational and communication complexity analysis.
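A sequential Jacobi sketch to verify against (the fixed sweep count and size are illustrative; a real code would iterate until the change between sweeps is below a tolerance). With MPI, each process would update a block of n/p unknowns and exchange the updated x each sweep, with MPI_Barrier separating sweeps; diagonal dominance guarantees convergence.

```c
#include <string.h>

#define N 16  /* at least 16 equations, per the problem statement */

/* Jacobi sweeps for Ax = b: every component is updated from the
 * previous iterate, so all of x changes at once after each sweep. */
void jacobi(const double a[N][N], const double b[N], double x[N], int sweeps)
{
    double x_new[N];
    for (int s = 0; s < sweeps; s++) {
        for (int i = 0; i < N; i++) {
            double sum = b[i];
            for (int j = 0; j < N; j++)
                if (j != i)
                    sum -= a[i][j] * x[j];   /* subtract off-diagonal terms */
            x_new[i] = sum / a[i][i];        /* dominant diagonal: a[i][i] != 0 */
        }
        memcpy(x, x_new, sizeof x_new);      /* commit the whole sweep at once */
    }
}
```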

Problem 7. Write a sequential C main program for multiplying a square matrix A by a vector x. Insert OpenMP compiler directives for executing it in parallel. The matrix should be large enough so that each parallel thread has at least 10 loop iterations to execute. Parallelize the outer and then the inner loop. Explain the run-time difference.
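The outer-loop version can be sketched as below (names are illustrative). Each iteration of the outer loop writes a distinct y[i], so no synchronization is needed; parallelizing the inner loop instead would require a reduction on sum and give each thread much less work, which is the run-time difference the exercise asks about. Compiled without OpenMP support, the pragma is ignored and the code runs sequentially with the same result.

```c
/* Matrix-vector product with the outer loop parallelized. */
void omp_matvec(int n, const double *a, const double *x, double *y)
{
    #pragma omp parallel for   /* one row of y per iteration: no races */
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int j = 0; j < n; j++)
            sum += a[i * n + j] * x[j];
        y[i] = sum;
    }
}
```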

Problem 8. Write a sequential C main program to compute a dot product of two large vectors a and b. Assume that the sizes of a and b are divisible by the number of threads. Write an OpenMP code to calculate the dot product and use the reduction clause to calculate the final result.
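A sketch of the reduction version (names are illustrative): reduction(+:sum) gives each thread a private partial sum and combines them at the end, the same pattern MPI_Reduce expresses across processes. Without OpenMP support the loop simply runs sequentially.

```c
/* Dot product with an OpenMP sum reduction. */
double omp_dot(int n, const double *a, const double *b)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)  /* private partial sums, combined at the end */
    for (int i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```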

Problem 9. Adding matrix elements. Write and run two C/OpenMP programs for adding the elements of a square matrix a. Implement the two versions of the loops shown on the original slide. The value of n should be 100 × (number of threads). Time both codes. Which of the two versions runs faster? Explain why.
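The original slide's two loop versions were lost in extraction; a plausible reading (an assumption, stated plainly) is the two traversal orders of a row-major matrix, sketched below. Both compute the same sum; the row-order version touches memory with unit stride, the column-order version with stride n, which is the timing difference the exercise asks students to explain.

```c
/* Version 1: rows traversed contiguously (cache friendly for row-major C). */
double sum_by_rows(int n, const double *a)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            sum += a[i * n + j];   /* unit stride */
    return sum;
}

/* Version 2: columns traversed, striding n doubles between accesses. */
double sum_by_cols(int n, const double *a)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int j = 0; j < n; j++)
        for (int i = 0; i < n; i++)
            sum += a[i * n + j];   /* stride n: poor cache locality */
    return sum;
}
```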