Parallel Algorithms Lecture Notes

Motivation
Programs face two perennial problems:
– Time: run faster in solving a problem. Example: speed up the time needed to sort 10 million records.
– Size: solve a "bigger" problem. Example: multiply matrices of large dimensions (a PC with 512 MB RAM can hold at most an 8192*8192 matrix of doubles, at 8 bytes per element).
Possible solution: parallelism
– Split a problem into several tasks and perform them in parallel.
– A parallel computer, broadly defined: a set of processors able to work cooperatively to solve a computational problem. This includes parallel supercomputers, clusters of workstations, and multiple-processor workstations.

Concepts …

Logical vs physical parallelism
A concurrent program with 3 processes: P0, P1, P2.
– Executed on a system with 1 processor: the processes are interleaved in time on the single processor. Logical parallelism (multiprogramming).
– Executed on a system with 3 processors: the processes run simultaneously, one per processor. Physical parallelism (multiprocessing).
[Figure: execution timelines of P0, P1 and P2 on one processor vs. on three processors]

concurrent-distributed-parallel

parallel vs. distributed
Parallel computing:
– Most often all processors are of the same type
– Most often the processors are in the same location
– Overall goal = speed (doing a job faster)
Distributed computing:
– Processors are heterogeneous
– Processors are distributed over a wide area
– Overall goal = convenience (using resources, increasing reliability)

Parallelizing sequential code
The enabling condition for doing two tasks in parallel: no dependences between them!
Parallelizing compilers (compiling sequential programs into parallel code) have been a research goal since the 1970s. A simple illustration of the dependence criterion follows.
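A minimal sketch of the dependence criterion (the function and array names are made up for illustration, not from the slides):

  #include <stddef.h>

  /* Iterations are independent: no iteration reads a value written by
   * another iteration, so a parallelizing compiler may legally split
   * the loop across processors. */
  void vector_add(double *a, const double *b, const double *c, size_t n) {
      for (size_t i = 0; i < n; i++)
          a[i] = b[i] + c[i];
  }

  /* Every iteration reads 'sum' as left by the previous iteration
   * (a loop-carried dependence), so this loop cannot simply be split;
   * the algorithm itself has to be re-thought, as the next slides show. */
  double sum_all(const double *a, size_t n) {
      double sum = 0.0;
      for (size_t i = 0; i < n; i++)
          sum += a[i];
      return sum;
  }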

Example: Adding n numbers
Sequential solution, O(n):

  sum = 0;
  for (i = 0; i < n; i++) {
      sum += A[i];
  }

The sequential algorithm cannot be straightforwardly parallelized, since every iteration depends on the result of the previous one.

Parallelizing = re-thinking the algorithm!
– Summing in sequence: always O(n).
– Summing in pairs: with P = 1 processor, O(n); with P = n/2 processors, O(log n). See the sketch below.
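A sequential sketch of the summing-in-pairs idea (assuming, for simplicity, that n is a power of 2 and that the array may be overwritten; the slides do not fix these details):

  #include <stddef.h>

  /* In each round the first 'half' elements absorb their partners:
   * A[i] += A[i + half]. All additions within one round are mutually
   * independent, so with P = n/2 processors each round takes O(1) time,
   * and after log2(n) rounds the total is in A[0]. */
  double pairwise_sum(double A[], size_t n) {
      for (size_t half = n / 2; half >= 1; half /= 2) {
          for (size_t i = 0; i < half; i++)
              A[i] += A[i + half];      /* independent across i */
      }
      return A[0];
  }

With P = 1 processor the same code still performs n - 1 additions, which is the O(n) case from the slide.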

It is not likely that a compiler will produce good parallel code from a sequential specification any time soon.
– Fact: for most computations, a "best" sequential solution (practically, not theoretically) and a "best" parallel solution are usually fundamentally different.
– Different solution paradigms imply the computations are not "simply" related, and compiler transformations generally preserve the solution paradigm.
– Therefore the programmer must discover the parallel solution!

Sequential vs parallel programming
Parallel programming:
– Has different costs and different advantages
– Requires different, unfamiliar algorithms
– Must use different abstractions
– Makes it more complex to understand a program's behavior
– Makes it more difficult to control the interactions of the program's components
– Knowledge, tools and understanding are more primitive

Example: Count the number of 3's
Sequential solution, O(n):

  count = 0;
  for (i = 0; i < length; i++) {
      if (array[i] == 3)
          count++;
  }

Example: Trial solution 1
Divide the array into t = 4 chunks and assign each chunk to a different concurrent task, identified by id = 0 ... t-1.
Code of each task:

  int length_per_thread = length / t;
  int start = id * length_per_thread;
  for (i = start; i < start + length_per_thread; i++) {
      if (array[i] == 3)
          count += 1;
  }

Problem: race condition! This is not a correct concurrent program: accesses to the same shared memory location (the variable count) must be protected.
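Why the unprotected count += 1 loses updates: the increment is really a load / add / store sequence, and the steps of two tasks can interleave. A hypothetical interleaving, written out as straight-line code purely for illustration:

  int count = 7;            /* shared counter before the two increments  */

  void lost_update(void) {
      int tmp0 = count;     /* task 0 loads 7                            */
      int tmp1 = count;     /* task 1 also loads 7                       */
      count = tmp0 + 1;     /* task 0 stores 8                           */
      count = tmp1 + 1;     /* task 1 stores 8: one increment is lost    */
  }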

Example: Trial solution 2
Correct the previous trial solution by adding a mutex lock to prevent concurrent accesses to the shared variable count.
Code of each task:

  mutex m;      /* shared by all tasks */
  for (i = start; i < start + length_per_thread; i++) {
      if (array[i] == 3) {
          mutex_lock(m);
          count++;
          mutex_unlock(m);
      }
  }

Problem: VERY slow! There is no real parallelism: the lock is taken for every matching element, so the tasks spend their time waiting for one another.

Example: Trial solution 3
Each processor adds into its own private counter; the partial counts are combined at the end.
Code of each task:

  for (i = start; i < start + length_per_thread; i++) {
      if (array[i] == 3)
          private_count[id]++;
  }
  mutex_lock(m);
  count += private_count[id];
  mutex_unlock(m);

Problem: STILL no speedup is measured when using more than 1 processor!
Reason: false sharing. The private counters are adjacent in memory and fall on the same cache line, so every write by one processor invalidates that line in the other processors' caches.

Example: false sharing

Example: Solution 4
Force each private counter onto a separate cache line by "padding" it with "unused" locations:

  struct padded_int {
      int value;
      char padding[128];
  } private_count[MaxThreads];

Finally a speedup is measured when using more than 1 processor!
Conclusion: producing correct and efficient parallel programs can be considerably more difficult than writing correct and efficient serial programs!
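For completeness, a runnable sketch putting trial solution 3 and solution 4 together. POSIX threads and the small test harness are assumptions made here; the slides' pseudocode does not name a threading library.

  #include <pthread.h>
  #include <stdio.h>

  #define T       4                       /* number of tasks/threads     */
  #define LENGTH  4000000                 /* divisible by T              */

  struct padded_int {
      int  value;
      char padding[128];                  /* keep counters on separate cache lines */
  };

  static int array[LENGTH];
  static struct padded_int private_count[T];
  static int count = 0;
  static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

  static void *count3s_task(void *arg) {
      long id = (long)arg;
      int length_per_thread = LENGTH / T;
      int start = id * length_per_thread;

      for (int i = start; i < start + length_per_thread; i++)
          if (array[i] == 3)
              private_count[id].value++;  /* private: no locking, no sharing */

      pthread_mutex_lock(&m);             /* one short critical section per task */
      count += private_count[id].value;
      pthread_mutex_unlock(&m);
      return NULL;
  }

  int main(void) {
      for (int i = 0; i < LENGTH; i++)
          array[i] = i % 10;              /* one element in ten is a 3 */

      pthread_t th[T];
      for (long id = 0; id < T; id++)
          pthread_create(&th[id], NULL, count3s_task, (void *)id);
      for (int id = 0; id < T; id++)
          pthread_join(th[id], NULL);

      printf("count = %d (expected %d)\n", count, LENGTH / 10);
      return 0;
  }

Compile with something like cc -O2 -pthread count3s.c; swapping the task body for the earlier trial solutions makes the race and the lock contention easy to observe.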

Goals of Parallel Programming
– Performance: the parallel program runs faster than its sequential counterpart (a speedup is measured).
– Scalability: as the size of the problem grows, more processors can be "usefully" added to solve the problem faster.
– Portability: the solutions run well on different parallel platforms.
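For reference, the standard quantities behind "a speedup is measured" (textbook definitions, not spelled out on the slide): with T_1 the best sequential time and T_p the time on p processors,

  \[
      S(p) = \frac{T_1}{T_p}, \qquad E(p) = \frac{S(p)}{p}
  \]

Performance asks for S(p) > 1; scalability asks that the efficiency E(p) stays useful as both the problem size and p grow. For the summing-in-pairs example, T_1 = O(n) and T_{n/2} = O(log n), giving S(n/2) = O(n / log n).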