Slide 1 Course Description and Objectives: The aim of the module is –to introduce techniques for the design of efficient parallel algorithms and –their.

Slides:



Advertisements
Similar presentations
CSE 160 – Lecture 9 Speed-up, Amdahl’s Law, Gustafson’s Law, efficiency, basic performance metrics.
Advertisements

Prepared 7/28/2011 by T. O’Neil for 3460:677, Fall 2011, The University of Akron.
Optimal PRAM algorithms: Efficiency of concurrent writing “Computer science is no more about computers than astronomy is about telescopes.” Edsger Dijkstra.
Parallel vs Sequential Algorithms
ICS 556 Parallel Algorithms Ebrahim Malalla Office: Bldg 22, Room
CIS December '99 Introduction to Parallel Architectures Dr. Laurence Boxer Niagara University.
PRAM (Parallel Random Access Machine)
TECH Computer Science Parallel Algorithms  several operations can be executed at the same time  many problems are most naturally modeled with parallelism.
Advanced Topics in Algorithms and Data Structures Classification of the PRAM model In the PRAM model, processors communicate by reading from and writing.
PRAM Models Advanced Algorithms & Data Structures Lecture Theme 13 Prof. Dr. Th. Ottmann Summer Semester 2006.
Slide 1 COMP 308 Parallel Efficient Algorithms Lecturer: Dr. Igor Potapov Ashton Building, room COMP 308 web-page:
Advanced Topics in Algorithms and Data Structures An overview of the lecture 2 Models of parallel computation Characteristics of SIMD models Design issue.
Overview Efficient Parallel Algorithms COMP308. COMP 308 Exam Time allowed : 2.5 hours Answer four questions (out of six). If you attempt to answer more.
Introduction to Parallel Processing Ch. 12, Pg
1a-1.1 Parallel Computing Demand for High Performance ITCS 4/5145 Parallel Programming UNC-Charlotte, B. Wilkinson Dec 27, 2012 slides1a-1.
Designing and Evaluating Parallel Programs Anda Iamnitchi Federated Distributed Systems Fall 2006 Textbook (on line): Designing and Building Parallel Programs.
1 Interconnects Shared address space and message passing computers can be constructed by connecting processors and memory unit using a variety of interconnection.
1 Chapter 1 Parallel Machines and Computations (Fundamentals of Parallel Processing) Dr. Ranette Halverson.
Lappeenranta University of Technology / JP CT30A7001 Concurrent and Parallel Computing Introduction to concurrent and parallel computing.
Parallel Algorithms Sorting and more. Keep hardware in mind When considering ‘parallel’ algorithms, – We have to have an understanding of the hardware.
COMP308 Efficient Parallel Algorithms
Pipeline And Vector Processing. Parallel Processing The purpose of parallel processing is to speed up the computer processing capability and increase.
Flynn’s Taxonomy SISD: Although instruction execution may be pipelined, computers in this category can decode only a single instruction in unit time SIMD:
Edgar Gabriel Short Course: Advanced programming with MPI Edgar Gabriel Spring 2007.
Introduction, background, jargon Jakub Yaghob. Literature T.G.Mattson, B.A.Sanders, B.L.Massingill: Patterns for Parallel Programming, Addison- Wesley,
Performance Measurement n Assignment? n Timing #include double When() { struct timeval tp; gettimeofday(&tp, NULL); return((double)tp.tv_sec + (double)tp.tv_usec.
Parallel Processing - introduction  Traditionally, the computer has been viewed as a sequential machine. This view of the computer has never been entirely.
CS453 Lecture 3.  A sequential algorithm is evaluated by its runtime (in general, asymptotic runtime as a function of input size).  The asymptotic runtime.
Chapter 2 Parallel Architecture. Moore’s Law The number of transistors on a chip doubles every years. – Has been valid for over 40 years – Can’t.
RESOURCES, TRADE-OFFS, AND LIMITATIONS Group 5 8/27/2014.
Performance Measurement. A Quantitative Basis for Design n Parallel programming is an optimization problem. n Must take into account several factors:
1 Lectures on Parallel and Distributed Algorithms COMP 523: Advanced Algorithmic Techniques Lecturer: Dariusz Kowalski Lectures on Parallel and Distributed.
April 26, CSE8380 Parallel and Distributed Processing Presentation Hong Yue Department of Computer Science & Engineering Southern Methodist University.
Major objective of this course is: Design and analysis of modern algorithms Different variants Accuracy Efficiency Comparing efficiencies Motivation thinking.
Computer Science and Engineering Parallel and Distributed Processing CSE 8380 February 3, 2005 Session 7.
Super computers Parallel Processing By Lecturer: Aisha Dawood.
INTRODUCTION TO PARALLEL ALGORITHMS. Objective  Introduction to Parallel Algorithms Tasks and Decomposition Processes and Mapping Processes Versus Processors.
Parallel Computing.
Data Structures and Algorithms in Parallel Computing Lecture 1.
2016/1/5Part I1 Models of Parallel Processing. 2016/1/5Part I2 Parallel processors come in many different varieties. Thus, we often deal with abstract.
Outline Why this subject? What is High Performance Computing?
Lecture 3: Computer Architectures
3/12/2013Computer Engg, IIT(BHU)1 INTRODUCTION-1.
Parallel Computing Presented by Justin Reschke
CSCI-455/552 Introduction to High Performance Computing Lecture 21.
Concurrency and Performance Based on slides by Henri Casanova.
LECTURE #1 INTRODUCTON TO PARALLEL COMPUTING. 1.What is parallel computing? 2.Why we need parallel computing? 3.Why parallel computing is more difficult?
Computer Science and Engineering Parallel and Distributed Processing CSE 8380 April 28, 2005 Session 29.
COMP7330/7336 Advanced Parallel and Distributed Computing Task Partitioning Dr. Xiao Qin Auburn University
COMP7330/7336 Advanced Parallel and Distributed Computing Task Partitioning Dynamic Mapping Dr. Xiao Qin Auburn University
Classification of parallel computers Limitations of parallel processing.
Lecture 13 Parallel Processing. 2 What is Parallel Computing? Traditionally software has been written for serial computation. Parallel computing is the.
These slides are based on the book:
Auburn University
Auburn University COMP7330/7336 Advanced Parallel and Distributed Computing Exploratory Decomposition Dr. Xiao Qin Auburn.
Introduction to Parallel Computing
PARALLEL COMPUTING Submitted By : P. Nagalakshmi
Distributed and Parallel Processing
Parallel Processing - introduction
Auburn University COMP7330/7336 Advanced Parallel and Distributed Computing Mapping Techniques Dr. Xiao Qin Auburn University.
Auburn University COMP7330/7336 Advanced Parallel and Distributed Computing Data Partition Dr. Xiao Qin Auburn University.
Team 1 Aakanksha Gupta, Solomon Walker, Guanghong Wang
Pipelining and Vector Processing
Data Structures and Algorithms in Parallel Computing
Objective of This Course
Introduction to Parallel Computing by Grama, Gupta, Karypis, Kumar
AN INTRODUCTION ON PARALLEL PROCESSING
PERFORMANCE MEASURES. COMPUTATIONAL MODELS Equal Duration Model:  It is assumed that a given task can be divided into n equal subtasks, each of which.
Module 6: Introduction to Parallel Computing
Presentation transcript:

Slide 1 Course Description and Objectives: The aim of the module is –to introduce techniques for the design of efficient parallel algorithms and –their implementation.

Slide 2 At the end of the course you will be:  familiar with the wide applicability of graph theory and tree algorithms as an abstraction for the analysis of many practical problems,  familiar with the efficient parallel algorithms related to many areas of computer science: expression computation, sorting, graph-theoretic problems, computational geometry, algorithmics of texts etc.  familiar with the basic issues of implementing parallel algorithms. Also a knowledge will be acquired of those problems which have been perceived as intractable for parallelization. Learning Outcomes:

Slide 3 Teaching method Series of 30 lectures ( 3hrs per week ) LectureMonday LectureTuesday LectureFriday Course Assessment A two-hour examination80% Continues assignment (Written class test)20%

Slide 4 Recommended Course Textbooks Introduction to Algorithms Cormen et al. Introduction to Parallel Computing: Design and Analysis of Algorithms Vipin Kumar, Ananth Grama, Anshul Gupta, and George Karypis, Benjamin Cummings 2 nd ed Efficient Parallel Algorithms A.Gibbons, W.Rytter, Cambridge University Press 1988.

Slide 5 What is Parallel Computing? Consider the problem of stacking (reshelving) a set of library books. –A single worker trying to stack all the books in their proper places cannot accomplish the task faster than a certain rate. –We can speed up this process, however, by employing more than one worker.

Slide 6 Solution 1 Assume that books are organized into shelves and that the shelves are grouped into bays One simple way to assign the task to the workers is: –To divide the books equally among them. –Each worker stacks the books one a time This division of work may not be most efficient way to accomplish the task since –The workers must walk all over the library to stack books.

Slide 7 Solution 2 An alternative way to divide the work is to assign a fixed and disjoint set of bays to each worker. As before, each worker is assigned an equal number of books arbitrarily. –If the worker finds a book that belongs to a bay assigned to him or her, he or she places that book in its assignment spot –Otherwise, He or she passes it on to the worker responsible for the bay it belongs to. The second approach requires less effort from individual workers Instance of task partitioning Instance of Communication task

Slide 8 Problems are parallelizable to different degrees For some problems, assigning partitions to other processors might be more time-consuming than performing the processing locally. Other problems may be completely serial. –For example, consider the task of digging a post hole. Although one person can dig a hole in a certain amount of time, Employing more people does not reduce this time

Slide 9 Sorting in nature

Slide 10 Parallel Processing (Several processing elements working to solve a single problem) Primary consideration: elapsed time –NOT: throughput, sharing resources, etc. Downside: complexity –system, algorithm design Elapsed Time = computation time + communication time + synchronization time

Slide 11 Design of efficient algorithms A parallel computer is of little use unless efficient parallel algorithms are available. –The issue in designing parallel algorithms are very different from those in designing their sequential counterparts. –A significant amount of work is being done to develop efficient parallel algorithms for a variety of parallel architectures.

Slide 12 The basic parallel complexity class is NC. NC is a class of problems computable in poly- logarithmic time (log c n, for a constant c) using a polynomial number of processors. P is a class of problems computable sequentially in a polynomial time The main open question in parallel computations is NC = P ? The main open question

Slide 13 Efficient and optimal parallel algorithms A parallel algorithm is efficient iff –it is fast (e.g. polynomial time) and –the product of the parallel time and number of processors is close to the time of at the best know sequential algorithm T sequential  T parallel  N processors A parallel algorithms is optimal iff this product is of the same order as the best known sequential time

Slide 14 Processor Trends Moore’s Law –performance doubles every 18 months Parallelization within processors –pipelining –multiple pipelines

Slide 15

Slide 16

Slide 17 Why Parallel Computing Practical: –Moore’s Law cannot hold forever –Problems must be solved immediately –Cost-effectiveness –Scalability Theoretical: –challenging problems

Slide 18 Some Complex Problems N-body simulation Atmospheric simulation Image generation Oil exploration Financial processing Computational biology

Slide 19 Some Complex Problems N-body simulation –O(n log n) time –galaxy  stars  approx. one year / iteration Atmospheric simulation –3D grid, each element interacts with neighbors –1x1x1 mile element  5  10 8 elements –10 day simulation requires approx. 100 days

Slide 20 Some Complex Problems Image generation –animation, special effects –several minutes of video  50 days of rendering Oil exploration –large amounts of seismic data to be processed –months of sequential exploration

Slide 21 Some Complex Problems Financial processing –market prediction, investing –Cornell Theory Center, Renaissance Tech. Computational biology –drug design –gene sequencing (Celera) –structure prediction (Proteomics)

Slide 22 Fundamental Issues Is the problem amenable to parallelization? How to decompose the problem to exploit parallelism? What machine architecture should be used? What parallel resources are available? What kind of speedup is desired?

Slide 23 Two Kinds of Parallelism Pragmatic –goal is to speed up a given computation as much as possible –problem-specific –techniques include: overlapping instructions (multiple pipelines) overlapping I/O operations (RAID systems) “traditional” (asymptotic) parallelism techniques

Slide 24 Two Kinds of Parallelism Asymptotic –studies: architectures for general parallel computation parallel algorithms for fundamental problems limits of parallelization –can be subdivided into three main areas

Slide 25 Asymptotic Parallelism Models –comparing/evaluating different architectures Algorithm Design –utilizing a given architecture to solve a given problem Computational Complexity –classifying problems according to their difficulty

Slide 26 Architecture Single processor: –single instruction stream –single data stream –von Neumann model Multiple processors: –Flynn’s taxonomy

Slide 27

Slide 28 MISD SISD MIMD SIMD 1 Many 1 Data Streams Instruction Streams Flynn’s Taxonomy

Slide 29

Slide 30

Slide 31 Parallel Architectures Multiple processing elements Memory: –shared –distributed –hybrid Control: –centralized –distributed

Slide 32 Parallel vs Distributed Computing Parallel: –several processing elements concurrently solving a single same problem Distributed: –processing elements do not share memory or system clock Which is the subset of which? –distributed is a subset of parallel

Slide 33 Parallelization Control vs Data parallel: –control: different operations on different data elements –data: same operations on different data elements Coarse vs Fine grained: –algorithm granularity: ratio of computation to communication time –architecture granularity: ratio of computation to communication cost

Slide 34 An Idealized Parallel Computer PRAM –EREW –CREW –ERCW –CRCW