1
Slide 1 COMP 308 Parallel Efficient Algorithms Lecturer: Dr. Igor Potapov Ashton Building, room 3.15 E-mail: igor@csc.liv.ac.uk COMP 308 web-page: http://www.csc.liv.ac.uk/~igor/COMP308 Introduction to Parallel Computation
2
Slide 2 Course Description and Objectives: The aim of the module is to introduce techniques for the design of efficient parallel algorithms and their implementation.
3
Slide 3 Learning Outcomes: At the end of the course you will be: familiar with the wide applicability of graph theory and tree algorithms as an abstraction for the analysis of many practical problems; familiar with efficient parallel algorithms from many areas of computer science (expression computation, sorting, graph-theoretic problems, computational geometry, algorithmics of texts, etc.); familiar with the basic issues of implementing parallel algorithms. You will also learn about problems that are perceived as intractable for parallelization.
4
Slide 4 Teaching method: series of 30 lectures (3 hrs per week). Lectures: Monday 10.00, Tuesday 10.00, Friday 10.00. Course assessment: two-hour examination 80%; continuous assessment (written class test + home assignment) 20%.
5
Slide 5 Recommended Course Textbooks: Introduction to Algorithms, Cormen et al.; Introduction to Parallel Computing: Design and Analysis of Algorithms, Vipin Kumar, Ananth Grama, Anshul Gupta, and George Karypis, Benjamin Cummings, 2nd ed., 2003; Efficient Parallel Algorithms, A. Gibbons, W. Rytter, Cambridge University Press, 1988. + Research papers (will be announced later)
6
Slide 6 What is Parallel Computing? (basic idea) Consider the problem of stacking (reshelving) a set of library books. – A single worker trying to stack all the books in their proper places cannot accomplish the task faster than a certain rate. – We can speed up this process, however, by employing more than one worker.
7
Slide 7 Solution 1 Assume that books are organized into shelves and that the shelves are grouped into bays. One simple way to assign the task to the workers is: – To divide the books equally among them. – Each worker stacks the books one at a time. This division of work may not be the most efficient way to accomplish the task, since – The workers must walk all over the library to stack books.
8
Slide 8 Solution 2 An alternative way to divide the work is to assign a fixed and disjoint set of bays to each worker (an instance of task partitioning). As before, each worker is assigned an equal number of books arbitrarily. – If the worker finds a book that belongs to a bay assigned to him or her, he or she places that book in its assigned spot. – Otherwise, he or she passes it on to the worker responsible for the bay it belongs to (an instance of communication). The second approach requires less effort from individual workers.
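The second scheme can be sketched as a small simulation. This is only an illustration under stated assumptions: books are represented by the number of the bay they belong to, bay b is owned by worker b mod num_workers, and the names `reshelve` and `owner_of_bay` are hypothetical, not taken from the slides.

```python
from collections import Counter


def reshelve(books, num_workers):
    """Simulate Solution 2: each worker owns a fixed, disjoint set of bays.

    `books` is a list of bay numbers, one entry per book. The ownership
    rule and all names are illustrative assumptions.
    """
    # Task partitioning: bay b is the responsibility of worker b % num_workers.
    def owner_of_bay(bay):
        return bay % num_workers

    # Books are handed out arbitrarily in equal shares (here: round-robin).
    initial_piles = [books[w::num_workers] for w in range(num_workers)]

    shelved = Counter()  # number of books each worker finally places
    passed = 0           # communication: books handed to another worker
    for worker, pile in enumerate(initial_piles):
        for bay in pile:
            owner = owner_of_bay(bay)
            if owner != worker:
                passed += 1      # pass the book to the responsible worker
            shelved[owner] += 1  # the owner places it in its own bay
    return shelved, passed


if __name__ == "__main__":
    shelved, passed = reshelve([0, 1, 1, 0], num_workers=2)
    print("books shelved per worker:", dict(shelved))
    print("books passed between workers:", passed)
```

The `passed` counter is the communication overhead that the partitioning introduces; each worker in exchange only walks within its own bays.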
9
Slide 9 Problems are parallelizable to different degrees. For some problems, assigning partitions to other processors might be more time-consuming than performing the processing locally. Other problems may be completely serial. – For example, consider the task of digging a post hole. Although one person can dig a hole in a certain amount of time, employing more people does not reduce this time.
10
Slide 10 Power of parallel solutions Pile collection – Ants/robots with very limited abilities (each sees only its neighbourhood) – Grid environment (sticks and robots) Move(): move randomly until the robot sees a stick in its neighbourhood. Collect(): Move(); pick up a stick; Move(); put it down; Collect();
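A toy simulation can make the Move()/Collect() loop concrete. The sketch below is a simplified variant, not the lecturer's exact procedure: it assumes a wrap-around grid, takes a robot's "neighbourhood" to be only the cell it stands on, and uses hypothetical names (`collect_sticks`, `size`, `steps`, `seed`).

```python
import random


def collect_sticks(size=10, num_sticks=15, num_robots=3, steps=2000, seed=0):
    """Toy pile collection on a size x size wrap-around grid.

    Assumption: a robot senses only its own cell. An empty-handed robot
    that finds sticks picks one up; a robot carrying a stick that finds
    sticks puts its stick down on that pile.
    """
    rng = random.Random(seed)
    grid = [[0] * size for _ in range(size)]
    for _ in range(num_sticks):
        grid[rng.randrange(size)][rng.randrange(size)] += 1
    initial_piles = sum(1 for row in grid for cell in row if cell > 0)

    # Each robot is [x, y, carrying].
    robots = [[rng.randrange(size), rng.randrange(size), False]
              for _ in range(num_robots)]

    for _ in range(steps):
        for robot in robots:
            # Move(): one random step to a neighbouring cell.
            dx, dy = rng.choice([(0, 1), (1, 0), (0, -1), (-1, 0)])
            robot[0] = (robot[0] + dx) % size
            robot[1] = (robot[1] + dy) % size
            x, y, carrying = robot
            if grid[x][y] > 0:
                if carrying:
                    grid[x][y] += 1   # put the stick down on this pile
                    robot[2] = False
                else:
                    grid[x][y] -= 1   # pick up a stick
                    robot[2] = True

    final_piles = sum(1 for row in grid for cell in row if cell > 0)
    on_grid = sum(sum(row) for row in grid)
    carried = sum(1 for robot in robots if robot[2])
    return initial_piles, final_piles, on_grid, carried


if __name__ == "__main__":
    print(collect_sticks())
```

Because sticks are only ever dropped onto existing piles, the number of piles can never grow in this variant, which is why many simple robots tend to gather the sticks into fewer piles over time.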
11
Slide 11 Sorting in nature 6 2 1 3 5 7 4
12
Slide 12 Parallel Processing (Several processing elements working to solve a single problem) Primary consideration: elapsed time – NOT: throughput, sharing resources, etc. Downside: complexity – system, algorithm design Elapsed Time = computation time + communication time + synchronization time
13
Slide 13 Design of efficient algorithms A parallel computer is of little use unless efficient parallel algorithms are available. – The issues in designing parallel algorithms are very different from those in designing their sequential counterparts. – A significant amount of work is being done to develop efficient parallel algorithms for a variety of parallel architectures.
14
Slide 14 Processor Trends Moore’s Law – performance doubles roughly every 18 months (the popular formulation; the original observation concerns transistor counts) Parallelization within processors – pipelining – multiple pipelines
15
Slide 15 Why Parallel Computing Practical: – Moore’s Law cannot hold forever – Problems must be solved immediately – Cost-effectiveness – Scalability Theoretical: – challenging problems
16
Slide 16 Some Complex Problems N-body simulation Atmospheric simulation Image generation Oil exploration Financial processing Computational biology
17
Slide 17 Some Complex Problems N-body simulation – O(n log n) time – a galaxy of 10^11 stars: approx. one year per iteration Atmospheric simulation – 3D grid, each element interacts with neighbors – 1x1x1 mile elements: 5 x 10^8 elements – a 10-day simulation requires approx. 100 days
18
Slide 18 Some Complex Problems Image generation – animation, special effects – several minutes of video: 50 days of rendering Oil exploration – large amounts of seismic data to be processed – months of sequential exploration
19
Slide 19 Some Complex Problems Financial processing – market prediction, investing – Cornell Theory Center, Renaissance Tech. Computational biology – drug design – gene sequencing (Celera) – structure prediction (Proteomics)
20
Slide 20 Fundamental Issues Is the problem amenable to parallelization? How to decompose the problem to exploit parallelism? What machine architecture should be used? What parallel resources are available? What kind of speedup is desired?
21
Slide 21 Two Kinds of Parallelism Pragmatic – goal is to speed up a given computation as much as possible – problem-specific – techniques include: overlapping instructions (multiple pipelines) overlapping I/O operations (RAID systems) “traditional” (asymptotic) parallelism techniques
22
Slide 22 Two Kinds of Parallelism Asymptotic – studies: architectures for general parallel computation parallel algorithms for fundamental problems limits of parallelization – can be subdivided into three main areas
23
Slide 23 Asymptotic Parallelism Models – comparing/evaluating different architectures Algorithm Design – utilizing a given architecture to solve a given problem Computational Complexity – classifying problems according to their difficulty
24
Slide 24 Architecture Single processor: – single instruction stream – single data stream – von Neumann model Multiple processors: – Flynn’s taxonomy
25
Slide 25 Flynn’s Taxonomy
                            1 data stream    Many data streams
1 instruction stream        SISD             SIMD
Many instruction streams    MISD             MIMD
26
Slide 26
27
Slide 27 Parallel Architectures Multiple processing elements Memory: – shared – distributed – hybrid Control: – centralized – distributed
28
Slide 28 Parallel vs Distributed Computing Parallel: – several processing elements concurrently solving the same single problem Distributed: – processing elements do not share memory or a system clock Which is a subset of which? – distributed is a subset of parallel
29
Slide 29 Efficient and optimal parallel algorithms A parallel algorithm is efficient iff – it is fast (e.g. polynomial time) and – the product of the parallel time and the number of processors is close to the time of the best known sequential algorithm: T_sequential ≈ T_parallel × N_processors. A parallel algorithm is optimal iff this product is of the same order as the best known sequential time.
30
Slide 30 Metrics A measure of relative performance between a multiprocessor system and a single-processor system is the speed-up S(p), defined as follows: S(p) = (execution time using a single-processor system) / (execution time using a multiprocessor with p processors), i.e. S(p) = T_1 / T_p. Efficiency: E_p = S(p) / p. Cost: C_p = p × T_p.
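The three metrics on this slide translate directly into code. The helper names below (`speedup`, `efficiency`, `cost`) and the sample timings are illustrative assumptions; only the formulas come from the slide.

```python
def speedup(t1, tp):
    """S(p) = T_1 / T_p: sequential time divided by parallel time."""
    return t1 / tp


def efficiency(t1, tp, p):
    """E_p = S(p) / p: fraction of the ideal p-fold speed-up achieved."""
    return speedup(t1, tp) / p


def cost(tp, p):
    """C_p = p * T_p: total processor-time spent by the parallel run."""
    return p * tp


if __name__ == "__main__":
    # Hypothetical measurements: 100 s sequentially, 25 s on 8 processors.
    t1, tp, p = 100.0, 25.0, 8
    print("speed-up:", speedup(t1, tp))
    print("efficiency:", efficiency(t1, tp, p))
    print("cost:", cost(tp, p))
```

An efficiency below 1 means the cost p × T_p exceeds the sequential time T_1, so some processor-time is lost to overhead or idling.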
31
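Slide 31 Metrics A parallel algorithm is cost-optimal when parallel cost = sequential time: C_p = T_1, E_p = 100%. This is critical when down-scaling: a parallel implementation that is not cost-optimal may become slower than the sequential one when run on fewer processors. Example: T_1 = n^3 and T_p = n^2.5 when p = n^2, so C_p = n^4.5, which exceeds T_1.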
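The down-scaling example can be checked numerically. The sketch below assumes the usual scaling-down argument, in which each of q available processors simulates p / q of the original ones with no extra overhead, so the running time becomes C_p / q; that assumption and all function names are illustrative rather than taken from the slides.

```python
def sequential_time(n):
    """T_1 = n^3 from the slide's example."""
    return n ** 3


def parallel_cost(n):
    """C_p = p * T_p = n^2 * n^2.5 = n^4.5."""
    return n ** 2 * n ** 2.5


def scaled_down_time(n, q):
    """Time on q <= n^2 processors, assuming the work C_p divides evenly."""
    return parallel_cost(n) / q


if __name__ == "__main__":
    n = 16
    for q in (1, 8, 64, 256):
        t = scaled_down_time(n, q)
        verdict = "slower" if t > sequential_time(n) else "not slower"
        print(f"q={q}: time {t:.0f} vs sequential {sequential_time(n)} -> {verdict}")
```

Under this assumption the parallel algorithm only matches the sequential one once q reaches about n^1.5 processors; with fewer, the sequential algorithm wins.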
32
Slide 32 Amdahl’s Law f = fraction of the problem that’s inherently sequential (1 – f) = fraction that’s parallel Parallel time: T_p = f × T_1 + (1 – f) × T_1 / p Speedup with p processors: S(p) = T_1 / T_p = 1 / (f + (1 – f) / p)
33
Slide 33 What kind of speed-up may be achieved? Part f is computed by a single processor. Part (1 – f) is computed by p processors, p > 1. Basic observation: increasing p cannot speed up part f.
34
Slide 34 Amdahl’s Law Upper bound on speedup (p = ∞): S(∞) = 1 / f, because the term (1 – f) / p in S(p) = 1 / (f + (1 – f) / p) converges to 0. Example: f = 2% gives S = 1 / 0.02 = 50.
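A short sketch shows how the speed-up approaches this bound as p grows. It uses the standard form of Amdahl's Law, S(p) = 1 / (f + (1 - f) / p); the function names are illustrative assumptions.

```python
def amdahl_speedup(f, p):
    """Speed-up with p processors when a fraction f is inherently sequential."""
    return 1.0 / (f + (1.0 - f) / p)


def amdahl_limit(f):
    """Upper bound on speed-up as p grows without limit: 1 / f."""
    return 1.0 / f


if __name__ == "__main__":
    f = 0.02  # the slide's example: 2% sequential
    for p in (1, 10, 100, 1000, 10000):
        print(f"p={p}: speed-up {amdahl_speedup(f, p):.2f}")
    print("limit:", amdahl_limit(f))
```

Even with thousands of processors the speed-up stays below 1 / f, which is why reducing the sequential fraction matters more than adding hardware.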
35
Slide 35 The main open question The basic parallel complexity class is NC. NC is the class of problems computable in poly-logarithmic time (O(log^c n) for a constant c) using a polynomial number of processors. P is the class of problems computable sequentially in polynomial time. The main open question in parallel computation is: NC = P?
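As a concrete example of a problem in NC, summing n numbers takes O(log n) parallel steps with about n / 2 processors using a tree reduction. The sketch below simulates this sequentially and counts the parallel rounds; the function name `parallel_sum` is an illustrative assumption, and the example is not from the slides.

```python
def parallel_sum(values):
    """Simulate a tree reduction and count the parallel rounds it needs.

    In each round, "processor" i adds elements 2i and 2i+1; all additions
    of a round are independent, so a real parallel machine could perform
    them at the same time.
    """
    level = list(values)
    rounds = 0
    while len(level) > 1:
        next_level = [level[i] + level[i + 1]
                      for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            next_level.append(level[-1])  # odd element carries over unchanged
        level = next_level
        rounds += 1
    total = level[0] if level else 0
    return total, rounds


if __name__ == "__main__":
    total, rounds = parallel_sum(range(1, 9))
    print("sum:", total, "parallel rounds:", rounds)
```

The number of rounds grows as the ceiling of log2(n), while the number of processors needed is polynomial (here linear) in n, matching the definition of NC.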