Parallel Algorithms: Sorting and More
Keep hardware in mind
When considering 'parallel' algorithms:
– We need an understanding of the hardware they will run on
– With sequential algorithms, we do this implicitly
Creative use of processing power
Lots of data = need for speed
For roughly 20 years: parallel processing
– Studying how to use multiple processors together
– Really large and complex computations
– Parallel processing was an active sub-field of CS
Since 2005: the era of multicore is here
– All computers will have more than one processing unit
Traditional Computing Machine
Von Neumann model:
– The stored-program computer
What is this? Abstractly, what does it look like?
New twist: multiple control units
It's difficult to make a single CPU any faster
– To increase potential speed, add more CPUs
– These CPUs are called cores
Abstractly, what might this look like in these new machines?
Shared memory model
Multiple processors can access the same memory locations
May not scale over time as we increase the number of cores
Other 'parallel' configurations: clusters of computers
– A network connects them
Other 'parallel' configurations: massive data centers
Clusters and data centers use the distributed memory model
Algorithms
We will use the term "processor" for the processing unit that executes instructions.
When considering how to design algorithms for these architectures:
– It is useful to start with a base theoretical model
– Revise when implementing on different hardware with software packages (a parallel computing course covers this)
Also consider:
– Memory-location access by 'competing'/'cooperating' processors
– The theoretical arrangement of the processors
PRAM model
Parallel Random Access Machine: a theoretical model.
Abstractly, what does it look like?
How do processors access memory in the PRAM model?
PRAM model
Why is the PRAM model useful when studying algorithms?
PRAM model
Processors working in parallel
– Each trying to access memory values
– Memory value: what do we mean by this?
When designing algorithms, we need to consider what type of memory access the algorithm requires.
How might our theoretical computer work when many reads and writes happen at the same time?
Designing algorithms
With many algorithms, we're moving data around
– Sorting, for example. Others?
Concurrent reads (CR) by multiple processors
– Memory is not changed, so there are no 'conflicts'
Exclusive writes (EW)
– Design pseudocode so that each processor exclusively writes a data value into a memory location
Designing Algorithms
Arranging the processors
– Helpful for the design of an algorithm:
  We can envision how it works
  We can envision the data access pattern needed (EREW, CREW, or even CRCW)
– Not necessarily how the processors are arranged in practice, although some machines have been built this way
– What are some possible arrangements?
– Why might these arrangements prove useful for design?
Arrangements
Sorting in Parallel
Emphasis: merge sort
Sequential merge sort
Recursive
– Can envision a recursion tree

    function mergesort(m)
        var list left, right
        if length(m) ≤ 1
            return m
        else
            middle = length(m) / 2
            for each x in m up to middle
                add x to left
            for each x in m after middle
                add x to right
            left = mergesort(left)
            right = mergesort(right)
            result = merge(left, right)
            return result
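A minimal runnable Python version of the same pseudocode, offered only as a sketch (function names are ours, not from the slides):

    def mergesort(m):
        # Recursive merge sort, following the pseudocode above.
        if len(m) <= 1:
            return m
        middle = len(m) // 2
        left = mergesort(m[:middle])
        right = mergesort(m[middle:])
        # Merge: repeatedly take the smaller front element.
        result, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                result.append(left[i]); i += 1
            else:
                result.append(right[j]); j += 1
        return result + left[i:] + right[j:]

    print(mergesort([5, 1, 7, 3, 2]))  # prints [1, 2, 3, 5, 7]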
Parallel merge sort
Shared data: 2 lists in memory
Sort pairs once in parallel
The processes merge concurrently
How might we write the pseudocode?
Parallel merge sort
One answer, with processor numbering starting at 0:

    s = 2
    while s <= N
        do in parallel (N/s merges), for proc i:
            merge values from i*s to (i*s)+s-1
        s = s*2
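A hedged executable sketch of this loop in Python, assuming N is a power of two. The thread pool only stands in for the PRAM processors (CPython threads do not give true CPU parallelism), and each task writes to a disjoint slice, matching the exclusive-write design. All names are ours:

    from concurrent.futures import ThreadPoolExecutor
    from heapq import merge  # merges already-sorted iterables

    def parallel_mergesort(a):
        # Bottom-up merge sort over a shared list; each pass doubles
        # the sorted-run length s. Assumes len(a) is a power of two.
        n, s = len(a), 2
        with ThreadPoolExecutor() as pool:
            while s <= n:
                def merge_run(i, s=s):
                    # 'Processor' i exclusively owns a[i*s : i*s + s].
                    lo, mid, hi = i*s, i*s + s//2, i*s + s
                    a[lo:hi] = list(merge(a[lo:mid], a[mid:hi]))
                # Run the N/s merges for this pass concurrently.
                list(pool.map(merge_run, range(n // s)))
                s *= 2
        return a

    print(parallel_mergesort([5, 1, 7, 3, 8, 2, 6, 4]))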
Parallel Merge Sort
Work through the pseudocode with a larger N
Processor arrangement: binary tree
Memory access: EREW
What was the more practical implementation?
Let's try others
Different from sorting
Activity: Sum N integers
Suppose we have an array of N integers in memory, and we wish to sum them
– Variant: create a running sum in a new array
Devise a parallel algorithm for this
– Assume PRAM to start
– What processor arrangement did you use?
– What memory access is required?
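One possible answer, offered as a sketch: a tree reduction in which each round adds disjoint pairs, halving the number of values, so N values need about log2(N) rounds and the access pattern is EREW. Assumes N is a power of two; the Python thread pool and all names are ours:

    from concurrent.futures import ThreadPoolExecutor

    def parallel_sum(a):
        # Tree reduction: in each round, task i adds one disjoint pair,
        # so every read and write is exclusive (EREW).
        vals = list(a)  # assumes len(a) is a power of two
        with ThreadPoolExecutor() as pool:
            while len(vals) > 1:
                pairs = list(zip(vals[0::2], vals[1::2]))
                vals = list(pool.map(lambda p: p[0] + p[1], pairs))
        return vals[0]

    print(parallel_sum([3, 1, 4, 1, 5, 9, 2, 6]))  # prints 31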
Next Activity
Now suppose you need an algorithm for multiplying a matrix by a vector
[Figure: matrix A × vector X = result vector]
Devise a parallel algorithm for this
– Assume PRAM to start
– Think about what each process will compute: there are options
– What processor arrangement did you use?
– What memory access is required?
Matrix-Vector Multiplication
The matrix is assumed to be M x N. In other words:
– The matrix has M rows.
– The matrix has N columns.
– For example, a 3 x 2 matrix has 3 rows and 2 columns.
In matrix-vector multiplication, if the matrix is M x N, then the vector must have dimension N.
– In other words, the vector will have N entries.
– If the matrix is 3 x 2, then the vector must be 2-dimensional.
– This is usually stated as saying the matrix and vector must be conformable.
Then, if the matrix and vector are conformable, the product of the matrix and the vector is a resultant vector of dimension M. (So the result can be a different size than the original vector!)
For example, if the matrix is 3 x 2 and the vector is 2-dimensional, the result of the multiplication is a 3-dimensional vector.
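A worked instance of these dimensions (numbers chosen by us for illustration):

    [1 2]   [10]   [1*10 + 2*1]   [12]
    [3 4] x [ 1] = [3*10 + 4*1] = [34]
    [5 6]          [5*10 + 6*1]   [56]

A 3 x 2 matrix times a 2-dimensional vector yields a 3-dimensional result.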
Matrix-Vector Multiplication
Ways to do a parallel algorithm:
– One row of the matrix per processor
– One element of the matrix per processor (there is additional overhead involved: why?)
What if the number of rows M is larger than the number of processors?
Emerging theme: how to partition the data
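A sketch of the row-per-processor option in Python (all names ours, thread pool only illustrative): every task reads the whole vector x concurrently but writes only its own output entry, so the access pattern is CREW.

    from concurrent.futures import ThreadPoolExecutor

    def parallel_matvec(A, x):
        # One row per 'processor': task i computes the dot product of
        # row i with x. All tasks read x concurrently (CR); each writes
        # only its own result entry (EW), i.e., CREW access.
        def row_dot(row):
            return sum(a * b for a, b in zip(row, x))
        with ThreadPoolExecutor() as pool:
            return list(pool.map(row_dot, A))

    A = [[1, 2], [3, 4], [5, 6]]  # 3 x 2 matrix
    x = [10, 1]                   # 2-dimensional vector
    print(parallel_matvec(A, x))  # prints [12, 34, 56]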
Expand on previous example
Matrix-matrix multiplication
[Figure: matrix A x matrix B = ?]
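Extending the row-partitioning idea, a hedged sketch in which each task computes one full row of C = A x B (all tasks read B concurrently; names ours):

    from concurrent.futures import ThreadPoolExecutor

    def parallel_matmul(A, B):
        # One row of C per 'processor': C[i][j] = sum over k of A[i][k]*B[k][j].
        # Every task reads all of B (CR); each writes its own row of C (EW).
        def row_times_B(row):
            return [sum(a * B[k][j] for k, a in enumerate(row))
                    for j in range(len(B[0]))]
        with ThreadPoolExecutor() as pool:
            return list(pool.map(row_times_B, A))

    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    print(parallel_matmul(A, B))  # prints [[19, 22], [43, 50]]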