Data Parallel Pattern
ITCS 4/5145 Parallel Computing, UNC-Charlotte, B. Wilkinson, Oct 22, 2012
Data Parallel Computations
Same operation performed on different data elements simultaneously, i.e., in parallel. Fully synchronous: all processes operate in synchronism.
Particularly convenient because:
• Ease of programming (essentially only one program).
• Can scale easily to larger problem sizes.
• Many numeric and some non-numeric problems can be cast in a data parallel form.
Used in vector supercomputer designs in the 1970s. Versions appear in Intel processors as the SSE extensions. Currently used as the basis of GPU operations, see later.
Example: To add the same constant to each element of an array:
for (i = 0; i < n; i++)
    a[i] = a[i] + k;
The statement a[i] = a[i] + k; could be executed simultaneously by multiple processors, each using a different index i (0 <= i < n). Vector supercomputers were designed to operate this way with the single instruction, multiple data (SIMD) model.
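As a concrete sketch, the same element-wise update could be written as a CUDA kernel with one thread per array element. The kernel name and structure below are illustrative assumptions, not part of the original slides.

// Hypothetical CUDA kernel for the data parallel update a[i] = a[i] + k:
// each thread handles one array element, selected by its global index.
__global__ void addConstant(int *a, int k, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index
    if (i < n)                                       // guard the last partial block
        a[i] = a[i] + k;
}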
Using forall construct for data parallel pattern
Could use forall to specify data parallel operations:
forall (i = 0; i < n; i++)
    a[i] = a[i] + k;
However, forall is more general – it states that the n instances of the body can be executed simultaneously or in any order (not necessarily at the same time). We shall see that a GPU implementation of data parallel patterns does not necessarily allow all instances to execute at the same time. Note that forall does imply synchronism at its end – all instances must complete before continuing, which will be true in GPUs.
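On a GPU, this end-of-forall synchronization corresponds to waiting for the kernel to finish before the host proceeds. The host-side fragment below is an assumed sketch (reusing the hypothetical addConstant kernel above), not code from the slides.

// Hypothetical host-side launch of the addConstant kernel.
// The launch itself is asynchronous; cudaDeviceSynchronize() gives the
// "all instances complete before continuing" behavior of forall.
int threadsPerBlock = 256;
int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
addConstant<<<blocks, threadsPerBlock>>>(dev_a, k, n);
cudaDeviceSynchronize();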
Data Parallel Example: Prefix Sum Problem
Given a list of numbers x0, …, xn-1, compute all the partial summations, i.e.:
x0 + x1;  x0 + x1 + x2;  x0 + x1 + x2 + x3;  x0 + x1 + x2 + x3 + x4;  …
(For example, for the list 1, 2, 3, 4, 5 these sums are 3, 6, 10, and 15.)
Can also be defined with associative operations other than addition. Widely studied, with practical applications in areas such as processor allocation, data compaction, sorting, and polynomial evaluation.
Data parallel method for prefix sum operation (diagram not reproduced)
Parallel code using forall notation
Sequential code:
for (j = 0; j < log(n); j++)        // at each step
    for (i = 2^j; i < n; i++)       // accumulate sum
        x[i] = x[i] + x[i - 2^j];

Parallel code using forall notation:
for (j = 0; j < log(n); j++)        // at each step
    forall (i = 0; i < n; i++)      // accumulate sum
        if (i >= 2^j)
            x[i] = x[i] + x[i - 2^j];
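As a rough sketch of how these forall steps might map onto a GPU, the hypothetical CUDA kernel below performs the same step-by-step accumulation within a single thread block; it assumes n fits in one block, and it is not code from the slides.

// Hypothetical single-block CUDA prefix-sum kernel (inclusive scan).
// Assumed launch: prefixSum<<<1, n, n * sizeof(int)>>>(x, n);
__global__ void prefixSum(int *x, int n)
{
    extern __shared__ int tmp[];              // working copy of x in shared memory
    int i = threadIdx.x;
    if (i < n) tmp[i] = x[i];
    __syncthreads();
    for (int stride = 1; stride < n; stride *= 2) {    // stride plays the role of 2^j
        int val = (i >= stride && i < n) ? tmp[i - stride] : 0;
        __syncthreads();                      // all reads of old values complete
        if (i >= stride && i < n) tmp[i] += val;
        __syncthreads();                      // all updates complete before next step
    }
    if (i < n) x[i] = tmp[i];
}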
Matrix Multiplication
Easy to make a data parallel version – change the for's to forall's:
forall (i = 0; i < n; i++)              // for each row of A
    forall (j = 0; j < n; j++) {        // for each column of B
        c[i][j] = 0;
        for (k = 0; k < n; k++)
            c[i][j] = c[i][j] + a[i][k] * b[k][j];
    }
Here the data parallel definition is extended to multiple sequential operations on data items – each instance of the body is a separate thread, and each instance is executed in sequential order.
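One possible GPU realization of the nested forall is a CUDA kernel with one thread per element of C; the kernel below is an assumed sketch using a flat row-major layout, not code from the slides.

// Hypothetical CUDA kernel: one thread computes one element of C = A * B.
// Matrices are stored in flat row-major arrays of size n * n.
__global__ void matMul(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.y * blockDim.y + threadIdx.y;   // row of C
    int j = blockIdx.x * blockDim.x + threadIdx.x;   // column of C
    if (i < n && j < n) {
        float sum = 0.0f;
        for (int k = 0; k < n; k++)                  // sequential inner product
            sum += a[i * n + k] * b[k * n + j];
        c[i * n + j] = sum;
    }
}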
We will explore the data parallel pattern using GPUs for high-performance computing; see next.
Questions so far?