Parallel Patterns Reduce & Scan

Parallel Patterns: Reduce & Scan (6/16/2010)

Programming Patterns for Parallelism

Some patterns repeat in many different contexts, e.g. searching for an element in an array. Identifying such patterns is important:
- Solve a problem once and reuse the solution
- Split a hard problem into individual sub-problems
- Helps define interfaces

We Have Already Seen Some Patterns

Divide and Conquer:
- Split a problem into n sub-problems
- Recursively solve the sub-problems
- Merge the solutions

Data Parallelism:
- Apply the same function to all elements in a collection (array)
- Parallel.For, Parallel.ForEach
- Also called "map" in functional programming

Map

Given:
- a function f: A => B
- a collection a: A[]

Generates a collection b: B[], where b[i] = f(a[i]).
- Parallel.For, Parallel.ForEach, where each loop iteration is independent
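
The deck uses Parallel.For / Parallel.ForEach; as a language-neutral sketch of the same pattern, here is map over a thread pool in Python (f is just a stand-in for any per-element function):

```python
from concurrent.futures import ThreadPoolExecutor

def f(x):
    # stand-in for any per-element function f: A => B
    return x * x

def parallel_map(f, a):
    # each b[i] = f(a[i]) is independent, so iterations can run concurrently
    with ThreadPoolExecutor() as ex:
        return list(ex.map(f, a))

print(parallel_map(f, [1, 2, 3, 4]))  # [1, 4, 9, 16]
```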

Reduce and Scan

In practice, parallel loops have to work together to generate an answer. The Reduce and Scan patterns capture common cases of processing the results of Map.

Note: Map and Reduce are similar to, but not the same as, MapReduce; MapReduce is a framework for distributed computing.

Reduce

Given:
- a function f: (A, B) => B
- a collection a: A[]
- an initial value b0: B

Generate a final value b: B, where
b = f(a[n-1], … f(a[1], f(a[0], b0)) )

Here we only consider the case where A and B are the same type.

Reduce, sequentially:

    B acc = b0;
    for (i = 0; i < n; i++) {
        acc = f(a[i], acc);
    }
    b = acc;

Associativity of the Reduce Function

Reduce is parallelizable if f is associative:
f(a, f(b, c)) = f(f(a, b), c)

E.g. addition: (a + b) + c = a + (b + c)
- Holds when + is integer addition (with modulo arithmetic)
- But not when + is floating-point addition

Other associative operations: max, min, multiply, set union, set intersection, …
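
A quick demonstration of the floating-point caveat (values chosen only to force rounding):

```python
a, b, c = 1e16, -1e16, 1.0
print((a + b) + c)  # 1.0
print(a + (b + c))  # 0.0 -- the 1.0 is absorbed when added to -1e16
```

So a parallel Reduce over floats can legitimately return a slightly different answer than the sequential loop.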

We Can Use Divide and Conquer

Reduce(f, A[1…n], b0) =
    f( Reduce(f, A[1…n/2], b0),
       Reduce(f, A[n/2+1…n], I) )

where I is the identity element of f.
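
A sketch of this identity as a recursive function (names illustrative; sequential here, but the two recursive calls are exactly what a parallel runtime would fork):

```python
def reduce_dc(f, a, b0, identity, cutoff=4):
    # base case: plain sequential reduce
    if len(a) <= cutoff:
        acc = b0
        for x in a:
            acc = f(x, acc)
        return acc
    mid = len(a) // 2
    # the two halves are independent and could run in parallel;
    # the second half starts from f's identity element
    left = reduce_dc(f, a[:mid], b0, identity, cutoff)
    right = reduce_dc(f, a[mid:], identity, identity, cutoff)
    # merging the halves is valid because f is associative
    return f(right, left)
```

For example, reduce_dc(lambda x, acc: x + acc, list(range(10)), 0, 0) gives 45.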

Implementation Optimizations

- Switch to a sequential Reduce for base cases of k elements
- Do k-way splits instead of two-way splits
- Maintain a thread-local accumulated value; a task updates the value of the thread it executes in
  - Requires that the reduce function is also commutative: f(a, b) = f(b, a)
  - Thread-local values are then merged in a separate pass
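
A sketch of the thread-local-accumulator optimization (k-way split plus a final merge pass; all names illustrative). It relies on f being commutative as well as associative, since elements reach a given accumulator out of order:

```python
from concurrent.futures import ThreadPoolExecutor

def reduce_chunked(f, a, b0, identity, k=4):
    # k-way split; each chunk gets its own local accumulator
    chunks = [a[i::k] for i in range(k)]  # interleaved split: fine when f is commutative

    def local_reduce(chunk):
        acc = identity          # thread-local accumulated value
        for x in chunk:
            acc = f(x, acc)
        return acc

    with ThreadPoolExecutor(max_workers=k) as ex:
        partials = list(ex.map(local_reduce, chunks))

    # separate merge pass over the per-thread partial values
    acc = b0
    for p in partials:
        acc = f(p, acc)
    return acc
```

Because a task only touches its own accumulator, no locking is needed until the merge pass.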

Scan

Given:
- a function f: (A, B) => B
- a collection a: A[]
- an initial value b0: B

Generate a collection b: B[], where
b[i] = f(a[i-1], … f(a[1], f(a[0], b0)) )

Scan, sequentially:

    B acc = b0;
    for (i = 0; i < n; i++) {
        b[i] = acc;           // b[i] combines a[0..i-1] with b0
        acc = f(a[i], acc);
    }

Scan is Efficiently Parallelizable

When f is associative:

Scan(f, A[1…n], b0) =
    Scan(f, A[1…n/2], b0),
    Scan(f, A[n/2+1…n], Reduce(f, A[1…n/2], b0))

The second half's initial value is the Reduce of the first half.
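
A sketch of this decomposition. scan_seq follows the slide's sequential loop (exclusive: b[i] combines a[0..i-1] with b0), and the second half is seeded with the Reduce of the first half:

```python
def scan_seq(f, a, b0):
    b, acc = [], b0
    for x in a:
        b.append(acc)            # exclusive scan: b[i] = fold of a[0..i-1]
        acc = f(x, acc)
    return b

def reduce_seq(f, a, b0):
    acc = b0
    for x in a:
        acc = f(x, acc)
    return acc

def scan_dc(f, a, b0):
    # divide and conquer, valid when f is associative;
    # the two Scan calls are independent and could run in parallel
    if len(a) <= 2:
        return scan_seq(f, a, b0)
    mid = len(a) // 2
    left = scan_dc(f, a[:mid], b0)
    right = scan_dc(f, a[mid:], reduce_seq(f, a[:mid], b0))
    return left + right
```

The redundant Reduce over the first half is what buys independence: this naive form does O(n log n) work instead of O(n); tree-based scans recover O(n) work while keeping the parallelism.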

Scan Is Useful in Many Places

- Radix Sort
- Ray Tracing
- …

Computing Line of Sight

Given points x0, …, xn with altitudes alt[0], …, alt[n], which of the points are visible from x0?

angle[i] = arctan( (alt[i] – alt[0]) / i )

xi is visible from x0 if all points between them have a lesser angle than angle[i].

Solution
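
A sketch of one scan-based solution: xi is visible exactly when angle[i] exceeds the maximum angle of every earlier point, i.e. an exclusive max-scan over the angles (written sequentially here for clarity):

```python
import math

def line_of_sight(alt):
    # angle[i] = arctan((alt[i] - alt[0]) / i) for i >= 1
    angles = [math.atan2(alt[i] - alt[0], i) for i in range(1, len(alt))]
    visible = []
    max_before = -math.inf       # running (exclusive) max-scan of angles
    for ang in angles:
        visible.append(ang > max_before)
        max_before = max(max_before, ang)
    return visible               # visible[i-1] answers for point xi
```

For example, line_of_sight([0, 2, 1, 3]) returns [True, False, False]: the tall point x1 blocks both x2 and x3.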

Radix Sort

Sort by the least-significant bit first, one stable pass per bit:

Input      Bit 0      Bit 1      Bit 2
5 = 101    2 = 010    4 = 100    1 = 001
7 = 111    4 = 100    5 = 101    2 = 010
2 = 010    5 = 101    1 = 001    3 = 011
4 = 100    7 = 111    2 = 010    4 = 100
3 = 011    3 = 011    7 = 111    5 = 101
1 = 001    1 = 001    3 = 011    7 = 111

Basic Primitive: Pack

Given an array A and an array F of flags:
A = [5 7 2 4 5 3 1]
F = [1 1 0 0 1 1 1]

Pack all elements with flag = 0 before elements with flag = 1:
A' = [2 4 5 7 5 3 1]
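
A sketch of Pack built on an exclusive +-scan of the flags (names illustrative): the scan of (1 - flag) gives each 0-flagged element its output slot, and 1-flagged elements follow after all the zeros:

```python
def pack(a, flags):
    # exclusive +-scan over (1 - flag): zeros_before[i] = number of
    # 0-flagged elements strictly before position i
    zeros_before, acc = [], 0
    for fl in flags:
        zeros_before.append(acc)
        acc += 1 - fl
    total_zeros = acc

    out = [None] * len(a)
    for i, x in enumerate(a):
        if flags[i] == 0:
            out[zeros_before[i]] = x
        else:
            # ones_before = i - zeros_before[i]; ones go after all zeros
            out[total_zeros + (i - zeros_before[i])] = x
    return out
```

pack([5, 7, 2, 4, 5, 3, 1], [1, 1, 0, 0, 1, 1, 1]) reproduces the slide's A' = [2, 4, 5, 7, 5, 3, 1]; the scan is the only step that isn't trivially parallel.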

Solution
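
In code, the radix sort above amounts to one stable Pack per bit, least significant first (a minimal sketch with a simple stable pack; a parallel version would use the scan-based Pack):

```python
def pack(a, flags):
    # stable: all 0-flagged elements first, then all 1-flagged
    return [x for x, fl in zip(a, flags) if fl == 0] + \
           [x for x, fl in zip(a, flags) if fl == 1]

def radix_sort(a, bits):
    # one Pack pass per bit; stability makes the passes compose
    for b in range(bits):
        a = pack(a, [(x >> b) & 1 for x in a])
    return a

print(radix_sort([5, 7, 2, 4, 3, 1], bits=3))  # [1, 2, 3, 4, 5, 7]
```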

Other Applications of Scan

- Radix Sort
- Computing Line of Sight
- Adding multi-precision numbers
- Quick Sort
- Searching for regular expressions (parallel grep)
- …

High-Level Points

- Minimize dependences between parallel loops
  - Unintended dependences = data races (next lecture)
- Carefully analyze the remaining dependences
- Use the Reduce and Scan patterns where applicable