Parallel prefix sum computation, Lecture 7

2 Prefix sum
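The definition on this slide did not survive extraction. For reference, here is the standard definition, assumed here and consistent with the later slides (which use + as the operation):

```latex
\text{Given } x_1, x_2, \dots, x_n \text{ and an associative operation } \oplus:\quad
s_i = x_1 \oplus x_2 \oplus \cdots \oplus x_i, \qquad 1 \le i \le n.

\text{Sequentially: } s_1 = x_1,\quad s_i = s_{i-1} \oplus x_i \ (2 \le i \le n),\ \text{i.e. } n-1 \text{ operations.}

\text{Example } (\oplus = +):\ (3, 1, 7, 0, 4) \mapsto (3, 4, 11, 11, 15).
```

The sequential recurrence is inherently serial, since s_i depends on s_{i-1}. This dependency chain is what the parallel schemes on the following slides break.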

3

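4 Assume that the input is stored in A[n], A[n+1], …, A[2n-1], where n = 2^m. The tree is stored implicitly in A[1..2n-1]: the left and right sons of a node i are given by the simple formulas 2i and 2i+1, respectively.

Parallel prefix sums computation
Phase 1 (upward sweep; afterwards A[j] holds the sum of the leaves below node j):
  for k = m-1 down to 0 do
    for all 2^k ≤ j < 2^(k+1) in parallel do
      A[j] := A[2j] + A[2j+1]
  B[0] := A[1]   (sentinel: the k = 0 step below copies it into B[1], since (1-1)/2 = 0)
Phase 2 (downward sweep; B[j] becomes the sum of all leaves up to and including the rightmost leaf below node j):
  for k = 0 to m do
    for all 2^k ≤ j < 2^(k+1) in parallel do
      if odd(j) then B[j] := B[(j-1)/2]
      else B[j] := B[j/2] - A[j+1]
Output table B[n..(2n-1)]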
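The two phases can be checked with a short sequential simulation. This is a sketch, not the lecture's code: the function name tree_prefix_sums is hypothetical, each "for all j in parallel" loop is run as an ordinary loop (legal because the iterations within one level k are independent), and the input length is assumed to be a power of two. Python lists are 0-based, so index 0 of A is unused and index 0 of B holds the sentinel, exactly as on the slide.

```python
def tree_prefix_sums(x):
    """Inclusive prefix sums with the two-phase balanced binary tree scheme.

    The 'for all j in parallel' loops are simulated sequentially; within one
    level k the iterations are independent, so their order does not matter.
    Assumes len(x) is a power of two (n = 2**m).
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("len(x) must be a power of two")
    m = n.bit_length() - 1

    A = [0] * (2 * n)  # A[1..n-1]: internal nodes, A[n..2n-1]: leaves
    A[n:] = x
    # Phase 1: upward sweep, from the level above the leaves up to the root.
    for k in range(m - 1, -1, -1):
        for j in range(2 ** k, 2 ** (k + 1)):  # one parallel step
            A[j] = A[2 * j] + A[2 * j + 1]

    B = [0] * (2 * n)
    B[0] = A[1]  # sentinel: the k = 0 step copies it into B[1]
    # Phase 2: downward sweep, from the root down to the leaves.
    for k in range(m + 1):
        for j in range(2 ** k, 2 ** (k + 1)):  # one parallel step
            if j % 2 == 1:
                # right son (or the root): same rightmost leaf as its parent
                B[j] = B[(j - 1) // 2]
            else:
                # left son: parent's value minus the right sibling's subtree sum
                B[j] = B[j // 2] - A[j + 1]
    return B[n:]


print(tree_prefix_sums([3, 1, 7, 0, 4, 1, 6, 3]))  # [3, 4, 11, 11, 15, 16, 22, 25]
```

Each phase takes about log2 n parallel steps, and the total number of additions and subtractions is O(n). Note that the "else" branch of Phase 2 relies on subtraction, so this variant needs an operation with an inverse (such as + on numbers); for an associative operation without one, such as max, a downward sweep that passes left-sibling sums instead is needed.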

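5 Recursive divide and conquer approach
PREF-SUMS(A[1..n/2], n/2) || PREF-SUMS(A[n/2+1..n], n/2)
The two halves are solved in parallel (||); then A[n/2], the total of the left half, is added to every element of the right half in one parallel step.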
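A minimal sequential sketch of the recursion, assuming 0-based half-open ranges instead of the slide's A[1..n] (the name pref_sums is hypothetical). The parallel time satisfies T(n) = T(n/2) + O(1) = O(log n), but the work satisfies W(n) = 2W(n/2) + n/2, which gives (n/2) log2 n additions for n a power of two, more than the O(n) of the tree-based scheme.

```python
def pref_sums(a, lo=0, hi=None):
    """In-place inclusive prefix sums of a[lo:hi] by recursive halving.

    Mirrors PREF-SUMS(A[1..n/2], n/2) || PREF-SUMS(A[n/2+1..n], n/2), with
    0-based half-open ranges; the two recursive calls, which a PRAM would run
    in parallel, are executed one after the other here.
    """
    if hi is None:
        hi = len(a)
    if hi - lo <= 1:
        return a
    mid = (lo + hi) // 2
    pref_sums(a, lo, mid)   # left half
    pref_sums(a, mid, hi)   # right half (independent of the left half)
    # Combine: the last prefix sum of the left half is the left half's total.
    left_total = a[mid - 1]
    for i in range(mid, hi):  # a single parallel step with hi - mid processors
        a[i] += left_total
    return a


print(pref_sums([3, 1, 7, 0, 4, 1, 6, 3]))  # [3, 4, 11, 11, 15, 16, 22, 25]
```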

6 Recursive divide and conquer algorithm as an arithmetic circuit
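The figure for this slide is lost, but the circuit can be reconstructed by unrolling the recursion bottom-up: at depth d, neighbouring blocks of 2^(d-1) wires are merged by feeding the last wire of each left block into every wire of the adjacent right block. This layout is often called a Sklansky-style prefix network. The sketch below is an illustration under that reading; the names prefix_circuit and evaluate are hypothetical. For n a power of two it has depth log2 n and (n/2) log2 n adders, with fan-out up to n/2 in the last layer.

```python
def prefix_circuit(n):
    """Adder gates of the recursive divide-and-conquer prefix circuit on n wires.

    A gate (src, dst) computes wire[dst] := wire[src] + wire[dst].
    Gates are grouped by depth; all gates of one depth work simultaneously.
    """
    levels = []
    size = 1
    while size < n:
        level = []
        # Merge neighbouring blocks of `size` wires: the last wire of each left
        # block fans out to every wire of the adjacent right block.
        for start in range(0, n, 2 * size):
            src = start + size - 1
            for dst in range(start + size, min(start + 2 * size, n)):
                level.append((src, dst))
        levels.append(level)
        size *= 2
    return levels


def evaluate(levels, x):
    """Run the circuit on inputs x, one depth level at a time."""
    w = list(x)
    for level in levels:
        # Read every operand before writing, as in a synchronous circuit layer.
        updates = [(dst, w[src] + w[dst]) for src, dst in level]
        for dst, value in updates:
            w[dst] = value
    return w


circuit = prefix_circuit(8)
print(len(circuit), sum(len(level) for level in circuit))  # 3 12
print(evaluate(circuit, [3, 1, 7, 0, 4, 1, 6, 3]))         # [3, 4, 11, 11, 15, 16, 22, 25]
```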

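7 List prefix
for each processor i in parallel do y[i] ← x[i]
while there exists i with next[i] ≠ NIL do
  for each processor i in parallel do
    if next[i] ≠ NIL then
      y[next[i]] ← y[i] + y[next[i]]
      next[i] ← next[next[i]]
On termination, y[i] = x[1] + x[2] + … + x[i], where positions are counted along the list from its head.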
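The pointer-jumping (doubling) loop above can be simulated as follows. This is a sketch under assumed conventions: nodes are array slots 0..n-1, nxt[i] is the index of the successor or None for NIL, and the function name list_prefix is hypothetical. Each round is synchronous: all processors read the old y and next values before anyone writes, which is how the PRAM loop is meant to execute. Since each node has at most one predecessor at every round, no two processors write the same y[j], and the loop ends after ⌈log2 n⌉ rounds.

```python
def list_prefix(x, nxt):
    """Pointer-jumping list prefix, simulated one synchronous round at a time.

    x[i]   : value stored at node i
    nxt[i] : index of the successor of node i, or None at the tail (NIL)
    Returns y with y[i] = sum of x from the head of the list up to node i.
    """
    n = len(x)
    y = list(x)
    nxt = list(nxt)
    while any(nxt[i] is not None for i in range(n)):
        # Synchronous step: every processor reads old values, then all write.
        new_y = list(y)
        new_nxt = list(nxt)
        for i in range(n):
            j = nxt[i]
            if j is not None:
                new_y[j] = y[i] + y[j]   # push partial sum forward
                new_nxt[i] = nxt[j]      # jump: skip over the successor
        y, nxt = new_y, new_nxt
    return y


# List order 2 -> 0 -> 1 with values 30, 10, 20.
print(list_prefix([10, 20, 30], [1, None, 0]))  # [40, 60, 30]
```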

8

9 The first phase

10 The second phase
The second phase of the algorithm assigns a processor to each of the (in general not 'good') intervals [l(i)..r(i)]; each processor then decomposes its interval into 'good' intervals. Let's consider the decomposition of [1..7] into 'good' intervals (in the original figure the 'good' intervals are enclosed in rectangles):
[1..7] → [1..4], [5..7]
[5..7] → [5..6], [7..7]
so [1..7] = [1..4] ∪ [5..6] ∪ [7..7].
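The surviving text does not define 'good' intervals. The example is consistent with taking them to be the intervals covered by nodes of a complete binary tree over positions 1..n (n a power of two), such as [1..4], [5..6] and [7..7], and the sketch below assumes exactly that. The name good_decomposition is hypothetical. A processor walks down from the root, keeps any node whose interval lies inside [l..r], and splits partially overlapping nodes into their two sons; this yields O(log n) pieces, at most two per tree level.

```python
def good_decomposition(l, r, n):
    """Split [l..r] (1 <= l <= r <= n) into 'good' intervals.

    Assumption: 'good' intervals are the intervals [lo..hi] covered by the nodes
    of a complete binary tree over positions 1..n, with n a power of two.
    """
    pieces = []

    def visit(lo, hi):
        if hi < l or lo > r:        # node lies entirely outside [l..r]
            return
        if l <= lo and hi <= r:     # node lies entirely inside: keep it whole
            pieces.append((lo, hi))
            return
        mid = (lo + hi) // 2        # partial overlap: descend to both sons
        visit(lo, mid)
        visit(mid + 1, hi)

    visit(1, n)
    return pieces


print(good_decomposition(1, 7, 8))  # [(1, 4), (5, 6), (7, 7)]
```

On the slide's example the walk mirrors the figure: the root [1..8] only partially overlaps [1..7], its left son [1..4] is kept, and its right son [5..8] (drawn clipped as [5..7]) splits into [5..6] and [7..8], of which only [7..7] survives.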