1/16 CALCULATING PREFIX SUMS Vladimir Jocović 2012/0011

2/16 WHAT ARE ALL-PREFIX SUMS?
The all-prefix-sums operation takes:
- a binary associative operator ⊕
- an ordered set of n elements [a0, a1, ..., a(n−1)]
and returns the ordered set
[a0, (a0 ⊕ a1), ..., (a0 ⊕ a1 ⊕ ... ⊕ a(n−1))]
This is the inclusive type: the i-th output element includes ai itself.
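A minimal sequential sketch of the definition above, written in Python purely as a software model (the presentation itself targets Maxeler hardware). The names `inclusive_scan` and `exclusive_scan` are illustrative and do not come from the slides. The exclusive variant, which starts from the identity element of ⊕ and leaves out ai at position i, is included only to contrast it with the inclusive type.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def inclusive_scan(op: Callable[[T, T], T], xs: List[T]) -> List[T]:
    """Return [x0, x0 op x1, ..., x0 op x1 op ... op x(n-1)]."""
    out: List[T] = []
    for x in xs:
        # The first element starts the running total; later ones are combined into it.
        out.append(x if not out else op(out[-1], x))
    return out


def exclusive_scan(op: Callable[[T, T], T], identity: T, xs: List[T]) -> List[T]:
    """Return [identity, x0, x0 op x1, ..., x0 op ... op x(n-2)]."""
    out: List[T] = []
    acc = identity
    for x in xs:
        out.append(acc)  # total of everything strictly before x
        acc = op(acc, x)
    return out


print(inclusive_scan(lambda a, b: a + b, [3, 1, 7, 0, 4, 1, 6, 3]))  # [3, 4, 11, 11, 15, 16, 22, 25]
print(inclusive_scan(max, [2, 5, 1, 7, 3]))  # [2, 5, 5, 7, 7]
```

Any associative operator works here. For example, `max` gives a running maximum.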

3/16 WHAT ARE ALL-PREFIX SUMS?
Example: the operator ⊕ is addition.
Input array: [3, 1, 7, 0, 4, 1, 6, 3]
Output array: [3, 4, 11, 11, 15, 16, 22, 25]
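The example can be checked with Python's standard library. `itertools.accumulate` computes an inclusive scan and uses addition as its default operator. The `initial` keyword requires Python 3.8 or later.

```python
from itertools import accumulate

input_array = [3, 1, 7, 0, 4, 1, 6, 3]

# accumulate() adds by default, so this is the inclusive +-scan from the slide.
output_array = list(accumulate(input_array))
print(output_array)  # [3, 4, 11, 11, 15, 16, 22, 25]

# Seeding with 0 and dropping the last value gives the exclusive scan instead.
exclusive = list(accumulate(input_array, initial=0))[:-1]
print(exclusive)  # [0, 3, 4, 11, 11, 15, 16, 22]
```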

4/16 WHERE ARE ALL-PREFIX SUMS USED?
- Lexical comparison of character strings
- Polynomial evaluation
- Sorting algorithms (radix sort, quicksort)
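Here is a sketch of the radix-sort use, assuming non-negative integer keys. Each pass of a least-significant-bit radix sort is a stable "split" on one bit. An exclusive +-scan over 0/1 flags gives every key its destination index. The helper names are hypothetical, and the scans here run sequentially. The point is only that the addressing step reduces to prefix sums.

```python
from itertools import accumulate
from typing import List


def exclusive_plus_scan(values: List[int]) -> List[int]:
    """out[i] = values[0] + ... + values[i-1], with out[0] = 0."""
    return list(accumulate(values, initial=0))[:-1]


def split_by_bit(keys: List[int], bit: int) -> List[int]:
    """Stable partition: keys whose `bit` is 0 first, then keys whose `bit` is 1."""
    is_zero = [1 if ((k >> bit) & 1) == 0 else 0 for k in keys]
    is_one = [1 - z for z in is_zero]
    zero_rank = exclusive_plus_scan(is_zero)  # position among the 0-bit keys
    one_rank = exclusive_plus_scan(is_one)    # position among the 1-bit keys
    total_zeros = sum(is_zero)                # 1-bit keys start after all 0-bit keys
    out = [0] * len(keys)
    for i, k in enumerate(keys):
        dest = zero_rank[i] if is_zero[i] else total_zeros + one_rank[i]
        out[dest] = k
    return out


def radix_sort(keys: List[int], bits: int) -> List[int]:
    """LSD radix sort of non-negative integers smaller than 2**bits."""
    for bit in range(bits):
        keys = split_by_bit(keys, bit)
    return keys


print(radix_sort([3, 1, 7, 0, 4, 1, 6, 3], bits=3))  # [0, 1, 1, 3, 3, 4, 6, 7]
```

Each split must be stable, because later passes rely on the order produced by earlier ones. The two exclusive scans guarantee that stability.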

5/16 WHAT DOES THE HARDWARE LOOK LIKE?
Graph representing PrefixSumKernel:
- Io.input("x", type, …
- Io.output("z", type, …
- result = x + (cnt < loopVal ? 0 : sum); (storing the partial sum)
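The slide shows only the core expression, so the following is a software model built on explicit assumptions. `cnt` is taken to count kernel ticks from 0. `sum` is assumed to be the kernel's own `result` fed back with a delay of `loopVal` ticks, presumably to cover the latency of a pipelined adder. `loop_kernel_model` is a hypothetical name. Under these assumptions, `loopVal = 1` gives the ordinary prefix sum. A larger `loopVal` yields `loopVal` interleaved running sums, one for each residue class of the index modulo `loopVal`.

```python
from typing import List


def loop_kernel_model(x: List[int], loop_val: int) -> List[int]:
    """Model of: result = x + (cnt < loopVal ? 0 : sum);
    ASSUMPTION: sum is the result from loop_val ticks earlier."""
    result: List[int] = []
    for cnt, xi in enumerate(x):
        # Until the feedback loop has filled, there is no partial sum to add.
        partial = 0 if cnt < loop_val else result[cnt - loop_val]
        result.append(xi + partial)  # store the new partial sum
    return result


x = [3, 1, 7, 0, 4, 1, 6, 3]
print(loop_kernel_model(x, 1))  # [3, 4, 11, 11, 15, 16, 22, 25]
print(loop_kernel_model(x, 2))  # [3, 1, 10, 1, 14, 2, 20, 5]
```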

6/16 WHAT DOES THE HARDWARE LOOK LIKE?
Graph representing PrefixSumKernel at its final step
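Suppose the feedback loop produces `loopVal` interleaved running sums. This is an assumption, since the slide only names the final step. One way to finish the computation is to add, at each index i, the newest running sum from every lane, that is, the `loopVal` values ending at i. Those positions cover each residue class exactly once. `combine_lanes` is a hypothetical helper. Under that assumption, the input below is what a `loopVal = 2` loop would produce for the example array [3, 1, 7, 0, 4, 1, 6, 3].

```python
from typing import List


def combine_lanes(partials: List[int], loop_val: int) -> List[int]:
    """ASSUMED final step: partials[j] is the running sum of lane j % loop_val.
    The true prefix sum at i adds the newest value from each lane,
    i.e. partials[i - loop_val + 1 .. i], clipped at index 0."""
    out: List[int] = []
    for i in range(len(partials)):
        lo = max(0, i - loop_val + 1)
        out.append(sum(partials[lo:i + 1]))
    return out


print(combine_lanes([3, 1, 10, 1, 14, 2, 20, 5], 2))  # [3, 4, 11, 11, 15, 16, 22, 25]
```

A sliding-window sum would avoid re-adding overlapping values. The direct form is kept here so that the per-lane argument stays visible.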

7/16 WHAT DOES THE HARDWARE LOOK LIKE?
Manager graph

8/16 ALGORITHM

9/16 ALGORITHM

10/16 ALGORITHM

11/16 ALGORITHM

12/16 KERNEL CODE

13/16 BUILD AND RUN

14/16 CONCLUSION
Poor Maxeler results?
Just a simulation, not real hardware.

16/16 QUESTIONS AND ANSWERS