Introduction to CUDA Programming

Presentation transcript:

Introduction to CUDA Programming: Scan Algorithm Explained. Andreas Moshovos, Winter 2009.

Reading: You are strongly encouraged to read the following, as it contains a more formal treatment of the algorithm plus an overview of various applications of scan. Guy E. Blelloch. “Prefix Sums and Their Applications”. In John H. Reif (Ed.), Synthesis of Parallel Algorithms, Morgan Kaufmann, 1990. http://www.cs.cmu.edu/afs/cs.cmu.edu/project/scandal/public/papers/CMU-CS-90-190.html

Two phases. Up-Sweep: essentially a reduction; produces many partial results. Down-Sweep: propagates the partial results to all relevant elements.

Up-Sweep: just a reduction. Each row shows the array after one more step:
Input:   1  2  2  5  6  3  8  2  4  1  5  2  7  9  3  5
Step 1:  1  3  2  7  6  9  8 10  4  5  5  7  7 16  3  8
Step 2:  1  3  2 10  6  9  8 19  4  5  5 12  7 16  3 24
Step 3:  1  3  2 10  6  9  8 29  4  5  5 12  7 16  3 36
Step 4:  1  3  2 10  6  9  8 29  4  5  5 12  7 16  3 65
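The up-sweep steps on this slide can be sketched in a few lines of Python. This is a sequential illustration, not CUDA code from the lecture: the function name `up_sweep` and the variable names are assumptions, and the sketch assumes the array length is a power of two. Within one stride, the inner-loop iterations touch disjoint elements, so each one could be handled by a separate thread.

```python
def up_sweep(values):
    """Reduction phase of the scan, run on a copy of the input.

    Assumes len(values) is a power of two (illustrative sketch only).
    """
    a = list(values)
    n = len(a)
    stride = 1
    while stride < n:
        # Every iteration below works on a different (left, right) pair,
        # so all of them are independent within one stride.
        for right in range(2 * stride - 1, n, 2 * stride):
            left = right - stride
            a[right] += a[left]
        stride *= 2
    return a


data = [1, 2, 2, 5, 6, 3, 8, 2, 4, 1, 5, 2, 7, 9, 3, 5]
swept = up_sweep(data)
print(swept)
# [1, 3, 2, 10, 6, 9, 8, 29, 4, 5, 5, 12, 7, 16, 3, 65]
```

The printed array matches the last row of the slide, with the total (65) in the rightmost element.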

Up-Sweep: now let’s view this as a tree.
Leaves:  1 2 2 5 6 3 8 2 4 1 5 2 7 9 3 5
Level 1: 3 7 9 10 5 7 16 8
Level 2: 10 19 12 24
Level 3: 29 36
Root:    65
Notice that only some of these nodes are left in our array; the rest were partial results that are no longer stored:
Array:   1 3 2 10 6 9 8 29 4 5 5 12 7 16 3 65

Up-Sweep: so, this is what’s left. Nodes without values don’t exist; they were partial results.
Leaves:  1 2 6 8 4 5 7 3
Level 1: 3 9 5 16
Level 2: 10 12
Level 3: 29
Root:    65

Down-Sweep: for the second phase we need to think of the edges in reverse, and of the empty nodes as placeholders for partial results. (Figure: the tree of remaining nodes, with root 65.)

Down-Sweep: now let’s view the tree as a collection of subtrees. The root of each subtree, where it is still present, contains the reduction of all subtree elements, i.e., the sum of all subtree elements. (Figure: the tree of remaining nodes, with root 65.)
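The claim that each surviving root holds the sum of its subtree can be checked directly on the example array. In the sketch below, the helper `subtree_span` is a hypothetical name: it assumes the in-place layout used in these slides, where the node stored at index i covers a number of leaves equal to the lowest set bit of i + 1.

```python
def subtree_span(i):
    """Number of input elements covered by the node stored at index i.

    Assumption for this sketch: the span is the lowest set bit of i + 1.
    """
    return (i + 1) & -(i + 1)


data = [1, 2, 2, 5, 6, 3, 8, 2, 4, 1, 5, 2, 7, 9, 3, 5]
swept = [1, 3, 2, 10, 6, 9, 8, 29, 4, 5, 5, 12, 7, 16, 3, 65]

# For every index, compare the stored value with the sum of the input
# elements that the corresponding subtree covers.
checks = []
for i, value in enumerate(swept):
    span = subtree_span(i)
    checks.append(value == sum(data[i - span + 1 : i + 1]))

print(all(checks))  # True
```

For instance, index 7 has span 8 and stores 29, the sum of the first eight inputs, while index 13 has span 2 and stores 16 = 7 + 9.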

Down-Sweep: let’s focus on the rightmost subtree. (Figure: the tree of remaining nodes, with root 65.)

Down-Sweep: before the last step of the down-sweep phase, the yellow element will contain the sum (57) of all elements to the left of the subtree. The two elements involved are 3 and 57. The last step takes the following two actions. First, 3 + 57 = 60 goes into the rightmost element; this is the sum of all elements including 3 but excluding the rightmost one. Second, 3 is overwritten with 57; this is the sum of all elements to the left of 3.

Down-Sweep: in terms of the array stored in memory, the aforementioned actions look like this:
Before: 3 57
After:  57 60
In the figure, the dark arrows represent addition and the red dotted arrow represents a move.
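The pair of actions just described (one move, one addition) can be written as a tiny helper. This is an illustrative Python sketch; the name `down_sweep_step` and its parameters are assumptions, not code from the lecture.

```python
def down_sweep_step(a, left, right):
    """Apply one down-sweep step to the pair (a[left], a[right]), in place."""
    saved_left = a[left]
    a[left] = a[right]       # the move: the prefix sum travels to the left node
    a[right] += saved_left   # the addition: prefix sum plus the old left value


pair = [3, 57]
down_sweep_step(pair, 0, 1)
print(pair)  # [57, 60]
```

The old left value has to be saved before the move, otherwise the addition would use the overwritten element.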

Down-Sweep: let’s now focus on the rightmost subtree that contains the last four nodes. It is processed at the step before the one for the subtree we just discussed. Its remaining nodes are the leaves 7 and 3 and the internal node 16.

Down-Sweep: before the second-to-last step of the down-sweep phase, the green element will contain the sum (41) of all elements to the left of the subtree. The nodes are now 7, 16, 3, with 41 at the root.

Down-Sweep: the actions taken at this step are as follows. First, 16 + 41 = 57 is written as the root of the rightmost subtree; as we saw before, this is the sum of all elements left of the rightmost subtree. Second, 41 replaces 16; this is the sum of all elements left of the subtree rooted at 16. After the step the nodes are 7, 41, 3, with 57 at the root.

Down-Sweep: in terms of the array stored in memory, the aforementioned actions look like this:
Before: 7 16 3 41
After:  7 41 3 57
In the figure, the dark arrows represent addition and the red dotted arrow represents a move.

Down-Sweep: now let’s go a step back and look at the complete right subtree (in green). Leaves: 4 5 7 3. Level 1: 5 16. Level 2: 12.

Down-Sweep: before this step the root node will contain the sum (29) of all elements of the left subtree. Leaves: 4 5 7 3. Level 1: 5 16. Level 2: 12. Root: 29.

Down-Sweep: as before, we’ll do two things. First, 29 + 12 = 41 becomes the root of the rightmost subtree; this should be the sum of all elements to the left of that subtree for the next step (which we saw previously). Second, 29 replaces 12, for the same reason: 29 is the sum of all elements left of the subtree rooted at what was 12. After the step: leaves 4 5 7 3, level 1: 5 16, level 2: 29, root: 41.
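The last few slides all follow one pattern: once the root of a subtree holds the sum of everything to its left, repeating the move-and-add step down the levels fills the whole subtree with the right partial sums. A minimal Python sketch of that idea, applied to the right half of the example array with its root already set to 29, is shown here. The function name `down_sweep` is an assumption, and the block length is assumed to be a power of two.

```python
def down_sweep(a):
    """Down-sweep in place.

    Assumes a[-1] already holds the sum of all elements to the left of
    this block and that len(a) is a power of two (illustrative sketch).
    """
    n = len(a)
    stride = n // 2
    while stride >= 1:
        for right in range(2 * stride - 1, n, 2 * stride):
            left = right - stride
            saved_left = a[left]
            a[left] = a[right]       # move the prefix sum to the left child
            a[right] += saved_left   # right child gets prefix + left subtree sum
        stride //= 2
    return a


# Right half of the array after the up-sweep, with the root seeded with 29.
right_half = [4, 5, 5, 12, 7, 16, 3, 29]
print(down_sweep(right_half))
# [29, 33, 34, 39, 41, 48, 57, 60]
```

The intermediate values 41 and 57 from the earlier slides appear along the way, and the last two outputs (57, 60) match the final step discussed first.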

Down-Sweep: let’s try to generalize what happens at every step of the down-sweep phase. Let’s look at step 1: there is only one subtree (shown in purple). (Figure: the tree of remaining nodes, with root 65.)

Down-Sweep: before we process this tree as described before, the root node must contain the sum of all elements to the left of the tree. There are no such elements, hence the root must be 0.

Down-Sweep: now repeat the steps we saw before. First, 29 + 0 = 29 becomes the root of the right subtree. Second, 29 gets replaced by 0.

Down-Sweep: in terms of the array stored in memory, the aforementioned actions look like this:
Before: 1 3 2 10 6 9 8 29 4 5 5 12 7 16 3 0
After:  1 3 2 10 6 9 8 0 4 5 5 12 7 16 3 29
In the figure, the dark arrows represent addition and the red dotted arrow represents a move.
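Putting both phases together gives the complete scan. The sketch below is a sequential Python illustration under stated assumptions: the name `exclusive_scan` is hypothetical, the input length is assumed to be a power of two, and each inner loop stands in for work that independent threads could do in parallel. The result is compared with a plain running sum as a sanity check.

```python
from itertools import accumulate


def exclusive_scan(values):
    """Up-sweep followed by down-sweep, on a copy of the input.

    Assumes len(values) is a power of two (illustrative sketch only).
    """
    a = list(values)
    n = len(a)
    if n == 0:
        return a

    # Phase 1: up-sweep (reduction).
    stride = 1
    while stride < n:
        for right in range(2 * stride - 1, n, 2 * stride):
            a[right] += a[right - stride]
        stride *= 2

    # Nothing lies to the left of the whole tree, so the root becomes 0.
    a[n - 1] = 0

    # Phase 2: down-sweep (one move and one addition per pair).
    stride = n // 2
    while stride >= 1:
        for right in range(2 * stride - 1, n, 2 * stride):
            left = right - stride
            a[left], a[right] = a[right], a[left] + a[right]
        stride //= 2
    return a


data = [1, 2, 2, 5, 6, 3, 8, 2, 4, 1, 5, 2, 7, 9, 3, 5]
result = exclusive_scan(data)
expected = [0] + list(accumulate(data))[:-1]
print(result)
# [0, 1, 3, 5, 10, 16, 19, 27, 29, 33, 34, 39, 41, 48, 57, 60]
print(result == expected)  # True
```

Each output element is the sum of all inputs strictly to its left, which is why the first element is 0 and the total (65) does not appear in the result.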