Sorting and Searching Tim Purcell NVIDIA.

Slides:



Advertisements
Similar presentations
Photon Mapping on Programmable Graphics Hardware Timothy J. Purcell Mike Cammarano Pat Hanrahan Stanford University Craig Donner Henrik Wann Jensen University.
Advertisements

Lecture 12: Revision Lecture Dr John Levine Algorithms and Complexity March 27th 2006.
REAL-TIME VOLUME GRAPHICS Christof Rezk Salama Computer Graphics and Multimedia Group, University of Siegen, Germany Eurographics 2006 Real-Time Volume.
Advanced Topics in Algorithms and Data Structures Lecture 7.1, page 1 An overview of lecture 7 An optimal parallel algorithm for the 2D convex hull problem,
Advanced Topics in Algorithms and Data Structures Lecture pg 1 Recursion.
Parallel Sorting Algorithms Comparison Sorts if (A>B) { temp=A; A=B; B=temp; } Potential Speed-up –Optimal Comparison Sort: O(N lg N) –Optimal Parallel.
Advanced Topics in Algorithms and Data Structures Page 1 Parallel merging through partitioning The partitioning strategy consists of: Breaking up the given.
Chapter 11 Sorting and Searching. Copyright © 2005 Pearson Addison-Wesley. All rights reserved Chapter Objectives Examine the linear search and.
Sorting and Searching Timothy J. PurcellStanford / NVIDIA Updated Gary J. Katz based on GPUTeraSort (MSR TR )U. of Pennsylvania.
Database Operations on GPU Changchang Wu 4/18/2007.
Mapping Computational Concepts to GPU’s Jesper Mosegaard Based primarily on SIGGRAPH 2004 GPGPU COURSE and Visualization 2004 Course.
Ray Tracing and Photon Mapping on GPUs Tim PurcellStanford / NVIDIA.
GPU Programming Robert Hero Quick Overview (The Old Way) Graphics cards process Triangles Graphics cards process Triangles Quads.
Lecture 12: Parallel Sorting Shantanu Dutt ECE Dept. UIC.
Interactive Time-Dependent Tone Mapping Using Programmable Graphics Hardware Nolan GoodnightGreg HumphreysCliff WoolleyRui Wang University of Virginia.
Cg Programming Mapping Computational Concepts to GPUs.
Photon Mapping on Programmable Graphics Hardware
Tone Mapping on GPUs Cliff Woolley University of Virginia Slides courtesy Nolan Goodnight.
CS662 Computer Graphics Game Technologies Jim X. Chen, Ph.D. Computer Science Department George Mason University.
Boolean Operations on Surfel-Bounded Solids Using Programmable Graphics Hardware Bart AdamsPhilip Dutré Katholieke Universiteit Leuven.
Chapter 8 Sorting and Searching Goals: 1.Java implementation of sorting algorithms 2.Selection and Insertion Sorts 3.Recursive Sorts: Mergesort and Quicksort.
1 Searching and Sorting Searching algorithms with simple arrays Sorting algorithms with simple arrays –Selection Sort –Insertion Sort –Bubble Sort –Quick.
Ray Tracing by GPU Ming Ouhyoung. Outline Introduction Graphics Hardware Streaming Ray Tracing Discussion.
Chapter 9: Sorting1 Sorting & Searching Ch. # 9. Chapter 9: Sorting2 Chapter Outline  What is sorting and complexity of sorting  Different types of.
Unit-8 Sorting Algorithms Prepared By:-H.M.PATEL.
Merge Sort Comparison Left Half Data Movement Right Half Sorted.
Priority Queues and Heaps Tom Przybylinski. Maps ● We have (key,value) pairs, called entries ● We want to store and find/remove arbitrary entries (random.
Searching and Sorting Searching algorithms with simple arrays
Prof. U V THETE Dept. of Computer Science YMA
Chapter 16: Searching, Sorting, and the vector Type
Chapter 9: Sorting and Searching Arrays
Sorting With Priority Queue In-place Extra O(N) space
NEW SORTING ALGORITHMS
Design & Analysis of Algorithm Priority Queue
Data Structures & Algorithms
Graphics Processing Unit
CS 2511 Fall 2014 Binary Heaps.
Real-Time Ray Tracing Stefan Popov.
Searching & Sorting "There's nothing hidden in your head the sorting hat can't see. So try me on and I will tell you where you ought to be." -The Sorting.
Hashing Exercises.
Heapsort.
CS451Real-time Rendering Pipeline
Timothy J. Purcell Stanford / NVIDIA
Parallel Sorting Algorithms
Divide and Conquer.
Lecture 16: Parallel Algorithms I
Priority Queue & Heap CSCI 3110 Nan Chen.
Fundamentals of Python: From First Programs Through Data Structures
Priority Queues.
Unit-2 Divide and Conquer
Static Image Filtering on Commodity Graphics Processors
Priority Queues.
8/04/2009 Many thanks to David Sun for some of the included slides!
Outline This topic covers Prim’s algorithm:
Parallel Sorting Algorithms
Sub-Quadratic Sorting Algorithms
Dr.Surasak Mungsing CSE 221/ICT221 Analysis and Design of Algorithms Lecture 05-2: Analysis of time Complexity of Priority.
Analysis of Algorithms
Fundamentals of Python: From First Programs Through Data Structures
Heapsort.
Searching/Sorting/Searching
Priority Queues CSE 373 Data Structures.
Parallel Sorting Algorithms
Data Structures for Shaping and Scheduling
A Heap Is Efficiently Represented As An Array
Module 8 – Searching & Sorting Algorithms
Applications of Arrays
CMPT 225 Lecture 16 – Heap Sort.
Presentation transcript:

Sorting and Searching Tim Purcell NVIDIA

Topics Sorting Search Sorting networks Binary search Nearest neighbor search

Assumptions Data organized into 1D arrays Rendering pass == screen aligned quad Not using vertex shaders PS 2.0 GPU No data dependent branching at fragment level

Sorting

Sorting Given an unordered list of elements, produce list ordered by key value Kernel: compare and swap GPUs constrained programming environment limits viable algorithms Bitonic merge sort [Batcher 68] Periodic balanced sorting networks [Dowd 89]

Bitonic Merge Sort Overview Repeatedly build bitonic lists and then sort them Bitonic list is two monotonic lists concatenated together, one increasing and one decreasing. List A: (3, 4, 7, 8) monotonically increasing List B: (6, 5, 2, 1) monotonically decreasing List AB: (3, 4, 7, 8, 6, 5, 2, 1) bitonic

Bitonic Merge Sort 3 7 4 8 6 2 1 5 8x monotonic lists: (3) (7) (4) (8) (6) (2) (1) (5) 4x bitonic lists: (3,7) (4,8) (6,2) (1,5)

Bitonic Merge Sort 3 7 4 8 6 2 1 5 Sort the bitonic lists

Bitonic Merge Sort 3 3 7 7 4 8 8 4 6 2 2 6 1 5 5 1 4x monotonic lists: (3,7) (8,4) (2,6) (5,1) 2x bitonic lists: (3,7,8,4) (2,6,5,1)

Bitonic Merge Sort 3 3 7 7 4 8 8 4 6 2 2 6 1 5 5 1 Sort the bitonic lists

Bitonic Merge Sort 3 3 3 7 7 4 4 8 8 8 4 7 6 2 5 2 6 6 1 5 2 5 1 1 Sort the bitonic lists

Bitonic Merge Sort 3 3 3 7 7 4 4 8 8 8 4 7 6 2 5 2 6 6 1 5 2 5 1 1 Sort the bitonic lists

Bitonic Merge Sort 3 3 3 3 7 7 4 4 4 8 8 7 8 4 7 8 6 2 5 6 2 6 6 5 1 5 2 2 5 1 1 1 2x monotonic lists: (3,4,7,8) (6,5,2,1) 1x bitonic list: (3,4,7,8, 6,5,2,1)

Bitonic Merge Sort 3 3 3 3 7 7 4 4 4 8 8 7 8 4 7 8 6 2 5 6 2 6 6 5 1 5 2 2 5 1 1 1 Sort the bitonic list

Bitonic Merge Sort 3 3 3 3 3 7 7 4 4 4 4 8 8 7 2 8 4 7 8 1 6 2 5 6 6 2 6 6 5 5 1 5 2 2 7 5 1 1 1 8 Sort the bitonic list

Bitonic Merge Sort 3 3 3 3 3 7 7 4 4 4 4 8 8 7 2 8 4 7 8 1 6 2 5 6 6 2 6 6 5 5 1 5 2 2 7 5 1 1 1 8 Sort the bitonic list

Bitonic Merge Sort 3 3 3 3 3 2 7 7 4 4 4 1 4 8 8 7 2 3 8 4 7 8 1 4 6 2 5 6 6 6 2 6 6 5 5 5 1 5 2 2 7 7 5 1 1 1 8 8 Sort the bitonic list

Bitonic Merge Sort 3 3 3 3 3 2 7 7 4 4 4 1 4 8 8 7 2 3 8 4 7 8 1 4 6 2 5 6 6 6 2 6 6 5 5 5 1 5 2 2 7 7 5 1 1 1 8 8 Sort the bitonic list

Bitonic Merge Sort 3 3 3 3 3 2 1 7 7 4 4 4 1 2 4 8 8 7 2 3 3 8 4 7 8 1 4 4 6 2 5 6 6 6 5 2 6 6 5 5 5 6 1 5 2 2 7 7 7 5 1 1 1 8 8 8 Done!

Bitonic Merge Sort Summary Separate rendering pass for each set of swaps O(log2n) passes Each pass performs n compare/swaps Total compare/swaps: O(n log2n) Limitations of GPU cost us factor of logn over best CPU-based sorting algorithms

Making GPU Sorting Faster Draw several quads with similar computation instead of single quad Reduce decision making in fragment program Push work into vertex processor and interpolator Reduce computation in fragment program More than one compare/swap per sort kernel invocation Reduce computational complexity

Grouping Computation 3 3 3 3 3 2 1 7 7 4 4 4 1 2 4 8 8 7 2 3 3 8 4 7 8 1 4 4 6 2 5 6 6 6 5 2 6 6 5 5 5 6 1 5 2 2 7 7 7 5 1 1 1 8 8 8

Implementation Details Specify interpolants for smaller quads ‘down’ or ‘up’ compare and swap distance to comparison partner See Kipfer & Westermann article in GPU Gems 2 and Kipfer et al. Graphics Hardware 04 for more details

GPU Sort Use blending operators for comparison Use texture mapping hw to map sorting op. [Govindaraju 05]

Searching

Types of Search Search for specific element Binary search Search for nearest element(s) k-nearest neighbor search Both searches require ordered data

Binary Search Find a specific element in an ordered list Implement just like CPU algorithm Assuming hardware supports long enough shaders Finds the first element of a given value v If v does not exist, find next smallest element > v Search algorithm is sequential, but many searches can be executed in parallel Number of pixels drawn determines number of searches executed in parallel 1 pixel == 1 search

Binary Search Search for v0 Search starts at center of sorted array Initialize 4 Search starts at center of sorted array v2 >= v0 so search left half of sub-array Sorted List v0 v0 v0 v2 v2 v2 v5 v5 1 2 3 4 5 6 7

Binary Search Search for v0 Initialize 4 v0 >= v0 so search left half of sub-array Step 1 2 Sorted List v0 v0 v0 v2 v2 v2 v5 v5 1 2 3 4 5 6 7

Binary Search Search for v0 Initialize 4 v0 >= v0 so search left half of sub-array Step 1 2 Step 2 1 Sorted List v0 v0 v0 v2 v2 v2 v5 v5 1 2 3 4 5 6 7

Binary Search Search for v0 Initialize 4 At this point, we either have found v0 or are 1 element too far left One last step to resolve Step 1 2 Step 2 1 Step 3 Sorted List v0 v0 v0 v2 v2 v2 v5 v5 1 2 3 4 5 6 7

Binary Search Search for v0 Done! Initialize Step 1 Step 2 Step 3 4 Done! Step 1 2 Step 2 1 Step 3 Step 4 Sorted List v0 v0 v0 v2 v2 v2 v5 v5 1 2 3 4 5 6 7

Binary Search Search for v0 and v2 Initialize 4 4 Search starts at center of sorted array Both searches proceed to the left half of the array Sorted List v0 v0 v0 v2 v2 v2 v5 v5 1 2 3 4 5 6 7

Binary Search Search for v0 and v2 Initialize 4 4 The search for v0 continues as before The search for v2 overshot, so go back to the right Step 1 2 2 Sorted List v0 v0 v0 v2 v2 v2 v5 v5 1 2 3 4 5 6 7

Binary Search Search for v0 and v2 Initialize 4 4 We’ve found the proper v2, but are still looking for v0 Both searches continue Step 1 2 2 Step 2 1 3 Sorted List v0 v0 v0 v2 v2 v2 v5 v5 1 2 3 4 5 6 7

Binary Search Search for v0 and v2 Initialize 4 4 Now, we’ve found the proper v0, but overshot v2 The cleanup step takes care of this Step 1 2 2 Step 2 1 3 Step 3 2 Sorted List v0 v0 v0 v2 v2 v2 v5 v5 1 2 3 4 5 6 7

Binary Search Search for v0 and v2 Initialize 4 4 Done! Both v0 and v2 are located properly Step 1 2 2 Step 2 1 3 Step 3 2 Step 4 3 Sorted List v0 v0 v0 v2 v2 v2 v5 v5 1 2 3 4 5 6 7

Binary Search Summary Single rendering pass O(log n) steps Each pixel drawn performs independent search O(log n) steps

Nearest Neighbor Search

Nearest Neighbor Search Given a sample point p, find the k points nearest p within a data set On the CPU, this is easily done with a heap or priority queue Can add or reject neighbors as search progresses Don’t know how to build one efficiently on GPU kNN-grid Can only add neighbors…

kNN-grid Algorithm sample point candidate neighbor neighbors found Want 4 neighbors

kNN-grid Algorithm Candidate neighbors must be within max search radius Visit voxels in order of distance to sample point sample point candidate neighbor neighbors found Want 4 neighbors

kNN-grid Algorithm If current number of neighbors found is less than the number requested, grow search radius 1 sample point candidate neighbor neighbors found Want 4 neighbors

kNN-grid Algorithm If current number of neighbors found is less than the number requested, grow search radius 2 sample point candidate neighbor neighbors found Want 4 neighbors

kNN-grid Algorithm 2 Don’t add neighbors outside maximum search radius Don’t grow search radius when neighbor is outside maximum radius 2 sample point candidate neighbor neighbors found Want 4 neighbors

kNN-grid Algorithm 3 Add neighbors within search radius sample point candidate neighbor neighbors found Want 4 neighbors

kNN-grid Algorithm 4 Add neighbors within search radius sample point candidate neighbor neighbors found Want 4 neighbors

kNN-grid Algorithm Don’t expand search radius if enough neighbors already found 4 sample point candidate neighbor neighbors found Want 4 neighbors

kNN-grid Algorithm 5 Add neighbors within search radius sample point candidate neighbor neighbors found Want 4 neighbors

kNN-grid Algorithm Visit all other voxels accessible within determined search radius Add neighbors within search radius 6 sample point candidate neighbor neighbors found Want 4 neighbors

kNN-grid Summary Finds all neighbors within a sphere centered about sample point May locate more than requested k-nearest neighbors 6 sample point candidate neighbor neighbors found Want 4 neighbors