CS 284a, 29 October 1997. Copyright (c) John Thornley.

CS 284a Lecture
Tuesday, 29 October, 1997

Multithreaded Sorting: The Problem with Quicksort and Mergesort

Sequential partition or merge limits speedup.
Speedup(n, p) = (p log_2(n)) / (2p - 2 + log_2(n/p)).
Speedup(n, ∞) = log_2(n)/2.
Example:
  Speedup(100 million, 2) = 1.9 (96%),
  Speedup(100 million, 4) = 3.5 (88%),
  Speedup(100 million, 8) = 5.7 (71%),
  Speedup(100 million, 16) = 8.1 (51%),
  Speedup(100 million, 32) = 10.2 (32%),
  Speedup(100 million, 64) = 11.6 (18%).

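The speedup formula on this slide is consistent with a simple cost model for mergesort with a sequential merge. The model below is an assumption, not the lecture's stated derivation: p is a power of two, the p segments of n/p items are sorted concurrently, and each of the log_2(p) merge levels costs one unit per item of the largest merge at that level, performed by a single thread.

```latex
\begin{align*}
T_1 &= n \log_2 n \\
T_p &= \frac{n}{p}\log_2\frac{n}{p}
       + \left(\frac{2n}{p} + \frac{4n}{p} + \dots + n\right)
     = \frac{n}{p}\left(\log_2\frac{n}{p} + 2p - 2\right) \\
\mathrm{Speedup}(n, p) &= \frac{T_1}{T_p}
     = \frac{p \log_2 n}{2p - 2 + \log_2(n/p)}
\end{align*}
```

As a check, for n = 100 million and p = 8, log_2(n) is about 26.58, so the speedup is about 8 * 26.58 / (14 + 23.58), or roughly 5.7, which agrees with the slide's example.
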
The PSRS Algorithm (Parallel Sorting by Regular Sampling)

Basic idea:
– Split data into k equal-sized segments.
– Sort segments concurrently (e.g., using quicksort).
– Parallel k-way merge of sorted segments.
Another name: “one-deep parallel mergesort”.
Key algorithm is parallel k-way merge.
Complexity O((n/p) log(n)) for n > p^3.
Fastest general-purpose parallel sorting algorithm.

PSRS Algorithm

    void PSRS(int n, item data[], item result[], int k);
    /* Precondition: */
    /*   k >= 1 and n >= 2*k*k and */
    /*   data[0 .. n - 1] allocated and result[0 .. n - 1] allocated. */
    /* Postcondition: */
    /*   ascending(result[0 .. n - 1]) and */
    /*   permutation(result[0 .. n - 1], in data[0 .. n - 1]). */

Step 1: Divide the Data into Segments

[Figure: the data array divided into segments.]
Sequential complexity: O(k)

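A minimal sketch of Step 1, assuming the segments are described by an array of k + 1 boundary indices. The function name `divide_segments` and the `start[]` layout are hypothetical; the slides do not show how the segments are represented.

```c
#include <assert.h>

/* Fill start[0 .. k] so that segment i is data[start[i] .. start[i+1] - 1].
   Segment sizes differ by at most one when k does not divide n.
   Only k + 1 indices are computed, hence the O(k) cost. */
void divide_segments(int n, int k, int start[])
{
    assert(k >= 1 && n >= 2 * k * k);   /* precondition from the PSRS spec */
    for (int i = 0; i <= k; i++) {
        /* long long avoids overflow of i * n for large n. */
        start[i] = (int)((long long)i * n / k);
    }
}
```

No data is moved in this step; only the boundaries are recorded.
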
Step 2: In Parallel, Sort the Data Segments

[Figure: each data segment sorted by a sequential quicksort.]
Sequential complexity: O(n log_2(n/k))

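A sequential sketch of Step 2, under two assumptions: `item` is taken to be `int` (the slides leave it unspecified), and the standard library `qsort` stands in for the sequential quicksort (the C standard does not require `qsort` to be a quicksort). The names `sort_segments` and `start[]` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef int item;   /* assumption: the slides do not define item */

/* qsort comparison for ascending order; avoids overflow of x - y. */
static int compare_items(const void *a, const void *b)
{
    item x = *(const item *)a, y = *(const item *)b;
    return (x > y) - (x < y);
}

/* Sort each segment data[start[i] .. start[i+1] - 1] independently.
   The iterations touch disjoint ranges, so they could run concurrently. */
void sort_segments(item data[], int k, const int start[])
{
    assert(k >= 1);
    for (int i = 0; i < k; i++) {
        qsort(data + start[i], (size_t)(start[i + 1] - start[i]),
              sizeof(item), compare_items);
    }
}
```

Because no two iterations share data, this loop is the natural place to assign segments to threads.
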
Step 3: Take Evenly-Spaced Sample Points From the Sorted Data Segments

[Figure: 2k^2 sample points taken from the sorted data segments into the sample array.]
Sequential complexity: O(2k^2)

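A sketch of Step 3, assuming 2k sample points per segment (2k^2 in total, as on the slide) and assuming the j-th sample is taken at fraction j/(2k) of the segment. The exact sample positions are not given in the slides; `take_samples`, `start[]`, and `typedef int item` are illustrative choices.

```c
#include <assert.h>

typedef int item;   /* assumption: the slides do not define item */

/* Copy 2k evenly spaced items from each of the k sorted segments into
   sample[0 .. 2*k*k - 1]. Segment i is data[start[i] .. start[i+1] - 1]. */
void take_samples(const item data[], int k, const int start[], item sample[])
{
    assert(k >= 1);
    int per_segment = 2 * k;
    for (int i = 0; i < k; i++) {
        int len = start[i + 1] - start[i];
        assert(len >= per_segment);   /* follows from n >= 2*k*k */
        for (int j = 0; j < per_segment; j++) {
            /* The j-th sample sits at fraction j / (2k) of the segment. */
            int offset = (int)((long long)j * len / per_segment);
            sample[i * per_segment + j] = data[start[i] + offset];
        }
    }
}
```

The precondition n >= 2*k*k is what guarantees that every segment has at least 2k items to sample from.
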
Step 4: Sort the Data Sample

[Figure: the sample array sorted by a sequential quicksort.]
Sequential complexity: O(2k^2 log_2(2k^2))

Step 5: Choose Evenly-Spaced Pivots From the Sorted Data Sample

[Figure: pivots 3, 6, 9 chosen from the sorted sample.]
Sequential complexity: O(k)

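A sketch of Step 5, assuming k - 1 pivots are taken at every 2k-th position of the sorted sample of 2k^2 points. The exact pivot positions are an assumption, as are the name `choose_pivots` and `typedef int item`.

```c
#include <assert.h>

typedef int item;   /* assumption: the slides do not define item */

/* Choose k - 1 evenly spaced pivots from sorted sample[0 .. 2*k*k - 1].
   pivot[j - 1] separates result partition j - 1 from partition j. */
void choose_pivots(const item sample[], int k, item pivot[])
{
    assert(k >= 1);
    for (int j = 1; j < k; j++) {
        pivot[j - 1] = sample[j * 2 * k];
    }
}
```

With k = 4 this yields three pivots, the same count as the three pivots shown in the slide's figure.
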
Step 6: Partition the Sorted Data Segments Using the Pivots

[Figure: sorted data segments partitioned at pivots 3, 6, 9.]
Sequential complexity: O(k^2 log_2(n/k))

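The O(k^2 log_2(n/k)) bound suggests one binary search per pivot per segment. The sketch below works on that assumption; the flat `cut[]` layout, the function names, and `typedef int item` are hypothetical, and items equal to a pivot are placed (by choice) in the lower partition.

```c
#include <assert.h>

typedef int item;   /* assumption: the slides do not define item */

/* Index of the first element in sorted a[lo .. hi - 1] that is greater
   than x, or hi if there is none (binary search). */
static int upper_bound(const item a[], int lo, int hi, item x)
{
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (a[mid] <= x)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* For each sorted segment i, compute k + 1 cut points so that piece j of
   segment i is data[cut[i*(k+1) + j] .. cut[i*(k+1) + j + 1] - 1].
   Piece j holds the items destined for result partition j.
   pivot[0 .. k - 2] must be in ascending order. */
void partition_segments(const item data[], int k, const int start[],
                        const item pivot[], int cut[])
{
    assert(k >= 1);
    for (int i = 0; i < k; i++) {
        int *c = cut + i * (k + 1);
        c[0] = start[i];
        for (int j = 1; j < k; j++) {
            /* Search only to the right of the previous cut point. */
            c[j] = upper_bound(data, c[j - 1], start[i + 1], pivot[j - 1]);
        }
        c[k] = start[i + 1];
    }
}
```

Each of the k segments is searched k - 1 times over at most n/k items, which matches the stated complexity.
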
Step 7: Compute the Sizes of the Result Partitions

[Figure: piece sizes from the data segments summed to give the result partition sizes.]
Sequential complexity: O(k^2)

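A sketch of Step 7, assuming the partition boundaries from Step 6 are stored as k rows of k + 1 cut points in a flat array. That layout and the name `partition_offsets` are hypothetical. The size of result partition j is the sum of the j-th piece of every segment, and a running sum gives where each partition starts in the result array.

```c
#include <assert.h>

/* cut[i*(k+1) + j] .. cut[i*(k+1) + j + 1] - 1 is piece j of segment i.
   On return, result partition j occupies
   result[offset[j] .. offset[j+1] - 1], and offset[k] is the total n. */
void partition_offsets(int k, const int cut[], int offset[])
{
    assert(k >= 1);
    offset[0] = 0;
    for (int j = 0; j < k; j++) {
        int size = 0;
        for (int i = 0; i < k; i++) {
            const int *c = cut + i * (k + 1);
            size += c[j + 1] - c[j];   /* length of piece j in segment i */
        }
        offset[j + 1] = offset[j] + size;
    }
}
```

The double loop over k partitions and k segments accounts for the O(k^2) cost.
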
Step 8: In Parallel, Merge the Partitioned Data Segments into the Result Partitions

[Figure: each result partition filled by a sequential k-way merge of the partitioned data segments.]
Sequential complexity: O(n log_2(k))

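One way to get O(log_2(k)) work per output item is a binary heap of run heads. The slides do not say which k-way merge is used, so the heap is an assumption, as are the names `kway_merge`, `lo[]`, `hi[]`, and `typedef int item`. The sketch merges the k pieces that belong to a single result partition.

```c
#include <assert.h>
#include <stdlib.h>

typedef int item;   /* assumption: the slides do not define item */

/* Restore the min-heap property below position i. heap[] holds run
   indices, ordered by each run's current head item data[pos[run]]. */
static void sift_down(int heap[], int size, int i,
                      const item data[], const int pos[])
{
    for (;;) {
        int smallest = i, l = 2 * i + 1, r = 2 * i + 2;
        if (l < size && data[pos[heap[l]]] < data[pos[heap[smallest]]])
            smallest = l;
        if (r < size && data[pos[heap[r]]] < data[pos[heap[smallest]]])
            smallest = r;
        if (smallest == i)
            return;
        int tmp = heap[i];
        heap[i] = heap[smallest];
        heap[smallest] = tmp;
        i = smallest;
    }
}

/* Merge k sorted runs data[lo[i] .. hi[i] - 1] into out[].
   Returns the number of items written, or -1 if allocation fails. */
int kway_merge(const item data[], int k, const int lo[], const int hi[],
               item out[])
{
    assert(k >= 1);
    int *heap = malloc((size_t)k * sizeof *heap);
    int *pos = malloc((size_t)k * sizeof *pos);
    if (heap == NULL || pos == NULL) {
        free(heap);
        free(pos);
        return -1;
    }
    int size = 0, written = 0;
    for (int i = 0; i < k; i++) {
        pos[i] = lo[i];
        if (lo[i] < hi[i])
            heap[size++] = i;         /* empty runs never enter the heap */
    }
    for (int i = size / 2 - 1; i >= 0; i--)
        sift_down(heap, size, i, data, pos);

    while (size > 0) {
        int run = heap[0];            /* run with the smallest head item */
        out[written++] = data[pos[run]++];
        if (pos[run] == hi[run])
            heap[0] = heap[--size];   /* run exhausted: shrink the heap */
        sift_down(heap, size, 0, data, pos);
    }
    free(heap);
    free(pos);
    return written;
}
```

Calls for different result partitions read disjoint pieces and write disjoint output ranges, so they can proceed concurrently.
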
Overall Sequential Complexity

Step 1: O(k) - Divide data into segments.
Step 2: O(n log_2(n/k)) - Sort data segments.
Step 3: O(2k^2) - Sample sorted data segments.
Step 4: O(2k^2 log_2(2k^2)) - Sort data sample.
Step 5: O(k) - Choose pivots from sorted data sample.
Step 6: O(k^2 log_2(n/k)) - Partition sorted data segments.
Step 7: O(k^2) - Compute result partition sizes.
Step 8: O(n log_2(k)) - Merge data into result partitions.
Dominant terms (2k^2 << n): O(n log_2(n/k)) + O(n log_2(k)).
For fixed k, as n → ∞, complexity O(n log_2(n)).

Multithreaded PSRS

    void PSRS(int n, item data[], item result[], int k, int t);
    /* Precondition: */
    /*   k >= 1 and n >= 2*k*k and */
    /*   data[0 .. n - 1] allocated and result[0 .. n - 1] allocated and */
    /*   t >= 1. */
    /* Postcondition: */
    /*   ascending(result[0 .. n - 1]) and */
    /*   permutation(result[0 .. n - 1], in data[0 .. n - 1]). */

Extra argument: t is the number of threads used.
If t > k, k threads are used.

Multithreaded PSRS Algorithm

Every step can be t-way multithreaded.
Important steps to multithread:
– Step 2: Sort data segments.
– Step 8: Merge data into result partitions.
Complexity: O((n/t) log_2(n/k)) + O((n/t) log_2(k)).
For fixed k, as n → ∞, complexity O((n/t) log_2(n)).

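The slides do not say how the k segments (Step 2) or k result partitions (Step 8) are assigned to threads. The sketch below assumes a static block assignment, with the thread count capped at k as the PSRS specification states; `effective_threads` and `thread_range` are hypothetical names. No threading library is used here; the functions only compute which work items a given thread would take.

```c
#include <assert.h>

/* Number of threads actually used: k threads when t > k, else t. */
int effective_threads(int k, int t)
{
    assert(k >= 1 && t >= 1);
    return t > k ? k : t;
}

/* Thread `id` (0-based) handles work items first .. last - 1 out of k.
   Block assignment; thread loads differ by at most one item. */
void thread_range(int k, int threads, int id, int *first, int *last)
{
    assert(threads >= 1 && threads <= k && id >= 0 && id < threads);
    *first = (int)((long long)id * k / threads);
    *last = (int)((long long)(id + 1) * k / threads);
}
```

With k = 8 and t = 3, the threads would take 2, 3, and 3 items respectively, which gives a first view of the load imbalance that arises when t does not divide k.
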
Multithreaded Performance Issues

Load Balance:
– How evenly-sized will the partitions be?
– What if the data is not uniformly distributed?
– What if there are lots of duplicates in the data?
– Can we solve load balancing by having k > t?
Algorithm Overhead:
– How does sequential performance compare with quicksort?
– How does sequential performance depend on k?
Multithreading:
– What is the cost of thread creation? Should we use barriers?
– What are the cache/memory access issues?

Costs of Partitioning

Two dominant performance terms:
– Step 2: O(n log_2(n/k)) - Sort data segments.
– Step 8: O(n log_2(k)) - Merge data into result partitions.
As k increases, step 2 cost decreases.
As k increases, step 8 cost increases.
Both extremes (k = 1, k = n) are O(n log_2(n)).
Overall effect depends on constant multipliers.

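To see why the constant multipliers decide the outcome, the two dominant terms can be combined. The constants c_s (cost per comparison step in the segment sort) and c_m (cost per comparison step in the k-way merge) are hypothetical names introduced here for illustration; they do not appear in the slides.

```latex
\begin{align*}
T(k) &\approx c_s\, n \log_2\frac{n}{k} + c_m\, n \log_2 k \\
     &= c_s\, n \log_2 n + (c_m - c_s)\, n \log_2 k
\end{align*}
```

Under this model, if c_m = c_s the dependence on k cancels and the total is c_s n log_2(n) for every k. Increasing k lowers the total only when c_m < c_s, and raises it when c_m > c_s.
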