CS 284a Lecture, Tuesday, 29 October 1997
Copyright (c) 1997-98, John Thornley

2 Multithreaded Sorting: The Problem with Quicksort and Mergesort
Sequential partitioning (quicksort) or merging (mergesort) limits speedup:
$$\mathrm{Speedup}(n, p) = \frac{p \log_2 n}{2p - 2 + \log_2(n/p)}, \qquad \mathrm{Speedup}(n, \infty) = \frac{\log_2 n}{2}.$$
Example (efficiency, i.e. speedup/p, in parentheses):
– Speedup(100 million, 2) = 1.9 (96%)
– Speedup(100 million, 4) = 3.5 (88%)
– Speedup(100 million, 8) = 5.7 (71%)
– Speedup(100 million, 16) = 8.1 (51%)
– Speedup(100 million, 32) = 10.2 (32%)
– Speedup(100 million, 64) = 11.6 (18%)
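
The formula follows from a simple cost model, assumed here rather than stated on the slide: sorting m items sequentially costs $m \log_2 m$, partitioning (or merging) m items costs m, p is a power of 2, and every split is exactly even. On the critical path, the top $\log_2 p$ levels of partitioning cost the whole array, then half of it, and so on down to $2n/p$, after which each of the p threads sorts its $n/p$ items alone:

$$T_1 = n \log_2 n, \qquad T_p = \left(n + \frac{n}{2} + \cdots + \frac{2n}{p}\right) + \frac{n}{p}\log_2\frac{n}{p} = \frac{n}{p}\left(2p - 2 + \log_2\frac{n}{p}\right),$$
$$\mathrm{Speedup}(n, p) = \frac{T_1}{T_p} = \frac{p \log_2 n}{2p - 2 + \log_2(n/p)} \longrightarrow \frac{\log_2 n}{2} \quad \text{as } p \to \infty.$$

For n = 100 million, $\log_2 n \approx 26.6$, so under this model the speedup can never exceed about 13.3, however many processors are used.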

3 The PSRS Algorithm (Parallel Sorting by Regular Sampling)
Basic idea:
– Split the data into k equal-sized segments.
– Sort the segments concurrently (e.g., using quicksort).
– Merge the sorted segments with a parallel k-way merge.
Another name: “one-deep parallel mergesort”.
The key algorithm is the parallel k-way merge.
Complexity: $O((n/p)\log n)$ for $n > p^3$.
Fastest general-purpose parallel sorting algorithm.

4 PSRS Algorithm

    void PSRS(int n, item data[], item result[], int k);
    /* Precondition:
         k >= 1 and n >= 2*k*k and
         data[0..n-1] allocated and result[0..n-1] allocated.
       Postcondition:
         ascending(result[0..n-1]) and
         permutation(result[0..n-1], in data[0..n-1]). */
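
The postcondition can be turned into an executable check for testing any PSRS implementation. A minimal sketch, assuming `item` is `int` (the slides leave it abstract); `psrs_postcondition` is a hypothetical test oracle, not part of the lecture's interface:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef int item;  /* assumption: the slides leave 'item' abstract */

/* qsort comparator: returns -1, 0 or 1 without risking overflow. */
static int compare_items(const void *a, const void *b)
{
    item x = *(const item *)a, y = *(const item *)b;
    return (x > y) - (x < y);
}

/* ascending(a[0..n-1]): every item is <= its successor. */
static int ascending(int n, const item a[])
{
    for (int i = 1; i < n; i++)
        if (a[i - 1] > a[i])
            return 0;
    return 1;
}

/* Hypothetical oracle: ascending(result[0..n-1]) and result is a
   permutation of data[0..n-1]. An ascending result is a permutation of
   data exactly when it equals a sorted copy of data.
   Cost: O(n log n) time, O(n) extra space. */
int psrs_postcondition(int n, const item data[], const item result[])
{
    assert(n >= 0);
    if (!ascending(n, result))
        return 0;
    if (n == 0)
        return 1;
    item *copy = malloc((size_t)n * sizeof(item));
    if (copy == NULL)
        abort();                      /* out of memory: cannot check */
    memcpy(copy, data, (size_t)n * sizeof(item));
    qsort(copy, (size_t)n, sizeof(item), compare_items);
    int same = memcmp(copy, result, (size_t)n * sizeof(item)) == 0;
    free(copy);
    return same;
}
```

For example, with data `3 1 2 3 0 5`, the result `0 1 2 3 3 5` passes, while `0 1 2 3 4 5` (wrong items) and `1 0 2 3 3 5` (not ascending) fail.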

5 Step 1: Divide the Data into Segments
data (n = 45 items, divided into k = 3 segments of 15):
6 3 7 3 8 9 5 1 3 9 0 4 7 4 3 | 7 8 2 4 1 0 6 3 7 8 3 9 3 6 5 | 8 9 0 4 3 2 5 1 5 7 3 9 4 2 7
Sequential complexity: $O(k)$
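
Step 1 needs no data movement: it only has to compute where each segment starts, which is why it costs $O(k)$. A minimal sketch with a hypothetical helper `segment_bounds`, which also copes with n not divisible by k:

```c
#include <assert.h>

/* Step 1 sketch: boundaries of k nearly equal segments of data[0..n-1].
   Segment i is data[start[i] .. start[i+1]-1]; start needs room for k+1
   ints. Segment lengths differ by at most one when k does not divide n.
   Only k+1 numbers are computed, so the cost is O(k). */
void segment_bounds(int n, int k, int start[])
{
    assert(k >= 1 && n >= 0);
    for (int i = 0; i <= k; i++)
        start[i] = (int)((long long)i * n / k);  /* avoid int overflow */
}
```

For the example, `segment_bounds(45, 3, start)` gives the boundaries 0, 15, 30, 45.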

6 Step 2: In Parallel, Sort the Data Segments
Each segment is sorted by its own sequential quicksort.
data before: 6 3 7 3 8 9 5 1 3 9 0 4 7 4 3 | 7 8 2 4 1 0 6 3 7 8 3 9 3 6 5 | 8 9 0 4 3 2 5 1 5 7 3 9 4 2 7
data after: 0 1 3 3 3 3 4 4 5 6 7 7 8 9 9 | 0 1 2 3 3 3 4 5 6 6 7 7 8 8 9 | 0 1 2 2 3 3 4 4 5 5 7 7 8 9 9
Sequential complexity: $O(n \log_2(n/k))$
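
A sketch of step 2, assuming `item` is `int`. The lecture does not name a threading library; this version swaps in an OpenMP `parallel for` pragma, and uses the C library's `qsort` as the per-segment sequential sort (the C standard does not require `qsort` to be a quicksort):

```c
#include <assert.h>
#include <stdlib.h>

typedef int item;  /* assumption: the slides leave 'item' abstract */

/* qsort comparator for ascending order (avoids overflow of x - y). */
static int compare_items(const void *a, const void *b)
{
    item x = *(const item *)a, y = *(const item *)b;
    return (x > y) - (x < y);
}

/* Step 2 sketch: sort segment i = data[start[i] .. start[i+1]-1] for every i.
   The segments are disjoint, so the k sorts are independent. Compiled with
   OpenMP enabled (e.g. gcc -fopenmp) the pragma spreads them over threads;
   otherwise the pragma is ignored and the loop runs sequentially. */
void sort_segments(item data[], const int start[], int k)
{
    assert(k >= 1);
    #pragma omp parallel for
    for (int i = 0; i < k; i++)
        qsort(data + start[i], (size_t)(start[i + 1] - start[i]),
              sizeof(item), compare_items);
}
```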

7 Step 3: Take Evenly-Spaced Sample Points from the Sorted Data Segments
data: 0 1 3 3 3 3 4 4 5 6 7 7 8 9 9 | 0 1 2 3 3 3 4 5 6 6 7 7 8 8 9 | 0 1 2 2 3 3 4 4 5 5 7 7 8 9 9
sample ($2k^2 = 18$ sample points, $2k = 6$ per segment): 0 3 3 6 7 9 | 0 3 3 6 7 9 | 0 3 3 5 7 9
Sequential complexity: $O(2k^2)$
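
A sketch of step 3, assuming `item` is `int`. The slides do not give the sampling formula, so the offsets used here, j*(len-1)/(2k-1) in integer division for j = 0..2k-1, are an assumption chosen so that every segment contributes its first and last item. On the example segments this picks 0 3 3 5 7 9 | 0 2 3 6 7 9 | 0 2 3 5 7 9, slightly different from the figure; after sorting, the 6th, 12th and 18th sample items are still 3, 6 and 9.

```c
#include <assert.h>

typedef int item;  /* assumption: the slides leave 'item' abstract */

/* Step 3 sketch: copy s = 2k evenly spaced items from each sorted segment
   data[start[i] .. start[i+1]-1] into sample[i*s .. i*s + s-1].
   Offsets j*(len-1)/(s-1) always include the segment's first and last
   item (an assumed spacing). sample needs room for 2*k*k items.
   Cost: O(2k^2). */
void take_sample(const item data[], const int start[], int k, item sample[])
{
    assert(k >= 1);
    int s = 2 * k;
    for (int i = 0; i < k; i++) {
        int len = start[i + 1] - start[i];
        assert(len >= s);  /* holds when n >= 2*k*k and segments are nearly equal */
        for (int j = 0; j < s; j++)
            sample[i * s + j] =
                data[start[i] + (int)((long long)j * (len - 1) / (s - 1))];
    }
}
```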

8 Step 4: Sort the Data Sample
sample before: 0 3 3 6 7 9 0 3 3 6 7 9 0 3 3 5 7 9
sample after (sequential quicksort): 0 0 0 3 3 3 3 3 3 5 6 6 7 7 7 9 9 9
Sequential complexity: $O(2k^2 \log_2(2k^2))$

9 Step 5: Choose Evenly-Spaced Pivots from the Sorted Data Sample
sample: 0 0 0 3 3 3 3 3 3 5 6 6 7 7 7 9 9 9
pivots: 3 6 9
Sequential complexity: $O(k)$
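
The rule below reproduces the slide's pivots by taking every 2k-th item of the sorted sample (positions 6, 12 and 18 for k = 3); it is inferred from the example rather than stated in the lecture. A sketch assuming `item` is `int`, with a hypothetical helper `choose_pivots`:

```c
#include <assert.h>

typedef int item;  /* assumption: the slides leave 'item' abstract */

/* Step 5 sketch: choose k pivots from the sorted sample of 2*k*k items by
   taking the last item of each consecutive block of 2k, i.e. 0-based
   positions 2k-1, 4k-1, ..., 2k*k-1. The final pivot is therefore the
   sample maximum. Cost: O(k). */
void choose_pivots(const item sorted_sample[], int k, item pivots[])
{
    assert(k >= 1);
    for (int i = 0; i < k; i++)
        pivots[i] = sorted_sample[(i + 1) * 2 * k - 1];
}
```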

10 Step 6: Partition the Sorted Data Segments Using the Pivots
pivots: 3 6 9
Each sorted segment is split into the items ≤ 3, the items ≤ 6 (and > 3), and the items ≤ 9 (and > 6):
0 1 3 3 3 3 / 4 4 5 6 / 7 7 8 9 9 | 0 1 2 3 3 3 / 4 5 6 6 / 7 7 8 8 9 | 0 1 2 2 3 3 / 4 4 5 5 / 7 7 8 9 9
Sequential complexity: $O(k^2 \log_2(n/k))$ (one binary search over n/k items per pivot, in each of the k segments)
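
Because the segments are already sorted, each cut point can be found by binary search. A sketch assuming `item` is `int`; the flat `cut` layout (k+1 boundaries per segment) and the helper names are hypothetical. The final cut is set to the segment end rather than searched for, so no item is lost even if it exceeds the last pivot:

```c
#include <assert.h>

typedef int item;  /* assumption: the slides leave 'item' abstract */

/* Binary search: index of the first item in the ascending range a[lo..hi-1]
   that is greater than key (hi if there is none). O(log2(hi - lo)). */
static int upper_bound(const item a[], int lo, int hi, item key)
{
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (a[mid] <= key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Step 6 sketch: cut each sorted segment i into k pieces. Piece j of
   segment i is data[cut[i*(k+1)+j] .. cut[i*(k+1)+j+1]-1]: piece 0 holds
   the items <= pivots[0], piece j (0 < j < k-1) the items in
   (pivots[j-1], pivots[j]], and piece k-1 everything above pivots[k-2].
   cut needs room for k*(k+1) ints. Cost: O(k^2 log2(n/k)). */
void partition_segments(const item data[], const int start[], int k,
                        const item pivots[], int cut[])
{
    assert(k >= 1);
    for (int i = 0; i < k; i++) {
        int *c = cut + i * (k + 1);
        c[0] = start[i];
        for (int j = 0; j < k - 1; j++)
            c[j + 1] = upper_bound(data, c[j], start[i + 1], pivots[j]);
        c[k] = start[i + 1];  /* last piece runs to the segment end */
    }
}
```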

11 Step 7: Compute the Sizes of the Result Partitions
data: 0 1 3 3 3 3 / 4 4 5 6 / 7 7 8 9 9 | 0 1 2 3 3 3 / 4 5 6 6 / 7 7 8 8 9 | 0 1 2 2 3 3 / 4 4 5 5 / 7 7 8 9 9
piece sizes per segment: 6 4 5 | 6 4 5 | 6 4 5
result partition sizes: 6 + 6 + 6 = 18, 4 + 4 + 4 = 12, 5 + 5 + 5 = 15
Sequential complexity: $O(k^2)$
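
A sketch of step 7, assuming the pieces of step 6 are described by a hypothetical flat array `cut` holding k+1 boundaries per segment (piece j of segment i spans `cut[i*(k+1)+j]` up to, but not including, `cut[i*(k+1)+j+1]`). A running prefix sum turns the partition sizes into the offsets where each result partition starts:

```c
#include <assert.h>

/* Step 7 sketch: result partition j receives piece j of every segment, so
   its size is the sum over segments i of the piece lengths. The prefix sum
   gives offsets: result partition j is result[out_start[j] ..
   out_start[j+1]-1]. out_start needs room for k+1 ints. Cost: O(k^2). */
void result_offsets(const int cut[], int k, int out_start[])
{
    assert(k >= 1);
    out_start[0] = 0;
    for (int j = 0; j < k; j++) {
        int size = 0;
        for (int i = 0; i < k; i++)
            size += cut[i * (k + 1) + j + 1] - cut[i * (k + 1) + j];
        out_start[j + 1] = out_start[j] + size;
    }
}
```

On the example, sizes 18, 12 and 15 become offsets 0, 18, 30 and 45.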

12 Step 8: In Parallel, Merge the Partitioned Data Segments into the Result Partitions
data: 0 1 3 3 3 3 / 4 4 5 6 / 7 7 8 9 9 | 0 1 2 3 3 3 / 4 5 6 6 / 7 7 8 8 9 | 0 1 2 2 3 3 / 4 4 5 5 / 7 7 8 9 9
Each result partition is filled by its own sequential k-way merge:
result: 0 0 0 1 1 1 2 2 … | 4 4 4 4 4 5 5 … | 7 7 7 7 7 7 8 8 …
Sequential complexity: $O(n \log_2 k)$
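
The slides do not say how each sequential k-way merge selects the next item. This sketch assumes a binary min-heap over the k piece heads, which gives the $O(n \log_2 k)$ bound (a linear scan of the k heads would cost $O(nk)$). It assumes `item` is `int` and a hypothetical flat `cut` array with k+1 boundaries per segment; `out` points at the start of result partition j:

```c
#include <assert.h>

typedef int item;  /* assumption: the slides leave 'item' abstract */

/* Restore the min-heap property below heap[root]. The heap holds run
   numbers, ordered by each run's current head item data[pos[run]]. */
static void sift_down(const item data[], const int pos[], int heap[],
                      int size, int root)
{
    for (;;) {
        int least = root, l = 2 * root + 1, r = l + 1;
        if (l < size && data[pos[heap[l]]] < data[pos[heap[least]]])
            least = l;
        if (r < size && data[pos[heap[r]]] < data[pos[heap[least]]])
            least = r;
        if (least == root)
            return;
        int tmp = heap[root];
        heap[root] = heap[least];
        heap[least] = tmp;
        root = least;
    }
}

/* Step 8 sketch: k-way merge of piece j of every sorted segment into out[].
   Each output item costs O(log2 k) heap work. Calls for different j write
   disjoint parts of the result, so they can run in parallel. */
void merge_partition(const item data[], const int cut[], int k, int j,
                     item out[])
{
    assert(k >= 1);
    int pos[k], end[k], heap[k];       /* C99 variable-length arrays */
    int size = 0, n_out = 0;
    for (int i = 0; i < k; i++) {
        pos[i] = cut[i * (k + 1) + j];
        end[i] = cut[i * (k + 1) + j + 1];
        if (pos[i] < end[i])
            heap[size++] = i;          /* skip empty pieces */
    }
    for (int h = size / 2 - 1; h >= 0; h--)
        sift_down(data, pos, heap, size, h);
    while (size > 0) {
        int i = heap[0];               /* run with the smallest head */
        out[n_out++] = data[pos[i]++];
        if (pos[i] == end[i])          /* run exhausted: drop it */
            heap[0] = heap[--size];
        sift_down(data, pos, heap, size, 0);
    }
}
```

For the example, calling it with j = 0, 1, 2 and `out` set to `result`, `result + 18` and `result + 30` fills the three result partitions.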

13 Overall Sequential Complexity
Step 1: $O(k)$ - Divide data into segments.
Step 2: $O(n \log_2(n/k))$ - Sort data segments.
Step 3: $O(2k^2)$ - Sample sorted data segments.
Step 4: $O(2k^2 \log_2(2k^2))$ - Sort data sample.
Step 5: $O(k)$ - Choose pivots from sorted data sample.
Step 6: $O(k^2 \log_2(n/k))$ - Partition sorted data segments.
Step 7: $O(k^2)$ - Compute result partition sizes.
Step 8: $O(n \log_2 k)$ - Merge data into result partitions.
Dominant terms ($2k^2 \ll n$): $O(n \log_2(n/k)) + O(n \log_2 k)$.
For fixed k, as $n \to \infty$, the complexity is $O(n \log_2 n)$.

14 Multithreaded PSRS

    void PSRS(int n, item data[], item result[], int k, int t);
    /* Precondition:
         k >= 1 and n >= 2*k*k and
         data[0..n-1] allocated and result[0..n-1] allocated and
         t >= 1.
       Postcondition:
         ascending(result[0..n-1]) and
         permutation(result[0..n-1], in data[0..n-1]). */

Extra argument: t is the number of threads to use. If t > k, only k threads are used.

15 Multithreaded PSRS Algorithm
Every step can be t-way multithreaded. The important steps to multithread are:
– Step 2: Sort data segments.
– Step 8: Merge data into result partitions.
Complexity: $O((n/t)\log_2(n/k)) + O((n/t)\log_2 k)$.
For fixed k, as $n \to \infty$, the complexity is $O((n/t)\log_2 n)$.
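
One way to honour the t argument, sketched with OpenMP (an assumed choice; the lecture does not name a threading library) and assuming `item` is `int`. The hypothetical helper `run_tasks` runs k independent tasks on min(t, k) threads, matching the rule on the interface slide; dynamic scheduling is one possible answer to the load-balance question of using k > t. Here it drives step 2; a merge task per result partition would drive step 8 the same way:

```c
#include <assert.h>
#include <stdlib.h>

typedef int item;  /* assumption: the slides leave 'item' abstract */

/* Hypothetical helper: run task(i, arg) for i = 0..k-1 on at most t threads.
   min(t, k) threads are used. schedule(dynamic, 1) hands out one task at a
   time, so with k > t a thread that finishes early picks up the next task.
   Without OpenMP enabled the pragma is ignored and the loop is sequential. */
static void run_tasks(int k, int t, void (*task)(int i, void *arg), void *arg)
{
    assert(k >= 1 && t >= 1);
    int threads = t < k ? t : k;
    (void)threads;                    /* unused when OpenMP is disabled */
    #pragma omp parallel for num_threads(threads) schedule(dynamic, 1)
    for (int i = 0; i < k; i++)
        task(i, arg);
}

static int compare_items(const void *a, const void *b)
{
    item x = *(const item *)a, y = *(const item *)b;
    return (x > y) - (x < y);
}

struct segments {
    item *data;        /* the whole array being sorted */
    const int *start;  /* segment i is data[start[i] .. start[i+1]-1] */
};

/* Step 2 as a task: sort segment i in place. */
static void sort_segment_task(int i, void *arg)
{
    const struct segments *s = arg;
    qsort(s->data + s->start[i], (size_t)(s->start[i + 1] - s->start[i]),
          sizeof(item), compare_items);
}

/* Step 2 of multithreaded PSRS, using at most t threads. */
void sort_segments_t(item data[], const int start[], int k, int t)
{
    struct segments s = { data, start };
    run_tasks(k, t, sort_segment_task, &s);
}
```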

16 Multithreaded Performance Issues
Load balance:
– How evenly sized will the partitions be?
– What if the data is not uniformly distributed?
– What if there are lots of duplicates in the data?
– Can we solve load balancing by having k > t?
Algorithm overhead:
– How does sequential performance compare with quicksort?
– How does sequential performance depend on k?
Multithreading:
– What is the cost of thread creation? Should we use barriers?
– What are the cache/memory access issues?

17 Costs of Partitioning
Two dominant performance terms:
– Step 2: $O(n \log_2(n/k))$ - Sort data segments.
– Step 8: $O(n \log_2 k)$ - Merge data into result partitions.
As k increases, the step 2 cost decreases and the step 8 cost increases.
Both extremes (k = 1, k = n) are $O(n \log_2 n)$.
The overall effect depends on the constant multipliers.
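
Ignoring the $O(k^2)$-sized terms of steps 3 to 7, a rough model (an assumption, with hypothetical per-item, per-level constants $c_s$ for sorting and $c_m$ for merging) shows why the constants decide:

$$c_s\, n \log_2\frac{n}{k} + c_m\, n \log_2 k = c_s\, n \log_2 n + (c_m - c_s)\, n \log_2 k.$$

Because $\log_2(n/k) + \log_2 k = \log_2 n$, the two terms trade off exactly: a larger k helps only if a merge level is cheaper per item than a sort level ($c_m < c_s$). At k = 1 the expression is $c_s\, n \log_2 n$, and at k = n it is $c_m\, n \log_2 n$.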

