Peter Andreae Computer Science Victoria University of Wellington Copyright: Peter Andreae, Victoria University of Wellington Fast Sorting COMP 103 2012.


1 Peter Andreae Computer Science Victoria University of Wellington Copyright: Peter Andreae, Victoria University of Wellington Fast Sorting COMP 103 2012 T2 #16

2 COMP103 16:2 Menu
  Sorting:
    Design by Divide and Conquer
    Merge Sort
    QuickSort
  Notes:
    No lecture Friday.
    Terms Test 1 available.
    Tutorial changes this week.

3 COMP103 16:3 Slow Sorts:
  Insertion Sort, Selection Sort, Bubble Sort:
    All slow: O(n^2) (except Insertion Sort on almost-sorted lists)
  Problem:
    Insertion and Bubble only compare adjacent items, and only move items one step at a time.
    Selection compares every pair of items and ignores the results of previous comparisons.
  Solution:
    Must compare and swap items at a distance.
    Must not perform redundant comparisons.
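To make the O(n^2) claim concrete, here is a minimal sketch of Insertion Sort that counts comparisons. The class name InsertionSortDemo and the counter are illustrative additions, not part of the lecture code. On an already-sorted list each item needs one comparison; on a reversed list item i is compared with every one of the i items before it.

```java
import java.util.Comparator;

final class InsertionSortDemo {
    /** Sorts data[0..size-1] in place; returns how many comparisons were made. */
    static <E> long insertionSort(E[] data, int size, Comparator<? super E> comp) {
        long comparisons = 0;
        for (int i = 1; i < size; i++) {
            E item = data[i];
            int j = i;
            // Move item left one step at a time, comparing adjacent items only.
            while (j > 0) {
                comparisons++;
                if (comp.compare(data[j - 1], item) <= 0) break;
                data[j] = data[j - 1];
                j--;
            }
            data[j] = item;
        }
        return comparisons;
    }

    public static void main(String[] args) {
        int n = 1000;
        Integer[] sorted = new Integer[n];
        Integer[] reversed = new Integer[n];
        for (int i = 0; i < n; i++) { sorted[i] = i; reversed[i] = n - i; }
        Comparator<Integer> comp = Comparator.naturalOrder();
        System.out.println(insertionSort(sorted, n, comp));   // n-1 = 999
        System.out.println(insertionSort(reversed, n, comp)); // n(n-1)/2 = 499500
    }
}
```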

4 COMP103 16:4 Divide and Conquer Sorts
  To Sort:
    Split
    Sort each part (recursive)
    Combine
  Where does the work happen?
    MergeSort: split is trivial; combine does all the work.
    QuickSort: split does all the work; combine is trivial.
  [Diagram: Array -> Split -> SubArrays -> Sort (each SubArray recursively by Split, Sort, Combine) -> SortedSubArrays -> Combine -> Sorted Array]

5 COMP103 16:5 Merge Sort
  Split the array exactly in half.
  Sort each half.
  "Merge" them together.
  [Diagram: array with indexes 0..11, shown as halves 0..5 and 6..11, and a temporary array]

6 COMP103 16:6 MergeSort
  Needs a temporary array for copying:
    create temporary array
    [fill with a copy of the original data.]  (not needed for the simple version)
  Need a "wrapper" method to start it off.

  public static <E> void mergeSort(E[] data, int size, Comparator<E> comp){
      @SuppressWarnings("unchecked")
      E[] other = (E[])new Object[size];
      for (int i=0; i<size; i++)        // not needed for simple version
          other[i]=data[i];
      mergeSort(data, other, 0, size, comp);
  }

7 COMP103 16:7 MergeSort
  private static <E> void mergeSort(E[] data, E[] temp, int low, int high, Comparator<E> comp){
      // sort items from low..high-1 using temp array
      if (low < high-1){
          int mid = (low+high)/2;   // mid = low of upper half, = high of lower half.
          mergeSort(data, temp, low, mid, comp);      // sort each half
          mergeSort(data, temp, mid, high, comp);
          merge(data, temp, low, mid, high, comp);    // merge into temp
          for (int i=low; i<high; i++)                // copy back
              data[i]=temp[i];
      }
  }

8 COMP103 16:8 Merge
  /** Merge from[low..mid-1] with from[mid..high-1] into to[low..high-1]. */
  private static <E> void merge(E[] from, E[] to, int low, int mid, int high, Comparator<E> comp){
      int indxLeft = low;    // index into the lower half of the "from" range
      int indxRight = mid;   // index into the upper half of the "from" range
      int indexTo = low;     // where we will put the item into "to"
      while ( indxLeft<mid && indxRight < high ){
          if ( comp.compare(from[indxLeft], from[indxRight]) <=0 )
              to[indexTo++] = from[indxLeft++];
          else
              to[indexTo++] = from[indxRight++];
      }
      // copy over the remainder. Note only one loop will do anything.
      while (indxLeft<mid)
          to[indexTo++] = from[indxLeft++];
      while (indxRight<high)
          to[indexTo++] = from[indxRight++];
  }

9 COMP103 16:9 MergeSort

10 COMP103 16:10 MergeSort
  Why copy items over twice?

  private static <E> void mergeSort(E[] data, E[] temp, int low, int high, Comparator<E> comp){
      // sort items from low..high-1 using temp array
      if (high > low+1){
          int mid = (low+high)/2;   // mid = low of upper half, = high of lower half.
          mergeSort(temp, data, low, mid, comp);     // sort each half in temp
          mergeSort(temp, data, mid, high, comp);    //   (using data as extra space)
          merge(temp, data, low, mid, high, comp);   // merge halves from temp back into data
      }
  }

  Note how we swap temp and data in each recursive call.
  This version relies on the wrapper filling temp with a copy of the original data.

11 COMP103 16:11 MergeSort trace
  data    [p a1 r f e q2 w q1 t z2 x c v b z1 a2]
  index    0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
  msort(0..16)                [p a1 r f e q2 w q1 t z2 x c v b z1 a2]
    msort(0..8)               [p a1 r f e q2 w q1]
      msort(0..4)             [p a1 r f]
        msort(0..2)           [p a1]
          msort(0..1)         [p]
          msort(1..2)         [a1]
          merge(0.1.2)        [a1 p]
        msort(2..4)           [r f]
          msort(2..3)         [r]
          msort(3..4)         [f]
          merge(2.3.4)        [f r]
        merge(0.2.4)          [a1 f p r]
      msort(4..8)             [e q2 w q1]
        msort(4..6)           [e q2]
          :
          merge(4.5.6)        [e q2]
        msort(6..8)           [w q1]
          :
          merge(6.7.8)        [q1 w]
        merge(4.6.8)          [e q2 q1 w]
      merge(0.4.8)            [a1 e f p q2 q1 r w]
    msort(8..16)              [t z2 x c v b z1 a2]
      :
      merge(8.12.16)          [a2 b c t v x z2 z1]
    merge(0.8.16)             [a1 a2 b c e f p q2 q1 r t v w x z2 z1]
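The trace can be reproduced with a small runnable sketch that assembles the wrapper, the alternating-array mergeSort, and merge from the earlier slides. The class name MergeSortDemo, the use of Arrays.copyOf in place of the cast-and-copy loop, and the first-letter comparator are assumptions made for this illustration. Because the comparator looks only at the first letter, items such as q2 and q1 compare as equal, so their final order shows what the sort does with equal items.

```java
import java.util.Arrays;
import java.util.Comparator;

final class MergeSortDemo {
    /** Wrapper: makes the temporary copy, then starts the recursion. */
    static <E> void mergeSort(E[] data, int size, Comparator<? super E> comp) {
        E[] other = Arrays.copyOf(data, size);   // swapped in for the slide's cast + loop
        mergeSort(data, other, 0, size, comp);
    }

    /** Sorts data[low..high-1]; temp must hold the same items in that range. */
    private static <E> void mergeSort(E[] data, E[] temp, int low, int high,
                                      Comparator<? super E> comp) {
        if (high > low + 1) {
            int mid = (low + high) / 2;
            mergeSort(temp, data, low, mid, comp);    // sorted lower half ends up in temp
            mergeSort(temp, data, mid, high, comp);   // sorted upper half ends up in temp
            merge(temp, data, low, mid, high, comp);  // merged result ends up in data
        }
    }

    /** Merges from[low..mid-1] with from[mid..high-1] into to[low..high-1]. */
    private static <E> void merge(E[] from, E[] to, int low, int mid, int high,
                                  Comparator<? super E> comp) {
        int left = low, right = mid, out = low;
        while (left < mid && right < high) {
            // "<= 0" takes the left item on a tie, which keeps equal items in order.
            if (comp.compare(from[left], from[right]) <= 0) to[out++] = from[left++];
            else to[out++] = from[right++];
        }
        while (left < mid) to[out++] = from[left++];
        while (right < high) to[out++] = from[right++];
    }

    public static void main(String[] args) {
        String[] data = "p a1 r f e q2 w q1 t z2 x c v b z1 a2".split(" ");
        Comparator<String> byLetter = (x, y) -> Character.compare(x.charAt(0), y.charAt(0));
        mergeSort(data, data.length, byLetter);
        System.out.println(String.join(" ", data));
    }
}
```

In the printed result q2 stays ahead of q1, exactly as in the input.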

12 COMP103 16:12 MergeSort Cost

13 COMP103 16:13 MergeSort Cost
  Level 1:   2 * n/2  = n
  Level 2:   4 * n/4  = n
  Level 3:   8 * n/8  = n
  Level 4:  16 * n/16 = n
  Level k:   n * 1    = n
  How many levels?
  Total cost? = O(   )
    n = 1,000:
    n = 1,000,000:
    n = 1,000,000,000:
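One way to answer the questions on this slide is to count the levels directly: each level halves the sub-array size, so the number of levels is the number of doublings needed to reach n. The sketch below (class and method names are illustrative) prints the level count and the resulting n * levels estimate next to n^2 for the three sizes above.

```java
final class MergeSortCost {
    /** Number of doublings needed to get from size 1 to at least n, i.e. ceil(lg n). */
    static int levels(long n) {
        int count = 0;
        for (long size = 1; size < n; size *= 2) count++;
        return count;
    }

    public static void main(String[] args) {
        for (long n : new long[] {1_000L, 1_000_000L, 1_000_000_000L}) {
            int k = levels(n);
            // Each level costs about n steps, so the total is about n * k.
            System.out.printf("n = %,d: %d levels, about %,d steps (n^2 = %,d)%n",
                              n, k, n * k, n * n);
        }
    }
}
```

For these sizes the level counts come out as 10, 20 and 30, so a billion items need on the order of 30 billion steps rather than 10^18.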

14 COMP103 16:14 Analysing with Recurrence Relations
  private static <E> void mergeSort(E[] data, E[] temp, int low, int high, Comparator<E> comp){
      if (high > low+1){
          int mid = (low+high)/2;
          mergeSort(temp, data, low, mid, comp);
          mergeSort(temp, data, mid, high, comp);
          merge(temp, data, low, mid, high, comp);
      }
  }

  Assume the cost of mergeSort on n items is C(n).
  Recurrence Relation:
    C(n) = C(n/2) + C(n/2) + n
         = 2 C(n/2) + n
  Solve by repeated substitution & find pattern
  Solve by general method (MATH 261)

15 COMP103 16:15 Solving Recurrence Relations
\begin{align*}
C(n) &= 2\,C(n/2) + n \\
     &= 2\,[\,2\,C(n/4) + n/2\,] + n = 4\,C(n/4) + 2n \\
     &= 4\,[\,2\,C(n/8) + n/4\,] + 2n = 8\,C(n/8) + 3n \\
     &= 16\,C(n/16) + 4n \\
     &\;\;\vdots \\
     &= 2^k\,C(n/2^k) + k\,n
\end{align*}
  When $n = 2^k$, $k = \lg(n)$:
\begin{align*}
C(n) &= n\,C(1) + \lg(n)\cdot n \\
     &= \lg(n)\cdot n \qquad \text{since } C(1) = 0
\end{align*}
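The closed form can be checked numerically. This short sketch (names are illustrative) evaluates the recurrence C(n) = 2 C(n/2) + n with C(1) = 0 directly and compares it with n * lg(n) for powers of two, which is the case the derivation assumes.

```java
final class MergeSortRecurrence {
    /** Evaluates C(n) = 2*C(n/2) + n with C(1) = 0; intended for n a power of two. */
    static long cost(long n) {
        return n <= 1 ? 0 : 2 * cost(n / 2) + n;
    }

    public static void main(String[] args) {
        for (int k = 0; k <= 20; k += 5) {
            long n = 1L << k;   // n = 2^k, so lg(n) = k
            System.out.println("n = " + n + ": C(n) = " + cost(n) + ", n*lg(n) = " + n * k);
        }
    }
}
```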

16 COMP103 16:16 Other Properties?
  Stable:
    Doesn't jump any item over an unsorted region
    ⇒ two equal items preserve their order
  Same cost on all input:
    No bad worst cases
    "Natural merge" variant doesn't sort already sorted regions
    ⇒ will be very fast, O(n), on almost sorted lists
  Not in place:
    Needs double the space for temporary work
  There is an iterative version:
    do all size 1's, then size 2's, then size 4's, etc.
    Can be done with huge files on disk
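The iterative version mentioned above can be sketched as follows: merge all runs of width 1 into runs of width 2, then width 2 into width 4, and so on, swapping the roles of the two arrays after each pass. This is one possible arrangement, not the lecture's code; the class name BottomUpMergeSort and the sort signature are assumptions, and the merge step is the same as in the recursive version.

```java
import java.util.Arrays;
import java.util.Comparator;

final class BottomUpMergeSort {
    /** Sorts the whole array without recursion (sketch; assumes length < 2^30). */
    static <E> void sort(E[] data, Comparator<? super E> comp) {
        int n = data.length;
        E[] from = data;
        E[] to = Arrays.copyOf(data, n);
        for (int width = 1; width < n; width *= 2) {
            // Merge each adjacent pair of sorted runs of length "width".
            for (int low = 0; low < n; low += 2 * width) {
                int mid = Math.min(low + width, n);
                int high = Math.min(low + 2 * width, n);
                merge(from, to, low, mid, high, comp);
            }
            E[] swap = from; from = to; to = swap;   // result of this pass is now in "from"
        }
        if (from != data) System.arraycopy(from, 0, data, 0, n);
    }

    /** Merges from[low..mid-1] with from[mid..high-1] into to[low..high-1]. */
    private static <E> void merge(E[] from, E[] to, int low, int mid, int high,
                                  Comparator<? super E> comp) {
        int left = low, right = mid, out = low;
        while (left < mid && right < high) {
            if (comp.compare(from[left], from[right]) <= 0) to[out++] = from[left++];
            else to[out++] = from[right++];
        }
        while (left < mid) to[out++] = from[left++];
        while (right < high) to[out++] = from[right++];
    }
}
```

Each pass reads and writes the data sequentially, which is why this arrangement suits files too large to hold in memory.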

17 COMP103 16:17 QuickSort
  Divide and Conquer, but does its work in the "split" step.
  It splits the array into two (possibly unequal) parts:
    choose a "pivot" item
    make sure all items < pivot are in the left part, and all items > pivot are in the right part
  Then (recursively) sorts each part.

  public static <E> void quickSort(E[] data, int size, Comparator<E> comp){
      quickSort(data, 0, size, comp);
  }

18 COMP103 16:18 QuickSort
  public static <E> void quickSort(E[] data, int low, int high, Comparator<E> comp){
      if (high-low < 2)      // at most one item: nothing to sort.
          return;
      else {                 // split into two parts, mid = index of boundary
          int mid = partition(data, low, high, comp);
          quickSort(data, low, mid, comp);
          quickSort(data, mid, high, comp);
      }
  }

  [Diagram: example array, indexes 0..11: F A P L J M S E X B Q R]

19 COMP103 16:19 QuickSort: Partition
  /** Partition into small items (low..mid-1) and large items (mid..high-1) */
  private static <E> int partition(E[] data, int low, int high, Comparator<E> comp){
      E pivot = data[(low+high-1)/2];    // simple but may be a poor choice!
      // or: pivot = median(data[low], data[high-1], data[(low+high-1)/2], comp);
      int left = low-1;
      int right = high;
      while( left <= right ){
          do {
              left++;        // skip over items on the left < pivot
          } while (left<high && comp.compare(data[left], pivot)< 0);
          do {
              right--;       // skip over items on the right > pivot
          } while (right>=low && comp.compare(data[right], pivot)> 0);
          if (left < right)
              swap(data, left, right);
      }
      return left;
  }

  Getting this code exactly right is very tricky! Many published versions were wrong!
  [Diagram: example array, indexes 0..11: F A P L J M S E X B Q R]
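For readers who want to run the QuickSort slides, the sketch below assembles the wrapper, the recursive method and partition into one class and supplies the swap helper that the slide code calls but does not show. The class name QuickSortDemo and the body of swap are assumptions; the partition logic is kept as on the slide, with the simple middle-item pivot.

```java
import java.util.Comparator;

final class QuickSortDemo {
    static <E> void quickSort(E[] data, int size, Comparator<? super E> comp) {
        quickSort(data, 0, size, comp);
    }

    private static <E> void quickSort(E[] data, int low, int high, Comparator<? super E> comp) {
        if (high - low < 2) return;                   // zero or one item: already sorted
        int mid = partition(data, low, high, comp);   // boundary between the two parts
        quickSort(data, low, mid, comp);
        quickSort(data, mid, high, comp);
    }

    /** Partitions data[low..high-1]; returns mid with low < mid < high. */
    private static <E> int partition(E[] data, int low, int high, Comparator<? super E> comp) {
        E pivot = data[(low + high - 1) / 2];
        int left = low - 1;
        int right = high;
        while (left <= right) {
            do { left++; } while (left < high && comp.compare(data[left], pivot) < 0);
            do { right--; } while (right >= low && comp.compare(data[right], pivot) > 0);
            if (left < right) swap(data, left, right);
        }
        return left;
    }

    /** Hypothetical helper: exchanges data[i] and data[j]. */
    private static <E> void swap(E[] data, int i, int j) {
        E tmp = data[i];
        data[i] = data[j];
        data[j] = tmp;
    }

    public static void main(String[] args) {
        String[] letters = "F A P L J M S E X B Q R".split(" ");
        quickSort(letters, letters.length, Comparator.<String>naturalOrder());
        System.out.println(String.join(" ", letters));
    }
}
```

Note that the loop condition is left <= right rather than left < right: when the two scans meet on the same item, one more pass moves left past it, which keeps the returned boundary strictly inside the range so that both recursive calls shrink.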

20 COMP103 16:20 QuickSort Cost:
  If QuickSort divides the array exactly in half (best case):
    C(n) = n + 2 C(n/2)
         = n lg(n) comparisons = O(n log(n))
  If QuickSort always divides the array into 1 and n-1 (worst case):
    C(n) = n + (n-1) + (n-2) + (n-3) + ... + 2 + 1
         = n(n+1)/2 comparisons = O(n^2)
  Average case?
    Very hard to analyse.
    Still O(n log(n)), and very good.

21 COMP103 16:21 Other Properties?
  Unstable:
    Partition "jumps" items to the other end
    ⇒ two equal items likely to reverse their order
  Cost depends on choice of pivot:
    Choosing first item ⇒ very slow, O(n^2), on almost sorted lists
    Better choice (median of three) ⇒ O(n log(n)) on almost sorted lists
    Can spend more time choosing pivot.
  In place:
    Doesn't use any extra space.
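The median-of-three choice refers to a median helper whose body is not shown in the slides. A minimal sketch of what such a helper might look like is given here; the class name and the exact tie-breaking are assumptions. It returns whichever of the three items lies between the other two, using at most three comparisons.

```java
import java.util.Comparator;

final class PivotChoice {
    /** Returns the middle one of a, b, c under comp (hypothetical helper). */
    static <E> E median(E a, E b, E c, Comparator<? super E> comp) {
        if (comp.compare(a, b) <= 0) {
            if (comp.compare(b, c) <= 0) return b;       // a <= b <= c
            return comp.compare(a, c) <= 0 ? c : a;      // c < b: a <= c < b, or c < a <= b
        } else {
            if (comp.compare(a, c) <= 0) return a;       // b < a <= c
            return comp.compare(b, c) <= 0 ? c : b;      // c < a: b <= c < a, or c < b < a
        }
    }
}
```

On an already sorted range the first, middle and last items are the smallest, a middle-valued and the largest item, so the median is the middle item and the split is even.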

22 COMP103 16:22 QuickSort trace
  data array  : [p a1 r f e q1 w q2 t z1 x c v b z2 a2]
  indexes     : [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
  do 0..16    : [p a1 r f e q1 w q2 t z1 x c v b z2 a2]
  part@q2->8  : [p a1 a2 f e b c q2 t z1 x w v q1 z2 r]
  do 0..8     : [p a1 a2 f e b c q2]
  part@f->5   : [c a1 a2 b e f p q2]
  do 0..5     : [c a1 a2 b e]
  part@a2->2  : [a2 a1 c b e]
  do 0..2     : [a2 a1]
  part@a2->1  : [a1 a2]
  do 0..1     : [a1]
  do 1..2     : [a2]
  done 0..2   : [a1 a2]
  do 2..5     : [c b e]
  part@b->3   : [b c e]
  do 2..3     : [b]
  do 3..5     : [c e]
  part@c->4   : [c e]
  do 3..4     : [c]
  do 4..5     : [e]
  done 3..5   : [c e]
  done 2..5   : [b c e]
  done 0..5   : [a1 a2 b c e]
  do 5..8     : [f p q2]
  part@p->7   : [f p q2]
  do 5..7     : [f p]
  part@f->6   : [f p]
  do 5..6     : [f]
  do 6..7     : [p]
  done 5..7   : [f p]
  do 7..8     : [q2]
  done 5..8   : [f p q2]
  done 0..8   : [a1 a2 b c e f p q2]
  do 8..16    : [t z1 x w v q1 z2 r]
  part@w->12  : [t r q1 v w x z2 z1]
  do 8..12    : [t r q1 v]
  part@r->10  : [q1 r t v]
  do 8..10    : [q1 r]
  part@q1->9  : [q1 r]
  do 8..9     : [q1]
  do 9..10    : [r]
  done 8..10  : [q1 r]
  do 10..12   : [t v]
  part@t->11  : [t v]
  do 10..11   : [t]
  do 11..12   : [v]
  done 10..12 : [t v]
  done 8..12  : [q1 r t v]
  do 12..16   : [w x z2 z1]
  part@x->14  : [w x z2 z1]
  do 12..14   : [w x]
  part@w->13  : [w x]
  do 12..13   : [w]
  do 13..14   : [x]
  done 12..14 : [w x]
  do 14..16   : [z2 z1]
  part@z2->15 : [z1 z2]
  do 14..15   : [z1]
  do 15..16   : [z2]
  done 14..16 : [z1 z2]
  done 12..16 : [w x z1 z2]
  done 8..16  : [q1 r t v w x z1 z2]
  done 0..16  : [a1 a2 b c e f p q2 q1 r t v w x z1 z2]
  sorted      : [a1 a2 b c e f p q2 q1 r t v w x z1 z2]


26 COMP103 16:26 Where have we been?
  Implementing Collections:
    ArrayList:       O(n) to add/remove, except at end
    Queue, Stack:    O(1)
    ArraySet:        O(n) (cost of searching)
    SortedArraySet:  O(log(n)) to search (with binary search)
                     O(n) to add/remove (cost of inserting)
                     O(n^2) to add n items
                     O(n log(n)) to initialise with n items (with fast sorting)
  Where next?
    We're tired of arrays; let's look at dynamic data structures.
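The last two SortedArraySet costs can be illustrated with a small sketch: sort all n items once at construction, then answer searches with binary search. This is not the course's SortedArraySet class; the name SortedArraySketch is hypothetical, and the standard library's Arrays.sort and Arrays.binarySearch stand in for the sorting and searching code developed in the lectures.

```java
import java.util.Arrays;
import java.util.Comparator;

final class SortedArraySketch<E> {
    private final E[] items;    // items[0..count-1] are sorted and distinct
    private final int count;
    private final Comparator<? super E> comp;

    /** Initialise with n items: one O(n log n) sort plus an O(n) pass to drop duplicates. */
    SortedArraySketch(E[] initial, Comparator<? super E> comp) {
        this.comp = comp;
        E[] copy = Arrays.copyOf(initial, initial.length);
        Arrays.sort(copy, comp);
        int n = 0;
        for (int i = 0; i < copy.length; i++) {
            if (n == 0 || comp.compare(copy[n - 1], copy[i]) != 0) copy[n++] = copy[i];
        }
        this.items = copy;
        this.count = n;
    }

    /** O(log n) search using binary search over the sorted range. */
    boolean contains(E item) {
        return Arrays.binarySearch(items, 0, count, item, comp) >= 0;
    }

    int size() {
        return count;
    }
}
```

For example, building the sketch from {5, 3, 5, 1} with the natural ordering gives a set of size 3 that contains 3 but not 4.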

