
1 1/28 COP 3540 Data Structures with OOP Chapter 7 - Part 1 Advanced Sorting

2 2/28 Advanced Sorting • We will cover two sorts first. • Shell Sort – an O(n(log₂n)²) sort in general, and one that 'can approach' O(n) performance! • Partitioning – the basis of the O(n log₂n) QuickSort (partitioning itself runs in O(n) time). • Then, we'll cover the QuickSort itself.

3 3/28 Recall how the Insertion Sort worked • We took an element out of the 'array' and assumed all elements 'to the left' were sorted. • We marked this spot and extracted that element. • We then • compared the extracted element with the elements 'to the left', and • 'inserted' it into its proper place, • shifting elements to the right as needed to make room for the inserted element and fill the vacated spot.
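A minimal sketch of this logic in Java, over a plain long[] array (a standalone method for illustration, not the book's exact class code):

// Insertion sort: grow a sorted subarray on the left, one element at a time.
public static void insertionSort(long[] a)
{
   for (int outer = 1; outer < a.length; outer++)
   {
      long temp = a[outer];              // extract the marked element
      int inner = outer;
      while (inner > 0 && a[inner-1] >= temp)
      {
         a[inner] = a[inner-1];          // shift larger elements right
         inner--;
      }
      a[inner] = temp;                   // insert into its proper place
   }
}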

4 4/28 Approach that helped us • We helped ourselves by starting with a single element on the left – so we knew 'that' element was sorted, certainly sorted unto itself. • Then we proceeded: slowly the sorted elements to the 'left' of the marked element grew in number, as new numbers found their proper place in the subarray to the left, while the unsorted elements to the right diminished in number.

5 5/28 Potential Problems with the Insertion Sort • Now, what happens if the new number to be sorted is very small (or very large) and our sort is ascending (or descending)? • This may require a large number of 'copies' to the right to make room for this new element. • It can require a number of copies close to n, in fact. • The average number of copies is clearly n/2. • For n elements to be sorted, at an average of n/2 copies each, we have n·(n/2) = n²/2 copies; for n = 10,000, that is roughly 50,000,000 copies. • That can result in a very inefficient sort. • This is why the insertion sort is an O(n²) sort. • It is this number of copies (comparing and shifting) that decreases its performance.

6 6/28 Shell Sort Approach • We want to reduce these large numbers of shifts. • Shell sort does this by sorting a very small subset of numbers – like three or four – at a time: • where the numbers themselves may be large distances apart (as in a large array), • and it sorts them with respect to each other. • By sorting a small number of numbers, very small (or very large) values can be put much more nearly 'in place' much more quickly than with other approaches. • How is this done?

7 7/28 Shell Sort uses the notion of a computed 'Gap' • The Shell Sort uses a computed 'gap' between numbers, represented by an 'h', as the distance between numbers in each subset to be sorted. • 1. It sorts all numbers in the array that share the same gap h – like numbers eight apart, or four apart – with respect to each other. • 2. Then, after doing this, the algorithm reduces the gap (or distance) to a smaller number, like maybe 4 apart. • 3. Ultimately the gap has size 1; then the algorithm '1-sorts' the array using the insertion sort.

8 8/28 Example • Consider sorting three elements at a time with respect to each other, where the numbers are some distance 'h' apart. • For array size n = 10, and gap size h = 4, we have four sub-arrays (we call this a 4-sort): • Indices (0,4,8), (1,5,9), (2,6) and (3,7). The elements within each set are sorted with respect to each other. (Note: all ten elements participate!) • The sub-arrays are interleaved but, again, each is sorted with respect to its own members. • (Note: the integers are not yet in their final spots. A sketch follows below.)
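A minimal sketch of a single h-sort pass over a plain long[] array (hSort is an illustrative name; the full shellSort() code appears a few slides ahead):

// Sort each interleaved subsequence (i, i+h, i+2h, ...) with respect to itself.
public static void hSort(long[] a, int h)
{
   for (int outer = h; outer < a.length; outer++)
   {
      long temp = a[outer];
      int inner = outer;
      while (inner > h-1 && a[inner-h] >= temp)
      {
         a[inner] = a[inner-h];          // shift within the subsequence, h apart
         inner -= h;
      }
      a[inner] = temp;
   }
}

For n = 10 and h = 4, this sorts the interleaved index sets (0,4,8), (1,5,9), (2,6), and (3,7), each with respect to its own members, exactly as described above.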

9 9/28 Consider Improved Performance! • Recall again how the Insertion Sort works: it is • very efficient for arrays that are nearly sorted (fewer swaps and less movement), and yet can be • very inefficient (due to shifts and copies) if the data are badly unsorted – particularly for very large / very small numbers. • Shell sort does 'h-sorting'. • It capitalizes on the initial positions of elements, especially when they are far from where they will ultimately end up. • It brings numbers more quickly to (or nearer to) their final positions. • The algorithm moves elements that may be very far apart much closer to their final positions quickly, thus reducing copying, shifting, and swapping! • Shell Sort can approach O(n) performance: much better than O(n²)!

10 10/28 What about Larger Arrays? Gap Size? • We use a carefully researched formula to compute the optimum gap size. • Don Knuth developed a 'recursive' relationship: • h = 3*h + 1 to generate the gaps going up, and then subsequent gaps at • h = (h-1)/3 coming back down. • (Note the 'recursion' in the formula itself: it uses the value of h to compute the new value of h.) • These h-values are referred to as the • interval sequence or gap sequence • and are recursively computed as functions of h. • In more detail:

11 11/28 Don Knuth's algorithm will start with a 3-sort; that is, sort three numbers some distance apart. As Knuth's research reveals (the algorithm is a few slides ahead), for an array of size > 364 and < 1093, we 3-sort with a gap size of 364; after that pass, we use a gap size of 121; then gap size = 40; steadily decreasing until h = 1. The initial gap size is developed recursively by computing h = h*3 + 1, starting from h = 1, until h <= nElems/3 is false. Computing h this way, we see that h increases from 1 to 4 to 13 to 40 to 121 to 364 to .... Once the original gap is determined, the sort runs and the algorithm steadily reduces the gap, e.g. from 364 to 121, until h = 1.

Gap sizes:
      h    3*h+1
      1        4
      4       13
     13       40
     40      121
    121      364
    364     1093
   1093     3280
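A minimal sketch of computing that initial gap in Java (initialGap is an illustrative name, not from the book's code):

// Grow Knuth's sequence (1, 4, 13, 40, 121, ...) until it would exceed nElems/3.
public static int initialGap(int nElems)
{
   int h = 1;
   while (h <= nElems / 3)
      h = h * 3 + 1;
   return h;               // e.g., nElems = 1000 gives h = 364
}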

12 12/28 Algorithm (covered in the previous slide) • The algorithm first uses a short loop to generate the first (initial) value of h. • Then, once we have an initial value of h: • additional values of h are recursively computed; the initial value depends on the size of the array to be sorted. • The gap then starts with that largest h-value. • For a 1000-element array, our initial gap size is 364. • After each pass, we successively decrease the gap using the formula h = (h-1)/3, as shown.

13 13/28 Note: 1. As it turns out, the algorithm actually sorts the first two elements of each group for a given gap first; then it goes back and completes the three-element groups. This results in better performance. • You will see this if you look carefully at the algorithm.

14 14/28
public void shellSort()
{
   int inner, outer;
   long temp;
   int h = 1;                          // find initial value of h
   while (h <= nElems/3)               // COMPUTE GAP SIZE (1, 4, 13, 40, 121, 364, ...)
      h = h*3 + 1;                     // value of h depends on original array size, nElems;
                                       // start with largest gap such that h <= nElems/3
   while (h > 0)                       // for a 1000-element array, h starts at 364
   {
      for (outer = h; outer < nElems; outer++)   // h-sort the structure...
      {  // for 1000 elements: h = 364; outer < nElems (1000); increment by one
         temp = theArray[outer];
         inner = outer;
         while (inner > h-1 && theArray[inner-h] >= temp)
         {
            theArray[inner] = theArray[inner-h];
            inner -= h;
         } // end while
         theArray[inner] = temp;
      } // end for
      h = (h-1) / 3;                   // computes new gap: decreases h
   } // end while (h > 0)
} // end shellSort()
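A minimal usage sketch, assuming this method lives in an ArraySh-style wrapper class (as in Lafore's examples) with a long[] theArray field, an nElems count, and insert()/display() helpers – the constructor and helper signatures here are assumptions:

public static void main(String[] args)
{
   ArraySh arr = new ArraySh(100);   // capacity 100 (constructor assumed)
   arr.insert(20);                   // load a few unsorted values
   arr.insert(10);
   arr.insert(40);
   arr.insert(30);
   arr.display();                    // 20 10 40 30
   arr.shellSort();                  // h-sorts with Knuth's gaps, ending in a 1-sort
   arr.display();                    // 10 20 30 40
}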

15 15/28 Google: Shell Sort Applet • Google: applet Lafore • You will get a number of applet choices. • Select and enjoy.

16 16/28 Demo of Shell Sort • Do n = 12 and notice how the gap varies across the bars. • You can see when h goes from 4 to 1. • You can see when it compares two in an interval, then three; then 1-sorts. • Do a 100-element sort. • It starts with h = 40. See how it compares two of the three in each interval until only intervals of two are left. • There is a larger number of intervals when it goes to h = 13. • Go to h = 4 and see more intervals yet. • Finally, h = 1. • Do this.

17 17/28 Shell Sort - Evaluation • Good for medium-sized arrays, up to a few thousand items. • Shell Sort – O(n(log₂n)²) – is not as fast as the Quick Sort's O(n log₂n) (coming soon). • Not so good for large files, but • easy to implement and • requires very little extra space. • All sorts have a 'worst case' performance. For the Shell Sort, the • worst case is not much worse than the average performance, so this is good! • (The worst case is very different from the average case in a Quick Sort.)

18 18/28 Final Remarks on Shell Sort • Other gap sequences are available. • Many alternatives exist; you can experiment... • Ultimately, the sequence needs to end with 1. • This forces the last pass to be an insertion sort. • Guideline: • gaps should be relatively prime. • Note that the Shell Sort numbers presented are not all prime (4, 40, ...). This led to some earlier inefficiencies. • Experiments on Shell Sort yield performance mostly between O(n^(3/2)) and O(n^(7/6)) • – or from almost O(n²) down to almost O(n)! • Quite a difference, and the difference grows as n increases, which makes sense.

19 19/28 Partitioning

20 20/28 Partitioning • Partitioning is key to QuickSort thinking. • Partitioning divides data into two groups depending on the value of a key. • E.g., divide students into two groups: GPA < 3.0 and GPA >= 3.0. (Incidentally, why is a GPA of 3.0 important??) • We select a Pivot Value: • the value used to separate the data items into two groups. • We end up with data < pivot value on one side and data >= pivot value on the other.

21 21/28 Pivot Values • Note: the pivot can be any key value. • It need not be a midpoint or a value 'half-way' through the data. • It would be nice if the pivot were the half-way point, but we have no way of knowing... • Later we will see how the choice of the pivot impacts performance! • The pivot value is used to separate the array into a left side and a right side. • Ideally, we'd 'like' the sub-arrays to be roughly the same size, and we will work toward that reality.

22 22/28 Run the Partition Algorithm to build Sub-Arrays • Once the pivot value is selected, we run the partition algorithm. • Once run, • data less than the pivot 'belongs' to the left side of the array (however many elements may be on the left), and • data greater than or equal to the pivot value belongs to the right side, however many elements are on the right. • Note: once partitioning is run, the data is NOT sorted, • but the items are a lot 'closer' to their final positions... • and the array is partitioned based on the pivot value.

23 23/28 The Partitioning Algorithm • Pick a pivot value... (more on this later). • Start with an index at the left end of the partition. • Let's call it the left scan. • Move toward the right. • Compare each element to the pivot value. • If an element is less than the pivot value, leave it alone and move to the right. • Advance to the right until an element is >= the pivot value, and then stop. • Start a second index at the rightmost position on the right side. • Let's call it the right scan. • Move toward the left. • Compare each element to the pivot value. • If an element is >= the pivot value, leave it alone and move to the left. • Advance to the left until an element is < the pivot value, and then stop. • Swap the two values. • Iterate (back on the left, then the right) until the left and right scans meet at the same entry. (A sketch in Java follows below.)
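A minimal sketch of this scan-and-swap loop in Java, in the style of Lafore's partitionIt() (the method name and the exact >= / < scan conditions follow the steps above; treat the details as an assumption, since implementations vary):

// Partition theArray[left..right] around pivot; returns the partition index.
public int partitionIt(int left, int right, long pivot)
{
   int leftPtr = left - 1;              // left scan starts just before the range
   int rightPtr = right + 1;            // right scan starts just after the range
   while (true)
   {
      while (leftPtr < right && theArray[++leftPtr] < pivot)
         ;                              // scan right for an element >= pivot
      while (rightPtr > left && theArray[--rightPtr] >= pivot)
         ;                              // scan left for an element < pivot
      if (leftPtr >= rightPtr)
         break;                         // scans have met: partitioning is done
      long temp = theArray[leftPtr];    // swap the out-of-place pair
      theArray[leftPtr] = theArray[rightPtr];
      theArray[rightPtr] = temp;
   }
   return leftPtr;                      // first element of the right partition
}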

24 24/28 • Let's look at the applet.

25 25/28 Partition.html • Google: applet Lafore • Run with n = 12 with various orderings... • Run with n = 40. Notice the partition first and the final ordering... • Note: in running the partitioning algorithm the data are not totally sorted – but they are a good bit closer.

26 26/28 Partitioning and the Pivot Value • Note that partitioning is not stable. • As elements on one side are moved to the other side of the pivot value, they are NOT necessarily in the same relative positions in the 'new' partition! • In fact, they tend to end up in reverse order. • Further, the number of elements on each side need not be the same – it depends on the pivot value. • Very likely, there will NOT be the same number of elements on each side of the pivot.

27 27/28 One (of several) Problems with Partitioning • 1. What if a poor pivot value were chosen, such that every element was < the pivot value? • The scan's index would keep advancing. • We would end up with an array-index-out-of-bounds exception. • Ditto the other way. See the guard in the code below:

while (leftPtr < right && theArray[++leftPtr] < pivot)
   ; // nop

• Clearly – as in any program that is to be robust – there must be bounds checks on the scans.
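The right-to-left scan needs the symmetric guard (a one-line sketch; the exact comparison varies by implementation):

while (rightPtr > left && theArray[--rightPtr] >= pivot)
   ; // nop: the rightPtr > left check stops the scan before it runs off the left end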

28 28/28 Efficiency of the Partition • The algorithm is pretty efficient, too. • It runs in O(n) time. • The pointers move in from opposite ends, scanning and swapping at a constant rate. • If n were doubled to 2n, the algorithm would take roughly twice as long. • Thus the algorithm operates in O(n) time – meaning the time is proportional to the number of items being partitioned.

29 29/28 Efficiency of the Partitioning Algorithm • Non-random data yields the worst results. • If the data is inversely ordered, then every pair will be swapped: n/2 swaps, the maximum! (Even so, n/2 swaps is still proportional to n – not the n²/2 we saw for insertion sort.) • Random data yields fewer than n/2 swaps, • since some elements will already be on the correct side. • On average, for random data, about half the maximum number of swaps takes place. • Regardless of random / non-random data, both situations result in an efficiency proportional to n.

