1 Arrays 2: Sorting and Searching
2 0. Admin. §1) No class Thursday. §2) Will cover Strings next Tuesday. §3) Take in report. §4) Hand out program assignment 6, due 11/20, based on material covered today.
3 1. Review of array basics. §What is an array? §A contiguous series of homogeneous elements. §What does contiguous mean? § Pros and cons? §What does homogeneous mean? § Pros and cons?
4 The bounds of an array. §C# defaults to 0-based arrays. If we code float [] Amounts = new float [100]; §we get 100 consecutive floats from Amounts[0] to Amounts[99]. It is important to remember this so we avoid a “bounds error”. §Suppose we code Amounts[I] = 83000; §What if I = -2 or 100?
5 Review of the list concept. §The list is the occupied portion of the array. §The array is the physical container. It may be empty, full, or partially full. §E.g. Amounts, which can contain 100 elements, may only be occupied from Amounts[0] to Amounts[56]. §The list consists of only the valid data values. This does not include “garbage” that happens to be in memory beyond the list.
6 Why does the list / array distinction matter? §We should only process the valid data (list). §Otherwise we may e.g. include garbage values in the Total of all entries, update both valid and garbage data, sort garbage into valid data and search in the garbage! §This is computer science, not the FBI!
7 List processing. §While it is proper to talk of loading an array, thereafter it is helpful to speak of processing the list: §We should ensure that our algorithm only processes the valid data values, e.g. from Amounts[0] to Amounts[56].
8 2.Sorting lists. §To sort information means to arrange it in some order. §It may be a single item, arranged in ascending or descending order e.g. opening weekend profits for movies from best to worst. §Usually it is a collection of data (a record) sorted in order of a key field.
9 Why sort lists? §1) Shows highest / lowest values at top / bottom. §2) May discover important groupings e.g. a lot of people called Patel, Smith or Slarteebartfast. §3) Easier to search. §Why is a dictionary easier to search than a random listing of words and their definitions?
10 The Bubble Sort. §Idea: §Repeatedly pass through a list, comparing adjacent elements. §If the elements are out of order, swap. §Large values “bubble”, one swap at a time, to the end of the list.
11 Example. §Pass 1: §EBBB §BEEE §FCCC §CFDD §DDFA §AAAF §Pass 2: §BBB §CCC §EDD §DEA §AAE §FFF
12 Example. §Pass 3:Pass 4: B C A A C D E F §Pass 5: § A § B § C § D § E § F
13 Why is it called the “bubble” sort? §After the first pass, it is guaranteed that the largest element (F) is sorted. §After the second pass, it is guaranteed that the second largest element (E) is sorted. §So, if we aren’t lucky, the maximum number of passes for N elements is….
14 Why is it called the “bubble” sort? §N-1. §Why? §If all the elements larger than the smallest are correctly placed, the smallest element must already be correctly placed. §What about the maximum number of comparisons within one pass, comparing each adjacent pair?
15 Why is it called the “bubble” sort? §Also N-1. Why? §So in worse case, if there are N elements, takes N-1 * N-1 or O (N 2 ) comparisons and swaps to sort N elements. §However, the algorithm can be optimized. §(1) If the list is sorted in fewer than N-1 passes, how could we know?
16 Optimization. §Use a boolean Swap flag: set to false before each pass; only set to true if a swap occurs. §So, if it stays false, there were no swaps, so the list must be sorted already. §(2) Do we always need to compare every adjacent pair? §No, because some parts are already sorted.
17 Optimization. §Can continue to do comparisons up until the last element that was swapped, e.g. § BB § CC § AA § E D } Last § D E } Change § F F
18 See BUBSORT.CS §This code shows an optimized Bubble Sort (which is a bit like saying a Turbo tortoise). §Note how arrays are passed as parameters. §Note how the Load function keeps track of the size of the list. §Look at the Bubble Sort logic.
19 Logic of Bubble Sort. §Note the use of a nested loop. §The outer loop controls the passes, running until the first pass with no swaps. §The inner loop controls the comparisons within each pass, running until the last change made on the previous pass. §Note the Swap logic. Why are Temporary variables needed?
20 Swapping. §Suppose we want to swap E and B. §We can’t say move E to B, then B to E, because the first move destroys B. §So? §1) Move E to Temp. §2) Move B to E. §3) Move Temp to B.
21 Limitation of the Bubble Sort. §The Bubble Sort is slow. §But why? §Can only swap adjacent elements. §Consider the worst case scenario: §Index Value §[0] 999 §.. §[999] 0
22 What is the problem? §It will take 999 comparisons and swaps to place the Value 999 at the correct location [999]. Shuffling is very slow. §What would help solve this problem?
23 The Shell Sort. §Donald Shell suggested that we compare elements separated by a larger gap. §Start with a gap half the size of the list. Keep passing through until there is a pass with no swaps, then halve the gap. §Why does this help? §See diagram and Shell sort code.
24 Efficiency. §Studies find no significant difference between the Shell Sort and Bubble Sort for small lists e.g. <= 10 elements. §But for 100 / 1000 / 10000, increasing advantage to use Shell Sort. §The Shell Sort is estimated to be an §O (N 1.25 ) algorithm.
25 Searching Lists. §If a list is unsorted, we are forced to do a serial or linear search: §I = 0; §Found = false; §while (!Found && I < ListSize) { § if (SearchElement != List[I]) § I++; § else § Found = true; §} // end while
26 But this can be tedious… §A serial search is O (N), like loading an array. §Only Forrest Gump would search a dictionary this way! §Why is this silly? §How would we search a dictionary?
27 The Binary Search (for sorted lists). §Use 3 indexes Start, Stop and Middle. §Compare SearchElement to Element[Middle] §if (SearchElement < List[Middle]) § Stop = Middle -1; §else §if (SearchElement > List[Middle] ) § Start = Middle + 1 §else § Found = true;
28 More technically: §Start = 0; §Stop = ListLength-1; §Found = false; §while ((Start <= Stop) && (!Found)) { § Middle = (Start + Stop) / 2; § if (SearchElement < List[Middle]) § Stop = Middle -1; § else if (SearchElement > List[Middle]) § Start = Middle + 1; § else { § Found = true; § Location = Middle } §} // end while
29 Example: §A B C D E F G §[0] [1] [2] [3] [4] [5] [6] §Start Middle Stop §E.g. Find ‘B’ §E.g. Find ‘E’.