1 CSC 222: Object-Oriented Programming Spring 2013 Searching and sorting  sequential search vs. binary search  algorithm analysis: big-Oh, rate-of-growth.

1 CSC 222: Object-Oriented Programming Spring 2013 Searching and sorting  sequential search vs. binary search  algorithm analysis: big-Oh, rate-of-growth  O(N 2 ) sorts: insertion sort, selection sort  O(N log N) sorts: merge sort, quick sort  recursion base case, recursive case recursion vs. iteration

2 Searching a list suppose you have a list, and want to find a particular item, e.g.,  lookup a word in a dictionary  find a number in the phone book  locate a student's exam from a pile searching is a common task in computing  searching a database  checking a login password  lookup the value assigned to a variable in memory if the items in the list are unordered (e.g., added at random)  desired item is equally likely to be at any point in the list  need to systematically search through the list, check each entry until found  sequential search

3 Sequential search sequential search traverses the list from beginning to end  check each entry in the list  if matches the desired entry, then FOUND (return its index)  if traverse entire list and no match, then NOT FOUND (return -1) recall: the ArrayList class has an indexOf method /** * Performs sequential search on the array field named items * @param desired item to be searched for * @returns index where desired first occurs, -1 if not found */ public int indexOf(T desired) { for(int k=0; k < this.items.length; k++) { if (desired.equals(this.items[k])) { return k; } return -1; }

4 How efficient is sequential search? in the worst case:  the item you are looking for is in the last position of the list (or not found)  requires traversing and checking every item in the list  if 100 or 1,000 entries  NO BIG DEAL  if 10,000 or 100,000 entries  NOTICEABLE for this algorithm, the dominant factor in execution time is checking an item  the number of checks will determine efficiency in the average case? in the best case?

5 Big-Oh notation to represent an algorithm’s performance in relation to the size of the problem, computer scientists use Big-Oh notation an algorithm is O(N) if the number of operations required to solve a problem is proportional to the size of the problem sequential search on a list of N items requires roughly N checks (+ other constants)  O(N) for an O(N) algorithm, doubling the size of the problem requires double the amount of work (in the worst case)  if it takes 1 second to search a list of 1,000 items, then it takes 2 seconds to search a list of 2,000 items it takes 4 seconds to search a list of 4,000 items it takes 8 seconds to search a list of 8,000 items...

6 Searching an ordered list when the list is unordered, can't do any better than sequential search  but, if the list is ordered, a better alternative exists e.g., when looking up a word in the dictionary or name in the phone book  can take ordering knowledge into account  pick a spot – if too far in the list, then go backward; if not far enough, go forward binary search algorithm  check midpoint of the list  if desired item is found there, then DONE  if the item at midpoint comes after the desired item in the ordering scheme, then repeat the process on the left half  if the item at midpoint comes before the desired item in the ordering scheme, then repeat the process on the right half

7 Binary search /** * Performs binary search on a sorted list. * @param items sorted list of Comparable items * @param desired item to be searched for * @returns index where desired first occurs, -(insertion point)-1 if not found */ public static > int binarySearch(List items, Comparable desired) { int left = 0;// initialize range where desired could be int right = items.length-1; while (left <= right) { int mid = (left+right)/2;// get midpoint value and compare int comparison = desired.compareTo(items[mid]); if (comparison == 0) {// if desired at midpoint, then DONE return mid; } else if (comparison < 0) {// if less than midpoint, focus on left half right = mid-1; } else {// otherwise, focus on right half left = mid + 1; } return /* CLASS EXERCISE */ ;// if reduce to empty range, NOT FOUND } the Collections utility class contains a binarySearch method  takes a List of Comparable items and the desired item List is an interface that specifies basic list operations ( ArrayList implements) Comparable is an interface that requires compareTo method ( String implements) MORE ON INTERFACES LATER this ugly code simply ensures that the list contains Comparable objects

8 Visualizing binary search note: each check reduces the range in which the item can be found by half  see http://balance3e.com/Ch8/search.html for demohttp://balance3e.com/Ch8/search.html

9 How efficient is binary search? in the worst case:  the item you are looking for is in the first or last position of the list (or not found) start with N items in list after 1 st check, reduced to N/2 items to search after 2 nd check, reduced to N/4 items to search after 3 rd check, reduced to N/8 items to search... after log 2 N checks, reduced to 1 item to search again, the dominant factor in execution time is checking an item  the number of checks will determine efficiency in the average case? in the best case?

10 Big-Oh notation an algorithm is O(log N) if the number of operations required to solve a problem is proportional to the logarithm of the size of the problem binary search on a list of N items requires roughly log 2 N checks (+ other constants)  O(log N) for an O(log N) algorithm, doubling the size of the problem adds only a constant amount of work  if it takes 1 second to search a list of 1,000 items, then searching a list of 2,000 items will take time to check midpoint + 1 second searching a list of 4,000 items will take time for 2 checks + 1 second searching a list of 8,000 items will take time for 3 checks + 1 second...

11 Comparison: searching a phone book Number of entries in phone book Number of checks performed by sequential search Number of checks performed by binary search 100 7 200 8 400 9 800 10 1,600 11 ……… 10,000 14 20,000 15 40,000 16 ……… 1,000,000 20 to search a phone book of the United States (~310 million) using binary search? to search a phone book of the world (7 billion) using binary search?

12 Dictionary revisited public class Dictionary { private ArrayList words;... public boolean addWord(String newWord) { int index = Collections.binarySearch(this.words, newWord.toLowerCase()) this.words.add(Math.abs(index)-1, newWord.toLowerCase()); return true; } public boolean addWordNoDupes(String newWord) { int index = Collections.binarySearch(this.words, newWord.toLowerCase()); if (index < 0) { this.words.add(Math.abs(index)-1, newWord.toLowerCase()); return true; } return false; } public boolean findWord(String desiredWord) { return (Collections.binarySearch(this.words, desiredWord.toLowerCase()) >= 0); } binary search works as long as the list of words is sorted dictionary.txt is sorted, so can load the dictionary and do searches to ensure correct behavior, must also make sure that add methods maintain sorting

13 In the worst case… suppose words are added in reverse order: "zoo", "moo", "foo", "boo" zoo to add "moo", must first shift "zoo" one spot to the right moozoo to add "foo", must first shift "moo" and "zoo" each one spot to the right foomoozoo to add "boo", must first shift "foo", "moo" and "zoo" each one spot to the right boofoomoozoo

14 Worst case (in general) if inserting N items in reverse order  1 st item inserted directly  2 nd item requires 1 shift, 1 insertion  3 rd item requires 2 shifts, 1 insertion ...  N th item requires N-1 shifts, 1 insertion --------------------------------------------------------- (1 + 2 + 3 + … + N-1) = N(N-1)/2 = (N 2 – N)/2 shifts + N insertions this approach is called "insertion sort"  insertion sort builds a sorted list by repeatedly inserting items in correct order since an insertion sort of N items can take roughly N 2 steps, it is an O(N 2 ) algorithm

15 Timing the worst case # items (N) time in msec 5,00015 10,000 49 20,000 162 40,000 651 80,000 2270 160,000 9168 320,000 36463 System.currentTimeMillis method accesses the system clock and returns the time (in milliseconds) we can use it to time repeated add s to a dictionary public class TimeDictionary { public static int timeAdds(int numValues) { Dictionary dict = new Dictionary(); long startTime = System.currentTimeMillis(); for (int i = numValues; i > 0; i--) { String word = "0000000000" + i; dict.addWord(word.substring(word.length()-10)); } long endTime = System.currentTimeMillis(); return (int)(endTime-startTime); }

16 O(N 2 ) performance as the problem size doubles, the time can quadruple makes sense for an O(N 2 ) algorithm  if X items, then X 2 steps required  if 2X items, then (2X) 2 = 4X 2 steps QUESTION: why is the factor of 4 not realized immediately? Big-Oh captures rate-of-growth behavior in the long run when determining Big-Oh, only the dominant factor is significant (in the long run) cost = N(N-1)/2 shifts (+ N inserts + additional operations)  O(N 2 ) N=1,000: 499,500 shifts + 1,000 inserts + …overhead cost is significant N=100,000: 4,999,950,000 shifts + 100,000 inserts + …only N 2 factor is significant # items (N) time in msec 5,00015 10,000 49 20,000 162 40,000 651 80,000 2270 160,000 9168 320,000 36463

17 Best case for insertion sort while insertion sort can require ~N 2 steps in worst case, it can do much better sometimes  BEST CASE: if items are added in order, then no shifting is required  only requires N insertion steps, so O(N)  if double size, roughly double time on average, might expect to shift only half the time  (1 + 2 + … + N-1)/2 = N(N-1)/4 = (N 2 – N)/4 shifts, so still O(N 2 )  would expect faster timings than worst case, but still quadratic growth list size (N) time in msec 40,000 32 80,000 79 160,000 194 320,000 400

18 Timing insertion sort (average case) import java.util.Random; public class TimeDictionary { public static long timeAdds(int numValues) { Dictionary1 dict = new Dictionary1(); Random randomizer = new Random(); long startTime = System.currentTimeMillis(); for (int i = 0; i < numValues; i++) { String word = "0000000000" + randomizer.nextInt(); dict.addWord(word.substring(word.length()-10)); } long endTime = System.currentTimeMillis(); return (endTime - startTime); } can use a Random object to pick random numbers and add to a String list size (N) time in msec 10,000 87 20,000 119 40,000 397 80,000 1420 160,000 5306 320,000 20442

19 A more generic insertion sort we can code insertion sort independent of the Dictionary class  could use a temporary list for storing the sorted numbers, but not needed  don't stress about >  specifies that the parameter must be an ArrayList of items that either implements or extends a class that implements the Comparable interface (???)  more later, for now, it ensures the class has a compareTo method public static > void insertionSort(ArrayList items) { for (int i = 1; i < items.size(); i++) { // for each index i, T itemToPlace = items.get(i); // save the value at index i int j = i; // starting at index i, while (j > 0 && itemToPlace.compareTo(items.get(j-1)) < 0) { items.set(j, items.get(j-1)); // shift values to the right j--; // until find spot for the value } items.set(j, itemToPlace); // store the value in its spot }

20 Other O(N 2 ) sorts alternative algorithms exist for sorting a list of items e.g., selection sort:  find smallest item, swap into the 1st index  find next smallest item, swap into the 2 nd index  find next smallest item, swap into the 3 rd index ... public static > void selectionSort(ArrayList items) { for (int i = 0; i < items.size()-1; i++) { // for each index i, int indexOfMin = i; // find the ith smallest item for (int j = i+1; j < items.size(); j++) { if (items.get(j).compareTo(items.get(indexOfMin)) < 0) { indexOfMin = j; } T temp = items.get(i); // swap the ith smallest items.set(i, items.get(indexOfMin)); // item into position i items.set(indexOfMin, temp); }

HW5: Hunt the Wumpus you are to implement a text-based adventure game from the 70's 21

Cave class you must implement a class that models a single cave  each cave has a name & number, and is connected to three other caves via tunnels  by default, caves are empty & unvisited (although these can be updated) how do we represent the cave contents?  we could store the contents as a string: "EMPTY", "WUMPUS", "BATS", "PIT" Cave c = new Cave("Cavern of Doom", 0, 1, 2, 3); c.setContents("WUMPUS");  potential problems? 22 there are only 4 possible values for cave contents  the trouble with using a String to represent these is no error checking c.setContents("WUMPAS"); // perfectly legal, but ???

Enumerated types there is a better alternative for when there is a small, fixed number of values  an enumerated type is a new type (class) whose value are explicitly enumerated public enum CaveContents { EMPTY, WUMPUS, PIT, BATS }  note that these values are NOT Strings – they do not have quotes  you specify a enumerated type value by ENUMTYPE.VALUE c.setContents(CaveContents.WUMPUS); since an enumerated type has a fixed number of values, any invalid input would be caught by the compiler 23

CaveMaze the CaveMaze class reads in & stores a maze of caves  since the # of caves is set, simpler to use an array  provided version only allows limited movement  you must add functionality 24 public class CaveMaze { private Cave[] caves; private Cave currentCave; private boolean alive; public CaveMaze(String filename) throws java.io.FileNotFoundException { Scanner infile = new Scanner(new File(filename)); int numCaves = infile.nextInt(); this.caves = new Cave[numCaves]; for (int i = 0; i < numCaves; i++) { int num1 = infile.nextInt(); int num2 = infile.nextInt(); int num3 = infile.nextInt(); int num4 = infile.nextInt(); String name = infile.nextLine().trim(); this.caves[num1] = new Cave(name, num1, num2, num3, num4); } this.alive = true; this.currentCave = this.caves[0]; this.currentCave.markAsVisited(); }... }

25 O(N log N) sorts there are sorting algorithms that do better than insertion & selection sorts merge sort & quick sort are commonly used O(N log N) sorts  recall from sequential vs. binary search examples: when N is large, log N is much smaller than N  thus, when N is large, N log N is much smaller than N 2 N N log N N 2 1,000 10,000 1,000,000 2,000 22,000 4,000,000 4,000 48,000 16,000,000 8,000104,000 64,000,000 16,000224,000 256,000,000 32,000480,0001,024,000,000 they are both recursive algorithms i.e., each breaks the list into pieces, calls itself to sort the smaller pieces, and combines the results

26 Recursion a recursive algorithm is one that refers to itself when solving a problem  to solve a problem, break into smaller instances of problem, solve & combine  recursion can be a powerful design & problem-solving technique examples: binary search, merge sort, hierarchical data structures, … classic (but silly) examples: Fibonacci numbers: 1 st Fibonacci number = 1 2 nd Fibonacci number = 1 Nth Fibonacci number = (N-1)th Fibonacci number + (N-2)th Fibonacci number Euclid's algorithm to find the Greatest Common Divisor (GCD) of a and b (a ≥ b) if a % b == 0, the GCD(a, b) = b otherwise, GCD(a, b) = GCD(b, a % b)

27 Recursive methods these are classic examples, but pretty STUPID  both can be easily implemented using iteration (i.e., loops)  recursive approach to Fibonacci has huge redundancy we will look at better examples later, but first analyze these simple ones /** * Computes Nth Fibonacci number. * @param N sequence index * @returns Nth Fibonacci number */ public int fibonacci(int N) { if (N <= 2) { return 1; } else { return fibonacci(N-1) + fibonacci(N-2); } /** * Computes Greatest Common Denominator. * @param a a positive integer * @param b positive integer (a >= b) * @returns GCD of a and b */ public int GCD(int a, int b) { if (a % b == 0) { return b; } else { return GCD(b, a%b); }

28 Understanding recursion every recursive definition has 2 parts: BASE CASE(S): case(s) so simple that they can be solved directly RECURSIVE CASE(S): more complex – make use of recursion to solve smaller subproblems & combine into a solution to the larger problem int fibonacci(int N) { if (N <= 2) { // BASE CASE return 1; } else {// RECURSIVE CASE return fibonacci(N-1) + fibonacci(N-2); } int GCD(int a, int b) { if (a % b == 0) { // BASE CASE return b; } else { // RECURSIVE return GCD(b, a%b); } to verify that a recursive definition works:  convince yourself that the base case(s) are handled correctly  ASSUME RECURSIVE CALLS WORK ON SMALLER PROBLEMS, then convince yourself that the results from the recursive calls are combined to solve the whole

29 Avoiding infinite(?) recursion to avoid infinite recursion:  must have at least 1 base case (to terminate the recursive sequence)  each recursive call must get closer to a base case int fibonacci(int N) { if (N <= 2) { // BASE CASE return 1; } else {// RECURSIVE CASE return fibonacci(N-1) + fibonacci(N-2); } int GCD(int a, int b) { if (a % b == 0) { // BASE CASE return b; } else { // RECURSIVE return GCD(b, a%b); } with each recursive call, the number is getting smaller  closer to base case (≤ 2) with each recursive call, a & b are getting smaller  closer to base case (a % b == 0)

30 Merge sort a better example of recursion is merge sort BASE CASE: to sort a list of 0 or 1 item, DO NOTHING! RECURSIVE CASE: 1.Divide the list in half 2.Recursively sort each half using merge sort 3.Merge the two sorted halves together 129620315 1296 20315 1. 6912 31520 2. 369121520 3.

31 Merging two sorted lists merging two lists can be done in a single pass  since sorted, need only compare values at front of each, select smallest  requires additional list structure to store merged items public > void merge(ArrayList items, int low, int high) { ArrayList copy = new ArrayList (); int size = high-low+1; int middle = (low+high+1)/2; int front1 = low; int front2 = middle; for (int i = 0; i < size; i++) { if (front2 > high || (front1 < middle && items.get(front1).compareTo(items.get(front2)) < 0)) { copy.add(items.get(front1)); front1++; } else { copy.add(items.get(front2)); front2++; } for (int k = 0; k < size; k++) { items.set(low+k, copy.get(k)); }

32 Merge sort once merge has been written, merge sort is simple  for recursion to work, need to be able to specify range to be sorted  initially, want to sort the entire range of the list (index 0 to list size – 1)  recursive call sorts left half (start to middle) & right half (middle to end) ... private > void mergeSort(ArrayList items, int low, int high) { if (low < high) { int middle = (low + high)/2; mergeSort(items, low, middle); mergeSort(items, middle+1, high); merge(items, low, high); } public > void mergeSort(ArrayList items) { mergeSort(items, 0, items.size()-1); } note: private helper method does the recursion; public method calls the helper with appropriate inputs

33 Recursive analysis of a recursive algorithm cost of sorting N items = cost of sorting left half (N/2 items) + cost of sorting right half (N/2 items) + cost of merging (N items) more succinctly: Cost(N) = 2*Cost(N/2) + C 1 *N Cost(N) = 2*Cost(N/2) + C 1 *Ncan unwind Cost(N/2) = 2*( 2*Cost(N/4) + C 2 *N/2 ) + C 1 *N = 4*Cost(N/4) + (C 1 + C 2 )*Ncan unwind Cost(N/4) = 4*( 2*Cost(N/8) + C 3 *N/4 ) + (C 1 + C 2 )*N = 8*Cost(N/8) + (C 1 + C 2 + C 3 )*Ncan continue unwinding = … = N*Cost(1) + (C 1 + C 2 /2 + C 3 /4 + … + C logN /N)*N = (C 0 + C 1 + C 2 + C 3 + … + C logN )*Nwhere C 0 = Cost(1) ≤ (max(C 0, C 1,…, C logN )*log N) * N = C * N log Nwhere C = max(C 0, C 1,…, C logN )  O(N log N)

34 Dictionary revisited recall most recent version of Dictionary  inserts each new word in order (i.e., insertion sort) & utilizes binary search  searching is fast (binary search), but adding is slow  N adds + N searches: N*O(N) + N*O(log N) = O(N 2 ) + O(N log N) = O(N 2 ) if you are going to do lots of adds in between searches:  simply add each item at the end  O(1)  before the first search, must sort – could use merge sort  N adds + sort + N searches: N*O(1) + O(N log N) + N*log(N) = O(N log N) Collections class contains a sort method that implements quick sort  in practice, quick sort is a faster O(N log N) sort than merge sort 1.picks a pivot element from the list (can do this at random or be smarter) 2.partitions the list so that all items ≤ pivot are to left, al items > pivot are to right 3.recursively (quick) sorts the partitions

35 Modified Dictionary class public class Dictionary2 { private ArrayList words; private boolean isSorted; public Dictionary2() { this.words = new ArrayList (); this.isSorted = true; } public Dictionary2(String fileName) { this(); try { Scanner infile = new Scanner(new File(fileName)); while (infile.hasNext()) { String nextWord = infile.next(); this.addWord(nextWord); } infile.close(); } catch (java.io.FileNotFoundException e) { System.out.println("No such file: " + fileName); } public boolean addWord(String newWord) { this.words.add(newWord.toLowerCase()); this.isSorted = false; return true; } public boolean findWord(String desiredWord) { if (!this.isSorted) { Collections.sort(this.words); this.isSorted = true; } return Collections.binarySearch(this.words, desiredWord.toLowerCase()) >= 0; }... } the isSorted field keeps track of whether the list is sorted (i.e., no addWord s have been performed since last findWord ) we could do a little more work in addWord to avoid unnecessary sorts note: gives better performance if N adds are followed by N searches what if the adds & searches alternate?

36 Dictionary2 timings # items (N)Dictionary1 (msec)Dictionary2 (msec) 100,000 2121 176 200,000 8021 386 400,000 31216 864 import java.util.Random; public class TimeDictionary { public static int timeAdds(int numValues) { Dictionary2 dict = new Dictionary2(); Random randomizer = new Random(); long startTime = System.currentTimeMillis(); for (int i = 0; i < numValues; i++) { String word = "0000000000" + randomizer.nextInt(); dict.addWord(word.substring(word.length()-10)); } for (int i = 0; i < numValues; i++) { dict.findWord("zzz"); } long endTime = System.currentTimeMillis(); return (int)(endTime-startTime); } N adds followed by N searches: Dictionary1 used insertion sort & binary search O(N 2 ) + O(N log N)  O(N 2 ) Dictionary2 uses add-at-end, quick sort before first search, then binary search O(N) + O(N log N) + O(N log N)  O(N log N)

37 Recursion vs. iteration it wouldn't be difficult to code fibonacci and GCD without recursion public int fibonacci(int N) {public int GCD(int a, int b) { int previous = 1; while (a % b != 0) { int current = 1; int temp = b; for (int i = 3; i <= N; i++) { b = a % b; int newCurrent = current + previous; a = temp; previous = current; } current = newCurrent; return b; }} return current; } in theory, any recursive algorithm can be rewritten iteratively (using a loop)  but sometimes, a recursive definition is MUCH clearer & MUCH easier to write e.g., merge sort

38 Recursion & efficiency there is some overhead cost associated with recursion public int fibonacci(int N) {public int GCD(int a, int b) { int previous = 1; while (a % b != 0) { int current = 1; int temp = b; for (int i = 3; i <= N; i++) { b = a % b; int newCurrent = current + previous; a = temp; previous = current; } current = newCurrent; return b; }} return current; }  with recursive version: each refinement requires a method call involves saving current execution state, allocating memory for the method instance, allocating and initializing parameters, returning value, …  with iterative version: each refinement involves a loop iteration + assignments the cost of recursion is relatively small, so usually no noticeable difference  in practical terms, there is a limit to how deep recursion can go e.g., can't calculate the 10 millionth fibonacci number  in the rare case that recursive depth can be large (> 1,000), consider iteration

39 Recursion & redundancy in the case of GCD, there is only a minor efficiency difference  number of recursive calls = number of loop iterations this is not always the case  efficiency can be significantly different (due to different underlying algorithms) consider the recursive fibonacci method: fibonacci(5) fibonacci(4) + fibonacci(3) fibonacci(3) + fibonacci(2)fibonacci(2) + fibonacci(1) fibonacci(2) + fibonacci(1)  there is a SIGNIFICANT amount of redundancy in the recursive version number of recursive calls > number of loop iterations (by an exponential amount!)  recursive version is MUCH slower than the iterative one in fact, it bogs down on relatively small values of N

1 CSC 222: Object-Oriented Programming Spring 2013 Searching and sorting  sequential search vs. binary search  algorithm analysis: big-Oh, rate-of-growth.

Similar presentations

Presentation on theme: "1 CSC 222: Object-Oriented Programming Spring 2013 Searching and sorting  sequential search vs. binary search  algorithm analysis: big-Oh, rate-of-growth."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

1 CSC 222: Object-Oriented Programming Spring 2013 Searching and sorting  sequential search vs. binary search  algorithm analysis: big-Oh, rate-of-growth.

Similar presentations

Presentation on theme: "1 CSC 222: Object-Oriented Programming Spring 2013 Searching and sorting  sequential search vs. binary search  algorithm analysis: big-Oh, rate-of-growth."— Presentation transcript:

Similar presentations

About project

Feedback