Published by Mavis Bond. Modified over 9 years ago.
90-723: Data Structures and Algorithms for Information Processing
Copyright © 1999, Carnegie Mellon. All Rights Reserved.

Lecture 9: Searching

Outline
– The simplest method: serial search
– Binary search
– Open-address hashing
– Chained hashing

Search Algorithms
Whenever large amounts of data need to be accessed quickly, search algorithms are crucially involved.

Search Algorithms
Search algorithms lie at the heart of many computer technologies. To name a few:
– Databases
– Information retrieval applications
– Web infrastructure (file systems, domain name servers, etc.)
– String searching for patterns

Search Algorithms: Two Broad Categories
Searching a static database
– Accessing indexed Web pages
– Finding a file on disk
Evaluating a dynamically changing set of hypotheses
– Computer chess (search for a move)
– Speech recognition (search for text given speech)
We’ll be concerned with the first.

The Simplest Search: Serial Lookup
Items are stored in an array or list. To search for an item x:
– Start at the beginning of the list
– Compare the current item to x
– If equal, stop: x has been found
– If unequal, proceed to the next item; if there is no next item, x is not in the list

Pseudocode for Serial Search
    // Find x in an array a of length n
    int i = 0;
    boolean found = false;
    while ((i < n) && !found) {
        if (a[i] == x)
            found = true;
        else
            i++;
    }
    if (found) ...

Analysis for Serial Search
Best case: requires one array access: Θ(1)
Worst case: requires n array accesses: Θ(n)
Average case: to access an item that is present, assuming its position is random (uniform):
    (1+2+3+...+n)/n = n(n+1)/(2n) = (n+1)/2 = Θ(n)
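
The average-case figure of (n+1)/2 can be checked empirically. The following is a minimal sketch, not part of the original slides; the class and method names (SerialSearchCost, comparisons, averageComparisons) are made up for illustration. It searches for every element of an array once and averages the number of elements examined.

```java
// Illustrative sketch: measures how many elements serial search examines.
final class SerialSearchCost {
    // Number of array elements examined while looking for x
    // (equals a.length when x is absent).
    static int comparisons(int[] a, int x) {
        int count = 0;
        for (int i = 0; i < a.length; i++) {
            count++;
            if (a[i] == x) break;
        }
        return count;
    }

    // Average over all n present targets, each equally likely.
    static double averageComparisons(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = i;   // distinct values 0..n-1
        long total = 0;
        for (int x = 0; x < n; x++) total += comparisons(a, x);
        return (double) total / n;
    }
}
```

For n = 100 the total is 1+2+...+100 = 5050, so the average is 50.5, which matches (n+1)/2.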

A Useful Combinatorial Identity
    1+2+3+…+n = n(n+1)/2
Why?
– Algebraic proof in Main
– Visual counting

Visual Counting
[Figures: an n-by-n square of cells, counted in four steps]
– The whole square: n*n cells
– The diagonal: n cells
– Everything off the diagonal: n*n - n cells
– Half of the off-diagonal cells plus the diagonal (a triangle with rows of 1, 2, ..., n cells): (n*n - n)/2 + n = n(n+1)/2
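
The slides defer the algebraic proof to Main's textbook. One standard argument is the pairing trick sketched below: write the sum forwards and backwards, then add term by term. It may not be the exact proof the textbook gives.

```latex
\begin{align*}
S  &= 1 + 2 + \cdots + (n-1) + n \\
S  &= n + (n-1) + \cdots + 2 + 1 \\
2S &= \underbrace{(n+1) + (n+1) + \cdots + (n+1)}_{n \text{ terms}} = n(n+1) \\
S  &= \frac{n(n+1)}{2}
\end{align*}
```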

Binary Search
– Can be used whenever the data are totally ordered (e.g., the integers), so that all elements are comparable.
– Requires sorting in advance, and storing in an array.
– One of the simplest to implement, often “fast enough”.
– Can be tricky to handle “boundary cases”.
– This is a classic divide-and-conquer algorithm.

Idea of Binary Search
Closely related to the natural algorithm we use to look up a word in a dictionary:
– Open to the middle
– If target comes before all words on the page, search in left half of book
– Otherwise, search in right half.

Interface for Binary Search
    int search(int[] a, int first, int size, int target)
Parameters:
– int[] a: array to be searched over
– Search over a[first], a[first+1], ..., a[first+size-1]
Precondition:
– array is sorted in increasing order
– first >= 0

Implementation
    int search(int[] a, int start, int size, int target) {
        if (size <= 0)
            return -1;
        else {
            int middle = start + size/2;
            if (a[middle] == target)
                return middle;
            else if (target < a[middle])
                return search(a, start, size/2, target);
            else
                return search(a, middle+1, size/2, target);
        }
    }

Implementation
Where’s the error?
– Suppose size is odd. Are the new sizes correct?
– Suppose size is even. Are the new sizes correct?

Implementation
    int search(int[] a, int first, int size, int target) {
        if (size <= 0)
            return -1;
        else {
            int middle = first + size/2;
            if (a[middle] == target)
                return middle;
            else if (target < a[middle])
                // size/2 elements lie before middle
                return search(a, first, size/2, target);
            else
                // size - size/2 - 1 = (size-1)/2 elements lie after middle
                return search(a, middle+1, (size-1)/2, target);
        }
    }

Boundary Cases
– Binary search is sometimes tricky to get right. It is a common source of bugs.
– Test cases are not always helpful for checking correctness of code.
– How many test cases would our first implementation pass?
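
One way to explore that question is to run both recursive versions from the slides side by side. The sketch below is illustrative only: the names BoundaryCheck, searchFirst, searchFixed, firstVersionFails and countFailures are invented here, and the test arrays (1, 3, 5, ...) are an arbitrary choice. A case counts as a failure when the first version throws or disagrees with the corrected one.

```java
// Illustrative comparison of the two recursive versions shown on the slides.
final class BoundaryCheck {
    // First version: the right half is given size/2 elements.
    static int searchFirst(int[] a, int first, int size, int target) {
        if (size <= 0) return -1;
        int middle = first + size / 2;
        if (a[middle] == target) return middle;
        if (target < a[middle]) return searchFirst(a, first, size / 2, target);
        return searchFirst(a, middle + 1, size / 2, target);
    }

    // Corrected version: the right half is given (size-1)/2 elements.
    static int searchFixed(int[] a, int first, int size, int target) {
        if (size <= 0) return -1;
        int middle = first + size / 2;
        if (a[middle] == target) return middle;
        if (target < a[middle]) return searchFixed(a, first, size / 2, target);
        return searchFixed(a, middle + 1, (size - 1) / 2, target);
    }

    // True if the first version throws or returns a different answer.
    static boolean firstVersionFails(int[] a, int target) {
        int expected = searchFixed(a, 0, a.length, target);
        try {
            return searchFirst(a, 0, a.length, target) != expected;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    // Failing targets for the array 1, 3, ..., 2n-1, trying targets 0..2n.
    static int countFailures(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = 2 * i + 1;
        int failures = 0;
        for (int target = 0; target <= 2 * n; target++)
            if (firstVersionFails(a, target)) failures++;
        return failures;
    }
}
```

Arrays whose length has the form 2^k - 1 (1, 3, 7, 15, ...) never expose the bug, because every subarray size along the way is odd and size/2 equals (size-1)/2. With length 4, searching {1, 3, 5, 7} for 7 makes the first version read a[4] and throw. A test suite built only from the "convenient" lengths would pass the buggy code.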

Binary Search with Other Data Structures
– Can binary search be implemented using linked lists rather than arrays?
– Are there any other data structures that could be used?

Analysis of Binary Search
– Recursively dividing the array in half represents the data as a full binary tree.
– Consider the simplest case: an array of size n = 2^k - 1, a complete binary tree.
– Take away one and divide by 2: new size = 2^(k-1) - 1.
– We can only do that k times, and k = lg(n+1).
– Thus, the worst case involves Θ(log n) operations.
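
The bound k = lg(n+1) can be observed directly by counting how many array elements a search examines. The following is a small sketch under stated assumptions: it uses an iterative form of the corrected search (same first/size bookkeeping as the recursive version), and the names ProbeCount, probes and worstCase are hypothetical.

```java
// Illustrative sketch: counts the elements examined by binary search.
final class ProbeCount {
    // Iterative binary search over the whole array; returns the number of
    // elements compared against target, whether or not it is found.
    static int probes(int[] a, int target) {
        int first = 0;
        int size = a.length;
        int count = 0;
        while (size > 0) {
            int middle = first + size / 2;
            count++;
            if (a[middle] == target) break;
            if (target < a[middle]) {
                size = size / 2;            // keep the size/2 elements before middle
            } else {
                first = middle + 1;         // keep the (size-1)/2 elements after middle
                size = (size - 1) / 2;
            }
        }
        return count;
    }

    // Largest probe count over present and absent targets for an array of
    // length n holding 1, 3, 5, ... (even targets are absent).
    static int worstCase(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = 2 * i + 1;
        int worst = 0;
        for (int target = 0; target <= 2 * n; target++)
            worst = Math.max(worst, probes(a, target));
        return worst;
    }
}
```

For n = 7, 15 and 1023 the worst case is 3, 4 and 10 probes respectively, that is, lg(n+1) in each case.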

Average Case
– A complete binary tree with k leaves has k-1 internal nodes.
– So, about half of the n data elements are leaves, and each of them requires Θ(log n) operations to find.
– Thus, assuming a uniform distribution on target elements, the average cost is also Θ(log n).

Binary Search is Limited
When a large number of items will be accessed in a part of the program where efficiency is crucial, binary search may be too slow.

Improving Binary Search
Try to guess more precisely where the key is located in the interval.
Generalize
    middle = first + size/2
to
    middle = first + (key - a[first]) / (a[first+size-1] - a[first]) * (size - 1)
Scaling by size - 1 keeps the estimate inside the interval a[first], ..., a[first+size-1].
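
A minimal sketch of a search built on this idea follows. It is not taken from the slides: the class name InterpolationSearch is invented, the interval is tracked with first/last indices instead of first/size, and the position estimate is scaled by the interval width last - first (an assumption made here so that the estimate can never land outside the interval). The multiplication is done before the division, in long arithmetic, to avoid integer truncation and overflow.

```java
// Illustrative sketch of a search that estimates the key's position.
final class InterpolationSearch {
    // Returns an index of key in the sorted array a, or -1 if it is absent.
    static int search(int[] a, int key) {
        int first = 0;
        int last = a.length - 1;
        // The key can only be present while it lies within [a[first], a[last]].
        while (first <= last && key >= a[first] && key <= a[last]) {
            if (a[first] == a[last]) {
                // All remaining values are equal, so key == a[first];
                // this also avoids dividing by zero below.
                return first;
            }
            long offset = ((long) key - a[first]) * (last - first)
                    / ((long) a[last] - a[first]);
            int middle = first + (int) offset;   // 0 <= offset <= last - first
            if (a[middle] == key) return middle;
            if (a[middle] < key) first = middle + 1;
            else last = middle - 1;
        }
        return -1;
    }
}
```

On evenly spaced data such as {10, 20, 30, 40, 50}, a search for 40 lands on index 3 with the first estimate, where binary search would first probe the middle element.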

Interpolation Search
– This modified method is called interpolation search.
– Uses about log(log(N)) comparisons in the average case, assuming uniformly distributed keys.
– But uses Θ(N) in the worst case.
– For analysis, see Perl, Itai, Avni, “Interpolation Search – A Log Log N Search”, CACM 21 (1978), pages 550–553.
– Is log(log(N)) better than log(N)?

Comparing Log N to Log(Log N)
(Logs are base 2.)
Suppose N = 2^100
    Log N = 100
    Log(Log N) = Log(100) ≈ 6.64
Suppose N = 2^(2^100)
    Log N = 2^100
    Log(Log N) = Log(2^100) = 100

Comparing Log(N) to Log(Log N)
Or, by taking limits…
    lim (n->∞) Log(Log(n)) / Log(n)
is of the form ∞/∞. Apply L’Hôpital’s rule and take derivatives (with Log as the natural log; other bases change only constant factors):
    lim (n->∞) [1/(Log n) * 1/n] / [1/n] = lim (n->∞) 1/(Log n) = 0

Hashing
– Fortunately, we can often do better.
– Hashing is a technique where the access time can be O(1) rather than O(log n).

Open Address Hashing
The basic technique:
– Items are stored in an array of size N.
– The preferred position in the array is computed using a hash function of the item’s key.
– When adding an item, if the preferred position is occupied, the next open position in the array is used instead.

Open Address Hashing
Main’s presentation for Chapter 11

A Basic Hash Table
We keep arrays for the keys and data, and a bit indicating whether a given position has been occupied.
    private class Table {
        private int numItems;
        private Object[] keys;
        private Object[] data;
        private boolean[] hasBeenUsed;
        ....
    }

The Hash Function
We can use the built-in hashCode() method that Java provides.
    private int hash(Object key) {
        // Clear the sign bit instead of calling Math.abs, because
        // Math.abs(Integer.MIN_VALUE) is still negative.
        return (key.hashCode() & 0x7fffffff) % data.length;
    }

Calculating the Index
    // If found, the return value is the index of key; otherwise it is -1
    private int findIndex(Object key) {
        int count = 0;
        int i = hash(key);
        while ((count < data.length) && (hasBeenUsed[i])) {
            if (key.equals(keys[i]))
                return i;
            i = nextIndex(i);
            count++;
        }
        return -1;
    }

Inserting an Item
    public Object put(Object key, Object element) {
        int index = findIndex(key);
        if (index != -1) {
            // Key already present: replace its data and return the old value
            Object answer = data[index];
            data[index] = element;
            return answer;
        } else if (numItems < data.length) {
            // New key: probe from the preferred position to the next open cell
            index = hash(key);
            while (keys[index] != null)
                index = nextIndex(index);
            keys[index] = key;
            data[index] = element;
            hasBeenUsed[index] = true;
            numItems++;
            return null;
        } else
            throw new IllegalStateException("Table full");
    }
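
The pieces from the preceding slides can be assembled into one small class to see them work together. This is a sketch under assumptions, not the course's reference code: the class name OpenAddressTable, the constructor and the get method are additions, the capacity is assumed to be positive, and nextIndex (which the slides call but do not show) is assumed here to be plain linear probing that wraps around at the end of the array.

```java
// Illustrative open-address table assembled from the slide fragments.
final class OpenAddressTable {
    private int numItems;
    private final Object[] keys;
    private final Object[] data;
    private final boolean[] hasBeenUsed;

    OpenAddressTable(int capacity) {      // capacity is assumed to be > 0
        keys = new Object[capacity];
        data = new Object[capacity];
        hasBeenUsed = new boolean[capacity];
    }

    private int hash(Object key) {
        return (key.hashCode() & 0x7fffffff) % data.length;
    }

    // Assumption: linear probing, wrapping to index 0 after the last cell.
    private int nextIndex(int i) {
        return (i + 1 == data.length) ? 0 : i + 1;
    }

    private int findIndex(Object key) {
        int count = 0;
        int i = hash(key);
        while (count < data.length && hasBeenUsed[i]) {
            if (key.equals(keys[i])) return i;
            i = nextIndex(i);
            count++;
        }
        return -1;
    }

    // Stores element under key; returns the previous element or null.
    Object put(Object key, Object element) {
        int index = findIndex(key);
        if (index != -1) {
            Object answer = data[index];
            data[index] = element;
            return answer;
        } else if (numItems < data.length) {
            index = hash(key);
            while (keys[index] != null) index = nextIndex(index);
            keys[index] = key;
            data[index] = element;
            hasBeenUsed[index] = true;
            numItems++;
            return null;
        } else {
            throw new IllegalStateException("Table full");
        }
    }

    // Returns the element stored under key, or null if the key is absent.
    Object get(Object key) {
        int index = findIndex(key);
        return (index == -1) ? null : data[index];
    }
}
```

With a capacity of 3, a fourth distinct key makes put throw IllegalStateException, which is the "table full" limitation of open addressing.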

Two Hashes are Better than One
– Collisions can result in long stretches of positions with keys not in their “preferred” position.
– This is called clustering.
– To address this problem, when a collision occurs we jump a “random” number of positions, using a second hash function.

Double Hashing
– Find the first position using hash1(key).
– If there’s a collision, step through the array in steps of size hash2(key):
    i = (i + hash2(key)) % data.length
– To avoid cycles, hash2(key) and the length of the array must be relatively prime (no common factors other than 1).

Double Hashing
Knuth’s technique to avoid cycles:
– Choose the length of the array so that both data.length and data.length-2 are prime
    hash1(key) = Math.abs(key.hashCode()) % length
    hash2(key) = 1 + (Math.abs(key.hashCode()) % (length-2))
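
The effect of the relatively-prime requirement can be seen by listing the cells a double-hashing probe visits. The sketch below is illustrative: the names DoubleHashProbe, probeSequence and visitsEveryCell are invented, the functions take a raw hash code rather than a key object, the second hash is taken modulo length - 2 (following the rule that length and length - 2 are both prime), and the sign bit is masked off rather than using Math.abs.

```java
// Illustrative sketch: the probe sequence produced by double hashing.
final class DoubleHashProbe {
    static int hash1(int hashCode, int length) {
        return (hashCode & 0x7fffffff) % length;
    }

    // Step size in the range 1 .. length-2, so it is never zero.
    static int hash2(int hashCode, int length) {
        return 1 + ((hashCode & 0x7fffffff) % (length - 2));
    }

    // The first `length` positions examined for the given hash code.
    static int[] probeSequence(int hashCode, int length) {
        int[] sequence = new int[length];
        int i = hash1(hashCode, length);
        int step = hash2(hashCode, length);
        for (int n = 0; n < length; n++) {
            sequence[n] = i;
            i = (i + step) % length;
        }
        return sequence;
    }

    // True if those positions cover every cell exactly once (no cycle).
    static boolean visitsEveryCell(int hashCode, int length) {
        boolean[] seen = new boolean[length];
        for (int i : probeSequence(hashCode, length)) {
            if (seen[i]) return false;
            seen[i] = true;
        }
        return true;
    }
}
```

With length 13 (13 and 11 are both prime), hash code 27 starts at cell 1 and steps by 6, visiting 1, 7, 0, 6, 12, ... and reaching all 13 cells. With length 12 and hash code 3 the step is 4, which shares a factor with 12, so the probe cycles through cells 3, 7 and 11 only.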

Issues with Open-Address Hashing
– Each array cell holds only one element.
– Collisions and clustering can degrade performance.
– Once the array is full, no more elements can be added, unless we:
  – create a new array with the right size and hash functions
  – re-hash the original elements

Chained Hashing
– Each array cell can hold more than one element of the hash table.
– Hash the key of each element to obtain the array index.
– When a collision happens, the element is still placed at the original hash index.
– How is this handled?

Answer
– Each array location must be implemented with a data structure that can hold a group of elements with the same hash index.
– Most common approach:
  – each array location stores the head of a linked list
  – items in the list all have the same hash index

Chained Hashing
[Figure: an array named table with cells [0], [1], [2], [3], …; each cell points to a linked list of nodes, and each node holds an element, a key, and a link to the next node]
Any number of elements can be added to the table without a need to rehash.
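
A chained table along these lines can be sketched in a few dozen lines. This is an illustration, not the course's code: the names ChainedTable, Node, put, get and size are assumptions, the node fields mirror the element/key/link labels in the diagram, and new nodes are added at the head of their list for simplicity.

```java
// Illustrative chained hash table: each array cell heads a linked list.
final class ChainedTable {
    // One list node, with the element, key and link fields from the diagram.
    private static final class Node {
        final Object key;
        Object element;
        Node link;

        Node(Object key, Object element, Node link) {
            this.key = key;
            this.element = element;
            this.link = link;
        }
    }

    private final Node[] table;
    private int numItems;

    ChainedTable(int cells) {             // cells is assumed to be > 0
        table = new Node[cells];
    }

    private int hash(Object key) {
        return (key.hashCode() & 0x7fffffff) % table.length;
    }

    // Stores element under key; returns the previous element or null.
    Object put(Object key, Object element) {
        int index = hash(key);
        for (Node n = table[index]; n != null; n = n.link) {
            if (key.equals(n.key)) {
                Object answer = n.element;
                n.element = element;
                return answer;
            }
        }
        // Collision or empty cell: the new node becomes the head of the list.
        table[index] = new Node(key, element, table[index]);
        numItems++;
        return null;
    }

    // Returns the element stored under key, or null if the key is absent.
    Object get(Object key) {
        for (Node n = table[hash(key)]; n != null; n = n.link) {
            if (key.equals(n.key)) return n.element;
        }
        return null;
    }

    int size() {
        return numItems;
    }
}
```

A table with only 2 cells still accepts 10 keys, because colliding keys simply extend the lists. The cost is that each lookup walks a list whose length grows with the number of items per cell.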