1
1 Chapter 7 Skip Lists and Hashing Part 2: Hashing
2
2 Sorted Linear Lists
For formula-based implementation:
–Insert: O(n) comparisons and data moves
–Delete: O(n) comparisons and data moves
–Search: O(log n) comparisons
For chained implementation:
–Insert: O(n) comparisons
–Delete: O(n) comparisons
–Search: O(n) comparisons
3
3 Sorted Chain
7
7 Dictionary
A dictionary is a collection of elements, each of which has a field called the key. The key is unique for each element.
Operations:
–Insert an element with a specified key value
–Search the dictionary for an element with a specified key value
–Delete an element with a specified key value
The access mode for elements in a dictionary is random access (or direct access): any element may be retrieved by performing a search on its key.
8
8 Dictionary
9
9 Ideal hashing
Hash table: the table used to store elements
Hash function: a function that maps keys to table positions: k => f(k)
Search for an element with key k: if position f(k) is not empty, the search succeeds; otherwise it fails
Insert: position f(k) must be empty
Delete: position f(k) must not be empty
10
10 Example: Student record dictionary
Use the student ID (a 6-digit number) as the key
IDs range from 951000 to 952000
f(k) = k - 951000
Table size: 1001, i.e. ht[0..1000]
ht[i].key = 0 indicates an empty entry
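The student-record example can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the course's own class: the name `IdealStudentTable` and the tuple layout `(key, record)` are hypothetical, while the range, `f(k) = k - 951000`, and the "key 0 means empty" convention come from the slide.

```python
class IdealStudentTable:
    """Ideal hashing for student IDs 951000..952000: one slot per possible key."""

    BASE = 951000
    SIZE = 1001  # ht[0..1000]

    def __init__(self):
        # A key of 0 marks an empty entry, as on the slide.
        self.ht = [(0, None)] * self.SIZE

    def _f(self, k):
        # Hash function f(k) = k - 951000, with a range check added for safety.
        if not (self.BASE <= k <= self.BASE + self.SIZE - 1):
            raise KeyError(f"ID {k} is outside the supported range")
        return k - self.BASE

    def search(self, k):
        key, record = self.ht[self._f(k)]
        return record if key == k else None

    def insert(self, k, record):
        i = self._f(k)
        if self.ht[i][0] != 0:  # position f(k) must be empty
            raise ValueError(f"ID {k} is already present")
        self.ht[i] = (k, record)

    def delete(self, k):
        i = self._f(k)
        if self.ht[i][0] == 0:  # position f(k) must not be empty
            raise KeyError(f"ID {k} is not present")
        record = self.ht[i][1]
        self.ht[i] = (0, None)
        return record
```

Every operation touches exactly one slot, which is where the constant-time behaviour of ideal hashing comes from.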
11
11 Evaluation: Ideal Hashing
Initialize an empty dictionary: Θ(b), where b is the size of the table
Search, insert, and delete: Θ(1)
Property: each key has its own position (1 key, 1 position)
Problem: the range of the keys may be very large, resulting in a very large hash table; e.g. if the key is a 9-digit integer (such as an SSN), the table needs 10^9 entries
12
12 Hashing with linear open addressing
Used when the size of the hash table (D) is smaller than the key range
f(k) = k % D
Positions in the hash table are indexed 0..D-1
If key values are not of integral type, they must first be converted to integers
Bucket: a position in the hash table
Home bucket: position f(k) is the home bucket for key k
Collision: two keys k1 and k2 collide when they map into the same bucket, i.e. f(k1) = f(k2)
Overflow: occurs when there is no room in the home bucket for the new element
In general a bucket may contain space for more than one element. If each bucket has space for only one element, collision and overflow are the same.
13
13 Collision, overflow and linear open addressing
80, 58, and 35 map into home bucket ht[3].
In case of collision, insert into the next available bucket in sequence.
14
14 Search
To search for an element with key k, begin at bucket f(k) and continue through successive buckets, regarding the table as circular, until one of the following occurs:
–a bucket containing an element with key k is found (successful)
–an empty bucket is reached (unsuccessful)
–the search returns to the home bucket (unsuccessful)
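The search rule above, together with "insert into the next available bucket", might look as follows in Python. This is a sketch under assumptions: integer keys, one element per bucket, and hypothetical names (`LinearProbingTable`, `_h_search`); it does not reproduce the class shown on the later code slides.

```python
class LinearProbingTable:
    """Hashing with linear open addressing, f(k) = k % D, one element per bucket."""

    def __init__(self, D):
        self.D = D
        self.ht = [None] * D  # None marks an empty bucket

    def _h_search(self, k):
        """Return the bucket holding k, else the first empty bucket on k's
        probe path, else the home bucket (table full and k absent)."""
        home = k % self.D
        j = home
        while True:
            if self.ht[j] is None or self.ht[j][0] == k:
                return j
            j = (j + 1) % self.D  # treat the table as circular
            if j == home:         # back at the home bucket
                return j

    def search(self, k):
        entry = self.ht[self._h_search(k)]
        return entry[1] if entry is not None and entry[0] == k else None

    def insert(self, k, value):
        j = self._h_search(k)
        if self.ht[j] is None:
            self.ht[j] = (k, value)
        elif self.ht[j][0] == k:
            raise ValueError("duplicate key")
        else:
            raise OverflowError("table is full")
```

Both operations share one probe routine, so an insert places a key exactly where a later search will look for it.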
15
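15 Deletion
After a deletion, successive elements must be moved until:
–an empty bucket is reached, or
–the scan returns to the bucket from which the deletion took place
To improve performance, use a NeverUsed field in each bucket. It starts as true and is set to false when an element is placed in the bucket; an unsuccessful search then stops at a bucket whose NeverUsed field is true, so a deletion only has to empty its bucket.
The table may need reorganization when many buckets have their NeverUsed field set to false.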
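One way to realise the NeverUsed idea is sketched below in Python. It assumes the usual convention that the field starts as true, becomes false on the first insert into the bucket, and ends an unsuccessful search. The class name, the parallel-list layout, and `used_fraction` (a possible trigger for reorganization) are hypothetical.

```python
class NeverUsedTable:
    """Linear open addressing where delete only empties a bucket."""

    def __init__(self, D):
        self.D = D
        self.keys = [None] * D        # None marks a currently empty bucket
        self.never_used = [True] * D  # True until the bucket first receives a key

    def _find(self, k):
        """Return the bucket holding k, or -1 if k is absent."""
        home = k % self.D
        j = home
        while True:
            if self.never_used[j]:    # nothing was ever stored here: stop
                return -1
            if self.keys[j] == k:
                return j
            j = (j + 1) % self.D
            if j == home:
                return -1

    def contains(self, k):
        return self._find(k) != -1

    def insert(self, k):
        if self.contains(k):
            raise ValueError("duplicate key")
        home = k % self.D
        j = home
        while self.keys[j] is not None:  # reuse the first empty bucket
            j = (j + 1) % self.D
            if j == home:
                raise OverflowError("table is full")
        self.keys[j] = k
        self.never_used[j] = False

    def delete(self, k):
        j = self._find(k)
        if j == -1:
            raise KeyError(k)
        self.keys[j] = None  # never_used[j] stays False, so later keys stay reachable

    def used_fraction(self):
        """Fraction of buckets whose NeverUsed field is false."""
        return self.never_used.count(False) / self.D
```

The trade-off is that searches slow down as more buckets lose their NeverUsed status, which is why the slide mentions occasional reorganization.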
16
16 Class definition
17
17 Constructor
18
18 hSearch
19
19 Search
20
20 Insert
21
21 Performance analysis
b: the number of buckets in the hash table, b = D
Initialization: Θ(b)
Worst-case insert and search: Θ(n), where n is the number of elements in the table
The worst case happens when all n keys have the same home bucket
22
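22 Performance analysis (continued)
Average performance
Let α = n/b denote the loading factor.
Let U_n and S_n be the average number of buckets examined during an unsuccessful and a successful search, respectively. Then
$U_n \approx \frac{1}{2}\left(1 + \frac{1}{(1-\alpha)^2}\right)$
$S_n \approx \frac{1}{2}\left(1 + \frac{1}{1-\alpha}\right)$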
23
23 Performance analysis (continued)
The average performance of hashing with linear open addressing is far better than the Θ(n) worst case:
–when α = 0.5 (table is half full), U_n = 2.5 and S_n = 1.5
–when α = 0.9 (table is 90% full), U_n = 50.5 and S_n = 5.5
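The four figures on this slide agree with the standard linear-probing approximations U_n ≈ ½(1 + 1/(1−α)²) and S_n ≈ ½(1 + 1/(1−α)). The short Python check below evaluates them; the function names are illustrative only.

```python
def unsuccessful_probes(alpha):
    """Approximate U_n for linear open addressing at loading factor alpha < 1."""
    return 0.5 * (1 + 1 / (1 - alpha) ** 2)


def successful_probes(alpha):
    """Approximate S_n for linear open addressing at loading factor alpha < 1."""
    return 0.5 * (1 + 1 / (1 - alpha))


for alpha in (0.5, 0.9):
    print(alpha, round(unsuccessful_probes(alpha), 2), round(successful_probes(alpha), 2))
```

Because of the 1/(1−α)² term, unsuccessful searches degrade much faster than successful ones as the table fills.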
24
24 Determining D
D should be either a prime number or have no prime factors less than 20
Two methods. First method:
–begin with the largest possible value for b
–then find the largest D (<= b) that is either a prime or has no factors smaller than 20
–e.g., when b = 530, D = 23*23 = 529
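The first method is a simple downward scan. A minimal Python sketch follows; the helper names are hypothetical and the divisibility test covers exactly the primes below 20.

```python
SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19)  # all primes less than 20


def is_suitable_divisor(d):
    """True if d is prime or has no prime factor less than 20."""
    if d < 2:
        return False
    if d in SMALL_PRIMES:  # the small primes themselves qualify as primes
        return True
    return all(d % p != 0 for p in SMALL_PRIMES)


def largest_divisor_at_most(b):
    """Largest suitable D with D <= b."""
    for d in range(b, 1, -1):
        if is_suitable_divisor(d):
            return d
    raise ValueError("b must be at least 2")


print(largest_divisor_at_most(530))  # 529 = 23 * 23, as in the slide's example
```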
25
25 Determining D
Second method:
–decide on acceptable values of U_n and S_n
–estimate n
–determine the largest α that meets these values
–determine the smallest b for that α
–determine the smallest integer D >= b that is either prime or has no factor smaller than 20
26
26 Determining D
Example: n = 1000, S_n <= 4 and U_n <= 50.5
–S_n = 4 ==> α = 6/7
–U_n = 50.5 ==> α = 0.9
–α = min(6/7, 0.9) = 6/7
–b = n/α = 7000/6, rounded up to 1167
–1167 = 3*389, 1169 = 7*167, and 1171 is prime ==> select D = b = 1171
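The second method can be traced step by step in code. The Python sketch below inverts the linear-probing approximations S_n ≈ ½(1 + 1/(1−α)) and U_n ≈ ½(1 + 1/(1−α)²) to get the largest admissible α, then derives b and scans upward for D. All function names are hypothetical, and floating-point rounding is assumed to be harmless for inputs of this size.

```python
import math

SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19)  # all primes less than 20


def is_suitable_divisor(d):
    """True if d is prime or has no prime factor less than 20."""
    if d < 2:
        return False
    if d in SMALL_PRIMES:
        return True
    return all(d % p != 0 for p in SMALL_PRIMES)


def max_alpha_for_successful(s):
    # S_n = (1 + 1/(1 - a)) / 2 <= s  ==>  a <= 1 - 1/(2s - 1)
    return 1 - 1 / (2 * s - 1)


def max_alpha_for_unsuccessful(u):
    # U_n = (1 + 1/(1 - a)^2) / 2 <= u  ==>  a <= 1 - 1/sqrt(2u - 1)
    return 1 - 1 / math.sqrt(2 * u - 1)


def plan_table(n, s_max, u_max):
    """Return (alpha, b, D) for n elements and the given search-cost targets."""
    alpha = min(max_alpha_for_successful(s_max), max_alpha_for_unsuccessful(u_max))
    b = math.ceil(n / alpha)  # smallest b that keeps the loading factor <= alpha
    d = b
    while not is_suitable_divisor(d):  # smallest suitable D >= b
        d += 1
    return alpha, b, d


alpha, b, D = plan_table(1000, 4, 50.5)
print(alpha, b, D)
```

The tighter of the two targets determines α; here that is the successful-search bound.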
27
27 Hashing with Chains
28
28 Implementations
29
29 An improved implementation
30
30 Comparison with Linear Open Addressing
Space complexity
–Let s be the space (in bytes) required by an element
–Let b and n denote the number of buckets and the number of elements, respectively
–Linear open addressing: b(s+2) bytes (the extra 2 bytes per bucket are for the corresponding element of the empty array)
–Chaining: 2b+2n+ns bytes
–When n < bs/(s+2), chaining takes less space
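The break-even condition follows directly from comparing the two space formulas. The derivation below assumes, as the 2b and 2n terms suggest, that each chain head and each node link costs 2 bytes.

```latex
\begin{align*}
2b + 2n + ns &< b(s + 2) \\
2n + ns &< bs \\
n(s + 2) &< bs \\
n &< \frac{bs}{s + 2}
\end{align*}
```

For large elements (s much greater than 2) the bound approaches n < b, so chaining saves space whenever the table would be less than full.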
31
31 Search time complexity
Worst-case time complexity is Θ(n); it occurs when all elements map to the same bucket (equal to that of linear open addressing)
Average:
–the average length of a chain is α = n/b
–average number of nodes examined in an unsuccessful search: if a chain has i nodes, the search may take 1, 2, 3, …, i examinations. Assuming equal probability, the average search time on that chain is
$\frac{1}{i}\sum_{j=1}^{i} j = \frac{i+1}{2}$
32
32 Search time complexity (continued)
If α = 0, U_n = 0
If α < 1, U_n <= α
If α >= 1, $U_n \approx \frac{\alpha + 1}{2}$
33
33 Average time complexity for successful search
We need to know the expected distance of each of the n elements from the head of its chain.
Without loss of generality, assume elements are inserted in increasing order of key.
When the ith element is inserted, the expected length of its chain is (i-1)/b, and the ith element is added at the end of the chain.
A search for this element requires examination of 1+(i-1)/b nodes.
Assuming the n elements are searched for with equal probability,
$S_n = \frac{1}{n}\sum_{i=1}^{n}\left(1 + \frac{i-1}{b}\right) = 1 + \frac{n-1}{2b} \approx 1 + \frac{\alpha}{2}$
34
34 Comparison with linear open addressing
The expected performance of chaining is superior, e.g., when α = 0.9:
–Chaining: U_n = 0.9, S_n = 1.45
–Linear open addressing: U_n = 50.5, S_n = 5.5
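For reference alongside this comparison, here is a minimal Python sketch of hashing with chains. It assumes integer keys, sorted chains (so an unsuccessful search can stop early), and Python lists standing in for linked chains; the class name and the `(found, nodes_examined)` return value are illustrative choices, not the deck's implementation.

```python
class ChainedTable:
    """Hashing with chains: bucket k % D holds a sorted chain of keys."""

    def __init__(self, D):
        self.D = D
        self.chains = [[] for _ in range(D)]

    def search(self, k):
        """Return (found, nodes_examined)."""
        examined = 0
        for key in self.chains[k % self.D]:
            examined += 1
            if key == k:
                return True, examined
            if key > k:  # sorted chain: k cannot appear further along
                return False, examined
        return False, examined

    def insert(self, k):
        chain = self.chains[k % self.D]
        pos = 0
        while pos < len(chain) and chain[pos] < k:
            pos += 1
        if pos < len(chain) and chain[pos] == k:
            raise ValueError("duplicate key")
        chain.insert(pos, k)  # keep the chain sorted

    def delete(self, k):
        chain = self.chains[k % self.D]
        if k not in chain:
            raise KeyError(k)
        chain.remove(k)
```

A search never leaves the home bucket's chain, which is why the expected cost depends only on the average chain length α.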
35
35 Skip Lists
36
36 [Figure: a sorted chain with head and tail nodes; keys 20, 24, 30, 40, 60, 75, 80]
[Figure: the same chain with pointers to the middle node added]
37
37 [Figure: the same chain (keys 20, 24, 30, 40, 60, 75, 80) with pointers to every second node added]
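The idea behind these figures (extra pointers to the middle node and to every second node) can be sketched as a statically built skip list. The Python below is an illustration under assumptions: the keys are read off the figure residue, the structure is built once from sorted input rather than by randomized insertion, and the names `Node`, `build_perfect_skip_list`, and `search` are hypothetical.

```python
class Node:
    def __init__(self, key, levels):
        self.key = key
        self.next = [None] * levels  # next[i] is the successor on level i


def build_perfect_skip_list(sorted_keys):
    """Level 0 links every node, level 1 every second node, level 2 every fourth, ..."""
    n = len(sorted_keys)
    levels = max(1, n.bit_length())
    head = Node(float("-inf"), levels)
    tail = Node(float("inf"), levels)
    last = [head] * levels  # last node linked so far on each level
    for pos, key in enumerate(sorted_keys, start=1):
        height = 1
        while height < levels and pos % (2 ** height) == 0:
            height += 1
        node = Node(key, height)
        for i in range(height):
            last[i].next[i] = node
            last[i] = node
    for i in range(levels):
        last[i].next[i] = tail
    return head, levels


def search(head, levels, k):
    """Return (found, comparisons), descending from the top level."""
    node = head
    comparisons = 0
    for i in range(levels - 1, -1, -1):
        while True:
            comparisons += 1
            if node.next[i].key < k:
                node = node.next[i]  # move right on this level
            else:
                break                # drop down one level
    return node.next[0].key == k, comparisons


head, levels = build_perfect_skip_list([20, 24, 30, 40, 60, 75, 80])
print(search(head, levels, 30))
```

With seven keys this produces three levels: the top level points at 40 (the middle), the next at 24, 40, and 75 (every second node), and the bottom level is the ordinary sorted chain.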
39
39 Skip List Implementation
50
50 An application
Text compression
–compressor: coding the file
–decompressor: decoding
Run-length coding: 1000 x's followed by 2000 y's => 1000x2000y
Space needed: 3002 bytes (including 2 bytes for string ends) => 12 bytes
LZW compression (Lempel, Ziv, and Welch)
51
51 LZW Compression
Try: aaabbbbbbaabaaba (with a = 0 and b = 1) is encoded as 0214537, i.e. the code sequence 0 2 1 4 5 3 7
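The exercise can be checked with a short LZW compressor. This Python sketch assumes the alphabet is supplied explicitly and its symbols receive codes 0, 1, ... in order; it keys the dictionary by whole strings for readability and places no limit on the number of codes. The function name is hypothetical.

```python
def lzw_compress(text, alphabet):
    """Return the list of LZW codes for text."""
    dictionary = {ch: code for code, ch in enumerate(alphabet)}
    if not text:
        return []
    codes = []
    prefix = text[0]
    for ch in text[1:]:
        if prefix + ch in dictionary:
            prefix += ch                               # extend the current match
        else:
            codes.append(dictionary[prefix])           # emit code of longest match
            dictionary[prefix + ch] = len(dictionary)  # assign the next free code
            prefix = ch
    codes.append(dictionary[prefix])
    return codes


print(lzw_compress("aaabbbbbbaabaaba", "ab"))  # [0, 2, 1, 4, 5, 3, 7]
```

Along the way the compressor assigns aa = 2, aab = 3, bb = 4, bbb = 5, bbba = 6, and aaba = 7.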
52
52 Input/Output
53
53 Input/Output (continued)
54
54 Dictionary organization
Use a code to represent the prefix of each key: a key is stored as the code of its prefix followed by its last character
55
55 Dictionary organization (continued)
Assume each code is 12 bits long. Hence there are at most 2^12 = 4096 codes
Use a hash table with divisor D = 4099:
ChainHashTable h(D)
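One plausible reading of this organization is that each dictionary key is a single integer built from a 12-bit prefix code and one 8-bit character, hashed with divisor 4099. The Python sketch below works on that assumption: the packing `(prefix_code << 8) | byte`, the initial codes 0..255, and all function names are hypothetical, and a built-in `dict` stands in for `ChainHashTable`.

```python
MAX_CODES = 4096  # 2**12 codes of 12 bits each
D = 4099          # hash divisor from the slide
ALPHA = 256       # assumed: single bytes start with codes 0..255


def pack_key(prefix_code, byte):
    """Combine a prefix code and one byte into a single integer key."""
    return (prefix_code << 8) | byte


def home_bucket(key):
    """Home bucket of a packed key under division hashing."""
    return key % D


def lzw_compress_bytes(data):
    """LZW over bytes with packed (prefix code, byte) keys and a 4096-code cap."""
    table = {}  # packed key -> code
    next_code = ALPHA
    codes = []
    if not data:
        return codes
    pcode = data[0]
    for c in data[1:]:
        key = pack_key(pcode, c)
        if key in table:
            pcode = table[key]
        else:
            codes.append(pcode)
            if next_code < MAX_CODES:  # stop growing once all codes are used
                table[key] = next_code
                next_code += 1
            pcode = c
    codes.append(pcode)
    return codes


print(lzw_compress_bytes(b"aaabbbbbbaabaaba"))
```

Storing the prefix as a code keeps every key the same small size, no matter how long the string it stands for.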
56
56 Output of codes
57
57 Compression
58
58 Compression (continued)
59
59 Compression (continued)
60
60 Headers and Function main
61
61 Headers and Function main (continued)
62
62 LZW Decompression
The dictionary is searched for an entry with a given code
The first code in the compressed file corresponds to a single character
For all other codes p, let q be the code that precedes p in the compressed file:
–Case 1: p is in the dictionary. Output text(p) and enter the pair (next code, text(q)fc(p)) into the dictionary, where fc(p) is the first character of text(p).
–Case 2: p is not in the dictionary. This can only happen when text(p) = text(q)fc(q) and the current text segment is text(q)text(q)fc(q). Output text(q)fc(q) and enter the pair (next code, text(q)fc(q)) into the dictionary.
63
63 Try: decode 0214537; the result should be aaabbbbbbaabaaba
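The decoding exercise can be checked with a short decompressor that handles both cases of the previous slide. This Python sketch assumes the same explicit alphabet as the encoder (codes 0, 1, ... in order) and a list of integer codes as input; the function name is hypothetical.

```python
def lzw_decompress(codes, alphabet):
    """Rebuild the text from a list of LZW codes."""
    dictionary = {code: ch for code, ch in enumerate(alphabet)}
    if not codes:
        return ""
    prev_text = dictionary[codes[0]]  # the first code is a single character
    pieces = [prev_text]
    for p in codes[1:]:
        if p in dictionary:           # Case 1: p is in the dictionary
            text_p = dictionary[p]
        elif p == len(dictionary):    # Case 2: text(p) = text(q) + fc(q)
            text_p = prev_text + prev_text[0]
        else:
            raise ValueError(f"invalid code {p}")
        # Enter (next code, text(q) + fc(p)); in Case 2 this equals text(p).
        dictionary[len(dictionary)] = prev_text + text_p[0]
        pieces.append(text_p)
        prev_text = text_p
    return "".join(pieces)


print(lzw_decompress([0, 2, 1, 4, 5, 3, 7], "ab"))  # aaabbbbbbaabaaba
```

In this example codes 2, 4, 5, and 7 all arrive before they are in the dictionary, so Case 2 is exercised four times.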
64
64 Input/Output
65
65 Input/Output (continued)
66
66 Dictionary organization
67
67 Input of Code
68
68 Decompression
69
69 Decompression (continued)
70
70 Headers and Function main
71
71 Headers and Function main (continued)
72
72 End of Chapter 7