1
1 Microsoft Imagine Cup http://www.thespoke.net/imagine john@johndowns.co.nz
2
2 CompSci 105 SS 2005 Principles of Computer Science Lecture 24: Tables and Hashing
3
3 Tables What is a table??
4
4 List ADT createTable(), isEmpty(), tableLength(), tableInsert(item), tableDelete(searchKey), tableRetrieve(searchKey), tableTraverse()
5
5 Search Key It is important that the search key remain the same as long as the item is stored in the table. The class below therefore has a getter for the key but no setter.
    public abstract class KeyedItem {
        private Comparable searchKey;

        public KeyedItem(Comparable key) {
            searchKey = key;
        } // end constructor

        public Comparable getKey() {
            return searchKey;
        } // end getKey
    } // end KeyedItem
6
6 Implementation?? Implementations for the ADT Table:
    - Linear approaches
        - Unsorted, array based
        - Unsorted, reference based
        - Sorted (by search key), array based
        - Sorted (by search key), reference based
    - Non-linear approach
        - Binary Search Tree
    The requirements of a particular application influence the selection of an implementation:
    - What operations are used, and how often
7
7 Which to use??
8
8 [Diagram: Program that uses a table -> ADT Table -> implementation (Unsorted Array or Binary Search Tree)] Textbook, p. 504-517
9
9 Databases Relational databases are simply a set of tables filled with data. They use a variety of methods to store/retrieve that data.
10
10 Hash Tables
11
11 Wouldn’t it be nice? Table ADT: the key can be used as an array index.
    ID  Surname      First Name
    1   Eksepshen    Catchda
    3   Baseeks      Beegoh
    0   Headnode     Dummy
    2   Gettingsoon  Ayplus
12
12 Wouldn’t it be nice? Table ADT: the key can be used as an array index.
    ID  Surname      First Name
    1   Eksepshen    Catchda
    3   Baseeks      Beegoh
    0   Headnode     Dummy
    2   Gettingsoon  Ayplus
    Array:
    0  Headnode, Dummy
    1  Eksepshen, Catchda
    2
    3  Baseeks, Beegoh
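The idea on this slide can be sketched as follows. This is only a minimal illustration, assuming small, dense, non-negative integer keys; the class name DirectAddressTable and its methods are hypothetical and not part of the lecture code.

```java
// Sketch: when keys are small, dense integers, the key itself is the array index.
class DirectAddressTable {
    private final String[] slots;

    DirectAddressTable(int capacity) {
        slots = new String[capacity];
    }

    // Store the value at the index given by its key.
    void insert(int key, String value) {
        slots[key] = value;
    }

    // Retrieval is a single array access; null means "not present".
    String retrieve(int key) {
        return slots[key];
    }

    public static void main(String[] args) {
        DirectAddressTable table = new DirectAddressTable(4);
        table.insert(1, "Eksepshen, Catchda");
        table.insert(3, "Baseeks, Beegoh");
        table.insert(0, "Headnode, Dummy");
        System.out.println(table.retrieve(3)); // Baseeks, Beegoh
        System.out.println(table.retrieve(2)); // null (nothing stored yet)
    }
}
```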
13
13 The Problem
    ID       Surname      First Name
    9978291  Eksepshen    Catchda
    3024817  Baseeks      Beegoh
    3423930  Headnode     Dummy
    2048171  Gettingsoon  Ayplus
    The ID range is far greater than the number of array locations we can or should allocate...
14
14 The Problem The general case is where we have a large range of possible keys/values, but are only storing a small number of items. How do we distribute items in a smaller space?
15
15 Naive Solution If we have N possible search key values and M locations, simply divide the N values into M lots, e.g. keys 1-1000 (N = 1000), M = 10:
    Location  Key range
    0         1-100
    1         101-200
    2         201-300
    3         301-400
    4         401-500
    5         501-600
    6         601-700
    7         701-800
    8         801-900
    9         901-1000
16
16 What about collisions? If we want to store two items with search keys 150 and 160, both fall in the same range, so they collide in the same array location. [Diagram: the ten key ranges and locations 0-9 from the previous slide]
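The range-splitting scheme and its collision can be sketched in a few lines. This assumes keys 1 to 1000 and ten locations with non-overlapping ranges (1-100, 101-200, and so on); the name RangeBuckets is made up for this example.

```java
class RangeBuckets {
    // Assumes keys 1..1000 and 10 locations, each covering 100 consecutive keys:
    // 1-100 -> 0, 101-200 -> 1, ..., 901-1000 -> 9.
    static int bucketFor(int key) {
        if (key < 1 || key > 1000) {
            throw new IllegalArgumentException("key out of range: " + key);
        }
        return (key - 1) / 100; // integer division picks the range
    }

    public static void main(String[] args) {
        System.out.println(bucketFor(150)); // 1
        System.out.println(bucketFor(160)); // 1, the same location as 150: a collision
    }
}
```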
17
17 Hash Functions [Diagram: key 3423930 -> Hash Function -> ? -> Hash Table with locations 0-9]
18
18 A Hash Function Hash function: ID % 10
    ID       Surname
    9978291  Eksepshen
    3024817  Baseeks
    3423930  Headnode
    2048171  Gettingsoon
    Hash table: locations 0-9
19
19 Collision Hash function: key % 10
    ID       Surname
    9978291  Eksepshen
    3024817  Baseeks
    3423930  Headnode
    2048171  Gettingsoon
    Hash table:
    0  Headnode
    1  <- Eksepshen and Gettingsoon both hash here (collision)
    2
    3
    4
    5
    6
    7  Baseeks
    8
    9
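A minimal sketch of this hash function applied to the four IDs on the slide makes the collision visible. The class name ModTenHash is hypothetical; the IDs and surnames are the ones from the slide.

```java
class ModTenHash {
    // The slide's hash function: the last decimal digit of the ID.
    static int hash(int id) {
        return id % 10;
    }

    public static void main(String[] args) {
        int[] ids = {9978291, 3024817, 3423930, 2048171};
        String[] surnames = {"Eksepshen", "Baseeks", "Headnode", "Gettingsoon"};
        for (int i = 0; i < ids.length; i++) {
            // Eksepshen and Gettingsoon both print location 1.
            System.out.println(surnames[i] + " -> " + hash(ids[i]));
        }
    }
}
```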
20
20 Hash Function Tricks [Diagram: key 3423930 -> ? -> Hash Table with locations 0-9]
21
21 Requirements of Hash Functions
    - Don’t produce values outside of the array
    - Distribute items as evenly as possible
    - Use all available space in the array to minimise collisions
22
22 Selecting Digits Hash function: select digits 3 and 5 of the key. Example: 3423930 -> 29. How big a table do we need?
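Digit selection could look like the sketch below. It assumes digits are counted from the left starting at 1 (which matches the slide's result of 29) and that the key is non-negative with at least five digits; the class and method names are invented for illustration.

```java
class DigitSelection {
    // Builds a two-digit hash from the 3rd and 5th digits of the key, counted from the left.
    static int selectDigits3And5(int id) {
        String digits = Integer.toString(id);
        if (id < 0 || digits.length() < 5) {
            throw new IllegalArgumentException("need a non-negative key with at least 5 digits");
        }
        int third = digits.charAt(2) - '0';
        int fifth = digits.charAt(4) - '0';
        return third * 10 + fifth;
    }

    public static void main(String[] args) {
        System.out.println(selectDigits3And5(3423930)); // 29
    }
}
```

Two selected digits can only produce values from 0 to 99, so a table of 100 locations is enough for this function.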
23
23 Folding Digits Hash function: sum of all digits. Example: 3423930 -> 3+4+2+3+9+3+0 = 24. How big a table do we need?
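A possible sketch of folding by summing every digit, assuming non-negative integer keys (DigitSum is a hypothetical name):

```java
class DigitSum {
    // Folds a non-negative key by adding up its decimal digits.
    static int sumDigits(int id) {
        int sum = 0;
        for (int n = id; n > 0; n /= 10) {
            sum += n % 10; // peel off the last digit
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumDigits(3423930)); // 24
    }
}
```

For seven-digit keys the largest possible sum is 7 x 9 = 63, so this function uses only locations 0 to 63 no matter how large the table is.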
24
24 Folding Digits Hash function: group the digits and add the groups. Example: 3423930 -> 342+393+0 = 735. How big a table do we need?
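Group folding might be sketched like this, assuming groups are taken from the left with a possibly shorter final group, as in the slide's 342 + 393 + 0. The names GroupFolding and foldGroups are illustrative only.

```java
class GroupFolding {
    // Splits the key's decimal digits into groups of groupSize (from the left)
    // and adds the groups together.
    static int foldGroups(int id, int groupSize) {
        String digits = Integer.toString(id);
        int sum = 0;
        for (int i = 0; i < digits.length(); i += groupSize) {
            int end = Math.min(i + groupSize, digits.length());
            sum += Integer.parseInt(digits.substring(i, end));
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(foldGroups(3423930, 3)); // 342 + 393 + 0 = 735
    }
}
```

With seven-digit keys and groups of three, the result can be as large as 999 + 999 + 9 = 2007, a much wider range than the digit sum gives.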
25
25 Handling Characters Hash function: sum of the Unicode values of the characters, e.g. for “Catchda”. Fold these as well?
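Summing character codes is short to write in Java, because a char converts directly to its UTF-16 code unit value. The sketch below uses a made-up class name, UnicodeSum.

```java
class UnicodeSum {
    // Adds the numeric (UTF-16 code unit) value of every character in the string.
    static int sumOfCodes(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        // C=67, a=97, t=116, c=99, h=104, d=100, a=97
        System.out.println(sumOfCodes("Catchda")); // 680
    }
}
```

Note that any two strings made of the same letters in a different order get the same sum, which is one reason to consider folding or combining this with another step.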
26
26 Are these any good?? Do they evenly distribute values? None of them take the array size into account.
27
27 Modulo Arithmetic Hash function: key % tablesize [Diagram: 3423930 -> Hash Function -> Hash Table with locations 0-9]
28
28 Modulo Arithmetic Hash function: key % tablesize [Diagram: 3423930 -> Hash Function -> Hash Table with locations 0-6]
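The two table sizes shown on these slides can be compared with a small sketch. It uses Math.floorMod rather than the plain % operator so that even a negative key would map inside the array; that choice, and the name ModuloHash, are assumptions of this example rather than something the slides specify.

```java
class ModuloHash {
    // Maps any int key to an index in 0..tableSize-1.
    static int hash(int key, int tableSize) {
        return Math.floorMod(key, tableSize);
    }

    public static void main(String[] args) {
        System.out.println(hash(3423930, 10)); // 0
        System.out.println(hash(3423930, 7));  // 6
    }
}
```

Because the result is always smaller than tableSize, the function adapts to whatever array size is chosen.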
29
29 Hash Functions Can combine multiple hash functions into one, e.g. combine folding with modulus.
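One way to combine the two steps is to fold first and then reduce the folded value to the table size. This is a sketch under the assumption of non-negative keys folded in groups of three digits; FoldThenMod is a hypothetical name.

```java
class FoldThenMod {
    // Step 1: fold the key by adding its digits in groups of three (from the left).
    static int fold(int id) {
        String digits = Integer.toString(id);
        int sum = 0;
        for (int i = 0; i < digits.length(); i += 3) {
            sum += Integer.parseInt(digits.substring(i, Math.min(i + 3, digits.length())));
        }
        return sum;
    }

    // Step 2: reduce the folded value so it always fits the table.
    static int hash(int id, int tableSize) {
        return fold(id) % tableSize;
    }

    public static void main(String[] args) {
        System.out.println(hash(3423930, 10)); // 735 % 10 = 5
    }
}
```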
30
30 Solutions to Collision?? All of these methods can result in collisions. There are many solutions....
31
31 Separate Chaining Hash function: key % 10
    ID       Surname
    9978291  Eksepshen
    3024817  Baseeks
    3423930  Headnode
    2048171  Gettingsoon
    Hash table (each location holds a chain):
    0  Headnode
    1  Gettingsoon -> Eksepshen
    2
    3
    4
    5
    6
    7  Baseeks
    8
    9
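Separate chaining can be sketched with a small linked list per location. New items are added at the head of the chain here, which is an assumption; it happens to give the same order as the slide (Gettingsoon before Eksepshen). All class and method names are illustrative.

```java
class ChainedHashTable {
    // One node of a chain: a key, its surname, and the rest of the chain.
    private static class Node {
        final int key;
        final String surname;
        final Node next;

        Node(int key, String surname, Node next) {
            this.key = key;
            this.surname = surname;
            this.next = next;
        }
    }

    private final Node[] buckets;

    ChainedHashTable(int size) {
        buckets = new Node[size];
    }

    private int hash(int key) {
        return key % buckets.length;
    }

    // Adds the item at the head of the chain for its location.
    void insert(int key, String surname) {
        int i = hash(key);
        buckets[i] = new Node(key, surname, buckets[i]);
    }

    // Walks only the chain at the key's location; returns null if absent.
    String retrieve(int key) {
        for (Node n = buckets[hash(key)]; n != null; n = n.next) {
            if (n.key == key) {
                return n.surname;
            }
        }
        return null;
    }

    // Renders one chain, e.g. "Gettingsoon -> Eksepshen".
    String chainToString(int index) {
        StringBuilder sb = new StringBuilder();
        for (Node n = buckets[index]; n != null; n = n.next) {
            if (sb.length() > 0) {
                sb.append(" -> ");
            }
            sb.append(n.surname);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ChainedHashTable table = new ChainedHashTable(10);
        table.insert(9978291, "Eksepshen");
        table.insert(3024817, "Baseeks");
        table.insert(3423930, "Headnode");
        table.insert(2048171, "Gettingsoon");
        System.out.println(table.chainToString(1)); // Gettingsoon -> Eksepshen
    }
}
```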
32
32 Separate Chaining Could use ANY of the data structures so far for the chains. Search time is reduced, but extra data structures are required. Can’t we just use the array??
33
33 Linear Probing Hash function: key % 10
    ID       Surname
    9978291  Eksepshen
    3024817  Baseeks
    3423930  Headnode
    2048171  Gettingsoon
    Hash table:
    0  Headnode
    1  Eksepshen
    2
    3
    4
    5
    6
    7  Baseeks
    8
    9
    Clustering
34
34 Linear Probing Hash function: key % 10
    ID       Surname
    9978291  Eksepshen
    3024817  Baseeks
    3423930  Headnode
    2048171  Gettingsoon
    3153010  Elizabeth II
    Hash table:
    0  Headnode
    1  Eksepshen
    2  Gettingsoon
    3
    4
    5
    6
    7  Baseeks
    8
    9
    Clustering
35
35 Linear Probing Hash function: key % 10
    ID       Surname
    9978291  Eksepshen
    3024817  Baseeks
    3423930  Headnode
    2048171  Gettingsoon
    3153010  Elizabeth II
    Hash table:
    0  Headnode
    1  Eksepshen
    2  Gettingsoon
    3  Elizabeth II
    4
    5
    6
    7  Baseeks
    8
    9
    Clustering
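Insertion with linear probing might look like the following sketch: start at key % tableSize and step forward, wrapping around, until an empty location is found. The class name and the convention of returning -1 for a full table are assumptions made for this example.

```java
class LinearProbingInsert {
    private final String[] slots;

    LinearProbingInsert(int size) {
        slots = new String[size];
    }

    // Returns the index where the item was stored, or -1 if the table is full.
    int insert(int key, String surname) {
        int start = key % slots.length;
        for (int step = 0; step < slots.length; step++) {
            int i = (start + step) % slots.length; // wrap around at the end
            if (slots[i] == null) {
                slots[i] = surname;
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        LinearProbingInsert table = new LinearProbingInsert(10);
        table.insert(9978291, "Eksepshen");   // hashes to 1, stored at 1
        table.insert(3024817, "Baseeks");     // hashes to 7, stored at 7
        table.insert(3423930, "Headnode");    // hashes to 0, stored at 0
        table.insert(2048171, "Gettingsoon"); // hashes to 1 (taken), stored at 2
        System.out.println(table.insert(3153010, "Elizabeth II")); // 3
    }
}
```

Elizabeth II hashes to 0 but ends up at 3 because locations 0, 1 and 2 already form a cluster, and every new arrival in that cluster makes it longer.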
36
36 Finding a Node
37
37 Finding a Node Problem: find the item with key 9978291. Solution: search as if we were ADDING the item, checking each location we come across. Stop if the item is found or we reach null.
    Hash function: key % 10
    Hash table:
    0  Headnode
    1  Eksepshen
    2  Gettingsoon
    3  Elizabeth II
    4
    5
    6
    7  Baseeks
    8
    9
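The search procedure can be sketched as below. For brevity the table stores only the keys (as Integer, with null marking an empty location) and is filled in by hand to match the slide; the class name and the -1 "not found" convention are assumptions of this example.

```java
class LinearProbingFind {
    // Follows the same probe sequence an insert would use.
    // Returns the index holding the key, or -1 if an empty location is reached first.
    static int find(Integer[] keys, int key) {
        int start = key % keys.length;
        for (int step = 0; step < keys.length; step++) {
            int i = (start + step) % keys.length;
            if (keys[i] == null) {
                return -1; // an insert would have stopped here, so the key is absent
            }
            if (keys[i].intValue() == key) {
                return i;
            }
        }
        return -1; // probed every location without finding the key
    }

    public static void main(String[] args) {
        Integer[] keys = new Integer[10];
        keys[0] = 3423930; // Headnode
        keys[1] = 9978291; // Eksepshen
        keys[2] = 2048171; // Gettingsoon
        keys[3] = 3153010; // Elizabeth II
        keys[7] = 3024817; // Baseeks
        System.out.println(find(keys, 9978291)); // 1
        System.out.println(find(keys, 3153010)); // 3, after probing 0, 1 and 2
    }
}
```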
38
38 Efficiency?? What is the efficiency of these operations?? What is it dependent upon?? When is it best/worst??
39
39 Efficiency