1 HASHING CSC 172 SPRING 2002 LECTURE 22

2 Hashing
A cool way to get from an element x to the place where x can be found.
An array [0..B-1] of buckets; each bucket contains a list of set elements.
B = number of buckets.
A hash function takes potential set elements and produces a “random” integer in [0..B-1].

3 Example
If the set elements are integers, then the simplest/best hash function is usually h(x) = x % B.
Suppose B = 6 and we wish to store the integers {70, 53, 99, 94, 83, 76, 64, 30}.
They belong in buckets 4, 5, 3, 4, 5, 4, 4, and 0.
Note: if B = 7, the buckets are 0, 4, 1, 3, 6, 6, 1, and 2.
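The bucket numbers in this example can be checked with a few lines of Java. This is a minimal sketch; the class and method names (`ModHashDemo`, `bucketsFor`) are made up for illustration, and it assumes non-negative keys, as in the example.

```java
import java.util.Arrays;

class ModHashDemo {
    // h(x) = x % B for every key; assumes keys are non-negative.
    static int[] bucketsFor(int[] keys, int b) {
        int[] out = new int[keys.length];
        for (int i = 0; i < keys.length; i++) {
            out[i] = keys[i] % b;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] keys = {70, 53, 99, 94, 83, 76, 64, 30};
        System.out.println(Arrays.toString(bucketsFor(keys, 6))); // [4, 5, 3, 4, 5, 4, 4, 0]
        System.out.println(Arrays.toString(bucketsFor(keys, 7))); // [0, 4, 1, 3, 6, 6, 1, 2]
    }
}
```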

4 Pitfalls of Hash Function Selection
We want a uniform distribution of elements into buckets.
Beware of data patterns that cause a non-uniform distribution.

5 Example
If the integers were all even, then B = 6 would cause only buckets 0, 2, and 4 to fill.
If we hashed the words in the UNIX dictionary into 10 buckets by length of word, then 20% would go into bucket 7.
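The even-integer pitfall is easy to reproduce. The sketch below counts how many of the first 30 even integers land in each bucket for B = 6 and, for contrast, for B = 7. The key set and the names `EvenKeysDemo`, `evenKeys`, and `bucketCounts` are assumptions chosen for illustration.

```java
import java.util.Arrays;

class EvenKeysDemo {
    // The first n even integers: 0, 2, 4, ..., 2(n-1).
    static int[] evenKeys(int n) {
        int[] keys = new int[n];
        for (int i = 0; i < n; i++) {
            keys[i] = 2 * i;
        }
        return keys;
    }

    // Number of keys that h(x) = x % B sends to each bucket.
    static int[] bucketCounts(int[] keys, int b) {
        int[] counts = new int[b];
        for (int k : keys) {
            counts[k % b]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] keys = evenKeys(30);
        System.out.println(Arrays.toString(bucketCounts(keys, 6))); // [10, 0, 10, 0, 10, 0]
        System.out.println(Arrays.toString(bucketCounts(keys, 7))); // [5, 4, 5, 4, 4, 4, 4]
    }
}
```

With B = 6, half the buckets stay empty because 6 shares the factor 2 with every key; with B = 7 the same keys spread almost evenly.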

6 Dictionary Operations
Lookup: go to the head of bucket h(x) and search the bucket's list for x.
Insertion: look up x; append it to the list if not found.
Delete: look up x; delete it from the bucket's list.
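The three operations can be sketched as a small set of integers with one linked list per bucket. This is an illustrative sketch rather than the course's own code: the class name `ChainedIntSet` is hypothetical, and `Math.floorMod` is used in place of a plain `%` so that negative keys also map into [0..B-1].

```java
import java.util.LinkedList;

class ChainedIntSet {
    private final LinkedList<Integer>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedIntSet(int b) {
        buckets = new LinkedList[b];
        for (int i = 0; i < b; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    // h(x): bucket index in [0..B-1], also for negative x.
    private int h(int x) {
        return Math.floorMod(x, buckets.length);
    }

    // Lookup: search the list of bucket h(x).
    boolean lookup(int x) {
        return buckets[h(x)].contains(x);
    }

    // Insertion: append only if x is not already in its bucket.
    boolean insert(int x) {
        if (lookup(x)) {
            return false;
        }
        buckets[h(x)].add(x);
        return true;
    }

    // Delete: ordinary list deletion inside bucket h(x).
    boolean delete(int x) {
        return buckets[h(x)].remove(Integer.valueOf(x));
    }

    public static void main(String[] args) {
        ChainedIntSet set = new ChainedIntSet(6);
        set.insert(70);
        set.insert(94);                        // same bucket as 70 (both hash to 4)
        System.out.println(set.lookup(94));    // true
        System.out.println(set.delete(70));    // true
        System.out.println(set.lookup(70));    // false
    }
}
```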

7 Analysis
If we pick B to be near n, the number of elements in the set, then the average list is O(1) long.
Thus, dictionary ops take O(1) time on average.
Worst case: all elements go into one bucket, and ops take O(n) time.

8 Managing Hash Table Size
If n gets as high as 2B, create a new hash table with 2B buckets.
“Rehash” every element into the new table: O(n) time total.
There were at least n/2 inserts since the last “rehash”, because the table held at most B = n/2 elements when it was created.
Thus, we “amortize” the O(n) cost of rehashing over the inserts since the last rehash.
Each insert is charged a constant factor, at worst.
So, even with rehashing, we get O(1) amortized time per op.
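The amortization argument can be checked by counting how many elements get moved by rehashes over a run of inserts. The sketch below only simulates the counters (no real table is built); the names `RehashCostDemo` and `elementsMoved`, and the starting size of one bucket, are assumptions for illustration.

```java
class RehashCostDemo {
    // Total number of elements moved by rehashes while performing `inserts`
    // inserts, starting from `initialBuckets` buckets and doubling whenever
    // the element count n reaches 2B.
    static long elementsMoved(int inserts, int initialBuckets) {
        int b = initialBuckets;
        long moved = 0;
        for (int n = 1; n <= inserts; n++) {
            if (n >= 2 * b) {   // n reached 2B: rehash all n elements
                moved += n;
                b *= 2;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        int inserts = 1000;
        long moved = elementsMoved(inserts, 1);
        System.out.println(moved);                      // 1022
        System.out.println(moved < 2L * inserts);       // true: under 2 moves per insert
    }
}
```

Starting from one bucket, the rehashes move 2 + 4 + ... + 512 = 1022 elements over 1000 inserts, so the extra work per insert stays below a constant.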

9 Collisions
A collision occurs when two values in the set hash to the same value.
There are several ways to deal with this:
Chaining (using a linked list or some secondary structure)
Open addressing (linear probing, double hashing)

10 Chaining
Buckets for the earlier example with B = 7:
Bucket   List
0        70
1        99 -> 64
2        30
3        94
4        53
5        (empty)
6        83 -> 76
Very efficient time-wise.
Other approaches use less space.

11 Open Addressing
When a collision occurs, if the table is not full, find an available space.
Linear probing
Double hashing

12 Linear Probing
If the current location is occupied, try the next table location (M is the table size).
    LinearProbingInsert(K) {
        if (table is full) error;
        probe = h(K);
        while (table[probe] is occupied)
            probe = (probe + 1) % M;
        table[probe] = K;
    }
Walk along the table until an empty spot is found.
Uses less memory than chaining (no links).
Takes more time than chaining (long walks).
Deleting is a pain (mark a slot as having been deleted).
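The remark about deletion deserves a concrete picture: if a slot were simply emptied, a later search would stop there and miss keys stored further along the walk. A common remedy, sketched below under that assumption, is to keep a per-slot state and mark removed slots as deleted ("tombstones"). The class name `TombstoneSet` and the three state constants are hypothetical.

```java
class TombstoneSet {
    private static final int EMPTY = 0, FULL = 1, DELETED = 2;
    private final int[] keys;
    private final int[] state;   // EMPTY, FULL, or DELETED for each slot
    private int used;            // number of FULL slots

    TombstoneSet(int m) {
        keys = new int[m];
        state = new int[m];
    }

    private int h(int k) {
        return Math.floorMod(k, keys.length);
    }

    // Slot holding k, or -1. DELETED slots are walked past; a never-used
    // (EMPTY) slot ends the search.
    private int find(int k) {
        int probe = h(k);
        for (int i = 0; i < keys.length; i++) {
            if (state[probe] == EMPTY) return -1;
            if (state[probe] == FULL && keys[probe] == k) return probe;
            probe = (probe + 1) % keys.length;
        }
        return -1;
    }

    boolean contains(int k) {
        return find(k) >= 0;
    }

    boolean insert(int k) {
        if (contains(k)) return false;
        if (used == keys.length) throw new IllegalStateException("table is full");
        int probe = h(k);
        while (state[probe] == FULL) {        // EMPTY and DELETED slots can be reused
            probe = (probe + 1) % keys.length;
        }
        keys[probe] = k;
        state[probe] = FULL;
        used++;
        return true;
    }

    boolean delete(int k) {
        int slot = find(k);
        if (slot < 0) return false;
        state[slot] = DELETED;                // leave a tombstone, do not empty the slot
        used--;
        return true;
    }

    public static void main(String[] args) {
        TombstoneSet set = new TombstoneSet(13);
        set.insert(18);                        // slot 5
        set.insert(31);                        // 31 % 13 = 5 is taken, so slot 6
        set.delete(18);                        // slot 5 becomes a tombstone
        System.out.println(set.contains(31));  // true: the search walks past slot 5
        System.out.println(set.contains(18));  // false
    }
}
```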

13-23 Linear Probing
h(K) = K % 13, table slots 0..12
Insert: 18, 41, 22, 59, 32, 31, 73
h(K):    5,  2,  9,  7,  6,  5,  8

Key   h(K)   Slots tried    Stored in
18    5      5              5
41    2      2              2
22    9      9              9
59    7      7              7
32    6      6              6
31    5      5, 6, 7, 8     8
73    8      8, 9, 10       10

Final table:
Slot:  0   1   2   3   4   5   6   7   8   9   10  11  12
Key:   -   -   41  -   -   18  32  59  31  22  73  -   -
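The walk-through above can be replayed in code. The following is a minimal sketch of linear-probing insertion with h(K) = K % 13; the names `LinearProbingDemo`, `insert`, and `build`, and the use of -1 as the "empty" marker (which assumes non-negative keys), are choices made for illustration.

```java
import java.util.Arrays;

class LinearProbingDemo {
    static final int EMPTY = -1;   // marker for a free slot; assumes keys >= 0

    // Inserts key and returns the slot it ended up in.
    static int insert(int[] table, int key) {
        int m = table.length;
        int probe = key % m;
        for (int tries = 0; tries < m; tries++) {
            if (table[probe] == EMPTY) {
                table[probe] = key;
                return probe;
            }
            probe = (probe + 1) % m;   // occupied: try the next location
        }
        throw new IllegalStateException("table is full");
    }

    // Builds a table of size m by inserting the keys in order.
    static int[] build(int[] keys, int m) {
        int[] table = new int[m];
        Arrays.fill(table, EMPTY);
        for (int k : keys) {
            insert(table, k);
        }
        return table;
    }

    public static void main(String[] args) {
        int[] keys = {18, 41, 22, 59, 32, 31, 73};
        System.out.println(Arrays.toString(build(keys, 13)));
        // [-1, -1, 41, -1, -1, 18, 32, 59, 31, 22, 73, -1, -1]
    }
}
```

Keys 31 and 73 are the only ones that collide: 31 walks from slot 5 to slot 8, and 73 walks from slot 8 to slot 10.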

24 Double Hashing
If the current location is occupied, try another table location.
Use two hash functions: h1 gives the starting location and h2 gives the step size.
If M is prime, the probe sequence eventually examines every location (for any offset between 1 and M-1).
    DoubleHashInsert(K) {
        if (table is full) error;
        probe = h1(K);
        offset = h2(K);
        while (table[probe] is occupied)
            probe = (probe + offset) % M;
        table[probe] = K;
    }
Many of the same (dis)advantages as linear probing.
Distributes keys more evenly than linear probing.

25-27 Double Hashing
h1(K) = K % 13, h2(K) = 8 - K % 8, table slots 0..12
Insert: 18, 41, 22, 59, 32, 31, 73
h1(K):   5,  2,  9,  7,  6,  5,  8
h2(K):   6,  7,  2,  5,  8,  1,  7

Key   h1(K)   h2(K)   Slots tried    Stored in
18    5       6       5              5
41    2       7       2              2
22    9       2       9              9
59    7       5       7              7
32    6       8       6              6
31    5       1       5, 6, 7, 8     8
73    8       7       8, 2, 9, 3     3

Final table:
Slot:  0   1   2   3   4   5   6   7   8   9   10  11  12
Key:   -   -   41  73  -   18  32  59  31  22  -   -   -
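The double-hashing example can be replayed the same way. This sketch follows the DoubleHashInsert pseudocode with a first hash of K % 13 and a step of 8 - K % 8; the class and method names and the -1 "empty" marker (assuming non-negative keys) are illustrative assumptions.

```java
import java.util.Arrays;

class DoubleHashingDemo {
    static final int EMPTY = -1;   // marker for a free slot; assumes keys >= 0

    static int h1(int key, int m) {
        return key % m;            // starting slot
    }

    static int h2(int key) {
        return 8 - key % 8;        // step size in 1..8, never 0
    }

    // Inserts key and returns the slot it ended up in.
    static int insert(int[] table, int key) {
        int m = table.length;
        int probe = h1(key, m);
        int offset = h2(key);
        for (int tries = 0; tries < m; tries++) {
            if (table[probe] == EMPTY) {
                table[probe] = key;
                return probe;
            }
            probe = (probe + offset) % m;   // occupied: jump by the key's own step
        }
        throw new IllegalStateException("no free slot found");
    }

    static int[] build(int[] keys, int m) {
        int[] table = new int[m];
        Arrays.fill(table, EMPTY);
        for (int k : keys) {
            insert(table, k);
        }
        return table;
    }

    public static void main(String[] args) {
        int[] keys = {18, 41, 22, 59, 32, 31, 73};
        System.out.println(Arrays.toString(build(keys, 13)));
        // [-1, -1, 41, 73, -1, 18, 32, 59, 31, 22, -1, -1, -1]
    }
}
```

Key 31 has a step of 1, so it behaves exactly as under linear probing and lands in slot 8. Key 73 has a step of 7 and tries slots 8, 2, 9, and 3, ending in slot 3 instead of extending the cluster at slots 5 to 9.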

28 Implementing Hash Tables
    public class HashMap implements Map {
        private transient Entry table[];
        private transient int count;
        ….

29 Implementing Hash Tables
    public class HashMap implements Map {
        …….
        public HashMap(int initialCapacity, float loadFactor)
        public HashMap(int initialCapacity)
        public boolean containsValue(Object value)
        public boolean containsKey(Object key)
        public Object get(Object key)
        public Object put(Object key, Object value)
        public Object remove(Object key)

30 Constructor
    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException(
                "Illegal initialCapacity " + initialCapacity);
        if (loadFactor <= 0)
            throw new IllegalArgumentException(
                "Illegal loadFactor " + loadFactor);
        if (initialCapacity == 0)
            initialCapacity = 1;   // always allocate at least one bucket
        this.loadFactor = loadFactor;
        table = new Entry[initialCapacity];
        // Rehash once count reaches this many entries.
        threshold = (int)(initialCapacity * loadFactor);
    } // constructor

31 containsKey()
    public boolean containsKey(Object key) {
        Entry tab[] = table;
        if (key != null) {
            int hash = key.hashCode();
            // Clear the sign bit so the index is never negative.
            int index = (hash & 0x7FFFFFFF) % tab.length;
            for (Entry e = tab[index]; e != null; e = e.next)
                if (e.hash == hash && key.equals(e.key))
                    return true;
        } else {
            // The null key is always stored in bucket 0.
            for (Entry e = tab[0]; e != null; e = e.next)
                if (e.key == null)
                    return true;
        }
        return false;
    } // method containsKey
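The expression `(hash & 0x7FFFFFFF) % tab.length` is there because `hashCode()` may return a negative `int`, and Java's `%` keeps the sign of its left operand, so a plain `hash % tab.length` could produce a negative array index. Masking with `0x7FFFFFFF` clears the sign bit first. The sketch below shows the difference; `IndexDemo` and `indexFor` are hypothetical names, and it relies on `Integer.hashCode()` returning the wrapped value.

```java
class IndexDemo {
    // Bucket index computed the same way as in containsKey().
    static int indexFor(Object key, int tableLength) {
        return (key.hashCode() & 0x7FFFFFFF) % tableLength;
    }

    public static void main(String[] args) {
        Integer key = -7;                        // Integer.hashCode() is the int value itself
        System.out.println(key.hashCode() % 13); // -7: not a valid array index
        System.out.println(indexFor(key, 13));   // 4: always within 0..12
    }
}
```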

32-34 put()
    public Object put(Object key, Object value) {
        Entry tab[] = table;
        int hash = 0;
        int index = 0;
        if (key != null) {
            hash = key.hashCode();
            index = (hash & 0x7FFFFFFF) % tab.length;
            // Key already present: replace its value and return the old one.
            for (Entry e = tab[index]; e != null; e = e.next)
                if (e.hash == hash && key.equals(e.key)) {
                    Object old = e.value;
                    e.value = value;
                    return old;
                }
        } else {
            // The null key is always stored in bucket 0.
            for (Entry e = tab[0]; e != null; e = e.next) {
                if (e.key == null) {
                    Object old = e.value;
                    e.value = value;
                    return old;
                }
            }
        } // key == null
        // Key not present: grow the table if needed, then add a new entry
        // at the head of its bucket.
        modCount++;
        if (count >= threshold) {
            rehash();
            tab = table;
            index = (hash & 0x7FFFFFFF) % tab.length;
        }
        Entry e = new Entry(hash, key, value, tab[index]);
        tab[index] = e;
        count++;
        return null;
    } // method put

35 rehash()
    private void rehash() {
        int oldCapacity = table.length;
        Entry oldMap[] = table;
        int newCapacity = oldCapacity * 2 + 1;
        Entry newMap[] = new Entry[newCapacity];
        modCount++;
        threshold = (int)(newCapacity * loadFactor);
        table = newMap;
        // Walk every old bucket and relink each entry into its new bucket.
        for (int i = oldCapacity; i-- > 0;) {
            for (Entry old = oldMap[i]; old != null;) {
                Entry e = old;
                old = old.next;
                int index = (e.hash & 0x7FFFFFFF) % newCapacity;
                e.next = newMap[index];
                newMap[index] = e;
            }
        }
    }

36-37 remove()
    public Object remove(Object key) {
        Entry tab[] = table;
        if (key != null) {
            int hash = key.hashCode();
            int index = (hash & 0x7FFFFFFF) % tab.length;
            for (Entry e = tab[index], prev = null; e != null; prev = e, e = e.next)
                if (e.hash == hash && key.equals(e.key)) {
                    modCount++;
                    // Unlink e from the bucket's list.
                    if (prev != null)
                        prev.next = e.next;
                    else
                        tab[index] = e.next;
                    count--;
                    Object oldValue = e.value;
                    e.value = null;
                    return oldValue;
                }
        } else {
            // The null key is always stored in bucket 0.
            for (Entry e = tab[0], prev = null; e != null; prev = e, e = e.next) {
                if (e.key == null) {
                    modCount++;
                    if (prev != null)
                        prev.next = e.next;
                    else
                        tab[0] = e.next;
                    count--;
                    Object oldValue = e.value;
                    e.value = null;
                    return oldValue;
                }
            }
        }
        return null;   // key not found
    }
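From the caller's side, the return values of `put` and `remove` behave as the code above suggests: each returns the value previously stored under the key, or `null` if there was none. The sketch below exercises the real `java.util.HashMap` (with generics, which postdate these slides); the class name `MapUsageDemo`, the method `results`, and the sample key are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MapUsageDemo {
    // Runs a few operations and collects what each call returned.
    static List<Object> results() {
        Map<String, Integer> ages = new HashMap<>(16, 0.75f);
        List<Object> out = new ArrayList<>();
        out.add(ages.put("ada", 36));        // null: no previous mapping
        out.add(ages.put("ada", 37));        // 36: old value returned, then replaced
        out.add(ages.containsKey("ada"));    // true
        ages.put(null, 0);                   // HashMap permits a null key
        out.add(ages.containsKey(null));     // true
        out.add(ages.remove("ada"));         // 37
        out.add(ages.remove("ada"));         // null: already removed
        out.add(ages.size());                // 1: only the null key remains
        return out;
    }

    public static void main(String[] args) {
        System.out.println(results());
        // [null, 36, true, true, 37, null, 1]
    }
}
```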

38 Theoretical Results
[Table comparing Chaining, Linear Probing, and Double Hashing; columns: Not Found, Found]
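For reference, the commonly quoted textbook approximations for the expected number of probes are sketched below, with α denoting the load factor (n/B for chaining, n/M for open addressing). These are the standard estimates from the classical analysis and are given here as an assumption; the slide may have used slightly different expressions or counting conventions, especially for chaining.

```latex
\begin{array}{lcc}
 & \text{Not found} & \text{Found} \\[4pt]
\text{Chaining} & 1 + \alpha & 1 + \dfrac{\alpha}{2} \\[8pt]
\text{Linear probing} & \dfrac{1}{2}\left(1 + \dfrac{1}{(1-\alpha)^{2}}\right) & \dfrac{1}{2}\left(1 + \dfrac{1}{1-\alpha}\right) \\[8pt]
\text{Double hashing} & \dfrac{1}{1-\alpha} & \dfrac{1}{\alpha}\,\ln\dfrac{1}{1-\alpha}
\end{array}
```

At α = 0.5, for example, these give about 2.5 probes for an unsuccessful linear-probing search and 2 for double hashing; both grow without bound as α approaches 1, while chaining grows only linearly in α.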

39 Expected Probes
[Figure: expected probes for Linear Probing, Double Hashing, and Chaining; axis ticks at 0.5 and 1.0]

