
Slide 1: Hash Tables
- a hash table is an array of size Tsize
  - has index positions 0..Tsize-1
- two types of hash tables
  - open hash table
    - array element type is a (key, value) pair
    - all items are stored in the array
  - chained hash table
    - element type is a pointer to a linked list of nodes containing (key, value) pairs
    - items are stored in the linked list nodes
- keys are used to generate an array index
  - the home address (0..Tsize-1)

Slide 2: Faster searching
- "balanced" search trees guarantee an O(log₂ n) search path by controlling the height of the search tree
  - AVL tree
  - 2-3-4 tree
  - red-black tree (typically used to implement the STL associative container classes)
- a hash table allows for O(1) search performance on average
  - search time does not increase as n increases, provided the load factor is kept bounded

Slide 3: Considerations
- How big an array?
  - the load factor of a hash table is n/Tsize
- Hash function to use?
  - int hash(KeyType key) // returns 0..Tsize-1
- Collision resolution strategy?
  - a hash function is many-to-one, so different keys can share a home address

Slide 4: Hash Function
- a hash function is used to map a key to an array index (home address)
  - search starts from here
- insert, retrieve, update, delete all start by applying the hash function to the key

Slide 5: Some hash functions
- if KeyType is int: key % Tsize
- if KeyType is a string: convert to an integer and then % Tsize
- goals for a hash function
  - fast to compute
  - even distribution
- cannot guarantee no collisions unless all key values are known in advance
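The two cases above could be sketched roughly as follows. The function names hashInt and hashString, the unsigned key type, and the multiplier 31 are illustrative assumptions; the slides only say "key % Tsize" and "convert to an integer and then % Tsize".

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Integer keys: reduce modulo the table size.
// An unsigned key avoids the negative remainders that % can give for negative ints.
std::size_t hashInt(unsigned key, std::size_t Tsize) {
    assert(Tsize > 0);
    return key % Tsize;
}

// String keys: fold the characters into one integer, then reduce.
// The multiplier 31 is an assumed, commonly seen choice, not taken from the slides.
std::size_t hashString(const std::string& key, std::size_t Tsize) {
    assert(Tsize > 0);
    std::size_t h = 0;
    for (unsigned char c : key) {
        h = h * 31 + c;  // unsigned arithmetic wraps on overflow, which is well defined
    }
    return h % Tsize;
}
```

Both functions return a home address in 0..Tsize-1. Neither prevents collisions: for example, hashInt(3, 7) and hashInt(10, 7) both give 3.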

Slide 6: An Open Hash Table
Each array element holds a key and a value. hash(key) produces an index in the range 0 to 6. That index is the "home address".

Some insertions: K1 --> 3, K2 --> 5, K3 --> 2

index  key  value
0
1
2      K3   K3info
3      K1   K1info
4
5      K2   K2info
6

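Slide 7: Handling Collisions
Some more insertions: K4 --> 3, K5 --> 2, K6 --> 4

Collision resolution strategy: linear probing. Each new key goes in the first free slot at or after its home address, wrapping around from index 6 to index 0.

index  key  value
0      K6   K6info
1
2      K3   K3info
3      K1   K1info
4      K4   K4info
5      K2   K2info
6      K5   K5info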

Slide 8: Search Performance (open hash table, linear probing)
Average number of probes needed to retrieve the value with key K?

K    hash(K)  #probes
K1   3        1
K2   5        1
K3   2        1
K4   3        2
K5   2        5
K6   4        4

Successful search: 14/6 = 2.33 probes on average.
Unsuccessful search?
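The probe counts above can be reproduced with a small linear-probing sketch. As on the slides, home addresses are passed in directly rather than computed by a hash function. The type name OpenTable, storing keys only (no values), and using an empty string to mean "unused slot" are simplifying assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Open hash table with linear probing. An empty string marks an unused slot.
struct OpenTable {
    std::vector<std::string> slots;

    explicit OpenTable(std::size_t Tsize) : slots(Tsize) {}

    // Insert key starting at its home address; returns the slot actually used,
    // or slots.size() if every slot is occupied.
    std::size_t insert(const std::string& key, std::size_t home) {
        assert(!key.empty() && home < slots.size());
        for (std::size_t step = 0; step < slots.size(); ++step) {
            std::size_t i = (home + step) % slots.size();  // wrap around the array
            if (slots[i].empty()) {
                slots[i] = key;
                return i;
            }
        }
        return slots.size();
    }

    // Number of slots examined while searching for key.
    // The search stops at the key itself or at the first unused slot.
    int probes(const std::string& key, std::size_t home) const {
        int count = 0;
        for (std::size_t step = 0; step < slots.size(); ++step) {
            std::size_t i = (home + step) % slots.size();
            ++count;
            if (slots[i].empty() || slots[i] == key) break;
        }
        return count;
    }
};
```

Inserting K1..K6 with home addresses 3, 5, 2, 3, 2, 4 into an OpenTable of size 7 places them in slots 3, 5, 2, 4, 6, 0, and the six successful searches take 1 + 1 + 1 + 2 + 5 + 4 = 14 probes. In this sketch an unsuccessful search with home address 2 examines all 7 slots before reaching the unused slot 1.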

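Slide 9: A Chained Hash Table
Insert keys: K1 --> 3, K2 --> 5, K3 --> 2, K4 --> 3, K5 --> 2, K6 --> 4

Each array element points to a linked list of synonyms (keys with the same home address).

index  chain
0
1
2      (K3, K3info) -> (K5, K5info)
3      (K1, K1info) -> (K4, K4info)
4      (K6, K6info)
5      (K2, K2info)
6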

Slide 10: Search Performance (chained hash table)
Average number of probes needed to retrieve the value with key K?

K    hash(K)  #probes
K1   3        1
K2   5        1
K3   2        1
K4   3        2
K5   2        2
K6   4        1

Successful search: 8/6 = 1.33 probes on average.
Unsuccessful search?
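A comparable sketch for the chained table, again with home addresses supplied directly. The name ChainedTable, the use of std::list, and storing keys without values are assumptions made for brevity. New keys are appended at the end of the chain, which is the order implied by the probe counts above.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Chained hash table: each array element is a linked list of synonyms.
struct ChainedTable {
    std::vector<std::list<std::string>> chains;

    explicit ChainedTable(std::size_t Tsize) : chains(Tsize) {}

    // Append key to the chain at its home address.
    void insert(const std::string& key, std::size_t home) {
        assert(home < chains.size());
        chains[home].push_back(key);
    }

    // Number of list nodes examined while searching for key.
    // An unsuccessful search walks the whole chain (0 nodes if it is empty).
    int probes(const std::string& key, std::size_t home) const {
        assert(home < chains.size());
        int count = 0;
        for (const std::string& k : chains[home]) {
            ++count;
            if (k == key) break;
        }
        return count;
    }
};
```

With the same six insertions, the successful searches take 1 + 1 + 1 + 2 + 2 + 1 = 8 probes. An unsuccessful search examines only the chain at the key's home address: 2 nodes for home address 2, none for home address 1.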

Slide 11: Successful search performance
Average number of probes for a successful search, by load factor:

load factor   open addressing     open addressing     chaining
              (linear probing)    (double hashing)
0.5           1.50                1.39                1.25
0.7           2.17                1.72                1.35
0.9           5.50                2.56                1.45
1.0           ----                ----                1.50
2.0           ----                ----                2.00
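The slides do not say where these figures come from. They agree, to two decimal places, with the standard textbook approximations for a successful search at load factor a, so the sketch below assumes those formulas: (1 + 1/(1 - a))/2 for linear probing, (1/a) ln(1/(1 - a)) for double hashing, and 1 + a/2 for chaining. The function names are illustrative.

```cpp
#include <cassert>
#include <cmath>

// Expected probes for a successful search at load factor a (textbook approximations).

// Open addressing with linear probing; only meaningful for 0 <= a < 1.
double linearProbing(double a) {
    assert(a >= 0.0 && a < 1.0);
    return 0.5 * (1.0 + 1.0 / (1.0 - a));
}

// Open addressing with double hashing; only meaningful for 0 < a < 1.
double doubleHashing(double a) {
    assert(a > 0.0 && a < 1.0);
    return std::log(1.0 / (1.0 - a)) / a;
}

// Chaining; the load factor may exceed 1 because items live in the lists.
double chaining(double a) {
    assert(a >= 0.0);
    return 1.0 + a / 2.0;
}
```

The open-addressing formulas grow without bound as a approaches 1, which is why the table has no entries for them at load factors 1.0 and 2.0.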

Slide 12: Factors affecting Search Performance
- quality of hash function
  - how uniform?
  - depends on actual data
- collision resolution strategy used
- load factor of the hash table
  - n/Tsize
  - the lower the load factor, the better the search performance

Slide 13: Traversal
- Visit each item in the hash table
- Open hash table
  - O(Tsize) to visit all n items
  - Tsize is larger than n
- Chained hash table
  - O(Tsize + n) to visit all n items
- Items are not visited in order of key value

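Slide 14: Deletions?
- search for the item to be deleted
- chained hash table
  - find the node and delete it
- open hash table
  - must mark the vacated spot as "deleted"
  - "deleted" is different from "never used": a search must keep probing past a deleted spot but can stop at a never-used one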
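A minimal sketch of the "deleted" marker for an open hash table with linear probing. The names SlotState, Slot, find and erase are hypothetical, keys are stored without values, and home addresses are supplied directly as on the earlier slides.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Three slot states: "Deleted" must be distinguishable from "NeverUsed".
enum class SlotState { NeverUsed, Occupied, Deleted };

struct Slot {
    SlotState state = SlotState::NeverUsed;
    std::string key;
};

// Returns the index holding key, or table.size() if the key is absent.
std::size_t find(const std::vector<Slot>& table, const std::string& key, std::size_t home) {
    assert(home < table.size());
    for (std::size_t step = 0; step < table.size(); ++step) {
        std::size_t i = (home + step) % table.size();
        if (table[i].state == SlotState::NeverUsed) break;  // key cannot be further along
        if (table[i].state == SlotState::Occupied && table[i].key == key) return i;
        // A Deleted slot is skipped: keys inserted after it may lie beyond it.
    }
    return table.size();
}

// Marks the slot holding key as Deleted; returns false if the key is absent.
bool erase(std::vector<Slot>& table, const std::string& key, std::size_t home) {
    std::size_t i = find(table, key, home);
    if (i == table.size()) return false;
    table[i].state = SlotState::Deleted;
    table[i].key.clear();
    return true;
}
```

In the slide 7 table, K1 sits in slot 3 and K4 (home address 3) in slot 4. After erasing K1, a search for K4 still starts at slot 3, passes over the deleted marker, and finds K4 in slot 4. Had slot 3 been reset to "never used", that search would have stopped too early.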

Slide 15: Hash Table Summary
- search speed depends on load factor and quality of hash function
  - load factor should be less than 0.75 for open addressing
  - load factor can be more than 1 for chaining
- items are not kept sorted by key
- very good for fast access to unordered data with a known upper bound on the number of items
  - the bound is needed to pick a good Tsize

Slide 16: Heap
- a heap is a binary tree that
  - is complete
  - has the heap-order property
    - max heap: the item stored in each node has a key/priority that is >= the priority of the items stored in each of its children
    - min heap: the item stored in each node has a key/priority that is <= the priority of the items stored in each of its children
- efficient data structure for the PriorityQueue ADT
  - requires the ability to compare items based on their priorities
- basis for the heapsort algorithm

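Slide 17: Two heaps
A heap is always a complete binary tree.

[Figure: two heaps drawn as trees. Node values as extracted, which read as level order:]
max heap: 23 18 9 8 12 7 1 4 2
min heap: 1 4 2 9 8 7 18 23 12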

Slide 18: A complete binary tree can be stored in an array
The nodes are stored level by level, left to right.

index  0   1   2   3   4   5   6   7   8
A      23  18  9   8   12  7   1   4   2
Size = 9

For the item in A[i]:
- leftChild is in A[2i+1]
- rightChild is in A[2i+2]
- parent is in A[(i-1)/2] (integer division, for i > 0)
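The index arithmetic above, written out as a short sketch. The helper names and the isMaxHeap check are illustrative additions, not part of the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Index arithmetic for a complete binary tree stored level by level in an array.
std::size_t leftChild(std::size_t i)  { return 2 * i + 1; }
std::size_t rightChild(std::size_t i) { return 2 * i + 2; }
std::size_t parent(std::size_t i) {
    assert(i > 0);          // the root has no parent
    return (i - 1) / 2;     // integer division
}

// True if every item is <= its parent (the max-heap order property).
bool isMaxHeap(const std::vector<int>& A) {
    for (std::size_t i = 1; i < A.size(); ++i) {
        if (A[parent(i)] < A[i]) return false;
    }
    return true;
}
```

For the array on this slide, the item 18 at index 1 has its children at indices 3 and 4 (the items 8 and 12) and its parent at index 0 (the item 23). isMaxHeap returns true for that array.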

Slide 19: PriorityQueue ADT
- Data Items
  - a collection of items which can be ordered by priority
- Operations
  - constructor - creates an empty PQ
  - empty() - returns true iff a PQ is empty
  - size() - returns the number of items in a PQ
  - push(item) - adds an item to a PQ
  - top() - returns the item in a PQ with the highest priority
  - pop() - removes the item with the highest priority from a PQ

Slide 20: PQ Data structures
- unordered array or linked list
  - push is O(1)
  - top and pop are O(n)
- ordered array or linked list
  - push is O(n)
  - top and pop are O(1)
- heap
  - top is O(1)
  - push and pop are O(log₂ n)
- STL has a priority_queue class
  - is implemented using a heap
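A short usage sketch of std::priority_queue, exercising the ADT operations listed on slide 19. With the default comparison it behaves as a max heap, so the largest value has the highest priority. The wrapper function popOrder is a hypothetical helper for illustration.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Pushes every item into a priority queue, then pops them all.
// Returns the items in the order they came out (highest priority first).
std::vector<int> popOrder(const std::vector<int>& items) {
    std::priority_queue<int> pq;          // max heap by default
    for (int x : items) pq.push(x);       // push: O(log n) each
    assert(pq.size() == items.size());

    std::vector<int> out;
    while (!pq.empty()) {
        out.push_back(pq.top());          // top: O(1)
        pq.pop();                         // pop: O(log n), returns nothing
    }
    return out;
}
```

Pushing 7, 23, 1, 18 and then draining the queue yields 23, 18, 7, 1. Note that in the STL pop() does not return the removed item, so top() is called first.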

Slide 21: PQ operations
- top
  - return the item at A[0]
- push and pop must maintain the heap-order property
- push
  - put the new item at the end (in A[size]) and increase size by 1
  - re-establish the heap-order property by moving the new item up to where it belongs
- pop
  - A[0] is the item to delete
  - swap A[0] and A[size-1], then decrease size by 1
  - move the item now at A[0] down a path to where it belongs

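Slide 22: pop()
Before:
index  0   1   2   3   4   5   6   7   8
A      23  18  9   8   12  7   1   4   2
Size = 9

23 is swapped with the last item (2) and removed. The 2 then moves down from the root, swapping with its larger child at each step: first with 18, then with 12.

After:
index  0   1   2   3   4   5   6   7
A      18  12  9   8   2   7   1   4
Size = 8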
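The push and pop steps from slide 21 could be sketched for a max heap of ints as below. The names heapPush and heapPop and the use of std::vector (whose size() plays the role of Size) are assumptions; this is one straightforward way to write it, not necessarily how the course code does.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// push: put the new item at the end, then move it up while it beats its parent.
void heapPush(std::vector<int>& A, int item) {
    A.push_back(item);
    std::size_t i = A.size() - 1;
    while (i > 0 && A[(i - 1) / 2] < A[i]) {
        std::swap(A[(i - 1) / 2], A[i]);
        i = (i - 1) / 2;
    }
}

// pop: swap the root with the last item, remove it, then move the new root
// down, swapping with its larger child until heap order is restored.
int heapPop(std::vector<int>& A) {
    assert(!A.empty());
    std::swap(A[0], A[A.size() - 1]);
    int top = A.back();
    A.pop_back();

    std::size_t i = 0;
    for (;;) {
        std::size_t largest = i;
        std::size_t l = 2 * i + 1;
        std::size_t r = 2 * i + 2;
        if (l < A.size() && A[l] > A[largest]) largest = l;
        if (r < A.size() && A[r] > A[largest]) largest = r;
        if (largest == i) break;          // both children are smaller, or there are none
        std::swap(A[i], A[largest]);
        i = largest;
    }
    return top;
}
```

Calling heapPop on 23 18 9 8 12 7 1 4 2 returns 23 and leaves 18 12 9 8 2 7 1 4, matching the slide. Each operation follows a single root-to-leaf path, so the work is bounded by the height of the tree, O(log₂ n).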

Slide 23: Balanced Search Trees
- several varieties (Ch. 13)
  - AVL trees
  - 2-3-4 trees
  - Red-Black trees
  - B-Trees (used for searching secondary memory)
- nodes are added and deleted so that the height of the tree is kept under control
- insert and delete take more work per operation, but retrieval (and also insert and delete) never takes more than O(log₂ n) steps because the height is controlled

