Searching: Hash Tables


1 Searching: Hash Tables
Chapter 12 6/9/15 Adapted from instructor resource slides Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved

2 Info
Still grading exams. Will review answers Thursday.
Review how to handle issues with project grading
Review Thursday's material
Hashing (new)
Break
Start sorting
Review project 2

3 Evolution of Reusability, Genericity
Major theme in the development of programming languages:
  Reuse code
  Avoid repeatedly reinventing the wheel
Trend contributing to this:
  Use of generic code
  Can be used with different types of data

4 Function Genericity: Overloading and Templates
Initially, code was made reusable by encapsulating it within functions.
Example: the lines of code that swap the values stored in two variables.
Instead of rewriting those 3 lines each time, place them in a function:
  void swap(int & first, int & second)
  {
    int temp = first;
    first = second;
    second = temp;
  }
Then call: swap(x, y);

5 Template Mechanism
Declare a type parameter (also called a type placeholder).
Use it in the function instead of a specific type.
This requires a different kind of parameter list:
  void Swap(______ & first, ______ & second)
  {
    ________ temp = first;
    first = second;
    second = temp;
  }

6 Instantiating Class Templates
Instantiate a class template by using a declaration of the form
  ClassName<Type> object;
This passes Type as an argument to the class template definition.
Examples:
  Stack<int> intSt;
  Stack<string> stringSt;
The compiler will generate two distinct definitions of Stack (two instances): one for ints and one for strings.

7 STL (Standard Template Library)
A library of class and function templates
Components:
  Containers: generic "off-the-shelf" class templates for storing collections of data
  Algorithms: generic "off-the-shelf" function templates for operating on containers
  Iterators: generalized "smart" pointers that allow algorithms to operate on almost any container

8 The vector Container
A type-independent pattern for an array class
  Capacity can expand
  Self-contained
Declaration:
  template <typename T>
  class vector { /* members omitted */ };

9 vector Operations
Information about a vector's contents:
  v.size()  v.empty()  v.capacity()  v.reserve(n)
Adding, removing, accessing elements:
  v.push_back(value)  v.pop_back()  v.front()  v.back()
What is the difference between size and capacity?
  size is the number of elements currently stored; capacity is the number of elements the vector can hold before it must reallocate.
What is reserve?
  It lets you set or increase the capacity, but not decrease it.

10 Increasing Capacity of a Vector
When vector v becomes full, its capacity is increased automatically when an item is added.
Algorithm to increase the capacity of vector<T>:
  1. Allocate a new array to store the vector's elements
  2. Use the T copy constructor to copy existing elements to the new array
  3. Store the item being added in the new array
  4. Destroy the old array in vector<T>
  5. Make the new array the vector<T>'s storage array
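The five steps above can be sketched on a plain dynamic array. The helper name growAndAppend and its parameters are hypothetical, and the sketch simplifies step 2: it default-constructs the new elements and then assigns to them, rather than copy-constructing into raw storage the way a real vector<T> does.

```cpp
#include <cstddef>

// Hypothetical helper illustrating the capacity-increase steps.
// Assumes newCapacity > size and that oldArray was allocated with new[].
template <typename T>
T* growAndAppend(T* oldArray, std::size_t size, std::size_t newCapacity, const T& item)
{
    T* newArray = new T[newCapacity];        // 1. allocate the new array
    for (std::size_t i = 0; i < size; ++i)   // 2. copy the existing elements
        newArray[i] = oldArray[i];           //    (assignment here, a simplification)
    newArray[size] = item;                   // 3. store the item being added
    delete[] oldArray;                       // 4. destroy the old array
    return newArray;                         // 5. caller makes this its storage array
}
```

Every element is touched on each growth step, which is the overhead the later vector vs. deque slides refer to.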

11 Iterators
Each STL container declares an iterator type, which can be used to define iterator objects.
Iterators are a generalization of pointers that allow a C++ program to work with different data structures (containers) in a uniform manner.
To declare an iterator object, the identifier iterator must be preceded by
  the name of the container
  the scope operator ::
Example:
  vector<int>::iterator vecIter = v.begin();
This defines vecIter as an iterator positioned at the first element of v.

12 Iterators
Contrast use of subscript vs. use of iterator.
Subscript:
  ostream & operator<<(ostream & out, const vector<double> & v)
  {
    for (int i = 0; i < v.size(); i++)
      out << v[i] << " ";
    return out;
  }
Iterator (loop for the same function body; v is const, so a const_iterator is required):
  for (vector<double>::const_iterator it = v.begin(); it != v.end(); it++)
    out << *it << " ";

13 Iterator Functions
Note Table 9-5.
Note the capability of the last two groupings:
  It is possible to insert and erase elements anywhere in a vector
  Iterators must be used to do this
Note also that these operations are as inefficient as for arrays, due to the shifting required.

14 Contrast Vectors and Arrays
Vectors:
  Capacity can increase
  A self-contained object
  Is a class template (no specific type)
  Has function members to do tasks
Arrays:
  Fixed size, cannot be changed during execution
  Cannot "operate" on itself
  Bound to a specific type
  Must "re-invent the wheel" for most actions

15 STL's deque Class Template
Has the same operations as vector<T>, except that there is no capacity() and no reserve().
Has two new operations:
  d.push_front(value);  Push a copy of value at the front of d
  d.pop_front();        Remove the value at the front of d

16 vector vs. deque
vector:
  When the capacity of a vector must be increased
    It must copy the objects from the old vector to the new vector
    It must destroy each object in the old vector
    A lot of overhead!
deque:
  With deque this copying, creating, and destroying is avoided.
  Once an object is constructed, it can stay in the same memory location as long as it exists, provided insertions and deletions take place at the ends of the deque.

17 vector vs. deque
Unlike vectors, a deque isn't stored in a single varying-sized block of memory, but rather in a collection of fixed-size blocks (typically, 4K bytes).
One of its data members is essentially an array map whose elements point to the locations of these blocks.

18 Linear Search
Vector-based search function:
  template <typename t>
  void LinearSearch(const vector<t> &v, const t &item, bool &found, int &loc)
  {
    found = false;
    loc = 0;
    // Stop when the item is found or the end of the vector is reached.
    while (loc < v.size() && !found)
    {
      if (item == v[loc])
        found = true;
      else
        loc++;
    }
  }

19 Binary Search
Binary search function for vector:
  template <typename t>
  void BinarySearch(const vector<t> &v, const t &item, bool &found, int &loc)
  {
    found = false;
    int first = 0;
    int last = v.size() - 1;
    while (first <= last && !found)
    {
      loc = (first + last) / 2;
      if (item < v[loc])
        last = loc - 1;    // continue in the lower half
      else if (item > v[loc])
        first = loc + 1;   // continue in the upper half
      else /* item == v[loc] */
        found = true;
    }
  }
NEED TO HAVE AN ORDERED, OR SORTED, LIST!

20 Binary Search
Usually outperforms a linear search
Disadvantage:
  Requires sequential storage
  Not appropriate for linked lists (Why?)
It is possible to use a linked structure which can be searched in a binary-like manner

21 Trees
Tree terminology: root, children, parent, siblings, leaf nodes, ancestor, descendant, etc.
[Figure: example tree labeling the root node, the children of the parent (3), nodes that are siblings to each other, and the leaf nodes]

22 Binary Trees
Each node has at most two children
Useful in modeling processes where
  a comparison or experiment has exactly two possible outcomes
  the test is performed repeatedly
Examples:
  multiple coin tosses
  encoding/decoding messages in dots and dashes, such as Morse code

24 Array Representation of Binary Trees
Works OK for complete trees, not for sparse trees
Balanced trees are binary trees with the property that, for each node, the height of its left subtree and the height of its right subtree differ by at most one, where a tree's height is the number of levels in the tree.

25 Linked Representation of Binary Trees
Uses space more efficiently
Provides additional flexibility
Each node has two links:
  one to the left child of the node
  one to the right child of the node
  if no child node exists for a node, the link is set to NULL
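A node for the linked representation might look like the sketch below. The name BinNode appears later in these slides; the member names data, left, and right are assumptions, and nullptr is used where the slide says NULL.

```cpp
// A sketch of a binary tree node with two links.
// Member names are assumptions made for illustration.
template <typename DataType>
struct BinNode
{
    DataType data;    // value stored in this node
    BinNode* left;    // link to the left child, or nullptr if none
    BinNode* right;   // link to the right child, or nullptr if none

    // A new node starts out as a leaf: both links are null.
    explicit BinNode(const DataType& item)
        : data(item), left(nullptr), right(nullptr) {}
};
```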

26 Binary Trees as Recursive Data Structures
A binary tree is either
  empty (the anchor), or
  consists of a node called the root, where the root has pointers to two disjoint binary (sub)trees called the left (sub)tree and the right (sub)tree (the inductive step)
Each (sub)tree is in turn either empty or consists of a root with two (sub)trees, and so on.

27 ADT Binary Search Tree (BST)
Collection of data elements:
  A binary tree in which, for each node x,
  value in left child of x ≤ value in x ≤ value in right child of x
Basic operations:
  Construct an empty BST
  Determine if the BST is empty
  Search the BST for a given item

28 ADT Binary Search Tree (BST)
Basic operations (ctd):
  Insert a new item in the BST
    Maintain the BST property
  Delete an item from the BST
    Maintain the BST property
  Traverse the BST
    Visit each node exactly once
    The inorder traversal must visit the values in the nodes in ascending order
View BST class template, Fig. 12-1

29 BST Traversals
Note that recursive calls must be made
  To the left subtree
  To the right subtree
Must use two functions:
  A public method to send a message to the BST object
  A private auxiliary method that can access BinNodes and the pointers within these nodes
Similar solution for graphic output:
  Public graphic method
  Private graphAux method

30 BST Searches
Search begins at the root
  If that is the desired item, done
  If the item searched for is less, move down the left subtree
  If the item searched for is greater, move down the right subtree
If the item is not found, we will run into an empty subtree
View search()

31 Inserting into a BST
Insert function
  Uses a modified version of search to locate the insertion location or an already existing item
  Pointer parent trails the search pointer locptr and keeps track of the parent node
  Thus the new node can be attached to the BST in the proper place
View insert() function
[Figure: example tree, looking to insert 'R']
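The search, insert, and traversal slides can be pulled together in a small sketch. This is not the class template from Fig. 12-1: the class layout, the choice to ignore duplicate items, and returning the inorder traversal as a std::vector are all assumptions made for illustration. The names locptr and parent follow the slides, and the public inorder() plus private inorderAux() pair mirrors the two-function pattern described for traversals.

```cpp
#include <vector>

// A minimal BST sketch. Assumes DataType supports operator< and
// that inserting an item already in the tree does nothing.
template <typename DataType>
class BST
{
private:
    struct BinNode
    {
        DataType data;
        BinNode* left;
        BinNode* right;
        explicit BinNode(const DataType& item)
            : data(item), left(nullptr), right(nullptr) {}
    };
    BinNode* root;

    // Private auxiliary method: recursive inorder traversal.
    static void inorderAux(const BinNode* subtree, std::vector<DataType>& out)
    {
        if (subtree == nullptr) return;
        inorderAux(subtree->left, out);    // left subtree first
        out.push_back(subtree->data);      // then the node itself
        inorderAux(subtree->right, out);   // then the right subtree
    }

    static void destroy(BinNode* subtree)
    {
        if (subtree == nullptr) return;
        destroy(subtree->left);
        destroy(subtree->right);
        delete subtree;
    }

public:
    BST() : root(nullptr) {}
    ~BST() { destroy(root); }
    BST(const BST&) = delete;              // copying is left out of this sketch
    BST& operator=(const BST&) = delete;

    bool empty() const { return root == nullptr; }

    // Search begins at the root and moves left or right at each node.
    bool search(const DataType& item) const
    {
        const BinNode* locptr = root;
        while (locptr != nullptr)
        {
            if (item < locptr->data)      locptr = locptr->left;
            else if (locptr->data < item) locptr = locptr->right;
            else                          return true;
        }
        return false;                      // ran into an empty subtree
    }

    // Modified search: parent trails locptr so the new node can be attached.
    void insert(const DataType& item)
    {
        BinNode* locptr = root;
        BinNode* parent = nullptr;
        while (locptr != nullptr)
        {
            parent = locptr;
            if (item < locptr->data)      locptr = locptr->left;
            else if (locptr->data < item) locptr = locptr->right;
            else                          return;   // item already present
        }
        BinNode* node = new BinNode(item);
        if (parent == nullptr)        root = node;  // tree was empty
        else if (item < parent->data) parent->left = node;
        else                          parent->right = node;
    }

    // Public method: returns the values in ascending order.
    std::vector<DataType> inorder() const
    {
        std::vector<DataType> out;
        inorderAux(root, out);
        return out;
    }
};
```

Inserting keys that are already sorted into this sketch produces the lopsided, list-like shape discussed on a later slide.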

32 Recursive Deletion
Three possible cases to delete a node, x, from a BST
1. The node, x, is a leaf
  Make the parent's pointer to x a null pointer
  In deleting D, make the right pointer in parent C null

33 Recursive Deletion
2. The node, x, has one child
  Make the parent's pointer to x point at x's child
  In deleting E, set the right pointer of parent A to point at C

34 Recursive Deletion
3. The node, x, has two children
  Replace the contents of x with its inorder successor
  Delete the node pointed to by xSucc as described for cases 1 and 2
Example: want to delete J. To find the inorder successor, go right and then descend left as far as possible, which reaches K. Replace the contents of J with K, then delete xSucc (the original K node).
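The three deletion cases can be sketched as one recursive function. The Node struct, the int key type, and the function names insertNode and removeNode are assumptions made for illustration; xSucc follows the slides. Cases 1 and 2 share one branch, because replacing a node by its only child also covers a leaf, whose "child" is null.

```cpp
// Hypothetical node type with int keys, kept minimal for the sketch.
struct Node
{
    int data;
    Node* left;
    Node* right;
};

// Helper used only to build example trees; duplicates are ignored.
Node* insertNode(Node* subtree, int item)
{
    if (subtree == nullptr) return new Node{item, nullptr, nullptr};
    if (item < subtree->data)      subtree->left = insertNode(subtree->left, item);
    else if (subtree->data < item) subtree->right = insertNode(subtree->right, item);
    return subtree;
}

// Removes item from the subtree and returns the subtree's new root.
Node* removeNode(Node* subtree, int item)
{
    if (subtree == nullptr) return nullptr;             // item not in the tree
    if (item < subtree->data)
        subtree->left = removeNode(subtree->left, item);
    else if (subtree->data < item)
        subtree->right = removeNode(subtree->right, item);
    else if (subtree->left != nullptr && subtree->right != nullptr)
    {
        // Case 3: two children. Go right, then descend left as far as possible.
        Node* xSucc = subtree->right;
        while (xSucc->left != nullptr) xSucc = xSucc->left;
        subtree->data = xSucc->data;                    // copy the successor into x
        subtree->right = removeNode(subtree->right, xSucc->data);  // case 1 or 2
    }
    else
    {
        // Cases 1 and 2: leaf or one child. Splice the node out.
        Node* child = (subtree->left != nullptr) ? subtree->left : subtree->right;
        delete subtree;
        return child;                                   // null when x was a leaf
    }
    return subtree;
}
```

The caller reassigns the root, as in root = removeNode(root, item), so that deleting the root node itself is handled the same way as any other node.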

35 Problem of Lopsidedness
Trees can be totally lopsided
  Suppose each node has a right child only
  The tree degenerates into a linked list
Processing time is affected by the "shape" of the tree

36 Hash Tables
Recall the order of magnitude of searches:
  Linear search: O(n)
  Binary search: O(log₂ n)
  Balanced binary tree search: O(log₂ n)
  Unbalanced binary tree search can degrade to O(n)

37 Hash Tables
In some situations a faster search is needed
Solution is to use a hash function
  The value of the key field is given to the hash function
  A location in a hash table is calculated

38 Hash Functions
A simple function could be to mod the value of the key by the size of the table:
  h(x) = x % tableSize
Note that we gain speed at the cost of wasted space
  The table must be considerably larger than the number of items anticipated
  Suggested to be 1.5 to 2 times larger

39 Hash Functions
Observe the problem: the same value can be returned by h(x) for different values of x
  These are called collisions
A simple solution is linear probing
  Empty slots are marked with -1
  A linear search begins at the collision location
  It continues until an empty slot is found for insertion

40 Hash Functions
When retrieving a value, linear probe until it is found
  If an empty slot is encountered, the value is not in the table
If deletions are permitted
  A vacated slot can be marked so that it is not treated as empty and does not cause an invalid linear probe
  Ex. -1 for unused slots, -2 for slots which used to contain data
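Insertion, retrieval, and deletion with linear probing can be sketched as follows. The class name ProbingTable, its member names, and the return conventions are assumptions made for illustration. The sketch uses h(x) = x % tableSize with the -1 and -2 markers from the slides, and so it assumes nonnegative int keys and a table capacity greater than zero.

```cpp
#include <vector>

// Marker values from the slides: -1 never used, -2 used to contain data.
const int EMPTY_SLOT = -1;
const int DELETED_SLOT = -2;

// A sketch of a hash table with linear probing.
// Assumes nonnegative keys and capacity > 0.
class ProbingTable
{
public:
    explicit ProbingTable(int capacity) : slots(capacity, EMPTY_SLOT) {}

    // Returns the slot index holding key, or -1 if key is not in the table.
    int find(int key) const
    {
        int size = static_cast<int>(slots.size());
        int start = key % size;                       // h(x) = x % tableSize
        for (int i = 0; i < size; ++i)
        {
            int loc = (start + i) % size;             // wrap around the table
            if (slots[loc] == EMPTY_SLOT) return -1;  // never-used slot ends the probe
            if (slots[loc] == key) return loc;
            // A DELETED_SLOT is skipped, so the probe keeps going past it.
        }
        return -1;
    }

    // Returns false if key is already present or no slot is available.
    bool insert(int key)
    {
        if (find(key) != -1) return false;
        int size = static_cast<int>(slots.size());
        int start = key % size;
        for (int i = 0; i < size; ++i)
        {
            int loc = (start + i) % size;
            if (slots[loc] == EMPTY_SLOT || slots[loc] == DELETED_SLOT)
            {
                slots[loc] = key;                     // reuse unused or vacated slot
                return true;
            }
        }
        return false;                                 // table is full
    }

    // Marks the slot as deleted instead of empty; returns false if absent.
    bool erase(int key)
    {
        int loc = find(key);
        if (loc == -1) return false;
        slots[loc] = DELETED_SLOT;
        return true;
    }

private:
    std::vector<int> slots;
};
```

With a table of 7 slots, the keys 10, 17, and 24 all hash to slot 3 and land in slots 3, 4, and 5. Erasing 17 leaves a -2 in slot 4, so a later search for 24 still probes past it.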

41 Collision Reduction Strategies
Strategies for improved performance:
  Increase table capacity (fewer collisions)
  Use a different collision resolution technique
  Devise a different hash function

42 Collision Reduction Strategies
Hash table capacity
  The size of the table should be 1.5 to 2 times the number of items to be stored
  Otherwise the probability of collisions is too high

43 Collision Reduction Strategies
Linear probing can result in primary clustering
Consider quadratic probing
  Probe sequence from location i is i + 1, i - 1, i + 4, i - 4, i + 9, i - 9, ...
  Secondary clusters can still form
Double hashing
  Use a second hash function to determine the probe sequence
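The quadratic probe sequence can be generated with a short function. The name quadraticProbes and its parameters are hypothetical, and wrapping each location around the table with modulo arithmetic is an assumption, since the slide gives only the offsets.

```cpp
#include <vector>

// Returns the first `count` probe locations after a collision at `start`,
// following i + 1, i - 1, i + 4, i - 4, i + 9, i - 9, ...
// Assumes 0 <= start < tableSize and tableSize > 0.
std::vector<int> quadraticProbes(int start, int tableSize, int count)
{
    std::vector<int> probes;
    for (int k = 1; static_cast<int>(probes.size()) < count; ++k)
    {
        int offset = (k * k) % tableSize;
        probes.push_back((start + offset) % tableSize);       // i + k*k
        if (static_cast<int>(probes.size()) < count)
        {
            // Add tableSize before the final % so the result is never negative.
            int back = ((start - offset) % tableSize + tableSize) % tableSize;
            probes.push_back(back);                            // i - k*k
        }
    }
    return probes;
}
```

For a collision at location 5 in a table of size 11, the first six probes are 6, 4, 9, 1, 3, 7.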

44 Collision Reduction Strategies
Chaining
  The table is a list or vector of head nodes to linked lists
  When an item hashes to a location, it is added to that linked list
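Chaining can be sketched with a vector of std::list objects standing in for the head nodes and linked lists. The class name ChainedTable and its member names are assumptions, as are the nonnegative int keys and the choice to ignore duplicates.

```cpp
#include <algorithm>
#include <cstddef>
#include <list>
#include <vector>

// A sketch of a hash table with chaining.
// Assumes nonnegative int keys and at least one bucket.
class ChainedTable
{
public:
    explicit ChainedTable(std::size_t buckets) : table(buckets) {}

    // Adds key to the linked list at its hash location (duplicates ignored).
    void insert(int key)
    {
        std::list<int>& chain = table[hash(key)];
        if (std::find(chain.begin(), chain.end(), key) == chain.end())
            chain.push_back(key);
    }

    // Searches only the one chain the key hashes to.
    bool contains(int key) const
    {
        const std::list<int>& chain = table[hash(key)];
        return std::find(chain.begin(), chain.end(), key) != chain.end();
    }

    std::size_t chainLength(std::size_t bucket) const
    {
        return table[bucket].size();
    }

private:
    std::size_t hash(int key) const
    {
        return static_cast<std::size_t>(key) % table.size();
    }

    std::vector<std::list<int>> table;   // one chain per table location
};
```

Colliding keys simply share a chain, so the table never fills up, but a long chain is searched linearly.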

45 Improving the Hash Function
Ideal hash function
  Simple to evaluate
  Scatters items uniformly throughout the table
Modulo arithmetic is not so good for strings
  It is possible to manipulate the numeric (ASCII) values of the first and last characters of a name
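One possible reading of the first-and-last-character idea is sketched below. The function name hashName, adding the two character codes, and returning 0 for an empty name are all assumptions; the slide does not say how the two values are combined.

```cpp
#include <cstddef>
#include <string>

// A sketch of a string hash built from the first and last characters.
// Assumes tableSize > 0; an empty name hashes to location 0.
std::size_t hashName(const std::string& name, std::size_t tableSize)
{
    if (name.empty()) return 0;
    // Convert through unsigned char so the character codes are nonnegative.
    std::size_t first = static_cast<unsigned char>(name.front());
    std::size_t last = static_cast<unsigned char>(name.back());
    return (first + last) % tableSize;
}
```

It is simple to evaluate, but names sharing the same first and last letters always collide, so it may not scatter items uniformly.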

