Searching: Hash Tables


Searching: Hash Tables Chapter 12 6/9/15 Adapted from instructor resource slides Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

Info
- Still grading exams; will review answers Thursday.
- Review how to handle issues with project grading.
- Review Thursday's material.
- Hashing (new).
- Break.
- Start sorting.
- Review project 2.

Evolution of Reusability, Genericity
- A major theme in the development of programming languages:
  - reuse code
  - avoid repeatedly reinventing the wheel
- A trend contributing to this is the use of generic code, which can be used with different types of data.

Function Genericity: Overloading and Templates
- Initially, code was made reusable by encapsulating it within functions.
- Example: the lines of code that swap the values stored in two variables. Instead of rewriting those 3 lines each time, place them in a function:
    void swap(int & first, int & second)
    {
      int temp = first;
      first = second;
      second = temp;
    }
- Then call swap(x, y);

Template Mechanism
- Declare a type parameter (also called a type placeholder).
- Use it in the function instead of a specific type. This requires a different kind of parameter list:
    void Swap(______ & first, ______ & second)
    {
      ________ temp = first;
      first = second;
      second = temp;
    }
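One way to fill in the blanks in Swap is sketched below. The parameter name T is a conventional choice, not something the slide specifies.

```cpp
#include <string>

// Function template: T is the type parameter (type placeholder).
// The compiler generates a separate Swap for each type it is called with.
template <typename T>
void Swap(T & first, T & second)
{
    T temp = first;
    first = second;
    second = temp;
}
```

Calling Swap(a, b) with two ints, or with two std::string objects, uses the same source code; T is deduced from the arguments.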

Instantiating Class Templates
- Instantiate a class template with a declaration of the form
    ClassName<Type> object;
- This passes Type as an argument to the class template definition.
- Examples:
    Stack<int> intSt;
    Stack<string> stringSt;
- The compiler generates two distinct definitions (two instances) of Stack: one for ints and one for strings.
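To make the instantiation concrete, here is a minimal sketch of a Stack class template. This Stack is hypothetical and deliberately tiny (it wraps a std::vector); it is not the textbook's Stack class.

```cpp
#include <string>
#include <vector>

// Hypothetical minimal Stack class template, for illustration only.
template <typename T>
class Stack
{
public:
    bool empty() const { return items.empty(); }
    void push(const T & value) { items.push_back(value); }
    const T & top() const { return items.back(); }  // precondition: !empty()
    void pop() { items.pop_back(); }                // precondition: !empty()
private:
    std::vector<T> items;  // storage; the last element is the top of the stack
};
```

Declaring Stack<int> intSt; and Stack<std::string> stringSt; causes the compiler to generate two separate classes from this one template.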

STL (Standard Template Library)
- A library of class and function templates.
- Components:
  1. Containers: generic "off-the-shelf" class templates for storing collections of data.
  2. Algorithms: generic "off-the-shelf" function templates for operating on containers.
  3. Iterators: generalized "smart" pointers that allow algorithms to operate on almost any container.

The vector Container
- A type-independent pattern for an array class:
  - its capacity can expand
  - it is self-contained
- Declaration:
    template <typename T>
    class vector
    { . . . };

vector Operations
- Information about a vector's contents:
    v.size()  v.empty()  v.capacity()  v.reserve(n)
- Adding, removing, and accessing elements:
    v.push_back(value)  v.pop_back()  v.front()  v.back()
- What is the difference between size and capacity? size() is the number of elements currently stored; capacity() is the number of elements the vector can hold before it must allocate more memory.
- What is reserve? v.reserve(n) sets or increases the capacity to at least n; it never decreases the capacity.
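The difference between size and capacity, and the fact that reserve does not shrink a vector, can be checked with a short sketch. The function name capacityDemo is made up for this example.

```cpp
#include <cstddef>
#include <vector>

// Returns true if the size/capacity relationships described above hold.
bool capacityDemo()
{
    std::vector<int> v;
    v.reserve(10);                        // capacity is now at least 10
    bool ok = v.size() == 0 && v.capacity() >= 10;

    for (int i = 0; i < 3; i++)
        v.push_back(i);                   // size grows, capacity need not
    ok = ok && v.size() == 3 && v.capacity() >= 10;

    std::size_t before = v.capacity();
    v.reserve(1);                         // smaller request: capacity unchanged
    return ok && v.capacity() == before;
}
```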

Increasing Capacity of a Vector
- When vector v becomes full, its capacity is increased automatically when an item is added.
- Algorithm to increase the capacity of a vector<T>:
  1. Allocate a new array to store the vector's elements.
  2. Use the T copy constructor to copy the existing elements to the new array.
  3. Store the item being added in the new array.
  4. Destroy the old array in the vector<T>.
  5. Make the new array the vector<T>'s storage array.
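The five steps can be sketched with a toy container. SimpleVector is hypothetical, and two details are assumptions rather than anything the slides state: the capacity doubles on each growth, and elements are copied by assignment into a new[] array (a real std::vector copy-constructs into raw storage).

```cpp
#include <cstddef>

// Hypothetical SimpleVector, for illustration only.
template <typename T>
class SimpleVector
{
public:
    SimpleVector() : data(nullptr), mySize(0), myCapacity(0) {}
    ~SimpleVector() { delete[] data; }
    SimpleVector(const SimpleVector &) = delete;             // copying omitted
    SimpleVector & operator=(const SimpleVector &) = delete;

    std::size_t size() const { return mySize; }
    std::size_t capacity() const { return myCapacity; }
    const T & operator[](std::size_t i) const { return data[i]; }

    void push_back(const T & item)
    {
        if (mySize == myCapacity)
        {
            // Assumed growth policy: double the capacity (1 if empty).
            std::size_t newCapacity = (myCapacity == 0) ? 1 : 2 * myCapacity;
            T * newData = new T[newCapacity];         // 1. allocate new array
            for (std::size_t i = 0; i < mySize; i++)  // 2. copy existing elements
                newData[i] = data[i];
            newData[mySize] = item;                   // 3. store the new item
            delete[] data;                            // 4. destroy the old array
            data = newData;                           // 5. adopt the new array
            myCapacity = newCapacity;
        }
        else
            data[mySize] = item;                      // room left: no reallocation
        mySize++;
    }

private:
    T * data;
    std::size_t mySize;
    std::size_t myCapacity;
};
```

With this doubling policy, five push_back calls on an empty SimpleVector trigger four reallocations (capacities 1, 2, 4, 8), which illustrates the overhead of growth.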

Iterators
- Each STL container declares an iterator type, which can be used to define iterator objects.
- Iterators are a generalization of pointers that allow a C++ program to work with different data structures (containers) in a uniform manner.
- To declare an iterator object, the identifier iterator must be preceded by
  - the name of the container type
  - the scope operator ::
- Example:
    vector<int>::iterator vecIter = v.begin();
  This defines vecIter as an iterator positioned at the first element of v.

Iterators
- Contrast the use of a subscript with the use of an iterator:
    ostream & operator<<(ostream & out, const vector<double> & v)
    {
      for (int i = 0; i < v.size(); i++)
        out << v[i] << " ";
      return out;
    }
- The loop can instead be written with an iterator. Because v is a const reference here, a const_iterator is required:
    for (vector<double>::const_iterator it = v.begin(); it != v.end(); it++)
      out << *it << " ";

Iterator Functions
- Note Table 9-5, in particular the capability of the last two groupings:
  - It is possible to insert and erase elements anywhere in a vector.
  - Iterators must be used to do this.
- Note also that these operations are as inefficient as for arrays, because of the shifting required.

Contrast Vectors and Arrays

Vectors                                 | Arrays
----------------------------------------|------------------------------------------------
Capacity can increase                   | Fixed size, cannot be changed during execution
A self-contained object                 | Cannot "operate" on itself
Is a class template (no specific type)  | Bound to a specific type
Has function members to do tasks        | Must "re-invent the wheel" for most actions

STL's deque Class Template
- Has the same operations as vector<T>, except that there is no capacity() and no reserve().
- Has two new operations:
  - d.push_front(value); pushes a copy of value at the front of d.
  - d.pop_front(); removes the value at the front of d (it takes no argument).
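A short sketch of the two front operations on std::deque follows. The function name dequeDemo is made up for this example; note that std::deque's pop_front() takes no argument.

```cpp
#include <deque>

// Builds a deque by pushing at both ends, then removes the front element.
std::deque<int> dequeDemo()
{
    std::deque<int> d;
    d.push_back(2);    // d: 2
    d.push_front(1);   // d: 1 2
    d.push_back(3);    // d: 1 2 3
    d.push_front(0);   // d: 0 1 2 3
    d.pop_front();     // d: 1 2 3
    return d;
}
```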

vector vs. deque
- vector: when the capacity of a vector must be increased,
  - it must copy the objects from the old vector to the new vector
  - it must destroy each object in the old vector
  - a lot of overhead!
- deque: this copying, creating, and destroying is avoided. Once an object is constructed, it can stay in the same memory location as long as it exists, provided insertions and deletions take place at the ends of the deque.

vector vs. deque
- Unlike a vector, a deque isn't stored in a single varying-sized block of memory, but rather in a collection of fixed-size blocks (typically 4K bytes).
- One of its data members is essentially an array, map, whose elements point to the locations of these blocks.

Linear Search
- Vector-based search function:
    template <typename t>
    void LinearSearch(const vector<t> & v, const t & item, bool & found, int & loc)
    {
      found = false;
      loc = 0;
      // Stop at the end of the vector or as soon as item is found.
      while (loc < static_cast<int>(v.size()) && !found)
      {
        if (item == v[loc])
          found = true;
        else
          loc++;
      }
    }

Binary Search
- Binary search function for a vector:
    template <typename t>
    void BinarySearch(const vector<t> & v, const t & item, bool & found, int & loc)
    {
      found = false;
      int first = 0;
      int last = static_cast<int>(v.size()) - 1;
      while (first <= last && !found)
      {
        loc = (first + last) / 2;
        if (item < v[loc])
          last = loc - 1;        // continue in the left half
        else if (item > v[loc])
          first = loc + 1;       // continue in the right half
        else /* item == v[loc] */
          found = true;
      }
    }
- NEED TO HAVE AN ORDERED, OR SORTED, LIST!

Binary Search
- Usually outperforms a linear search.
- Disadvantage: requires sequential (array-based) storage with direct access to the middle element.
- Not appropriate for linked lists (why?): reaching the middle of a linked list means following links from the front, so each probe is itself a linear-time traversal.
- It is possible to use a linked structure that can be searched in a binary-like manner.

Trees
- Tree terminology (diagram labels):
  - Root node
  - Children of the parent (3)
  - Siblings to each other
  - Leaf nodes
- Terms: root, children, parent, ancestor, descendant, etc.

Binary Trees
- Each node has at most two children.
- Useful in modeling processes where
  - a comparison or experiment has exactly two possible outcomes
  - the test is performed repeatedly
- Examples:
  - multiple coin tosses
  - encoding/decoding messages in dots and dashes, such as Morse code

Array Representation of Binary Trees
- Works OK for complete trees, but not for sparse trees.
- Balanced trees are binary trees with the property that, for each node, the height of its left subtree and the height of its right subtree differ by at most one, where a tree's height is the number of levels in the tree.
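To see why the array representation suits complete trees but wastes space on sparse ones, consider the usual index arithmetic. The sketch below assumes the nodes are stored level by level with the root at index 0; the slides do not state the numbering convention, and the function names are made up.

```cpp
#include <cstddef>

// Index arithmetic for a binary tree stored level by level in an array,
// assuming the root is at index 0 (an assumed convention).
std::size_t leftChild(std::size_t i)  { return 2 * i + 1; }
std::size_t rightChild(std::size_t i) { return 2 * i + 2; }
std::size_t parent(std::size_t i)     { return (i - 1) / 2; }  // precondition: i > 0
```

Under this numbering, a lopsided tree of four nodes in which each node has only a right child occupies indices 0, 2, 6, and 14, so the array needs 15 slots to hold 4 values.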

Linked Representation of Binary Trees
- Uses space more efficiently.
- Provides additional flexibility.
- Each node has two links:
  - one to the left child of the node
  - one to the right child of the node
  - if no child node exists, the link is set to NULL

Binary Trees as Recursive Data Structures
- A binary tree is either
  - empty (the anchor), or
  - consists of a node called the root, which has pointers to two disjoint binary (sub)trees called the left (sub)tree and the right (sub)tree (the inductive step).
- Each subtree is, in turn, either empty or consists of a root with two disjoint subtrees, and so on.

ADT Binary Search Tree (BST)
- Collection of data elements:
  - a binary tree
  - for each node x: value in left child of x ≤ value in x ≤ value in right child of x (more precisely, the ordering holds for every value in x's left subtree and every value in x's right subtree)
- Basic operations:
  - construct an empty BST
  - determine if the BST is empty
  - search the BST for a given item

ADT Binary Search Tree (BST)
- Basic operations (ctd.):
  - insert a new item in the BST, maintaining the BST property
  - delete an item from the BST
  - traverse the BST:
    - visit each node exactly once
    - the inorder traversal must visit the values in the nodes in ascending order
- View the BST class template, Fig. 12-1.

BST Traversals
- Recursive calls must be made
  - to the left subtree
  - to the right subtree
- Two functions are needed:
  - a public method to send a message to the BST object
  - a private auxiliary method that can access BinNodes and the pointers within these nodes
- A similar solution is used for graphic output:
  - a public graphic method
  - a private graphAux method
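The public-method/private-auxiliary-method pattern can be sketched as follows. IntBST is a hypothetical, stripped-down class of ints written for this example; it is not the BST class template of Fig. 12-1, and all member names other than BinNode are assumptions.

```cpp
#include <vector>

// Hypothetical minimal BST of ints, for illustration only.
class IntBST
{
public:
    IntBST() : root(nullptr) {}
    ~IntBST() { destroy(root); }
    IntBST(const IntBST &) = delete;
    IntBST & operator=(const IntBST &) = delete;

    void insert(int item) { insertAux(root, item); }

    // Public method: callers never see nodes or pointers.
    std::vector<int> inorder() const
    {
        std::vector<int> out;
        inorderAux(root, out);
        return out;
    }

private:
    struct BinNode
    {
        int data;
        BinNode * left;
        BinNode * right;
        explicit BinNode(int d) : data(d), left(nullptr), right(nullptr) {}
    };
    BinNode * root;

    // Private auxiliary method: left subtree, then the node, then right subtree.
    static void inorderAux(const BinNode * node, std::vector<int> & out)
    {
        if (node == nullptr) return;
        inorderAux(node->left, out);
        out.push_back(node->data);
        inorderAux(node->right, out);
    }

    static void insertAux(BinNode * & node, int item)
    {
        if (node == nullptr) node = new BinNode(item);
        else if (item < node->data) insertAux(node->left, item);
        else if (item > node->data) insertAux(node->right, item);
        // equal: item is already present, nothing to do
    }

    static void destroy(BinNode * node)
    {
        if (node == nullptr) return;
        destroy(node->left);
        destroy(node->right);
        delete node;
    }
};
```

Whatever order the items are inserted in, inorder() returns them in ascending order, which is the traversal property the ADT requires.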

BST Searches
- The search begins at the root.
- If that is the desired item, done.
- If the item searched for is less, move down the left subtree.
- If the item searched for is greater, move down the right subtree.
- If the item is not found, we will run into an empty subtree.
- View search().
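The search steps translate almost line for line into a loop. This is a sketch over a hypothetical node type holding ints, not the textbook's search() member function.

```cpp
// Hypothetical node type for illustration.
struct BinNode
{
    int data;
    BinNode * left;
    BinNode * right;
};

// Returns true if item is in the BST rooted at root.
bool search(const BinNode * root, int item)
{
    const BinNode * locptr = root;        // search begins at the root
    while (locptr != nullptr)
    {
        if (item < locptr->data)
            locptr = locptr->left;        // item is less: descend left
        else if (item > locptr->data)
            locptr = locptr->right;       // item is greater: descend right
        else
            return true;                  // desired item found
    }
    return false;                         // ran into an empty subtree
}
```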

Inserting into a BST
- View the insert() function.
- It uses a modified version of search to locate the insertion location or an already existing item.
- A pointer parent trails the search pointer locptr, keeping track of the parent node.
- Thus the new node can be attached to the BST in the proper place.
- Example from the slide's figure: looking to insert 'R'.
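A sketch of insertion with a trailing parent pointer is shown below. The names locptr and parent follow the slide; the node type, the bool return value, and the destroy helper are assumptions made for this example rather than the textbook's insert().

```cpp
// Hypothetical node type for illustration.
struct BinNode
{
    char data;
    BinNode * left;
    BinNode * right;
    explicit BinNode(char d) : data(d), left(nullptr), right(nullptr) {}
};

// Inserts item into the BST rooted at root.
// Returns false if item was already present.
bool insert(BinNode * & root, char item)
{
    BinNode * locptr = root;      // search pointer
    BinNode * parent = nullptr;   // trails locptr
    while (locptr != nullptr)
    {
        parent = locptr;
        if (item < locptr->data) locptr = locptr->left;
        else if (item > locptr->data) locptr = locptr->right;
        else return false;        // item already in the tree
    }
    BinNode * node = new BinNode(item);
    if (parent == nullptr) root = node;                  // tree was empty
    else if (item < parent->data) parent->left = node;
    else parent->right = node;
    return true;
}

// Frees every node in the tree rooted at node.
void destroy(BinNode * node)
{
    if (node == nullptr) return;
    destroy(node->left);
    destroy(node->right);
    delete node;
}
```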

Recursive Deletion
- There are three possible cases when deleting a node, x, from a BST.
- Case 1: the node x is a leaf.
  - Make the pointer to x in its parent a null pointer.
  - Example: in deleting D, make the right pointer in parent C null.

Recursive Deletion
- Case 2: the node x has one child.
  - Make the pointer to x in its parent point to x's child instead.
  - Example: in deleting E, set the right pointer of parent A to point at C.

Recursive Deletion
- Case 3: the node x has two children.
  - Replace the contents of x with its inorder successor, xSucc.
  - Delete the node pointed to by xSucc as described for cases 1 and 2.
  - Example: we want to delete J. To find the inorder successor, go right and then descend left as far as possible, reaching K. Replace J with K, then delete xSucc (the original K).
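The three deletion cases can be sketched in one recursive function. The name xSucc follows the slide; the node type and the function name removeItem are assumptions made for this example, not the textbook's code.

```cpp
// Hypothetical node type for illustration.
struct BinNode
{
    int data;
    BinNode * left;
    BinNode * right;
    explicit BinNode(int d) : data(d), left(nullptr), right(nullptr) {}
};

// Removes item from the BST rooted at node, if it is present.
void removeItem(BinNode * & node, int item)
{
    if (node == nullptr) return;                         // item not in the tree
    if (item < node->data)
        removeItem(node->left, item);
    else if (item > node->data)
        removeItem(node->right, item);
    else if (node->left != nullptr && node->right != nullptr)
    {
        // Two children: find the inorder successor (go right, then left
        // as far as possible), copy its value into this node, then delete
        // the successor from the right subtree.
        BinNode * xSucc = node->right;
        while (xSucc->left != nullptr)
            xSucc = xSucc->left;
        node->data = xSucc->data;
        removeItem(node->right, xSucc->data);
    }
    else
    {
        // Leaf or one child: replace the parent's link to this node with
        // its only child (or null for a leaf), then free the node.
        BinNode * old = node;
        node = (node->left != nullptr) ? node->left : node->right;
        delete old;
    }
}
```

Because node is a reference to the parent's link (or to the root pointer), assigning to it is what "makes the parent pointer null" in case 1 and "sets the parent pointer to the child" in case 2.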

Problem of Lopsidedness
- Trees can be totally lopsided.
- Suppose each node has a right child only: the tree degenerates into a linked list.
- Processing time is affected by the "shape" of the tree.

Hash Tables
- Recall the order of magnitude of searches:
  - Linear search: O(n)
  - Binary search: O(log₂ n)
  - Balanced binary tree search: O(log₂ n)
  - Unbalanced binary tree: can degrade to O(n)

Hash Tables
- In some situations a faster search is needed.
- The solution is to use a hash function:
  - the value of the key field is given to the hash function
  - the location in a hash table is calculated

Hash Functions
- A simple function is to mod the value of the key by the size of the table:
    h(x) = x % tableSize
- Note that we have traded space for speed: the table must be considerably larger than the number of items anticipated.
- Suggested table size: 1.5 to 2 times the number of items.
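As a concrete instance of h(x) = x % tableSize, here is a minimal sketch for nonnegative integer keys. The parameter types are an assumption.

```cpp
#include <cstddef>

// Division-method hash for nonnegative integer keys: h(x) = x % tableSize.
// Precondition: tableSize > 0.
std::size_t h(unsigned int x, std::size_t tableSize)
{
    return x % tableSize;
}
```

With a table of size 11, the keys 52 and 63 both hash to location 8, so distinct keys can land in the same slot.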

Hash Functions
- Observe the problem: the same value can be returned by h(x) for different values of x. These are called collisions.
- A simple solution is linear probing:
  - empty slots are marked with -1
  - a linear search begins at the collision location
  - it continues until an empty slot is found for the insertion

Hash Functions
- When retrieving a value, linear probe until it is found.
  - If an empty slot is encountered, the value is not in the table.
- If deletions are permitted, a vacated slot must be marked so that it is not treated as empty, which would cut a later linear probe short.
  - Example: -1 for unused slots, -2 for slots that used to contain data.
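Insertion, retrieval, and deletion with linear probing can be sketched together. ProbingTable is a hypothetical class for nonnegative int keys; the -1 and -2 markers follow the slides, while the wrap-around at the end of the table and the reuse of deleted slots on insertion are assumptions of this sketch.

```cpp
#include <cstddef>
#include <vector>

const int EMPTY_SLOT = -1;    // slot has never held data
const int DELETED_SLOT = -2;  // slot used to contain data

// Hypothetical open-addressing table of nonnegative ints (tableSize > 0).
class ProbingTable
{
public:
    explicit ProbingTable(std::size_t tableSize) : slots(tableSize, EMPTY_SLOT) {}

    // Returns false if item is already present or no free slot exists.
    bool insert(int item)
    {
        if (find(item)) return false;
        std::size_t start = h(item);
        for (std::size_t k = 0; k < slots.size(); k++)
        {
            std::size_t loc = (start + k) % slots.size();  // wrap around
            if (slots[loc] == EMPTY_SLOT || slots[loc] == DELETED_SLOT)
            {
                slots[loc] = item;
                return true;
            }
        }
        return false;
    }

    bool find(int item) const { return locate(item) != slots.size(); }

    // Marks the slot as deleted rather than empty, so later probes pass over it.
    bool remove(int item)
    {
        std::size_t loc = locate(item);
        if (loc == slots.size()) return false;
        slots[loc] = DELETED_SLOT;
        return true;
    }

private:
    std::vector<int> slots;

    std::size_t h(int item) const
    {
        return static_cast<std::size_t>(item) % slots.size();
    }

    // Returns the index of item, or slots.size() if it is absent.
    std::size_t locate(int item) const
    {
        std::size_t start = h(item);
        for (std::size_t k = 0; k < slots.size(); k++)
        {
            std::size_t loc = (start + k) % slots.size();
            if (slots[loc] == item) return loc;
            if (slots[loc] == EMPTY_SLOT) break;  // never-used slot ends the probe
        }
        return slots.size();
    }
};
```

In a table of size 11, the keys 52, 63, and 74 all hash to 8 and end up in slots 8, 9, and 10. Removing 63 leaves a -2 marker in slot 9, so a later search for 74 still probes past it instead of stopping early.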

Collision Reduction Strategies
- Strategies for improved performance:
  - increase the table capacity (fewer collisions)
  - use a different collision resolution technique
  - devise a different hash function

Collision Reduction Strategies
- Hash table capacity:
  - the size of the table should be 1.5 to 2 times the number of items to be stored
  - otherwise the probability of collisions is too high

Collision Reduction Strategies
- Linear probing can result in primary clustering.
- Consider quadratic probing:
  - the probe sequence from location i is i + 1, i - 1, i + 4, i - 4, i + 9, i - 9, ...
  - secondary clusters can still form
- Double hashing:
  - use a second hash function to determine the probe sequence
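The quadratic probe sequence can be generated as in the sketch below. The function name quadraticProbes is made up, and reducing each location modulo the table size (so probes wrap around both ends) is an assumption; the slide gives only the unreduced sequence.

```cpp
#include <cstddef>
#include <vector>

// Returns the first `count` probe locations after a collision at location i,
// following i + 1, i - 1, i + 4, i - 4, i + 9, i - 9, ...
// Each location is reduced modulo tableSize (assumed; tableSize > 0).
std::vector<std::size_t> quadraticProbes(std::size_t i, std::size_t tableSize,
                                         std::size_t count)
{
    std::vector<std::size_t> probes;
    const long long m = static_cast<long long>(tableSize);
    const long long start = static_cast<long long>(i);
    for (long long k = 1; probes.size() < count; k++)
    {
        long long plus = (start + k * k) % m;
        long long minus = ((start - k * k) % m + m) % m;  // keep nonnegative
        probes.push_back(static_cast<std::size_t>(plus));
        if (probes.size() < count)
            probes.push_back(static_cast<std::size_t>(minus));
    }
    return probes;
}
```

For a collision at location 8 in a table of size 11, the first six probes are 9, 7, 1, 4, 6, 10.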

Collision Reduction Strategies
- Chaining:
  - the table is a list or vector of head nodes to linked lists
  - when an item hashes to a location, it is added to that location's linked list
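Chaining can be sketched with a vector of std::list objects standing in for the head nodes and their linked lists. ChainedTable and its member names are hypothetical, and keys are assumed to be nonnegative ints.

```cpp
#include <algorithm>
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical chained hash table of nonnegative ints (tableSize > 0).
class ChainedTable
{
public:
    explicit ChainedTable(std::size_t tableSize) : table(tableSize) {}

    // Adds item to the list at its hash location, unless already present.
    void insert(int item)
    {
        if (!find(item))
            table[h(item)].push_front(item);
    }

    bool find(int item) const
    {
        const std::list<int> & chain = table[h(item)];
        return std::find(chain.begin(), chain.end(), item) != chain.end();
    }

    void remove(int item) { table[h(item)].remove(item); }

    // Number of items that hashed to location loc.
    std::size_t chainLength(std::size_t loc) const { return table[loc].size(); }

private:
    std::vector<std::list<int> > table;

    std::size_t h(int item) const
    {
        return static_cast<std::size_t>(item) % table.size();
    }
};
```

Unlike open addressing, deletion needs no special marker here: removing an item simply unlinks it from its chain.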

Improving the Hash Function
- An ideal hash function
  - is simple to evaluate
  - scatters items uniformly throughout the table
- Modulo arithmetic alone is not so good for strings.
- It is possible to manipulate the numeric (ASCII) values of the first and last characters of a name.
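One possible way to use the first and last characters of a name is sketched below. The slide does not say how the two values are combined, so adding them and reducing modulo the table size is an assumption, as is the function name hashName; the character codes are assumed to be ASCII.

```cpp
#include <cstddef>
#include <string>

// Adds the character codes of the first and last characters of name and
// reduces the sum modulo tableSize (assumed combination; tableSize > 0).
std::size_t hashName(const std::string & name, std::size_t tableSize)
{
    if (name.empty()) return 0;   // arbitrary choice for the empty string
    unsigned int first = static_cast<unsigned char>(name.front());
    unsigned int last = static_cast<unsigned char>(name.back());
    return (first + last) % tableSize;
}
```

Names that share their first and last letters, such as "BROWN" and "BRIAN", collide under this scheme, so it is simple to evaluate but far from uniform.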