Hash Tables.


Overview of Data Structures
  Arrays
    Access: O(1)
    Insertion: O(N) (average case)
    Deletion: O(N) (average case)
  Linked lists
    Access: O(N)
    Insertion: O(N) (average case; O(1) at front)
    Deletion: O(N) (average case; O(1) at front)
  Binary Search Trees
    Access: O(logN) if balanced
    Insertion: O(logN) if balanced
    Deletion: O(logN) if balanced

BST: Better Performance
  Divide and conquer: reduce work by a factor of 2 at each step
  Can we reduce by a bigger factor?

Set Representation
  To represent keys with no duplicates:
    Use an array
    Store value k at index k in array
    Could just store 0 (not in set) or 1 (in set) at index k (or true/false)
  Problems: memory use, often sparsely filled array; negative ints, ...
  Big-O analysis:
    Access:
    Insertion:
    Deletion:
  (Figure: an array of 8 slots, indexes 0 to 7, with the values left blank)
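The "store a flag at index k" idea can be sketched roughly as below. The class name DirectSet and its members are made up for illustration, and the sketch assumes every key is a non-negative int below a fixed bound, which is exactly the restriction the slide lists under "Problems".

```cpp
#include <cassert>
#include <vector>

// Hypothetical direct-address set: slot k records whether key k is present.
// Assumes every stored key satisfies 0 <= key < maxKey (an assumption of
// this sketch, not something the slides specify).
class DirectSet {
public:
    explicit DirectSet(int maxKey) : present(maxKey, false) {}

    void add(int key)    { assert(inRange(key)); present[key] = true; }
    void remove(int key) { assert(inRange(key)); present[key] = false; }

    // Out-of-range keys (including negative ints) are simply reported absent.
    bool contains(int key) const { return inRange(key) && present[key]; }

private:
    std::vector<bool> present;   // one flag per possible key

    bool inRange(int key) const {
        return key >= 0 && key < static_cast<int>(present.size());
    }
};
```

Each operation touches exactly one array slot, but the array needs one slot for every possible key whether or not it is ever stored.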

Hash Tables
  Hash tables overcome the problems of arrays but maintain fast access, insertion, deletion
  Use an array and a hash function to determine the index of each element
  hash: to mix randomly, to chop into pieces
  hash function: mapping that takes a large data value (not necessarily an integer) and reduces it to a smaller piece of data (usually an integer)
    maps values to indexes
  hash code: the output of a hash function for a particular input
    Ex: On last slide, the hash function was hashCode(i) = i
  hash table: array that stores elements via hashing
    Call the array entries "buckets"; in previous slide, we had 8 buckets

Another Hash Function
  Want to allow negative integers
  hashCode(i) = abs(i) % capacity, where capacity is number of buckets
    int hashCode(int i) {
        return abs(i) % capacity;
    }
  Suppose capacity = 8
    mySet.add(3);
    mySet.add(-4);
    mySet.add(10);
    mySet.add(115); // collides with 3
  index | 0 | 1 | 2  | 3      | 4  | 5 | 6 | 7
  value |   |   | 10 | 3, 115 | -4 |   |   |
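To check the bucket numbers on this slide, the hash function can be written as a standalone function. This is only a sketch: the name bucketFor is invented here, and capacity is passed as a parameter (rather than read from a class member) so the function can be tried on its own.

```cpp
#include <cassert>
#include <cstdlib>

// Bucket index for a value, following hashCode(i) = abs(i) % capacity.
// bucketFor is a hypothetical name; capacity is a parameter so the
// function stands alone.
// Caveat: std::abs(INT_MIN) overflows, so a production table would need
// to treat that one value specially.
int bucketFor(int value, int capacity) {
    assert(capacity > 0);
    return std::abs(value) % capacity;
}
```

With capacity 8, bucketFor gives 3 for 3, 4 for -4, 2 for 10, and 3 for 115, which is why 115 collides with 3.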

Collisions
  collision: a hash function maps two or more values to same index
    mySet.add(3);
    mySet.add(-4);
    mySet.add(10);
    mySet.add(115); // collides with 3
  Must have way of resolving collisions
  index | 0 | 1 | 2  | 3      | 4  | 5 | 6 | 7
  value |   |   | 10 | 3, 115 | -4 |   |   |

Hash Function Example
  Use names as the key
  Take the second letter of the name, take int value of letter (a=1, b=2, etc.), divide by 8 and take the remainder
  Ex: What does "Norbert" hash to? o -> 15, 15 % 8 = 7
    Mary -> 1 % 8 = 1
    Huy -> 21 % 8 = 5
    Brandon -> 18 % 8 = 2
    Scott -> 3 % 8 = 3
    Shyam -> 8 % 8 = 0
    Bobby -> 15 % 8 = 7
    Colin -> 15 % 8 = 7 (collision)
  Not a 1-1 mapping from keys to hash values
  What's the max number of values that this function can hash perfectly (without collisions)?
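The second-letter scheme can be tried out with a small function like the one below. It is a sketch only: the name secondLetterHash is invented, letters are numbered from a=1 because that is the numbering the slide's worked values use (o -> 15), and the function assumes the name has at least two characters with an ASCII letter in second position.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hash a name by its second letter (a=1 ... z=26), modulo the bucket count.
// Hypothetical helper; assumes name.size() >= 2 and that the second
// character is an ASCII letter.
int secondLetterHash(const std::string& name, int buckets = 8) {
    assert(name.size() >= 2 && buckets > 0);
    unsigned char c = static_cast<unsigned char>(name[1]);
    assert(std::isalpha(c));
    int letter = std::tolower(c) - 'a' + 1;   // 'a' -> 1, 'b' -> 2, ...
    return letter % buckets;
}
```

Because only one letter of the key is used, any two names that share a second letter (Norbert, Bobby, Colin) land in the same bucket.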

Hash Function
  Another example of a hash function: maps a string to the sum of the ASCII values of its characters
    int hashCode(string s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++)
            sum += s[i];
        return sum;
    }
  Question: what's the hash code for "BAD"?

Collision Handling How to insert an element when an element is already occupying the bucket it's mapped to

Collision Resolution: Probing
  probing: resolve a collision by moving to another index
  linear probing: move to next available index, wrapping around if necessary
    mySet.add(3);
    mySet.add(-4);
    mySet.add(10);
    mySet.add(115); // collides with 3
  quadratic probing: a variation that adds terms of a polynomial to index when a collision occurs
    index + 1, index + 4, index + 9, ...
  Disadvantage: clusters (elements at consecutive indexes) slow down lookup
  Resize when load factor reaches some limit
    load factor of table with n elements: n/capacity
  index | 0 | 1 | 2  | 3 | 4  | 5   | 6 | 7
  value |   |   | 10 | 3 | -4 | 115 |   |
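Linear probing as described on this slide might look roughly like the following. This is a simplified sketch, not the course's implementation: the class name ProbingSet is invented, the capacity is fixed, and removal and resizing are left out.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical open-addressing set of ints using linear probing.
// Fixed capacity; no removal and no resizing in this sketch.
class ProbingSet {
public:
    explicit ProbingSet(int capacity)
        : slots(capacity), used(capacity, false), count(0) {
        assert(capacity > 0);
    }

    // Returns the index where value ends up (or already was), or -1 if full.
    int add(int value) {
        int capacity = static_cast<int>(slots.size());
        int index = std::abs(value) % capacity;
        for (int step = 0; step < capacity; step++) {
            int probe = (index + step) % capacity;    // wrap around
            if (!used[probe]) {                       // first free slot wins
                slots[probe] = value;
                used[probe] = true;
                count++;
                return probe;
            }
            if (slots[probe] == value) return probe;  // already present
        }
        return -1;                                    // table is full
    }

    bool contains(int value) const {
        int capacity = static_cast<int>(slots.size());
        int index = std::abs(value) % capacity;
        for (int step = 0; step < capacity; step++) {
            int probe = (index + step) % capacity;
            if (!used[probe]) return false;   // empty slot ends the search
            if (slots[probe] == value) return true;
        }
        return false;
    }

    // load factor = n / capacity
    double loadFactor() const {
        return static_cast<double>(count) / slots.size();
    }

private:
    std::vector<int> slots;    // stored values
    std::vector<bool> used;    // which slots are occupied
    int count;                 // number of stored values
};
```

With capacity 8, adding 3, -4, 10 and 115 in that order places 115 at index 5: its home bucket 3 is taken, the next index 4 holds -4, and 5 is the first free slot.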

Collision Resolution: Chaining
  Each bucket in the hash table holds another data structure
    linked list, balanced binary tree
  Uses more space
  Everything goes in the right bucket, can't run out of indexes
  (Figure: hash table whose non-empty buckets point to linked lists holding 12, 34, 42, 74 and 14; empty buckets are NULL)

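Chaining: how to add element
  Make sure you avoid duplicates
  Add to linked list of new element's bucket; fastest to add to front
  Insertion: O(1) at the front of the list (the duplicate check still walks that bucket's list)
  (Figure: hash table whose non-empty buckets point to linked lists holding 12, 34, 42, 74 and 14; empty buckets are NULL)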

Implement HashSet Class
  Exercise: Represent a set of integers using a hash table
  Include the following member functions:
    HashSet()
    add(int value)
    clear()
    contains(int value)
    remove(int value)

HashSet.h
    #ifndef _HASHSET_H
    #define _HASHSET_H
    #include <cstddef>   // NULL
    #include <string>
    struct HashNode {
        int data;
        HashNode* next;
        HashNode(int data = 0, HashNode* next = NULL) {
            this->data = data;
            this->next = next;
        }
    };

HashSet.h
    class HashSet {
    public:
        HashSet();                        // construct new empty set
        ~HashSet();                       // frees memory allocated by set
        void add(int value);              // adds value to set if not already present
        void clear();                     // removes all elements from set
        bool contains(int value) const;   // returns true if value is in set
        void print() const;               // prints hash table
        void remove(int value);           // removes value from set, if present
    private:
        HashNode** elements;              // array of HashNode pointers
        int capacity;
        int size;
        int hashCode(int value) const;    // returns integer index of value
    };
    #endif

HashSet.cpp
    #include "HashSet.h"
    #include <cstdlib>    // abs
    #include <iostream>   // cout, endl (used by print)
    using namespace std;
    HashSet::HashSet() {
        elements = new HashNode* [10]();   // () zero-initializes every bucket
        capacity = 10;
        size = 0;
    }
    HashSet::~HashSet() {
        // to do
        delete [] elements;
    }
    int HashSet::hashCode(int value) const {
        return abs(value) % capacity;
    }
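The destructor is left as a "to do", and clear() is declared but not shown. Both need to delete the nodes in every bucket's list, not just the bucket array. One way to do that is sketched below as a standalone helper; the names Node and freeAllChains are invented for this example (Node stands in for HashNode), and the return value is added only to make the helper easy to check.

```cpp
#include <cstddef>

// Stand-in for the slides' HashNode so this sketch compiles on its own.
struct Node {
    int data;
    Node* next;
};

// Hypothetical helper: deletes every node in every bucket's list and
// resets each bucket to NULL. Returns the number of nodes freed.
int freeAllChains(Node** buckets, int capacity) {
    int freed = 0;
    for (int i = 0; i < capacity; i++) {
        Node* cur = buckets[i];
        while (cur != NULL) {
            Node* next = cur->next;   // save the link before deleting
            delete cur;
            cur = next;
            freed++;
        }
        buckets[i] = NULL;
    }
    return freed;
}
```

In the HashSet class, the same loop could run over elements and capacity inside clear() (followed by size = 0), and the destructor could call clear() before delete [] elements.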

add Method
  Add an element to the hash table
  Don't add duplicate elements
  Compute hash code, and add element to corresponding linked list (if it's not already there)
    table.add(52);
  (Figure: the chained hash table holding 12, 34, 42, 74 and 14, with 52 added to its bucket's list)

Exercise: add Method Write the add Method for HashSet https://www.youtube.com/watch?v=PCIvOGveIK0

HashSet.cpp
    void HashSet::add(int value) {
        if (!contains(value)) {
            int bucket = hashCode(value);
            // add to front of the bucket's linked list
            HashNode* newNode = new HashNode(value);
            newNode->next = elements[bucket];
            elements[bucket] = newNode;
            size++;
        }
    }

HashSet.cpp
    void HashSet::print() const {
        for (int i = 0; i < capacity; i++) {
            cout << "[" << i << "]: ";
            HashNode* cur = elements[i];
            while (cur != NULL) {
                cout << " -> " << cur->data;
                cur = cur->next;
            }
            cout << endl;
        }
        cout << "size = " << size << endl;
    }

contains Method
  Is a value in the hash table? Returns true or false.
  Compute the value's hash code, and look through linked list at that index.
    table.contains(42); // true
    table.contains(54); // false
  (Figure: the chained hash table holding 12, 34, 42, 74 and 14)

Exercise: contains Method Write the contains method for HashSet

contains Method
    // const matches the declaration in HashSet.h
    bool HashSet::contains(int value) const {
        int bucket = hashCode(value);
        HashNode* cur = elements[bucket];
        while (cur != NULL) {
            if (cur->data == value)
                return true;
            cur = cur->next;
        }
        return false;
    }

remove Method

remove Method
    // if value is in hash table, remove it.
    void HashSet::remove(int value) {
        if (!contains(value))
            return;
        int bucket = hashCode(value);
        HashNode* temp = NULL;
        if (elements[bucket]->data == value) {
            temp = elements[bucket];
            elements[bucket] = temp->next;
        }
        // element to remove isn't first
        else {
            HashNode* cur = elements[bucket];
            while (cur->next != NULL) {
                if (cur->next->data == value) {
                    temp = cur->next;
                    cur->next = temp->next;
                    break;   // values are unique, so stop at the first match
                }
                cur = cur->next;
            }
        }
        delete temp;
        size--;
    }

Rehash
  rehash: move to larger array when table too full
  Can't just copy old array into larger array: each bucket index depends on capacity, so every element must be re-added using the new capacity
  load factor = (# of elements)/(# of buckets)
  Common to rehash when load factor is around 0.75
  Reduces collisions: linked lists are shorter
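A rehash for a chained table could be sketched as follows. The slides do not show this code, so everything here is an assumption: Node stands in for HashNode, needsRehash and rehash are invented names, and the 0.75 threshold is only the "common" value mentioned above.

```cpp
#include <cassert>
#include <cstdlib>

// Stand-in for the slides' HashNode so this sketch compiles on its own.
struct Node {
    int data;
    Node* next;
};

// load factor = (# of elements) / (# of buckets)
bool needsRehash(int size, int capacity, double maxLoad = 0.75) {
    assert(capacity > 0);
    return static_cast<double>(size) / capacity >= maxLoad;
}

// Hypothetical helper: moves every node from oldTable into a newly
// allocated table with newCapacity buckets and returns the new table.
// Nodes are reused; only the old bucket array is deleted.
Node** rehash(Node** oldTable, int oldCapacity, int newCapacity) {
    assert(newCapacity > 0);
    Node** newTable = new Node*[newCapacity]();   // all buckets start NULL
    for (int i = 0; i < oldCapacity; i++) {
        Node* cur = oldTable[i];
        while (cur != NULL) {
            Node* next = cur->next;
            // The bucket must be recomputed: it depends on the capacity.
            int bucket = std::abs(cur->data) % newCapacity;
            cur->next = newTable[bucket];         // push onto front of list
            newTable[bucket] = cur;
            cur = next;
        }
    }
    delete [] oldTable;
    return newTable;
}
```

For example, 3 and 5 share bucket 1 when the capacity is 2, but after rehashing to capacity 4 they land in buckets 3 and 1, so the chain is split.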