1 the hash table. hash table A hash table consists of two major components …

Presentation transcript:

1 the hash table

hash table

A hash table consists of two major components …

hash table … a bucket array

hash table … and a hash function

hash table Performance is expected to be O(1)

bucket array

hash table: bucket array
A bucket array is an array A of size N. A[i] is a bucket, i.e. a collection of (key, value) pairs, and N is the capacity of A.
An entry (k, v) is inserted in A[k]. This works well if the keys are well distributed between 0 and N-1.
If the keys are unique integers in the range 0..N-1, then each bucket holds at most one entry; consequently get, insert and delete are O(1).
Downside: space is proportional to N. If N is much larger than n (the number of entries), we waste space.
Downside: keys must be integers in the range 0..N-1, and this may not be the case (think matric number).
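The bucket array with unique integer keys can be sketched in a few lines of Java. This is only an illustration: the class name DirectAddressTable and the use of String values are assumptions, not something the slides define.

```java
// Hypothetical sketch: integer key k is used directly as the index into the array.
class DirectAddressTable {
    private final String[] buckets;   // buckets[k] holds the value for key k, or null

    DirectAddressTable(int capacity) {
        buckets = new String[capacity];
    }

    void put(int key, String value) { buckets[check(key)] = value; }

    String get(int key) { return buckets[check(key)]; }

    void remove(int key) { buckets[check(key)] = null; }

    // Keys outside 0..N-1 cannot be stored: the second downside noted above.
    private int check(int key) {
        if (key < 0 || key >= buckets.length) {
            throw new IllegalArgumentException("key out of range 0.." + (buckets.length - 1));
        }
        return key;
    }

    public static void main(String[] args) {
        DirectAddressTable t = new DirectAddressTable(11);
        t.put(1, "D");
        t.put(3, "C");
        t.put(6, "C");
        t.put(7, "Q");
        System.out.println(t.get(3));   // C
        System.out.println(t.get(2));   // null
    }
}
```

Every operation is a single array access, which is where the O(1) bound comes from; the cost is an array of N slots even when only a handful are used.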

hash table: bucket array
[Figure: a bucket array of size 11 holding the entries (1,D), (3,C), (3,F), (6,C) and (7,Q).]
If the hashed keys are unique integers in the range [0..10], then each bucket holds at most one entry. Otherwise we have a collision and need to deal with it; here (3,C) and (3,F) both land in bucket 3.

hash table: collision
When two different entries map to the same bucket we have a collision. It's good to avoid collisions.

hash function

hash table: hash function
A hash function h maps each key to an integer in the range [0,N-1]. Given an entry with key k, h(k) is the index into the bucket array: store the entry in A[h(k)].
h is a good hash function if h maps keys so as to minimise collisions, h is easy to compute/program, and h is fast to compute.
h(k) has two actions:
1. map k to a hash code
2. map the hash code into the range [0,N-1]

hash table: hash function
Hash codes in Java. But care should be taken, as this might not be "good".

a bit of maths … that you know (af2)

Let A and B be sets. A function f is a mapping from elements of A to elements of B and is a subset of A × B, i.e. it can be defined by a set of tuples! (af2)

A is the domain and B is the codomain. If f(x) = y, then y is the image of x and x is a preimage of y.
There may be more than one preimage of y. There is only one image of x, otherwise f is not a function. There may be an element in the codomain with no preimage.
The range of f is the set of all images of elements of A, i.e. the set of all results.

Injection (aka one-to-one, 1-1)
[Figure: two arrow diagrams between small sets of letters, one labelled "injection" and the other "not an injection".]
If f is an injection then preimages are unique.
Ideally we want our hash function to be injective (no collisions) and to have a small codomain and range; we may need to compress the range.

back to ads2

hash code & hash function
Just to clear this up (but let's not make too big a deal about it): we assume a hash code is an integer in the codomain. The hash function brings hash codes into the range [0,N-1].
We will examine just a few hash functions, acting on strings.

hash code & hash function: Polynomial hash codes
Assume we have a key s that is a character String. Here is a really dumb hash code:
    public int dumbHash(String s) {
        int code = 0;
        for (int i = 0; i < s.length(); i++) code = code + s.charAt(i);
        return code;
    }
What would we get for dumbHash("spot"), dumbHash("pots"), dumbHash("tops") and dumbHash("post")?
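To answer the question, the dumb hash code can be run on the four words. The wrapper class DumbHashDemo is an assumption added only to make the snippet runnable; the method body is the one from the slide.

```java
class DumbHashDemo {
    // The slide's hash code: the plain sum of the character codes.
    static int dumbHash(String s) {
        int code = 0;
        for (int i = 0; i < s.length(); i++) {
            code = code + s.charAt(i);
        }
        return code;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"spot", "pots", "tops", "post"}) {
            System.out.println(w + " -> " + dumbHash(w));   // 454 for every word
        }
    }
}
```

Addition is commutative, so every anagram sums to the same value (115 + 112 + 111 + 116 = 454) and all four keys collide.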

hash code & hash function: Polynomial hash codes
Take into consideration the "position" of the elements of the key, by treating the key as a polynomial in some constant a.
So, this doesn't look any different from an everyday number: it is written to the base a, and the coefficients are the components of the key.
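The slide's formula is not in the transcript, so the following is a sketch under an assumption: the usual textbook form, where a key s of length k has hash code s[0]·a^(k-1) + s[1]·a^(k-2) + … + s[k-1], evaluated with Horner's rule. The names PolyHashDemo and polyHash are hypothetical.

```java
class PolyHashDemo {
    // Polynomial hash code in base a, evaluated with Horner's rule.
    // int overflow wraps around silently, which is acceptable for a hash code.
    static int polyHash(String s, int a) {
        int code = 0;
        for (int i = 0; i < s.length(); i++) {
            code = a * code + s.charAt(i);
        }
        return code;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"spot", "pots", "tops", "post"}) {
            System.out.println(w + " -> " + polyHash(w, 33));
        }
    }
}
```

Because each character is weighted by its position, the anagrams that collided under dumbHash now receive different codes. With a = 31 this computation coincides with Java's own String.hashCode().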

hash code & hash functionPolynomial hash codes Good values for a appear to be 33, 37, 39, 41

hash code & hash function: Polynomial hash codes
Small-scale experiments on the Unix dictionary.
[Table with columns: a, words/strings, minimum hash value, maximum hash value, collision count. The values did not survive extraction, apart from a stray "7".]
Yikes! Look at that range!

hash code & hash functionCyclic shift hash codes Start moving bits around
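The code on these slides is not in the transcript. As a stand-in, here is a minimal sketch of the common textbook cyclic shift hash code, which rotates the running value left before adding each character. The 5-bit shift and the names CyclicShiftDemo and cyclicShiftHash are assumptions, and this is not claimed to be the function shown on the slides.

```java
class CyclicShiftDemo {
    // Cyclic shift hash code: rotate the 32-bit running value left by 5 bits,
    // then add the next character. The shift width of 5 is an assumption.
    static int cyclicShiftHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = (h << 5) | (h >>> 27);   // same result as Integer.rotateLeft(h, 5)
            h += s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"spot", "pots", "tops", "post"}) {
            System.out.println(w + " -> " + cyclicShiftHash(w));
        }
    }
}
```

The rotation plays the role that multiplication by a plays in a polynomial hash code: it makes the result depend on character order, using only shifts and additions.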


hash code & hash functionCyclic shift hash codes Thanks to Arash Partow


hash code & hash functionCompression Functions So, you think you’ve found something that produces a good hash code … How do we compress its range to fit into our machine?

hash code & hash function: Compression functions
Assume we want to limit storage to buckets in the range [0,N-1].
The division method: int i = (int)(hash(s) % N); S[i] = s; … ideally, but there may be collisions.
NOTE: keep N prime.
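One practical detail when writing the division method in Java: the % operator returns a negative result for a negative hash code, which would index outside the array. A minimal sketch using Math.floorMod avoids that; the class and method names are hypothetical.

```java
class DivisionCompression {
    // Division method: map any int hash code into the range [0, n-1].
    // Math.floorMod never returns a negative value for a positive n,
    // unlike the % operator.
    static int compress(int hashCode, int n) {
        return Math.floorMod(hashCode, n);
    }

    public static void main(String[] args) {
        System.out.println(compress(25, 11));   // 3
        System.out.println(compress(-7, 11));   // 4
        System.out.println(-7 % 11);            // -7, not a valid index
    }
}
```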

hash code & hash function: Compression functions
Assume we want to limit storage to buckets in the range [0,N-1].
The multiply, add and divide (MAD) method: N is prime, a > 1 is a scaling factor, b ≥ 0 is a shift, and a % N ≠ 0.
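The MAD formula itself is missing from the transcript. The sketch below assumes the usual form, (a·y + b) mod N for a hash code y, and checks the constraints listed on the slide. The names MadCompression and compress are hypothetical.

```java
class MadCompression {
    // MAD compression, assuming the form (a * y + b) mod n.
    // Constraints from the slide: a > 1, b >= 0, a % n != 0 (n should be prime).
    static int compress(int hashCode, int a, int b, int n) {
        if (a <= 1 || b < 0 || a % n == 0) {
            throw new IllegalArgumentException("need a > 1, b >= 0 and a % n != 0");
        }
        long scaled = (long) a * hashCode + b;          // long arithmetic avoids int overflow
        return (int) Math.floorMod(scaled, (long) n);   // result is always in [0, n-1]
    }

    public static void main(String[] args) {
        System.out.println(compress(10, 3, 2, 11));    // (3*10 + 2) mod 11 = 10
        System.out.println(compress(-10, 3, 2, 11));   // (-30 + 2) mod 11 = 5
    }
}
```

The requirement a % N ≠ 0 matters because otherwise a·y vanishes modulo N and every key would land in bucket b mod N.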

hash tables Collision handling schemes

hash tables Collision handling schemes Separate Chaining

hash tables: Collision handling schemes: Separate Chaining
bucket[i] is a small map implemented as a list. bucket[i] should be a short list. It may be sorted. It might be something other than a list.

hash tables: Collision handling schemes: Separate Chaining
Let N be the number of buckets and n the number of entries stored; the load factor is n/N.
Upside: simple.
Downside: requires auxiliary data structures (to resolve collisions), and this may put an additional burden on space.

hash tables: Collision handling schemes: Separate Chaining
A simple view: an array whose elements are linked lists, indexed by location (locn).
put(Jon,plumber): hash(Jon) = 3, so (Jon,plumber) goes into the list at location 3.
put(Fred,painter): hash(Fred) = 6, so (Fred,painter) goes into the list at location 6.
put(Joe,prof): hash(Joe) = 1, so (Joe,prof) goes into the list at location 1.
put(Ted,cat): hash(Ted) = 3, a collision with Jon, so (Ted,cat) joins (Jon,plumber) in the list at location 3.
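The walkthrough above can be reproduced with a small Java sketch of separate chaining. Several things here are assumptions: the class name ChainedMap, the capacity of 8, and the helper slideHash, which simply hard-codes the hash values quoted on the slides.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.function.ToIntFunction;

class ChainedMap {
    private static final class Entry {
        final String key;
        String value;
        Entry(String key, String value) { this.key = key; this.value = value; }
    }

    private final List<LinkedList<Entry>> buckets = new ArrayList<>();
    private final ToIntFunction<String> hash;

    ChainedMap(int capacity, ToIntFunction<String> hash) {
        this.hash = hash;
        for (int i = 0; i < capacity; i++) buckets.add(new LinkedList<>());
    }

    private LinkedList<Entry> chainFor(String key) {
        return buckets.get(Math.floorMod(hash.applyAsInt(key), buckets.size()));
    }

    void put(String key, String value) {
        LinkedList<Entry> chain = chainFor(key);
        for (Entry e : chain) {
            if (e.key.equals(key)) { e.value = value; return; }   // key present: update
        }
        chain.add(new Entry(key, value));                         // otherwise append to the list
    }

    String get(String key) {
        for (Entry e : chainFor(key)) {
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }

    boolean remove(String key) {
        return chainFor(key).removeIf(e -> e.key.equals(key));
    }

    int chainLength(int index) { return buckets.get(index).size(); }

    // Hypothetical helper: the hash values quoted on the slides.
    static int slideHash(String name) {
        switch (name) {
            case "Jon": case "Ted": return 3;
            case "Fred": return 6;
            case "Joe": return 1;
            default: return name.hashCode();
        }
    }

    public static void main(String[] args) {
        ChainedMap m = new ChainedMap(8, ChainedMap::slideHash);
        m.put("Jon", "plumber");
        m.put("Fred", "painter");
        m.put("Joe", "prof");
        m.put("Ted", "cat");
        System.out.println(m.chainLength(3));   // 2: Jon and Ted share location 3
        System.out.println(m.get("Ted"));       // cat
    }
}
```

Removal is easy here because deleting a list node does not disturb any other entry, in contrast to the open addressing schemes discussed next.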

hash tables Collision handling schemes Open Addressing

hash tables Linear Probing Open Addressing

hash tables: Open Addressing: Linear Probing
i = hash(key); if bucket[i] != null we have a collision!
Try next bucket[(i+1) % N], then bucket[(i+2) % N], and so on up to bucket[(i+N-1) % N].

hash tables: Open Addressing: Linear Probing
A table of 8 buckets (locn 0..7), each holding one key and one value.
put(Jon,plumber): hash(Jon) = 3; bucket 3 is free, so store it there.
put(Fred,painter): hash(Fred) = 6; bucket 6 is free.
put(Joe,prof): hash(Joe) = 1; bucket 1 is free.
put(Ted,cat): hash(Ted) = 3; bucket 3 holds Jon, so probe bucket 4, which is free.
put(Jock,dancer): hash(Jock) = 7; bucket 7 is free.
put(Burt,poet): hash(Burt) = 0; bucket 0 is free.
put(Bob,fish): hash(Bob) = 6; buckets 6, 7, 0 and 1 are occupied, so the probe wraps around and stores Bob in bucket 2.

Final table:
locn  key   value
0     Burt  poet
1     Joe   prof
2     Bob   fish
3     Jon   plumber
4     Ted   cat
5
6     Fred  painter
7     Jock  dancer

hash tables: Open Addressing: Linear Probing
What happens with get(key)?
1. i = hash(key)
2. if bucket[i] == key: found, return
3. if bucket[i] == null: not found, return
4. if bucket[i] != null and bucket[i] != key: i = (i+1) % N, goto 2
"Linear Probing" gets its name because accessing a bucket is viewed as a probe, and the probes step through the array one bucket at a time.

hash tables: Open Addressing: Linear Probing
What happens with remove(key)?
1. i = hash(key)
2. if bucket[i] == key: found, set bucket[i] = "removed", return
3. if bucket[i] == null: not found, return
4. if bucket[i] != null and bucket[i] != key: i = (i+1) % N, goto 2
We have a special marker "removed".

hash tables: Open Addressing: Linear Probing
What happens with put(key)?
1. free location j = -1
2. i = hash(key)
3. if bucket[i] == key: found, update bucket[i], return
4. if bucket[i] == "removed": j = i; i = (i+1) % N, goto 3
5. if bucket[i] != null and bucket[i] != key: i = (i+1) % N, goto 3
6. if bucket[i] == null: the search stops; if j > -1 store the new entry in bucket[j], otherwise (j == -1) store it in bucket[i]
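The get, remove and put procedures for linear probing can be put together in one Java sketch. It deviates from the pseudocode in two labelled ways: the probe loops are bounded so that a full table cannot loop forever, and put reuses the first "removed" bucket it passes rather than the last. The class name LinearProbingMap, the boolean removed array standing in for the "removed" marker, and the helper slideHash (the hash values quoted on the slides) are all assumptions.

```java
import java.util.function.ToIntFunction;

class LinearProbingMap {
    private final String[] keys;
    private final String[] values;
    private final boolean[] removed;   // true plays the role of the "removed" marker
    private final ToIntFunction<String> hash;

    LinearProbingMap(int capacity, ToIntFunction<String> hash) {
        keys = new String[capacity];
        values = new String[capacity];
        removed = new boolean[capacity];
        this.hash = hash;
    }

    private int home(String key) {
        return Math.floorMod(hash.applyAsInt(key), keys.length);
    }

    // Index of the bucket holding key, or -1 if the key is absent.
    int indexOf(String key) {
        int i = home(key);
        for (int probes = 0; probes < keys.length; probes++) {
            if (key.equals(keys[i])) return i;               // found
            if (keys[i] == null && !removed[i]) return -1;   // truly empty: not found
            i = (i + 1) % keys.length;                       // other key or "removed": keep probing
        }
        return -1;                                           // probed every bucket
    }

    String get(String key) {
        int i = indexOf(key);
        return i < 0 ? null : values[i];
    }

    boolean remove(String key) {
        int i = indexOf(key);
        if (i < 0) return false;
        keys[i] = null;
        values[i] = null;
        removed[i] = true;                                   // leave the marker behind
        return true;
    }

    void put(String key, String value) {
        int free = -1;                                       // reusable bucket, if one was seen
        int i = home(key);
        for (int probes = 0; probes < keys.length; probes++) {
            if (key.equals(keys[i])) { values[i] = value; return; }   // update in place
            if (removed[i]) {
                if (free < 0) free = i;                      // remember the first "removed" bucket
            } else if (keys[i] == null) {                    // empty bucket: the search stops
                if (free < 0) free = i;
                break;
            }
            i = (i + 1) % keys.length;
        }
        if (free < 0) throw new IllegalStateException("table is full");
        keys[free] = key;
        values[free] = value;
        removed[free] = false;
    }

    // Hypothetical helper: the hash values quoted on the slides.
    static int slideHash(String name) {
        switch (name) {
            case "Jon": case "Ted": return 3;
            case "Fred": case "Bob": return 6;
            case "Joe": return 1;
            case "Jock": return 7;
            case "Burt": return 0;
            default: return name.hashCode();
        }
    }

    public static void main(String[] args) {
        LinearProbingMap m = new LinearProbingMap(8, LinearProbingMap::slideHash);
        String[][] entries = {{"Jon", "plumber"}, {"Fred", "painter"}, {"Joe", "prof"},
                {"Ted", "cat"}, {"Jock", "dancer"}, {"Burt", "poet"}, {"Bob", "fish"}};
        for (String[] e : entries) m.put(e[0], e[1]);
        System.out.println(m.indexOf("Ted"));   // 4
        System.out.println(m.indexOf("Bob"));   // 2
        m.remove("Jon");
        System.out.println(m.get("Ted"));       // cat
    }
}
```

The marker is what keeps Ted reachable after Jon is removed: a search for Ted starts at bucket 3, sees "removed" rather than an empty bucket, and carries on to bucket 4.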

hash tables: Open Addressing: Linear Probing
So?
Advantages: saves space, as bucket[i] is only a bucket for a single entry; that is, no additional data structures.
Disadvantages: removals are complicated; put is complicated if there are collisions; entries might clump together; search can then degenerate from O(1) to O(N).
We might use linear probing when memory is tight and we want FAST access.

hash tables Quadratic Probing Open Addressing

hash tables: Open Addressing: Quadratic Probing
Quadratic probing: iteratively try bucket[(i + f(j)) % N], where i = hash(key), j = 0,1,2,… and f(j) = j*j.
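The probe sequence is easy to tabulate. This is a minimal sketch, with the names QuadraticProbeDemo and probe as assumptions and a home bucket i that is already in the range [0, N-1].

```java
class QuadraticProbeDemo {
    // Bucket index of the j-th probe for home bucket i in a table of n buckets.
    static int probe(int i, int j, int n) {
        return (int) ((i + (long) j * j) % n);   // long arithmetic avoids overflow in j*j
    }

    public static void main(String[] args) {
        // Home bucket 3 in a table of 11 buckets: probes 3, 4, 7, 1.
        for (int j = 0; j < 4; j++) {
            System.out.println(probe(3, j, 11));
        }
    }
}
```

The growing gaps between probes spread entries out, so a run of occupied neighbouring buckets does not keep catching new keys the way it does under linear probing.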

hash tables Double Hashing Open Addressing

hash tables: Open Addressing: Double Hashing
We have a secondary hash function (call it g). If i = hash(key) and there is a collision at bucket[i], try bucket[(i + g(key)) % N], where g(key) = q - (key % q) and q is a prime number < N.
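A small sketch of the probe sequence follows. The slide only shows the first retry; repeating the step as (i + j·g(key)) % N for j = 0,1,2,… is the usual generalisation and is an assumption here, as are the integer keys, the primary hash key % N, and the names DoubleHashDemo, secondary and probe.

```java
class DoubleHashDemo {
    // Secondary hash g(key) = q - (key % q); the result is in 1..q, so never zero.
    static int secondary(int key, int q) {
        return q - Math.floorMod(key, q);
    }

    // Bucket index of the j-th probe: (i + j * g(key)) % n, with i = key % n (assumed primary hash).
    static int probe(int key, int j, int n, int q) {
        int i = Math.floorMod(key, n);
        return (int) ((i + (long) j * secondary(key, q)) % n);
    }

    public static void main(String[] args) {
        // key 18, N = 13, q = 7: i = 5 and g = 7 - 4 = 3, so the probes are 5, 8, 11, 1.
        for (int j = 0; j < 4; j++) {
            System.out.println(probe(18, j, 13, 7));
        }
    }
}
```

Because g(key) is never zero, each retry moves to a different bucket, and two keys that share a home bucket usually follow different probe sequences.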

hash tables So? Open Addressing

hash tables: Open Addressing: So?
Open addressing saves space, but is complicated, and may be slower.
In experiments chaining is competitive or faster, depending on the load factor.
If memory is not an issue, the recommendation is to use chaining with a low load factor.