M. Böhlen and R. Sebastiani 9/29/20161 Data Structures and Algorithms Roberto Sebastiani

Slides:



Advertisements
Similar presentations
Chapter 11. Hash Tables.
Advertisements

Hash Tables.
September 26, Algorithms and Data Structures Lecture VI Simonas Šaltenis Nykredit Center for Database Research Aalborg University
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
1 Foundations of Software Design Fall 2002 Marti Hearst Lecture 18: Hash Tables.
Hashing CS 3358 Data Structures.
1 Hashing (Walls & Mirrors - end of Chapter 12). 2 I hate quotations. Tell me what you know. – Ralph Waldo Emerson.
REPRESENTING SETS CSC 172 SPRING 2002 LECTURE 21.
Hashing COMP171 Fall Hashing 2 Hash table * Support the following operations n Find n Insert n Delete. (deletions may be unnecessary in some applications)
Tirgul 9 Hash Tables (continued) Reminder Examples.
Tirgul 7. Find an efficient implementation of a dynamic collection of elements with unique keys Supported Operations: Insert, Search and Delete. The keys.
COMP 171 Data Structures and Algorithms Tutorial 10 Hash Tables.
Tirgul 8 Hash Tables (continued) Reminder Examples.
Hashing General idea: Get a large array
Data Structures Using C++ 2E Chapter 9 Searching and Hashing Algorithms.
Introducing Hashing Chapter 21 Copyright ©2012 by Pearson Education, Inc. All rights reserved.
1. 2 Problem RT&T is a large phone company, and they want to provide enhanced caller ID capability: –given a phone number, return the caller’s name –phone.
Hashtables David Kauchak cs302 Spring Administrative Talk today at lunch Midterm must take it by Friday at 6pm No assignment over the break.
Spring 2015 Lecture 6: Hash Tables
Hashing Table Professor Sin-Min Lee Department of Computer Science.
Implementing Dictionaries Many applications require a dynamic set that supports dictionary-type operations such as Insert, Delete, and Search. E.g., a.
1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table.
David Luebke 1 10/25/2015 CS 332: Algorithms Skip Lists Hash Tables.
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
Hashing Sections 10.2 – 10.3 CS 302 Dr. George Bebis.
Hashing Hashing is another method for sorting and searching data.
1 Hashing - Introduction Dictionary = a dynamic set that supports the operations INSERT, DELETE, SEARCH Dictionary = a dynamic set that supports the operations.
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
Tirgul 11 Notes Hash tables –reminder –examples –some new material.
Introduction to Algorithms 6.046J/18.401J LECTURE7 Hashing I Direct-access tables Resolving collisions by chaining Choosing hash functions Open addressing.
October 6, Algorithms and Data Structures Lecture VII Simonas Šaltenis Aalborg University
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
Hashing 1 Hashing. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
Hashtables David Kauchak cs302 Spring Administrative Midterm must take it by Friday at 6pm No assignment over the break.
CS6045: Advanced Algorithms Data Structures. Hashing Tables Motivation: symbol tables –A compiler uses a symbol table to relate symbols to associated.
CSC 413/513: Intro to Algorithms Hash Tables. ● Hash table: ■ Given a table T and a record x, with key (= symbol) and satellite data, we need to support:
1 What is it? A side order for your eggs? A form of narcotic intake? A combination of the two?
Hashing.
CSC 172 DATA STRUCTURES.
Data Structures Using C++ 2E
Hash table CSC317 We have elements with key and satellite data
Hashing CSE 2011 Winter July 2018.
Slides by Steve Armstrong LeTourneau University Longview, TX
CS 332: Algorithms Hash Tables David Luebke /19/2018.
Hashing Alexandra Stefan.
Hashing Alexandra Stefan.
Data Structures Using C++ 2E
Review Graph Directed Graph Undirected Graph Sub-Graph
Hash functions Open addressing
Advanced Associative Structures
Hash Table.
CSE 2331/5331 Topic 8: Hash Tables CSE 2331/5331.
Data Structures and Algorithms
Dictionaries and Their Implementations
Introduction to Algorithms 6.046J/18.401J
Data Structures and Algorithms
CSCE 3110 Data Structures & Algorithm Analysis
Algorithms and Data Structures Lecture VI
Hashing Alexandra Stefan.
Advance Database System
CS202 - Fundamental Structures of Computer Science II
Introduction to Algorithms
Hashing Sections 10.2 – 10.3 Lecture 26 CS302 Data Structures
EE 312 Software Design and Implementation I
Hashing.
CS 3343: Analysis of Algorithms
EE 312 Software Design and Implementation I
Lecture-Hashing.
CSE 373: Data Structures and Algorithms
Presentation transcript:

M. Böhlen and R. Sebastiani 9/29/20161 Data Structures and Algorithms Roberto Sebastiani - Week 07 - B.S. In Applied Computer Science Free University of Bozen/Bolzano academic year

M. Böhlen and R. Sebastiani 9/29/20162 Acknowledgements & Copyright Notice These slides are built on top of slides developed by Michael Boehlen. Moreover, some material (text, figures, examples) displayed in these slides is courtesy of Kurt Ranalter. Some examples displayed in these slides are taken from [Cormen, Leiserson, Rivest and Stein, ``Introduction to Algorithms'', MIT Press], and their copyright is detained by the authors. All the other material is copyrighted by Roberto Sebastiani. Every commercial use of this material is strictly forbidden by the copyright laws without the authorization of the authors. No copy of these slides can be displayed in public or be publicly distributed without containing this copyright notice.Michael Boehlen

M. Böhlen and R. Sebastiani 9/29/20163 Data Structures and Algorithms Week 7 1. Dictionaries 2. Hashing 3. Hash Functions 4. Collisions 5. Performance Analysis

M. Böhlen and R. Sebastiani 9/29/20164 Data Structures and Algorithms Week 7 1. Dictionaries 2. Hashing 3. Hash Functions 4. Collisions 5. Performance Analysis

M. Böhlen and R. Sebastiani 9/29/20165 Dictionary ● Dictionary – a dynamic data structure with methods: – Search(S, k) – an access operation that returns a pointer x to an element where x.key = k – Insert(S, x) – a manipulation operation that adds the element pointed to by x to S – Delete(S, x) – a manipulation operation that removes the element pointed to by x from S ● An element has a key part and a satellite data part.

M. Böhlen and R. Sebastiani 9/29/20166 Dictionaries ● Dictionaries store elements so that they can be located quickly using keys. ● A dictionary may hold bank accounts. – Each account is an object that is identified by an account number. – Each account stores a lot of additional information. – An application wishing to operate on an account would have to provide the account number as a search key.

M. Böhlen and R. Sebastiani 9/29/20167 Dictionaries/2 ● If order (methods such as min, max, successor, predecessor) is not required it is enough to check for equality. ● Operations that require ordering are still possible but cannot use the dictionary access structure. – Usually all elements must be compared, which is slow. – Can be OK if it is rare enough

M. Böhlen and R. Sebastiani 9/29/20168 Dictionaries/3 ● Different data structures to realize dictionaries – arrays – linked lists – Hash tables – Binary trees – Red/Black trees – B-trees ● In Java: – java.util.Map – interface defining Dictionary ADT

M. Böhlen and R. Sebastiani 9/29/20169 Data Structures and Algorithms Week 7 1. Dictionaries 2. Hashing 3. Hash Functions 4. Collisions 5. Performance Analysis

M. Böhlen and R. Sebastiani 9/29/ The Problem ● AT&T is a large phone company, and they want to provide caller ID capability: – given a phone number, return the caller’s name – phone numbers range from 0 to r = – want to do this as efficiently as possible

M. Böhlen and R. Sebastiani 9/29/ The Problem/2 ● Two suboptimal ways to design this dictionary – direct addressing: an array indexed by key: ● Requires O(1) time, ● Requires O(r) space - huge amount of wasted space – a linked list: requires O(n) time, O(n) space (null) Jens(null) Jens Ole

M. Böhlen and R. Sebastiani 9/29/ Another Solution: Hashing ● We can do better, with a Hash table of size m. ● Like an array, but with a function to map the large range into one which we can manage. ● e.g., take the original key, modulo the (relatively small) size of the table, and use that as an index ● Insert ( , Jens) into a hash table with, say, five slots (m = 5) ● mod 5 = 4 ● O(1) expected time, O(n+m) space Jens(null)

M. Böhlen and R. Sebastiani 9/29/ Data Structures and Algorithms Week 7 1. Dictionaries 2. Hashing 3. Hash Functions 4. Collisions 5. Performance Analysis

M. Böhlen and R. Sebastiani 9/29/ Hash Functions ● Need to choose a good hash function (HF) – quick to compute – distributes keys uniformly throughout the table ● How to deal with hashing non-integer keys: – find some way of turning the keys into integers ● in our example, remove the hyphen in to get ● for a string, add up the ASCII values of the characters of your string (e.g., java.lang.String.hashCode()) – then use a standard hash function on the integers.

M. Böhlen and R. Sebastiani 9/29/ HF: Division Method ● Use the remainder: h(k) = k mod m – k is the key, m the size of the table ● Need to choose m ● m = b e (bad) – if m is a power of 2, h(k) gives the e least significant bits of k – all keys with the same ending go to the same place ● m prime (good) – helps ensure uniform distribution – primes not too close to exact powers of 2 are best

M. Böhlen and R. Sebastiani 9/29/ HF: Division Method/2 ● Example 1 – hash table for n = 2000 character strings, ok to investigate an average of three attempts/search – m = 701 ● a prime near 2000/3 ● but not near any power of 2 ● Further examples – m = 13 ● h(3) = 3 ● h(12) = 12 ● h(13) = 0

M. Böhlen and R. Sebastiani 9/29/ HF: Multiplication Method ● Use h(k) =  m (k A mod 1)  – k is the key – m the size of the table – A is a constant 0 < A < 1 – (k A mod 1): the fractional part of k A ● The steps involved – map 0...k max into 0...k max A – take the fractional part (mod 1) – map it into 0...m-1

M. Böhlen and R. Sebastiani 9/29/ HF: Multiplication Method/2 ● Choice of m and A – Value of m is not critical (typically use for some p m = 2 p – Optimal choice of A depends on the characteristics of the data ● Knuth says use =

M. Böhlen and R. Sebastiani 9/29/ HF: Multiplication Method/3 ● Assume 7-bit binary keys, 0  k < 128 ● m = 64 = 2 6, p = 6 ● A = 89/128 = , k = 107 = ● Computation of h(k): A k kA kA mod m(kA mod 1) ● Thus, h(k) = 25

M. Böhlen and R. Sebastiani 9/29/ Data Structures and Algorithms Week 7 1. Dictionaries 2. Hashing 3. Hash Functions 4. Collisions 5. Performance Analysis

M. Böhlen and R. Sebastiani 9/29/ Collisions ● Assume a key is mapped to an already occupied table location – what to do? ● Use a collision handling technique ● 3 techniques to deal with collisions: – chaining – open addressing/linear probing – open addressing/double hashing

M. Böhlen and R. Sebastiani 9/29/ Chaining ● Chaining maintains a table of links, indexed by the keys, to lists of items with the same key

M. Böhlen and R. Sebastiani 9/29/ Open Addressing ● All elements are stored in the hash table (can fill up), i.e., n  m ● Each table entry contains either an element or null ● When searching for an element, systematically probe table slots ● Modify hash function to take probe number i as second parameter h: U x { 0, 1,..., m-1 } -> { 0, 1,..., m-1 }

M. Böhlen and R. Sebastiani 9/29/ Open Addressing/2 ● Hash function, h, determines the sequence of slots examined for a given key ● Probe sequence for a given key k given by ( h(k,0), h(k,1),..., h(k,m-1) ) a permutation of ( 0, 1,..., m-1 )

M. Böhlen and R. Sebastiani 9/29/ Linear Probing LinearProbingInsert(k) 01 if (table is full) error 02 probe = h(k) 03 while (table[probe] occupied) 04 probe = (probe+1) mod m 05 table[probe] = k ● If the current location is used, try the next table location: ● h(key,i) = (h1(key)+i) mod m ● Lookups walk along the table until the key or an empty slot is found ● Uses less memory than chaining – one does not have to store all those links ● Slower than chaining – one might have to probe the table for a long time

M. Böhlen and R. Sebastiani 9/29/ Linear Probing/2 ● Problem “primary clustering”: long lines of occupied slots – A slot preceded by i full slots has a high probability of getting filled: (i+1)/m ● Alternatives: (quadratic probing,) double hashing ● Example: – h(k) = k mod 13 – insert keys:

M. Böhlen and R. Sebastiani 9/29/ Double Hashing ● Use two hash functions: ● h(key,i) = (h1(key) + i*h2(key)) mod m, i=0,1,... DoubleHashingInsert(k) 01 if (table is full) error 02 probe = h1(k) 03 offset = h2(k) 03 while (table[probe] occupied) 04 probe = (probe+offset) mod m 05 table[probe] = k ● Distributes keys much more uniformly than linear probing.

M. Böhlen and R. Sebastiani 9/29/ Double Hashing/2 ● h2(k) must be relative prime to m to search the entire hash table. – Suppose h2(k) = k*a and m = w*a, a > 1 ● Two ways to ensure this: – m is power of 2, h2(k) is odd – m: prime, h2(k): positive integer < m ● Example – h1(k) = k mod 13, h2(k) = 8 - (k mod 8) – insert keys:

M. Böhlen and R. Sebastiani 9/29/ Open addressing: delete ● Complex to delete from – A slot may be reached from different points ● We cannot simply store “NIL”: we'd loose the information necessary to retrieve other keys – Possible solution: mark the deleted slot as “deleted”, insert also on “deleted” ● Drawback: retrieval time no more depending by load factor: potentially lots of “jumps” on “deleted” slots ● When deletion admitted/frequent, chaining preferred

M. Böhlen and R. Sebastiani 9/29/ Data Structures and Algorithms Week 7 1. Dictionaries 2. Hashing 3. Hash Functions 4. Collisions 5. Performance Analysis

M. Böhlen and R. Sebastiani 9/29/ Analysis of Hashing: ● An element with key k is stored in slot h(k) (instead of slot k without hashing) ● The hash function h maps the universe U of keys into the slots of hash table T[0...m-1] h: U → { 0, 1,..., m-1 } ● Assumption: Each key is equally likely to be hashed into any slot (bucket); simple uniform hashing ● Given hash table T with m slots holding n elements, the load factor is defined as  =n/m

M. Böhlen and R. Sebastiani 9/29/ Analysis of Hashing/2 ● Assume time to compute h(k) is  (1) ● To find an element – using h, look up its position in table T – search for the element in the linked list of the hashed slot – uniform hashing yields an average list length  = n/m – expected number of elements to be examined  – search time O(1+  )

M. Böhlen and R. Sebastiani 9/29/ Analysis of Hashing/3 ● Assuming the number of hash table slots is proportional to the number of elements in the table n = O(m) a = n/m = O(m)/m = O(1) – searching takes constant time on average – insertion takes O(1) worst-case time – deletion takes O(1) worst-case time (pass the element not key, lists are doubly-linked)

M. Böhlen and R. Sebastiani 9/29/ Expected Number of Probes ● Load factor  < 1 for probing ● Analysis of probing uses uniform hashing assumption – any permutation is equally likely ● Chaining: 1( a =0%), 1.5( a =50%), 2( a =100%), n( a =n) ● Probing, unsucc: 1.25( a =20%), 2( a =50%), 5( a =80%), 10( a =90%) ● Probing, succ: 0.28( a =20%), 1.39( a =50%), 2.01( a =80%), 2.56( a =90%)

M. Böhlen and R. Sebastiani 9/29/ Expected Number of Probes/2

M. Böhlen and R. Sebastiani 9/29/ Summary ● Hashing is very efficient (not obvious, probability theory). ● Its functionality is limited (printing elements sorted according to key is not supported). ● The size of the hash table may not be easy to determine. ● A hash table is not really a dynamic data structure.

M. Böhlen and R. Sebastiani 9/29/ Suggested exercises ● Implement a Hash Table with the different techniques ● With paper & pencil, draw the evolution of a hash table when inserting, deleting and searching for new element, with the different techniques ● See also exercises of CLRS

M. Böhlen and R. Sebastiani 9/29/ Next Week ● Graphs: – Representation in memory – Breadth-first search – Depth-first search – Topological sort