October 6, 20031 Algorithms and Data Structures Lecture VII Simonas Šaltenis Aalborg University

Slides:



Advertisements
Similar presentations
Chapter 11. Hash Tables.
Advertisements

Hash Tables CIS 606 Spring 2010.
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
Hashing as a Dictionary Implementation
What we learn with pleasure we never forget. Alfred Mercier Smitha N Pai.
CS 253: Algorithms Chapter 11 Hashing Credit: Dr. George Bebis.
Hashing Chapters What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function.
September 26, Algorithms and Data Structures Lecture VI Simonas Šaltenis Nykredit Center for Database Research Aalborg University
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
1 Foundations of Software Design Fall 2002 Marti Hearst Lecture 18: Hash Tables.
CS Section 600 CS Section 002 Dr. Angela Guercio Spring 2010.
Hashing CS 3358 Data Structures.
1.1 Data Structure and Algorithm Lecture 9 Hashing Topics Reference: Introduction to Algorithm by Cormen Chapter 12: Hash Tables.
HASHING CSC 172 SPRING 2002 LECTURE 22. Hashing A cool way to get from an element x to the place where x can be found An array [0..B-1] of buckets Bucket.
REPRESENTING SETS CSC 172 SPRING 2002 LECTURE 21.
Hashing COMP171 Fall Hashing 2 Hash table * Support the following operations n Find n Insert n Delete. (deletions may be unnecessary in some applications)
Tirgul 7. Find an efficient implementation of a dynamic collection of elements with unique keys Supported Operations: Insert, Search and Delete. The keys.
COMP 171 Data Structures and Algorithms Tutorial 10 Hash Tables.
Lecture 10: Search Structures and Hashing
Hashing General idea: Get a large array
Introducing Hashing Chapter 21 Copyright ©2012 by Pearson Education, Inc. All rights reserved.
CS 206 Introduction to Computer Science II 04 / 06 / 2009 Instructor: Michael Eckmann.
1. 2 Problem RT&T is a large phone company, and they want to provide enhanced caller ID capability: –given a phone number, return the caller’s name –phone.
Hashtables David Kauchak cs302 Spring Administrative Talk today at lunch Midterm must take it by Friday at 6pm No assignment over the break.
Spring 2015 Lecture 6: Hash Tables
Symbol Tables Symbol tables are used by compilers to keep track of information about variables functions class names type names temporary variables etc.
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture8.
Hashing Table Professor Sin-Min Lee Department of Computer Science.
Implementing Dictionaries Many applications require a dynamic set that supports dictionary-type operations such as Insert, Delete, and Search. E.g., a.
1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table.
1 HashTable. 2 Dictionary A collection of data that is accessed by “key” values –The keys may be ordered or unordered –Multiple key values may/may-not.
Data Structures Hash Tables. Hashing Tables l Motivation: symbol tables n A compiler uses a symbol table to relate symbols to associated data u Symbols:
David Luebke 1 10/25/2015 CS 332: Algorithms Skip Lists Hash Tables.
Comp 335 File Structures Hashing.
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
Hashing Sections 10.2 – 10.3 CS 302 Dr. George Bebis.
Search  We’ve got all the students here at this university and we want to find information about one of the students.  How do we do it?  Linked List?
Hashing Hashing is another method for sorting and searching data.
Hashing as a Dictionary Implementation Chapter 19.
Searching Given distinct keys k 1, k 2, …, k n and a collection of n records of the form »(k 1,I 1 ), (k 2,I 2 ), …, (k n, I n ) Search Problem - For key.
Chapter 12 Hash Table. ● So far, the best worst-case time for searching is O(log n). ● Hash tables  average search time of O(1).  worst case search.
David Luebke 1 11/26/2015 Hash Tables. David Luebke 2 11/26/2015 Hash Tables ● Motivation: Dictionaries ■ Set of key/value pairs ■ We care about search,
Data Structures and Algorithms Lecture (Searching) Instructor: Quratulain Date: 4 and 8 December, 2009 Faculty of Computer Science, IBA.
1 Hashing - Introduction Dictionary = a dynamic set that supports the operations INSERT, DELETE, SEARCH Dictionary = a dynamic set that supports the operations.
Hashing 8 April Example Consider a situation where we want to make a list of records for students currently doing the BSU CS degree, with each.
Chapter 5: Hashing Part I - Hash Tables. Hashing  What is Hashing?  Direct Access Tables  Hash Tables 2.
CSC 172 DATA STRUCTURES. SETS and HASHING  Unadvertised in-store special: SETS!  in JAVA, see Weiss 4.8  Simple Idea: Characteristic Vector  HASHING...The.
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
Tirgul 11 Notes Hash tables –reminder –examples –some new material.
Introduction to Algorithms 6.046J/18.401J LECTURE7 Hashing I Direct-access tables Resolving collisions by chaining Choosing hash functions Open addressing.
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
Hashing 1 Hashing. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
Midterm Midterm is Wednesday next week ! The quiz contains 5 problems = 50 min + 0 min more –Master Theorem/ Examples –Quicksort/ Mergesort –Binary Heaps.
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
Hashtables David Kauchak cs302 Spring Administrative Midterm must take it by Friday at 6pm No assignment over the break.
CS6045: Advanced Algorithms Data Structures. Hashing Tables Motivation: symbol tables –A compiler uses a symbol table to relate symbols to associated.
CMSC 341 Hashing Readings: Chapter 5. Announcements Midterm II on Nov 7 Review out Oct 29 HW 5 due Thursday CMSC 341 Hashing 2.
CSC 413/513: Intro to Algorithms Hash Tables. ● Hash table: ■ Given a table T and a record x, with key (= symbol) and satellite data, we need to support:
TOPIC 5 ASSIGNMENT SORTING, HASH TABLES & LINKED LISTS Yerusha Nuh & Ivan Yu.
M. Böhlen and R. Sebastiani 9/29/20161 Data Structures and Algorithms Roberto Sebastiani
1 What is it? A side order for your eggs? A form of narcotic intake? A combination of the two?
CSC 172 DATA STRUCTURES.
Hash table CSC317 We have elements with key and satellite data
Hashing CSE 2011 Winter July 2018.
CS 332: Algorithms Hash Tables David Luebke /19/2018.
Algorithms and Data Structures Lecture VI
Hashing Sections 10.2 – 10.3 Lecture 26 CS302 Data Structures
EE 312 Software Design and Implementation I
Hashing.
EE 312 Software Design and Implementation I
Presentation transcript:

October 6, Algorithms and Data Structures Lecture VII Simonas Šaltenis Aalborg University

October 6, This Lecture Dictionary ADT Hashing the concept collision resolution chooing a hash function advanced collision resolution

October 6, Dictionary Dictionary ADT – a dynamic set with methods: Search(S, k) – an access operation that returns a pointer x to an element where x.key = k Insert(S, x) – a manipulation operation that adds the element pointed to by x to S Delete(S, x) – a manipulation operation that removes the element pointed to by x from S An element has a key part and a satellite data part

October 6, Dictionaries Dictionaries store elements so that they can be located quickly using keys A dictionary may hold bank accounts each account is an object that is identified by an account number each account stores a wealth of additional information an application wishing to operate on an account would have to provide the account number as a search key

October 6, Dictionaries (2) Supporting order (methods such as min, max, successor, predecessor ) is not required, thus it is enough that keys are comparable for equality

October 6, Dictionaries (3) Different data structures to realize dictionaries arrays, linked lists (inefficient) Hash tables Binary trees Red/Black trees AVL trees B-trees In Java: java.util.Map – interface defining Dictionary ADT

October 6, The Problem RT&T is a large phone company, and they want to provide caller ID capability: given a phone number, return the caller’s name phone numbers range from 0 to r = want to do this as efficiently as possible

October 6, The Problem A few suboptimal ways to design this dictionary direct addressing: an array indexed by key: takes O(1) time, O(r) space - huge amount of wasted space a linked list: takes O(r) time, O(r) space (null) Jens Jensen (null) Jens Jensen Ole Olsen

October 6, Another Solution We can do better, with a Hash table -- O(1) expected time, O(n+m) space, where m is table size Like an array, but come up with a function to map the large range into one which we can manage e.g., take the original key, modulo the (relatively small) size of the array, and use that as an index Insert ( , Jens Jensen) into a hashed array with, say, five slots mod 5 = 4 (null) Jens Jensen 01234

October 6, Another Solution (2) A lookup uses the same process: hash the query key, then check the array at that slot Insert ( , Leif Erikson) And insert ( , Knut Hermandsen). (null) Jens Jensen 01234

October 6, Collision Resolution How to deal with two keys which hash to the same spot in the array? Use chaining Set up an array of links (a table), indexed by the keys, to lists of items with the same key Most efficient (time-wise) collision resolution scheme

October 6, Analysis of Hashing An element with key k is stored in slot h(k) (instead of slot k without hashing) The hash function h maps the universe U of keys into the slots of hash table T[0...m-1] Assumption: Each key is equally likely to be hashed into any slot (bucket); simple uniform hashing Given hash table T with m slots holding n elements, the load factor is defined as  =n/m Assume time to compute h(k) is  (1)

October 6, Analysis of Hashing (2) To find an element using h, look up its position in table T search for the element in the linked list of the hashed slot Unsuccessful search element is not in the linked list uniform hashing yields an average list length  = n/m expected number of elements to be examined  search time O(1+  ) (this includes computing the hash value)

October 6, Analysis of Hashing (3) Successful search assume that a new element is inserted at the end of the linked list (we can have a tail pointer) upon insertion of the i-th element, the expected length of the list is (i-1)/m in case of a successful search, the expected number of elements examined is 1 more that the number of elements examined when the sought-for element was inserted!

October 6, Analysis of Hashing (4) The expected number of elements examined is thus Considering the time for computing the hash function, we obtain

October 6, Analysis of Hashing (5) Assuming the number of hash table slots is proportional to the number of elements in the table searching takes constant time on average insertion takes O(1) worst-case time deletion takes O(1) worst-case time when the lists are doubly-linked

October 6, Hash Functions Need to choose a good hash function quick to compute distributes keys uniformly throughout the table good hash functions are very rare – birthday paradox How to deal with hashing non-integer keys: find some way of turning the keys into integers in our example, remove the hyphen in to get ! for a string, add up the ASCII values of the characters of your string (e.g., java.lang.String.hashCode()) then use a standard hash function on the integers

October 6, HF: Division Method Use the remainder: h(k) = k mod m k is the key, m the size of the table Need to choose m m = b e (bad) if m is a power of 2, h(k) gives the e least significant bits of k all keys with the same ending go to the same place m prime (good) helps ensure uniform distribution primes not too close to exact powers of 2

October 6, HF: Division Method (2) Example 1 hash table for n = 2000 character strings we don’t mind examining 3 elements m = 701 a prime near 2000/3 but not near any power of 2 Further examples m = 13 h(3)? h(12)? h(13)? h(47)?

October 6, HF: Multiplication Method Use h(k) =  m (k A mod 1)  k is the key, m the size of the table, and A is a constant 0 < A < 1 The steps involved map 0...k max into 0...k max A take the fractional part (mod 1) map it into 0...m-1

October 6, HF: Multiplication Method(2) Choice of m and A value of m is not critical, typically use m = 2 p optimal choice of A depends on the characteristics of the data Knuth says use (conjugate of the golden ratio) – Fibonaci hashing

October 6, HF: Multiplication Method (3) Assume 7-bit binary keys, 0  k < 128 m = 8 = 2 3 p = 3 A = = 89/128 k = (107) Using binary representation Thus, h(k) = ?

October 6, More on Collisions A key is mapped to an already occupied table location what to do?!? Use a collision handling technique We’ve seen Chaining Can also use Open Addressing Probing Double Hashing

October 6, Open Addressing All elements are stored in the hash table (can fill up!), i.e., n  m Each table entry contains either an element or null When searching for an element, systematically probe table slots Modify hash function to take the probe number i as the second parameter Hash function, h, determines the sequence of slots examined for a given key

October 6, Uniform Hashing Probe sequence for a given key k given by The assumption of uniform hashing: Each key is equally likely to have any of the m! permutations as its probe sequence

October 6, Linear Probing If the current location is used, try the next table location Lookups walk along the table until the key or an empty slot is found Uses less memory than chaining one does not have to store all those links Slower than chaining one might have to walk along the table for a long time LinearProbingInsert(k) 01 if (table is full) error 02 probe = h(k) 03 while (table[probe] occupied) 04 probe = (probe+1) mod m 05 table[probe] = k

October 6, Linear Probing A real pain to delete from either mark the deleted slot or fill in the slot by shifting some elements down Example h(k) = k mod 13 insert keys:

October 6, Double Hashing Use two hash functions Distributes keys more uniformly than linear probing DoubleHashingInsert(k) 01 if (table is full) error 02 probe = h1(k) 03 offset = h2(k) 03 while (table[probe] occupied) 04 probe = (probe+offset) mod m 05 table[probe] = k

October 6, Double Hashing (2) h2(k) must be relative prime to m Why? Suppose h2(k) = v*a, and m = w*a, where a > 1 A way to ensure this m – prime, h2(k) – positive integer < m Example h1(k) = k mod 13 h2(k) = 8 - k mod 8 insert keys:

October 6, Expected Number of Probes Load factor  < 1 for probing Analysis of probing uses uniform hashing assumpion – any permutation is equally likely How far are linear probing and double hashing from uniform hashing? unsuccessfulsuccessful chaining probing

October 6, Expected Number of Probes (2)

October 6, The Hashing Class in Java java.util.HashSet, java.util.HashMap Old version of HashMap: java.util.Hashtable Modification methods HashSet: boolean add(Object o) boolean remove(Object o) HashMap: Object put(Object key, Object value) Object remove(Object key)

October 6, The Hashing Class in Java (2) Accessor methods: HashSet boolean add(Object o) HashMap boolean containsKey(Object key)

October 6, Next Week Binary Search Trees