Dictionaries Again Collection of pairs.  (key, element)  Pairs have different keys. Operations.  Search(theKey)  Delete(theKey)  Insert(theKey, theElement)

Hash Tables Worst-case time for Search, Insert, and Delete is O(size). Expected time is O(1).

Ideal Hashing Uses a 1D array (or table) table[0:b-1]. Each position of this array is a bucket. A bucket can normally hold only one dictionary pair. Uses a hash function f that converts each key k into an index in the range [0, b-1]. f(k) is the home bucket for key k. Every dictionary pair (key, element) is stored in its home bucket table[f(key)].

Ideal Hashing Example Pairs are: (22,a), (33,c), (3,d), (73,e), (85,f). Hash table is table[0:7], b = 8. Hash function is key/11 (integer division). Pairs are stored in the table as below:
[0] (3,d)   [1] empty   [2] (22,a)   [3] (33,c)   [4] empty   [5] empty   [6] (73,e)   [7] (85,f)
Search, Insert, and Delete take O(1) time.
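The example can be sketched in C as follows. This is a minimal illustration, not the slides' own implementation: the names Bucket, homeBucket, put, and get are hypothetical, the element is assumed to be a single char, and keys are assumed to lie in the range 0 to 87 so that key/11 is a valid index.

```c
#include <stdbool.h>
#include <stddef.h>

#define B 8   /* number of buckets, b = 8 as in the example */

/* One bucket holds at most one dictionary pair. */
typedef struct {
    bool used;      /* true once a pair has been stored here */
    int  key;
    char element;
} Bucket;

/* Hash function from the example: integer division by 11. */
static int homeBucket(int key) { return key / 11; }

/* Store a pair in its home bucket.
   Assumes ideal hashing: the home bucket is free and 0 <= key <= 87. */
static void put(Bucket table[B], int key, char element) {
    Bucket *b = &table[homeBucket(key)];
    b->used = true;
    b->key = key;
    b->element = element;
}

/* Return a pointer to the element stored under key, or NULL if absent.
   Only the home bucket is examined, so the cost is O(1). */
static const char *get(const Bucket table[B], int key) {
    const Bucket *b = &table[homeBucket(key)];
    return (b->used && b->key == key) ? &b->element : NULL;
}
```

Calling put for the five pairs and then get(table, 73) examines bucket 6 only, which is why each operation takes constant time.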

What Can Go Wrong? Where does (26,g) go? Keys that have the same home bucket are synonyms. 22 and 26 are synonyms with respect to the hash function that is in use: 22/11 = 26/11 = 2. The home bucket for (26,g), table[2], is already occupied by (22,a).

What Can Go Wrong? A collision occurs when the home bucket for a new pair is occupied by a pair with a different key. An overflow occurs when there is no space in the home bucket for the new pair. When a bucket can hold only one pair, collisions and overflows occur together. Need a method to handle overflows.
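A rough sketch of how an insert might detect this situation is given below. The names Slot, InsertResult, and insertPair are hypothetical, the hash function is f(k) = k/11 from the example (so keys are assumed to be in the range 0 to 87), and replacing the element when the same key is inserted again is an assumption of this sketch rather than something the slides specify.

```c
#include <stdbool.h>

#define NUM_BUCKETS 8

typedef struct {
    bool used;
    int  key;
    char element;
} Slot;

typedef enum { INSERT_OK, INSERT_UPDATED, INSERT_OVERFLOW } InsertResult;

/* Try to insert (key, element) into a table whose buckets hold one pair each.
   Assumes 0 <= key <= 87 so that key / 11 is a valid index. */
static InsertResult insertPair(Slot table[NUM_BUCKETS], int key, char element) {
    Slot *home = &table[key / 11];

    if (home->used && home->key != key) {
        /* Collision: the home bucket holds a pair with a different key.
           With one pair per bucket this is also an overflow, and this
           sketch has no overflow handling, so the pair is rejected. */
        return INSERT_OVERFLOW;
    }

    /* Assumption: inserting an existing key replaces its element. */
    InsertResult result = home->used ? INSERT_UPDATED : INSERT_OK;
    home->used = true;
    home->key = key;
    home->element = element;
    return result;
}
```

With (22,a) already stored, insertPair(table, 26, 'g') reports INSERT_OVERFLOW because 26 and 22 share home bucket 2.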

Hash Table Issues Choice of hash function. Overflow handling method. Size (number of buckets) of hash table.

Hash Functions Two parts: (1) Convert the key into a nonnegative integer in case the key is not an integer. Done by the function hash(). (2) Map that integer into a home bucket. f(k) is an integer in the range [0, b-1], where b is the number of buckets in the table.

String To Integer Each character is 1 byte long. An int is 4 bytes. A 2 character string key[0:1] may be converted into a unique 4 byte non-negative int using the code: number = key[0]; number += ((int) key[1]) << 8; Strings that are longer than 4 characters do not have a unique non-negative int representation.

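String To Integer
    unsigned int stringToInt(const char *key)
    {
        /* Accumulate in an unsigned int so the result cannot be negative. */
        unsigned int number = 0;
        while (*key) {
            /* Low byte: convert via unsigned char to avoid sign extension. */
            number += (unsigned char) *key++;
            /* High byte: the next character, if any, shifted left 8 bits. */
            if (*key)
                number += ((unsigned int) (unsigned char) *key++) << 8;
        }
        return number;
    }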

Map Into A Home Bucket Most common method is by division. homeBucket = hash(theKey) % divisor; divisor equals number of buckets b. 0 <= homeBucket < divisor = b
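The two parts can be combined as in the sketch below. It restates the String To Integer conversion with unsigned arithmetic so that the block is self-contained, and then applies division. The name homeBucketOf is hypothetical, and the caller is assumed to pass a divisor greater than zero.

```c
/* Part 1: fold a string into a nonnegative integer, two characters
   at a time, in the style of the String To Integer slide. */
static unsigned int stringToInt(const char *key) {
    unsigned int number = 0;
    while (*key) {
        number += (unsigned char) *key++;
        if (*key)
            number += ((unsigned int) (unsigned char) *key++) << 8;
    }
    return number;
}

/* Part 2: map the integer into a home bucket by division.
   The result always satisfies 0 <= homeBucket < divisor.
   Assumes divisor > 0. */
static unsigned int homeBucketOf(const char *key, unsigned int divisor) {
    return stringToInt(key) % divisor;
}
```

For the key "ab", stringToInt yields 'a' + ('b' << 8), and homeBucketOf("ab", 8) is that value modulo 8.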

Uniform Hash Function Let keySpace be the set of all possible keys. A uniform hash function maps the keys in keySpace into buckets such that approximately the same number of keys get mapped into each bucket.

Uniform Hash Function Equivalently, the probability that a randomly selected key has bucket i as its home bucket is 1/b, 0 <= i < b. A uniform hash function minimizes the likelihood of an overflow when keys are selected at random.

Hashing By Division keySpace = all ints. For every b, the number of ints that get mapped (hashed) into bucket i is approximately 2^32/b. Therefore, the division method results in a uniform hash function when keySpace = all ints. In practice, keys tend to be correlated. So, the choice of the divisor b affects the distribution of home buckets.
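The uniformity claim can be checked on a small scale. The sketch below counts home buckets for the contiguous keys 0 to numKeys-1 instead of all 2^32 ints, which is an assumption made only to keep the run short. The name countHomeBuckets is hypothetical, and the caller is assumed to supply an array of at least b counters with b > 0.

```c
/* Count how many of the keys 0, 1, ..., numKeys-1 have each bucket
   as their home bucket under homeBucket = key % b.
   counts must have room for b entries; assumes b > 0. */
static void countHomeBuckets(unsigned int numKeys, unsigned int b,
                             unsigned int counts[]) {
    for (unsigned int i = 0; i < b; i++)
        counts[i] = 0;
    for (unsigned int key = 0; key < numKeys; key++)
        counts[key % b]++;
}
```

For 1000 keys and b = 7, buckets 0 through 5 each receive 143 keys and bucket 6 receives 142, so the counts differ by at most one. This holds only because the keys are evenly spread, which is exactly what correlated real-world keys fail to be.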

Selecting The Divisor Because of this correlation, applications tend to have a bias towards keys that map into odd integers (or into even ones). When the divisor is an even number, odd integers hash into odd home buckets and even integers into even home buckets. Even keys: 20%14 = 6, 30%14 = 2, 8%14 = 8. Odd keys: 15%14 = 1, 3%14 = 3, 23%14 = 9. The bias in the keys results in a bias toward either the odd or even home buckets.

Selecting The Divisor When the divisor is an odd number, odd (even) integers may hash into any home bucket. Even keys: 20%15 = 5, 30%15 = 0, 8%15 = 8. Odd keys: 15%15 = 0, 3%15 = 3, 23%15 = 8. The bias in the keys does not result in a bias toward either the odd or even home buckets. Better chance of uniformly distributed home buckets. So do not use an even divisor.
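The effect of an even versus an odd divisor can be illustrated by hashing a key set that is biased toward even integers. The sketch below is one way to measure it. The name bucketsHitByEvenKeys and the bound MAX_DIVISOR are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>

#define MAX_DIVISOR 64   /* hypothetical upper bound, for this sketch only */

/* Hash the even keys 0, 2, 4, ..., 2*(numKeys-1) by division and
   return how many distinct home buckets receive at least one key.
   Returns -1 if divisor is outside [1, MAX_DIVISOR]. */
static int bucketsHitByEvenKeys(int numKeys, int divisor) {
    bool hit[MAX_DIVISOR] = { false };
    int distinct = 0;

    if (divisor < 1 || divisor > MAX_DIVISOR)
        return -1;

    for (int i = 0; i < numKeys; i++) {
        int home = (2 * i) % divisor;
        if (!hit[home]) {
            hit[home] = true;
            distinct++;
        }
    }
    return distinct;
}
```

With 100 even keys, divisor 14 reaches only the 7 even-numbered buckets, while divisor 15 reaches all 15 buckets.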

Selecting The Divisor A similar biased distribution of home buckets is seen, in practice, when the divisor is a multiple of small prime numbers such as 3, 5, 7, … The effect of each prime divisor p of b decreases as p gets larger. Ideally, choose b so that it is a prime number. Alternatively, choose b so that it has no prime factor smaller than 20.
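The second rule (no prime factor smaller than 20) is simple to apply mechanically. The sketch below searches upward from a requested minimum number of buckets. The names hasSmallPrimeFactor and chooseDivisor are hypothetical, and the sketch assumes the minimum is far enough below UINT_MAX that the search does not wrap around.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if b is divisible by some prime smaller than 20. */
static bool hasSmallPrimeFactor(unsigned int b) {
    static const unsigned int smallPrimes[] = { 2, 3, 5, 7, 11, 13, 17, 19 };
    for (size_t i = 0; i < sizeof smallPrimes / sizeof smallPrimes[0]; i++) {
        if (b % smallPrimes[i] == 0)
            return true;
    }
    return false;
}

/* Return the smallest b >= minBuckets (and >= 2) that has no prime
   factor smaller than 20. The result is not necessarily prime. */
static unsigned int chooseDivisor(unsigned int minBuckets) {
    unsigned int b = minBuckets < 2 ? 2 : minBuckets;
    while (hasSmallPrimeFactor(b))
        b++;
    return b;
}
```

For example, chooseDivisor(8) returns 23 and chooseDivisor(100) returns 101, both prime, whereas chooseDivisor(524) returns 529 = 23 * 23, which satisfies the rule without being prime.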