Lecture 12COMPSCI.220.FS.T - 20041 Symbol Table and Hashing A ( symbol) table is a set of table entries, ( K,V) Each entry contains: –a unique key, K,

Slides:



Advertisements
Similar presentations
Part II Chapter 8 Hashing Introduction Consider we may perform insertion, searching and deletion on a dictionary (symbol table). Array Linked list Tree.
Advertisements

Skip List & Hashing CSE, POSTECH.
Data Structures Using C++ 2E
Hashing as a Dictionary Implementation
What we learn with pleasure we never forget. Alfred Mercier Smitha N Pai.
© 2004 Goodrich, Tamassia Hash Tables1  
Nov 12, 2009IAT 8001 Hash Table Bucket Sort. Nov 12, 2009IAT 8002  An array in which items are not stored consecutively - their place of storage is calculated.
Hashing Techniques.
Hashing CS 3358 Data Structures.
1 Chapter 9 Maps and Dictionaries. 2 A basic problem We have to store some records and perform the following: add new record add new record delete record.
11.Hash Tables Hsu, Lih-Hsing. Computer Theory Lab. Chapter 11P Directed-address tables Direct addressing is a simple technique that works well.
© 2006 Pearson Addison-Wesley. All rights reserved13 A-1 Chapter 13 Hash Tables.
Hash Tables1 Part E Hash Tables  
Hash Tables1 Part E Hash Tables  
Hashing COMP171 Fall Hashing 2 Hash table * Support the following operations n Find n Insert n Delete. (deletions may be unnecessary in some applications)
Hash Tables1 Part E Hash Tables  
Tirgul 7. Find an efficient implementation of a dynamic collection of elements with unique keys Supported Operations: Insert, Search and Delete. The keys.
Hashing General idea: Get a large array
Data Structures Using C++ 2E Chapter 9 Searching and Hashing Algorithms.
Introducing Hashing Chapter 21 Copyright ©2012 by Pearson Education, Inc. All rights reserved.
© 2006 Pearson Addison-Wesley. All rights reserved13 B-1 Chapter 13 (excerpts) Advanced Implementation of Tables CS102 Sections 51 and 52 Marc Smith and.
CS 206 Introduction to Computer Science II 04 / 06 / 2009 Instructor: Michael Eckmann.
1. 2 Problem RT&T is a large phone company, and they want to provide enhanced caller ID capability: –given a phone number, return the caller’s name –phone.
Spring 2015 Lecture 6: Hash Tables
Symbol Tables Symbol tables are used by compilers to keep track of information about variables functions class names type names temporary variables etc.
CS212: DATA STRUCTURES Lecture 10:Hashing 1. Outline 2  Map Abstract Data type  Map Abstract Data type methods  What is hash  Hash tables  Bucket.
Hashing Table Professor Sin-Min Lee Department of Computer Science.
Hashing Chapter 20. Hash Table A hash table is a data structure that allows fast find, insert, and delete operations (most of the time). The simplest.
Implementing Dictionaries Many applications require a dynamic set that supports dictionary-type operations such as Insert, Delete, and Search. E.g., a.
© 2006 Pearson Addison-Wesley. All rights reserved13 B-1 Chapter 13 (continued) Advanced Implementation of Tables.
1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table.
1 Hash table. 2 A basic problem We have to store some records and perform the following:  add new record  delete record  search a record by key Find.
CS121 Data Structures CS121 © JAS 2004 Tables An abstract table, T, contains table entries that are either empty, or pairs of the form (K, I) where K is.
David Luebke 1 10/25/2015 CS 332: Algorithms Skip Lists Hash Tables.
Hashing Sections 10.2 – 10.3 CS 302 Dr. George Bebis.
1 HASHING Course teacher: Moona Kanwal. 2 Hashing Mathematical concept –To define any number as set of numbers in given interval –To cut down part of.
Hashing Hashing is another method for sorting and searching data.
HASHING PROJECT 1. SEARCHING DATA STRUCTURES Consider a set of data with N data items stored in some data structure We must be able to insert, delete.
Hashing as a Dictionary Implementation Chapter 19.
Searching Given distinct keys k 1, k 2, …, k n and a collection of n records of the form »(k 1,I 1 ), (k 2,I 2 ), …, (k n, I n ) Search Problem - For key.
Chapter 12 Hash Table. ● So far, the best worst-case time for searching is O(log n). ● Hash tables  average search time of O(1).  worst case search.
March 23 & 28, Csci 2111: Data and File Structures Week 10, Lectures 1 & 2 Hashing.
March 23 & 28, Hashing. 2 What is Hashing? A Hash function is a function h(K) which transforms a key K into an address. Hashing is like indexing.
1 Hashing - Introduction Dictionary = a dynamic set that supports the operations INSERT, DELETE, SEARCH Dictionary = a dynamic set that supports the operations.
Hashing 8 April Example Consider a situation where we want to make a list of records for students currently doing the BSU CS degree, with each.
Been-Chian Chien, Wei-Pang Yang, and Wen-Yang Lin 8-1 Chapter 8 Hashing Introduction to Data Structure CHAPTER 8 HASHING 8.1 Symbol Table Abstract Data.
Chapter 10 Hashing. The search time of each algorithm depend on the number n of elements of the collection S of the data. A searching technique called.
Hashing Basis Ideas A data structure that allows insertion, deletion and search in O(1) in average. A data structure that allows insertion, deletion and.
Hashing Chapter 7 Section 3. What is hashing? Hashing is using a 1-D array to implement a dictionary o This implementation is called a "hash table" Items.
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
Tirgul 11 Notes Hash tables –reminder –examples –some new material.
Hashing Suppose we want to search for a data item in a huge data record tables How long will it take? – It depends on the data structure – (unsorted) linked.
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
COSC 1030 Lecture 10 Hash Table. Topics Table Hash Concept Hash Function Resolve collision Complexity Analysis.
Chapter 13 C Advanced Implementations of Tables – Hash Tables.
1 Hashing by Adlane Habed School of Computer Science University of Windsor May 6, 2005.
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
TOPIC 5 ASSIGNMENT SORTING, HASH TABLES & LINKED LISTS Yerusha Nuh & Ivan Yu.
Prof. Amr Goneid, AUC1 CSCI 210 Data Structures and Algorithms Prof. Amr Goneid AUC Part 5. Dictionaries(2): Hash Tables.
School of Computer Science and Engineering
Hashing Alexandra Stefan.
Hashing Alexandra Stefan.
Advanced Associative Structures
Hash Table.
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
Hashing Alexandra Stefan.
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
CS202 - Fundamental Structures of Computer Science II
Hashing Sections 10.2 – 10.3 Lecture 26 CS302 Data Structures
What we learn with pleasure we never forget. Alfred Mercier
Presentation transcript:

Lecture 12COMPSCI.220.FS.T Symbol Table and Hashing A ( symbol) table is a set of table entries, ( K,V) Each entry contains: –a unique key, K, and –a value (information), V Each key uniquely identifies its entry Table searching: –Given: a search key, K –Find: the table entry, (K,V)

Lecture 12COMPSCI.220.FS.T Symbol Table and Hashing Once the entry (K,V) is found, its value V, may be updated, it may be retrieved, or the entire entry, (K,V), may be removed from the table If no entry with key K exists in the table, a new table entry having K as its key may be inserted in the table Hashing is a technique of storing values in the tables and searching for them in linear, O(n), worst-case and extremely fast, O(1), average-case time

Lecture 12COMPSCI.220.FS.T Basic Features of Hashing Hashing computes an integer, called the hash code, for each object The computation, called the hash function, h(K), maps objects (e.g., keys K ) to the array indices (e.g., 0, 1, …, i max ) An object having a key value K should be stored at location h(K ), and the hash function must always return a valid index for the array

Lecture 12COMPSCI.220.FS.T Basic Features of Hashing A perfect hash function produces a different index value for every key. But such a function cannot be always found. Collision: if two distinct keys, K 1  K 2, map to the same table address, h(K 1 ) = h(K 2 ) Collision resolution policy: how to find additional storage in which to store one of the collided table entries

Lecture 12COMPSCI.220.FS.T How Common Are Collisions? Von Mises Birthday Paradox: if there are more than 23 people in a room, the chance is greater than 50% that two or more of them will have the same birthday Thus, in the table that is only 6.3% full (since 23  365 = ) there is better than 50  50 chance of a collision!

Lecture 12COMPSCI.220.FS.T How Common Are Collisions? Probability Q N (n) of no collision (that is, that none of the n items collides, being randomly tossed into a table with N slots):

Lecture 12COMPSCI.220.FS.T Probability P N (n) of One or More Collisions n%P 365 (n)

Lecture 12COMPSCI.220.FS.T Open Addressing with Linear Probing (OALP) The simplest collision resolution policy: –to successively search for the first empty entry at a lower location –if no such entry, then ``wrap around'' the table Drawbacks: clustering of keys in the table

Lecture 12COMPSCI.220.FS.T OALP example: n  5..7 N  10

Lecture 12COMPSCI.220.FS.T Open Addressing with Double Hashing (OADH) Better collision resolution policy reducing the likelihood of clustering: –to hash the collided key again using a different hash function and –to use the result of the second hashing as an increment for probing table locations (including wraparound)

Lecture 12COMPSCI.220.FS.T OADH example: n = 5..7 N = 10

Lecture 12COMPSCI.220.FS.T Two More Collision Resolution Techniques Open addressing has a problem when significant number of items need to be deleted as logically deleted items must remain in the table until the table can be reorganised Two techniques to attenuate this drawback: –Chaining –Hash bucket

Lecture 12COMPSCI.220.FS.T Chaining and Hash Bucket Chaining: all keys collided at a single hash address are placed on a linked list, or chain, started at that address Hash bucket: a big hash table is divided into a number of small sub-tables, or buckets –the hush function maps a key into one of the buckets –the keys are stored in each bucket sequentially in increasing order

Lecture 12COMPSCI.220.FS.T Universal Classes of Hash Functions Universal hashing: a random choice of the hash function from a large class of hash functions in order to avoid bad performance on certain sets of input Let K, N, and H be a key set, a size of the range of the hash function, and a class of functions that map K to 0,…, N  1, respectively. Then H is universal if, for any distinct k,   K, it holds that H is a universal class if no pair of distinct keys collide under more than 1 / N of the functions in the class

Lecture 12COMPSCI.220.FS.T Choosing a hash function Four basic methods: division, folding, middle-squaring, and truncation Division: –choose a prime number as the table size N –convert keys, K, into integers –use the remainder h(K) = K mod N as a hash value of the key K –find a double hashing decrement using the quotient,  K = max{1, (K / N)mod N}

Lecture 12COMPSCI.220.FS.T Choosing a hash function Folding: –divide the integer key, K, into sections –add, subtract, and/or multiply them together for combining into the final value, h(K) Example: K   sections 013, 402, 122  h(K)  = 537

Lecture 12COMPSCI.220.FS.T Choosing a hash function Middle-squaring: –choose a middle section of the integer key, K –square the chosen section –use a middle section of the result as h(K) Example: K =  middle: 402  =  middle: h(K) = 6140

Lecture 12COMPSCI.220.FS.T Choosing a hash function Truncation: –delete part of the key, K –use the remaining digits (bits, characters) as h(K) Example: K=  last 3 digits: h(K) = 122 Notice that truncation does not spread keys uniformly into the table; thus it is often used in conjunction with other methods

Lecture 12COMPSCI.220.FS.T Universal Class by Division Theorem (universal class of hash functions by division): –Let the size of a key set, K, be a prime number: | K |  M –Let the members of K be regarded as the integers {0,…,M  1} –For any numbers a  {1,…,M-1}; b  {0,…,M  1} let

Lecture 12COMPSCI.220.FS.T Universal Class by Division Then H = {h a,b : 1  a < M and 0  b < M} is a universal class Proof: [optional: see in the Coursebook…] In practice: –let M be the next prime number larger than the size of the key set –Then choose randomly a and b such that a  0 and use the hash function h a,b (k)

Lecture 12COMPSCI.220.FS.T Efficiency of Search in Hash Tables Load factor : if a table of size N has exactly M occupied entries, then Average numbers of probe addresses examined for a successful ( S ) and unsuccessful ( U ) search: OALP:  0.7 OADH:  0.7 SC S  0.5(1  1 / (1  ))(1/  ln(1 / (1  1  /2 U  0.5(1  (1 / (1  )) 2 )1 / (1  SC  separate chaining; may be higher than 1

Lecture 12COMPSCI.220.FS.T Efficiency of Search: S (N = 997) SC; 3 trials OALP; 50 trials OADH; 50 trials / / / / / / / / / / / / / / / / / / 4.79 Theoretical / average measured experimental results

Lecture 12COMPSCI.220.FS.T Efficiency of Search: U (N = 997) SC; 3 trials OALP; 50 trials OADH; 50 trials / / / / / / / / / / / / / / / / / / 98.5 Theoretical / average measured experimental results

Lecture 12COMPSCI.220.FS.T Table ADT Representations: Comparative Performance OperationRepresentation Sorted arrayAVL treeHash table Initialize O(N)O(1)O(N) Is full? O(1) Search * ) O(log N) O(1) Insert O(N)O(log N)O(1) Delete O(N)O(log N)O(1) Enumerate O(N) O(N log N) ** ) * ) also: Retrieve, Update ** ) To enumerate a hash table, entries must first be sorted in ascending order of keys that takes O(N log N) time