Dictionaries and Hashing CSCI 3333 Data Structures.



Acknowledgement
- Dr. Yue
- Ms. Krishani Abeysekera
- Mr. Charles Moen
- Dr. Wei Ding
- Dr. Michael Goodrich

Fast Searching
- A balanced BST, or binary search on a sorted array, takes O(lg n) time per search.
- Can searching be faster?
- Can we achieve O(1) search time?

Dictionary
A dictionary, in computer science, is a container that stores key-element pairs, called items, and allows quick retrieval.
- Items must be stored in a way that allows them to be located by key.
- It is not necessary to store the items in order, which gives two kinds of dictionary:
  - Unordered dictionary
  - Ordered dictionary

Dictionary ADT
Operations in a Dictionary ADT (return type, then operation):
  int   size()
  bool  isEmpty()
  iter  elements()
  iter  keys()
  pos   find( key )
  iter  findAll( key )
  void  insertItem( key, elem )
  void  removeElement( key )
  void  removeAllElements( key )
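As a rough illustration of how these operations fit together, here is a minimal sketch of an unordered dictionary. It is not the course's implementation: the class name `Dictionary`, the `int` key and `std::string` element types, and the use of `std::unordered_multimap` as the backing store are all assumptions made for the example. The iterator-returning operations (`elements`, `keys`) are left out, and `find` returns a pointer instead of a position.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical unordered dictionary; duplicate keys are allowed,
// which is why removeElement and removeAllElements differ.
class Dictionary {
public:
    std::size_t size() const { return items_.size(); }
    bool isEmpty() const { return items_.empty(); }

    void insertItem(int key, const std::string& elem) {
        items_.emplace(key, elem);
    }

    // Returns a pointer to one element stored under key, or nullptr if none.
    const std::string* find(int key) const {
        auto it = items_.find(key);
        return it == items_.end() ? nullptr : &it->second;
    }

    // Returns copies of every element stored under key.
    std::vector<std::string> findAll(int key) const {
        std::vector<std::string> out;
        auto range = items_.equal_range(key);
        for (auto it = range.first; it != range.second; ++it) {
            out.push_back(it->second);
        }
        return out;
    }

    // Removes a single item with the given key, if one exists.
    void removeElement(int key) {
        auto it = items_.find(key);
        if (it != items_.end()) {
            items_.erase(it);
        }
    }

    // Removes every item with the given key.
    void removeAllElements(int key) { items_.erase(key); }

private:
    std::unordered_multimap<int, std::string> items_;
};
```

The standard container already hashes keys internally, so this sketch only shows the shape of the interface, not how hashing works.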

Hashing
A procedure that converts a large space of data into a small set of indices.
A hash table, or hash map, is a data structure that associates keys with values.
- Compiler example:
  - Parsing: program -> tokens
  - Token handling:
    - Constructing a symbol table.
    - Checking whether a token is a reserved word.
  - Other steps…

Minimal Perfect Hashing Function
A procedure that maps distinct keys to distinct indices, with no collisions.
- For a token t, the hash address is computed by
  Addr(t) = t.length + val(t[0]) + val(t[t.length-1]) - 2.
- The val array, indexed by the letters 'A' to 'Z', is given by:
  A = 11, B = 15, C = 1, D = 0, E = 0, F = 15, G = 3, H = 15, I = 13,
  J = 0, K = 0, L = 15, M = 15, N = 13, O = 0, P = 15, Q = 0, R = 14,
  S = 6, T = 6, U = 14, V = 10, W = 6, X = 0, Y = 13, Z = 0.
- E.g. val['A'] = 11 and val['Y'] = 13.

Reserved Words
- Pascal reserved words can thus be stored using these addresses:
  reserved[0] = "DO"
  reserved[1] = "END"
  …
  reserved[35] = "PROGRAM"
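The address formula Addr(t) = t.length + val(t[0]) + val(t[t.length-1]) - 2 can be checked against the three addresses listed above. The sketch below transcribes the val table from the slides. The names `kVal` and `addr` are made up for the example, and it assumes the token is non-empty and consists only of upper-case letters.

```cpp
#include <cassert>
#include <string>

// val table from the slides, indexed by letter: kVal[0] is 'A', kVal[25] is 'Z'.
const int kVal[26] = {
    11, 15, 1, 0, 0, 15, 3, 15, 13, 0, 0, 15, 15,  // A..M
    13, 0, 15, 0, 14, 6, 6, 14, 10, 6, 0, 13, 0    // N..Z
};

// Addr(t) = t.length + val(first letter) + val(last letter) - 2.
// Assumption: token is non-empty and uses only 'A'..'Z'.
int addr(const std::string& token) {
    assert(!token.empty());
    int first = kVal[token.front() - 'A'];
    int last = kVal[token.back() - 'A'];
    return static_cast<int>(token.length()) + first + last - 2;
}
```

For example, "DO" gives 2 + 0 + 0 - 2 = 0, "END" gives 3 + 0 + 0 - 2 = 1, and "PROGRAM" gives 7 + 15 + 15 - 2 = 35, matching the reserved[] entries. No comparison against other keys is needed to find the address.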

Lessons Learnt
- In hashing, the address is computed. In binary search and BSTs, the address is reached by navigating through comparisons.
- It is desirable that each record hashes to a different address (perfect hashing).
- It is desirable that all addresses are filled (minimal hashing).
- In compilers (and many other applications), minimal perfect hashing is highly desirable.

Example  We design a hash table for a map storing entries as (SSN, Name), where SSN (social security number) is a nine-digit positive integer  Our hash table uses an array of size N  10,000 and the hash function h(x)  last four digits of x     …

Question
- Is it better or worse to use the first four digits instead of the last four digits of the social security number?
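To experiment with this question, both choices can be written as functions of a nine-digit integer. This is only a sketch: the names `hashLastFour` and `hashFirstFour` are invented here, and the numbers in the usage note are made-up values, not real SSNs.

```cpp
#include <cassert>
#include <cstdint>

const int kTableSize = 10000;  // N = 10,000 from the example

// h(x) = last four digits of x, i.e. x mod 10,000.
int hashLastFour(std::int64_t ssn) {
    assert(ssn >= 0 && ssn <= 999999999);
    return static_cast<int>(ssn % kTableSize);
}

// Alternative: the first four digits of a nine-digit number, i.e. x / 100,000.
int hashFirstFour(std::int64_t ssn) {
    assert(ssn >= 0 && ssn <= 999999999);
    return static_cast<int>(ssn / 100000);
}
```

With the made-up values 123456789 and 123450001, `hashLastFour` gives 6789 and 1, while `hashFirstFour` gives 1234 for both. One consideration from outside the slides: in SSNs issued before 2011 the leading digits encoded an area number, so records drawn from one region would likely share leading digits and collide more often under `hashFirstFour`.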

Hashing Issues
- Two main problems in hashing:
  - Selecting a good hash function.
  - Handling collisions (when two or more records hash to the same address).

Example  Consider a hash function of people names:  hash(name) = val(first char of last name) * val(middle initial) where val(char) = ascii(char) – ascii(‘A’) + 1.  Is this good?

Bucket Arrays
A bucket array for a hash table is an array A of size N, where each cell of A is thought of as a "bucket", and N defines the capacity of the array.
Example
- A small company with fewer than 100 employees
- Each employee has an ID number in the range 0–99
- Store employee records in an array A, so that the employee ID number matches the array index

  Index  Record
  00     EMPTY
  01     Turing, A.
  02     Babbage, C.
  03     EMPTY
  04     Gates, W.
  …

Bucket Arrays
If the keys are unique, then searches, insertions and removals in the bucket array take O(1) time in the worst case.
However, bucket arrays have two drawbacks:
- The array needs a capacity of N, the maximum number of elements possible, even if far fewer are stored.
- The key has to be an integer in the range [0, N-1].
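The employee example can be sketched as a direct-indexed array, which makes both the O(1) operations and the two drawbacks visible. The names `BucketArray`, `Bucket` and `kCapacity` are made up for this sketch, and storing only the employee name as the record is an assumption.

```cpp
#include <cassert>
#include <string>

const int kCapacity = 100;  // N: employee IDs are assumed to lie in 0..99

struct Bucket {
    bool occupied = false;  // false means the cell is EMPTY
    std::string name;
};

class BucketArray {
public:
    // Each operation indexes straight into the array, so it takes O(1) time.
    void insert(int id, const std::string& name) {
        assert(id >= 0 && id < kCapacity);  // drawback: key must be in [0, N-1]
        cells_[id].occupied = true;
        cells_[id].name = name;
    }

    // Returns nullptr when the bucket is EMPTY.
    const std::string* find(int id) const {
        assert(id >= 0 && id < kCapacity);
        return cells_[id].occupied ? &cells_[id].name : nullptr;
    }

    void remove(int id) {
        assert(id >= 0 && id < kCapacity);
        cells_[id] = Bucket();
    }

private:
    Bucket cells_[kCapacity];  // drawback: all N cells exist even if few are used
};
```

A key outside 0..99, or a key that is not an integer at all, cannot be stored this way. Hash functions exist to remove that restriction.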

Hash Functions and Hash Values
A hash function maps a key to a position in the hash table. It chops or mixes the key's data to create a fingerprint called a hash value. Hash values are not guaranteed to be unique: two different keys may produce the same one.
The index into the hash table's array is generally calculated in two steps:
- A generic hash value, called the hash code, is calculated from the key. There are many ways to map a key to an integer.
- The hash code is reduced to a valid array index in the range [0, N-1]. This step is often called the compression map.
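The two steps can be kept as two separate functions. In the sketch below, `std::hash` stands in for "some hash code", and taking the remainder modulo N is one simple choice of compression map. Neither is prescribed by the slides, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Step 1: hash code. Maps a key to an integer; std::hash is a stand-in here.
std::size_t hashCode(const std::string& key) {
    return std::hash<std::string>{}(key);
}

// Step 2: compression map. Reduces the hash code to an index in [0, n-1].
std::size_t compress(std::size_t code, std::size_t n) {
    assert(n > 0);
    return code % n;
}

// Full hash function: compression map applied to the hash code.
std::size_t bucketIndex(const std::string& key, std::size_t n) {
    return compress(hashCode(key), n);
}
```

For instance, `compress(205, 101)` is 3. Keeping the steps separate means the same hash code can be reused when the table is resized, with only the compression step changing.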

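Hash Code Examples

Integer cast:
  int hashCode( char key ) { return int(key); }

Summing (adds the upper and lower 32 bits of the key; assumes long is 64 bits wide):
  int hashCode( long key ) {
    typedef unsigned long ulong;
    return int( ulong(key) >> 32 ) + int(key);
  }

Polynomial (the string "stop", using ASCII codes and base 10):
  s = 115 * 10^3 = 115000
  t = 116 * 10^2 =  11600
  o = 111 * 10^1 =   1110
  p = 112 * 10^0 =    112
  Hash code      = 127822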
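The polynomial hash code can be evaluated without computing powers explicitly by using Horner's rule. The sketch below is one way to write it. The name `polyHash` and the choice of `unsigned long long` (so that overflow wraps around instead of being undefined) are assumptions made for the example.

```cpp
#include <cassert>
#include <string>

// Polynomial hash code: c0 * base^(k-1) + c1 * base^(k-2) + ... + c(k-1),
// where c0..c(k-1) are the character codes of the key.
unsigned long long polyHash(const std::string& key, unsigned long long base) {
    assert(base > 1);
    unsigned long long code = 0;
    for (unsigned char c : key) {
        code = code * base + c;  // Horner's rule; unsigned arithmetic wraps on overflow
    }
    return code;
}
```

With base 10, `polyHash("stop", 10)` evaluates ((115 * 10 + 116) * 10 + 111) * 10 + 112 = 127822, the sum of the four terms for s, t, o and p. Unlike simply adding the character codes, the position of each character affects the result, so "stop" and "pots" get different codes.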

Hash Function Properties
- A good hash function should:
  - be easy and quick to compute.
  - distribute the key values that actually occur evenly across the index range supported by the table, which keeps collisions rare.
- Desirable properties:
  - Perfect: no collisions. No two keys are mapped to the same address.
  - Minimal: every address in the range is used.
  - Ordered: to facilitate traversal.