CHAPTER 7 HASHING

What is hashing for? For searching.
But we already have binary search in O( ln n ) time after sorting. And we already have algorithms for sorting in optimal time O( n ln n )...
Wait a minute: who said that O( n ln n ) is the optimal time?
There's a theorem saying that "Any algorithm that sorts by comparisons only must have a worst-case computing time of Ω( n log₂ n )."
Then we just do something else besides comparison.

Interpolation Search: Find key from a sorted list f[ l ].key, f[ l+1 ].key, ..., f[ u ].key.
[Figure: the n elements from position l to position u, holding f[l].key through f[u].key; at which position i does key fall?]
Search by Formula:
    if ( f[ i ].key < key ) l = i;
    else u = i;
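
The "search by formula" idea can be sketched in C as follows. This is a minimal sketch, not the slide's own code: the `Record` type, the function name `InterpolationSearch`, and the linear-interpolation formula for i are assumptions (the slide's formula did not survive in the transcript). It also narrows the range with `i + 1` and `i - 1` rather than `l = i` and `u = i`, so that the loop is guaranteed to terminate.

```c
/* Hypothetical record type: the slide only names the field `key`. */
typedef struct { int key; } Record;

/* Interpolation search on f[l..u], sorted ascending by key.
   Returns the index holding `key`, or -1 if it is absent. */
int InterpolationSearch(const Record f[], int l, int u, int key)
{
    while (l <= u && key >= f[l].key && key <= f[u].key) {
        long long span = (long long)f[u].key - f[l].key;
        int i;

        if (span == 0)                 /* all remaining keys are equal */
            return (f[l].key == key) ? l : -1;

        /* Estimate the position by linear interpolation between the ends. */
        i = l + (int)((((long long)key - f[l].key) * (u - l)) / span);

        if (f[i].key == key)
            return i;
        if (f[i].key < key)
            l = i + 1;
        else
            u = i - 1;
    }
    return -1;                         /* key lies outside the remaining range */
}
```

For evenly spaced keys such as 10, 20, ..., 80 the first estimate already lands on the target, which is why the method can beat binary search on uniformly distributed data.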

§1 General Idea

Symbol Table ( == Dictionary ) ::= { < name, attribute > }

〖Example〗 In the Oxford English Dictionary
    name = since
    attribute = a list of meanings
        M[0] = after a date, event, etc.
        M[1] = seeing that (expressing reason)
        ……
    "This is the worst disaster in California since I was elected."
        California Governor Pat Brown, discussing a local flood

〖Example〗 In a symbol table for a compiler
    name = identifier (e.g. int)
    attribute = a list of lines that use the identifier, and some other fields

§1 General Idea

Symbol Table ADT:
Objects: A set of name-attribute pairs, where the names are unique.
Operations:
  • SymTab Create(TableSize)
  • Boolean IsIn(symtab, name)
  • Attribute Find(symtab, name)
  • SymTab Insert(symtab, name, attr)
  • SymTab Delete(symtab, name)

§1 General Idea

Hash Tables
[Figure: the table ht[ ] drawn as b buckets, ht[0], ht[1], …, ht[b − 1], each with s slots [0], [1], …, [s − 1].]

For each identifier x, we define a hash function
    f ( x ) = position of x in ht[ ] (i.e. the index of the bucket that contains x)

  • T ::= total number of distinct possible values for x
  • n ::= total number of identifiers in ht[ ]
  • identifier density ::= n / T
  • loading density ::= n / (s b)

§1 General Idea

  • A collision occurs when we hash two nonidentical identifiers into the same bucket, i.e. f ( i1 ) = f ( i2 ) when i1 ≠ i2.
  • An overflow occurs when we hash a new identifier into a full bucket.
Collision and overflow happen simultaneously if s = 1.

〖Example〗 Mapping n = 10 C library functions into a hash table ht[ ] with b = 26 buckets and s = 2.
To map the letters a ~ z to 0 ~ 25, we may define f ( x ) = ?    Answer: f ( x ) = x[ 0 ] − 'a'

    bucket    identifiers (2 slots per bucket)
    0         acos, atan
    1
    2         char, ceil        (clock and ctime also hash here)
    3         define
    4         exp
    5         float, floor
    6 …… 25

Loading density = ?    Answer: 10 / 52 = 0.19

Without overflow, T search = T insert = T delete = O( 1 )
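
A small C sketch of this example may help make collision and overflow concrete. The names `BucketTable`, `InitTable`, `FirstLetterHash`, `InsertNoOverflow` and `LoadingDensity` are hypothetical, and the sketch assumes every identifier starts with a lowercase ASCII letter. An insertion is simply refused when the home bucket is full, which is how the sketch reports an overflow.

```c
#include <stddef.h>

#define B 26   /* number of buckets */
#define S 2    /* slots per bucket  */

typedef struct {
    const char *slot[B][S];   /* NULL marks an empty slot          */
    int n;                    /* number of identifiers stored      */
} BucketTable;

/* f(x) = x[0] - 'a'; assumes x starts with a lowercase ASCII letter. */
int FirstLetterHash(const char *x) { return x[0] - 'a'; }

void InitTable(BucketTable *t)
{
    int b, s;
    for (b = 0; b < B; b++)
        for (s = 0; s < S; s++)
            t->slot[b][s] = NULL;
    t->n = 0;
}

/* Returns 1 if x was stored, 0 on overflow (home bucket already full). */
int InsertNoOverflow(BucketTable *t, const char *x)
{
    int b = FirstLetterHash(x), s;
    for (s = 0; s < S; s++) {
        if (t->slot[b][s] == NULL) {
            t->slot[b][s] = x;
            t->n++;
            return 1;
        }
    }
    return 0;
}

/* loading density = n / (s b) */
double LoadingDensity(const BucketTable *t) { return (double)t->n / (S * B); }
```

Inserting acos, define, float, exp, char, atan, ceil and floor succeeds; clock and ctime then collide with char and ceil in bucket 2 and are reported as overflows.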

§2 Hash Function

Properties of f :
  • f ( x ) must be easy to compute and must minimize the number of collisions.
  • f ( x ) should be unbiased. That is, for any x and any i, we have Probability( f ( x ) = i ) = 1 / b. Such a hash function is called a uniform hash function.

f ( x ) = x % TableSize ;  /* if x is an integer */
  • What if TableSize = 10 and the x's all end in zero?
  • TableSize = prime number ---- good for random integer keys
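
The question about TableSize = 10 can be checked directly. The following is a small sketch with hypothetical names (`HashInt`, `DistinctBuckets`, `MAX_TABLE`); it only counts how many different buckets a set of keys reaches and is limited to small demonstration tables.

```c
#define MAX_TABLE 128   /* assumption: demo tables are at most this large */

/* f(x) = x % TableSize for non-negative integer keys. */
unsigned int HashInt(unsigned int x, unsigned int TableSize)
{
    return x % TableSize;
}

/* Number of distinct buckets hit by keys[0..n-1], or -1 for an
   unsupported table size. */
int DistinctBuckets(const unsigned int keys[], int n, unsigned int TableSize)
{
    int used[MAX_TABLE] = {0};
    int i, count = 0;

    if (TableSize == 0 || TableSize > MAX_TABLE)
        return -1;
    for (i = 0; i < n; i++) {
        unsigned int h = HashInt(keys[i], TableSize);
        if (!used[h]) {
            used[h] = 1;
            count++;
        }
    }
    return count;
}
```

With the keys 10, 20, ..., 80, a table of size 10 puts everything into bucket 0, while the prime size 11 spreads the same keys over 8 different buckets.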

§2 Hash Function

f ( x ) = ( Σ x[i] ) % TableSize ;  /* if x is a string */

〖Example〗 TableSize = 10,007 and string length of x ≤ 8.
If x[i] ∈ [0, 127], then f ( x ) ∈ [0, ? ]    Answer: 1016

f ( x ) = ( x[0] + x[1]*27 + x[2]*27^2 ) % TableSize ;
  • Total number of combinations = 26^3 = 17,576
  • Actual number of combinations < …
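
Both string hash functions on this slide are short enough to try out. The sketch below uses the hypothetical names `HashSum` and `HashFirst3`, treats characters as unsigned values, and assumes the string passed to `HashFirst3` has at least three characters.

```c
/* f(x) = (sum of all characters) % TableSize */
unsigned int HashSum(const char *x, unsigned int TableSize)
{
    unsigned int sum = 0;
    while (*x != '\0')
        sum += (unsigned char)*x++;
    return sum % TableSize;
}

/* f(x) = (x[0] + x[1]*27 + x[2]*27^2) % TableSize
   Assumes x has at least three characters. */
unsigned int HashFirst3(const char *x, unsigned int TableSize)
{
    return ((unsigned char)x[0]
            + 27u  * (unsigned char)x[1]
            + 729u * (unsigned char)x[2]) % TableSize;
}
```

Eight characters of value 127 give `HashSum` its maximum of 127 × 8 = 1016, so most of a 10,007-cell table is never used. `HashSum` also cannot tell anagrams apart, and `HashFirst3` ignores everything after the third character.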

§2 Hash Function

f ( x ) = ( Σ x[N − i − 1] * 32^i ) % TableSize ;
[Figure: the string x drawn as cells x[0], x[1], ………, x[N-2], x[N-1]]

Index Hash3( const char *x, int TableSize )
{
    unsigned int HashVal = 0;

    while( *x != '\0' )
        HashVal = ( HashVal << 5 ) + *x++;   /* shift by 5 is faster than *27 */
    return HashVal % TableSize;
}

Can you make it faster?
  • If x is too long (e.g. street address), the early characters will be left-shifted out of place.
  • Carefully select some characters from x.
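
The remark about early characters being shifted out can be demonstrated with a fixed-width accumulator. This sketch is an assumption-laden variant of Hash3: it uses `uint32_t` so the width is exactly 32 bits, omits the final `% TableSize`, and the name `Hash3Raw` is hypothetical.

```c
#include <stdint.h>

/* Same shift-and-add loop as Hash3, but with an explicit 32-bit
   accumulator and without the final modulo. */
uint32_t Hash3Raw(const char *x)
{
    uint32_t HashVal = 0;
    while (*x != '\0')
        HashVal = (HashVal << 5) + (unsigned char)*x++;
    return HashVal;
}
```

Each character is shifted 5 bits per later character, so in a string of 8 or more characters the first one has been shifted by at least 35 bits and no longer influences a 32-bit result. For example, "Xbcdefgh" and "Ybcdefgh" produce the same value, while the 7-character strings "Xbcdefg" and "Ybcdefg" still differ.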

§3 Separate Chaining ---- keep a list of all keys that hash to the same value

struct ListNode;
typedef struct ListNode *Position;
struct HashTbl;
typedef struct HashTbl *HashTable;

struct ListNode {
    ElementType Element;
    Position Next;
};
typedef Position List;

/* List *TheLists will be an array of lists, allocated later */
/* The lists use headers (for simplicity), */
/* though this wastes space */
struct HashTbl {
    int TableSize;
    List *TheLists;
};

§3 Separate Chaining
  • Create an empty table

HashTable InitializeTable( int TableSize )
{
    HashTable H;
    int i;

    if ( TableSize < MinTableSize ) {
        Error( "Table size too small" );
        return NULL;
    }
    H = malloc( sizeof( struct HashTbl ) );   /* Allocate table */
    if ( H == NULL ) FatalError( "Out of space!!!" );
    H->TableSize = NextPrime( TableSize );   /* Better be prime */
    H->TheLists = malloc( sizeof( List ) * H->TableSize );   /* Array of lists */
    if ( H->TheLists == NULL ) FatalError( "Out of space!!!" );
    for( i = 0; i < H->TableSize; i++ ) {   /* Allocate list headers */
        H->TheLists[ i ] = malloc( sizeof( struct ListNode ) );   /* Slow! */
        if ( H->TheLists[ i ] == NULL ) FatalError( "Out of space!!!" );
        else H->TheLists[ i ]->Next = NULL;
    }
    return H;
}

[Figure: H points to a record holding TableSize and TheLists; TheLists points to the array of list headers.]

§3 Separate Chaining
  • Find a key from a hash table

Position Find ( ElementType Key, HashTable H )
{
    Position P;
    List L;

    L = H->TheLists[ Hash( Key, H->TableSize ) ];   /* Hash is your hash function */
    P = L->Next;
    while( P != NULL && P->Element != Key )   /* Probably need strcmp */
        P = P->Next;
    return P;
}

After the list is selected, the code is identical to the code to perform a Find for general lists -- List ADT.

§3 Separate Chaining
  • Insert a key into a hash table

void Insert ( ElementType Key, HashTable H )
{
    Position Pos, NewCell;
    List L;

    Pos = Find( Key, H );
    if ( Pos == NULL ) {   /* Key is not found, then insert */
        NewCell = malloc( sizeof( struct ListNode ) );
        if ( NewCell == NULL ) FatalError( "Out of space!!!" );
        else {
            L = H->TheLists[ Hash( Key, H->TableSize ) ];   /* Again?! Find has already computed this hash */
            NewCell->Next = L->Next;
            NewCell->Element = Key;   /* Probably need strcpy! */
            L->Next = NewCell;
        }
    }
}

  • Tip: Make the TableSize about as large as the number of keys expected (i.e. to make the loading density factor ≈ 1).
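
The Symbol Table ADT also lists a Delete operation, for which these slides give no code. A possible sketch for the header-based lists above follows. It is an illustration under assumptions, not the author's routine: `ElementType` is taken to be `int`, the `Hash` function shown is a stand-in, and the return value (1 if a cell was removed, 0 otherwise) is a choice made here.

```c
#include <stdlib.h>

typedef int ElementType;                 /* assumption: integer keys */

struct ListNode { ElementType Element; struct ListNode *Next; };
typedef struct ListNode *Position;
typedef Position List;
struct HashTbl { int TableSize; List *TheLists; };
typedef struct HashTbl *HashTable;

/* Stand-in hash function for integer keys (assumption). */
int Hash(ElementType Key, int TableSize)
{
    return (int)((unsigned int)Key % (unsigned int)TableSize);
}

/* Remove Key from H if present.
   Returns 1 if a cell was freed, 0 if Key was not in the table. */
int Delete(ElementType Key, HashTable H)
{
    Position Prev, Tmp;

    /* Start at the header cell, so the first real cell needs no special case. */
    Prev = H->TheLists[Hash(Key, H->TableSize)];
    while (Prev->Next != NULL && Prev->Next->Element != Key)
        Prev = Prev->Next;

    if (Prev->Next == NULL)
        return 0;                        /* not found */

    Tmp = Prev->Next;                    /* unlink and free the matching cell */
    Prev->Next = Tmp->Next;
    free(Tmp);
    return 1;
}
```

The header cells that "waste space" pay off here: because every list has a header, deleting the first key in a chain is handled by the same code as deleting any other.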

§4 Open Addressing ---- find another empty cell to solve collision (avoiding pointers)

Algorithm: insert key into an array of hash table
{
    index = hash(key);
    initialize i = 0;   /* the counter of probing */
    while ( collision at index ) {
        index = ( hash(key) + f(i) ) % TableSize;   /* f is the collision resolving function, f(0) = 0 */
        if ( table is full ) break;
        else i++;
    }
    if ( table is full ) ERROR ("No space left");
    else insert key at index;
}

  • Tip: Generally, loading density < …
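
The algorithm above can be turned into a small C routine. This is a sketch under stated assumptions: keys are non-negative `int`s, `EMPTY` (-1) marks a free cell, hash(key) is taken to be `key % TableSize`, and "table is full" is approximated by giving up after TableSize probes. The names `ProbeFn`, `LinearF`, `QuadraticF` and `OpenInsert` are hypothetical.

```c
#define EMPTY (-1)   /* assumption: keys are non-negative, -1 marks a free cell */

typedef int (*ProbeFn)(int i);   /* collision resolving function, f(0) = 0 */

int LinearF(int i)    { return i; }
int QuadraticF(int i) { return i * i; }

/* Insert key using index = (hash(key) + f(i)) % TableSize, i = 0, 1, 2, ...
   Returns the index where key now resides, or -1 if no free cell was
   reached within TableSize probes. Intended for small demo tables. */
int OpenInsert(int table[], int TableSize, int key, ProbeFn f)
{
    int home = key % TableSize;
    int i;

    for (i = 0; i < TableSize; i++) {
        int index = (home + f(i)) % TableSize;
        if (table[index] == EMPTY || table[index] == key) {
            table[index] = key;
            return index;
        }
    }
    return -1;
}
```

In a table of size 7, the keys 10, 17 and 24 all hash to cell 3. With `LinearF` they end up in cells 3, 4 and 5; with `QuadraticF` they end up in cells 3, 4 and 0.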

§4 Open Addressing

1. Linear Probing
f ( i ) = i ;  /* a linear function */

〖Example〗 Mapping n = 11 C library functions into a hash table ht[ ] with b = 26 buckets and s = 1.
Insertion order: acos, atoi, char, define, exp, ceil, cos, float, atol, floor, ctime

    bucket    x         search time
    0         acos      1
    1         atoi      2
    2         char      1
    3         define    1
    4         exp       1
    5         ceil      4
    6         cos       5
    7         float     3
    8         atol      9
    9         floor     5
    10        ctime     9
    11 … 25

Loading density = 11 / 26 = 0.42
Average search time = 41 / 11 = 3.73
Although p is small, the worst case can be LARGE.
Analysis of linear probing shows that the expected number of probes = 1.36.

Linear probing causes primary clustering: any key that hashes into the cluster will add to the cluster after several attempts to resolve the collision.
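
The search times in this example can be reproduced with a few lines of C. The sketch assumes the first-letter hash f ( x ) = x[ 0 ] − 'a' from §1 and lowercase identifiers; `LinearProbeInsert` and `ExpectedSuccessfulProbes` are hypothetical names. The second function uses the standard textbook estimate (1 + 1/(1 − λ))/2 for a successful search under linear probing; that the slide's 1.36 comes from this formula is an assumption, although the numbers agree for λ = 11/26.

```c
#include <stddef.h>

#define NBUCKETS 26

/* Insert x with home bucket x[0] - 'a' and linear probing f(i) = i.
   Returns the number of cells examined, or 0 if the table is full.
   Assumes x starts with a lowercase ASCII letter. */
int LinearProbeInsert(const char *ht[NBUCKETS], const char *x)
{
    int home = x[0] - 'a';
    int i;

    for (i = 0; i < NBUCKETS; i++) {
        int index = (home + i) % NBUCKETS;
        if (ht[index] == NULL) {
            ht[index] = x;
            return i + 1;
        }
    }
    return 0;
}

/* Textbook estimate of the expected probes for a successful search
   with linear probing at loading density lambda (0 <= lambda < 1). */
double ExpectedSuccessfulProbes(double lambda)
{
    return 0.5 * (1.0 + 1.0 / (1.0 - lambda));
}
```

Inserting the eleven names in the order listed yields probe counts 1, 2, 1, 1, 1, 4, 5, 3, 9, 5, 9, which sum to 41. The gap between the measured average of 3.73 and the estimate of about 1.37 reflects how badly the first-letter hash clusters these keys compared with a uniform hash function.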