Hash Tables: A Basic O(1)verview
Niteesh Prasad
CS-265
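Hash Tables in C
A hash table is a data structure used for mapping and searching.
Lookup takes $O(1)$ (constant) time on average, given a hash function that spreads the keys well; with many collisions it can degrade to $O(n)$.
Disadvantage: wasted memory, since the table usually reserves more slots than it holds entries.
Used in database indexing and associative arrays. Common applications such as web browsers use hash tables.
A hash table provides operations such as Insert and Retrieve (lookup).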
Basic Idea
Store records of data in the hash table, which is either an array (when main memory is used) or a file (when disk space is used).
Each record has a key field and an associated data field.
A record is stored at a location that is computed from its key.
The function that produces this location for a given key is called a hash function.
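As a rough illustration of the idea above, the following sketch stores records in an in-memory array at a slot computed from the key. The names Record, TABLE_SIZE, location, store and retrieve are hypothetical, the key is assumed to be an integer, and collisions are simply reported rather than resolved.

```c
#include <assert.h>
#include <stddef.h>

enum { TABLE_SIZE = 8 };    /* hypothetical table size */

typedef struct Record {
    int    in_use;          /* nonzero once the slot holds a record */
    int    key;             /* key field */
    double data;            /* associated data field */
} Record;

static Record table[TABLE_SIZE];    /* static storage: all slots start empty */

/* location: the hash function, mapping a key to an array index */
unsigned int location(int key)
{
    return (unsigned int) key % TABLE_SIZE;
}

/* store: place a record in the slot chosen by its key.
 * Returns 0 on success, -1 if a different key already occupies the slot. */
int store(int key, double data)
{
    unsigned int i = location(key);
    Record *r;

    assert(i < (unsigned int) TABLE_SIZE);  /* the hash must stay in range */
    r = &table[i];
    if (r->in_use && r->key != key)
        return -1;          /* collision: not handled in this sketch */
    r->in_use = 1;
    r->key = key;
    r->data = data;
    return 0;
}

/* retrieve: return the record stored for key, or NULL if there is none */
Record *retrieve(int key)
{
    Record *r = &table[location(key)];

    return (r->in_use && r->key == key) ? r : NULL;
}
```

With TABLE_SIZE set to 8, keys 3 and 11 map to the same slot, so storing the second one fails here. That is the collision problem a real table has to resolve.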
More Background
Each element of the array is a list that chains together the items that share a hash value.
The basic element type looks like this:
    typedef struct Nameval Nameval;
    struct Nameval {
        char    *name;
        int     value;
        Nameval *next;  /* in chain */
    };
The hash function largely determines the efficiency of the search: the more evenly it spreads the keys, the shorter the chains.
It’s not a “perfect” world out there
Finding a perfect hash function (one that maps each input to a different hash value) is difficult.
Consequence: conflicts between keys (hash collisions).
Collision resolution:
(1) Resolution by overflow (creating a separate table)
(2) Double hashing (two independent hash functions)
(3) Rehashing (rebuilding the entire table)
The multiplication method is a widely used way to compute the index for a key $k$ in a table of size $m$:
(1) Multiply the key by a constant $A$, $0 < A < 1$
(2) Extract the fractional part of the product $kA$
(3) Multiply this value by $m$ and take the floor: $h(k) = \lfloor m\,(kA \bmod 1) \rfloor$
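The three steps of the multiplication method can be sketched in C as follows. The function name mult_hash is hypothetical, the key is assumed to be an unsigned integer, and the constant A = 0.6180339887 (roughly (sqrt(5) - 1) / 2, a value often suggested for this method) is an assumption, not something the steps above fix.

```c
#include <assert.h>

/* mult_hash: multiplication-method hash of key into the range 0..m-1.
 * A is an assumed constant, roughly (sqrt(5) - 1) / 2. */
unsigned int mult_hash(unsigned int key, unsigned int m)
{
    const double A = 0.6180339887;
    double product, frac;

    assert(m > 0);                      /* the table needs at least one slot */

    product = key * A;                  /* step 1: multiply key by A */

    /* step 2: keep the fractional part by subtracting the integer part */
    frac = product - (double) (unsigned long long) product;

    return (unsigned int) (m * frac);   /* step 3: scale by m, truncate */
}
```

Because the fractional part is always below 1, the result is always a valid index below m. Unlike the division method, the choice of m is not critical here, so a power of two is a common choice.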
Hash Function
    /* hash: compute hash value for array of NPREF strings */
    /* NPREF, MULTIPLIER and NHASH are constants defined elsewhere */
    unsigned int hash(char *s[NPREF])
    {
        unsigned int h;
        unsigned char *p;
        int i;

        h = 0;
        for (i = 0; i < NPREF; i++)     /* traversal through the array */
            for (p = (unsigned char *) s[i]; *p != '\0'; p++)
                h = MULTIPLIER * h + *p;
        return h % NHASH;
    }
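The function above hashes an array of NPREF strings. A table keyed by a single name needs a one-string variant, which might look like the sketch below. The name hash_str is hypothetical, and the values 31 for MULTIPLIER and 4093 for NHASH are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { NHASH = 4093, MULTIPLIER = 31 };   /* assumed values */

/* hash_str: compute hash value of a single string (hypothetical variant) */
unsigned int hash_str(const char *str)
{
    unsigned int h = 0;
    const unsigned char *p;

    assert(str != NULL);
    /* fold each byte into h; unsigned overflow wraps around, which is fine */
    for (p = (const unsigned char *) str; *p != '\0'; p++)
        h = MULTIPLIER * h + *p;
    return h % NHASH;                     /* reduce to a table index */
}
```

Reading the bytes as unsigned char keeps the result the same whether plain char is signed or unsigned on a given platform.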
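Insert and Look up
    /* lookup: find name in symtab, with optional create */
    Nameval* lookup(char *name, int create, int value)
    {
        unsigned int h;
        Nameval *sym;

        /* hash here must accept a single string, not an array of NPREF strings */
        h = hash(name);
        for (sym = symtab[h]; sym != NULL; sym = sym->next)
            if (strcmp(name, sym->name) == 0)
                return sym;             /* found: avoids duplicate entries */
        if (create) {
            /* emalloc: allocation helper defined elsewhere */
            sym = (Nameval *) emalloc(sizeof(Nameval));
            sym->name = name;           /* assumed allocated elsewhere */
            sym->value = value;
            sym->next = symtab[h];      /* insert at the head of the chain */
            symtab[h] = sym;
        }
        return sym;                     /* NULL if not found and not created */
    }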
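To try the lookup routine in isolation, the pieces can be put together as in the sketch below. Several details are assumptions: hash_str is a hypothetical one-string hash, NHASH and MULTIPLIER are illustrative values, and plain malloc stands in for emalloc, so the function returns NULL on allocation failure instead of exiting.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { NHASH = 4093, MULTIPLIER = 31 };   /* assumed values */

typedef struct Nameval Nameval;
struct Nameval {
    const char *name;
    int         value;
    Nameval    *next;       /* next item in the same chain */
};

static Nameval *symtab[NHASH];  /* static storage: every chain starts NULL */

/* hash_str: hash a single string (hypothetical one-string variant) */
unsigned int hash_str(const char *str)
{
    unsigned int h = 0;
    const unsigned char *p;

    for (p = (const unsigned char *) str; *p != '\0'; p++)
        h = MULTIPLIER * h + *p;
    return h % NHASH;
}

/* lookup: find name in symtab; if absent and create is nonzero, add it.
 * Returns the entry, or NULL if it is absent or allocation fails. */
Nameval *lookup(const char *name, int create, int value)
{
    unsigned int h;
    Nameval *sym;

    assert(name != NULL);
    h = hash_str(name);
    for (sym = symtab[h]; sym != NULL; sym = sym->next)
        if (strcmp(name, sym->name) == 0)
            return sym;             /* already present: no duplicate added */
    if (create) {
        sym = malloc(sizeof *sym);
        if (sym == NULL)
            return NULL;
        sym->name = name;           /* caller keeps the string alive */
        sym->value = value;
        sym->next = symtab[h];      /* push onto the front of the chain */
        symtab[h] = sym;
    }
    return sym;
}
```

Calling lookup("alpha", 1, 10) inserts the entry. A later lookup("alpha", 0, 0) returns the same node, and a second call with create set leaves the stored value unchanged because the existing entry is found first.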
Sources
http://eternallyconfuzzled.com/tuts/datastructures/jsw_tut_hashtable.aspx
http://www.cs.drexel.edu/~knowak/cs265_fall_2010/week_6.pdf
Brian Kernighan and Rob Pike, The Practice of Programming, Addison Wesley, 1999