Hash Tables in C
James Goerke

Basic Overview of Hash Tables
A hash table combines arrays and lists.
It is an array of lists, where each list chains together the items that share a hash value.
The result is an efficient structure for storing and retrieving dynamic data.
Compilers use hash tables to store variables, web browsers use them to track history, and network software uses them to cache recently used domain names and IP addresses.
The table is built around a predefined hash function that maps each key to an array index.

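Hash Chains
The average chain length is n / (array size), where n is the number of items.
Each item points to the next item in its chain, or holds NULL if it is the last one.
[Figure: an array slot heading a chain of items (name 1, value 1), (name 2, value 2), (name 3, value 3), with the final pointer marked NULL]
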
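
The two points above can be sketched in a few lines of C. This is only an illustration: the `Node` type and the function names `chain_length` and `average_chain_length` are hypothetical and do not come from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type for this sketch: one name/value item in a chain. */
typedef struct Node Node;
struct Node {
    const char *name;
    int value;
    Node *next;   /* next item in the chain, or NULL at the end */
};

/* chain_length: count items by following next pointers until NULL */
size_t chain_length(const Node *head)
{
    size_t n = 0;
    for (const Node *p = head; p != NULL; p = p->next)
        n++;
    return n;
}

/* average_chain_length: nitems spread over nbuckets array slots */
double average_chain_length(size_t nitems, size_t nbuckets)
{
    assert(nbuckets > 0);   /* an empty array has no average */
    return (double) nitems / (double) nbuckets;
}
```

With 1000 items in a 500-slot array, for instance, the average chain holds 2 items, so a lookup compares against only a couple of names on average.
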
Hash Type
The element type is the same as for a list:
    typedef struct Nameval Nameval;
    struct Nameval {
        char    *name;
        int     value;
        Nameval *next;  /* in chain */
    };
    Nameval *symtab[NHASH];  /* a symbol table */

Hash Function/Calculation
The key is passed through a hash function to generate a "hash value".
Hash values should be evenly distributed over a modest-sized integer range.
The hash value is then used to index the table where the information is stored.
One of the most common hashing algorithms for strings builds the hash value by adding each byte of the string to a multiple of the hash so far.

Calculation Example
    enum { MULTIPLIER = 31 };

    /* hash: compute hash value of string */
    unsigned int hash(char *str)
    {
        unsigned int h;
        unsigned char *p;

        h = 0;
        for (p = (unsigned char *) str; *p != '\0'; p++)
            h = MULTIPLIER * h + *p;
        return h % NHASH;  /* returns (result) mod (size of array) */
    }

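
To make the arithmetic concrete, here is a hedged, self-contained variant of the same calculation. The name `hash_demo` and the explicit `nhash` parameter are assumptions made for this sketch so that it compiles without a definition of NHASH; the table size of 101 used in the comments is likewise just an illustrative choice.

```c
#include <assert.h>

enum { DEMO_MULTIPLIER = 31 };  /* same multiplier as in the slide */

/* hash_demo: multiply-and-add string hash, reduced modulo nhash.
 * nhash is a parameter here only so the sketch stands alone. */
unsigned int hash_demo(const char *str, unsigned int nhash)
{
    unsigned int h = 0;
    const unsigned char *p;

    assert(nhash > 0);
    for (p = (const unsigned char *) str; *p != '\0'; p++)
        h = DEMO_MULTIPLIER * h + *p;
    return h % nhash;
}

/* Worked by hand for "ab" in ASCII ('a' = 97, 'b' = 98), nhash = 101:
 *   h = 31 * 0  + 97 = 97
 *   h = 31 * 97 + 98 = 3105
 *   3105 % 101       = 75
 * Reversing the string gives "ba": 31 * 98 + 97 = 3135, 3135 % 101 = 4,
 * so the order of the bytes affects the result. */
```

Because each step multiplies the running hash before adding the next byte, strings that contain the same characters in a different order usually land in different slots, which a plain sum of bytes would not achieve.
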
Hash Table Lookup/Insert
    /* lookup: find name in symtab, with optional create */
    Nameval *lookup(char *name, int create, int value)
    {
        int h;
        Nameval *sym;

        h = hash(name);
        for (sym = symtab[h]; sym != NULL; sym = sym->next)
            if (strcmp(name, sym->name) == 0)
                return sym;
        if (create) {
            sym = (Nameval *) emalloc(sizeof(Nameval));
            sym->name = name;  /* assumed allocated elsewhere */
            sym->value = value;
            sym->next = symtab[h];
            symtab[h] = sym;
        }
        return sym;
    }

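
The pieces from the previous slides can be put together into one compilable sketch. Several details here are assumptions rather than part of the slides: the value NHASH = 4093 is an arbitrary prime, plain `malloc` stands in for `emalloc` (whose definition is not shown), and the name field is declared `const char *` so that string literals can be passed safely.

```c
#include <stdlib.h>
#include <string.h>

enum { NHASH = 4093, MULTIPLIER = 31 };  /* NHASH value is an assumption */

typedef struct Nameval Nameval;
struct Nameval {
    const char *name;   /* not copied: caller keeps the string alive */
    int value;
    Nameval *next;      /* in chain */
};

static Nameval *symtab[NHASH];  /* static storage: every slot starts NULL */

/* hash: compute hash value of string, in the range 0..NHASH-1 */
static unsigned int hash(const char *str)
{
    unsigned int h = 0;
    const unsigned char *p;

    for (p = (const unsigned char *) str; *p != '\0'; p++)
        h = MULTIPLIER * h + *p;
    return h % NHASH;
}

/* lookup: find name in symtab; if absent and create is nonzero, insert it.
 * Returns the item, or NULL if it is absent (or allocation failed). */
Nameval *lookup(const char *name, int create, int value)
{
    unsigned int h = hash(name);
    Nameval *sym;

    for (sym = symtab[h]; sym != NULL; sym = sym->next)
        if (strcmp(name, sym->name) == 0)
            return sym;             /* found: value argument is ignored */
    if (create) {
        sym = malloc(sizeof *sym);  /* stand-in for emalloc */
        if (sym == NULL)
            return NULL;
        sym->name = name;
        sym->value = value;
        sym->next = symtab[h];      /* push onto the front of the chain */
        symtab[h] = sym;
    }
    return sym;                     /* NULL when absent and not created */
}
```

As a usage example, `lookup("x", 1, 42)` inserts an item named "x" with value 42, a later `lookup("x", 0, 0)` returns that same item, and `lookup("y", 0, 0)` returns NULL because "y" was never inserted. Inserting at the front of the chain keeps insertion constant-time once the search has finished.
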
Possible Limitations
A poor hash function or a table that is too small can produce long chains.
If the average chain length grows with n, lookups degrade from expected O(1) to O(n).
Used properly, though, the expected constant-time lookup, insertion, and deletion of a hash table are unmatched by other techniques.

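
The effect of a poor hash function can be measured directly. The sketch below is illustrative only: `first_char_hash`, `multiplier_hash`, and `longest_chain` are hypothetical names, and the deliberately bad function (hashing only the first byte) is an invented example of what to avoid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef unsigned int (*HashFn)(const char *);

/* A deliberately poor hash: keys sharing a first byte always collide. */
unsigned int first_char_hash(const char *s)
{
    return (unsigned char) s[0];
}

/* The multiply-and-add hash with multiplier 31, before the modulo step. */
unsigned int multiplier_hash(const char *s)
{
    unsigned int h = 0;
    for (const unsigned char *p = (const unsigned char *) s; *p != '\0'; p++)
        h = 31 * h + *p;
    return h;
}

/* longest_chain: length of the longest chain that nkeys keys would form
 * in a table of nbuckets slots under hash function fn.
 * Returns 0 if the counter array cannot be allocated. */
size_t longest_chain(HashFn fn, const char *keys[], size_t nkeys,
                     size_t nbuckets)
{
    assert(nbuckets > 0);
    size_t *count = calloc(nbuckets, sizeof *count);
    size_t longest = 0;

    if (count == NULL)
        return 0;
    for (size_t i = 0; i < nkeys; i++) {
        size_t b = fn(keys[i]) % nbuckets;   /* slot this key falls into */
        if (++count[b] > longest)
            longest = count[b];
    }
    free(count);
    return longest;
}
```

For the eight keys "a0" through "a7" in an 8-slot table, the first-byte hash puts every key in one chain of length 8, so a lookup is a linear scan, while the multiplier hash gives each key its own slot.
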
Questions?