
Hash Tables and Constant Access Time: CS-2301, System Programming for Non-Majors, B-Term 2009


1 Hash Tables and Constant Access Time
CS-2301, System Programming for Non-Majors, B-Term 2009
(Slides include materials from The C Programming Language, 2nd edition, by Kernighan and Ritchie, and from C: How to Program, 5th and 6th editions, by Deitel and Deitel)

2 New Challenge
What if we require a data structure that has to be accessed by value in constant time?
  I.e., O(log n) is not good enough!
Need to be able to add or delete items
Total number of items unknown
  But an approximate maximum might be known

3 Examples
Anti-virus scanner
Symbol table of compiler
Virtual memory tables in operating system
Bank or credit card account for a person

4 Example – Validate a Credit Card
16-digit credit card numbers
  10^16 possible card numbers
  Sparsely populated space, e.g., 10^8 MasterCard holders, similar for Visa
Not “random” enough for a binary tree
  Too many single branches, leading to really deep searches
Need to respond to customer in 1-2 seconds
  1000s or tens of 1000s of customers per second!
Same is true for
  ATM card numbers
  Bank account numbers
  Etc.

5 Example – Anti-Virus Scanner
Look at each sequence of bytes in a file
See if it matches against library of virus patterns
  If so, flag it as a possible problem
How many possible patterns?
  Tens of thousands!

6 Anti-Virus Scanner (continued)
Time to scan a file?
  O(length) × O(# of patterns)
Can we do better?
  Store patterns in a tree
  O(length) × O(log (# of patterns))
Can we do even better?
  Yes: a hash table. Today’s topic.

7 Requirement
In these applications (and many like them), need constant time access
  I.e., O(1)
Need to access by value!

8 Observation
Arrays provide constant time access …
… but you have to know which element you want!
We only know the contents of the item we want!
Also
  Not easy to grow or shrink
  Not open-ended
Can we do better?
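The gap between the two kinds of access can be sketched in a few lines. This is only an illustration: the function names get_by_index and find_by_value and the table-of-strings layout are assumptions, not part of the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Access by index: one step, whatever the array size. */
const char *get_by_index(const char *const table[], size_t n, size_t i)
{
    assert(i < n);
    return table[i];
}

/* Access by value: with a plain array we can only scan, which costs
 * up to n comparisons. Returns the index of key, or n if it is absent. */
size_t find_by_value(const char *const table[], size_t n, const char *key)
{
    assert(key != NULL);
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i], key) == 0)
            return i;
    return n;
}
```

A hash table aims to make the second operation cost about as little as the first.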

9 Definition – Hash Table
A data structure comprising
  An array for constant time access
  A set of linked lists, one list for each array element
  A hashing function to convert search key to array index
    A randomizing function to assure uniform distribution of values across array indices
    Also known as a hash function
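One possible way to declare these pieces in C is sketched below. The slides do not show the declarations, so the field layout of struct item, the array size of 101, and the helper bucket_length are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define HASHSIZE 101            /* assumed array size */

/* One node of a list. The field layout is an assumption. */
struct item {
    char *data;                 /* the search key (part of the payload) */
    struct item *next;          /* next node in the same list, or NULL */
};

/* The array: one list head per index. Static storage starts out all NULL,
 * so every list is initially empty. */
static struct item *hashtab[HASHSIZE];

/* Count the nodes in the list hanging off one array element. */
size_t bucket_length(unsigned int index)
{
    assert(index < HASHSIZE);
    size_t n = 0;
    for (const struct item *np = hashtab[index]; np != NULL; np = np->next)
        n++;
    return n;
}
```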

10 Definition – Search Key
A value stored as (part of) the payload of the item you are looking for
E.g.,
  Your credit card number
  Your account number at Amazon
  A pattern characteristic of a virus
Need to find the item containing that value (i.e., that key)

11 Definition – Hash Function
A function that randomizes the search key to produce an index into the array
  So that non-random keys don’t concentrate around a subset of the indices in the array
Always returns the same value for the same key
See §6.6 in Kernighan & Ritchie

12 Hash Table Structure
[Diagram: an array of items, each element pointing to a linked list of nodes with data and next fields. Labels: The array, The lists]

13 Hash Table Structure (continued)
[Diagram: the same array and linked lists as on the previous slide]
Note that some of the lists are empty
Average length of list should be in single digits

14 Guidelines for Hash Tables
Lists from each item should be short
  I.e., with short search time (approximately constant)
Size of array should be based on expected # of entries
  Err on large side if possible
Hashing function
  Should “spread out” the values relatively uniformly
  Multiplication and division by prime numbers usually works well
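The sizing guideline can be turned into a rough calculation. The sketch below is one possible approach, not a rule from the slides. choose_table_size is a hypothetical helper that divides the expected number of entries by a target average list length and then rounds up to the next prime, on the assumption that a prime array size suits a remainder-based hashing function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Trial division is enough for table-sized numbers. */
static bool is_prime(size_t n)
{
    if (n < 2)
        return false;
    for (size_t d = 2; d <= n / d; d++)
        if (n % d == 0)
            return false;
    return true;
}

/* Smallest prime size such that expected_entries / size <= target_len. */
size_t choose_table_size(size_t expected_entries, size_t target_len)
{
    assert(target_len > 0);
    size_t size = (expected_entries + target_len - 1) / target_len; /* round up */
    if (size < 2)
        size = 2;
    while (!is_prime(size))
        size++;                 /* err on the large side */
    return size;
}
```

For example, 1000 expected entries with a target average list length of 5 need at least 200 array elements, and the next prime is 211.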

15 Example Hashing Function
P. 144 of K & R
    #define HASHSIZE 101

    unsigned int hash(char *s)
    {
        unsigned int hashval;

        for (hashval = 0; *s != '\0'; s++)
            hashval = *s + 31 * hashval;
        return hashval % HASHSIZE;
    }
Note prime numbers to “mix it up”

16 Using a Hash Table
    struct item *lookup(char *s)
    {
        struct item *np;

        for (np = hashtab[hash(s)]; np != NULL; np = np->next)
            if (strcmp(s, np->data) == 0)
                return np;      /* found */
        return NULL;            /* not found */
    }

17 Using a Hash Table
(Same lookup code as slide 16.)
Hash table is indexed by hash value of s

18 Using a Hash Table
(Same lookup code as slide 16.)
Traverse the linked list to find item s
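The list traversal matters when two keys hash to the same index. The sketch below is a variant of the lookup code that also reports how many comparisons it made. The name lookup_counting, the comparisons parameter, and the struct item layout are assumptions for illustration. With ASCII codes and an array size of 101, the keys "a" and "Af" both hash to index 97, so finding the second one takes two comparisons.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HASHSIZE 101

struct item {                   /* assumed layout */
    char *data;
    struct item *next;
};

struct item *hashtab[HASHSIZE];

unsigned int hash(const char *s)
{
    unsigned int hashval = 0;
    for (; *s != '\0'; s++)
        hashval = (unsigned char)*s + 31 * hashval;
    return hashval % HASHSIZE;
}

/* Like lookup, but also stores the number of strcmp calls made. */
struct item *lookup_counting(const char *s, size_t *comparisons)
{
    assert(s != NULL && comparisons != NULL);
    *comparisons = 0;
    for (struct item *np = hashtab[hash(s)]; np != NULL; np = np->next) {
        (*comparisons)++;
        if (strcmp(s, np->data) == 0)
            return np;          /* found */
    }
    return NULL;                /* not found */
}
```

As long as the lists stay short, the number of comparisons stays small no matter how many items the whole table holds.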

19 Using a Hash Table (continued)
    struct item *addItem(char *s, …)
    {
        struct item *np;
        unsigned int hv;

        if ((np = lookup(s)) == NULL) {
            np = malloc(sizeof(struct item));
            if (np == NULL)
                return NULL;    /* out of memory */
            /* fill in s and data */
            np->next = hashtab[hv = hash(s)];
            hashtab[hv] = np;
        }
        return np;
    }

20 Using a Hash Table (continued)
(Same addItem code as slide 19.)
Inserts new item at head of the list indexed by hash value
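The slides ask for a structure that supports deleting items as well as adding them, but show only insertion. The sketch below is one possible self-contained version of both operations. The names add_item and delete_item, the struct item layout, and the choice to store a private copy of the key are all assumptions for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define HASHSIZE 101

struct item {                       /* assumed layout */
    char *data;
    struct item *next;
};

static struct item *hashtab[HASHSIZE];

static unsigned int hash(const char *s)
{
    unsigned int hashval = 0;
    for (; *s != '\0'; s++)
        hashval = (unsigned char)*s + 31 * hashval;
    return hashval % HASHSIZE;
}

struct item *lookup(const char *s)
{
    for (struct item *np = hashtab[hash(s)]; np != NULL; np = np->next)
        if (strcmp(s, np->data) == 0)
            return np;
    return NULL;
}

/* Add s unless it is already present. Returns the node holding s,
 * or NULL if memory runs out. */
struct item *add_item(const char *s)
{
    assert(s != NULL);
    struct item *np = lookup(s);
    if (np != NULL)
        return np;
    np = malloc(sizeof *np);
    if (np == NULL)
        return NULL;
    np->data = malloc(strlen(s) + 1);
    if (np->data == NULL) {
        free(np);
        return NULL;
    }
    strcpy(np->data, s);            /* keep a private copy of the key */
    unsigned int hv = hash(s);
    np->next = hashtab[hv];         /* insert at the head of the list */
    hashtab[hv] = np;
    return np;
}

/* Unlink and free the node holding s. Returns 1 if removed, 0 if absent. */
int delete_item(const char *s)
{
    assert(s != NULL);
    /* pp points at the link that leads to the current node, so the
     * head of the list and interior nodes are handled the same way. */
    for (struct item **pp = &hashtab[hash(s)]; *pp != NULL; pp = &(*pp)->next) {
        if (strcmp(s, (*pp)->data) == 0) {
            struct item *victim = *pp;
            *pp = victim->next;
            free(victim->data);
            free(victim);
            return 1;
        }
    }
    return 0;
}
```

Deleting touches only one list, so it costs about the same as a lookup.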

21 Challenge
In what kinds of situations in your field might you need a hash table?

22 Hash Table Summary
Widely used for constant time access
Easy to build and maintain
There is an art and science regarding the choice of hashing functions
  Consult textbooks, web, etc.

23 Questions?

