Data Structures and Algorithms


Data Structures and Algorithms The material for this lecture is from Princeton COS 217, which is drawn, in part, from The Practice of Programming (Kernighan & Pike) Chapter 2

Motivating Quotation “Every program depends on algorithms and data structures, but few programs depend on the invention of brand new ones.” -- Kernighan & Pike

Goals of this Lecture
Help you learn about:
  Common data structures and algorithms
Why?
  Shallow motivation: Provide examples of pointer-related C code
  Deeper motivation: Common data structures and algorithms serve as “high level building blocks”
  A power programmer:
    Rarely creates programs from scratch
    Often creates programs using high level building blocks

A Common Task
Maintain a table of key/value pairs
  Each key is a string; each value is an int
  Unknown number of key/value pairs
  For simplicity, allow duplicate keys (avoiding them is the client's responsibility)
Examples
  (student name, grade): (“john smith”, 84), (“jane doe”, 93), (“myungbak lee”, 81)
  (baseball player, number): (“Ruth”, 3), (“Gehrig”, 4), (“Mantle”, 7)
  (variable name, value): (“maxLength”, 2000), (“i”, 7), (“j”, -10)
Note: The code in this lecture does not check for duplicate keys. In contrast, the symbol table that students create for Assignment 3 must check for duplicate keys. That difference is intentional.

Data Structures and Algorithms
Data structures
  Linked list of key/value pairs
  Hash table of key/value pairs
Algorithms
  Create: Create the data structure
  Add: Add a key/value pair
  Search: Search for a key/value pair, by key
  Free: Free the data structure

Data Structure #1: Linked List
Data structure: Nodes; each contains a key/value pair and a pointer to the next node
Algorithms:
  Create: Allocate Table structure to point to first node
  Add: Insert new node at front of list
  Search: Linear search through the list
  Free: Free nodes while traversing; free Table structure
Example list: ("Mantle", 7) -> ("Gehrig", 4) -> ("Ruth", 3) -> NULL

Linked List: Data Structure
    struct Node {
        const char *key;
        int value;
        struct Node *next;
    };
    struct Table {
        struct Node *first;
    };
Example: a struct Table whose first field points to the node ("Gehrig", 4), which points to the node ("Ruth", 3), whose next field is NULL

Linked List: Create
    struct Table *Table_create(void) {
        struct Table *t;
        t = (struct Table*)malloc(sizeof(struct Table));
        /* error checking omitted (e.g., t == NULL) */
        t->first = NULL;
        return t;
    }
Client code:
    struct Table *t;
    …
    t = Table_create();
Result: t points to a new Table whose first field is NULL

Linked List: Add (1)
    void Table_add(struct Table *t, const char *key, int value) {
        struct Node *p = (struct Node*)malloc(sizeof(struct Node));
        /* we omit error checking here (e.g., p == NULL) */
        p->key = key;
        p->value = value;
        p->next = t->first;
        t->first = p;
    }
Client code:
    struct Table *t;
    …
    Table_add(t, "Ruth", 3);
    Table_add(t, "Gehrig", 4);
    Table_add(t, "Mantle", 7);
The keys passed here are pointers to strings that exist in the RODATA section.
State before the third call: t -> ("Gehrig", 4) -> ("Ruth", 3) -> NULL

Linked List: Add (2)-(5)
Trace of Table_add(t, "Mantle", 7), using the same code as in Add (1):
  malloc() allocates a new node; p points to it
  p->key = "Mantle"; p->value = 7
  p->next = t->first, so the new node points to the ("Gehrig", 4) node
  t->first = p; the list is now ("Mantle", 7) -> ("Gehrig", 4) -> ("Ruth", 3) -> NULL
Note that the linked list’s key simply points to an area of memory that is owned by the caller. That’s dangerous. We’ll fix that in a later slide.

Linked List: Search (1)
    int Table_search(struct Table *t, const char *key, int *value) {
        struct Node *p;
        for (p = t->first; p != NULL; p = p->next)
            if (strcmp(p->key, key) == 0) {
                *value = p->value;
                return 1;
            }
        return 0;
    }
Client code:
    struct Table *t;
    int value;
    int found;
    …
    found = Table_search(t, "Gehrig", &value);
List being searched: ("Mantle", 7) -> ("Gehrig", 4) -> ("Ruth", 3) -> NULL

Linked List: Search (2)-(6)
Trace of Table_search(t, "Gehrig", &value), using the same code as in Search (1):
  p starts at the ("Mantle", 7) node; strcmp("Mantle", "Gehrig") is nonzero, so p advances
  p reaches the ("Gehrig", 4) node; strcmp() returns 0
  *value = 4 and the function returns 1, so found is 1 and value is 4

Linked List: Free (1)
    void Table_free(struct Table *t) {
        struct Node *p;
        struct Node *nextp;
        for (p = t->first; p != NULL; p = nextp) {
            nextp = p->next;
            free(p);
        }
        free(t);
    }
Client code:
    struct Table *t;
    …
    Table_free(t);
List being freed: ("Mantle", 7) -> ("Gehrig", 4) -> ("Ruth", 3) -> NULL

Linked List: Free (2)-(9)
Trace of Table_free(t), using the same code as in Free (1):
  p starts at the ("Mantle", 7) node
  On each iteration, nextp saves p->next before free(p), because p->next must not be read after the node is freed
  The nodes are freed in list order: "Mantle", then "Gehrig", then "Ruth"
  After the "Ruth" node is freed, p becomes NULL and the loop ends
  free(t) frees the Table structure itself

Linked List Performance
What are the asymptotic run times (big-oh notation)?
Timing analysis of the given algorithms
  Create: O(1), fast
  Add: O(1), fast
  Search: O(n), slow
  Free: O(n), slow
Would it be better to keep the nodes sorted by key?
Alternative: Keep nodes in sorted order by key
  Add: O(n), slow; must traverse part of list to find proper spot
  Search: O(n), still slow; must traverse part of list

Data Structure #2: Hash Table
Fixed-size array where each element points to a linked list
    struct Node *array[ARRAYSIZE];   /* elements 0 through ARRAYSIZE-1 */
Function maps each key to an array index
  For example, for an integer key h
    Hash function: i = h % ARRAYSIZE (mod function)
    Go to array element i, i.e., the linked list array[i]
    Search for element, add element, remove element, etc.

Hash Table Example
Integer keys, array of size 5 with hash function “h mod 5”
  “1776 % 5” is 1
  “1861 % 5” is 1
  “1939 % 5” is 4
Resulting table:
  0: empty
  1: 1776 (Revolution), 1861 (Civil)
  2: empty
  3: empty
  4: 1939 (WW2)

How Large an Array?
Large enough that average “bucket” size is 1
  Short buckets mean fast search
  Long buckets mean slow search
Small enough to be memory efficient
  Not an excessive number of elements
  Fortunately, each array element is just storing a pointer
This is OK: (figure omitted: an array indexed 0 to ARRAYSIZE-1 with short buckets)

What Kind of Hash Function?
Good at distributing elements across the array
  Distribute results over the range 0, 1, …, ARRAYSIZE-1
  Distribute results evenly to avoid very long buckets
This is not so good: (figure omitted: an array indexed 0 to ARRAYSIZE-1 with very long buckets)
What would be the worst possible hash function?
  The worst hash function would generate the same hash code for every element. In that case the performance of the hash table would degrade to that of a linked list.

Hashing String Keys to Integers
Simple schemes don’t distribute the keys evenly enough
  Number of characters, mod ARRAYSIZE
  Sum the ASCII values of all characters, mod ARRAYSIZE
  …
Here’s a reasonably good hash function
  Weighted sum of characters x_i in the string: $\left(\sum_i a^i x_i\right) \bmod \text{ARRAYSIZE}$
  Best if a and ARRAYSIZE are relatively prime
  E.g., a = 65599, ARRAYSIZE = 1024

Implementing Hash Function
Potentially expensive to compute a^i for each value of i
Instead, do (((x[0] * 65599 + x[1]) * 65599 + x[2]) * 65599 + x[3]) * …
    unsigned int hash(const char *x) {
        int i;
        unsigned int h = 0U;
        for (i=0; x[i]!='\0'; i++)
            h = h * 65599 + (unsigned char)x[i];
        return h % 1024;
    }
Can be more clever than this for powers of two! (Described in Appendix)

Hash Table Example
Example: ARRAYSIZE = 7
Look up (and enter, if not present) these strings: the, cat, in, the, hat
Hash table initially empty.
Here hash() denotes the full weighted sum, before the final mod.
First word: “the”. hash(“the”) = 965156977. 965156977 % 7 = 1.
Search the linked list table[1] for the string “the”; not found.
Now: table[1] = makelink(key, value, table[1])
Table afterwards: table[1]: the (all other buckets empty)

Hash Table Example (cont.)
Second word: “cat”. hash(“cat”) = 824252214. 824252214 % 7 = 2.
Search the linked list table[2] for the string “cat”; not found.
Now: table[2] = makelink(key, value, table[2])
Table afterwards: table[1]: the; table[2]: cat

Hash Table Example (cont.)
Third word: “in”. hash(“in”) = 6888005. 6888005 % 7 = 5.
Search the linked list table[5] for the string “in”; not found.
Now: table[5] = makelink(key, value, table[5])
Table afterwards: table[1]: the; table[2]: cat; table[5]: in

Hash Table Example (cont.)
Fourth word: “the”. hash(“the”) = 965156977. 965156977 % 7 = 1.
Search the linked list table[1] for the string “the”; found it!
Table unchanged: table[1]: the; table[2]: cat; table[5]: in

Hash Table Example (cont.)
Fifth word: “hat”. hash(“hat”) = 865559739. 865559739 % 7 = 2.
Search the linked list table[2] for the string “hat”; not found.
Now, insert “hat” into the linked list table[2]. At beginning or end? Doesn’t matter.
Inserting at the front is easier, so add “hat” at the front.
Table afterwards: table[1]: the; table[2]: hat -> cat; table[5]: in

Hash Table: Data Structure
    enum {BUCKET_COUNT = 1024};
    struct Node {
        const char *key;
        int value;
        struct Node *next;
    };
    struct Table {
        struct Node *array[BUCKET_COUNT];
    };
Example: array[23] points to the node ("Ruth", 3), array[723] points to the node ("Gehrig", 4), and every other element from 0 through 1023 is NULL

Hash Table: Create
    struct Table *Table_create(void) {
        struct Table *t;
        t = (struct Table*)calloc(1, sizeof(struct Table));
        return t;
    }
Client code:
    struct Table *t;
    …
    t = Table_create();
Result: t points to a new Table in which elements 0 through 1023 are all NULL

Hash Table: Add (1)
    void Table_add(struct Table *t, const char *key, int value) {
        struct Node *p = (struct Node*)malloc(sizeof(struct Node));
        int h = hash(key);
        p->key = key;
        p->value = value;
        p->next = t->array[h];
        t->array[h] = p;
    }
Client code:
    struct Table *t;
    …
    Table_add(t, "Ruth", 3);
    Table_add(t, "Gehrig", 4);
    Table_add(t, "Mantle", 7);
The keys passed here are pointers to strings that exist in the RODATA section.
Pretend that “Ruth” hashed to 23 and “Gehrig” to 723.
State before the third call: array[23] -> ("Ruth", 3) -> NULL; array[723] -> ("Gehrig", 4) -> NULL; all other elements NULL

Hash Table: Add (2)-(5)
Trace of Table_add(t, "Mantle", 7), using the same code as in Add (1):
  malloc() allocates a new node; p points to it
  Pretend that “Mantle” hashed to 806, and so h = 806
  p->key = "Mantle"; p->value = 7
  p->next = t->array[806], which is NULL because the bucket was empty
  t->array[806] = p; bucket 806 now holds the single node ("Mantle", 7)

Hash Table: Search (1)
    int Table_search(struct Table *t, const char *key, int *value) {
        struct Node *p;
        int h = hash(key);
        for (p = t->array[h]; p != NULL; p = p->next)
            if (strcmp(p->key, key) == 0) {
                *value = p->value;
                return 1;
            }
        return 0;
    }
Client code:
    struct Table *t;
    int value;
    int found;
    …
    found = Table_search(t, "Gehrig", &value);
Table being searched: array[23] -> ("Ruth", 3); array[723] -> ("Gehrig", 4); array[806] -> ("Mantle", 7); all other elements NULL

Hash Table: Search (2)-(5)
Trace of Table_search(t, "Gehrig", &value), using the same code as in Search (1):
  Pretend that “Gehrig” hashed to 723, and so h = 723
  p starts at t->array[723], the ("Gehrig", 4) node
  strcmp() returns 0, so *value = 4 and the function returns 1
  Afterwards found is 1 and value is 4

Hash Table: Free (1)
    void Table_free(struct Table *t) {
        struct Node *p;
        struct Node *nextp;
        int b;
        for (b = 0; b < BUCKET_COUNT; b++)
            for (p = t->array[b]; p != NULL; p = nextp) {
                nextp = p->next;
                free(p);
            }
        free(t);
    }
Client code:
    struct Table *t;
    …
    Table_free(t);
Table being freed: array[23] -> ("Ruth", 3); array[723] -> ("Gehrig", 4); array[806] -> ("Mantle", 7); all other elements NULL

Hash Table: Free (2)-(9)
Trace of Table_free(t), using the same code as in Free (1):
  b = 0: array[0] is NULL, so p is NULL and the inner loop does nothing
  b = 1, …, 22: the same; these buckets are empty
  b = 23: p points to the ("Ruth", 3) node; nextp = p->next, which is NULL; free(p); p = nextp ends the inner loop
  b = 24, …, 723: empty buckets are skipped; at 723 the ("Gehrig", 4) node is freed
  b = 724, …, 806: empty buckets are skipped; at 806 the ("Mantle", 7) node is freed
  b = 807, …, 1023: empty buckets are skipped
  b = 1024: the outer loop ends, and free(t) frees the Table structure itself

Hash Table Performance
What are the asymptotic run times (big-oh notation)?
  Create: O(1), fast
  Add: O(1), fast
  Search: O(1), fast
  Free: O(n), slow
Is hash table search always fast?
  Search is fast if and only if the average bucket size is small

Key Ownership
Note: Table_add() functions contain this code:
    void Table_add(struct Table *t, const char *key, int value) {
        …
        struct Node *p = (struct Node*)malloc(sizeof(struct Node));
        p->key = key;
        …
    }
Caller passes key, which is a pointer to memory where a string resides
Table_add() function stores within the table the address where the string resides

Key Ownership (cont.)
Problem: Consider this calling code:
    struct Table *t;
    char k[100] = "Ruth";
    …
    Table_add(t, k, 3);
    strcpy(k, "Gehrig");
Via Table_add(), table contains memory address k
Client changes string at memory address k
Thus client changes key within table
What happens if the client searches t for “Ruth”?
What happens if the client searches t for “Gehrig”?
Trouble in hash table
  Existing node’s key has been changed from “Ruth” to “Gehrig”
  Hash table has been corrupted!
  Search for “Ruth” will fail; key “Ruth” no longer exists
  Search for “Gehrig” also (probably) will fail; “Gehrig” probably is in the wrong bucket!
Could be trouble in other data structures too

Key Ownership (cont.)
Solution: Table_add() saves copy of given key
  If client changes string at memory address k, data structure is not affected
Then the data structure “owns” the copy, that is:
  The data structure is responsible for freeing the memory in which the copy resides
  The Table_free() function must free the copy
    void Table_add(struct Table *t, const char *key, int value) {
        …
        struct Node *p = (struct Node*)malloc(sizeof(struct Node));
        char *copy = (char*)malloc(strlen(key) + 1);
        strcpy(copy, key);   /* copy through a non-const pointer */
        p->key = copy;
        …
    }
Why add 1? Must add 1 to allow room for the ‘\0’ terminator.

Summary
Common data structures and associated algorithms
  Linked list
    Fast insert, slow search
  Hash table
    Fast insert, (potentially) fast search
    Invaluable for storing key/value pairs
    Very common
Related issues
  Hashing algorithms
  Memory ownership

Appendix “Stupid programmer tricks” related to hash tables…

Revisiting Hash Functions
Potentially expensive to compute “mod c”
  Involves division by c and keeping the remainder
  Easier when c is a power of 2 (e.g., 16 = 2^4)
An alternative (by example)
  53 = 32 + 16 + 4 + 1, i.e., binary 110101
  53 % 16 is 5, the last four bits of the number (binary 0101)
  Would like an easy way to isolate the last four bits…

Recall: Bitwise Operators in C
Bitwise AND (&)
  Mod on the cheap! E.g., h = 53 & 15;
      53 = 110101
    & 15 = 001111
       5 = 000101
Bitwise OR (|)
One’s complement (~)
  Turns 0 to 1, and 1 to 0
  E.g., set last three bits to 0: x = x & ~7;

A Faster Hash Function
Previous version:
    unsigned int hash(const char *x) {
        int i;
        unsigned int h = 0U;
        for (i=0; x[i]!='\0'; i++)
            h = h * 65599 + (unsigned char)x[i];
        return h % 1024;
    }
Faster:
    unsigned int hash(const char *x) {
        int i;
        unsigned int h = 0U;
        for (i=0; x[i]!='\0'; i++)
            h = h * 65599 + (unsigned char)x[i];
        return h & 1023;
    }
What happens if you mistakenly write “h & 1024”?
  The expression h & 1024 sets all bits but one to 0, so the result is always either 0 or 1024. Only two buckets of the hash table would contain bindings: bucket 0 and bucket 1024. That assumes that bucket 1024 exists! An array of 1024 buckets has indices 0 through 1023, so index 1024 is out of bounds.

Speeding Up Key Comparisons
For any non-trivial value comparison function
Trick: store full hash result in structure
    int Table_search(struct Table *t, const char *key, int *value) {
        struct Node *p;
        unsigned int h = hash(key);  /* No % in hash function */
        for (p = t->array[h%1024]; p != NULL; p = p->next)
            if ((p->hash == h) && strcmp(p->key, key) == 0) {
                *value = p->value;
                return 1;
            }
        return 0;
    }
Why is this so much faster?
  Comparing two integers is much cheaper than calling strcmp(), and a node whose stored hash differs from h cannot hold the same key, so strcmp() runs only on likely matches.