1 Data Structures and Algorithms The material for this lecture is drawn, in part, from The Practice of Programming (Kernighan & Pike) Chapter 2 Professor.

Presentation transcript:

1 Data Structures and Algorithms The material for this lecture is drawn, in part, from The Practice of Programming (Kernighan & Pike) Chapter 2 Professor Jennifer Rexford

2 Motivating Quotation “Every program depends on algorithms and data structures, but few programs depend on the invention of brand new ones.” -- Kernighan & Pike Corollary: work smarter, not harder

3 Goals of this Lecture
Help you learn (or refresh your memory) about:
  Commonly used data structures and algorithms
Shallow motivation
  Provide examples of typical pointer-related C code
Deeper motivation
  Common data structures and algorithms serve as “high level building blocks”
  A power programmer:
    Rarely creates large programs from scratch
    Creates large programs using high level building blocks whenever possible

4 A Common Task
Maintain a table of key/value pairs
  Each key is a string; each value is an int
  Unknown number of key/value pairs
  For simplicity, allow duplicate keys (client responsibility)
    In Assignment #3, must check for duplicate keys!
Examples
  (student name, grade): (“john smith”, 84), (“jane doe”, 93), (“bill clinton”, 81)
  (baseball player, number): (“Ruth”, 3), (“Gehrig”, 4), (“Mantle”, 7)
  (variable name, value): (“maxLength”, 2000), (“i”, 7), (“j”, -10)

5 Data Structures and Algorithms
Data structures: two ways to store the data, plus a third in the Appendix
  Linked list of key/value pairs
  Hash table of key/value pairs
  Expanding array of key/value pairs (see Appendix)
Algorithms: various ways to manipulate the data
  Create: Create the data structure
  Add: Add a key/value pair
  Search: Search for a key/value pair, by key
  Free: Free the data structure

6 Data Structure #1: Linked List
Data structure: Nodes; each node contains a key/value pair and a pointer to the next node
  [Diagram: ("Mantle", 7) -> ("Gehrig", 4) -> ("Ruth", 3) -> NULL]
Algorithms:
  Create: Allocate “dummy” node to point to first real node
  Add: Create a new node, and insert at front of list
  Search: Linear search through the list
  Free: Free nodes while traversing; free dummy node

7 Linked List: Data Structure
    struct Node {
        const char *key;
        int value;
        struct Node *next;
    };

    struct Table {
        struct Node *first;
    };

8 Linked List: Create (slides 8-11)
    struct Table *Table_create(void)
    {
        struct Table *t;
        t = (struct Table*) malloc(sizeof(struct Table));
        t->first = NULL;
        return t;
    }
Calling code:
    struct Table *t;
    …
    t = Table_create();
    …
[Stack/heap diagram: Table_create() allocates a Table on the heap and sets its first field to NULL; after the call returns, the caller's t on the stack points to that Table]

12 Linked List: Add (slides 12-17)
    void Table_add(struct Table *t, const char *key, int value)
    {
        struct Node *p = (struct Node*)malloc(sizeof(struct Node));
        p->key = key;
        p->value = value;
        p->next = t->first;
        t->first = p;
    }
Calling code:
    struct Table *t;
    …
    Table_add(t, "Ruth", 3);
    Table_add(t, "Gehrig", 4);
    Table_add(t, "Mantle", 7);
    …
[Stack/heap diagram: the list already holds ("Gehrig", 4) -> ("Ruth", 3) -> NULL; Table_add(t, "Mantle", 7) allocates node p, stores key "Mantle" and value 7 in it, and links it at the front of the list]
The keys are pointers to strings that exist in the RODATA section

18 Linked List: Search (slides 18-22)
    int Table_search(struct Table *t, const char *key, int *value)
    {
        struct Node *p;
        for (p = t->first; p != NULL; p = p->next)
            if (strcmp(p->key, key) == 0) {
                *value = p->value;
                return 1;
            }
        return 0;
    }
Calling code:
    struct Table *t;
    int value;
    int found;
    …
    found = Table_search(t, "Gehrig", &value);
    …
[Stack/heap diagram: p walks the list ("Mantle", 7) -> ("Gehrig", 4) -> ("Ruth", 3) -> NULL; on the match for "Gehrig" the caller's value becomes 4 and found becomes 1]

23 Linked List: Free (slides 23-31)
    void Table_free(struct Table *t)
    {
        struct Node *p;
        struct Node *nextp;
        for (p = t->first; p != NULL; p = nextp) {
            nextp = p->next;
            free(p);
        }
        free(t);   /* Free the dummy node */
    }
Calling code:
    struct Table *t;
    …
    Table_free(t);
    …
[Stack/heap diagram: p and nextp walk the list, freeing ("Mantle", 7), then ("Gehrig", 4), then ("Ruth", 3); finally the Table itself is freed]

32 Linked List Performance
Timing analysis of the given algorithms
  Create: O(1), fast
  Add:    O(1), fast
  Search: O(n), slow
  Free:   O(n), slow
Alternative: Keep nodes in sorted order by key
  Create: O(1), fast
  Add:    O(n), slow; must traverse part of list to find proper spot
  Search: O(n), still slow; must traverse part of list
  Free:   O(n), slow
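The sorted alternative is not spelled out in the slides. A minimal sketch of what an O(n) sorted insert might look like follows; the name Table_addSorted, the int return value, and the malloc check are assumptions added for illustration, not part of the lecture code.

```c
#include <stdlib.h>
#include <string.h>

struct Node {
    const char *key;
    int value;
    struct Node *next;
};

struct Table {
    struct Node *first;
};

/* Hypothetical helper: insert so that keys stay in ascending strcmp() order.
   Returns 1 on success, 0 if malloc fails. */
int Table_addSorted(struct Table *t, const char *key, int value)
{
    struct Node **link = &t->first;   /* the link we may have to rewrite */
    struct Node *p = malloc(sizeof *p);
    if (p == NULL)
        return 0;

    /* O(n): walk past every node whose key sorts before the new key. */
    while (*link != NULL && strcmp((*link)->key, key) < 0)
        link = &(*link)->next;

    p->key = key;
    p->value = value;
    p->next = *link;
    *link = p;
    return 1;
}
```

Walking a pointer-to-link avoids a special case for inserting at the head of the list.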

33 Data Structure #2: Hash Table
Fixed-size array where each element points to a linked list
    struct Node *array[ARRAYSIZE];   (indices 0 … ARRAYSIZE-1)
Function maps each key to an array index
  For example, for an integer key h
    Hash function: i = h % ARRAYSIZE (mod function)
    Go to array element i, i.e., the linked list array[i]
    Search for element, add element, remove element, etc.

34 Hash Table Example
Integer keys, array of size 5 with hash function “h mod 5”
  “1776 % 5” is 1
  “1861 % 5” is 1
  “1939 % 5” is 4
[Diagram: bucket 1 holds 1776 (Revolution) and 1861 (Civil); bucket 4 holds 1939 (WW)]
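The arithmetic in this example can be checked with a very small function. This is only a sketch; the name bucket_index and the use of an unsigned key are assumptions made for illustration.

```c
enum { ARRAYSIZE = 5 };

/* Map a non-negative integer key to a bucket index in 0 .. ARRAYSIZE-1. */
unsigned int bucket_index(unsigned int key)
{
    return key % ARRAYSIZE;
}
```

Keys 1776 and 1861 both land in bucket 1, so they share one linked list; 1939 lands in bucket 4.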

35 How Large an Array?
Large enough that average “bucket” size is 1
  Short buckets mean fast search
  Long buckets mean slow search
Small enough to be memory efficient
  Not an excessive number of elements
  Fortunately, each array element is just storing a pointer
[Diagram labeled “This is OK”: array with indices 0 … ARRAYSIZE-1]

36 What Kind of Hash Function?
Good at distributing elements across the array
  Distribute results over the range 0, 1, …, ARRAYSIZE-1
  Distribute results evenly to avoid very long buckets
[Diagram labeled “This is not so good”: array with indices 0 … ARRAYSIZE-1]
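One way to judge whether a hash function is distributing keys evenly is to measure the longest bucket. The helper below is a hypothetical diagnostic, not something the slides define; it assumes the same Node layout used throughout the lecture.

```c
#include <stddef.h>

struct Node {
    const char *key;
    int value;
    struct Node *next;
};

/* Hypothetical diagnostic: length of the longest chain among
   the nbuckets linked lists stored in array[]. */
size_t longest_bucket(struct Node *const array[], size_t nbuckets)
{
    size_t longest = 0;
    for (size_t b = 0; b < nbuckets; b++) {
        size_t len = 0;
        for (const struct Node *p = array[b]; p != NULL; p = p->next)
            len++;
        if (len > longest)
            longest = len;
    }
    return longest;
}
```

A result close to the average bucket size suggests an even spread; a result close to the total number of keys suggests most keys are piling into one bucket.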

37 Hashing String Keys to Integers
Simple schemes don’t distribute the keys evenly enough
  Number of characters, mod ARRAYSIZE
  Sum the ASCII values of all characters, mod ARRAYSIZE
  …
Here’s a reasonably good hash function
  Weighted sum of characters x_i in the string: (Σ a^i x_i) mod ARRAYSIZE
  Best if a and ARRAYSIZE are relatively prime
    E.g., a = 65599, ARRAYSIZE = 1024
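To see why the simple sum distributes poorly, compare it with a weighted sum on two anagrams. This is a sketch under assumptions: the function names hash_sum and hash_weighted are made up for illustration, and the weighted version applies the multiplier a = 65599 from the slide one character at a time.

```c
enum { ARRAYSIZE = 1024 };

/* Simple scheme: sum of character codes, mod ARRAYSIZE.
   Anagrams always collide because addition ignores character order. */
unsigned int hash_sum(const char *x)
{
    unsigned int h = 0U;
    for (int i = 0; x[i] != '\0'; i++)
        h += (unsigned char)x[i];
    return h % ARRAYSIZE;
}

/* Weighted scheme: each character's contribution depends on its position. */
unsigned int hash_weighted(const char *x)
{
    unsigned int h = 0U;
    for (int i = 0; x[i] != '\0'; i++)
        h = h * 65599U + (unsigned char)x[i];
    return h % ARRAYSIZE;
}
```

With these definitions "cat" and "act" share a bucket under hash_sum but land in different buckets under hash_weighted.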

38 Implementing Hash Function
Potentially expensive to compute a^i for each value of i
  Instead, do (((x[0] * 65599 + x[1]) * 65599 + x[2]) * 65599 + x[3]) * …
    unsigned int hash(const char *x)
    {
        int i;
        unsigned int h = 0U;
        for (i = 0; x[i] != '\0'; i++)
            h = h * 65599 + (unsigned char)x[i];
        return h % 1024;
    }
Can be more clever than this for powers of two! (Described in Appendix)
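The nested form is Horner's rule. The sketch below, with the hypothetical names hash_naive and hash_horner, checks that the nested loop computes the same weighted sum as recomputing each power of a explicitly. Note one assumption about indexing: in this form the first character receives the highest power of a. Unsigned arithmetic wraps around identically in both versions, so the results agree exactly.

```c
#include <stddef.h>
#include <string.h>

/* Naive: recompute the weight a^(n-1-i) for every character i. */
unsigned int hash_naive(const char *x, unsigned int a)
{
    unsigned int h = 0U;
    size_t n = strlen(x);
    for (size_t i = 0; i < n; i++) {
        unsigned int w = 1U;
        for (size_t k = i + 1; k < n; k++)
            w *= a;
        h += w * (unsigned char)x[i];
    }
    return h;
}

/* Horner's rule: one multiply and one add per character. */
unsigned int hash_horner(const char *x, unsigned int a)
{
    unsigned int h = 0U;
    for (size_t i = 0; x[i] != '\0'; i++)
        h = h * a + (unsigned char)x[i];
    return h;
}
```

The naive version does O(n^2) multiplications for a key of length n; Horner's rule does n.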

39 Hash Table Example
Example: ARRAYSIZE = 7
Look up (and enter, if not present) these strings: the, cat, in, the, hat
Hash table initially empty.
First word: “the”. hash(“the”) % 7 = 1.
  Search the linked list table[1] for the string “the”; not found
  Now: table[1] = makelink(key, value, table[1])

41 Hash Table Example (cont.)
Second word: “cat”. hash(“cat”) % 7 = 2.
  Search the linked list table[2] for the string “cat”; not found
  Now: table[2] = makelink(key, value, table[2])

42 Hash Table Example (cont.)
Third word: “in”. hash(“in”) % 7 = 5.
  Search the linked list table[5] for the string “in”; not found
  Now: table[5] = makelink(key, value, table[5])

43 Hash Table Example (cont.)
Fourth word: “the”. hash(“the”) % 7 = 1.
  Search the linked list table[1] for the string “the”; found it!

44 Hash Table Example (cont.)
Fifth word: “hat”. hash(“hat”) % 7 = 2.
  Search the linked list table[2] for the string “hat”; not found.
  Now, insert “hat” into the linked list table[2].
  At beginning or end? Doesn’t matter

45 Hash Table Example (cont.)
Inserting at the front is easier, so add “hat” at the front
[Diagram: table[1] holds “the”; table[2] holds “hat” -> “cat”; table[5] holds “in”]
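The look-up-and-enter walk above can be sketched as a single function. The slides name makelink() but do not show it, so its body here is an assumption, as are the function name lookup, the global table, and the particular hash; bucket numbers produced by this hash need not match the ones in the example.

```c
#include <stdlib.h>
#include <string.h>

enum { ARRAYSIZE = 7 };

struct Node {
    const char *key;
    int value;
    struct Node *next;
};

static struct Node *table[ARRAYSIZE];   /* static storage: all buckets start NULL */

static unsigned int hash(const char *x)
{
    unsigned int h = 0U;
    for (int i = 0; x[i] != '\0'; i++)
        h = h * 65599U + (unsigned char)x[i];
    return h % ARRAYSIZE;
}

/* Assumed behavior: build a node whose next pointer is the old list head.
   Returns NULL if malloc fails. */
static struct Node *makelink(const char *key, int value, struct Node *next)
{
    struct Node *p = malloc(sizeof *p);
    if (p != NULL) {
        p->key = key;
        p->value = value;
        p->next = next;
    }
    return p;
}

/* Look up key; enter it at the front of its bucket if not present.
   Returns the node, or NULL if a new node could not be allocated. */
struct Node *lookup(const char *key, int value)
{
    unsigned int h = hash(key);
    for (struct Node *p = table[h]; p != NULL; p = p->next)
        if (strcmp(p->key, key) == 0)
            return p;                    /* found it */

    struct Node *p = makelink(key, value, table[h]);
    if (p != NULL)
        table[h] = p;
    return p;
}
```

Calling lookup() on the, cat, in, the, hat creates four nodes; the second "the" returns the node created by the first.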

46 Hash Table: Data Structure
    enum {BUCKET_COUNT = 1024};

    struct Node {
        const char *key;
        int value;
        struct Node *next;
    };

    struct Table {
        struct Node *array[BUCKET_COUNT];
    };

47 Hash Table: Create (slides 47-50)
    struct Table *Table_create(void)
    {
        struct Table *t;
        t = (struct Table*)calloc(1, sizeof(struct Table));
        return t;
    }
Calling code:
    struct Table *t;
    …
    t = Table_create();
    …
[Stack/heap diagram: calloc() returns a zero-filled Table, so every array element starts out NULL; after the call returns, the caller's t points to that Table]

51 Hash Table: Add (slides 51-56)
    void Table_add(struct Table *t, const char *key, int value)
    {
        struct Node *p = (struct Node*)malloc(sizeof(struct Node));
        int h = hash(key);
        p->key = key;
        p->value = value;
        p->next = t->array[h];
        t->array[h] = p;
    }
Calling code:
    struct Table *t;
    …
    Table_add(t, "Ruth", 3);
    Table_add(t, "Gehrig", 4);
    Table_add(t, "Mantle", 7);
    …
[Stack/heap diagram: array indices 0 … 1023; ("Gehrig", 4) and ("Ruth", 3) are already in their buckets; Table_add(t, "Mantle", 7) computes h = 723 with the hash function from slide 38, allocates node p, and links it at the front of array[723], whose next pointer is NULL]
The keys are pointers to strings that exist in the RODATA section

57 Hash Table: Search (slides 57-62)
    int Table_search(struct Table *t, const char *key, int *value)
    {
        struct Node *p;
        int h = hash(key);
        for (p = t->array[h]; p != NULL; p = p->next)
            if (strcmp(p->key, key) == 0) {
                *value = p->value;
                return 1;
            }
        return 0;
    }
Calling code:
    struct Table *t;
    int value;
    int found;
    …
    found = Table_search(t, "Gehrig", &value);
    …
[Stack/heap diagram: Table_search() computes h = 806 for "Gehrig" with the hash function from slide 38 and walks only the list at array[806]; on the match the caller's value becomes 4 and found becomes 1]

63 Hash Table: Free (slides 63-68)
    void Table_free(struct Table *t)
    {
        struct Node *p;
        struct Node *nextp;
        int b;
        for (b = 0; b < BUCKET_COUNT; b++)
            for (p = t->array[b]; p != NULL; p = nextp) {
                nextp = p->next;
                free(p);
            }
        free(t);
    }
Calling code:
    struct Table *t;
    …
    Table_free(t);
    …
[Stack/heap diagram: b steps through buckets 0 … 1023 while p and nextp free each bucket's nodes; finally the Table itself is freed]

69 Hash Table Performance
  Create: O(1), fast
  Add:    O(1), fast
  Search: O(1), fast, if and only if bucket sizes are small
  Free:   O(n), slow

70 Key Ownership
Note: Table_add() functions contain this code:
    void Table_add(struct Table *t, const char *key, int value)
    {
        …
        struct Node *p = (struct Node*)malloc(sizeof(struct Node));
        p->key = key;
        …
    }
Caller passes key, which is a pointer to memory where a string resides
Table_add() function simply stores within the table the address where the string resides

71 Key Ownership (cont.)
Problem: Consider this calling code:
    struct Table *t;
    char k[100] = "Ruth";
    …
    Table_add(t, k, 3);
    strcpy(k, "Gehrig");
    …
  Via Table_add(), table contains memory address k
  Client changes string at memory address k
  Thus client changes key within table
Trouble in hash table
  Existing node’s key has been changed from “Ruth” to “Gehrig”
  Existing node now is in wrong bucket!!!
  Hash table has been corrupted!!!
Could be trouble in other data structures too
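The aliasing problem can be reproduced without a full table. In this sketch the function name key_is_aliased is hypothetical, and a single stack-allocated node stands in for what Table_add() stores.

```c
#include <string.h>

struct Node {
    const char *key;
    int value;
    struct Node *next;
};

/* Returns 1 if changing the caller's buffer also changed the stored key. */
int key_is_aliased(void)
{
    char k[100] = "Ruth";
    struct Node n;

    n.key = k;             /* what Table_add() does: store the address only */
    n.value = 3;
    n.next = NULL;

    strcpy(k, "Gehrig");   /* client reuses its buffer */

    return strcmp(n.key, "Gehrig") == 0;
}
```

The function returns 1: the node never held its own "Ruth", only the address of the client's buffer.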

72 Key Ownership (cont.)
Solution: Table_add() saves copy of given key
  If client changes string at memory address k, data structure is not affected
    void Table_add(struct Table *t, const char *key, int value)
    {
        …
        struct Node *p = (struct Node*)malloc(sizeof(struct Node));
        char *copy = (char*)malloc(strlen(key) + 1);   /* Allow room for '\0' */
        strcpy(copy, key);   /* Write through a non-const pointer */
        p->key = copy;
        …
    }
Then the data structure “owns” the copy, that is:
  The data structure is responsible for freeing the memory in which the copy resides
  The Table_free() function must free the copy
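A minimal sketch of both halves of the ownership rule, copying on add and freeing on destroy, for the linked-list table. The names Table_addCopy and Table_freeNodes, the non-const key field, and the int return value are assumptions chosen for illustration rather than the lecture's own interface.

```c
#include <stdlib.h>
#include <string.h>

struct Node {
    char *key;            /* owned by the table, so not const here */
    int value;
    struct Node *next;
};

struct Table {
    struct Node *first;
};

/* Hypothetical variant of Table_add(): stores a private copy of key.
   Returns 1 on success, 0 if memory runs out. */
int Table_addCopy(struct Table *t, const char *key, int value)
{
    struct Node *p = malloc(sizeof *p);
    if (p == NULL)
        return 0;
    p->key = malloc(strlen(key) + 1);   /* +1 allows room for '\0' */
    if (p->key == NULL) {
        free(p);
        return 0;
    }
    strcpy(p->key, key);
    p->value = value;
    p->next = t->first;
    t->first = p;
    return 1;
}

/* Frees every node and the key copy each node owns. */
void Table_freeNodes(struct Table *t)
{
    struct Node *p;
    struct Node *nextp;
    for (p = t->first; p != NULL; p = nextp) {
        nextp = p->next;
        free(p->key);     /* the table owns the copy, so the table frees it */
        free(p);
    }
    t->first = NULL;
}
```

After Table_addCopy(&t, k, 3), overwriting k with "Gehrig" leaves the stored key as "Ruth".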

73 Summary
Common data structures and associated algorithms
  Linked list
    Unsorted => fast insert, slow search
    Sorted => slow insert, slow search
  Hash table
    Fast insert, fast search, iff hash function works well
    Invaluable for storing key/value pairs
    Very common
Related issues
  Hashing algorithms
  Memory ownership
Two appendices
  Appendix #1: tricks for faster hash tables
  Appendix #2: example of a third data structure

74 Appendix 1 “Stupid programmer tricks” related to hash tables…

75 Revisiting Hash Functions
Potentially expensive to compute “mod c”
  Involves division by c and keeping the remainder
  Easier when c is a power of 2 (e.g., 16 = 2^4)
An alternative (by example)
  53 = 32 + 16 + 4 + 1, which is 110101 in binary
  53 % 16 is 5, the last four bits of the number (0101)
  Would like an easy way to isolate the last four bits…

76 Recall: Bitwise Operators in C
Bitwise AND (&)
    & | 0 1
    0 | 0 0
    1 | 0 1
  Mod on the cheap!
    E.g., h = 53 & 15;   (110101 & 001111 is 000101, i.e., 5)
Bitwise OR (|)
    | | 0 1
    0 | 0 1
    1 | 1 1
One’s complement (~)
  Turns 0 to 1, and 1 to 0
  E.g., set last three bits to 0
    x = x & ~7;

77 A Faster Hash Function
Previous version:
    unsigned int hash(const char *x)
    {
        int i;
        unsigned int h = 0U;
        for (i = 0; x[i] != '\0'; i++)
            h = h * 65599 + (unsigned char)x[i];
        return h % 1024;
    }
Faster:
    unsigned int hash(const char *x)
    {
        int i;
        unsigned int h = 0U;
        for (i = 0; x[i] != '\0'; i++)
            h = h * 65599 + (unsigned char)x[i];
        return h & 1023;
    }
Beware: Don’t write “h & 1024”
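The mask trick and the warning can both be checked with a few lines. The names mod_pow2 and wrong_mask are hypothetical; the sketch assumes n is a power of two, since the identity does not hold otherwise.

```c
/* Valid only when n is a power of two: h % n equals h & (n - 1). */
unsigned int mod_pow2(unsigned int h, unsigned int n)
{
    return h & (n - 1U);
}

/* The mistake the slide warns about: masking with 1024 itself keeps
   a single bit, so the result is always 0 or 1024, never 1 .. 1023. */
unsigned int wrong_mask(unsigned int h)
{
    return h & 1024U;
}
```

For example, mod_pow2(53, 16) is 5, matching 53 % 16, while wrong_mask(1023) is 0 even though 1023 % 1024 is 1023.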

78 Speeding Up Key Comparisons
Speeding up key comparisons
  For any non-trivial comparison function
  Trick: store full hash result in structure (each node needs a hash field, filled in when the node is added)
    int Table_search(struct Table *t, const char *key, int *value)
    {
        struct Node *p;
        unsigned int h = hash(key);   /* No % in hash function; unsigned keeps h % 1024 in range */
        for (p = t->array[h % 1024]; p != NULL; p = p->next)
            if ((p->hash == h) && strcmp(p->key, key) == 0) {
                *value = p->value;
                return 1;
            }
        return 0;
    }
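The search code above relies on a hash field that the earlier Node definition does not have. A sketch of the matching data structure and add function follows; the field placement, the int return value, and the malloc check are assumptions added for illustration.

```c
#include <stdlib.h>
#include <string.h>

enum { BUCKET_COUNT = 1024 };

struct Node {
    const char *key;
    unsigned int hash;    /* full hash of key, before reduction to a bucket */
    int value;
    struct Node *next;
};

struct Table {
    struct Node *array[BUCKET_COUNT];
};

/* Full-width hash: no % here, the caller reduces it to a bucket index. */
static unsigned int hash(const char *x)
{
    unsigned int h = 0U;
    for (int i = 0; x[i] != '\0'; i++)
        h = h * 65599U + (unsigned char)x[i];
    return h;
}

/* Returns 1 on success, 0 if malloc fails. */
int Table_add(struct Table *t, const char *key, int value)
{
    unsigned int h = hash(key);
    struct Node *p = malloc(sizeof *p);
    if (p == NULL)
        return 0;
    p->key = key;
    p->hash = h;          /* remembered so search can skip most strcmp() calls */
    p->value = value;
    p->next = t->array[h % BUCKET_COUNT];
    t->array[h % BUCKET_COUNT] = p;
    return 1;
}
```

Two keys in the same bucket usually still differ in their full hash, so the integer comparison rejects most non-matching nodes before strcmp() runs.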

79 Appendix 2: Another Data Structure Expanding array…

80 Expanding Array
The general idea…
Data structure: An array that expands as necessary
Create algorithm: Allocate an array of key/value pairs; initially the array has few elements
Add algorithm: If out of room, double the size of the array; copy the given key/value pair into the first unused element
  Note: For efficiency, expand the array geometrically instead of linearly
Search algorithm: Simple linear search
Free algorithm: Free the array
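The advice to grow geometrically can be made concrete by counting element copies. The function below is a hypothetical model, not part of the lecture code: it assumes the worst case in which every expansion moves all existing elements, a starting capacity of 2, and a fixed step of 2 for the linear policy.

```c
/* Count the element copies performed while adding n elements to an
   array that starts with capacity 2, assuming each expansion copies
   every existing element. geometric != 0 doubles the capacity;
   geometric == 0 grows it by a fixed step of 2. */
unsigned long copies_needed(unsigned long n, int geometric)
{
    unsigned long capacity = 2;
    unsigned long count = 0;
    unsigned long copies = 0;

    while (count < n) {
        if (count == capacity) {
            copies += count;                      /* move everything over */
            capacity = geometric ? capacity * 2   /* double */
                                 : capacity + 2;  /* fixed step */
        }
        count++;
    }
    return copies;
}
```

For 1024 elements, doubling copies 1022 elements in total (proportional to n), while the fixed step copies 261632 (proportional to n squared).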

81 Expanding Array: Data Structure
    enum {INITIAL_SIZE = 2};
    enum {GROWTH_FACTOR = 2};

    struct Pair {
        const char *key;
        int value;
    };

    struct Table {
        int pairCount;       /* Number of pairs in table */
        int arraySize;       /* Physical size of array */
        struct Pair *array;  /* Address of array */
    };

82 Expanding Array: Create (1)
    struct Table *Table_create(void) {
        struct Table *t;
        t = (struct Table*) malloc(sizeof(struct Table));
        t->pairCount = 0;
        t->arraySize = INITIAL_SIZE;
        t->array = (struct Pair*) calloc(INITIAL_SIZE, sizeof(struct Pair));
        return t;
    }
Calling code:
    {
        struct Table *t;
        …
        t = Table_create();
        …
    }
[Diagram: STACK and HEAP; label shown: t]

83 Expanding Array: Create (2) Same code as Create (1). [Diagram: STACK and HEAP; labels shown: t, t (the caller's variable and Table_create's local)]

84 Expanding Array: Create (3) Same code as Create (1). [Diagram: STACK and HEAP; labels shown: t, t, 2, 0 (arraySize and pairCount)]

85 Expanding Array: Create (4) Same code as Create (1). [Diagram: STACK and HEAP; labels shown: t, t, 2, 0]

86 Expanding Array: Create (5) Same code as Create (1). [Diagram: STACK and HEAP; labels shown: t, 2, 0]

87 Expanding Array: Add (1)
    void Table_add(struct Table *t, const char *key, int value) {
        /* Expand if necessary. */
        if (t->pairCount == t->arraySize) {
            t->arraySize *= GROWTH_FACTOR;
            t->array = (struct Pair*)realloc(t->array,
                t->arraySize * sizeof(struct Pair));
        }
        t->array[t->pairCount].key = key;
        t->array[t->pairCount].value = value;
        t->pairCount++;
    }
Calling code:
    struct Table *t;
    …
    Table_add(t, "Ruth", 3);
    Table_add(t, "Gehrig", 4);
    Table_add(t, "Mantle", 7);
    …
[Diagram: STACK and HEAP; labels shown: t, "Ruth", "Gehrig", 4. Note on diagram: These are pointers to strings that exist in the RODATA section]

88 Expanding Array: Add (2) Same code as Add (1). [Diagram: adds Table_add's parameters t, key ("Mantle"), and value (7). Note on diagram: This is a pointer to a string that exists in the RODATA section]

89 Expanding Array: Add (3) Same code as Add (1). [Diagram: the pairs "Ruth" 3 and "Gehrig" 4 now appear a second time, alongside the parameters t, key, and value]

90 Expanding Array: Add (4) Same code as Add (1). [Diagram: "Mantle" 7 now appears after "Ruth" and "Gehrig"; the parameters t, key, and value are still shown]

91 Expanding Array: Add (5) Same code as Add (1). [Diagram: STACK and HEAP; labels shown: t, "Ruth", "Gehrig", "Mantle", 4, 7]
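The Create and Add code on these slides does not check the results of malloc, calloc, or realloc, and it overwrites t->array with the result of realloc, which would lose the old array if realloc returned NULL. The following is one possible defensive variant, offered as a sketch rather than as the lecture's version: the int return value of Table_add and the NULL return of Table_create on failure are assumptions, and integer overflow of arraySize is not handled.

```c
#include <assert.h>
#include <stdlib.h>

enum {INITIAL_SIZE = 2};
enum {GROWTH_FACTOR = 2};

struct Pair {
    const char *key;
    int value;
};

struct Table {
    int pairCount;       /* Number of pairs in table */
    int arraySize;       /* Physical size of array */
    struct Pair *array;  /* Address of array */
};

/* Returns a new empty table, or NULL if memory is exhausted. */
struct Table *Table_create(void)
{
    struct Table *t = malloc(sizeof *t);
    if (t == NULL)
        return NULL;
    t->array = calloc(INITIAL_SIZE, sizeof *t->array);
    if (t->array == NULL) {
        free(t);         /* do not leak the Table itself */
        return NULL;
    }
    t->pairCount = 0;
    t->arraySize = INITIAL_SIZE;
    return t;
}

/* Returns 1 on success; returns 0 and leaves the table unchanged
   if the array cannot be expanded. */
int Table_add(struct Table *t, const char *key, int value)
{
    assert(t != NULL && key != NULL);
    if (t->pairCount == t->arraySize) {
        int newSize = t->arraySize * GROWTH_FACTOR;
        struct Pair *newArray =
            realloc(t->array, (size_t)newSize * sizeof *newArray);
        if (newArray == NULL)
            return 0;    /* the old array is still valid and still owned */
        t->array = newArray;
        t->arraySize = newSize;
    }
    t->array[t->pairCount].key = key;
    t->array[t->pairCount].value = value;
    t->pairCount++;
    return 1;
}

void Table_free(struct Table *t)
{
    if (t != NULL) {
        free(t->array);
        free(t);
    }
}
```

As on the slides, the table stores only the key pointer, so the caller keeps ownership of the string and must keep it alive for as long as the table is in use.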

92 Expanding Array: Search (1)
    int Table_search(struct Table *t, const char *key, int *value) {
        int i;
        for (i = 0; i < t->pairCount; i++) {
            struct Pair p = t->array[i];
            if (strcmp(p.key, key) == 0) {
                *value = p.value;
                return 1;
            }
        }
        return 0;
    }
Calling code:
    struct Table *t;
    int value;
    int found;
    …
    found = Table_search(t, "Gehrig", &value);
    …
[Diagram: STACK and HEAP; labels shown: t, value, found, "Ruth" 3, "Gehrig" 4, "Mantle" 7, and the values 4 and 3]

93 Expanding Array: Search (2) Same code as Search (1). [Diagram: adds Table_search's parameters t, key ("Gehrig"), and value]

94 Expanding Array: Search (3) Same code as Search (1). [Diagram: adds the locals i (shown as 1) and p]

95 Expanding Array: Search (4) Same code as Search (1). [Diagram: as in Search (3), with the value 4 now shown]

96 Expanding Array: Search (5) Same code as Search (1). [Diagram: STACK and HEAP; labels shown: t, value, found, "Ruth", "Gehrig", "Mantle", 4, 7]

97 Expanding Array: Free (1)
    void Table_free(struct Table *t) {
        free(t->array);
        free(t);
    }
Calling code:
    struct Table *t;
    …
    Table_free(t);
    …
[Diagram: STACK and HEAP; labels shown: t, "Ruth", "Gehrig", "Mantle", 4, 7]

98 Expanding Array: Free (2) Same code as Free (1). [Diagram: as in Free (1), plus a second t (Table_free's parameter)]

99 Expanding Array: Free (3) Same code as Free (1). [Diagram: STACK and HEAP; labels shown: t, t, 4, 3; the strings are no longer shown]

100 Expanding Array: Free (4) Same code as Free (1). [Diagram: STACK and HEAP; labels shown: t, t]

101 Expanding Array: Free (5) Same code as Free (1). [Diagram: STACK and HEAP; label shown: t]

102 Expanding Array Performance
Timing analysis of given algorithms
    Create: O(1), fast
    Add: O(1) amortized, fast; an add that triggers an expansion may copy all n pairs
    Search: O(n), slow
    Free: O(1), fast
Alternative: Keep the array sorted by key
    Create: O(1), fast
    Add: O(n), slow; must move pairs to make room for new one
    Search: O(log n), moderate; can use binary search
    Free: O(1), fast
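For the sorted alternative, the O(log n) search could look roughly like the sketch below. It assumes the pairs are kept in ascending strcmp order of their keys; the function name Pair_bsearch and its parameter list are hypothetical, chosen to mirror Table_search.

```c
#include <assert.h>
#include <string.h>

struct Pair {
    const char *key;
    int value;
};

/* Binary search over the first n pairs of array, which must be sorted
   in ascending strcmp order by key. Returns 1 and stores the value
   through *value if key is found; returns 0 otherwise. */
int Pair_bsearch(const struct Pair *array, int n,
                 const char *key, int *value)
{
    int lo = 0;
    int hi = n - 1;

    assert(array != NULL || n == 0);
    assert(key != NULL && value != NULL);
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;   /* avoids overflow of lo + hi */
        int cmp = strcmp(key, array[mid].key);
        if (cmp == 0) {
            *value = array[mid].value;
            return 1;
        }
        if (cmp < 0)
            hi = mid - 1;               /* key sorts before the middle pair */
        else
            lo = mid + 1;               /* key sorts after the middle pair */
    }
    return 0;
}
```

Each iteration halves the remaining range, which is where the O(log n) bound comes from. The price is paid in Add, which must shift pairs to keep the array sorted.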