CS 144 Advanced C++ Programming April 23 Class Meeting

Presentation transcript:

CS 144 Advanced C++ Programming April 23 Class Meeting Department of Computer Engineering San Jose State University Spring 2019 Instructor: Ron Mak www.cs.sjsu.edu/~mak

Assignment #10. Recursion
Recursive destructor: each person object recursively deletes its children.
Recursive print: first print the person’s name and spouse, then recursively print the person’s children.
    Person::~Person()
    {
        // Recursively delete the children.
        for (Person *child : children) delete child;
        cout << "deleted " << name << endl;
    }

    void Person::print() const
    {
        if (parent != nullptr) cout << "+---";
        // Print the name and the spouse's name, if any.
        cout << name;
        if (spouse_name.length() > 0) cout << " (" << spouse_name << ")";
        cout << endl;
        // Print the children, if any.
        for (Person *child : children)
        {
            // Print any ancestors' bars.
            print_bar(); cout << "|" << endl;
            print_bar();
            // Recursively print a child.
            child->print();
        }
    }

Assignment #10, cont’d
Charles (Mary)
|
+---Susan (Bob)
|   |
|   +---Dick (Cindy)
|   |   |
|   |   +---Ron
|   +---Harry
+---George
+---Tom (Alice)
    |
    +---Eliza (Bud)
    +---Emily (Carl)
    |   |
    |   +---Tim
    +---Charlotte (Frank)
        |
        +---Carol
        +---Sara
Recursively print the vertical bar emanating from a parent. But don’t print the bar for a parent’s last child.

Assignment #10, cont’d
    void Person::print_bar() const
    {
        if (parent != nullptr)
        {
            // Recursively print my parent's bar.
            parent->print_bar();
            // Am I my parent's last child?
            if (this == parent->children.back()) cout << "    ";  // yes
            else                                 cout << "|   ";  // no
        }
    }

Review: Templates A template enables the C++ compiler to generate different versions of some code, each version of the code for a different type. There are function templates and class templates. The C++ compiler does not generate code from a template for a particular type unless the program uses the template with that type.
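A minimal sketch of a function template, to make the review concrete. The name largest is hypothetical and not from the course materials; the compiler generates a separate version only for each type the program actually uses it with.

```cpp
#include <string>
#include <vector>

// Hypothetical function template: returns the largest element of a
// non-empty vector. One version is generated per element type used.
template <typename T>
T largest(const std::vector<T>& values)
{
    T best = values.front();
    for (const T& v : values)
    {
        if (best < v) best = v;
    }
    return best;
}
```

Calling largest with a vector<int> and with a vector<string> would cause two versions of the function to be generated.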

Review: The Standard Template Library (STL) The Standard Template Library (STL) is a collection of function and class templates for various data structures, including: vector, stack, queue, list (doubly-linked list), map (key-value pairs kept in key order, typically a balanced binary search tree; the hash table container is unordered_map), and set. Example: vector<int> v;

Review: Iterators Iterators provide a uniform way to successively access values in a data structure. Go through the values of a data structure one after another and perform some operation on each value. Iterators spare you from having to know how a data structure is implemented. Iterators are part of the STL. An iterator is similar to a pointer.
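A short sketch of the uniform access that iterators provide. The helper name sum_all is hypothetical; the same loop works on a vector and on a list because both offer begin() and end().

```cpp
#include <list>
#include <vector>

// Hypothetical helper: sums the values of any container of ints that
// offers begin() and end(), without knowing how the values are stored.
template <typename Container>
int sum_all(const Container& c)
{
    int total = 0;
    for (auto it = c.begin(); it != c.end(); ++it)
    {
        total += *it;   // dereference the iterator like a pointer
    }
    return total;
}
```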

Containers STL container classes are data structures that hold data. Examples: lists, stacks, queues, vectors. Each container class has its own iterator. However, all the iterators have the same operators and the member functions begin and end have the same meanings. Sequential containers arrange their values such that there is a first value, a next value, etc. until the last value.

The list Template Class The STL list is a doubly linked list. Each element has two pointers. One pointer points forward to the next element (as in a singly linked list). One pointer points back to the previous element. You can traverse the list from either direction. Therefore, there are two pointers to maintain when inserting or deleting an element. The STL does this for you.

List and Vector STL list functions are similar to STL vector functions. To append to a list: lst.push_back(value). Forward and reverse iterators are available. Insert an element before the position specified by an iterator: lst.insert(pos, value). Remove the element at a position specified by an iterator: lst.erase(pos).

Additional List Functions Remove all the elements with a given value: lst.remove(value). Merge two sorted lists: lst1.merge(lst2). List lst1 is the merged list. List lst2 is empty after the merge. Make the list elements unique: lst.unique(). This removes only consecutive duplicates, so the list should be sorted first. Demo
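The three list functions can be sketched together as follows. The function name merge_demo and the sample values are made up for illustration.

```cpp
#include <list>

// Hypothetical demo: merge two sorted lists, remove a value, then
// collapse adjacent duplicates.
std::list<int> merge_demo()
{
    std::list<int> lst1{1, 3, 3, 5};
    std::list<int> lst2{2, 3, 4};

    lst1.merge(lst2);   // lst1: 1 2 3 3 3 4 5; lst2 is now empty
    lst1.remove(4);     // lst1: 1 2 3 3 3 5
    lst1.unique();      // lst1: 1 2 3 5
    return lst1;
}
```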

Hash Tables Consider an array or a vector. To access a value, you use an integer index. The array maps the index to a data value stored in the array. The mapping function is very efficient. As long as the index value is within range, there is a strict one-to-one correspondence between an index value and a stored data value. We can consider the index value to be the key to the corresponding data value.

Hash Tables, cont’d A hash table also stores data values. Use a key to obtain the corresponding data value. The key does not have to be an integer value. For example, the key could be a string. There might not be a one-to-one correspondence between keys and data values. The mapping function might not be trivial.

Hash Tables, cont’d We can implement a hash table as an array of cells. Refer to its size as TableSize. If the hash table’s mapping function maps a key value into an integer value in the range 0 to TableSize – 1, then we can use this integer value as the index into the underlying array.

Hash Tables, cont’d Suppose we’re storing employee data records into a hash table. We use an employee’s name as the key.

Hash Tables, cont’d Suppose that the name john hashes (maps) to 3, phil hashes to 4, dave hashes to 6, and mary hashes to 7. This is an ideal situation because each employee record ended up in a different table cell. (Data Structures and Algorithms in Java, 3rd ed., by Mark Allen Weiss, Pearson Education, Inc., 2012, ISBN 0-13-257627-9)

Hash Function We need an ideal hash function to map each data record into a distinct table cell. It can be very difficult to find such a hash function.

A Simple Hash Function
Suppose our keys are words and the table size is 10,007 (a prime number).
    int hash(const string& word, int table_size)
    {
        int hashVal = 0;
        for (char ch : word) hashVal += ch;
        return hashVal%table_size;
    }
Problem: This hash function does not distribute the keys well if the table is large. The maximum ASCII character value is 127. If a typical word is 8 characters long, the hash function generally has values from 0 to 1,016 (8 × 127), so most of the 10,007 cells are never used.

Another Simple Hash Function
    int hash(const string& word, int table_size)
    {
        return (word[0] + 27*word[1] + 729*word[2])%table_size;
    }
We use only the first three letters of each word, so each word must have at least three characters. 27 = 26 letters in the alphabet + space, and 729 = $27^2$. Good distribution into a table of 10,007 if the first three letters are random. But the English language is not random and many words will start with the same three letters.

A Better Hash Function
    int hash(const string& word, int table_size)
    {
        unsigned int hashVal = 0;
        for (char ch : word) hashVal = 37*hashVal + ch;
        return hashVal%table_size;
    }
Calculates a polynomial function by nested multiplication (Horner’s rule). Easy and fast to calculate. Distributes the keys well into a large table.
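To see why the polynomial hash distributes keys better than the additive one, the two can be compared on words that contain the same letters in a different order. This is a sketch: the names additive_hash and polynomial_hash are hypothetical, and the cast to unsigned char is an added precaution that the slides do not use.

```cpp
#include <string>

// Additive hash as on the slides: the order of the characters is ignored,
// so words with the same letters always collide.
int additive_hash(const std::string& word, int table_size)
{
    int hashVal = 0;
    for (char ch : word) hashVal += static_cast<unsigned char>(ch);
    return hashVal % table_size;
}

// Polynomial hash as on the slides (Horner's rule, multiplier 37):
// each character's position affects the result.
int polynomial_hash(const std::string& word, int table_size)
{
    unsigned int hashVal = 0;
    for (char ch : word) hashVal = 37*hashVal + static_cast<unsigned char>(ch);
    return hashVal % table_size;
}
```

With a table size of 10,007, additive_hash gives 195 for both "ab" and "ba", while polynomial_hash gives 3687 for "ab" and 3723 for "ba".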

Collisions The more data we put into a hash table, the more collisions occur. A collision is when two or more data records are mapped to the same table cell. How can a hash table handle collisions?

Keys for Successful Hashing Good hash function. Good collision resolution. Size of the underlying array is a prime number.

Collision Resolution Separate chaining. Open addressing: linear probing and quadratic probing.

Collision Resolution: Separate Chaining Each cell in a hash table is a pointer to a linked list of all the data records that hash to that entry. To retrieve a data record, we first hash to the cell.

Collision Resolution: Separate Chaining, cont’d Then we search the associated linked list for the data record. We can sort the linked lists to improve search performance.
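Separate chaining can be sketched as a vector of linked lists. The class name ChainedSet is hypothetical, it stores only keys rather than full data records, and it uses std::hash as the hash function instead of the hash functions shown earlier; the lists are left unsorted.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <vector>

// Hypothetical separate-chaining table of string keys.
class ChainedSet
{
public:
    explicit ChainedSet(std::size_t table_size) : cells(table_size) {}

    // Insert a key into its cell's list unless it is already there.
    void insert(const std::string& key)
    {
        std::list<std::string>& chain = cells[index_of(key)];
        for (const std::string& k : chain) if (k == key) return;
        chain.push_back(key);
    }

    // Hash to the cell, then search that cell's linked list.
    bool contains(const std::string& key) const
    {
        const std::list<std::string>& chain = cells[index_of(key)];
        for (const std::string& k : chain) if (k == key) return true;
        return false;
    }

private:
    std::vector<std::list<std::string>> cells;

    std::size_t index_of(const std::string& key) const
    {
        return std::hash<std::string>{}(key) % cells.size();
    }
};
```

With a table of 3 cells and the 4 names john, phil, dave, and mary, at least two names must share a cell, yet every lookup still succeeds.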

Collision Resolution: Open Addressing Does not use linked lists. All the data resides in the table. When a collision occurs, try a different table cell. We will consider two types of open addressing: linear probing and quadratic probing.

Collision Resolution: Linear Probing Try in succession $h_0(x), h_1(x), h_2(x), \ldots$, where $h_i(x) = (\mathrm{hash}(x) + f(i)) \bmod \mathrm{TableSize}$, with $f(0) = 0$. hash(x) produces the home cell. Function f is the collision resolution strategy. With linear probing, f is a linear function of i, typically $f(i) = i$.

Collision Resolution: Linear Probing, cont’d
Insertion: If a cell is filled, look for the next empty cell.
Search: Start searching at the home cell and keep looking at the next cell until you find the matching key. If you encounter an empty cell, there is no key match.
Deletion: Empty cells will prematurely terminate a search, so leave deleted items in the hash table but mark them as deleted.
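The three operations can be sketched as follows, assuming non-negative integer keys and the key modulo the table size as the hash function. The class name LinearProbingSet and the three-state cell are illustrative choices, not the course's implementation; reusing deleted cells on insertion is also an assumption.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical open-addressing set of non-negative int keys using linear
// probing, f(i) = i. Deleted cells are marked, not emptied.
class LinearProbingSet
{
public:
    explicit LinearProbingSet(std::size_t table_size) : cells(table_size) {}

    // Returns false only if the table is full.
    bool insert(int key)
    {
        if (contains(key)) return true;
        for (std::size_t i = 0; i < cells.size(); i++)
        {
            Cell& cell = cells[probe(key, i)];
            if (cell.state != FILLED)   // empty or deleted: use this cell
            {
                cell.key = key;
                cell.state = FILLED;
                return true;
            }
        }
        return false;
    }

    bool contains(int key) const { return find(key) >= 0; }

    // Mark the cell as deleted so that later searches pass through it.
    void remove(int key)
    {
        int index = find(key);
        if (index >= 0) cells[index].state = DELETED;
    }

private:
    enum State { EMPTY, FILLED, DELETED };
    struct Cell { int key = 0; State state = EMPTY; };
    std::vector<Cell> cells;

    std::size_t probe(int key, std::size_t i) const
    {
        return (static_cast<std::size_t>(key) + i) % cells.size();
    }

    // Returns the index of the key, or -1. Only a truly empty cell
    // ends the search early.
    int find(int key) const
    {
        for (std::size_t i = 0; i < cells.size(); i++)
        {
            std::size_t index = probe(key, i);
            const Cell& cell = cells[index];
            if (cell.state == EMPTY) return -1;
            if (cell.state == FILLED && cell.key == key)
            {
                return static_cast<int>(index);
            }
        }
        return -1;
    }
};
```

If 89 and 49 are inserted into a table of size 10 and 89 is then removed, a search for 49 still succeeds because cell 9 is marked deleted rather than empty.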

Collision Resolution: Linear Probing, cont’d Suppose TableSize is 10, the keys are integer values, and the hash function is the key value modulo 10. We want to insert keys 89, 18, 49, 58, and 69.
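The insertions can be traced with a short sketch. The function name linear_probe_table and the use of -1 to mark an empty cell are assumptions made for illustration.

```cpp
#include <vector>

// Hypothetical helper: inserts non-negative keys into a table of the given
// size using linear probing and returns the table. -1 marks an empty cell.
std::vector<int> linear_probe_table(const std::vector<int>& keys, int table_size)
{
    std::vector<int> table(table_size, -1);
    for (int key : keys)
    {
        for (int i = 0; i < table_size; i++)
        {
            int cell = (key % table_size + i) % table_size;   // f(i) = i
            if (table[cell] == -1)
            {
                table[cell] = key;
                break;
            }
        }
    }
    return table;
}
```

For the keys 89, 18, 49, 58, and 69 in a table of size 10, 89 goes to cell 9 and 18 to cell 8. Then 49 wraps around to cell 0, 58 ends up in cell 1, and 69 in cell 2, which shows how the colliding keys pile up next to each other.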

Collision Resolution: Quadratic Probing Linear probing causes primary clustering. Try quadratic probing instead: $f(i) = i^2$. 49 collides with 89: the next empty cell is 1 away. 58 collides with 18: the next cell is filled. Try $2^2 = 4$ cells away from the home cell. Same for 69.
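The same insertions with quadratic probing can be traced with a short sketch. The function name quadratic_probe_table and the use of -1 to mark an empty cell are assumptions made for illustration.

```cpp
#include <vector>

// Hypothetical helper: inserts non-negative keys into a table of the given
// size using quadratic probing and returns the table. -1 marks an empty cell.
std::vector<int> quadratic_probe_table(const std::vector<int>& keys, int table_size)
{
    std::vector<int> table(table_size, -1);
    for (int key : keys)
    {
        for (int i = 0; i < table_size; i++)
        {
            int cell = (key % table_size + i*i) % table_size;   // f(i) = i^2
            if (table[cell] == -1)
            {
                table[cell] = key;
                break;
            }
        }
    }
    return table;
}
```

For the keys 89, 18, 49, 58, and 69 in a table of size 10, 49 lands in cell 0, 58 in cell 2 (8 + 4 = 12, modulo 10), and 69 in cell 3 (9 + 4 = 13, modulo 10).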

Collision Resolution: Quadratic Probing, cont’d
$f(i) = i^2$ is easy to compute for i = 0, 1, 2, ... because each square is the previous square plus the next odd number:
$1 = 1^2$
$1 + 3 = 4 = 2^2$
$1 + 3 + 5 = 9 = 3^2$
$1 + 3 + 5 + 7 = 16 = 4^2$
…
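The odd-number sums mean the probe offsets can be computed with additions only. A sketch, with the hypothetical helper name quadratic_offsets:

```cpp
#include <vector>

// Hypothetical helper: returns the first `count` quadratic probe offsets
// 0, 1, 4, 9, ... computed by adding successive odd numbers.
std::vector<int> quadratic_offsets(int count)
{
    std::vector<int> offsets;
    int offset = 0;
    int odd = 1;
    for (int i = 0; i < count; i++)
    {
        offsets.push_back(offset);
        offset += odd;   // i^2 + (2i + 1) = (i + 1)^2
        odd += 2;
    }
    return offsets;
}
```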

Load Factor The load factor λ of a hash table is the ratio of the number of elements in the table to the table size. λ is much more important than table size. For probing collision resolution strategies, it is important to keep λ under 0.5. Don’t let the table become more than half full. If quadratic probing is used and the table size is a prime number, then a new element can always be inserted if the table is at most half full.
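The half-full rule can be expressed as a small check. The function names load_factor and should_grow are hypothetical, and growing the table when λ exceeds 0.5 is one possible policy, not something the slides prescribe.

```cpp
#include <cstddef>

// Hypothetical helper: the load factor is the number of elements
// divided by the table size.
double load_factor(std::size_t element_count, std::size_t table_size)
{
    return static_cast<double>(element_count) / table_size;
}

// Hypothetical policy: report when a probing table is more than half full.
bool should_grow(std::size_t element_count, std::size_t table_size)
{
    return load_factor(element_count, table_size) > 0.5;
}
```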

Assignment #11. STL Vector, List, and Map Time and compare the performance of the following operations on an STL vector, an STL list, and an STL map: inserting elements, searching for elements, accessing the i-th element, and deleting elements. Use std::chrono::steady_clock to calculate elapsed time.
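A minimal sketch of timing one operation with std::chrono::steady_clock. The function name time_vector_inserts and the choice of microseconds are assumptions; the assignment may require a different unit or structure.

```cpp
#include <chrono>
#include <vector>

// Hypothetical helper: returns the elapsed time in microseconds for
// appending n integers to a vector.
long long time_vector_inserts(int n)
{
    using namespace std::chrono;

    steady_clock::time_point start = steady_clock::now();

    std::vector<int> v;
    for (int i = 0; i < n; i++) v.push_back(i);

    steady_clock::time_point end = steady_clock::now();
    return duration_cast<microseconds>(end - start).count();
}
```

The same start/end pattern can wrap the search, access, and delete operations for each of the three containers.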

Assignment #11, cont’d Read a file containing the text of the U.S. Constitution and its amendments. http://www.cs.sjsu.edu/~mak/CS144/assignments/11/USConstitution.txt Build concordance tables implemented as a vector, a linked list, and a map. Each table contains in alphabetical order all the unique words of the Constitution. Associated with each word is the number of times that word appears. To make string comparisons case-insensitive, store each word in all lower case.
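The map version of the concordance can be sketched as follows. The function name build_concordance is hypothetical, and treating every run of letters as a word is an assumption; the assignment may define words differently. An std::map keeps its keys in sorted order, so the words come out alphabetically.

```cpp
#include <cctype>
#include <map>
#include <string>

// Hypothetical helper: counts the lower-cased words of a text.
// A word is assumed to be a maximal run of letters.
std::map<std::string, int> build_concordance(const std::string& text)
{
    std::map<std::string, int> counts;
    std::string word;
    for (char ch : text)
    {
        unsigned char uch = static_cast<unsigned char>(ch);
        if (std::isalpha(uch))
        {
            word += static_cast<char>(std::tolower(uch));
        }
        else if (!word.empty())
        {
            counts[word]++;   // operator[] creates a zero count if needed
            word.clear();
        }
    }
    if (!word.empty()) counts[word]++;
    return counts;
}
```

For the text "We the People, the people", the concordance holds people with count 2, the with count 2, and we with count 1, in that order.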

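Assignment #11, cont’d Because the vector is sorted and random access is possible, use a binary search. The STL map keeps its keys in sorted order, typically in a balanced binary search tree, so a search takes O(log n) comparisons; the STL container that uses a hash function is unordered_map. What about the linked list? Which data structure is best for insertions and deletions? For searching?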
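A binary search on the sorted vector can be sketched with std::lower_bound. The function name find_word is hypothetical, and the sketch stores only the words, not their counts.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: binary search in a sorted vector of words.
// Returns the index of the word, or -1 if it is absent.
int find_word(const std::vector<std::string>& sorted_words, const std::string& word)
{
    auto it = std::lower_bound(sorted_words.begin(), sorted_words.end(), word);
    if (it != sorted_words.end() && *it == word)
    {
        return static_cast<int>(it - sorted_words.begin());
    }
    return -1;
}
```

When the word is absent, the iterator returned by std::lower_bound is also the position at which to insert it to keep the vector sorted.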