CPS 100 7.1 Searching, Maps, Tables (hashing)

Presentation transcript:

Searching, Maps, Tables (hashing)
* Searching is a fundamentally important operation
  - We want to search quickly, very very quickly
  - Consider searching using google.com, ACES, issues?
  - In general we want to search in a collection for a key
      Recall search in readsettree.cpp, readsetlist2.cpp
  - Tree implementation was quick
  - Vector of linked lists was fast, but how to make it faster?
* If we compare keys, we cannot do better than log n comparisons to search n elements
  - Lower bound is Ω(log n), provable
  - Hashing is O(1) on average, not a contradiction, why?

From Google to Maps
* If we wanted to write a search engine we'd need to access lots of pages and keep lots of data
  - Given a word, on what pages does it appear?
  - This is a map of words->web pages
* In general a map associates a key with a value
  - Look up the key in the map, get the value
  - Google: key is word/words, value is list of web pages
  - Anagram: key is string, value is words that are anagrams
* Interface issues
  - Look up a key, return boolean: in map, or value: associated with the key (what if key not in map?)
  - Insert a key/value pair into the map

Interface at work: tmapcounter.cpp
* Key is a string, Value is # occurrences
  - Interface in code below shows how tmap class works

    while (input >> word) {
        if (map->contains(word)) {
            map->get(word) += 1;
        } else {
            map->insert(word, 1);
        }
    }

* What clues are there for prototype of map.get and map.contains?
  - Reference is returned by get, not a copy, why?
  - Parameters to contains, get, insert are same type, what?
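To make the questions about `get` and `contains` concrete, here is a minimal sketch of a string-to-int map. `SimpleCounterMap` and `countWord` are hypothetical names invented for illustration; this is not the Tapestry `tmap` code, and it uses a plain vector with linear search just to keep the example short.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a string -> int map (not the Tapestry tmap).
class SimpleCounterMap {
public:
    // True if key is stored in the map.
    bool contains(const std::string& key) const {
        return find(key) != -1;
    }
    // Returns a reference so callers can update the stored value in place.
    // Assumed precondition: contains(key) is true.
    int& get(const std::string& key) {
        return entries_[find(key)].second;
    }
    // Adds a new key/value pair; assumes key is not already present.
    void insert(const std::string& key, int value) {
        entries_.push_back(std::make_pair(key, value));
    }
private:
    // Linear search: index of key, or -1 if absent.
    int find(const std::string& key) const {
        for (std::size_t i = 0; i < entries_.size(); ++i) {
            if (entries_[i].first == key) return static_cast<int>(i);
        }
        return -1;
    }
    std::vector<std::pair<std::string, int> > entries_;
};

// Counts one word using the same contains/get/insert pattern as the slide.
void countWord(SimpleCounterMap& map, const std::string& word) {
    if (map.contains(word)) {
        map.get(word) += 1;   // compiles and updates only because get returns int&
    } else {
        map.insert(word, 1);
    }
}
```

If `get` returned `int` by value, `map.get(word) += 1` would not compile for a built-in type, which is one clue to the prototype.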

Accessing values in a map (e.g., print)
* We can apply a function object to every element in a map, this is called an internal iterator
  - Simple to implement (why?), relatively easy to use
      See Printer class in tmapcounter.cpp
  - Limited: must visit every map element (can't stop early)
* Alternative: use Iterator subclass (see tmapcounter.cpp), this is called an external iterator
  - Iterator has access to "guts" of a map, iterates over it
      Must be a friend-class to access guts
      Tightly coupled: container and iterator
  - Standard interface of Init, HasMore, Next, Current
  - Can have several iterators at once, can stop early, can pass iterators around as parameters/objects

Internal iterator (applyAll/applyOne)
* Applicant subclass: applied to key/value pairs stored in a map
  - The applicant has an applyOne function, called from the map/collection, in turn, with each key/value pair
  - The map/collection has an applyAll function to which is passed an instance of a subclass of Applicant

    class Printer : public Applicant {
    public:
        virtual void applyOne(string& key, int& value) {
            cout << value << "\t" << key << endl;
        }
    };

* Applicant class is templated on the type of key and value
  - See tmap.h, tmapcounter.cpp, and other examples
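The applyAll/applyOne arrangement can be sketched in standard C++ as follows. `ApplyMap` is a hypothetical container built on `std::map`, and this `Applicant` template only imitates the shape described in the slide; the real declarations are in tmap.h and may differ. The printer here writes to any `ostream` (an assumption made so the output can be captured).

```cpp
#include <iostream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical base class modeled on the slide's Applicant.
template <class Key, class Value>
class Applicant {
public:
    virtual ~Applicant() {}
    virtual void applyOne(Key& key, Value& value) = 0;
};

// Hypothetical container; std::map does the storage.
template <class Key, class Value>
class ApplyMap {
public:
    void insert(const Key& key, const Value& value) { data_[key] = value; }

    // Internal iteration: the container drives the loop and calls
    // applyOne once per key/value pair, in key order.
    void applyAll(Applicant<Key, Value>& app) {
        typename std::map<Key, Value>::iterator it;
        for (it = data_.begin(); it != data_.end(); ++it) {
            Key key = it->first;            // copy, because map keys are const
            app.applyOne(key, it->second);
        }
    }
private:
    std::map<Key, Value> data_;
};

// Same output format as the slide's Printer.
class Printer : public Applicant<std::string, int> {
public:
    explicit Printer(std::ostream& out) : out_(out) {}
    virtual void applyOne(std::string& key, int& value) {
        out_ << value << "\t" << key << std::endl;
    }
private:
    std::ostream& out_;
};
```

The caller never writes a loop, which is why this style is simple to use but cannot stop early.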

From interface to implementation
* First the name: STL uses map, Java uses map, we'll use map
  - Other books/courses use table, dictionary, symbol table
  - We've seen part of the map interface in tmapcounter.cpp
      What other functions might be useful?
      What's actually stored internally in a map?
* The class tmap is a templated, abstract base class
  - Advantage of templated class (e.g., tvector, tstack, tqueue)
  - Base class permits different implementations
      UVmap, BSTMap, HMap (stores just string->value)
  - Internally combine key/value into a pair
      pair is part of STL, standard template library
      Struct with two fields: first and second

External Iterator
* The Iterator base class is templated on pair, makes for ugly declaration of iterator pointer
  - (note: space between > > in code below is required, why?)

    Iterator<pair<string,int> > * it = map->makeIterator();
    for (it->Init(); it->HasMore(); it->Next()) {
        cout << it->Current().second << "\t";
        cout << it->Current().first << endl;
    }

* We ask a map/container to provide us with an iterator
  - We don't know how the map is implemented, just want an iterator
  - Map object is an iterator factory: makes/creates iterator
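A rough sketch of how a container might hand out an external iterator with the Init/HasMore/Next/Current interface. `VecMap`, `VecMapIterator`, and the `Entry` typedef are hypothetical names; the storage is a plain vector of pairs, which is an assumption and not how the Tapestry classes are implemented.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, int> Entry;

// Abstract iterator with the four operations named in the slides.
template <class T>
class Iterator {
public:
    virtual ~Iterator() {}
    virtual void Init() = 0;
    virtual bool HasMore() const = 0;
    virtual void Next() = 0;
    virtual T& Current() = 0;
};

class VecMap;  // defined below

// Concrete iterator: remembers a position inside one VecMap.
class VecMapIterator : public Iterator<Entry> {
public:
    explicit VecMapIterator(VecMap& map) : map_(map), index_(0) {}
    virtual void Init() { index_ = 0; }
    virtual bool HasMore() const;
    virtual void Next() { ++index_; }
    virtual Entry& Current();
private:
    VecMap& map_;
    std::size_t index_;
};

class VecMap {
public:
    void insert(const std::string& key, int value) {
        entries_.push_back(Entry(key, value));
    }
    // Factory: the caller owns the returned iterator and must delete it.
    Iterator<Entry>* makeIterator() { return new VecMapIterator(*this); }
private:
    friend class VecMapIterator;   // the iterator needs access to entries_
    std::vector<Entry> entries_;
};

inline bool VecMapIterator::HasMore() const {
    return index_ < map_.entries_.size();
}
inline Entry& VecMapIterator::Current() {
    return map_.entries_[index_];
}
```

Client code sees only `Iterator<Entry>*`, so the storage could change without touching the loop. The `friend` declaration is the tight coupling mentioned earlier.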

Tapestry tmap v STL map
* See comparable code in tmapcounterstl.cpp
  - Instead of get, use overloaded [] operator
  - Instead of contains use count, which returns an int
* Instead of Iterator class with Init, HasMore, …
  - Use begin() and end() for starting and ending values
  - Use ++ to increment iterator [compare with Next()]
  - Instead of Current(), dereference the iterator
* STL map uses a balanced search tree, guaranteed O(log n)
  - Nonstandard hash_map is tricky to use in general
  - We'll see one way to do balanced trees later
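A short sketch of word counting with `std::map`, using `operator[]`, `count`, and the `begin()`/`end()` iterator pair. The function names `countWords` and `printCounts` are made up here; this is not the contents of tmapcounterstl.cpp.

```cpp
#include <iostream>
#include <map>
#include <sstream>
#include <string>

// Counts whitespace-separated words read from input.
std::map<std::string, int> countWords(std::istream& input) {
    std::map<std::string, int> counts;
    std::string word;
    while (input >> word) {
        counts[word] += 1;   // operator[] inserts 0 the first time a key is seen
    }
    return counts;
}

// Prints "count<TAB>word" for each entry, in sorted key order.
void printCounts(const std::map<std::string, int>& counts, std::ostream& out) {
    std::map<std::string, int>::const_iterator it;
    for (it = counts.begin(); it != counts.end(); ++it) {
        // Dereferencing the iterator yields a pair: first is key, second is value.
        out << it->second << "\t" << it->first << std::endl;
    }
}
```

Because `operator[]` creates a missing entry, a membership test that must not modify the map should use `count` (or `find`) instead.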

Map example: finding anagrams
* mapanagram.cpp, alternative program for finding anagrams
  - Maps string (normalized): key to tvector<string>: value
  - Look up normalized string, associate all "equal" strings with normalized form
  - To print, loop over all keys, grab vector, print if ???
* Each value in the map is list/collection of anagrams
  - How do we look up this value?
  - How do we create initial list to store (first time)?
  - We actually store pointer to vector rather than vector
      Avoid map->get()[k], can't copy vector returned by get
* See also mapanastl.cpp for standard C++ using STL
  - The STL code is very similar to tapestry (and to Java!)
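The anagram grouping idea can be sketched with `std::map` and `std::vector`. This assumes the normalized form is the word's letters in sorted order, a common choice that the slide does not spell out; `normalize`, `AnagramMap`, and `groupAnagrams` are illustrative names, not taken from mapanastl.cpp.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Assumed normalization: the letters of the word in sorted order,
// so every anagram of a word maps to the same key.
std::string normalize(std::string word) {
    std::sort(word.begin(), word.end());
    return word;
}

typedef std::map<std::string, std::vector<std::string> > AnagramMap;

// Groups words by normalized form.
AnagramMap groupAnagrams(const std::vector<std::string>& words) {
    AnagramMap groups;
    for (std::size_t i = 0; i < words.size(); ++i) {
        // operator[] creates an empty vector the first time a key is seen,
        // which handles the "create initial list" step.
        groups[normalize(words[i])].push_back(words[i]);
    }
    return groups;
}
```

Since `operator[]` returns a reference to the stored vector, no pointer indirection is needed in this version.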

Hashing: Log ( ) is a big number
* Comparison based searches are too slow for lots of data
  - How many comparisons needed for a billion elements?
  - What if one billion web-pages indexed?
* Hashing is a search method that has average case O(1) search
  - Worst case is very bad, but in practice hashing is good
  - Associate a number with every key, use the number to store the key
      Like catalog in library, given book title, find the book
* A hash function generates the number from the key
  - Goal: Efficient to calculate
  - Goal: Distributes keys evenly in hash table
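For a sense of scale on the comparison question, the helper below (a hypothetical name) counts how many times n can be halved before reaching zero, which is the worst-case number of elements a binary search over n sorted items examines.

```cpp
// Worst-case number of probes for binary search over n sorted elements:
// floor(log2 n) + 1 for n >= 1, computed by repeated halving.
int maxBinarySearchProbes(unsigned long long n) {
    int probes = 0;
    while (n > 0) {
        n /= 2;
        ++probes;
    }
    return probes;
}
```

For one billion elements this gives 30, since 2^29 < 1,000,000,000 < 2^30. That is small, but a hash lookup with a good hash function aims for a constant number of probes regardless of n.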

Hashing details
* There will be collisions, two keys will hash to the same value
  - We must handle collisions, still have efficient search
  - What about birthday "paradox": using birthday as hash function, will there be collisions in a room of 25 people?
* Several ways to handle collisions, in general array/vector used
  - Linear probing, look in next spot if not found
      Hash to index h, try h+1, h+2, …, wrap at end
      Clustering problems, deletion problems, growing problems
  - Quadratic probing
      Hash to index h, try h+1², h+2², h+3², …, wrap at end
      Fewer clustering problems
  - Double hashing
      Hash to index h, with another hash function to j
      Try h, h+j, h+2j, … n-1
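The three probing schemes differ only in the offset added on the i-th attempt. The sketch below lists the first few slots each scheme would examine; the function names are hypothetical, and all arithmetic wraps with mod `tableSize` (assumed to be greater than zero).

```cpp
#include <cstddef>
#include <vector>

// Linear probing: h, h+1, h+2, ... (mod tableSize).
std::vector<std::size_t> linearProbes(std::size_t h, std::size_t tableSize,
                                      std::size_t count) {
    std::vector<std::size_t> slots;
    for (std::size_t i = 0; i < count; ++i) {
        slots.push_back((h + i) % tableSize);
    }
    return slots;
}

// Quadratic probing: h, h+1^2, h+2^2, h+3^2, ... (mod tableSize).
std::vector<std::size_t> quadraticProbes(std::size_t h, std::size_t tableSize,
                                         std::size_t count) {
    std::vector<std::size_t> slots;
    for (std::size_t i = 0; i < count; ++i) {
        slots.push_back((h + i * i) % tableSize);
    }
    return slots;
}

// Double hashing: h, h+j, h+2j, ... (mod tableSize), where j comes
// from a second hash function and must be nonzero.
std::vector<std::size_t> doubleHashProbes(std::size_t h, std::size_t j,
                                          std::size_t tableSize,
                                          std::size_t count) {
    std::vector<std::size_t> slots;
    for (std::size_t i = 0; i < count; ++i) {
        slots.push_back((h + i * j) % tableSize);
    }
    return slots;
}
```

With a table of 11 slots and h = 7, linear probing visits 7, 8, 9, 10, 0; quadratic probing visits 7, 8, 0, 5, 1; double hashing with j = 3 visits 7, 10, 2, 5, 8.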

Chaining with hashing
* With n buckets each bucket stores linked list
  - Compute hash value h, look up key in linked list table[h]
  - Hopefully linked lists are short, searching is fast
  - Unsuccessful searches often faster than successful
      Empty linked lists searched more quickly than non-empty
  - Potential problems?
* Hash table details
  - Size of hash table should be a prime number
  - Keep load factor small: number of keys/size of table
  - On average, with reasonable load factor, search is O(1)
  - What if load factor gets too high? Rehash or other method
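A minimal sketch of chaining, assuming string keys and int values. `ChainedTable` is a hypothetical class, and the multiply-by-31 string hash inside it is just one common choice, not something taken from the course code. The bucket count is assumed to be greater than zero and is fixed (no rehashing).

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical chained hash table mapping string -> int.
class ChainedTable {
    typedef std::list<std::pair<std::string, int> > Bucket;
public:
    explicit ChainedTable(std::size_t buckets) : table_(buckets), size_(0) {}

    // Inserts key, or overwrites its value if already present.
    void put(const std::string& key, int value) {
        Bucket& bucket = table_[indexOf(key)];
        for (Bucket::iterator it = bucket.begin(); it != bucket.end(); ++it) {
            if (it->first == key) { it->second = value; return; }
        }
        bucket.push_back(std::make_pair(key, value));
        ++size_;
    }

    // Returns a pointer to the stored value, or 0 if key is absent.
    const int* find(const std::string& key) const {
        const Bucket& bucket = table_[indexOf(key)];
        for (Bucket::const_iterator it = bucket.begin(); it != bucket.end(); ++it) {
            if (it->first == key) return &it->second;
        }
        return 0;
    }

    // Load factor: number of keys divided by number of buckets.
    double loadFactor() const {
        return static_cast<double>(size_) / table_.size();
    }
private:
    // Assumed hash: polynomial in the characters, reduced mod table size.
    std::size_t indexOf(const std::string& key) const {
        std::size_t total = 0;
        for (std::size_t i = 0; i < key.length(); ++i) {
            total = total * 31 + static_cast<unsigned char>(key[i]);
        }
        return total % table_.size();
    }
    std::vector<Bucket> table_;
    std::size_t size_;
};
```

A fuller version would watch `loadFactor()` and rehash into a larger table once it passes some threshold.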

Hashing problems
* Linear probing, hash(x) = x (mod tablesize)
  - Insert 24, 12, 45, 14, delete 24, insert 23 (where?)
* Same numbers, use quadratic probing (clustering better?)
* What about chaining, what happens?
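One way to explore the linear probing exercise is to simulate it. The transcript does not give the table size, so the usage note below assumes 11 slots. `LinearProbeTable` is a hypothetical class that marks deleted slots with a tombstone state, which is one standard answer to the deletion problem, and it assumes distinct, non-negative keys.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical open-addressing table, hash(x) = x mod tableSize.
class LinearProbeTable {
    enum State { EMPTY, FULL, DELETED };
public:
    explicit LinearProbeTable(std::size_t tableSize)
        : keys_(tableSize, 0), state_(tableSize, EMPTY) {}

    // Stores x in the first slot that is not FULL along the probe sequence.
    // Returns that slot, or -1 if the table is full.
    int insert(int x) {
        std::size_t n = keys_.size();
        for (std::size_t i = 0; i < n; ++i) {
            std::size_t slot = (static_cast<std::size_t>(x) + i) % n;
            if (state_[slot] != FULL) {      // empty or deleted: reuse it
                keys_[slot] = x;
                state_[slot] = FULL;
                return static_cast<int>(slot);
            }
        }
        return -1;
    }

    // Returns the slot holding x, or -1 if x is absent.
    int find(int x) const {
        std::size_t n = keys_.size();
        for (std::size_t i = 0; i < n; ++i) {
            std::size_t slot = (static_cast<std::size_t>(x) + i) % n;
            if (state_[slot] == EMPTY) return -1;   // never-used slot ends the search
            if (state_[slot] == FULL && keys_[slot] == x) {
                return static_cast<int>(slot);
            }
            // DELETED slots are skipped so keys stored past them stay reachable.
        }
        return -1;
    }

    // Marks x's slot as DELETED (a tombstone). Returns false if x is absent.
    bool remove(int x) {
        int slot = find(x);
        if (slot < 0) return false;
        state_[slot] = DELETED;
        return true;
    }
private:
    std::vector<int> keys_;
    std::vector<State> state_;
};
```

With 11 slots, 24 lands in slot 2, 12 in slot 1, 45 collides at slot 1 and ends up in slot 3, and 14 collides at slot 3 and ends up in slot 4. After deleting 24, inserting 23 hashes to slot 1, finds it full, and reuses the tombstoned slot 2. Without the tombstone, a search for 45 would stop at the emptied slot 2 and wrongly report it missing.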

What about hash functions
* Hashing often done on strings, consider two alternatives

    unsigned hash(const string& s) {
        unsigned int k, total = 0;
        for (k = 0; k < s.length(); k++) {
            total += s[k];
        }
        return total;
    }

  - Consider total += (k+1)*s[k], why might this be better?
  - Other functions used, always mod result by table size
* What about hashing other objects?
  - Need conversion of key to index, not always simple
  - HMap (subclass of tmap) maps string->values
  - Why not any key type (only strings)?
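The two alternatives can be compared side by side. The names `hashSum`, `hashWeighted`, and `bucketIndex` are made up for this sketch; the bodies follow the slide's two formulas, with characters cast to `unsigned char` to avoid negative values on platforms where `char` is signed.

```cpp
#include <string>

// First alternative: plain sum of character codes.
unsigned int hashSum(const std::string& s) {
    unsigned int total = 0;
    for (unsigned int k = 0; k < s.length(); k++) {
        total += static_cast<unsigned char>(s[k]);
    }
    return total;
}

// Second alternative: each character is weighted by its position (k+1),
// so reordering the letters usually changes the result.
unsigned int hashWeighted(const std::string& s) {
    unsigned int total = 0;
    for (unsigned int k = 0; k < s.length(); k++) {
        total += (k + 1) * static_cast<unsigned char>(s[k]);
    }
    return total;
}

// Reduce a hash value to a table index; tableSize must be nonzero.
unsigned int bucketIndex(unsigned int hashValue, unsigned int tableSize) {
    return hashValue % tableSize;
}
```

In ASCII, "stop" and "pots" both sum to 454, so the plain sum sends every anagram to the same bucket. The weighted version gives 1128 for "stop" and 1142 for "pots".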

Why use inheritance?
* We want to program to an interface (an abstraction, a concept)
  - The interface may be concretely implemented in different ways, consider stream hierarchy

    void readStuff(istream& input) {…}

    // call function
    ifstream input("data.txt");
    readStuff(input);
    readStuff(cin);

  - What about new kinds of streams, ok to use?
* Open/closed principle of code development
  - Code should be open to extension, closed to modification
  - Why is this (usually) a good idea?
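As a small illustration of the stream hierarchy point, the function below is written once against `std::istream` and works unchanged with a string-backed stream. The body of `readStuff` is elided in the slide, so counting words here is purely an assumption made to have something checkable.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Written against the istream interface only; the word-counting body
// is an assumed stand-in for whatever the slide's readStuff does.
int readStuff(std::istream& input) {
    int words = 0;
    std::string word;
    while (input >> word) {
        ++words;
    }
    return words;
}
```

Passing a `std::istringstream`, a `std::ifstream`, or `std::cin` all works because each is an `istream`; supporting a new kind of stream extends the program without modifying `readStuff`.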

Nancy Leveson: Software Safety
Founded the field
* Mathematical and engineering aspects
  - Air traffic control
  - Microsoft Word
  "C++ is not state-of-the-art, it's only state-of-the-practice, which in recent years has been going backwards"
* Software and steam engines: once extremely dangerous?
* Therac-25: radiation therapy machine whose software faults gave patients massive overdoses, several of them fatal