CPS 100 5.1 Searching, Maps, Tables l Searching is a fundamentally important operation ä We want to do these operations quickly ä Consider searching using.

Slides:



Advertisements
Similar presentations
Hash Tables.
Advertisements

Hashing.
An Introduction to Hashing. By: Sara Kennedy Presented: November 1, 2002.
Quick Review of Apr 10 material B+-Tree File Organization –similar to B+-tree index –leaf nodes store records, not pointers to records stored in an original.
Data Structures Using C++ 2E
Hashing as a Dictionary Implementation
1 Chapter 9 Maps and Dictionaries. 2 A basic problem We have to store some records and perform the following: add new record add new record delete record.
Sets and Maps Chapter 9. Chapter 9: Sets and Maps2 Chapter Objectives To understand the Java Map and Set interfaces and how to use them To learn about.
1 CSE 326: Data Structures Hash Tables Autumn 2007 Lecture 14.
Hashing (Ch. 14) Goal: to implement a symbol table or dictionary (insert, delete, search)  What if you don’t need ordered keys--pred, succ, sort, select?
Tirgul 8 Universal Hashing Remarks on Programming Exercise 1 Solution to question 2 in theoretical homework 2.
Hashing COMP171 Fall Hashing 2 Hash table * Support the following operations n Find n Insert n Delete. (deletions may be unnecessary in some applications)
Introduction to Hashing CS 311 Winter, Dictionary Structure A dictionary structure has the form: (Key, Data) Dictionary structures are organized.
Tirgul 7. Find an efficient implementation of a dynamic collection of elements with unique keys Supported Operations: Insert, Search and Delete. The keys.
Introducing Hashing Chapter 21 Copyright ©2012 by Pearson Education, Inc. All rights reserved.
COSC 2007 Data Structures II
Standard Template Library C++ introduced both object-oriented ideas, as well as templates to C Templates are ways to write general code around objects.
(c) University of Washingtonhashing-1 CSC 143 Java Hashing Set Implementation via Hashing.
Hashing CSE 331 Section 2 James Daly. Reminders Homework 3 is out Due Thursday in class Spring Break is next week Homework 4 is out Due after Spring Break.
Hashtables David Kauchak cs302 Spring Administrative Talk today at lunch Midterm must take it by Friday at 6pm No assignment over the break.
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture8.
DATA STRUCTURES AND ALGORITHMS Lecture Notes 7 Prepared by İnanç TAHRALI.
CHAPTER 09 Compiled by: Dr. Mohammad Omar Alhawarat Sorting & Searching.
Data structures and algorithms in the collection framework 1 Part 2.
Big Java Chapter 16.
Hashing Chapter 20. Hash Table A hash table is a data structure that allows fast find, insert, and delete operations (most of the time). The simplest.
1 HashTable. 2 Dictionary A collection of data that is accessed by “key” values –The keys may be ordered or unordered –Multiple key values may/may-not.
CompSci 100e 6.1 Hashing: Log ( ) is a big number l Comparison based searches are too slow for lots of data  How many comparisons needed for a.
Software Design 1.1 Tapestry classes -> STL l What’s the difference between tvector and vector  Safety and the kitchen sink What happens with t[21] on.
Sets, Maps and Hash Tables. RHS – SOC 2 Sets We have learned that different data struc- tures have different advantages – and drawbacks Choosing the proper.
CPS Searching, Maps,Tries (hashing) l Searching is a fundamentally important operation  We want to search quickly, very very quickly  Consider.
Can’t provide fast insertion/removal and fast lookup at the same time Vectors, Linked Lists, Stack, Queues, Deques 4 Data Structures - CSCI 102 Copyright.
Hashing as a Dictionary Implementation Chapter 19.
CSE373: Data Structures & Algorithms Lecture 17: Hash Collisions Kevin Quinn Fall 2015.
1 Introduction to Hashing - Hash Functions Sections 5.1, 5.2, and 5.6.
Hashing - 2 Designing Hash Tables Sections 5.3, 5.4, 5.4, 5.6.
WEEK 1 Hashing CE222 Dr. Senem Kumova Metin
1 Hashing - Introduction Dictionary = a dynamic set that supports the operations INSERT, DELETE, SEARCH Dictionary = a dynamic set that supports the operations.
Hash Tables CSIT 402 Data Structures II. Hashing Goal Perform inserts, deletes, and finds in constant average time Topics Hash table, hash function, collisions.
Chapter 11 Hash Tables © John Urrutia 2014, All Rights Reserved1.
Hashing Basis Ideas A data structure that allows insertion, deletion and search in O(1) in average. A data structure that allows insertion, deletion and.
Hashing Chapter 7 Section 3. What is hashing? Hashing is using a 1-D array to implement a dictionary o This implementation is called a "hash table" Items.
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
Tirgul 11 Notes Hash tables –reminder –examples –some new material.
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
Hashing 1 Hashing. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
Hash Tables © Rick Mercer.  Outline  Discuss what a hash method does  translates a string key into an integer  Discuss a few strategies for implementing.
Chapter 13 C Advanced Implementations of Tables – Hash Tables.
1 Hashing by Adlane Habed School of Computer Science University of Windsor May 6, 2005.
Hashtables David Kauchak cs302 Spring Administrative Midterm must take it by Friday at 6pm No assignment over the break.
Hashing. Search Given: Distinct keys k 1, k 2, …, k n and collection T of n records of the form (k 1, I 1 ), (k 2, I 2 ), …, (k n, I n ) where I j is.
Sets and Maps Chapter 9. Chapter Objectives  To understand the Java Map and Set interfaces and how to use them  To learn about hash coding and its use.
CPS Searching, Maps, Tables (hashing) l Searching is a fundamentally important operation  We want to search quickly, very very quickly  Consider.
CSC 143T 1 CSC 143 Highlights of Tables and Hashing [Chapter 11 p (Tables)] [Chapter 12 p (Hashing)]
1 Designing Hash Tables Sections 5.3, 5.4, 5.5, 5.6.
Duke CPS Faster and faster and … search l Binary search trees ä average case insert/search/delete = O( ) ä worst case = O( ) l balanced search.
CSC 212 – Data Structures Lecture 28: More Hash and Dictionaries.
Building Java Programs Generics, hashing reading: 18.1.
CPS Searching, Maps, Tables (hashing) l Searching is a fundamentally important operation  We want to search quickly, very very quickly  Consider.
DS.H.1 Hashing Chapter 5 Overview The General Idea Hash Functions Separate Chaining Open Addressing Rehashing Extendible Hashing Application Example: Geometric.
Fundamental Structures of Computer Science II
Searching, Maps,Tries (hashing)
Searching, Maps,Tries (hashing)
Hash table CSC317 We have elements with key and satellite data
Efficiency add remove find unsorted array O(1) O(n) sorted array
CSE373: Data Structures & Algorithms Lecture 14: Hash Collisions
CSE373: Data Structures & Algorithms Lecture 14: Hash Collisions
CS202 - Fundamental Structures of Computer Science II
slides created by Marty Stepp
Collision Handling Collisions occur when different elements are mapped to the same cell.
Presentation transcript:

CPS Searching, Maps, Tables l Searching is a fundamentally important operation ä We want to do these operations quickly ä Consider searching using google.com, altavista.com, etc., ä In general we want to search in a collection for a key We’ve seen searching in context of the MultiSet class ä Tree implementation was quick ä Table implementation wasn’t bad, how to make it better? l If we compare keys, we cannot do better than log n to search n elements  Lower bound is  (log n), provable ä Hashing is O(1) on average, not a contradiction, why?

CPS From Google to Maps l If we wanted to write a search engine we’d need to access lots of pages and keep lots of data ä Given a word, on what pages does it appear? ä This is a map of words->web pages l In general a map associates a key with a value ä Look up the key in the map, get the value ä Google: key is word/words, value is list of web pages ä Multiset: key is string, value is # occurrences of string l Interface issues ä Lookup a key, return boolean: in map or value: associated with the key (what if key not in map?) ä Insert a key/value pair into the map

CPS Interface at work: tmapcounter.cpp l Key is a string, Value is # occurrences (like multiset)  Interface in code below shows how tmap class works while (input >> word) { if (map->contains(word)) { map->get(word) += 1; } else { map->insert(word,1); } What clues are there for prototype of map.get and map.contains ?  Reference is returned by get, not a copy, why?  Parameters to contains, get, insert are same type, what?

CPS Accessing values in a map l We can apply a function object to every element in a map ä Similar to MultiSet ä Simple to implement, relatively easy to use ä Limited: must visit every map element (can’t stop early) Alternative: use an iterator (see tmapcounter.cpp ) ä Iterator has access to “guts” of a map, iterates over it ä Standard interface of Init, HasMore, Next, Current ä Can have several iterators at once, can stop early Iterator * it = map->makeIterator(); for(it->Init(); it->HasMore(); it->Next()) { cout Current().second << “\t”; cout Current().first << endl; }

CPS Other map examples l Anamap.cpp, alternative program for finding anagrams ä Maps Anaword: key to tvector : value ä Look up Anaword, associate all equal Anawords with first one stored in map ä To print, loop over all keys, grab vector, print if ??? l Parsing arithmetic expressions ä Inheritance hierarchy and somewhat complex code ä Map string/variable name: key to Expression *: value Map x -> y + 3, what’s value of x when y = 7 ? What happens if we map x -> y and y -> x ?

CPS From interface to implementation l First the name: STL uses map, Java uses map, we’ll use map ä Other books/courses use table, dictionary, symbol table ä We’ve seen part of the map interface in tmapcounter.cpp What other functions might be useful? What’s actually stored internally in a map? l The class tmap is a templated, abstract base class ä Advantage of templated class (e.g., tvector, tstack, tqueue) ä Base class permits different implementations UVmap, BSTVap, HMap (stores just string->value) ä Internally combine key/value into a pair is part of STL, standard template library Struct with two fields: first and second

CPS Using templated classes l Client code includes (typically) only.h file ä Where is the.cpp file, why not access via #include? ä Difference between compilation and linking ä Is foo.h included in foo.cpp? Why? l Template.cpp file is NOT code, it’s a code generator/template ä When template is instantiated by client, code is generated ä To instantiate, need access to template source ä Templated foo.h typically has #include “foo.cpp” Why is this better in foo.h than in client program? l If you don’t call a templated function it’s not generated ä Template instantiation creates code, but not every member function (not created if not called)

CPS Log (google) is a big number l Comparison based searches are too slow for lots of data ä How many comparisons needed for a billion elements? ä What if one billion web-pages indexed? l Hashing is a search method that has average case O(1) search ä Worst case is very bad, but in practice hashing is good ä Associate a number with every key, use the number to store the key Like catalog in library, given book title, find the book l A hash function generates the number from the key ä Goal: Efficient to calculate ä Goal: Distributes keys evenly in hash table

CPS Hashing details l There will be collisions, two keys will hash to the same value ä We must handle collisions, still have efficient search ä What about birthday “paradox”: using birthday as hash function, will there be collisions in a room of 25 people? l Several ways to handle collisions, in general array/vector used ä Linear probing, look in next spot if not found Hash to index h, try h+1, h+2, …, wrap at end Clustering problems, deletion problems, growing problems ä Quadratic probing Has to index h, try h+1 2, h+2 2, h+3 2, …, wrap at end Fewer clustering problems ä Double hashing Hash to index h, with another hash function to j Try h, h+j, h+2j, … n-1

CPS Chaining with hashing l With n buckets each bucket stores linked list ä Compute hash value h, look up key in linked list table[h] ä Hopefully linked lists are short, searching is fast ä Unsuccessful searches often faster than successful Empty linked lists searched more quickly than non-empty ä Potential problems? l Hash table details ä Size of hash table should be a prime number ä Keep load factor small: number of keys/size of table ä On average, with reasonable load factor, search is O(1) ä What if load factor gets too high? Rehash or other method

CPS Hashing problems l Linear probing, hash(x) = x, (mod tablesize) ä Insert 24, 12, 45, 14, delete 24, insert 23 l Same numbers, use quadratic probing (clustering better?) l What about chaining, what happens?

CPS What about hash functions l Hashing often done on strings, consider two alternatives unsigned hash(const string& s) { unsigned int k, total = 0; for(k=0; k < s.length(); k++) { total += s[k]; } return total; } What about total += k*s[k], why might this be better? ä Other functions used, always mod result by table size l What about hashing other objects? ä Need conversion of key to index, not always simple ä HMap (subclass of tmap) maps string->values ä Why not any key type (only strings)?

CPS Why use inheritance? l We want to program to an interface (an abstraction, a concept) ä The interface may be concretely implemented in different ways, consider stream hierarchy void readStuff(istream& input){…} // call function ifstream input("data.txt"); readStuff(input); readStuff(cin); ä What about new kinds of streams, ok to use? l Open/closed principle of code development ä Code should be open to extension, closed to modification ä Why is this (usually) a good idea?

CPS Two examples l Consider the expression example (expression.h/.cpp) ä What do we need to do to add a Multiplication class? ä What code must be modified vs. extended? l Consider the RSG assignment ä Object-oriented solution is arguably much simpler  Everything is a GrammarElement that can be parsed and expanded Terminal NonTerminal Definition Production