Searching, Maps,Tries (hashing)

Slides:



Advertisements
Similar presentations
Hashing as a Dictionary Implementation
Advertisements

Hash Tables and Associative Containers CS-212 Dick Steflik.
Sets and Maps Chapter 9. Chapter 9: Sets and Maps2 Chapter Objectives To understand the Java Map and Set interfaces and how to use them To learn about.
1 CSE 326: Data Structures Hash Tables Autumn 2007 Lecture 14.
CS 206 Introduction to Computer Science II 11 / 17 / 2008 Instructor: Michael Eckmann.
Introducing Hashing Chapter 21 Copyright ©2012 by Pearson Education, Inc. All rights reserved.
CS 206 Introduction to Computer Science II 04 / 06 / 2009 Instructor: Michael Eckmann.
1 Hash Tables  a hash table is an array of size Tsize  has index positions 0.. Tsize-1  two types of hash tables  open hash table  array element type.
CompSci Binary Trees l Linked lists: efficient insertion/deletion, inefficient search  ArrayList: search can be efficient, insertion/deletion.
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture8.
Hashing Chapter 20. Hash Table A hash table is a data structure that allows fast find, insert, and delete operations (most of the time). The simplest.
TECH Computer Science Dynamic Sets and Searching Analysis Technique  Amortized Analysis // average cost of each operation in the worst case Dynamic Sets.
CompSci 100e 6.1 Hashing: Log ( ) is a big number l Comparison based searches are too slow for lots of data  How many comparisons needed for a.
CPS Searching, Maps,Tries (hashing) l Searching is a fundamentally important operation  We want to search quickly, very very quickly  Consider.
Hashing as a Dictionary Implementation Chapter 19.
Chapter 5: Hashing Part I - Hash Tables. Hashing  What is Hashing?  Direct Access Tables  Hash Tables 2.
Hashing Chapter 7 Section 3. What is hashing? Hashing is using a 1-D array to implement a dictionary o This implementation is called a "hash table" Items.
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
COSC 1030 Lecture 10 Hash Table. Topics Table Hash Concept Hash Function Resolve collision Complexity Analysis.
Hash Tables © Rick Mercer.  Outline  Discuss what a hash method does  translates a string key into an integer  Discuss a few strategies for implementing.
Chapter 13 C Advanced Implementations of Tables – Hash Tables.
Hash Tables Ellen Walker CPSC 201 Data Structures Hiram College.
Sets and Maps Chapter 9. Chapter Objectives  To understand the Java Map and Set interfaces and how to use them  To learn about hash coding and its use.
CPS Searching, Maps, Tables (hashing) l Searching is a fundamentally important operation  We want to search quickly, very very quickly  Consider.
1 the BSTree class  BSTreeNode has same structure as binary tree nodes  elements stored in a BSTree are a key- value pair  must be a class (or a struct)
CPS Searching, Maps, Tables l Searching is a fundamentally important operation ä We want to do these operations quickly ä Consider searching using.
CPS Searching, Maps, Tables (hashing) l Searching is a fundamentally important operation  We want to search quickly, very very quickly  Consider.
CSC 213 – Large Scale Programming. Today’s Goal  Review when, where, & why we use Map s  Why Sequence -based approach causes problems  How hash can.
DS.H.1 Hashing Chapter 5 Overview The General Idea Hash Functions Separate Chaining Open Addressing Rehashing Extendible Hashing Application Example: Geometric.
Fundamental Structures of Computer Science II
Instructor: Lilian de Greef Quarter: Summer 2017
Sets and Maps Chapter 9.
Sections 10.5 – 10.6 Hashing.
CE 221 Data Structures and Algorithms
Chapter 27 Hashing Jung Soo (Sue) Lim Cal State LA.
Searching, Maps,Tries (hashing)
Hashing.
Hash table CSC317 We have elements with key and satellite data
Hashing Problem: store and retrieving an item using its key (for example, ID number, name) Linked List takes O(N) time Binary Search Tree take O(logN)
Week 8 - Wednesday CS221.
Hashing Alexandra Stefan.
Efficiency add remove find unsorted array O(1) O(n) sorted array
Advanced Associative Structures
Chapter 28 Hashing.
Instructor: Lilian de Greef Quarter: Summer 2017
CSE373: Data Structures & Algorithms Lecture 14: Hash Collisions
Hash Tables.
Data Structures and Algorithms
Chapter 21 Hashing: Implementing Dictionaries and Sets
Collision Resolution Neil Tang 02/18/2010
Resolving collisions: Open addressing
CSE373: Data Structures & Algorithms Lecture 14: Hash Collisions
Searching Tables Table: sequence of (key,information) pairs
Data Structures and Algorithms
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
Hash Tables and Associative Containers
CSE 373 Data Structures and Algorithms
Hashing Alexandra Stefan.
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
CS202 - Fundamental Structures of Computer Science II
CSE 143 Lecture 25 Set ADT implementation; hashing read 11.2
Sets and Maps Chapter 9.
Collision Resolution Neil Tang 02/21/2008
Ch Hash Tables Array or linked list Binary search trees
slides created by Marty Stepp
Collision Handling Collisions occur when different elements are mapped to the same cell.
Hashing.
Data Structures and Algorithm Analysis Hashing
Lecture-Hashing.
CSE 373: Data Structures and Algorithms
Presentation transcript:

Searching, Maps,Tries (hashing) Searching is a fundamentally important operation We want to search quickly, very very quickly Consider searching using Google, ACES, issues? In general we want to search in a collection for a key We've searched using trees and arrays Tree implementation was quick: O(log n) worst/average? Arrays: access is O(1), search is slower If we compare keys, log n is best for searching n elements Lower bound is W(log n), provable Hashing is O(1) on average, not a contradiction, why? Tries are O(1) worst-case!! (ignoring length of key)

From Google to Maps If we wanted to write a search engine we’d need to access lots of pages and keep lots of data Given a word, on what pages does it appear? This is a map of words->web pages In general a map associates a key with a value Look up the key in the map, get the value Google: key is word/words, value is list of web pages Anagram: key is string, value is words that are anagrams Interface issues Lookup a key, return boolean: in map or value: associated with the key (what if key not in map?) Insert a key/value pair into the map

Interface at work: MapDemo.java Key is a string, Value is # occurrences Interface in code below shows how Map class works while (scanner.hasNext()) { String s = (String) scanner.next(); Counter c = (Counter) map.get(s); if (c != null) c.increment(); else map.put(s, new Counter()); } What clues are there for prototype of map.get and map.put? What if a key is not in map, what value returned? What kind of objects can be put in a map?

Accessing all values in a map (e.g., print) Access every key in the map, then get the corresponding value Get an iterator of the set of keys: keySet().iterator() For each key returned by this iterator call map.get(key) … Get an iterator over (key,value) pairs, there's a nested class called Map.Entry that the iterator returns, accessing the key and the value separately is then possible To see all the pairs use entrySet().iterator()

External Iterator without generics The Iterator interface accesses elements Source of iterator makes a difference: cast required? Iterator it = map.keySet().iterator(); while (it.hasHasNext()){ Object value = map.get(it.next()); } Iterator it2 = map.entrySet().iterator(); while (it2.hasNext()){ Map.Entry me = (Map.Entry) it.next(); Object value = me.getValue();

External Iterator with generics Avoid Object, we know what we have a map of Is the syntax worth it? Iterator<String> it = map.keySet().iterator(); while (it.hasHasNext()){ Object value = map.get(it.next()); } Iterator<Map.Entry<String,Counter>> it2 = map.entrySet().iterator(); while (it2.hasNext()){ Map.Entry<String,Counter> me = it.next(); Object value = me.getValue();

Hashing: Log (10100) is a big number Comparison based searches are too slow for lots of data How many comparisons needed for a billion elements? What if one billion web-pages indexed? Hashing is a search method: average case O(1) search Worst case is very bad, but in practice hashing is good Associate a number with every key, use the number to store the key Like catalog in library, given book title, find the book A hash function generates the number from the key Goal: Efficient to calculate Goal: Distributes keys evenly in hash table

Hashing details 0 1 2 3 n-1 There will be collisions, two keys will hash to the same value We must handle collisions, still have efficient search What about birthday “paradox”: using birthday as hash function, will there be collisions in a room of 25 people? Several ways to handle collisions, in general array/vector used Linear probing, look in next spot if not found Hash to index h, try h+1, h+2, …, wrap at end Clustering problems, deletion problems, growing problems Quadratic probing Hash to index h, try h+12, h+22 , h+32 , …, wrap at end Fewer clustering problems Double hashing Hash to index h, with another hash function to j Try h, h+j, h+2j, …

Chaining with hashing With n buckets each bucket stores linked list Compute hash value h, look up key in linked list table[h] Hopefully linked lists are short, searching is fast Unsuccessful searches often faster than successful Empty linked lists searched more quickly than non-empty Potential problems? Hash table details Size of hash table should be a prime number Keep load factor small: number of keys/size of table On average, with reasonable load factor, search is O(1) What if load factor gets too high? Rehash or other method

Hashing problems Linear probing, hash(x) = x, (mod tablesize) Insert 24, 12, 45, 14, delete 24, insert 23 (where?) Same numbers, use quadratic probing (clustering better?) What about chaining, what happens? 0 1 2 3 4 5 6 7 8 9 10 12 24 45 14 0 1 2 3 4 5 6 7 8 9 10 12 24 14 45

What about hash functions Hashing often done on strings, consider two alternatives public static int hash(String s) { int k, total = 0; for(k=0; k < s.length(); k++){ total += s.charAt(k); } return total; Consider total += (k+1)*s.charAt(k), why might this be better? Other functions used, always mod result by table size What about hashing other objects? Need conversion of key to index, not always simple Every object contains hashCode()!

Trie: efficient search words/suffixes A trie (from retrieval, but pronounced “try”) supports Insertion: put string into trie (delete and look up) These operations are O(size of string) regardless of how many strings are stored in the trie! Guaranteed! In some ways a trie is like a 128 (or 26 or alphabet-size) tree, one branch/edge for each character/letter Node stores branches to other nodes Node stores whether it ends the string from root to it Extremely useful in DNA/string processing Very useful for matching suffixes: suffix tree

Trie picture and code (see Trie.java) To add string Start at root, for each char create node as needed, go down tree, mark last node To find string Start at root, follow links If null, not found Check word flag at end To print all nodes Visit every node, build string as nodes traversed What about union and intersection? a c p r n s a r a h s t c d a o Indicates word ends here