CSE 326: Data Structures Lecture #12


CSE 326: Data Structures Lecture #12 Whoa… Good Hash, Man Bart Niswonger Summer Quarter 2001

Today’s Outline
- Unix Tutorial: what do you want covered?
- Midterm: amortized time, ADT vs. data structure
- Hashing
Various things today, but in general a rather short day. I really want to increase the amount of interactivity in class: I am going to ask you to do small activities, write down ideas and questions, work with your neighbors, and so on. Please be efficient, be willing, work with different people from time to time, and take it seriously. I am doing this because I think it can help you understand some of the more difficult ideas, and because I think it can keep you more focused on what is going on. I will let you know if any of it will be graded, but I don’t want it to be; I would only do that as a way to make you participate.

Intermediate Unix Tutorial
2 minutes:
- 3 things you love about unix
- 3 things you hate
- 5 things you wish you knew how to do
- 1 gift idea
The friendly ACM folks are putting on another unix tutorial, mainly for your benefit. Since it’s basically for you guys, they want to know what you want to know! Take a piece of paper and, in 2 minutes, write down the 11 things above. Also, I want to give each of them a little thank-you present for helping you guys out, something from the class, but I need help thinking of things. Everyone give me one idea, and then we’ll talk about it. Collect ideas and vote?

Asymptotic Time Bounds
- Worst-case running time over m operations
- The worst case for a single operation may be really bad, but the worst case for m operations is bounded
Lots of people noticed that the SkewHeap::merge method I gave on the midterm was flawed in that it didn’t swap the children. Almost no one got the right answer for how adding this line affects the running time! So I ask again: how does that swap change the running time? Work through a simple example showing that without the swap we might as well be merging a plain binary tree: the worst case is O(n) for one merge, so O(mn) for m operations. With the swap, since stuff moves back and forth, it is better than that. The question is how much. Through some math which I won’t go into, it can be shown to be O(m log n) IN THE WORST CASE for m operations, even though single operations can be bad.
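The midterm's SkewHeap::merge is not reproduced in these notes, so the following is only a sketch of a standard skew-heap merge that includes the child swap discussed above. The `Node` struct and the `insert` and `deleteMin` helpers are hypothetical names introduced for illustration.

```cpp
#include <utility>

// Hypothetical node type; the lecture's SkewHeap class is not shown here.
struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int k) : key(k) {}
};

// Merge two min-ordered skew heaps and return the root of the result.
Node* merge(Node* a, Node* b) {
    if (a == nullptr) return b;
    if (b == nullptr) return a;
    if (b->key < a->key) std::swap(a, b);  // a now has the smaller root
    a->right = merge(a->right, b);         // always merge into the right child
    std::swap(a->left, a->right);          // the child swap under discussion
    return a;
}

// Insert is a merge with a one-node heap.
Node* insert(Node* root, int key) { return merge(root, new Node(key)); }

// Remove the minimum by merging the root's two children.
Node* deleteMin(Node* root) {
    Node* rest = merge(root->left, root->right);
    delete root;
    return rest;
}
```

Without the `std::swap(a->left, a->right)` line, every merge keeps descending the same right spine, which is how a single long path (and the O(n) per-operation cost) can build up.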

ADT vs Data Structure
Abstract Data Type:
- Abstract
- Operations & semantics
- Data-less
- One
- No notion of running time or complexity
Data structures:
- Concrete implementation
- Set of algorithms
- Holds data
- Many
- Very particular running times and complexities
Some distinctions between ADTs and data structures:
- Abstract (no implementation) vs. concrete (implementation).
- Operations and what they mean at a high level vs. implementation via a collection of algorithms.
- ADTs know nothing about how the data is stored, just that there is a particular concept of information (data & key). Data structures store data explicitly and support that “concept of information”.
- One ADT has many data structures.
- ADTs have no implementation and no algorithms, hence no complexity. Data structures have algorithms, which have complexities.
Tensions that might cause you to choose one data structure over another?
- time vs. space
- performance vs. elegance
- generality vs. simplicity
- one operation’s performance vs. another’s

Dictionary ADT Review
Dictionary operations:
- create
- destroy
- insert
- find
- delete
Stores values associated with user-specified keys:
- values may be any (homogeneous) type
- keys may be any (homogeneous) comparable type
Example:
  kim chi        spicy cabbage
  Krispy Kreme   tasty doughnut
  kiwi           Australian fruit
  kale           leafy green
  Krispix        breakfast cereal
  insert: kohlrabi - upscale tuber
  find(kiwi): kiwi - Australian fruit
Dictionaries associate some key with a value, just like a real dictionary (where the key is a word and the value is its definition). In this example, I’ve stored foods associated with short descriptions. This is probably the most valuable and widely used ADT we’ll hit. What are some of the implementations we have seen, and what is the complexity of their operations? (30 seconds)
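To make the ADT versus data structure split concrete, here is a minimal sketch: an abstract `Dictionary` interface (operations only) and one possible data structure behind it. The class names, the string key and value types, and the choice of an unsorted vector are all assumptions made for illustration; `remove` stands in for the slide's "delete", which is a reserved word in C++.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// The ADT: operations and their meaning, with no storage and no running times.
class Dictionary {
public:
    virtual ~Dictionary() = default;
    virtual void insert(const std::string& key, const std::string& value) = 0;
    virtual const std::string* find(const std::string& key) const = 0;  // nullptr if absent
    virtual bool remove(const std::string& key) = 0;
};

// One data structure for that ADT: an unsorted vector of (key, value) pairs.
// insert, find, and remove each scan the vector, so all are O(n).
class VectorDictionary : public Dictionary {
    std::vector<std::pair<std::string, std::string>> items_;
public:
    void insert(const std::string& key, const std::string& value) override {
        for (auto& item : items_) {
            if (item.first == key) { item.second = value; return; }  // overwrite
        }
        items_.emplace_back(key, value);
    }
    const std::string* find(const std::string& key) const override {
        for (const auto& item : items_) {
            if (item.first == key) return &item.second;
        }
        return nullptr;
    }
    bool remove(const std::string& key) override {
        for (std::size_t i = 0; i < items_.size(); ++i) {
            if (items_[i].first == key) {
                items_[i] = items_.back();  // order is irrelevant, so fill the gap
                items_.pop_back();
                return true;
            }
        }
        return false;
    }
};
```

Code written against `Dictionary` would not need to change if `VectorDictionary` were swapped for a search tree or a hash table; only the running times would.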

Hash Table Approach
[Figure: the keys Kiwi, Kumquat, Kim chi, Kale, and Kohlrabi are mapped by f(x) to table entries.]
But… is there a problem in this pipe-dream?

Hash Table Dictionary Data Structure
- Hash function: maps keys to integers. Result: can quickly find the right spot for a given entry.
- Unordered and sparse table. Result: cannot efficiently list all entries, cannot find min and max efficiently, and cannot find all items within a specified range efficiently.
[Figure: the keys Kiwi, Kumquat, Kim chi, Kale, and Kohlrabi are mapped by f(x) to table entries.]
A binary search tree is a binary tree in which all nodes in the left subtree of a node have lower values than the node, and all nodes in the right subtree of a node have higher values than the node. It’s like making that recursion into the data structure! I’m storing integers at each node. Does everybody think that’s what I’m _really_ going to store? What do I need to know about what I store? (comparison, equality testing)

Hash Table Terminology
[Figure: labeled diagram showing the keys (Kumquat, Kiwi, Kim chi, Kale, Kohlrabi), the hash function f(x), the table, and a collision.]
load factor λ = (# of entries in table) / tableSize

Hash Table Code (First Pass)
    Value & find(Key & key) {
        int index = hash(key) % tableSize;
        return Table[index];
    }
- What should the hash function be? (for integers)
- What should the table size be?
- How should we resolve collisions?
Take 2 minutes, work with a neighbor, and answer these questions. No looking at the notes! Let’s hear some ideas: how do I hash integers? Everyone else, hold those thoughts.

A Good Hash Function…
- is easy (fast) to compute (O(1) and practically fast).
- distributes the data evenly (hash(a) ≠ hash(b)).
- uses the whole hash table (for all 0 ≤ k < size, there’s an i such that hash(i) % size = k).
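The "uses the whole hash table" condition can be checked by brute force on small cases. The sketch below counts how many slots are reachable; `slotsUsed`, `identityHash`, and `doublingHash` are hypothetical names, and the doubling hash is a deliberately poor candidate chosen for illustration.

```cpp
#include <vector>

// Count how many of the `size` table slots are hit when the keys
// 0 .. keys-1 are passed through hashFn and reduced modulo size.
template <typename HashFn>
int slotsUsed(HashFn hashFn, int size, int keys) {
    std::vector<bool> hit(size, false);
    for (int i = 0; i < keys; ++i) hit[hashFn(i) % size] = true;
    int used = 0;
    for (bool h : hit) {
        if (h) ++used;
    }
    return used;
}

// Two candidate hash functions for non-negative integers.
int identityHash(int n) { return n; }
int doublingHash(int n) { return 2 * n; }
```

With a table of size 10, `doublingHash` can only reach the even slots, so half the table is wasted. With a prime size such as 7 the same function reaches every slot, which is one motivation for choosing a prime table size.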

A Good Hash Function for Integers
- Choose tableSize to be prime
- hash(n) = n % tableSize
Example: tableSize = 7
  insert(4), insert(17), find(12), insert(9), delete(17)
[Figure: table with numbered entries for tableSize = 7.]
Here is an example of a way to hash integers. How’d you do? Now take 3 minutes with the same person and come up with a way to hash strings. Again, no looking ahead!
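The tableSize = 7 example can be traced with a small first-pass table. This is a sketch only: the `IntHashTable` class, the `kEmpty` sentinel, and the choice to reject an insert that collides are assumptions, since collision handling has not been introduced at this point in the lecture. Keys are assumed to be non-negative.

```cpp
#include <vector>

const int kEmpty = -1;  // marks an unused slot (keys are assumed non-negative)

// First-pass table for integer keys with hash(n) = n % tableSize.
// There is no collision resolution: a colliding insert is simply rejected.
class IntHashTable {
    std::vector<int> slots_;
public:
    explicit IntHashTable(int tableSize) : slots_(tableSize, kEmpty) {}

    int hash(int n) const { return n % static_cast<int>(slots_.size()); }

    bool insert(int n) {
        int i = hash(n);
        if (slots_[i] != kEmpty && slots_[i] != n) return false;  // collision
        slots_[i] = n;
        return true;
    }

    bool find(int n) const { return slots_[hash(n)] == n; }

    bool remove(int n) {
        if (!find(n)) return false;
        slots_[hash(n)] = kEmpty;
        return true;
    }
};
```

For the slide's sequence, 4 lands in entry 4, 17 in entry 3, and 9 in entry 2, while find(12) probes entry 5 and finds nothing. A later insert(11) would hash to entry 4 and collide with 4.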

Good Hash Function for Strings? I want to be able to: insert(“kale”) insert(“Krispy Kreme”) insert(“kim chi”)

Good Hash Function for Strings?
- Sum the ASCII values of the characters.
- Consider only the first 3 characters. Uses only 2871 out of 17,576 entries in the table on English words.
- Think of the string as a base 128 number. Let $s = s_1 s_2 s_3 s_4 \ldots s_n$ and choose $\mathrm{hash}(s) = s_1 + s_2 \cdot 128 + s_3 \cdot 128^2 + s_4 \cdot 128^3 + \cdots + s_n \cdot 128^{n-1}$
Problems:
- hash(“really, really big”) = well… something really, really big
- hash(“one thing”) % 128 = hash(“other thing”) % 128
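Two of these candidates are easy to try out. The sketch below implements the ASCII sum and the base-128 interpretation with the first character as the lowest digit; the function names are hypothetical, and using a 64-bit unsigned integer (which silently wraps on overflow) is an assumption made to keep the example short.

```cpp
#include <string>

// Candidate: add up the character codes. Order is ignored, so anagrams collide.
unsigned sumHash(const std::string& s) {
    unsigned h = 0;
    for (unsigned char c : s) h += c;
    return h;
}

// Candidate: treat the string as a base-128 number, first character lowest.
// Each character needs 7 bits, so a 64-bit value wraps after 9 characters;
// that overflow is the "really, really big" problem in miniature.
unsigned long long base128Hash(const std::string& s) {
    unsigned long long h = 0;
    unsigned long long power = 1;
    for (unsigned char c : s) {
        h += c * power;
        power *= 128;
    }
    return h;
}
```

Here `sumHash("kale")` equals `sumHash("leak")`, and `base128Hash(s) % 128` is just the first character's code, which is why "one thing" and "other thing" agree modulo 128.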

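Easy to Compute String Hash
Use Horner’s Rule
    int hash(const string & s) {
        int h = 0;
        // Walk from the last character to the first, reducing at every step
        // so that h never grows beyond tableSize.
        for (int i = (int) s.length() - 1; i >= 0; i--) {
            h = (s[i] + 128 * h) % tableSize;
        }
        return h;
    }
What is Horner’s rule???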
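Horner's rule rewrites the polynomial s[0] + s[1]*128 + s[2]*128^2 + ... as s[0] + 128*(s[1] + 128*(s[2] + ...)), so each character costs one multiply and one add, and the value can be reduced modulo the table size at every step. The sketch below compares a direct evaluation with the Horner form; both function names and the explicit `tableSize` parameter are assumptions for illustration.

```cpp
#include <string>

// Direct evaluation of s[0] + s[1]*128 + s[2]*128^2 + ... (mod tableSize),
// carrying a running power of 128.
int polyHash(const std::string& s, int tableSize) {
    long long h = 0;
    long long power = 1;
    for (unsigned char c : s) {
        h = (h + c * power) % tableSize;
        power = (power * 128) % tableSize;
    }
    return static_cast<int>(h);
}

// Horner's rule: start from the last character and fold inward.
int hornerHash(const std::string& s, int tableSize) {
    long long h = 0;
    for (int i = static_cast<int>(s.length()) - 1; i >= 0; --i) {
        h = (static_cast<unsigned char>(s[i]) + 128 * h) % tableSize;
    }
    return static_cast<int>(h);
}
```

As a check by hand, "ab" is 97 + 98*128 = 12641, so both functions give 641 for a table size of 1000 and 6 for a table size of 7.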

Universal Hashing
- For any fixed hash function, there will be some pathological sets of inputs: everything hashes to the same cell!
- Solution: Universal Hashing
  - Start with a large (parameterized) class of hash functions. No sequence of inputs is bad for all of them!
  - When your program starts up, pick one of the hash functions to use at random (for the entire time).
  - Now: no bad inputs, only unlucky choices!
  - If the universal class is large, the odds of making a bad choice are very low.
  - If you do find you are in trouble, just pick a different hash function and re-hash the previous inputs.

“Random” Vector Universal Hash
- Parameterized by prime size and vector: $a = \langle a_0, a_1, \ldots, a_r \rangle$ where $0 \le a_i < \text{size}$
- Represent each key as r + 1 integers where $k_i < \text{size}$
  - size = 11, key = 39752 ==> <3,9,7,5,2>
  - size = 29, key = “hello world” ==> <8,5,12,12,15,23,15,18,12,4>
- $h_a(k) = \left( \sum_{i=0}^{r} a_i k_i \right) \bmod \text{size}$: dot product with a “random” vector!
R is the range: the number of things you could feed into the hash function. Case one: the range is the integers 0 - 9. Case two: the range is the letters 1 - 26.
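A minimal sketch of the dot-product hash, assuming the result is reduced modulo the prime table size. The function names, the use of `std::mt19937` as the random source, and the base-10 digit split are illustrative choices, not something the slides specify.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Dot product of the key pieces k with the vector a, modulo the table size.
// Assumes a.size() >= k.size() and every a[i] and k[i] lies in [0, size).
int vectorHash(const std::vector<int>& a, const std::vector<int>& k, int size) {
    long long sum = 0;
    for (std::size_t i = 0; i < k.size(); ++i) {
        sum += static_cast<long long>(a[i]) * k[i];
    }
    return static_cast<int>(sum % size);
}

// Pick one member of the family at random: each a[i] is uniform in [0, size).
std::vector<int> randomVector(std::size_t length, int size, std::mt19937& rng) {
    std::uniform_int_distribution<int> dist(0, size - 1);
    std::vector<int> a(length);
    for (int& ai : a) ai = dist(rng);
    return a;
}

// Split a non-negative integer into decimal digits, most significant first,
// for example 39752 -> <3,9,7,5,2>.
std::vector<int> digits(int key) {
    std::vector<int> d;
    do {
        d.insert(d.begin(), key % 10);
        key /= 10;
    } while (key > 0);
    return d;
}
```

With size = 11 and the (non-random, hand-picked) vector a = <1,2,3,4,5>, the key 39752 hashes to (3 + 18 + 21 + 20 + 10) mod 11 = 72 mod 11 = 6. In practice `a` would come from `randomVector` once at startup and stay fixed, so that the mapping is reproducible.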

Universal Hash Function
Strengths:
- works on any type as long as you can form the ki’s
- if we’re building a static table, we can try many a’s
- a random a has guaranteed good properties no matter what we’re hashing
Weaknesses:
- must choose a prime table size larger than any ki

Hash Function Summary
Goals of a hash function:
- reproducible mapping from key to table entry
- evenly distribute keys across the table
- separate commonly occurring keys (neighboring keys?)
- complete quickly
Example hash functions:
- h(n) = n % size
- h(n) = string as base 128 number % size
- One universal hash function: dot product with random vector
There are many, many hash functions. Some give you one guarantee, others give you another. These are some simple ones; I encourage you to come up with your own, to look on the web for more, to talk to me, etc.

How to Design a Hash Function
- Know what your keys are
- Study how your keys are distributed
- Try to include all important information in a key in the construction of its hash
- Try to make “neighboring” keys hash to very different places
- Prune the features used to create the hash until it runs “fast enough” (very application dependent)

Collisions
- Pigeonhole principle says we can’t avoid all collisions
  - try to hash without collision m keys into n slots with m > n
  - try to put 6 pigeons into 5 holes
- What do we do when two keys hash to the same entry?
  - open hashing: put little dictionaries in each entry (shove extra pigeons in one hole!)
  - closed hashing: pick a next entry to try
The pigeonhole principle is a vitally important mathematical principle that asks what happens when you try to shove k+1 pigeons into k pigeon-sized holes. Don’t snicker. The fact is that no hash function can perfectly hash m keys into fewer than m slots. They won’t fit. What do we do? 1) Shove the pigeons in anyway. 2) Try somewhere else when we’re shoving two pigeons in the same place. Does closed hashing solve the original problem? No, and sometimes it fails quite badly, like running out of space after using ½ of the buckets!
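As a sketch of the open hashing option, each table entry below holds a small linked list that acts as a "little dictionary". The `ChainedTable` class, its method names, and the use of integer keys with `key % tableSize` are assumptions for illustration; keys are assumed to be non-negative.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Open hashing (separate chaining): colliding keys share an entry's list.
class ChainedTable {
    std::vector<std::list<std::pair<int, std::string>>> buckets_;

    std::size_t index(int key) const {
        return static_cast<std::size_t>(key) % buckets_.size();
    }
public:
    explicit ChainedTable(std::size_t tableSize) : buckets_(tableSize) {}

    void insert(int key, const std::string& value) {
        for (auto& entry : buckets_[index(key)]) {
            if (entry.first == key) { entry.second = value; return; }  // overwrite
        }
        buckets_[index(key)].emplace_back(key, value);
    }

    const std::string* find(int key) const {
        for (const auto& entry : buckets_[index(key)]) {
            if (entry.first == key) return &entry.second;
        }
        return nullptr;  // key is absent
    }

    // Number of keys that currently share this key's entry.
    std::size_t bucketLength(int key) const { return buckets_[index(key)].size(); }
};
```

Putting the six keys 0 through 5 into a table of size 5 forces at least one entry to hold two keys (here 0 and 5 share entry 0), yet every key can still be found.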

To Do
- Project II
- Homework 4
- Read Chapter 5 (fast!)

Coming Up
- More hashing
- Cool stuff!
- Project III