Cse 373 April 24th – Hashing.

Slides:



Advertisements
Similar presentations
1 Designing Hash Tables Sections 5.3, 5.4, Designing a hash table 1.Hash function: establishing a key with an indexed location in a hash table.
Advertisements

CSE 1302 Lecture 23 Hashing and Hash Tables Richard Gesick.
Nov 12, 2009IAT 8001 Hash Table Bucket Sort. Nov 12, 2009IAT 8002  An array in which items are not stored consecutively - their place of storage is calculated.
Maps, Dictionaries, Hashtables
CSE 250: Data Structures Week 12 March 31 – April 4, 2008.
1 Chapter 9 Maps and Dictionaries. 2 A basic problem We have to store some records and perform the following: add new record add new record delete record.
Liang, Introduction to Java Programming, Eighth Edition, (c) 2011 Pearson Education, Inc. All rights reserved Chapter 48 Hashing.
1 CSE 326: Data Structures Hash Tables Autumn 2007 Lecture 14.
CS 206 Introduction to Computer Science II 11 / 12 / 2008 Instructor: Michael Eckmann.
Hashing General idea: Get a large array
CS 206 Introduction to Computer Science II 04 / 06 / 2009 Instructor: Michael Eckmann.
COSC 2007 Data Structures II
Symbol Tables Symbol tables are used by compilers to keep track of information about variables functions class names type names temporary variables etc.
David Luebke 1 10/25/2015 CS 332: Algorithms Skip Lists Hash Tables.
Hashing Sections 10.2 – 10.3 CS 302 Dr. George Bebis.
Can’t provide fast insertion/removal and fast lookup at the same time Vectors, Linked Lists, Stack, Queues, Deques 4 Data Structures - CSCI 102 Copyright.
A Introduction to Computing II Lecture 11: Hashtables Fall Session 2000.
ISOM MIS 215 Module 5 – Binary Trees. ISOM Where are we? 2 Intro to Java, Course Java lang. basics Arrays Introduction NewbieProgrammersDevelopersProfessionalsDesigners.
Data Structure & Algorithm Lecture 8 – Hashing JJCAO Most materials are stolen from Prof. Yoram Moses’s course.
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
CS6045: Advanced Algorithms Data Structures. Hashing Tables Motivation: symbol tables –A compiler uses a symbol table to relate symbols to associated.
Week 15 – Wednesday.  What did we talk about last time?  Review up to Exam 1.
Nov 22, 2010IAT 2651 Java Collections. Nov 22, 2010IAT 2652 Data Structures  With a collection of data, we often want to do many things –Organize –Iterate.
Prof. Amr Goneid, AUC1 CSCI 210 Data Structures and Algorithms Prof. Amr Goneid AUC Part 5. Dictionaries(2): Hash Tables.
CSC 172 DATA STRUCTURES. MIDTERM REVIEW MIDTERM EXAM 75 min 6 – 10 questions a lot like the quiz questions.
CE 221 Data Structures and Algorithms
Chapter 27 Hashing Jung Soo (Sue) Lim Cal State LA.
Hashing.
Hashtables.
May 3rd – Hashing & Graphs
COMP 53 – Week Eleven Hashtables.
Hash table CSC317 We have elements with key and satellite data
CSCI 210 Data Structures and Algorithms
Hashing CSE 2011 Winter July 2018.
Lecture No.43 Data Structures Dr. Sohail Aslam.
CS 332: Algorithms Hash Tables David Luebke /19/2018.
Review Graph Directed Graph Undirected Graph Sub-Graph
Cse 373 April 12th – TreEs.
Cse 373 April 26th – Exam Review.
November 1st – Exam Review
Efficiency add remove find unsorted array O(1) O(n) sorted array
Quadratic probing Double hashing Removal and open addressing Chaining
Data Structures and Database Applications Hashing and Hashtables in C#
Hash table another data structure for implementing a map or a set
Hash Tables banana apple banana apple cantaloupe kiwi
Design and Analysis of Algorithms
Hash Table.
Hash Table.
Chapter 28 Hashing.
Data Structures and Database Applications Hashing and Hashtables in C#
CSE 2331/5331 Topic 8: Hash Tables CSE 2331/5331.
Hashing CS2110.
Data Structures and Algorithms
Chapter 21 Hashing: Implementing Dictionaries and Sets
Data Structures and Algorithms
CSCE 3110 Data Structures & Algorithm Analysis
Hash Tables and Associative Containers
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
CS202 - Fundamental Structures of Computer Science II
ITEC 2620M Introduction to Data Structures
Hashing Sections 10.2 – 10.3 Lecture 26 CS302 Data Structures
CSE 326: Data Structures Lecture #12 Hashing II
slides created by Marty Stepp
Hashing.
Hash Maps Introduction
Data Structures and Algorithm Analysis Hashing
17CS1102 DATA STRUCTURES © 2018 KLEF – The contents of this presentation are an intellectual and copyrighted property of KL University. ALL RIGHTS RESERVED.
CSE 326: Data Structures Lecture #14
Hash Tables banana apple banana apple cantaloupe kiwi
Lecture-Hashing.
Presentation transcript:

Cse 373 April 24th – Hashing

Exam Friday Practice exam after class today Topics: Stacks and Queues BigO Notation and runtime Analysis Heaps Trees (BST and AVL) Traversals Design Tradeoffs

Exam Friday Format No note sheet One section of short answer 4-5 Technical Questions 1 Design Decision Question Less than 10 minutes per problem

Exam Friday No Java material on the exam Looking for theoretical understanding Explanations are important (where indicated) If you get stuck on a problem, move on Any questions?

Today’s lecture Hashing Basic Concept Hash functions Collision Resolution Runtimes

Hashing Introduction Suppose there is a set of data M Any data we might want to store is a member of this set. For example, M might be the set of all strings There is a set of data that we actually care about storing D, where D << M For an English Dictionary, D might be the set of English words

Hashing What is our ideal data structure? The data structure should use O(D) memory No extra memory is allocated The operation should run in O(1) time Accesses should be as fast as possible

Hashing What are some difficulties with this? Need to know the size of D in advance or lose memory to pointer overhead Hard to go from M -> D in O(1) time

Hashing Memory: The Hash Table Consider an array of size c * D Each index in the array corresponds to some element in M that we want to store. The data in D does not need any particular ordering.

The Hash Table How can we do this? M D

The Hash Table How can we do this? Unsorted Array M Apple D

The Hash Table How can we do this? Unsorted Array M Apple Pear D

The Hash Table How can we do this? Unsorted Array Apple Pear Orange M

The Hash Table How can we do this? Unsorted Array Apple Pear Orange M Durian D

The Hash Table How can we do this? Unsorted Array Apple Pear Orange M Durian D Kumquat

The Hash Table What is the problem here? Takes O(D) time to find the word in the list Same problem with sorted arrays! M Apple Pear Orange Durian D Kumquat

The Hash Table What is another solution? Random mapping Kumquat Pear M Durian Apple Orange

The Hash Table What’s the problem here? Can’t retrieve the random variable, O(D) search! M Kumquat Pear D Durian Apple Orange

The Hash Table What about a pseudo-random mapping? This is “the hash function” M Kumquat Pear D h(x) Durian Apple Orange

hash Functions The Hash Function maps the large space M to our target space D. We want our hash function to do the following: Be repeatable: H(x) = H(x) every run Be equally distributed: For all y,z in D, P(H(y)) = P(H(z)) Run in constant time: H(x) = O(1)

hash Example Let’s consider an example. We want to save 10 numbers from all possible Java ints What is a simple hash function? ints 1 2 h(x) = key%10 3 4 5 6 7 8 9

hash Example Let’s insert(519) table Where does it go? 519%10 = 1 2 ints 1 2 h(x) = key%10 3 4 5 6 7 8 9

hash Example Let’s insert(519) table Where does it go? 519%10 = 9 1 2 1 2 h(x) = key%10 519 3 4 5 6 7 8 9: 519

hash Example Insert(204) 204 1 2 h(x) = key%10 519 3 4 5 6 7 8 9: 519

hash Example Insert(204) 204 1 2 h(x) = key%10 519 3 4: 214 5 6 7 8 204 1 2 h(x) = key%10 519 3 4: 214 5 6 7 8 9: 519

hash Example insert(1001) 204 1 2 h(x) = key%10 519 3 4: 214 1001 5 6 204 1 2 h(x) = key%10 519 3 4: 214 1001 5 6 7 8 9: 519

hash Example insert(1001) 204 1: 1001 2 h(x) = key%10 519 3 4: 214 204 1: 1001 2 h(x) = key%10 519 3 4: 214 1001 5 6 7 8 9: 519

hash Example Is there a problem here? 204 1: 1001 2 h(x) = key%10 519 204 1: 1001 2 h(x) = key%10 519 3 4: 214 1001 5 6 7 8 9: 519

hash Example Is there a problem here? insert(3744) 204 1: 1001 2 204 1: 1001 2 h(x) = key%10 519 3 4: 214 1001 5 6 7 3744 8 9: 519

hash Example Is there a problem here? insert(3744) 204 1: 1001 2 204 1: 1001 2 h(x) = key%10 519 3 4: 214 1001 5 6 7 3744 8 9: 519

hash Example Is there a problem here? insert(3744) This is called a collision! 204 1: 1001 2 h(x) = key%10 519 3 4: 214 1001 5 6 7 3744 8 9: 519

hash Example How to rectify collisions? Think of a strategy for a few minutes Possible solutions: Store in the next available space Store both in the same space Try a different hash Resize the array

Linear Probing Consider the simplest solution Find the next available spot in the array This solution is called linear probing 204 1: 1001 2 h(x) = key%10 519 3 4: 204 1001 5: 3744 6 7 3744 8 9: 519

Linear Probing What are the problems with this? How do we search for 3744? Need to go to 4, and then cycle through all of the entries until we find the element or find a blank space What if we need to add something that ends in 5? It also ends up in this problem area This is called clustering

Clustering What are the negative effects of clustering? If the cluster becomes too large, two things happen: The chances of colliding with the cluster increase The time it takes to find something in the cluster increases. This isn’t O(1) time!

Clustering How can we solve this problem? Resize the array Give the elements more space to avoid clusters. How long does this take? O(n)! all of the elements need to be rehashed. Store multiple items in one location This is called chaining We’ll discuss it after the midterm

hash Tables Take-aways for the midterm Hashtables should provide O(1) dictionary operations Collisions make this problem difficult to achieve Hashtables rely on a array and a hash function The array should be relative to the size of the data you want to keep The hash function should run in constant time and should distribute among the indices in the target array Linear probing is a solution for collisions, but only works when there is lots of free space Resizing is very costly

Next Class Hash Tables Examples, examples, examples No new theory Exam review