CSE 373 Set implementation; intro to hashing


CSE 373 Set implementation; intro to hashing
read: Weiss 5.1 - 5.2, 5.4, 5.5
slides created by Marty Stepp
http://www.cs.washington.edu/373/
© University of Washington, all rights reserved.

Sets
set: A collection of unique values (no duplicates allowed) that can perform the following operations efficiently: add, remove, search (contains).
The client doesn't think of a set as having indexes; we just add things to the set in general and don't worry about order.
Example set: "the", "of", "from", "to", "she", "you", "him", "why", "in", "down", "by", "if"
    set.contains("to")   // true
    set.contains("be")   // false

Int Set ADT interface
Let's think about how to write our own implementation of a set. To simplify the problem, we only store ints in our set for now.
As is (usually) done in the Java Collections Framework, we will define sets as an ADT by creating a Set interface. Core operations are: add, contains, remove.
    public interface IntSet {
        void add(int value);
        boolean contains(int value);
        void clear();
        boolean isEmpty();
        void remove(int value);
        int size();
    }

Unfilled array set
Consider storing a set in an unfilled array. The order in which elements appear in a set doesn't really matter, so long as they can be added and searched quickly. What would make a good ordering for the elements?
If we store them in the next available index, as in a list:
    set.add(9);
    set.add(23);
    set.add(8);
    set.add(-3);
    set.add(49);
    set.add(12);
index | 0 |  1 | 2 |  3 |  4 |  5 | 6 | 7 | 8 | 9
value | 9 | 23 | 8 | -3 | 49 | 12 | 0 | 0 | 0 | 0
size  | 6
How efficient is add? contains? remove?
O(1), O(N), O(N) (contains must loop over the array; remove must shift elements; add is O(1) only if it skips the duplicate check, which would itself need a contains-style loop.)

Sorted array set
Suppose we store the elements in an unfilled array, but in sorted order rather than order of insertion.
    set.add(9);
    set.add(23);
    set.add(8);
    set.add(-3);
    set.add(49);
    set.add(12);
index |  0 | 1 | 2 |  3 |  4 |  5 | 6 | 7 | 8 | 9
value | -3 | 8 | 9 | 12 | 23 | 49 | 0 | 0 | 0 | 0
size  | 6
How efficient is add? contains? remove?
O(N), O(log N), O(N) (An O(log N) binary search finds elements in contains and finds the proper index in add/remove, but add/remove still need to shift elements right/left to make room, which is O(N) on average.)
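The sorted-array costs above can be seen in a minimal sketch. The class name SortedArrayIntSet and its fields are hypothetical, not part of the slides; Arrays.binarySearch and System.arraycopy are standard library calls. The growth step when the array fills up is an added assumption.

```java
import java.util.Arrays;

// Hypothetical sketch of a sorted-array int set; names are illustrative only.
class SortedArrayIntSet {
    private int[] elements = new int[10];
    private int size = 0;

    // O(log N): binary search over the filled prefix [0, size).
    public boolean contains(int value) {
        return Arrays.binarySearch(elements, 0, size, value) >= 0;
    }

    // O(log N) to find the slot, then O(N) to shift larger elements right.
    public void add(int value) {
        int index = Arrays.binarySearch(elements, 0, size, value);
        if (index >= 0) {
            return; // already present; a set holds no duplicates
        }
        int insertAt = -(index + 1); // binarySearch encodes the insertion point
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, 2 * size); // assumption: grow when full
        }
        System.arraycopy(elements, insertAt, elements, insertAt + 1, size - insertAt);
        elements[insertAt] = value;
        size++;
    }

    // O(log N) to find the value, then O(N) to shift larger elements left.
    public void remove(int value) {
        int index = Arrays.binarySearch(elements, 0, size, value);
        if (index < 0) {
            return; // not present; nothing to remove
        }
        System.arraycopy(elements, index + 1, elements, index, size - index - 1);
        size--;
    }

    public int size() {
        return size;
    }

    @Override
    public String toString() {
        return Arrays.toString(Arrays.copyOf(elements, size));
    }
}
```

Adding 9, 23, 8, -3, 49, 12 in that order leaves the filled prefix as [-3, 8, 9, 12, 23, 49], matching the table on the slide.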

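A strange idea
Silly idea: When the client adds value i, store it at index i in the array.
Would this work? What are the problems / drawbacks of this approach? How could we work around them?
    set.add(7);
    set.add(1);
    set.add(9);
    ...
    set.add(18);
    set.add(12);
After the first three calls, each value sits at its own index:
index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
value | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 7 | 0 | 9
Storing 18 and 12 the same way needs indexes 18 and 12, so the array must be extended beyond index 9.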

Hashing
hash: To map a large domain of values to a smaller fixed domain. Typically, this means mapping a set of elements to integer indexes in an array.
Idea: Store any given element value in a particular predictable index. That way, adding, removing, and looking for it are constant-time (O(1)).
hash table: An array that stores elements via hashing.
hash function: An algorithm that maps values to indexes.
hash code: The output of a hash function for a given value.
In the previous slide, our "hash function" was: hash(i) → i
    Potentially requires a large array (a.length > i).
    Doesn't work for negative numbers.
    Array could be very sparse, mostly empty (memory waste).

Improved hash function
To deal with negative numbers: hash(i) → abs(i)
To deal with large numbers: hash(i) → abs(i) % length
    set.add(37);   // abs(37) % 10 == 7
    set.add(-2);   // abs(-2) % 10 == 2
    set.add(49);   // abs(49) % 10 == 9
    // inside HashIntSet class
    private int hash(int i) {
        return Math.abs(i) % elements.length;
    }
index | 0 | 1 |  2 | 3 | 4 | 5 | 6 |  7 | 8 |  9
value | 0 | 0 | -2 | 0 | 0 | 0 | 0 | 37 | 0 | 49
size  | 3
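One caveat with the hash function above: in Java, Math.abs(Integer.MIN_VALUE) overflows and stays negative, so abs(i) % length yields a negative index for that one input. A minimal sketch of one possible workaround takes the remainder first and the absolute value second. The class HashDemo and the two-argument hash are hypothetical names used only for illustration; this is not the slides' own code.

```java
// Hypothetical demo class; the slides keep hash() inside HashIntSet.
class HashDemo {
    // Same result as Math.abs(i) % length for every int except
    // Integer.MIN_VALUE, where Math.abs overflows and stays negative.
    // Assumes length > 0.
    static int hash(int i, int length) {
        return Math.abs(i % length);
    }

    public static void main(String[] args) {
        System.out.println(hash(37, 10));                 // 7
        System.out.println(hash(-2, 10));                 // 2
        System.out.println(hash(49, 10));                 // 9
        System.out.println(Math.abs(Integer.MIN_VALUE) % 10); // -8, not a valid index
        System.out.println(hash(Integer.MIN_VALUE, 10));  // 8
    }
}
```

Because i % length always lies strictly between -length and length, taking its absolute value cannot overflow.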

Sketch of implementation
    public class HashIntSet implements IntSet {
        private int[] elements;
        ...
        public void add(int value) {
            elements[hash(value)] = value;
        }
        public boolean contains(int value) {
            return elements[hash(value)] == value;
        }
        public void remove(int value) {
            elements[hash(value)] = 0;
        }
    }
Runtime of add, contains, and remove: O(1)!
Are there any problems with this approach?

Collisions
collision: When a hash function maps two values to the same index.
    set.add(11);
    set.add(49);
    set.add(24);
    set.add(37);
    set.add(54);   // collides with 24!
collision resolution: An algorithm for fixing collisions.
index | 0 |  1 | 2 | 3 |  4 | 5 | 6 |  7 | 8 |  9
value | 0 | 11 | 0 | 0 | 54 | 0 | 0 | 37 | 0 | 49
(54 has overwritten 24 at index 4.)
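The lost element can be reproduced with a small sketch of the overwrite-on-add approach. The class name NaiveHashIntSet is hypothetical, and the fixed capacity of 10 is an assumption taken from the slide's tables.

```java
// Hypothetical sketch of the no-collision-handling approach; not a usable set.
class NaiveHashIntSet {
    private final int[] elements = new int[10]; // assumption: capacity 10, as in the tables

    private int hash(int value) {
        return Math.abs(value) % elements.length;
    }

    public void add(int value) {
        elements[hash(value)] = value; // silently overwrites any previous occupant
    }

    public boolean contains(int value) {
        return elements[hash(value)] == value;
    }

    public static void main(String[] args) {
        NaiveHashIntSet set = new NaiveHashIntSet();
        set.add(24);
        System.out.println(set.contains(24)); // true
        set.add(54);                          // 54 also hashes to index 4
        System.out.println(set.contains(24)); // false: 24 was overwritten
    }
}
```

The same sketch also reports contains(0) as true on an empty set, because unused slots hold 0.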

Probing
probing: Resolving a collision by moving to another index.
linear probing: Moves to the next available index (wraps if needed).
    set.add(11);
    set.add(49);
    set.add(24);
    set.add(37);
    set.add(54);   // collides with 24; must probe
variation: quadratic probing moves increasingly far away: +1, +4, +9, ...
index | 0 |  1 | 2 | 3 |  4 |  5 | 6 |  7 | 8 |  9
value | 0 | 11 | 0 | 0 | 24 | 54 | 0 | 37 | 0 | 49
size  | 5
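The two probing schemes differ only in how far the k-th attempt lands from the home index. A minimal sketch follows; the class ProbeSequences and both method names are hypothetical, and the table length of 10 is assumed from the slide.

```java
// Hypothetical helper showing which index each probing scheme examines.
class ProbeSequences {
    // Linear probing: the k-th attempt looks k slots past the home index, wrapping around.
    static int linearProbe(int home, int k, int length) {
        return (home + k) % length;
    }

    // Quadratic probing: the k-th attempt looks k*k slots past the home index (+1, +4, +9, ...).
    static int quadraticProbe(int home, int k, int length) {
        return (home + k * k) % length;
    }

    public static void main(String[] args) {
        int home = Math.abs(54) % 10; // 54 hashes to index 4
        for (int k = 0; k <= 3; k++) {
            System.out.println("attempt " + k
                    + ": linear " + linearProbe(home, k, 10)
                    + ", quadratic " + quadraticProbe(home, k, 10));
        }
    }
}
```

For a home index of 4 in a table of length 10, the linear sequence visits 4, 5, 6, 7 and the quadratic sequence visits 4, 5, 8, 3.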

Implementing HashIntSet
Let's implement an int set using a hash table with linear probing. For simplicity, assume that the set cannot store 0s for now.
    public class HashIntSet implements IntSet {
        private int[] elements;
        private int size;
        // constructs new empty set
        public HashIntSet() {
            elements = new int[10];
            size = 0;
        }
        // hash function maps values to indexes
        private int hash(int value) {
            return Math.abs(value) % elements.length;
        }
        ...

The add operation
How do we add an element to the hash table?
    Use the hash function to find the proper bucket index.
    If we see a 0, put the element there.
    If not, move forward until we find an empty (0) index to store it.
    If we see that the value is already in the table, don't re-add it.
    set.add(54);   // client code
    set.add(14);
index | 0 |  1 | 2 | 3 |  4 |  5 |  6 |  7 | 8 |  9
value | 0 | 11 | 0 | 0 | 24 | 54 | 14 | 37 | 0 | 49
size  | 6

Implementing add
How do we add an element to the hash table?
    public void add(int value) {
        int h = hash(value);
        // linear probing for an empty slot (or the value itself)
        while (elements[h] != 0 && elements[h] != value) {
            h = (h + 1) % elements.length;
        }
        if (elements[h] != value) {   // avoid duplicates
            elements[h] = value;
            size++;
        }
    }
index | 0 |  1 | 2 | 3 |  4 |  5 | 6 |  7 | 8 |  9
value | 0 | 11 | 0 | 0 | 24 | 54 | 0 | 37 | 0 | 49
size  | 5

The contains operation
How do we search for an element in the hash table?
    Use the hash function to find the proper bucket index.
    Loop forward until we find either the value or an empty index (0).
    If we find the value, it is contained (true). If we find 0, it is not (false).
    set.contains(24)   // true
    set.contains(14)   // true
    set.contains(35)   // false
index | 0 |  1 | 2 | 3 |  4 |  5 |  6 |  7 | 8 |  9
value | 0 | 11 | 0 | 0 | 24 | 54 | 14 | 37 | 0 | 49
size  | 6

Implementing contains
    public boolean contains(int value) {
        int h = hash(value);
        // linear probing to search; stop at the first empty slot
        while (elements[h] != 0) {
            if (elements[h] == value) {
                return true;
            }
            h = (h + 1) % elements.length;
        }
        return false;   // not found
    }
index | 0 |  1 | 2 | 3 |  4 |  5 | 6 |  7 | 8 |  9
value | 0 | 11 | 0 | 0 | 24 | 54 | 0 | 37 | 0 | 49
size  | 5
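Putting the add and contains slides together gives a runnable sketch. The class name ProbingHashIntSet is hypothetical, and three details are assumptions beyond the slides: the hash takes the remainder before the absolute value (so Integer.MIN_VALUE cannot produce a negative index), adding 0 is rejected explicitly, and add refuses to fill the last empty slot so that both probing loops always reach a 0 and terminate.

```java
import java.util.Arrays;

// Hypothetical sketch combining the add and contains code from the slides.
class ProbingHashIntSet {
    private final int[] elements = new int[10]; // 0 marks an empty slot
    private int size = 0;

    // Assumption: remainder first, then abs, to avoid overflow on Integer.MIN_VALUE.
    private int hash(int value) {
        return Math.abs(value % elements.length);
    }

    public void add(int value) {
        if (value == 0) {
            throw new IllegalArgumentException("0 is reserved for empty slots");
        }
        int h = hash(value);
        // linear probing for an empty slot (or the value itself)
        while (elements[h] != 0 && elements[h] != value) {
            h = (h + 1) % elements.length;
        }
        if (elements[h] != value) { // avoid duplicates
            // Assumption: always keep one slot empty so probing loops terminate.
            if (size == elements.length - 1) {
                throw new IllegalStateException("table is full");
            }
            elements[h] = value;
            size++;
        }
    }

    public boolean contains(int value) {
        int h = hash(value);
        // linear probing to search; stop at the first empty slot
        while (elements[h] != 0) {
            if (elements[h] == value) {
                return true;
            }
            h = (h + 1) % elements.length;
        }
        return false; // not found
    }

    public int size() {
        return size;
    }

    @Override
    public String toString() {
        return Arrays.toString(elements);
    }
}
```

Adding 11, 49, 24, 37, 54, 54, 14 in that order reproduces the table from the add slide: 54 probes from index 4 to 5, the second 54 is not re-added, and 14 probes from 4 past 5 to land at 6.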