Briana B. Morrison Adapted from William Collins

Slides:



Advertisements
Similar presentations
EcoTherm Plus WGB-K 20 E 4,5 – 20 kW.
Advertisements

1 A B C
Trend for Precision Soil Testing % Zone or Grid Samples Tested compared to Total Samples.
Simplifications of Context-Free Grammars
AP STUDY SESSION 2.
1
Slide 1Fig 26-CO, p.795. Slide 2Fig 26-1, p.796 Slide 3Fig 26-2, p.797.
Slide 1Fig 25-CO, p.762. Slide 2Fig 25-1, p.765 Slide 3Fig 25-2, p.765.
Sequential Logic Design
Copyright © 2013 Elsevier Inc. All rights reserved.
STATISTICS HYPOTHESES TEST (I)
David Burdett May 11, 2004 Package Binding for WS CDL.
Create an Application Title 1Y - Youth Chapter 5.
Add Governors Discretionary (1G) Grants Chapter 6.
CALENDAR.
CHAPTER 18 The Ankle and Lower Leg
The 5S numbers game..
Numerical Analysis 1 EE, NCKU Tien-Hao Chang (Darby Chang)
Media-Monitoring Final Report April - May 2010 News.
Break Time Remaining 10:00.
The basics for simulations
Factoring Quadratics — ax² + bx + c Topic
EE, NCKU Tien-Hao Chang (Darby Chang)
Turing Machines.
PP Test Review Sections 6-1 to 6-6
Chapter 10: Applications of Arrays and the class vector
© 2004 Goodrich, Tamassia Hash Tables
1 Hash Tables Saurav Karmakar. 2 Motivation What are the dictionary operations? What are the dictionary operations? (1) Insert (1) Insert (2) Delete (2)
Hash Tables.
Dictionaries 4/1/2017 2:36 PM Hash Tables  …
2000 Deitel & Associates, Inc. All rights reserved. Chapter 16 – Bits, Characters, Strings, and Structures Outline 16.1Introduction 16.2Structure Definitions.
Hash Tables Dr. Li Jiang School of Computer Science,
Regression with Panel Data
Numerical Analysis 1 EE, NCKU Tien-Hao Chang (Darby Chang)
Copyright © 2012, Elsevier Inc. All rights Reserved. 1 Chapter 7 Modeling Structure with Blocks.
Biology 2 Plant Kingdom Identification Test Review.
Chapter 1: Expressions, Equations, & Inequalities
Adding Up In Chunks.
MaK_Full ahead loaded 1 Alarm Page Directory (F11)
1 10 pt 15 pt 20 pt 25 pt 5 pt 10 pt 15 pt 20 pt 25 pt 5 pt 10 pt 15 pt 20 pt 25 pt 5 pt 10 pt 15 pt 20 pt 25 pt 5 pt 10 pt 15 pt 20 pt 25 pt 5 pt Synthetic.
1 Termination and shape-shifting heaps Byron Cook Microsoft Research, Cambridge Joint work with Josh Berdine, Dino Distefano, and.
Artificial Intelligence
When you see… Find the zeros You think….
Before Between After.
Slide R - 1 Copyright © 2009 Pearson Education, Inc. Publishing as Pearson Prentice Hall Active Learning Lecture Slides For use with Classroom Response.
12 October, 2014 St Joseph's College ADVANCED HIGHER REVISION 1 ADVANCED HIGHER MATHS REVISION AND FORMULAE UNIT 2.
Subtraction: Adding UP
: 3 00.
5 minutes.
1 Non Deterministic Automata. 2 Alphabet = Nondeterministic Finite Accepter (NFA)
1 hi at no doifpi me be go we of at be do go hi if me no of pi we Inorder Traversal Inorder traversal. n Visit the left subtree. n Visit the node. n Visit.
Types of selection structures
Static Equilibrium; Elasticity and Fracture
Converting a Fraction to %
Numerical Analysis 1 EE, NCKU Tien-Hao Chang (Darby Chang)
CSE20 Lecture 15 Karnaugh Maps Professor CK Cheng CSE Dept. UC San Diego 1.
Clock will move after 1 minute
famous photographer Ara Guler famous photographer ARA GULER.
Copyright © 2013 Pearson Education, Inc. All rights reserved Chapter 11 Simple Linear Regression.
Physics for Scientists & Engineers, 3rd Edition
Select a time to count down from the clock above
Copyright Tim Morris/St Stephen's School
1.step PMIT start + initial project data input Concept Concept.
9. Two Functions of Two Random Variables
1 Dr. Scott Schaefer Least Squares Curves, Rational Representations, Splines and Continuity.
1 Non Deterministic Automata. 2 Alphabet = Nondeterministic Finite Accepter (NFA)
Hash Tables1 Part E Hash Tables  
Presentation transcript:

Briana B. Morrison Adapted from William Collins Hash Tables Briana B. Morrison Adapted from William Collins

Hashing

Hashing

Sequential Search Given a vector of integers: What is the best case for sequential search? O(1) when value is the first element What is the worst case? O(n) when value is last element, or value is not in the list What is the average case? O(1/2 * n) which is O(n) Hashing

Hashing

Hashing

Binary Search Given a vector of integers: What is the best case for binary search? O(1) when element is the middle element What is the worst case? O(log n) when element is first, last, or not in list What is the average case? O(log n) Hashing

Hashing

Hashing

Hashing

Hashing

Hashing

Hashing

Hashing

Hashing

Hashing

Hashing

Hashing

Hashing

Hashing

Map vs. Hashmap What are the differences between a map and a hashmap? Interface Efficiency Applications Implementation Hashing

Hashing

Hashing

array? vector? deque? heap? LINKED Linked? list? map? CONTIGUOUS array? vector? deque? heap? LINKED Linked? list? map? BUT NONE OF THESE WILL GIVE CONSTANT AVERAGE TIME FOR SEARCHES, INSERTIONS AND REMOVALS. Hashing

Hashing

Hashing

Hashing

Hashing

Hashing

Hashing

Hashing

To make these values fit into the table, we need to mod by the table size; i.e., key % 1000. 210 256 816 OOPS! Hashing

Hashing

Hashing

Hashing

Hash Codes Suppose we have a table of size N A hash code is: A number in the range 0 to N-1 We compute the hash code from the key You can think of this as a “default position” when inserting, or a “position hint” when looking up A hash function is a way of computing a hash code Desire: The set of keys should spread evenly over the N values When two keys have the same hash code: collision Hashing

Hash Functions A hash function should be quick and easy to compute. A hash function should achieve an even distribution of the keys that actually occur across the range of indices for both random and non-random data. Calculation should involve the entire search key. Hashing

Examples of Hash Functions Usually involves taking the key, chopping it up, mix the pieces together in various ways Examples: Truncation – ignore part of key, use the remaining part as the index Folding – partition the key into several parts and combine the parts in a convenient way (adding, etc.) After calculating the index, use modular arithmetic. Divide by the size of the index range, and take the remainder as the result Hashing

Example Hash Function Hashing

Devising Hash Functions Simple functions often produce many collisions ... but complex functions may not be good either! It is often an empirical process Adding letter values in a string: same hash for strings with same letters in different order Better approach: size_t hash = 0; for (size_t i = 0; i < s.size(); ++i) hash = hash * 31 + s[i]; Hashing

Devising Hash Functions (2) The String hash is good in that: Every letter affects the value The order of the letters affects the value The values tend to be spread well over the integers Hashing

Devising Hash Functions (3) Guidelines for good hash functions: Spread values evenly: as if “random” Cheap to compute Generally, number of possible values much greater than table size Hashing

Hash Code Maps Memory address: Integer cast: Component sum: We reinterpret the memory address of the key object as an integer Good in general, except for numeric and string keys Integer cast: We reinterpret the bits of the key as an integer Suitable for keys of length less than or equal to the number of bits of the integer type (e.g., char, short, int and float on many machines) Component sum: We partition the bits of the key into components of fixed length (e.g., 16 or 32 bits) and we sum the components (ignoring overflows) Suitable for numeric keys of fixed length greater than or equal to the number of bits of the integer type (e.g., long and double on many machines) Hashing

Hash Code Maps (cont.) Polynomial accumulation: We partition the bits of the key into a sequence of components of fixed length (e.g., 8, 16 or 32 bits) a0 a1 … an-1 We evaluate the polynomial p(z) = a0 + a1 z + a2 z2 + … … + an-1zn-1 at a fixed value z, ignoring overflows Especially suitable for strings (e.g., the choice z = 33 gives at most 6 collisions on a set of 50,000 English words) Polynomial p(z) can be evaluated in O(n) time using Horner’s rule: The following polynomials are successively computed, each from the previous one in O(1) time p0(z) = an-1 pi (z) = an-i-1 + zpi-1(z) (i = 1, 2, …, n -1) We have p(z) = pn-1(z) Hashing

Hashing

Hashing

Hashing

Hashing

Hashing

Hashing

Hashing

Hashing

Hashing

Hashing

Hashing

Hashing

Hashing

Hashing

Hashing

Collision Handlers NOW WE’LL LOOK AT SPECIFIC COLLISION HANDLERS: Chaining Linear Probing (Open Addressing) Double Hashing Quadratic Hashing Hashing

Collision Handling Collisions occur when different elements are mapped to the same cell Chaining: let each cell in the table point to a linked list of elements that map there  1 2 3 4 451-229-0004 981-101-0004 025-612-0001 Chaining is simple, but requires additional memory outside the table Hashing

Hashing

Hashing

Chaining with Separate Lists Example Hashing

Chaining Picture Two items hashed to bucket 3 Three items hashed to bucket 4 Hashing

Hashing

Hashing

averageTimeS(n, m)  n / 2m iterations. <= 0.75 / 2 FOR THE find METHOD, averageTimeS(n, m)  n / 2m iterations. <= 0.75 / 2 SO averageTimeS(n, m) <= A CONSTANT. averageTimeS(n, m) IS CONSTANT. Hashing

Hashing

Hashing

Hashing

Hashing

Hashing

Hashing

Hash Table Using Open Probe Addressing Example Insert 45 (mod by table size … % 11) Hashing

Hash Table Using Open Probe Addressing Example Insert 35 Hashing

Hash Table Using Open Probe Addressing Example Insert 76 Hashing

Hash Table Using Open Probe Addressing Example Hashing

Linear Probing Open addressing: the colliding item is placed in a different cell of the table Linear probing handles collisions by placing the colliding item in the next (circularly) available table cell Each table cell inspected is referred to as a “probe” Colliding items lump together, causing future collisions to cause a longer sequence of probes Example: h(x) = x mod 13 Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order 1 2 3 4 5 6 7 8 9 10 11 12 41 18 44 59 32 22 31 73 1 2 3 4 5 6 7 8 9 10 11 12 Hashing

WE NEED TO KNOW WHEN A SLOT IS FULL OR OCCUPIED. HOW? INSTEAD OF JUST T() STORED IN THE BUCKETS (BECAUSE T() COULD BE A VALID VALUE), THE BUCKET WILL STORE AN INSTANCE OF THE VALUE_TYPE CLASS. Hashing

Hashing

Hashing

Retrieve What about when we want to retrieve? Consider the previous example…. Hashing

Hash Table Using Open Probe Addressing Example Find the value 35. (% 11) Now find the value 76. Now find the value 33. Hashing

Hash Table Using Open Probe Addressing Example Now delete 35. (% 11) Now find the value 76. Now find the value 33. Hashing

Linear Probing Probe by incrementing the index If “fall off end”, wrap around to the beginning Take care not to cycle forever! Compute index as hash_fcn() % table.size() if table[index] == NULL, item is not in the table if table[index] matches item, found item (done) Increment index circularly and go to 2 Why must we probe repeatedly? hashCode may produce collisions remainder by table.size may produce collisions Hashing

Search Termination Ways to obtain proper termination Stop when you come back to your starting point Stop after probing N slots, where N is table size Stop when you reach the bottom the second time Ensure table never full Reallocate when occupancy exceeds threshold Hashing

Hashing

Erase value 1069. false Hashing

Now search for 460. Hashing

Hashing

insert SETS marked_for_removal TO false; SOLUTION: bool marked_for_removal; THE CONSTRUCTOR FOR VALUE_TYPE SETS EACH bucket’s marked_for_removal FIELD TO false. insert SETS marked_for_removal TO false; erase SETS marked_for_removal TO true. SO AFTER THE INSERTIONS: Hashing

Hashing

Hashing

Hashing

Hashing

CLUSTER: A SEQUENCE OF NON-EMPTY LOCATIONS KEYS THAT HASH TO 54 FOLLOW THE SAME COLLISION-PATH AS KEYS THAT HASH TO 55, … Hashing

Hashing

Hashing

SOLUTION 1: DOUBLE HASHING, THAT IS, OBTAIN BOTH INDICES AND OFFSETS BY HASHING:   unsigned long hash_int = hash (key); int index = hash_int % length, offset = hash_int / length; NOW THE OFFSET DEPENDS ON THE KEY, SO DIFFERENT KEYS WILL USUALLY HAVE DIFFERENT OFFSETS, SO NO MORE PRIMARY CLUSTERING! Secondary hash function Hashing

TO GET A NEW INDEX: index = (index + offset) % length; Notice that if a collision occurs, you rehash from the NEW index value. Hashing

WHERE WOULD THESE KEYS GO IN buckets? EXAMPLE: length = 11 key index offset 15 4 1 19 8 1 16 5 1 58 3 5 27 5 2 35 2 3 30 8 2 47 3 4 WHERE WOULD THESE KEYS GO IN buckets? 1 2 3 4 5 6 7 8 9 10 Hashing

index key 0 47 1 2 35 3 58 4 15 5 16 6 7 27 8 19 9 10 30 Hashing

PROBLEM: WHAT IF OFFSET IS A MULTIPLE OF length? EXAMPLE: length = 11 key index offset 15 4 1 19 8 1 16 5 1 58 3 5 27 5 2 35 2 3 47 3 4 246 4 22 // BUT 15 IS AT INDEX 4 // FOR KEY 246, NEW INDEX = (4 + 22) % 11 = 4. OOPS! 47 1 2 35 3 58 4 15 5 16 6 7 27 8 19 9 10 Hashing

ON AVERAGE, offset % length WILL EQUAL 0 ONLY ONCE IN EVERY SOLUTION: if (offset % length == 0) offset = 1; ON AVERAGE, offset % length WILL EQUAL 0 ONLY ONCE IN EVERY length TIMES. Hashing

FINAL PROBLEM: WHAT IF length HAS SEVERAL FACTORS? EXAMPLE: length = 20 key index offset 20 0 1 25 5 1 30 10 1 35 15 1 110 10 5 // BUT 30 IS AT INDEX 10 FOR KEY 110, NEW INDEX = (10 + 5) % 20 = 15, WHICH IS OCCUPIED, SO NEW INDEX = (15 + 5) % 20, WHICH IS OCCUPIED, SO NEW INDEX = ... Hashing

SOLUTION: MAKE length A PRIME. Hashing

Example of Double Hashing Consider a hash table storing integer keys that handles collision with double hashing N = 13 h(k) = k mod 13 d(k) = 7 - k mod 7 Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order 1 2 3 4 5 6 7 8 9 10 11 12 31 41 18 32 59 73 22 44 1 2 3 4 5 6 7 8 9 10 11 12 Hashing

Hashing

Notice that h stays at the same location. No clustering. ANOTHER SOLUTION: QUADRATIC HASHING, THAT IS, ONCE COLLISION OCCURS AT h, GO TO LOCATION h + 1, THEN IF COLLISION OCCURS THERE GO TO LOCATION h + 4, then h + 9, then h + 16, etc. unsigned long hash_int = hash (key); int index = hash_int % length, offset = i2; Notice that h stays at the same location. No clustering. Hashing

QUADRATIC REHASHING EXAMPLE: length = 11 key index offset 15 4 19 8 15 4 19 8 16 5 58 3 27 5 1, final place index = 6 35 2 30 8 1, final place index = 9 47 3 4, final place index = 7 1 2 3 58 4 15 5 16 6 7 8 19 9 10 Hashing

Performance HOW DOES DOUBLE-HASHING COMPARE WITH CHAINED HASHING?

Performance of Hash Tables Load factor = # filled cells / table size Between 0 and 1 Load factor has greatest effect on performance Lower load factor  better performance Reduce collisions in sparsely populated tables Knuth gives expected # probes p for open addressing, linear probing, load factor L: p = ½(1 + 1/(1-L)) As L approaches 1, this zooms up For chaining, p = 1 + (L/2) Note: Here L can be greater than 1! Hashing

Performance of Hash Tables (2) Hashing

Performance of Hash Tables (3) Insert: average O(1) Search: average O(1) Sorted array: Insert: average O(n) Search: average O(log n) Binary Search Tree: Insert: average O(log n) But balanced trees can guarantee O(log n) Hashing

We know that hashing becomes inefficient as the table fills up We know that hashing becomes inefficient as the table fills up. What to do? EXPAND! Hashing

Hashing

Hashing

Hashing

Key for Hashing In Class Work Open Addr 0 1 2 3 4 5 6 7 8 9 10 11 12 26 9 7 3 0 1 20 33 22 49 140 38 Double (Re-hashing) 0 1 2 3 4 5 6 7 8 9 10 11 12 26 9 0 3 140 1 20 7 33 49 22 38 Chaining 0 1 2 3 4 5 6 7 8 9 10 11 12 26, 0 3 20, 33, 7 22, 9 49, 140 38 Hashing

§- Hash Table Summary Slide 1 - simulates the fastest searching technique, knowing the index of the required value in a vector and array and apply the index to access the value, by applying a hash function that converts the data to an integer - After obtaining an index by dividing the value from the hash function by the table size and taking the remainder, access the table. Normally, the number of elements in the table is much smaller than the number of distinct data values, so collisions occur. - To handle collisions, we must place a value that collides with an existing table element into the table in such a way that we can efficiently access it later. Hashing 121

§- Hash Table (Cont…) Summary Slide 2 - average running time for a search of a hash table is O(1) - the worst case is O(n) Hashing 122

§- Collision Resolution Summary Slide 3 §- Collision Resolution - Types: 1) linear open probe addressing - the table is a vector or array of static size - After using the hash function to compute a table index, look up the entry in the table. - If the values match, perform an update if necessary. - If the table entry is empty, insert the value in the table. Hashing 123

§- Collision Resolution (Cont…) Summary Slide 4 §- Collision Resolution (Cont…) - Types: 1) linear open probe addressing - Otherwise, probe forward circularly, looking for a match or an empty table slot. - If the probe returns to the original starting point, the table is full. - you can search table items that hashed to different table locations. - Deleting an item difficult. Hashing 124

§- Collision Resolution (Cont…) Summary Slide 5 §- Collision Resolution (Cont…) 2) chaining with separate lists. - the hash table is a vector of list objects - Each list is a sequence of colliding items. - After applying the hash function to compute the table index, search the list for the data value. - If it is found, update its value; otherwise, insert the value at the back of the list. - you search only items that collided at the same table location Hashing 125

§- Collision Resolution (Cont…) Summary Slide 6 §- Collision Resolution (Cont…) - there is no limitation on the number of values in the table, and deleting an item from the table involves only erasing it from its corresponding list Hashing 126