
Hashing, Sets, and Dictionaries; Code Cleaning; Expandable Array Stacks and Amortized Analysis

Hashing so far
To store 250 IP addresses in a table:
– Pick a prime just bigger than 250 (n = 257)
– Pick a_1, …, a_4 mod 257 (once and for all)
– To hash x = (x_1, …, x_4):
  – Compute u = a_1 x_1 + … + a_4 x_4 mod 257
  – Store x in a bucket at myArray[u]
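A runnable sketch of this recipe in Python. Only n = 257 comes from the slide; the fixed seed, the 4-octet tuple representation, and the example address are my assumptions:

```python
import random

N = 257                                   # prime just bigger than 250
rng = random.Random(0)                    # fixed seed, for reproducibility only
A = [rng.randrange(N) for _ in range(4)]  # a_1..a_4 mod 257, picked once and for all

def hash_ip(ip):
    """Hash a 4-octet IP address given as a tuple (x_1, ..., x_4)."""
    return sum(a * x for a, x in zip(A, ip)) % N

# Store each address in a bucket at myArray[u]:
myArray = [[] for _ in range(N)]
ip = (128, 148, 32, 110)                  # hypothetical example address
myArray[hash_ip(ip)].append(ip)
```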

Generalization
Old: store 250 IP addresses in a table
New: store n_1 items, each between 0 and N

Generalization
To store n_1 items between 0 and N:
– Pick a prime n just bigger than n_1
– Let k = ceil(log_n N); each "item" can then be written as a k-digit number, base n
– Pick a_1, …, a_k mod n (once and for all)
– To hash x = (x_1, …, x_k):
  – Compute u = a_1 x_1 + … + a_k x_k mod n
  – Store x in a bucket at myArray[u]
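The generalization, sketched as a factory function. The digit-splitting and dot product follow the slide; the names and the seed are mine:

```python
import random

def make_hasher(n, k, seed=0):
    """Build a hash function for items in [0, n**k), for a table of prime size n."""
    rng = random.Random(seed)
    a = [rng.randrange(n) for _ in range(k)]   # a_1..a_k mod n, picked once

    def h(x):
        digits = []
        for _ in range(k):                     # write x as k digits, base n
            x, d = divmod(x, n)
            digits.append(d)
        digits.reverse()                       # x_1 is the most significant digit
        return sum(ai * xi for ai, xi in zip(a, digits)) % n

    return h
```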

Example
Store 8 items, each represented by 16 bits (i.e., between 0 and 2^16 – 1 = 65535).
Solution: pick the prime n = 11. log_11(65536) ≈ 4.625, so we pick k = 5.
Pick 5 numbers a_1, …, a_5 mod 11: 3, 10, 0, 5, 2.

Example (cont.)
Multipliers: 3, 10, 0, 5, 2
Typical "key": 31905. Convert to base 11:
– Mod(31905, 11) = 5; Div(31905, 11) = 2900
– Mod(2900, 11) = 7; Div(2900, 11) = 263
– Mod(263, 11) = 10 = "A"; Div(263, 11) = 23
– Mod(23, 11) = 1; Div(23, 11) = 2
– So 31905 = 21A75 in base 11 ["A" means "10"]
Hash = (3*2 + 10*1 + 0*10 + 5*7 + 2*5) mod 11 = 61 mod 11 = 6.
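The same worked example, checked in code (the multipliers 3, 10, 0, 5, 2 and the key 31905 are from the slide; the helper function is mine):

```python
def base11_digits(x, k=5):
    """Split x into k base-11 digits, most significant first."""
    digits = []
    for _ in range(k):
        x, d = divmod(x, 11)
        digits.append(d)
    return digits[::-1]

A = [3, 10, 0, 5, 2]
digits = base11_digits(31905)                    # [2, 1, 10, 7, 5], i.e. 21A75
u = sum(a * d for a, d in zip(A, digits)) % 11   # 61 mod 11 = 6
```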

In practice
– Usually items aren't given as integers between 0 and some large number N
– Doing arithmetic (like "finding the digits") on numbers bigger than the language can represent is an algorithmic pain
– Frequently we have an "identifier" that's a few bytes long, often encoded as a string of characters

Practice, cont'd
– Assume objects have k-byte identifiers x = (x_1, …, x_k)
– Compute u = a_1 x_1 + … + a_k x_k mod n
– Put (x, object) into hash bucket u
– This works as long as n > 256, the number of values a byte can take
– Otherwise the assumption of uniformly distributed hash indexes is wrong

The SET Abstract Data Type
– create(n): creates a new set structure, initially empty but capable of holding up to n elements
– empty(S): checks whether the set S is empty
– size(S): returns the number of elements in S
– element_of(x, S): checks whether the value x is in the set S
– enumerate(S): yields the elements of S in some arbitrary order
– add(S, x): adds the element x to S, if it is not there already
– delete(S, x): removes the element x from S, if it is there

Implementing sets
Can use a hashtable:
– "create", "empty", and "size" are trivial
– "enumerate": take all elements in all buckets
– "add" is just "insert"; "delete" is "delete"
– "element_of" is just "find"

DICTIONARY ADT
– create, empty, size as in SET
– Still to do: insert(key, value) and find(key), sometimes called "store" and "fetch"
– A dictionary is sometimes called a "map": "key" is 'mapped to' "value"
– Closely related to a "database"
– May allow several values for one key; find(key) returns a list of values in this case

Implementing a dictionary
create(n):
– Build an array of prime size a little more than n, each entry an empty list
– Pick k numbers, mod n, to handle keys of length k

insert(key, value):
– Let u = (a_1 key_1 + … + a_k key_k) mod n
– Insert (key, value) into array[u]
find(key):
– Let u = (a_1 key_1 + … + a_k key_k) mod n
– Search for (key, *) in array[u]
– If you find (key, val), return val; else return None
(Modify as appropriate to return a list of vals)
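Putting create/insert/find together as a small chained hash map. The bucket arithmetic follows the slides; the class, its names, and the byte-string key representation are my assumptions:

```python
import random

class HashDict:
    def __init__(self, n=257, k=8, seed=0):
        self.n = n                                      # prime table size
        rng = random.Random(seed)
        self.a = [rng.randrange(n) for _ in range(k)]   # multipliers for keys of length <= k
        self.array = [[] for _ in range(n)]             # each entry an empty list

    def _bucket(self, key):
        # u = (a_1*key_1 + ... + a_k*key_k) mod n; key is a bytes object
        return sum(ai * b for ai, b in zip(self.a, key)) % self.n

    def insert(self, key, value):
        self.array[self._bucket(key)].append((key, value))

    def find(self, key):
        for k2, val in self.array[self._bucket(key)]:
            if k2 == key:
                return val
        return None
```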

Summary
We can now assume that we can create a SET or a DICT with O(1) insertion and lookup times whenever we need one.
After this week's HW, you can further assume that we don't need to know the size of the SET or the DICT in advance.

Example Application: JUMBLE!

JUMBLE
– Input: list of all 5-letter words in English, each word represented as an array of five characters
– Output: all words for which no other permutation is a word

Solution
Start with an empty dictionary D
Foreach word w:
– Sort the letters alphabetically to get wnew
– D.insert(wnew, w)
Foreach word w:
– Sort the letters alphabetically again to get wnew
– If D(wnew) contains anything except w, skip w
– Else output w
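A direct translation of the solution to Python, with `defaultdict` standing in for the slides' multi-valued dictionary:

```python
from collections import defaultdict

def jumble(words):
    """Return the words for which no other permutation is a word."""
    d = defaultdict(list)
    for w in words:
        d["".join(sorted(w))].append(w)        # wnew -> list of words
    return [w for w in words
            if d["".join(sorted(w))] == [w]]   # keep w only if it is alone
```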

Clean Your Code
– Errors per line is roughly constant, so fewer lines means fewer errors overall!
– Easier to grade, so more likely to get credit
– Cleaner code = cleaner thinking: a better understanding of the material

LCA(u, v)
  lca = null
  udepth = T.depth(u)
  vdepth = T.depth(v)
  if (T.isroot(u) = true) or (T.isroot(v) = true) then
    lca = T.root
  while (lca = null) do
    if (u = v) then
      lca = u
    else if udepth > vdepth then
      u = T.parent(u)
      udepth = udepth – 1
    else if vdepth > udepth then
      v = T.parent(v)
      vdepth = vdepth – 1
    else
      u = T.parent(u)
      v = T.parent(v)
  return lca

LCA(u, v, T)
  lca = null
  udepth = T.depth(u)
  vdepth = T.depth(v)
  if (T.isroot(u) = true) or (T.isroot(v) = true) then
    lca = T.root
  while (lca = null) do
    if (u = v) then
      lca = u
    else if udepth > vdepth then
      u = T.parent(u)
      udepth = udepth – 1
    else if vdepth > udepth then
      v = T.parent(v)
      vdepth = vdepth – 1
    else
      u = T.parent(u)
      v = T.parent(v)
  return lca
Needlessly complex: maintaining udepth and vdepth by hand.

LCA(u, v, T)
  lca = null
  udepth = T.depth(u)
  vdepth = T.depth(v)
  if (T.isroot(u) = true) or (T.isroot(v) = true) then
    lca = T.root
  while (lca = null) do
    if (u = v) then
      lca = u
    else if T.depth(u) > T.depth(v) then
      u = T.parent(u)
    else if T.depth(v) > T.depth(u) then
      v = T.parent(v)
    else
      u = T.parent(u)
      v = T.parent(v)
  return lca
Now irrelevant: udepth and vdepth are computed but never used.

LCA(u, v, T)
  lca = null
  if (T.isroot(u) = true) or (T.isroot(v) = true) then
    lca = T.root
  while (lca = null) do
    if (u = v) then
      lca = u
    else if T.depth(u) > T.depth(v) then
      u = T.parent(u)
    else if T.depth(v) > T.depth(u) then
      v = T.parent(v)
    else
      u = T.parent(u)
      v = T.parent(v)
  return lca

LCA(u, v, T)
  lca = null
  if (T.isroot(u) = true) or (T.isroot(v) = true) then
    lca = T.root
  while (lca = null) do
    if (u = v) then
      lca = u
    else if T.depth(u) > T.depth(v) then
      u = T.parent(u)
    else if T.depth(v) > T.depth(u) then
      v = T.parent(v)
    else
      u = T.parent(u)
      v = T.parent(v)
  return lca
Redundant: no need to compare a boolean to true.

LCA(u, v, T)
  lca = null
  if T.isroot(u) or T.isroot(v) then
    lca = T.root
  while (lca = null) do
    if (u = v) then
      lca = u
    else if T.depth(u) > T.depth(v) then
      u = T.parent(u)
    else if T.depth(v) > T.depth(u) then
      v = T.parent(v)
    else
      u = T.parent(u)
      v = T.parent(v)
  return lca

LCA(u, v, T)
  lca = null
  if T.isroot(u) or T.isroot(v) then
    lca = T.root
  while (lca = null) do
    if (u = v) then
      lca = u
    else if T.depth(u) > T.depth(v) then
      u = T.parent(u)
    else if T.depth(v) > T.depth(u) then
      v = T.parent(v)
    else
      u = T.parent(u)
      v = T.parent(v)
  return lca
Once lca = T.root, it's the answer; return it!

LCA(u, v, T)
  lca = null
  if T.isroot(u) or T.isroot(v) then
    lca = T.root
    return lca
  while (lca = null) do
    if (u = v) then
      lca = u
    else if T.depth(u) > T.depth(v) then
      u = T.parent(u)
    else if T.depth(v) > T.depth(u) then
      v = T.parent(v)
    else
      u = T.parent(u)
      v = T.parent(v)
  return lca

LCA(u, v, T)
  lca = null
  if T.isroot(u) or T.isroot(v) then
    lca = T.root
    return lca
  while (lca = null) do
    if (u = v) then
      lca = u
      return lca
    else if T.depth(u) > T.depth(v) then
      u = T.parent(u)
    else if T.depth(v) > T.depth(u) then
      v = T.parent(v)
    else
      u = T.parent(u)
      v = T.parent(v)
  return lca
The while condition is now irrelevant: lca is only ever set right before a return.

LCA(u, v, T)
  lca = null
  if T.isroot(u) or T.isroot(v) then
    lca = T.root
    return lca
  repeat
    if (u = v) then
      lca = u
      return lca
    else if T.depth(u) > T.depth(v) then
      u = T.parent(u)
    else if T.depth(v) > T.depth(u) then
      v = T.parent(v)
    else
      u = T.parent(u)
      v = T.parent(v)
lca is no longer used!

LCA(u, v, T)
  if T.isroot(u) or T.isroot(v) then
    return T.root
  repeat
    if (u = v) then
      return u
    else if T.depth(u) > T.depth(v) then
      u = T.parent(u)
    else if T.depth(v) > T.depth(u) then
      v = T.parent(v)
    else
      u = T.parent(u)
      v = T.parent(v)

LCA(u, v, T)
  while T.depth(u) > T.depth(v) do
    u = T.parent(u)
  while T.depth(v) > T.depth(u) do
    v = T.parent(v)
  if T.isroot(u) or T.isroot(v) then
    return T.root
  repeat
    if (u = v) then
      return u
    else
      u = T.parent(u)
      v = T.parent(v)

LCA(u, v, T)
  while T.depth(u) > T.depth(v) do
    u = T.parent(u)
  while T.depth(v) > T.depth(u) do
    v = T.parent(v)
  if T.isroot(u) or T.isroot(v) or (u = v) then
    return u
  repeat [OOPS! The merge broke the loop: this else has no if]
    else
      u = T.parent(u)
      v = T.parent(v)

LCA(u, v, T)
  while T.depth(u) > T.depth(v) do
    u = T.parent(u)
  while T.depth(v) > T.depth(u) do
    v = T.parent(v)
  if T.isroot(u) or T.isroot(v) or (u = v) then
    return u
  else
    return LCA(T.parent(u), T.parent(v), T)

LCA(u, v, T)
  while T.depth(u) > T.depth(v) do
    u = T.parent(u)
  while T.depth(v) > T.depth(u) do
    v = T.parent(v)
  if T.isroot(u) or (u = v) then
    return u
  else
    return LCA(T.parent(u), T.parent(v), T)
Not needed: once the depths are equal, u is a root exactly when v is, so checking T.isroot(v) too was redundant.

LCA(u, v, T)
  while T.depth(u) > T.depth(v) do
    u = T.parent(u)
  while T.depth(v) > T.depth(u) do
    v = T.parent(v)
  if (u = v) then
    return u
  else
    return LCA(T.parent(u), T.parent(v), T)
The depth-equalizing loops are still called during recursion, but have no effect after the first call.

LCA(u, v, T)
  while T.depth(u) > T.depth(v) do
    u = T.parent(u)
  while T.depth(v) > T.depth(u) do
    v = T.parent(v)
  return LCAsimple(u, v, T)

LCAsimple(u, v, T)
  # LCA for the case where u and v have the same depth
  if (u = v) then
    return u
  else
    return LCAsimple(T.parent(u), T.parent(v), T)
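A runnable version of the final algorithm, with the tail recursion written as a loop. The parent/depth dictionaries stand in for the slides' tree ADT and are my assumption:

```python
def lca(u, v, parent, depth):
    """Lowest common ancestor of u and v in a rooted tree.
    parent maps each non-root node to its parent; depth maps nodes to depths."""
    while depth[u] > depth[v]:        # bring the deeper node up
        u = parent[u]
    while depth[v] > depth[u]:
        v = parent[v]
    while u != v:                     # equal depths: climb in lockstep
        u, v = parent[u], parent[v]
    return u

# A tiny example tree:
#        a
#       / \
#      b   c
#     / \
#    d   e
parent = {"b": "a", "c": "a", "d": "b", "e": "b"}
depth = {"a": 0, "b": 1, "c": 1, "d": 2, "e": 2}
```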

DONE!

STACK
Stack operations: push, pop, size, isEmpty()
(Partial) implementation: array-based stack

ArrayStack
INIT:
  data = array[20]
  count = 0    // next empty space
push(obj o):
  if count < 20 then
    data[count] = o
    count++
  else
    ERROR("Overfull Stack")

ArrayStack
pop():
  if count == 0 then
    ERROR("Can't pop from empty Stack")
  else
    count--
    return data[count]

ArrayStack
size():
  return count
isEmpty():
  return count == 0
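The fixed-capacity stack as a Python class, a sketch of the pseudocode above (the exception types are my choice):

```python
class ArrayStack:
    """Fixed-capacity array-based stack."""
    def __init__(self, capacity=20):
        self.data = [None] * capacity
        self.count = 0                 # index of the next empty slot

    def push(self, o):
        if self.count < len(self.data):
            self.data[self.count] = o
            self.count += 1
        else:
            raise OverflowError("Overfull Stack")

    def pop(self):
        if self.count == 0:
            raise IndexError("Can't pop from empty Stack")
        self.count -= 1
        return self.data[self.count]   # top element, after the decrement

    def size(self):
        return self.count

    def is_empty(self):
        return self.count == 0
```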

Analysis

ArrayStack
INIT:
  data = array[20]
  count = 0    // next empty space
push(obj o):
  if count < 20 then
    data[count] = o
    count++
  else
    ERROR("Overfull Stack")
Running time: O(1)

ArrayStack
pop():
  if count == 0 then
    ERROR("Can't pop from empty Stack")
  else
    count--
    return data[count]
Running time: O(1)

ArrayStack
size():
  return count
isEmpty():
  return count == 0
Running time: O(1)

Summary
Fast, but not very useful.

ExpandableArrayStack
INIT:
  data = array[20]
  count = 0    // next empty space
  capacity = 20

Push
push(obj o):
  if count < capacity then
    data[count] = o
    count++
  else
    d2 = new Array[capacity + 1]
    for j = 0 to capacity – 1 do
      d2[j] = data[j]
    capacity = capacity + 1
    data = d2
    push(o)

Expandable Array Stack
All other operations remain the same.

Analysis
In the worst case, the time taken by a single push is O(n).
If we insert items 21, 22, …, 20+k, we'll have done k operations, with total work
  (20+1) + (20+2) + … + (20+k) = 20k + (1 + 2 + … + k) = 20k + k(k+1)/2 = O(k^2)
So the average time per operation is O(k) as well!
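A quick way to see the quadratic total is to count the element copies directly (the counting function is mine):

```python
def copies_grow_by_one(pushes, capacity=20):
    """Total element copies when the array grows by one slot each time it fills."""
    count, total = 0, 0
    for _ in range(pushes):
        if count == capacity:       # full: copy everything into the new array
            total += capacity
            capacity += 1
        count += 1                  # then store the new item
    return total
```

Pushing 20+k items copies 20 + 21 + … + (20+k–1) elements in total, matching the O(k^2) bound.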

Better: avoid frequent expansion
Instead of adding a little space, add a lot!
Double the array size when it gets full.

DoublingArrayStack: Push
push(obj o):
  if count < capacity then
    data[count] = o
    count++
  else
    d2 = new Array[2 * capacity]
    for j = 0 to capacity – 1 do
      d2[j] = data[j]
    capacity = 2 * capacity
    data = d2
    push(o)
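The doubling stack in Python, a sketch with only push and pop shown:

```python
class DoublingArrayStack:
    """Array stack that doubles its capacity whenever it fills up."""
    def __init__(self, capacity=20):
        self.data = [None] * capacity
        self.count = 0

    def push(self, o):
        if self.count == len(self.data):
            d2 = [None] * (2 * len(self.data))   # allocate double the space
            d2[:self.count] = self.data          # copy the old contents over
            self.data = d2
        self.data[self.count] = o
        self.count += 1

    def pop(self):
        if self.count == 0:
            raise IndexError("Can't pop from empty Stack")
        self.count -= 1
        return self.data[self.count]
```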

Doubling Array Stack
All other operations remain the same.

Analysis
push(obj o):
  if count < capacity then      // fast path: O(1)
    data[count] = o
    count++
  else                          // expansion: O(n)
    d2 = new Array[2 * capacity]
    for j = 0 to capacity – 1 do
      d2[j] = data[j]
    capacity = 2 * capacity
    data = d2
    push(o)

Analysis
In the worst case, the time taken by a single push is O(n).
But over the course of many operations, the average time per operation is O(1).

"Total Work Analysis"
If we have an array with n elements and do n operations, then the total work is no more than 4n.
So the work per operation, on average, is 4.
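The 4n bound can be checked empirically by charging one unit per write and one per copied element (the counter is mine; starting at capacity 1 is an assumption):

```python
def total_work(pushes, capacity=1):
    """Units of work (writes + element copies) for a run of pushes with doubling."""
    count, work = 0, 0
    for _ in range(pushes):
        if count == capacity:
            work += capacity        # copy every element into the doubled array
            capacity *= 2
        work += 1                   # the write itself
        count += 1
    return work
```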

Alternative view
"Amortized" analysis:
– For each operation that takes one unit of time, place an extra unit of time "in the bank"
– By the time an expensive operation arrives, use your savings to pay for it
Alternative view:
– When you do an expensive operation, pay one unit now
– Pay an extra unit for each of the next n operations

Language
For hashing: "the 'find' operation runs in expected O(1) time."
For doubling array stacks: "the 'push' operation runs in O(1) amortized time, with O(n) worst-case time."

Pixel boundaries (if time)