Hashing, Sets, Dictionaries Code Cleaning Expandable Array Stacks and Amortized Analysis.

Hashing so far
To store 250 IP addresses in a table:
Pick a prime just bigger than 250 (n = 257)
Pick a_1, …, a_4 mod 257 (once and for all)
To hash x = (x_1, …, x_4):
–Compute u = (a_1 x_1 + … + a_4 x_4) mod 257
–Store x in a bucket at myArray[u]

Generalization
Old: store 250 IP addresses in a table
New: store n_1 items, each between 0 and N

Generalization
To store n_1 items, each between 0 and N:
Pick a prime n just bigger than n_1
Let k = round_up(log_n N)
–Each “item” can be written as a k-digit number, base n
Pick a_1, …, a_k mod n (once and for all)
To hash x = (x_1, …, x_k):
–Compute u = (a_1 x_1 + … + a_k x_k) mod n
–Store x in a bucket at myArray[u]

Example
Store 8 items, each represented by 16 bits (i.e., between 0 and 2^16 – 1 = 65535)
Solution: pick the prime p = 11. log_11 65535 = 4.625…, so we pick k = 5
Pick 5 numbers a_1, …, a_5, mod 11: 3, 10, 0, 5, 2

Example (cont.)
Multipliers: 3, 10, 0, 5, 2
Typical “key”: 31905. Convert to base 11:
–Mod(31905, 11) = 5
–Div(31905, 11) = 2900
–Mod(2900, 11) = 7
–Div(2900, 11) = 263
…
–31905 in base 11 = 21A75 [“A” means “10”]
Hash = (3*2 + 10*1 + 0*10 + 5*7 + 2*5) mod 11 = 61 mod 11 = 6.
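
The digit extraction and weighted sum from this example can be sketched in Python. This is only an illustration: the function names `digits_base` and `hash_key` are made up here, and the multipliers are the ones chosen on the slide.

```python
def digits_base(x, base, k):
    """Return the k digits of x in the given base, most significant first."""
    digits = []
    for _ in range(k):
        digits.append(x % base)  # Mod(x, base) gives the next low-order digit
        x //= base               # Div(x, base) drops that digit
    return digits[::-1]


def hash_key(x, multipliers, p):
    """Hash x as (a_1 x_1 + ... + a_k x_k) mod p, using its base-p digits."""
    digits = digits_base(x, p, len(multipliers))
    return sum(a * d for a, d in zip(multipliers, digits)) % p


MULTIPLIERS = [3, 10, 0, 5, 2]  # a_1, ..., a_5 from the slide

print(digits_base(31905, 11, 5))         # [2, 1, 10, 7, 5], i.e. 21A75
print(hash_key(31905, MULTIPLIERS, 11))  # 6
```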

In practice
Usually items aren’t given as integers between 0 and some large number N
Doing arithmetic (like “finding the digits”) on big numbers (larger than the language can represent) is a pain algorithmically
Frequently we have an “identifier” that’s a few bytes long, often encoded as a string of characters

Practice, cont’d
Assume objects have k-byte identifiers x = (x_1, …, x_k)
Compute u = (a_1 x_1 + … + a_k x_k) mod n
Put (x, object) into hash bucket u
This works as long as n > 256 (the number of possible byte values)
Otherwise the assumption of uniformly distributed hash indexes is wrong
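
A minimal sketch of hashing k-byte identifiers directly, without any base conversion. The helper name `make_byte_hash` and the `seed` parameter are assumptions for illustration, and the caller is trusted to pass a prime larger than 256.

```python
import random


def make_byte_hash(k, n, seed=0):
    """Pick k multipliers mod n once, then return a hash for k-byte identifiers.

    n is assumed to be a prime larger than 256; primality is not checked here.
    """
    if n <= 256:
        raise ValueError("n must be larger than 256")
    rng = random.Random(seed)
    multipliers = [rng.randrange(n) for _ in range(k)]  # a_1, ..., a_k

    def h(identifier: bytes) -> int:
        if len(identifier) != k:
            raise ValueError("identifier must be exactly k bytes long")
        # Iterating over a bytes object yields integers in 0..255.
        return sum(a * b for a, b in zip(multipliers, identifier)) % n

    return h


h = make_byte_hash(4, 257)
u = h(bytes([192, 168, 0, 1]))  # bucket index for a 4-byte IP address
print(u)
```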

The SET Abstract Data Type
create(n): creates a new set structure, initially empty but capable of holding up to n elements.
empty(S): checks whether the set S is empty.
size(S): returns the number of elements in S.
element_of(x, S): checks whether the value x is in the set S.
enumerate(S): yields the elements of S in some arbitrary order.
add(S, x): adds the element x to S, if it is not there already.
delete(S, x): removes the element x from S, if it is there.

Implementing sets
Can use a hashtable:
–“create”, “empty”, and “size” are trivial
–“enumerate”: take all elements in all buckets
–“add” is just “insert”; “delete” is “delete”
–“element_of” is just “find”

DICTIONARY ADT Create, empty, size as in SET Still to do: –Insert(key, value) –Find(key) Sometimes called “store” and “fetch” A dictionary is sometimes called a “map” –“key” is ‘mapped to’ “value” Closely related to a “database” May allow several values for one key –Find(key) returns a list of values in this case

Implementing a dictionary
Create(n_1)
–Build an array of prime size n, a little more than n_1, each entry an empty list
–Pick k numbers a_1, …, a_k, mod n, to handle keys of length k

Insert(key, value)
–Let u = (a_1 key_1 + … + a_k key_k) mod n
–Insert (key, value) into array[u]
Find(key)
–Let u = (a_1 key_1 + … + a_k key_k) mod n
–Search for (key, *) in array[u]
–If you find (key, val), return val
–Else return None
(Modify as appropriate to return a list of vals)
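
The Create / Insert / Find steps above might look roughly like this in Python. The class name `ChainedDict` is hypothetical, keys are assumed to be `bytes` of a fixed length, and the caller is assumed to supply a suitable prime table size.

```python
import random


class ChainedDict:
    """Dictionary sketch: an array of buckets, each bucket a list of pairs."""

    def __init__(self, prime, key_len, seed=0):
        rng = random.Random(seed)
        self.n = prime                                        # table size
        self.a = [rng.randrange(prime) for _ in range(key_len)]  # multipliers
        self.table = [[] for _ in range(prime)]               # empty buckets

    def _index(self, key):
        # u = (a_1 key_1 + ... + a_k key_k) mod n
        return sum(a * x for a, x in zip(self.a, key)) % self.n

    def insert(self, key, value):
        self.table[self._index(key)].append((key, value))

    def find(self, key):
        for stored_key, value in self.table[self._index(key)]:
            if stored_key == key:
                return value
        return None


d = ChainedDict(prime=257, key_len=4)
d.insert(b"abcd", "first")
print(d.find(b"abcd"))  # first
print(d.find(b"zzzz"))  # None
```

Like the slide's Find, this returns the first matching value; returning every match instead would mean collecting the values in a list.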

Summary
We can now assume that we can create a SET or a DICT with expected O(1) insertion and lookup times whenever we need one
After this week’s HW, you can further assume that we don’t need to know the size of the SET or the DICT in advance

Example Application: JUMBLE!

JUMBLE Input: list of all 5-letter words in English Each word represented as an array of five characters Output: all words for which no other permutation is a word

Solution
Start with an empty dictionary D (allowing several values per key)
Foreach word w
–Sort letters alphabetically to get wnew
–D.insert(wnew, w)
Foreach word w
–Sort alphabetically again to get wnew
–If D.find(wnew) contains anything except w, skip w
–Else output w
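
A short Python sketch of the two passes. It uses the built-in `dict` (via `collections.defaultdict`) in place of the hand-built hash table, and the function name `unjumbled` is an illustrative choice.

```python
from collections import defaultdict


def unjumbled(words):
    """Return the words for which no other permutation is also in the list."""
    groups = defaultdict(list)
    # Pass 1: file every word under its alphabetically sorted letters.
    for w in words:
        groups["".join(sorted(w))].append(w)
    # Pass 2: keep w only if its group holds nothing except w itself.
    result = []
    for w in words:
        key = "".join(sorted(w))
        if all(other == w for other in groups[key]):
            result.append(w)
    return result


print(unjumbled(["angle", "glean", "house", "angel"]))  # ['house']
```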

Clean Your Code
Errors per line ~ constant
–Fewer lines, so fewer errors overall!
Easier to grade
–More likely to get credit
Cleaner code = cleaner thinking
–Better understanding of material

LCA(u, v)
  lca = null
  udepth = T.depth(u)
  vdepth = T.depth(v)
  if (T.isroot(u) = true) or (T.isroot(v) = true) then
    lca = T.root
  while (lca = null) do
    if (u = v) then
      lca = u
    else if udepth > vdepth then
      u = T.parent(u)
      udepth = udepth - 1
    else if vdepth > udepth
      v = T.parent(v)
      vdepth = vdepth - 1
    else
      u = T.parent(u)
      v = T.parent(v)
  return lca

LCA(u, v, T)
  lca = null
  udepth = T.depth(u)
  vdepth = T.depth(v)
  if (T.isroot(u) = true) or (T.isroot(v) = true) then
    lca = T.root
  while (lca = null) do
    if (u = v) then
      lca = u
    else if udepth > vdepth then
      u = T.parent(u)
      udepth = udepth - 1
    else if vdepth > udepth
      v = T.parent(v)
      vdepth = vdepth - 1
    else
      u = T.parent(u)
      v = T.parent(v)
  return lca
[Needlessly complex]

LCA(u, v, T)
  lca = null
  udepth = T.depth(u)
  vdepth = T.depth(v)
  if (T.isroot(u) = true) or (T.isroot(v) = true) then
    lca = T.root
  while (lca = null) do
    if (u = v) then
      lca = u
    else if T.depth(u) > T.depth(v) then
      u = T.parent(u)
    else if T.depth(v) > T.depth(u)
      v = T.parent(v)
    else
      u = T.parent(u)
      v = T.parent(v)
  return lca
[Now irrelevant]

LCA(u, v, T)
  lca = null
  if (T.isroot(u) = true) or (T.isroot(v) = true) then
    lca = T.root
  while (lca = null) do
    if (u = v) then
      lca = u
    else if T.depth(u) > T.depth(v) then
      u = T.parent(u)
    else if T.depth(v) > T.depth(u)
      v = T.parent(v)
    else
      u = T.parent(u)
      v = T.parent(v)
  return lca

LCA(u, v, T)
  lca = null
  if (T.isroot(u) = true) or (T.isroot(v) = true) then
    lca = T.root
  while (lca = null) do
    if (u = v) then
      lca = u
    else if T.depth(u) > T.depth(v) then
      u = T.parent(u)
    else if T.depth(v) > T.depth(u)
      v = T.parent(v)
    else
      u = T.parent(u)
      v = T.parent(v)
  return lca
[Redundant]

LCA(u, v, T)
  lca = null
  if T.isroot(u) or T.isroot(v) then
    lca = T.root
  while (lca = null) do
    if (u = v) then
      lca = u
    else if T.depth(u) > T.depth(v) then
      u = T.parent(u)
    else if T.depth(v) > T.depth(u)
      v = T.parent(v)
    else
      u = T.parent(u)
      v = T.parent(v)
  return lca

LCA(u, v, T)
  lca = null
  if T.isroot(u) or T.isroot(v) then
    lca = T.root
  while (lca = null) do
    if (u = v) then
      lca = u
    else if T.depth(u) > T.depth(v) then
      u = T.parent(u)
    else if T.depth(v) > T.depth(u)
      v = T.parent(v)
    else
      u = T.parent(u)
      v = T.parent(v)
  return lca
[It’s the answer; return it!]

LCA(u, v, T)
  lca = null
  if T.isroot(u) or T.isroot(v) then
    lca = T.root
    return lca
  while (lca = null) do
    if (u = v) then
      lca = u
    else if T.depth(u) > T.depth(v) then
      u = T.parent(u)
    else if T.depth(v) > T.depth(u)
      v = T.parent(v)
    else
      u = T.parent(u)
      v = T.parent(v)
  return lca

LCA(u, v, T)
  lca = null
  if T.isroot(u) or T.isroot(v) then
    lca = T.root
    return lca
  while (lca = null) do
    if (u = v) then
      lca = u
      return lca
    else if T.depth(u) > T.depth(v) then
      u = T.parent(u)
    else if T.depth(v) > T.depth(u)
      v = T.parent(v)
    else
      u = T.parent(u)
      v = T.parent(v)
  return lca
[Condition is irrelevant]

LCA(u, v, T)
  lca = null
  if T.isroot(u) or T.isroot(v) then
    lca = T.root
    return lca
  repeat
    if (u = v) then
      lca = u
      return lca
    else if T.depth(u) > T.depth(v) then
      u = T.parent(u)
    else if T.depth(v) > T.depth(u)
      v = T.parent(v)
    else
      u = T.parent(u)
      v = T.parent(v)
[lca is no longer used!]

LCA(u, v, T)
  if T.isroot(u) or T.isroot(v) then
    return T.root
  repeat
    if (u = v) then
      return u
    else if T.depth(u) > T.depth(v) then
      u = T.parent(u)
    else if T.depth(v) > T.depth(u)
      v = T.parent(v)
    else
      u = T.parent(u)
      v = T.parent(v)

LCA(u, v, T)
  while T.depth(u) > T.depth(v)
    u = T.parent(u)
  while T.depth(v) > T.depth(u)
    v = T.parent(v)
  if T.isroot(u) or T.isroot(v) then
    return T.root
  repeat
    if (u = v) then
      return u
    else
      u = T.parent(u)
      v = T.parent(v)

LCA(u, v, T)
  while T.depth(u) > T.depth(v)
    u = T.parent(u)
  while T.depth(v) > T.depth(u)
    v = T.parent(v)
  if T.isroot(u) or T.isroot(v) or (u = v) then
    return u
  repeat [OOPS!]
  else
    u = T.parent(u)
    v = T.parent(v)

LCA(u, v, T)
  while T.depth(u) > T.depth(v)
    u = T.parent(u)
  while T.depth(v) > T.depth(u)
    v = T.parent(v)
  if T.isroot(u) or T.isroot(v) or (u = v) then
    return u
  else
    return LCA(T.parent(u), T.parent(v), T)

LCA(u, v, T)
  while T.depth(u) > T.depth(v)
    u = T.parent(u)
  while T.depth(v) > T.depth(u)
    v = T.parent(v)
  if T.isroot(u) or (u = v) then
    return u
  else
    return LCA(T.parent(u), T.parent(v), T)
[Not needed]

LCA(u, v, T)
  while T.depth(u) > T.depth(v)
    u = T.parent(u)
  while T.depth(v) > T.depth(u)
    v = T.parent(v)
  if (u = v) then
    return u
  else
    return LCA(T.parent(u), T.parent(v), T)
[Called during recursion, but no effect]

LCA(u, v, T)
  while T.depth(u) > T.depth(v)
    u = T.parent(u)
  while T.depth(v) > T.depth(u)
    v = T.parent(v)
  return LCAsimple(u, v, T)
LCAsimple(u, v, T)  # LCA for the case where u and v have the same depth
  if (u = v)
    return u
  else
    return LCAsimple(T.parent(u), T.parent(v), T)
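
For reference, the cleaned-up two-function version can be sketched as runnable Python. The `Tree` class is a hypothetical stand-in for the course's tree type, built from a child-to-parent map, with only `parent` and `depth` provided. The helper is called on the levelled nodes themselves, so the case where one node is an ancestor of the other is handled by the first `u == v` test.

```python
class Tree:
    """Tiny tree stand-in: stores each node's parent (the root maps to None)."""

    def __init__(self, parent_of):
        self._parent = parent_of

    def parent(self, u):
        return self._parent[u]

    def depth(self, u):
        d = 0
        while self._parent[u] is not None:
            u = self._parent[u]
            d += 1
        return d


def lca(u, v, T):
    # Bring the deeper node up until both are at the same depth.
    while T.depth(u) > T.depth(v):
        u = T.parent(u)
    while T.depth(v) > T.depth(u):
        v = T.parent(v)
    return lca_simple(u, v, T)


def lca_simple(u, v, T):
    """LCA for the case where u and v have the same depth."""
    if u == v:
        return u
    return lca_simple(T.parent(u), T.parent(v), T)


T = Tree({"r": None, "a": "r", "b": "r", "c": "a", "d": "a", "e": "c"})
print(lca("e", "d", T))  # a
print(lca("e", "b", T))  # r
print(lca("e", "a", T))  # a
```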

STACK
Stack operations:
–push, pop, size, isEmpty
(Partial) implementation:
–Array-based stack

ArrayStack
INIT:
  data = array[20]
  count = 0  // next empty space
-------------------------------------------------------------
Push(obj o):
  if count < 20
    data[count] = o
    count++
  else
    ERROR("Overfull Stack")

ArrayStack
pop():
  if count == 0
    ERROR("Can't pop from empty Stack")
  else
    count--
    return data[count]  // after the decrement, count indexes the top element

ArrayStack
size():
  return count
isEmpty():
  return count == 0
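
Put together, the fixed-capacity stack might look like this in Python. It is a sketch: the slides' `ERROR(...)` calls are replaced by ordinary Python exceptions, which is a choice made here.

```python
class ArrayStack:
    CAPACITY = 20

    def __init__(self):
        self.data = [None] * self.CAPACITY
        self.count = 0  # index of the next empty space

    def push(self, o):
        if self.count >= self.CAPACITY:
            raise OverflowError("Overfull Stack")
        self.data[self.count] = o
        self.count += 1

    def pop(self):
        if self.count == 0:
            raise IndexError("Can't pop from empty Stack")
        self.count -= 1
        return self.data[self.count]  # the top element sits at the new count

    def size(self):
        return self.count

    def is_empty(self):
        return self.count == 0


s = ArrayStack()
for item in (1, 2, 3):
    s.push(item)
print(s.pop(), s.size())  # 3 2
```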

Analysis

ArrayStack
INIT:
  data = array[20]
  count = 0  // next empty space
-------------------------------------------------------------
Push(obj o):
  if count < 20
    data[count] = o
    count++
  else
    ERROR("Overfull Stack")
O(1)

ArrayStack
pop():
  if count == 0
    ERROR("Can't pop from empty Stack")
  else
    count--
    return data[count]
O(1)

ArrayStack
size():
  return count
isEmpty():
  return count == 0
O(1)

Summary Fast but not very useful

ExpandableArrayStack
INIT:
  data = array[20]
  count = 0  // next empty space
  capacity = 20

Push
Push(obj o):
  if count < capacity
    data[count] = o
    count++
  else
    d2 = new Array[capacity+1]
    for j = 0 to capacity-1
      d2[j] = data[j]
    capacity = capacity + 1
    data = d2
    push(o)

Expandable Array Stack All other operations remain the same

Analysis
In the worst case, the time taken by a single push is O(n)
If we insert items 21, 22, …, 20+k, we’ll have done k operations, with total work
21 + 22 + … + (20+k) = (20+1) + (20+2) + … + (20+k) = 20k + (1 + 2 + … + k) = 20k + k(k+1)/2 = O(k^2)
So the average time per operation is O(k) as well!
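
The sum above can be checked with a small simulation. The cost model is an assumption made for this sketch: one unit of work per element copied and one unit per element stored. The function name `grow_by_one_work` is illustrative.

```python
def grow_by_one_work(k, initial=20):
    """Total work for k pushes onto a full stack that grows by one slot each time."""
    capacity = initial
    work = 0
    for _ in range(k):
        work += capacity  # copy every existing element into the new array
        capacity += 1     # the new array has exactly one more slot
        work += 1         # store the pushed element
    return work


for k in (1, 2, 10, 100):
    print(k, grow_by_one_work(k), 20 * k + k * (k + 1) // 2)
```

Both columns agree for every k, matching the closed form 20k + k(k+1)/2.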

Better: avoid frequent expansion Instead of adding a little space, add a lot! Double array size when it gets full

DoublingArrayStack: Push
Push(obj o):
  if count < capacity
    data[count] = o
    count++
  else
    d2 = new Array[2*capacity]
    for j = 0 to capacity-1
      d2[j] = data[j]
    capacity = 2*capacity
    data = d2
    push(o)

Doubling Array Stack All other operations remain the same
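
A possible Python rendering of the doubling stack. It is a sketch: a Python list of fixed length stands in for the raw array, and the recursive `push(o)` from the pseudocode is replaced by falling through to the normal store.

```python
class DoublingArrayStack:
    def __init__(self, capacity=20):
        self.capacity = capacity
        self.data = [None] * capacity
        self.count = 0  # index of the next empty space

    def push(self, o):
        if self.count == self.capacity:
            # Full: allocate twice the space and copy the old elements over.
            d2 = [None] * (2 * self.capacity)
            for j in range(self.capacity):
                d2[j] = self.data[j]
            self.capacity *= 2
            self.data = d2
        self.data[self.count] = o
        self.count += 1

    def pop(self):
        if self.count == 0:
            raise IndexError("Can't pop from empty Stack")
        self.count -= 1
        return self.data[self.count]

    def size(self):
        return self.count

    def is_empty(self):
        return self.count == 0


s = DoublingArrayStack()
for i in range(100):
    s.push(i)
print(s.size(), s.capacity)  # 100 160
```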

Analysis
Push(obj o):
  if count < capacity      // this branch: O(1)
    data[count] = o
    count++
  else                     // this branch: O(n)
    d2 = new Array[2*capacity]
    for j = 0 to capacity-1
      d2[j] = data[j]
    capacity = 2*capacity
    data = d2
    push(o)

Analysis
In the worst case, the time taken by a single push is O(n)
But over the course of many operations, the average time per operation is O(1)

“Total Work Analysis”
If we have an array with n elements
…and do n operations
…then total work is no more than 4n.
Work per operation, on average, is at most 4.
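
A quick sanity check of this bound, under an assumed cost model of one unit per element copied and one unit per element stored; the slide does not say exactly what it counts, so the constant may differ. The function name `doubling_work` is illustrative.

```python
def doubling_work(n):
    """Work for n pushes onto a full array that already holds n elements."""
    capacity, count, work = n, n, 0
    for _ in range(n):
        if count == capacity:
            work += capacity  # copy every existing element
            capacity *= 2     # double the array
        work += 1             # store the pushed element
        count += 1
    return work


for n in (20, 40, 1000):
    print(n, doubling_work(n), 4 * n)
```

Under this model the total stays within the 4n budget for every n tried, so the average work per push is bounded by a constant.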

Alternative view
“Amortized” analysis:
–For each operation that takes one unit of time, place an extra unit of time “in the bank”
–By the time an expensive operation arrives, use your savings to pay for it
Alternative view:
–When you do an expensive operation, pay one unit now
–Then pay an extra unit for each of the next n operations

Language
For hashing: “the ‘find’ operation runs in expected O(1) time”
For doubling array stacks: “the ‘push’ operation runs in O(1) amortized time, with O(n) worst-case time.”

Pixel boundaries (if time)
