CSC 213 – Large Scale Programming
Final Exam
- Thurs., May 10 from 8:00 – 10:00 in OM 200
  - Plan on exam taking full 2 hours
  - If major problem, come talk to me ASAP
- Exam covers material from entire semester
  - Open-book & open-note so bring what you’ve got
  - My handouts, solutions, & computers are not allowed
  - Cannot collaborate with a neighbor on the exam
  - Problems will be in a similar style to 2 midterms
- Lab mastery: 12:30 – 1:30 on Wed., May 11 in OM119
Lazy
Contemplative
Always Using Imagination
Most Important Trait
Critical Property of Test
Loop Testing: Simple Loops
- Loop executed at most n times, try inputs that:
  - Skip loop entirely
  - Make 1 pass through the loop
  - Make 2 passes through the loop
  - Make m passes through the loop, where (m < n)
  - If possible, make (n - 1), n, & (n + 1) passes through the loop
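The checklist above can be turned into a small test driver. This is only a sketch: `sumAtMost`, `ones`, and the bounds n = 10 and m = 5 are hypothetical names and values picked for illustration, not anything from the course code.

```java
import java.util.Arrays;

class LoopTestingSketch {
    // Hypothetical method under test: sums at most `limit` leading elements.
    static int sumAtMost(int[] values, int limit) {
        int sum = 0;
        for (int i = 0; i < values.length && i < limit; i++) {
            sum += values[i];
        }
        return sum;
    }

    // Array of `length` ones, so the sum equals the number of loop passes.
    static int[] ones(int length) {
        int[] a = new int[length];
        Arrays.fill(a, 1);
        return a;
    }

    public static void main(String[] args) {
        int n = 10; // assumed loop bound
        int m = 5;  // assumed "typical" count with m < n
        // 0, 1, 2, m, n-1, n, and n+1 requested passes, as in the checklist.
        int[] passCounts = {0, 1, 2, m, n - 1, n, n + 1};
        for (int passes : passCounts) {
            int expected = Math.min(passes, n);
            int actual = sumAtMost(ones(passes), n);
            System.out.println(passes + " elements: sum " + actual
                    + (actual == expected ? " (ok)" : " (FAIL)"));
        }
    }
}
```

The n + 1 case is the interesting one: it checks that the loop really stops at its bound instead of running once more.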
Indexed File Format
- Split information into two (or more) files
  - Data file uses fixed-size records to store data
  - Index files contain search terms & location where record starts
- Fixed-size records usually used in data file
  - Each record will use exactly that much space
  - Extra space wasted if the value is smaller
  - But limits data size, cannot get more space
  - Makes it far easier to reuse space & rebuild index
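A minimal in-memory sketch of the idea, assuming a 16-byte record size and ASCII text values. A byte array stands in for the data file and a HashMap stands in for the index file; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class IndexedFileSketch {
    static final int RECORD_SIZE = 16;   // assumed fixed record size, in bytes

    private byte[] data = new byte[0];   // stands in for the data file
    private final Map<String, Integer> index = new HashMap<>(); // stands in for the index file

    // Appends one record; short values are zero-padded, long values are cut off.
    void add(String key, String value) {
        byte[] raw = value.getBytes(StandardCharsets.US_ASCII);
        int offset = data.length;
        data = Arrays.copyOf(data, offset + RECORD_SIZE); // new bytes start as zeros
        System.arraycopy(raw, 0, data, offset, Math.min(raw.length, RECORD_SIZE));
        index.put(key, offset);          // the index remembers where the record starts
    }

    // Looks up the record's start in the index, then reads exactly one record.
    String find(String key) {
        Integer offset = index.get(key);
        if (offset == null) {
            return null;
        }
        String padded = new String(data, offset, RECORD_SIZE, StandardCharsets.US_ASCII);
        int end = padded.indexOf('\0');
        return end < 0 ? padded : padded.substring(0, end);
    }

    public static void main(String[] args) {
        IndexedFileSketch file = new IndexedFileSketch();
        file.add("ann", "Ann Lee");
        file.add("bob", "Robert Montgomery III"); // longer than one record
        System.out.println(file.find("ann"));
        System.out.println(file.find("bob"));
    }
}
```

Both tradeoffs from the slide show up here: the short value wastes the rest of its record, and the long value is cut off at the record size.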
Entry ADT
- Needs 2 pieces: what we have & what we want
  - First part is the key: data used in search
  - Item we want is value; the second part of an Entry
- Implementations must define 2 methods
  - key() & value() return appropriate item
  - Usually includes setValue() but NOT setKey()
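One way the Entry ADT might look in Java. The method names key(), value(), and setValue() come from the slide; the generic parameters, the `SimpleEntry` class, and the choice to return the old value from setValue() are assumptions made for this sketch.

```java
interface Entry<K, V> {
    K key();     // what we have: the data used in the search
    V value();   // what we want: the item associated with the key
}

class SimpleEntry<K, V> implements Entry<K, V> {
    private final K key;  // final: there is deliberately no setKey()
    private V value;

    SimpleEntry(K key, V value) {
        this.key = key;
        this.value = value;
    }

    @Override
    public K key() {
        return key;
    }

    @Override
    public V value() {
        return value;
    }

    // Replaces the value and returns the old one (an assumed convention).
    V setValue(V newValue) {
        V old = value;
        value = newValue;
        return old;
    }
}

class EntryDemo {
    public static void main(String[] args) {
        SimpleEntry<String, Integer> e = new SimpleEntry<>("apple", 3);
        e.setValue(5);
        System.out.println(e.key() + " -> " + e.value());
    }
}
```

Keeping the key immutable matters because search structures place an Entry by its key; changing the key in place would leave the Entry in the wrong spot.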
What is a MAP?
- At simplest level, Map is collection of Entrys
  - key-value pairs serve as the basic data in a Map
  - size() & isEmpty() work at level of Entry
- Searchable data stored using Maps
  - put() adds an Entry so key is mapped to the value
  - get() retrieves value associated with key from Map
  - remove() deletes entire Entry
- At most one value per key using a Map
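The same operations can be tried with java.util.HashMap, used here only as a stand-in; the Map ADT from lecture may differ in details such as return values. The class name and sample data are made up.

```java
import java.util.HashMap;
import java.util.Map;

class MapSketch {
    static Map<String, Integer> demo() {
        Map<String, Integer> ages = new HashMap<>();
        ages.put("Ada", 36);   // key "Ada" is mapped to 36
        ages.put("Alan", 41);
        ages.put("Ada", 37);   // same key again: the old value is replaced
        ages.remove("Alan");   // deletes the entire Entry
        return ages;
    }

    public static void main(String[] args) {
        Map<String, Integer> ages = demo();
        System.out.println("size = " + ages.size());
        System.out.println("Ada -> " + ages.get("Ada"));
        System.out.println("isEmpty = " + ages.isEmpty());
    }
}
```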
Dictionary ADT
- DICTIONARY ADT very similar to MAP
  - Hold searchable data in each of these ADTs
  - Both data structures are collections of Entrys
  - Convert key to value using either concept
- DICTIONARY can have multiple values to one key
  - 1 value for key is still legal option
  - Also many Entrys with same key but different value
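A rough sketch of the difference from a Map: each key leads to a list of values rather than a single value. The names `insert`, `findAll`, and `DictionarySketch` are assumptions for illustration, not necessarily the method names used in class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DictionarySketch<K, V> {
    private final Map<K, List<V>> entries = new HashMap<>();
    private int size = 0;  // counts every key-value pair, not distinct keys

    // Adds a new pair; any existing pairs with the same key are kept.
    void insert(K key, V value) {
        entries.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        size++;
    }

    // All values stored under the key, in insertion order (empty if none).
    List<V> findAll(K key) {
        return entries.getOrDefault(key, Collections.emptyList());
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        DictionarySketch<String, String> courses = new DictionarySketch<>();
        courses.insert("csc", "213");
        courses.insert("csc", "212");  // second value for the same key
        courses.insert("mat", "191");
        System.out.println(courses.findAll("csc"));
        System.out.println(courses.size());
    }
}
```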
Using Hash Properly
Binary Search Trees
- Implements a BinaryTree for searching
  - Map or Dictionary will be ADT exposed
  - Data organized to make usage efficient (maybe)
- Strict ordering maintained in tree
  - Nodes to the left are smaller
  - Larger keys in right child of node
- Equal values not specified
  - No problem, just be consistent
BST Performance
- Search, insert, & remove take O(h) time
  - h is height of tree
- Height’s best case is complete tree at O(log n)
- O(n) height for linked list is BST’s worst case
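A minimal unbalanced BST over int keys shows how insertion order decides the height. This is a sketch, not the course implementation: the class name is hypothetical, equal keys are sent right (one consistent choice), and an empty tree is given height -1.

```java
class BstSketch {
    private static final class Node {
        final int key;
        Node left, right;

        Node(int key) {
            this.key = key;
        }
    }

    private Node root;

    void insert(int key) {
        root = insert(root, key);
    }

    private static Node insert(Node node, int key) {
        if (node == null) {
            return new Node(key);
        }
        if (key < node.key) {
            node.left = insert(node.left, key);    // smaller keys go left
        } else {
            node.right = insert(node.right, key);  // larger (and equal) keys go right
        }
        return node;
    }

    // Follows one root-to-leaf path, so the cost is O(h).
    boolean contains(int key) {
        Node node = root;
        while (node != null) {
            if (key == node.key) {
                return true;
            }
            node = key < node.key ? node.left : node.right;
        }
        return false;
    }

    // Height counted in edges; the empty tree has height -1.
    int height() {
        return height(root);
    }

    private static int height(Node node) {
        return node == null ? -1 : 1 + Math.max(height(node.left), height(node.right));
    }

    public static void main(String[] args) {
        BstSketch balanced = new BstSketch();
        for (int k : new int[] {4, 2, 6, 1, 3, 5, 7}) {
            balanced.insert(k);
        }
        BstSketch chain = new BstSketch();
        for (int k = 1; k <= 7; k++) {
            chain.insert(k);   // sorted input degenerates into a linked list
        }
        System.out.println("balanced height: " + balanced.height());
        System.out.println("chain height:    " + chain.height());
    }
}
```

Both trees hold the same seven keys, but the first insertion order builds a complete tree and the sorted order builds a chain.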
AVL Tree Definition
- Fancy type of BST
  - O(log n) time provided
  - For this, needs more info
AVL Tree Definition Node heights are shown in blue
Concept Behind SplayTree
- Splay trees do NOT maintain balance
  - Recently used nodes clustered near top of BST
  - Most recently accessed nodes take O(1) time
  - Other nodes may need O(n) time to access
- Usually very efficient, but provides no guarantees
Red-Black Tree
- Root Property: Root node painted black
- External Property: Leaves are painted black
- Internal Property: Red nodes’ children are black
- Depth Property: Leaves have identical black depth
  - Black depth: number of black ancestors for the node
Map & Dictionary ADT

Implementation      Searching   Adding      Removing
Ordered List        O(log n)    O(n)        O(n)
Unordered List      O(n)        O(n)/O(1)   O(n)
Hash                O(n)        O(n)        O(n)
  if lucky/good     O(1)        O(1)        O(1)
BST                 O(n)        O(n)        O(n)
AVL / balanced      O(log n)    O(log n)    O(log n)
Splay (expected)    O(log n)    O(log n)    O(log n)
Splay (worst-case)  O(n)        O(n)        O(n)
Using Sets
- Amorphous, Iterable collection of elements
  - Elements unordered in concept, but only once per Set
- Easiest approach relies on ordered Sequence
  - Faster possible, but much more complex & limits use
  - This still fast: hardest operations need O(n) time
Set Operations
- Set defines 3 operations involving other Sets
  - union(s) adds elements in s to this
  - Elements in s removed from this by subtract(s)
  - intersect(s) preserves elements in s also in this
- Comparing Sets’ elements needed for each
  - Operations differ only in how to act after comparison
  - When already ordered, process much easier
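With both Sets kept as sorted, duplicate-free lists, all three operations become the same two-finger walk and differ only in what happens after each comparison. This sketch returns new lists instead of modifying `this` as the slide's methods do; that simplification and the class name are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SortedSetOps {
    // Every element of a or b, once each.
    static List<Integer> union(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int cmp = a.get(i).compareTo(b.get(j));
            if (cmp < 0) {
                out.add(a.get(i++));
            } else if (cmp > 0) {
                out.add(b.get(j++));
            } else {               // in both: keep one copy
                out.add(a.get(i++));
                j++;
            }
        }
        while (i < a.size()) out.add(a.get(i++));
        while (j < b.size()) out.add(b.get(j++));
        return out;
    }

    // Only the elements found in both a and b.
    static List<Integer> intersect(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int cmp = a.get(i).compareTo(b.get(j));
            if (cmp < 0) {
                i++;
            } else if (cmp > 0) {
                j++;
            } else {
                out.add(a.get(i++));
                j++;
            }
        }
        return out;
    }

    // The elements of a that do not appear in b.
    static List<Integer> subtract(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int cmp = a.get(i).compareTo(b.get(j));
            if (cmp < 0) {
                out.add(a.get(i++));
            } else if (cmp > 0) {
                j++;
            } else {               // in both: drop it
                i++;
                j++;
            }
        }
        while (i < a.size()) out.add(a.get(i++));
        return out;
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, 3, 5, 7);
        List<Integer> b = Arrays.asList(3, 4, 7, 9);
        System.out.println(union(a, b));
        System.out.println(intersect(a, b));
        System.out.println(subtract(a, b));
    }
}
```

Each index only moves forward, so every operation finishes in time proportional to the combined size of the two lists.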
Merge Sort Execution Tree
- Show steps used to sort all of the data
[Figure: merge sort execution tree]
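For tracing an execution tree by hand, a compact merge sort sketch helps. The sample values in main are arbitrary and are not meant to reproduce the slide's figure; the class name is hypothetical.

```java
import java.util.Arrays;

class MergeSortSketch {
    static int[] sort(int[] data) {
        if (data.length <= 1) {
            return data;                      // base case: already sorted
        }
        int mid = data.length / 2;            // divide blindly in half
        int[] left = sort(Arrays.copyOfRange(data, 0, mid));
        int[] right = sort(Arrays.copyOfRange(data, mid, data.length));
        return merge(left, right);            // all comparisons happen here
    }

    // Combines two sorted arrays; taking from the left on ties keeps the sort stable.
    static int[] merge(int[] left, int[] right) {
        int[] out = new int[left.length + right.length];
        int i = 0, j = 0, k = 0;
        while (i < left.length && j < right.length) {
            out[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
        }
        while (i < left.length) out[k++] = left[i++];
        while (j < right.length) out[k++] = right[j++];
        return out;
    }

    public static void main(String[] args) {
        int[] sample = {7, 2, 9, 4, 3, 8, 6, 1};
        System.out.println(Arrays.toString(sort(sample)));
    }
}
```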
Quick Sort
- Divide: Partition by pivot
  - L has values <= p
  - G uses values >= p
- Recur: Sort L and G
- Conquer: Merge L, p, G
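The three steps map directly onto a list-based sketch. Two choices here are assumptions: the pivot is simply the last element, and values equal to the pivot all go into L. Building new lists also uses extra space, which an in-place array version avoids.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class QuickSortSketch {
    static List<Integer> sort(List<Integer> data) {
        if (data.size() <= 1) {
            return new ArrayList<>(data);
        }
        int p = data.get(data.size() - 1);     // pivot: last element (assumed choice)
        List<Integer> l = new ArrayList<>();   // values <= p
        List<Integer> g = new ArrayList<>();   // values > p
        // Divide: every comparison happens while partitioning.
        for (int i = 0; i < data.size() - 1; i++) {
            int v = data.get(i);
            if (v <= p) {
                l.add(v);
            } else {
                g.add(v);
            }
        }
        // Recur on L and G, then conquer by concatenating L, p, G.
        List<Integer> out = sort(l);
        out.add(p);
        out.addAll(sort(g));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sort(Arrays.asList(7, 2, 9, 4, 3, 7, 6, 1)));
    }
}
```

Because the pivot is left out of both L and G, each recursive call works on a strictly smaller list, so the recursion always ends.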
Quick Sort v. Merge Sort

Quick Sort
- Divide data around pivot
  - Want pivot to be near middle
  - All comparisons occur here
- Conquer with recursion
  - Does not need extra space
- Merge usually done already
  - Data already sorted!

Merge Sort
- Divide data blindly in half
  - Always gets even split
  - No comparisons performed!
- Conquer with recursion
  - Needs* to use other arrays
- Merge combines solutions
  - Compares from (sorted) halves
Bucket & Radix Sort
- Sort data written as tuple of enumerable data
  - Consumption of wine overall, in liters
  - Annual per capita consumption of liters
- Sort one place in tuple using bucket sort
  - Uses 1 bucket per value that could be enumerated
  - When there are ties, preserve relative ordering
- Repeat stable sorts to perform radix sort
  - Must preserve relative ordering, like bucket sort
  - From least to most important sort each tuple place
Radix-Sort In Action
- List of 4-bit integers sorted using RADIX-SORT
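A sketch of radix sort on 4-bit integers, treating each value as a tuple of four bits. It assumes every input lies in 0..15; the class name, method name, and sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RadixSortSketch {
    // Sorts values in the range 0..15 with one stable bucket pass per bit.
    static int[] sort4Bit(int[] data) {
        int[] current = data.clone();
        for (int bit = 0; bit < 4; bit++) {        // least to most important place
            List<Integer> zeros = new ArrayList<>();   // bucket for bit value 0
            List<Integer> ones = new ArrayList<>();    // bucket for bit value 1
            for (int v : current) {
                if (((v >> bit) & 1) == 0) {
                    zeros.add(v);
                } else {
                    ones.add(v);
                }
            }
            // Emptying the buckets in order keeps ties in their previous order.
            int k = 0;
            for (int v : zeros) current[k++] = v;
            for (int v : ones) current[k++] = v;
        }
        return current;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sort4Bit(new int[] {9, 1, 13, 2, 14})));
    }
}
```

No two elements are ever compared with each other. The work is one pass over the data per bit, which is why the decision-tree bound for comparison sorts does not apply.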
Lower Bound on Sorting
- Smallest number of comparisons is tree’s height
  - Decision tree sorting n elements has n! leaves
  - At least log(n!) height needed for this many leaves
  - As we saw, log(n!) grows like n log n, so height is at least on the order of n log n
- O(n log n) time needed to compare data!
  - Practical lower bound, but cheating can do better
  - Need enumerable tuples - cannot always cheat
  - “If you believe radix hypothesis” it takes O(n) time
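One standard way to see why log(n!) grows like n log n, offered as a reminder and not necessarily the argument used in lecture: keep only the largest half of the terms for the lower bound, and replace every term by log n for the upper bound.

```latex
\log(n!) \;=\; \sum_{i=1}^{n} \log i
  \;\ge\; \sum_{i=\lceil n/2 \rceil}^{n} \log i
  \;\ge\; \frac{n}{2}\,\log\frac{n}{2},
\qquad
\log(n!) \;=\; \sum_{i=1}^{n} \log i \;\le\; n \log n .
```

Together the two bounds give log(n!) = Θ(n log n), so any sort that works only by comparing elements needs on the order of n log n comparisons in the worst case.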
Graph Applications
- Electronic circuits
- Transportation networks
- Databases
- Packing suitcases
- Finding terrorists
- Scheduling college’s exams
- Assigning classes to rooms
- Garbage collection
- Coloring countries on a map
- Playing minesweeper
Edge List Structure
- Simplest Graph
  - Space efficient
  - No change to use with directed or undirected
- Fields
  - Sequence of vertices
  - Sequence of edges
[Figure: edge list with vertices u, v, w, z and edges a, b, c, d]
Adjacency-List Implementation
- Vertex has Sequence of Edges
  - Edges still refer to Vertex
- Ideas in Edge-List serve as base
  - Extends Vertex
  - Add Position reference to speed removal
[Figure: adjacency list with vertices u, v, w and edges a, b]
Adjacency Matrix Structure
- Undirected edges stored in both array locations
- Directed edges only in array from source to target
[Figure: adjacency matrix indexed 0-2 for vertices u, v, w with edges a, b]
Asymptotic Performance
(n vertices & m edges, no self-loops)

Operation            Edge-List   Adjacency-List         Adjacency-Matrix
Space                n + m       n + m                  n^2
incidentEdges(v)     m           deg(v)                 n + deg(v)
areAdjacent(v,w)     m           min(deg(v), deg(w))    1
insertVertex(o)      1           1                      n^2
insertEdge(v,w,o)    1           1                      1
removeVertex(v)      m           deg(v)                 n^2
removeEdge(e)        1           1                      1
Graphs Solve Many Problems…
- Understand how it works & what it does:
  - DFS finds connected components in tree form
  - Vertices connected in minimal hops using BFS
  - Dijkstra’s minimizes weight to each vertex
  - Total edge weight minimized with Prim-Jarnik
  - Topological sort schedules vertices (when possible)
  - Can compute reachability with Floyd-Warshall
- Given problem, which algorithm would solve it?
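As one concrete case from the list, here is a sketch of BFS computing minimal hop counts. The graph representation is an assumption made for brevity: vertices are numbered 0..n-1 and stored as plain adjacency lists, not the Vertex and Edge objects used in lecture.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

class BfsSketch {
    // Returns the fewest edges from source to each vertex, or -1 if unreachable.
    static int[] hops(List<List<Integer>> adjacency, int source) {
        int[] dist = new int[adjacency.size()];
        Arrays.fill(dist, -1);                 // -1 marks "not yet visited"
        Deque<Integer> queue = new ArrayDeque<>();
        dist[source] = 0;
        queue.add(source);
        while (!queue.isEmpty()) {
            int v = queue.remove();            // vertices leave in order of distance
            for (int w : adjacency.get(v)) {
                if (dist[w] == -1) {           // first visit uses the fewest hops
                    dist[w] = dist[v] + 1;
                    queue.add(w);
                }
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        // Undirected edges 0-1, 1-2, 0-3; vertex 4 is isolated.
        List<List<Integer>> graph = Arrays.asList(
                Arrays.asList(1, 3),
                Arrays.asList(0, 2),
                Arrays.asList(1),
                Arrays.asList(0),
                Collections.<Integer>emptyList());
        System.out.println(Arrays.toString(hops(graph, 0)));
    }
}
```

The queue is what guarantees minimal hops: every vertex at distance d is processed before any vertex at distance d + 1.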
Cost of Accessing Memory
- How long memory access takes is also important
  - Will make a major difference in time program takes
  - Easy memory aid to remember how this works:
Multi-Way Search Tree
- Nodes contain multiple elements
  - Tree grows up with leaves always at same level
- Each internal node:
  - At least 2 children
  - Has 1 fewer Entry than children
  - Entrys sorted from smallest to largest
Hints for Studying
- Will NOT require memorizing:
  - ADT’s methods
  - Node implementations
  - Big-Oh time proofs
  - (Anything else you think of)
Hints for Studying
Studying For the Exam
1. What does the ADT/implementation do?
   - Where in the real-world is this found?
2. How is the ADT, search tree, or sort used?
   - What would we apply it to solve a problem?
   - How is it used and why?
3. What is necessary for implementation?
   - Given implementation, why do we do it like that?
   - What tradeoffs does this implementation make?
“Subtle” Hint