CSE 326: Data Structures Lecture #23: Randomized Data Structures


CSE 326: Data Structures Lecture #23: Randomized Data Structures
Henry Kautz, Winter Quarter 2002
I'm going to blow through quite a few "advanced" data structures today. What should you take away from them? You should know what ADT each one implements, you should have a general idea of why you might use it, and, given a description of how one works, you should be able to reason intelligently about it.

Pick a Card
Warning! The Queen of Spades is a very unlucky card!

Randomized Data Structures
We've seen many data structures with good average case performance on random inputs, but bad behavior on particular inputs
- e.g., binary search trees
Instead of randomizing the input (since we cannot!), consider randomizing the data structure
- No bad inputs, just unlucky random numbers
- Expected case: good behavior on any input

What's the Difference?
Deterministic with good average time
- If your application happens to always hit the "bad" case, you are in big trouble!
Randomized with good expected time
- Once in a while you will have an expensive operation, but no input can make this happen all the time
Kind of like an insurance policy for your algorithm!

Treap Dictionary Data Structure
Treaps have the binary search tree property on keys
Treaps also have the heap-order property on randomly assigned priorities!
[tree diagram: each node is labeled priority over key; heap order shown in yellow, search-tree order in blue]
Now, let's look at a really funky tree. It combines a tree and a heap and gets good expected runtime.

Treap Insert
- Choose a random priority
- Insert as in a normal BST
- Rotate up until heap order is restored (maintaining the BST property while rotating)
[diagram: insert(15), then rotations carry the new node up until its priority fits the heap order]
This is pretty simple as long as rotate is already written.

Tree + Heap… Why Bother?
Insert data in sorted order into a treap; what shape tree comes out?
[diagram: insert(7), insert(8), insert(9), insert(12) still produce a bushy tree; each node labeled priority over key]
Notice that it doesn't matter what order the input comes in. The shape of the tree is fully specified by what the keys are and what their random priorities are! So, there are no bad inputs, only bad random numbers. That's the difference between average time and expected time. Which one is better?

Treap Delete
- Find the key
- Increase its priority to ∞
- Rotate it down to the fringe (rotating the smaller-priority child up at each step)
- Snip it off
[diagram: delete(9) via a sequence of left and right rotations]
Sorry about how packed this slide is. Basically, rotate the node down to the fringe and then cut it off. This is pretty simple as long as you have rotate, as well. However, you do need to find the smaller child as you go!

Treap Delete, cont.
[diagram: two more right rotations push the ∞-priority node to the fringe, then snip!]
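The insert and delete procedures above can be sketched in Python. This is a minimal sketch under the slide's description, not the course's reference implementation; the names (`Node`, `rotate_left`, etc.) are mine, and a real Dictionary treap would also carry values alongside keys.

```python
import random

class Node:
    def __init__(self, key):
        self.key = key
        self.priority = random.random()   # smaller priority = closer to root
        self.left = self.right = None

def rotate_right(p):        # lift the left child
    l = p.left
    p.left, l.right = l.right, p
    return l

def rotate_left(p):         # lift the right child
    r = p.right
    p.right, r.left = r.left, p
    return r

def insert(root, key):
    # Ordinary BST insert, then rotate up while heap order is violated.
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
        if root.left.priority < root.priority:
            root = rotate_right(root)
    else:
        root.right = insert(root.right, key)
        if root.right.priority < root.priority:
            root = rotate_left(root)
    return root

def delete(root, key):
    # Find the key, rotate it toward the fringe (always lifting the
    # smaller-priority child, preserving heap order), then snip it off.
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    elif root.left is None:
        return root.right                 # snip!
    elif root.right is None:
        return root.left                  # snip!
    elif root.left.priority < root.right.priority:
        root = rotate_right(root)
        root.right = delete(root.right, key)
    else:
        root = rotate_left(root)
        root.left = delete(root.left, key)
    return root
```

Rather than literally setting the priority to ∞, this version rotates the doomed node down directly; the effect is the same.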

Treap Summary
Implements the Dictionary ADT:
- insert in expected O(log n) time
- delete in expected O(log n) time
- find in expected O(log n) time, but worst case O(n)
Memory use: O(1) per node, about the cost of AVL trees
Very simple to implement, little overhead – less than AVL trees
The big advantage of this is that it's simple compared to AVL or even splay trees. There's no zig-zig vs. zig-zag vs. zig. Unfortunately, it doesn't give worst case or even amortized O(log n) performance. It gives expected O(log n) performance.

Other Randomized Data Structures & Algorithms
Randomized skip list
- a cross between a linked list and a binary search tree
- O(log n) expected time for finds, and then you can simply follow links to do range queries
Randomized QuickSort
- just choose the pivot position randomly
- expected O(n log n) time for any input
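Randomized QuickSort fits in a few lines. A sketch (out-of-place for clarity; a practical version would partition in place):

```python
import random

def randomized_quicksort(a):
    # Picking the pivot uniformly at random makes the expected
    # O(n log n) bound hold for every input, not just random ones.
    if len(a) <= 1:
        return a
    pivot = a[random.randrange(len(a))]
    less    = [x for x in a if x < pivot]
    equal   = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    return randomized_quicksort(less) + equal + randomized_quicksort(greater)
```

As with treaps, a sorted input is no longer a bad input; only an unlucky run of random pivots can slow it down.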

Randomized Primality Testing
No known polynomial time algorithm for primality testing
- but it does not appear to be NP-complete either – in between?
Best known algorithm:
- Guess a random number 0 < A < N
- If (A^(N-1) mod N) ≠ 1, then N is not prime
- Otherwise, 75% chance N is prime or is a "Carmichael number" – a slightly more complex test rules out this case
- Repeat to increase confidence in the answer
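The test on this slide (the Fermat test) is easy to sketch; this simple version does not include the extra check that rules out Carmichael numbers:

```python
import random

def probably_prime(n, trials=20):
    # Pick random 0 < a < n and check whether a^(n-1) mod n == 1.
    # Each failing witness proves n composite; each passing round
    # increases confidence that n is prime (or a Carmichael number).
    if n < 2:
        return False
    if n < 4:
        return True            # 2 and 3 are prime
    for _ in range(trials):
        a = random.randrange(1, n)
        if pow(a, n - 1, n) != 1:
            return False       # a is a witness: n is definitely composite
    return True
```

Python's three-argument `pow` does the modular exponentiation efficiently, so each round is fast even for very large N.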

Randomized Search Algorithms
Finding a goal node in very, very large graphs using DFS, BFS, and even A* (using known heuristic functions) is often too slow
Alternative: a random walk through the graph

N-Queens Problem
Place N queens on an N by N chessboard so that no two queens can attack each other
Graph search formulation:
- Each way of placing from 0 to N queens on the chessboard is a vertex
- Edge between vertices that differ by adding or removing one queen
- Start vertex: empty board
- Goal vertex: any one with N non-attacking queens (there are many such goals)

Demo: N-Queens
DFS (over vertices where no queens attack each other)
versus
Random walk (biased to prefer moving to vertices with fewer attacks between queens)

Random Walk – Complexity?
Random walk – also known as an "absorbing Markov chain", "simulated annealing", the "Metropolis algorithm" (Metropolis 1958)
- Can often prove that if you run long enough you will reach a goal state – but it may take exponential time
- In some cases can prove that with high probability a goal is reached in polynomial time, e.g., 2-SAT (Papadimitriou 1997)
- Widely used for real-world problems where the actual complexity is unknown – scheduling, optimization
- N-Queens – probably polynomial, but no one has tried to prove a formal bound
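A biased random walk for N-Queens, in the spirit of the demo, might look like this min-conflicts-style sketch (my own formulation, not the demo's code): repeatedly pick a conflicted queen at random and move it to a row that minimizes attacks, breaking ties randomly.

```python
import random

def attacks(board):
    # board[c] = row of the queen in column c; count attacking pairs
    n = len(board)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if board[i] == board[j] or abs(board[i] - board[j]) == j - i)

def conflicted(board, c):
    # is the queen in column c attacked by any other queen?
    return any(board[c] == board[j] or abs(board[c] - board[j]) == abs(c - j)
               for j in range(len(board)) if j != c)

def random_walk_queens(n, max_steps=100_000):
    board = [random.randrange(n) for _ in range(n)]
    for _ in range(max_steps):
        cols = [c for c in range(n) if conflicted(board, c)]
        if not cols:
            return board                       # goal: no attacking pairs
        c = random.choice(cols)                # random conflicted queen
        # bias: move it to a row with the fewest attacks (random tie-break)
        scores = [attacks(board[:c] + [r] + board[c + 1:]) for r in range(n)]
        best = min(scores)
        board[c] = random.choice([r for r in range(n) if scores[r] == best])
    return None                                # gave up
```

As the slide notes, there is no guarantee of polynomial time here, but in practice walks like this find N-Queens solutions very quickly.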

Traveling Salesman
Recall the Traveling Salesperson (TSP) Problem: Given a fully connected, weighted graph G = (V,E), is there a cycle that visits all vertices exactly once and has total cost ≤ K?
- NP-complete: reduction from Hamiltonian circuit
- Occurs in many real-world transportation and design problems
Randomized simulated annealing algorithm demo

Final Review (“We’ve covered way too much in this course… What do I really need to know?”)

Be Sure to Bring
- 1 page of notes
- A hand calculator
- Several #2 pencils

Final Review: What you need to know
Basic Math
- Logs, exponents, summation of series
- Proof by induction
Asymptotic Analysis
- Big-O, Theta and Omega
- Know the definitions and how to show f(N) is big-O/Theta/Omega of g(N)
- How to estimate the running time of code fragments, e.g., nested "for" loops
Recurrence Relations
- Deriving the recurrence relation for the run time of a recursive function
- Solving recurrence relations by expansion to get the run time
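As a worked example of solving by expansion: for T(1) = 1 and T(n) = 2T(n/2) + n, expanding k levels gives T(n) = 2^k T(n/2^k) + kn, so taking k = log2 n yields T(n) = n log2 n + n. A quick numeric check of that closed form (the function name is mine, just for illustration):

```python
def T(n):
    # T(1) = 1; T(n) = 2 T(n/2) + n  (the mergesort recurrence)
    return 1 if n == 1 else 2 * T(n // 2) + n

# closed form from expansion, for n = 2^k: T(n) = n*k + n
for k in range(11):
    n = 2 ** k
    assert T(n) == n * k + n
```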

Final Review: What you need to know
Lists, Stacks, Queues
- Brush up on ADT operations – Insert/Delete, Push/Pop, etc.
- Array versus pointer implementations of each data structure
- Amortized complexity of stretchy arrays
Trees
- Definitions/terminology: root, parent, child, height, depth, etc.
- Relationship between depth and size of tree: depth can be between O(log N) and O(N) for N nodes
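The stretchy-array amortized bound is worth seeing concretely. A sketch (the class name is mine): doubling capacity on overflow means N appends copy at most 1 + 2 + 4 + … + N < 2N elements in total, so each append is O(1) amortized even though a single append can cost O(N).

```python
class StretchyArray:
    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.data = [None]

    def append(self, x):
        if self.size == self.capacity:
            # expensive but rare: double the capacity and copy everything
            self.capacity *= 2
            grown = [None] * self.capacity
            grown[:self.size] = self.data
            self.data = grown
        self.data[self.size] = x
        self.size += 1

    def get(self, i):
        assert 0 <= i < self.size
        return self.data[i]
```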

Final Review: What you need to know
Binary Search Trees
- How to do Find, Insert, Delete
- Bad worst case performance – could take up to O(N) time
AVL trees
- Balance factor is +1, 0, or -1
- Know single and double rotations to keep the tree balanced
- All operations are O(log N) worst case time
Splay trees – good amortized performance
- A single operation may take O(N) time, but in a sequence of operations the average time per operation is O(log N)
- Every Find, Insert, Delete causes the accessed node to be moved to the root
- Know how to zig-zig, zig-zag, etc. to "bubble" a node to the top
B-trees: Know the basic idea behind Insert/Delete
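The AVL balance-factor invariant can be stated directly as a checker (a sketch; the `Node` shape is my own, and this checks balance only, not the BST key ordering):

```python
from collections import namedtuple

Node = namedtuple("Node", "key left right")

def height(n):
    # height of the empty tree is -1, of a leaf is 0
    return -1 if n is None else 1 + max(height(n.left), height(n.right))

def is_avl(n):
    # every node's balance factor (left height - right height)
    # must be +1, 0, or -1
    if n is None:
        return True
    if abs(height(n.left) - height(n.right)) > 1:
        return False
    return is_avl(n.left) and is_avl(n.right)
```

Single and double rotations exist precisely to restore this invariant after an insert or delete pushes some balance factor to ±2.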

Final Review: What you need to know
Priority Queues
- Binary Heaps: Insert/DeleteMin, percolate up/down
- Array implementation
- BuildHeap takes only O(N) time (used in heapsort)
- Binomial Queues: forest of binomial trees with heap order
- Merge is fast – O(log N) time; Insert and DeleteMin are based on Merge
Hashing
- Hash functions based on the mod function
- Collision resolution strategies: chaining, linear and quadratic probing, double hashing
- Load factor of a hash table
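The O(N) BuildHeap bullet can be made concrete: percolate down from the last internal node to the root, and the per-level work forms a geometric series summing to O(N). A min-heap sketch over a plain Python list:

```python
def percolate_down(heap, i):
    # push heap[i] down until min-heap order is restored
    n = len(heap)
    while 2 * i + 1 < n:
        child = 2 * i + 1
        if child + 1 < n and heap[child + 1] < heap[child]:
            child += 1                      # smaller of the two children
        if heap[i] <= heap[child]:
            break
        heap[i], heap[child] = heap[child], heap[i]
        i = child

def build_heap(a):
    # leaves are already heaps; fix internal nodes bottom-up, O(N) total
    for i in range(len(a) // 2 - 1, -1, -1):
        percolate_down(a, i)

def delete_min(heap):
    heap[0], heap[-1] = heap[-1], heap[0]
    smallest = heap.pop()
    if heap:
        percolate_down(heap, 0)
    return smallest
```

BuildHeap followed by repeated DeleteMins is exactly heapsort (with a max-heap, sorting in place in ascending order).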

Final Review: What you need to know
Sorting Algorithms: know run times and how they work
- Elementary sorting algorithms and their run times, e.g., selection sort
- Heapsort – based on binary heaps (max-heaps): BuildHeap and repeated DeleteMaxes
- Mergesort – recursive divide-and-conquer, uses an extra array
- Quicksort – recursive divide-and-conquer, Partition done in place; fastest in practice, but O(N^2) worst case time
- Pivot selection – median-of-three works best
- Know which of these are stable and in-place
- Lower bound on sorting, bucket sort, and radix sort

Final Review: What you need to know
Disjoint Sets and Union-Find
- Up-trees and their array-based implementation
- Know how union-by-size and path compression work
- No need to know the run time analysis – just know the result: a sequence of M operations with union-by-size and path compression takes Θ(M α(M,N)) time – just a little more than Θ(1) amortized time per op
Graph Algorithms
- Adjacency matrix versus adjacency list representation of graphs
- Know how to topologically sort in O(|V| + |E|) time using a queue
- Breadth First Search (BFS) for unweighted shortest path
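Both union-find optimizations fit in a short sketch of the array-based up-tree representation (the class name is mine):

```python
class DisjointSets:
    # parent[i] >= 0 is a parent index; parent[i] < 0 means i is a
    # root and -parent[i] is the size of its set
    def __init__(self, n):
        self.parent = [-1] * n

    def find(self, x):
        root = x
        while self.parent[root] >= 0:
            root = self.parent[root]
        # path compression: point every node on the path at the root
        while self.parent[x] >= 0:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # union-by-size: attach the smaller tree under the larger
        if self.parent[ra] > self.parent[rb]:   # ra's set is smaller
            ra, rb = rb, ra
        self.parent[ra] += self.parent[rb]
        self.parent[rb] = ra
```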

Final Review: What you need to know
Graph Algorithms (cont.)
- Dijkstra's shortest path algorithm
- Depth First Search (DFS) and iterated DFS; use of memory compared to BFS
- A* – relation of g(n) and h(n)
- Minimum spanning trees – Kruskal's algorithm
- Connected components using DFS or union/find
NP-completeness
- Euler versus Hamiltonian circuits
- Definition of P, NP, NP-complete
- How one problem can be "reduced" to another (e.g., input to HC can be transformed into input for TSP)
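Dijkstra's algorithm ties together two review topics, shortest paths and priority queues. A sketch using a binary heap, with the common "lazy deletion" trick of skipping stale queue entries instead of doing DecreaseKey:

```python
import heapq

def dijkstra(graph, source):
    # graph: {u: [(v, weight), ...]} adjacency lists, non-negative weights.
    # Returns a dict of shortest distances from source.
    dist = {source: 0}
    pq = [(0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue                      # stale queue entry, skip it
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist
```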

Final Review: What you need to know
Multidimensional Search Trees
- k-d trees – find and range queries; depth logarithmic in the number of nodes
- Quad trees – find and range queries; depth logarithmic in the inverse of the minimal distance between nodes
- But a higher branching factor means shorter depth if points are well spread out (log base 4 instead of log base 2)
Randomized Algorithms
- expected time vs. average time vs. amortized time
- Treaps, randomized Quicksort, primality testing
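The k-d tree find is a BST find that cycles through the dimensions by depth. A minimal sketch (my own representation, nodes as `[point, left, right]` lists; range queries would follow the same splitting logic):

```python
def kd_insert(node, point, depth=0):
    # split on dimension (depth mod k), like a BST on one coordinate
    if node is None:
        return [point, None, None]
    d = depth % len(point)
    side = 1 if point[d] < node[0][d] else 2   # 1 = left, 2 = right
    node[side] = kd_insert(node[side], point, depth + 1)
    return node

def kd_find(node, point, depth=0):
    if node is None:
        return False
    if node[0] == point:
        return True
    d = depth % len(point)
    side = 1 if point[d] < node[0][d] else 2
    return kd_find(node[side], point, depth + 1)
```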