CSC317 — Selection problem: Randomized-Select(A, p, r, i)


Selection problem

Randomized-Select(A, p, r, i)
 1  if p == r                   // base case
 2      return A[p]
 3  q = Randomized-Partition(A, p, r)
 4  k = q - p + 1               // number of elements from p up to and including the pivot
 5  if i == k                   // pivot is the ith smallest
 6      return A[q]
 7  elseif i < k
 8      return Randomized-Select(A, p, q-1, i)      // ith smallest is on the left
 9  else
10      return Randomized-Select(A, q+1, r, i-k)    // ith smallest is on the right

After partitioning, everything in A[p..q-1] is < the pivot A[q], and everything in A[q+1..r] is > the pivot.

Example: A = [4 1 6 5 3], find the 3rd smallest value.
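A runnable Python sketch of the pseudocode above. Randomized-Partition is implemented here as the standard Lomuto partition with a random pivot (an assumption; any correct partition scheme works the same way):

```python
import random

def randomized_partition(A, p, r):
    # Pick a random pivot and swap it to the end (Lomuto partition).
    k = random.randint(p, r)
    A[k], A[r] = A[r], A[k]
    pivot = A[r]
    i = p - 1
    for j in range(p, r):
        if A[j] <= pivot:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1                         # final index of the pivot

def randomized_select(A, p, r, i):
    # Returns the ith smallest element (1-indexed) of A[p..r].
    if p == r:                           # base case: one element left
        return A[p]
    q = randomized_partition(A, p, r)
    k = q - p + 1                        # elements from p up to and including the pivot
    if i == k:                           # pivot is the ith smallest
        return A[q]
    elif i < k:                          # recurse only on the left side
        return randomized_select(A, p, q - 1, i)
    else:                                # recurse only on the right side
        return randomized_select(A, q + 1, r, i - k)

A = [4, 1, 6, 5, 3]
print(randomized_select(A, 0, len(A) - 1, 3))  # 3rd smallest -> 4
```

Note that unlike Quicksort, only one recursive call is made per level, on the side of the pivot that must contain the answer.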

Selection problem

How is this different from the randomized version of Quicksort?
Answer: only one recursive call (on the left or the right of the pivot), not two.

Selection problem

Analysis, worst case: we always partition around the largest remaining element, so each call does a Θ(n) partition and then recurses on an array of size n-1:

T(n) = T(n-1) + Θ(n) = Θ(n²)

Worse than a good sorting scheme.

Selection problem

However, a 1/10 vs 9/10 split isn't bad: we recurse on at most 9/10 of the array, so

T(n) = T(9n/10) + Θ(n)

Remember the master theorem: here n^(log_{10/9} 1) = n^0 = 1, so f(n) = Θ(n) dominates (case 3) and therefore T(n) = Θ(n).

The average case isn't bad either: Θ(n), with a proof similar to Quicksort's average-case analysis.
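The master-theorem step behind that claim, written out:

```latex
% Recurrence for the 1/10 vs 9/10 split: recurse on at most 9/10 of the array.
T(n) = T\!\left(\tfrac{9n}{10}\right) + \Theta(n)
% Master theorem with a = 1,\ b = \tfrac{10}{9},\ f(n) = \Theta(n):
%   n^{\log_b a} = n^{\log_{10/9} 1} = n^0 = 1
%   f(n) = \Theta(n) = \Omega\!\left(n^{0+\epsilon}\right) \text{ for } \epsilon = 1 \text{ (case 3)}
%   regularity: a\,f(n/b) = \tfrac{9n}{10} \le c\,n \text{ holds for } c = \tfrac{9}{10} < 1
\Rightarrow\; T(n) = \Theta(n)
```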

We've been talking a lot about efficiency in computing and run time, but we haven't said anything yet about data structures.

Dynamic sets

• Set size changes over time
• Elements can have identifying keys, and may also carry satellite data
• Example: the key is a friend's name, with satellite data such as email, phone, favorite hobbies, etc.

Dynamic sets

What operations do we want on dynamic sets?
• Search
• Insert
• Delete
• Min/Max
• Successor/Predecessor

Which data structure? It depends on what you want to do (hash table, stack, queue, linked list, trees, ...).

Data structures: Hash table

• Insert, Delete, Search/lookup
• We don't maintain order information
• Applications (later ...)
• We'll see that operations are O(1) on average

Data structures: Stack

• Last-in-first-out (LIFO)
• Insert = push, Delete = pop
• Run time of push and pop? O(1)
• But limited operations (e.g., searching is not efficient)
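A minimal stack sketch using a plain Python list, where append and pop at the end are O(1) amortized:

```python
# Stack: last in, first out. A Python list's end acts as the top of the stack.
stack = []
stack.append(1)    # push
stack.append(2)    # push
stack.append(3)    # push
top = stack.pop()  # pop removes the most recently pushed item
print(top)         # -> 3
```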

Data structures: Queue

• First-in-first-out (FIFO)
• Insert = enqueue, Delete = dequeue
• Run time of enqueue/dequeue: O(1)
• Fast, but limited operations
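A minimal queue sketch. `collections.deque` gives O(1) enqueue and dequeue (a plain list would pay O(n) for `pop(0)`):

```python
from collections import deque

# Queue: first in, first out.
queue = deque()
queue.append("a")        # enqueue
queue.append("b")        # enqueue
first = queue.popleft()  # dequeue removes the oldest item
print(first)             # -> a
```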

Data structures: Linked list (example of a doubly linked list)

Operations: Search, Insert, Delete. Run time?
• Search O(n) [a limitation if there are lots of searches]
• Insert O(1)
• Delete O(1) [unless we must first search for the key]
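To see why delete is O(1) once you hold a reference to the node, here is a minimal doubly linked list sketch (Node and DList are illustrative names, not from the course):

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.prev = None
        self.next = None

class DList:
    def __init__(self):
        self.head = None

    def insert_front(self, key):   # O(1)
        node = Node(key)
        node.next = self.head
        if self.head:
            self.head.prev = node
        self.head = node
        return node

    def delete(self, node):        # O(1): just splice the node out
        if node.prev:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next:
            node.next.prev = node.prev

    def search(self, key):         # O(n): walk the list from the head
        cur = self.head
        while cur and cur.key != key:
            cur = cur.next
        return cur
```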

Data structures: Binary search tree

Operations: Search, Min/Max, Predecessor/Successor, Insert/Delete.
More later; the basic operations take time proportional to the height of the tree, which for a complete binary tree is Θ(log n).

Data structures: Heap (main operations; discussed in the sorting chapter)

A heap is a specialized tree structure that satisfies the heap property: in a min-heap the parent is always smaller than its children; in a max-heap the parent is always bigger than its children.
• Insert: Θ(log n)
• Remove the min (or the max, but not both): Θ(log n)
Technically, a heap can be implemented via a complete binary tree.
Application: heapsort (and next, finding the median dynamically).
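A minimal min-heap sketch using Python's `heapq`, where insert and extract-min are both O(log n):

```python
import heapq

h = []
for x in [4, 1, 6, 5, 3]:
    heapq.heappush(h, x)      # insert: O(log n)
smallest = heapq.heappop(h)   # extract-min: O(log n)
print(smallest)               # -> 1
```

`heapq` only provides a min-heap; a common trick for getting max-heap behavior is to push negated values.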

Finding the median dynamically

Input: numbers presented one by one: x1, x2, x3, ..., xn
Output: at each time step, the median of the numbers seen so far.
Run time? We know we can find the median of n numbers in O(n), but dynamically, each time we add a number, we would like to do better than recomputing from scratch in O(n).
Using two heaps, one max-heap and one min-heap, each step costs only O(log k), where k is the number of elements seen so far.

Finding the median dynamically

• Low heap (a max-heap) holds the smaller half of the numbers; its max operation runs in O(log k)
• High heap (a min-heap) holds the larger half; its min operation also runs in O(log k)
Invariant: the smaller half of the elements seen so far is in the Low heap, the larger half in the High heap.
With this arrangement we can decide in O(1) time whether a new number belongs in the lower or upper half: just compare it with the tops of the two heaps. Inserting it then takes O(log k) time.
What if the heaps become unbalanced? If Low (max-heap) has 6 elements, High (min-heap) has 5, and the next element is less than the max of Low, insert it into Low and then move the max of Low over to High to restore the balance.
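A runnable sketch of the two-heap scheme. The helper names `add_number` and `median` are assumptions, and the max-heap is simulated by negating values, since Python's `heapq` is min-only:

```python
import heapq

# low is a max-heap of the smaller half (stored negated); high is a min-heap
# of the larger half. Invariant: len(low) == len(high) or len(low) == len(high) + 1.
def add_number(x, low, high):
    if not low or x <= -low[0]:
        heapq.heappush(low, -x)      # belongs to the lower half
    else:
        heapq.heappush(high, x)      # belongs to the upper half
    # Rebalance so the sizes differ by at most one: move a top element across.
    if len(low) > len(high) + 1:
        heapq.heappush(high, -heapq.heappop(low))
    elif len(high) > len(low):
        heapq.heappush(low, -heapq.heappop(high))

def median(low, high):
    if len(low) > len(high):
        return -low[0]                   # odd count: top of the max-heap
    return (-low[0] + high[0]) / 2       # even count: average of the two tops

low, high = [], []
for x in [5, 2, 8, 1, 9]:
    add_number(x, low, high)
print(median(low, high))  # median of [5, 2, 8, 1, 9] -> 5
```

Each `add_number` call does at most two heap pushes and one pop, so every step is O(log k), and the median itself is read off the heap tops in O(1).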

Hash table

We have elements with a key and satellite data.
Operations performed: Insert, Delete, Search/lookup.
We don't maintain order information.
We'll see that all operations are O(1) on average, but the worst case can be O(n).

Hash table

Simple implementation: if the universe of keys is a small set of integers [0..9], we can store elements directly in an array, using the keys as indices into the slots.
This is also called a direct-address table.
Search time is just like in an array: O(1)!
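A direct-address table sketch for the key universe [0..9] (the insert/search/delete names are illustrative): the key itself is the array index, so every operation is O(1).

```python
table = [None] * 10        # one slot per possible key in [0..9]

def insert(key, data):
    table[key] = data

def search(key):
    return table[key]      # None means "not present"

def delete(key):
    table[key] = None

insert(3, "Jane")
print(search(3))           # -> Jane
```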

Example: array versus hash table

Imagine we have keys corresponding to friends that we want to store.
We could use a huge array, with each friend's name mapped to some slot (e.g., one slot in the array for every possible name: each letter is one of 26 characters, with n letters per name ...).
We could insert, find a key, and delete an element in O(1) time – very fast!
But this is a huge waste of memory, with many slots empty in most applications!
John = A[34]; Jane = A[33334]

Example: versus linked list

An alternative might be to use a linked list with all the friend names linked (e.g., John -> Jane -> Bill).
Pro: this is not wasteful, because we only store the names that we want.
Con: search takes O(n).
Can we have an approach that is fast and not wasteful (the best of both worlds)?

Hash table

An extremely useful alternative to a static array, with insert, search, and delete in O(1) time on average – VERY FAST.
Useful when the universe of keys is large, but at any given time the number of keys actually stored is small relative to the total number of possible keys (so it's not wasteful like a huge static array).
We usually don't use the key directly as an index into the array; instead we compute a hash function of the key k, h(k), and use that as the index.
Problem: collisions (i.e., two keys map to the same slot).

Collisions

Guaranteed to happen when there are more keys than slots.
Also happen with a "bad" hash function – e.g., one that hashes every key to the same slot of the hash table (more later ...).
Even by chance, collisions are likely. Consider keys that are birthdays, and recall the birthday paradox: in a room of just 23 people, there is about a 50 percent chance that two people share a birthday.
Resolution? Chaining: each slot holds a linked list of all the keys that hash to it.
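A minimal sketch of chaining (the class name and the choice h(k) = hash(k) % m are assumptions for illustration, not the course's exact scheme). Each of the m slots holds a list of (key, data) pairs; colliding keys simply share a chain:

```python
class ChainedHashTable:
    def __init__(self, m=8):
        self.m = m
        self.slots = [[] for _ in range(m)]   # one chain per slot

    def _h(self, key):
        return hash(key) % self.m             # map key to a slot index

    def insert(self, key, data):              # O(1): append to the chain
        self.slots[self._h(key)].append((key, data))

    def search(self, key):                    # O(1 + alpha) on average
        for k, d in self.slots[self._h(key)]:
            if k == key:
                return d
        return None                           # unsuccessful search

    def delete(self, key):                    # remove the key from its chain
        i = self._h(key)
        self.slots[i] = [(k, d) for k, d in self.slots[i] if k != key]

t = ChainedHashTable()
t.insert("Tom", "tom@example.com")
print(t.search("Tom"))    # -> tom@example.com
print(t.search("Sarah"))  # -> None (unsuccessful search)
```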

Analysis

Worst case: all n elements map to one slot (one big linked list ...): O(n).
Average case: we need some definitions:
• m = number of slots
• n = number of keys in the hash table
• α = n/m, the load factor
Intuitively, α is the average number of elements per linked list.

Hash table analyses

Example: let's take n = m, so α = 1.
Good hash function: each slot of the hash table holds a linked list of about one element.
Bad hash function: every key maps to the first slot, giving one linked list of size n while all other slots are empty.
Define:
• Unsuccessful search: the key searched for does not exist in the hash table (we are searching for a new friend, Sarah, who is not yet in the hash table).
• Successful search: the key we are searching for already exists in the hash table (we are searching for Tom, whom we have already stored).

Hash table analyses

Theorem: assuming simple uniform hashing, an unsuccessful search takes Θ(1 + α) time on average. Here the α is the actual time spent searching the chain, and the added 1 is the constant time to compute the hash function of the key being searched.
Interpretation:
• n = m: Θ(1 + 1) = Θ(1)
• n = 2m: Θ(2 + 1) = Θ(1)
• n = m³: Θ(m² + 1) ≠ Θ(1)
We say operations take constant time on average when n and m are of similar order, but this is not guaranteed in general.

Hash table analyses

Intuition behind the theorem: when we search for a key k, the hash function maps it to slot h(k). We must search through the entire linked list in that slot, all the way to the end, because the key is never found (it's an unsuccessful search). For n = 2m, the linked list has average length 2; more generally, its average length is α, our load factor.