Expected Running Times and Randomized Algorithms
Instructor: Neelima Gupta


Expected Running Time of Insertion Sort
x_1, x_2, …, x_{i-1}, x_i, …, x_n
For i = 2 to n: insert the i-th element x_i into the partially sorted list x_1, x_2, …, x_{i-1} (at the r-th position).
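The loop above can be sketched in Python (a sketch, counting key comparisons the way the analysis does: scanning the sorted prefix from the right and stopping at the first smaller element):

```python
def insertion_sort(a):
    """Sort list a in place; return the number of key comparisons made."""
    comparisons = 0
    for i in range(1, len(a)):          # insert a[i] into the sorted prefix a[0..i-1]
        key = a[i]
        j = i - 1
        while j >= 0:
            comparisons += 1
            if a[j] > key:
                a[j + 1] = a[j]         # shift the larger element one slot right
                j -= 1
            else:
                break                   # found the insertion point
        a[j + 1] = key
    return comparisons

data = [5, 2, 4, 6, 1, 3]
insertion_sort(data)
print(data)  # [1, 2, 3, 4, 5, 6]
```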

Expected Running Time of Insertion Sort
Let X_i be the random variable representing the number of comparisons required to insert the i-th element of the input array into the sorted subarray of the first i-1 elements.
X_i can take values 1, …, i-1 (denote the outcomes by x_{i1}, x_{i2}, …, x_{ii}).
E(X_i) = Σ_j x_{ij} p(x_{ij}), where E(X_i) is the expected value of X_i and p(x_{ij}) is the probability of inserting x_i at the j-th position, 1 ≤ j ≤ i.

Expected Running Time of Insertion Sort
x_1, x_2, …, x_{i-1}, x_i, …, x_n
How many comparisons does it take to insert the i-th element at the j-th position?

Position    # of Comparisons
i           1
i-1         2
…           …
2           i-1
1           i-1

Note: both positions 2 and 1 require i-1 comparisons. Why? To insert at position 2 we must compare with the first element as well; after that comparison we already know which of the two comes first, so placing the element at position 1 costs no extra comparison.

Thus, E(X_i) = (1/i) [ Σ_{k=1}^{i-1} k + (i-1) ], where 1/i is the probability of inserting at any particular one of the i possible positions.
For n elements,
E(X_2 + X_3 + … + X_n) = Σ_{i=2}^{n} E(X_i) = Σ_{i=2}^{n} (1/i) [ Σ_{k=1}^{i-1} k + (i-1) ] = n²/4 + O(n).
Therefore the average case of insertion sort takes Θ(n²).
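A quick numeric check of the sum above (evaluating the slide's formula exactly with rationals; the ratio to n²/4 approaches 1, confirming the Θ(n²) leading term):

```python
from fractions import Fraction

def expected_comparisons(n):
    """Sum of E(X_i) for i = 2..n, with E(X_i) = (1/i) * (sum_{k=1}^{i-1} k + (i-1))."""
    total = Fraction(0)
    for i in range(2, n + 1):
        total += Fraction(sum(range(1, i)) + (i - 1), i)
    return total

for n in (10, 100, 1000):
    t = expected_comparisons(n)
    print(n, float(t), float(t) / (n * n / 4))  # ratio tends to 1
```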

Quick-Sort
Pick the first item from the array; call it the pivot.
Partition the items in the array around the pivot so that all elements to the left are ≤ the pivot and all elements to the right are greater than the pivot.
Use recursion to sort the two partitions.
[partition 1: items ≤ pivot | pivot | partition 2: items > pivot]

Quicksort: Expected Number of Comparisons
Partition may generate splits (0 : n-1, 1 : n-2, 2 : n-3, …, n-2 : 1, n-1 : 0), each with probability 1/n. If T(n) is the expected running time,
T(n) = (1/n) Σ_{k=0}^{n-1} [ T(k) + T(n-1-k) ] + Θ(n) = (2/n) Σ_{k=0}^{n-1} T(k) + Θ(n) = O(n log n).

Randomized Quick-Sort
Pick an element from the array uniformly at random; call it the pivot.
Partition the items in the array around the pivot so that all elements to the left are ≤ the pivot and all elements to the right are greater than the pivot.
Use recursion to sort the two partitions.
[partition 1: items ≤ pivot | pivot | partition 2: items > pivot]
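A sketch of randomized quicksort in Python (it uses a three-way partition so duplicates of the pivot are handled cleanly; the slides' version keeps items ≤ pivot on the left):

```python
import random

def randomized_quicksort(a):
    """Return a sorted copy of a; the pivot is chosen uniformly at random."""
    if len(a) <= 1:
        return list(a)
    pivot = random.choice(a)                 # random pivot: no input is worst case
    left = [x for x in a if x < pivot]       # items < pivot
    middle = [x for x in a if x == pivot]    # items equal to the pivot
    right = [x for x in a if x > pivot]      # items > pivot
    return randomized_quicksort(left) + middle + randomized_quicksort(right)

print(randomized_quicksort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]
```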

Remarks
Not much different from quicksort, except that earlier the algorithm was deterministic and the bounds were probabilistic; here the algorithm itself is also randomized: we pick the pivot at random. Notice that there is no difference in how the algorithm behaves from that point onwards. In the earlier case we could identify a worst-case input; here no input is worst case.

Randomized Select
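The details of this slide did not survive the transcript. A standard randomized-select (quickselect) sketch is below: like randomized quicksort it partitions around a random pivot, but recurses into only one side, giving O(n) expected time.

```python
import random

def randomized_select(a, i):
    """Return the i-th smallest (1-indexed) element of the non-empty list a."""
    if len(a) == 1:
        return a[0]
    pivot = random.choice(a)
    left = [x for x in a if x < pivot]       # elements smaller than the pivot
    mid = [x for x in a if x == pivot]       # copies of the pivot
    if i <= len(left):
        return randomized_select(left, i)    # answer lies in the left part
    if i <= len(left) + len(mid):
        return pivot                         # the pivot itself is the answer
    right = [x for x in a if x > pivot]
    return randomized_select(right, i - len(left) - len(mid))

print(randomized_select([7, 10, 4, 3, 20, 15], 3))  # 7
```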

Randomized Algorithms
A randomized algorithm performs coin tosses (i.e., uses random bits) to control its execution:
i ← random()
if i = 0: do A …
else (i.e., i = 1): do B …
Its running time depends on the outcomes of the coin tosses.

Assumptions
Coins are unbiased, and coin tosses are independent.
The worst-case running time of a randomized algorithm may be large but occurs with very low probability (e.g., it occurs when all the coin tosses give “heads”).

Monte Carlo Algorithms
Running times are guaranteed, but the output may not be completely correct. The probability of error is low.

Las Vegas Algorithms
Output is guaranteed to be correct; bounds on running times hold with high probability. What type of algorithm is randomized quicksort?

Why Expected Running Times?
Markov’s inequality: P(X > k·E(X)) < 1/k. That is, the probability that the algorithm takes more than 2·E(X) time is less than 1/2, and the probability that it takes more than 10·E(X) time is less than 1/10. This is the reason why quicksort does well in practice.
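A tiny simulation illustrating the bound, using a hypothetical random variable (number of fair-coin tosses until the first head, mean 2). Markov’s inequality guarantees the fraction of runs exceeding 2·E(X) stays below 1/2; for this variable it is in fact far smaller.

```python
import random

random.seed(42)

def tosses_until_heads():
    """Number of fair-coin tosses until the first head (geometric, mean 2)."""
    count = 1
    while random.random() >= 0.5:
        count += 1
    return count

trials = [tosses_until_heads() for _ in range(100_000)]
mean = sum(trials) / len(trials)
frac_over_2x = sum(t > 2 * mean for t in trials) / len(trials)
print(mean, frac_over_2x)  # mean near 2; fraction exceeding 2*mean is below 1/2
```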

Markov’s bound: P(X > k·E(X)) < 1/k, where k is a constant.
Chernoff’s bound: P(X > 2μ) < 1/2, where μ = E(X).
A stronger result: P(X > k·μ) < 1/n^k, where k is a constant.

Binary Search Tree
What is a binary search tree? A BST is a (possibly empty) rooted tree with a key value, a possibly empty left subtree and a possibly empty right subtree. Each of the left and right subtrees is itself a BST.

Binary Search Tree
Pick the first item from the array; call it the pivot. It becomes the root of the BST.
Partition the items in the array around the pivot so that all elements to the left are ≤ the pivot and all elements to the right are greater than the pivot.
Recursively build a BST on each partition; they become the left and right subtrees of the root.

Binary Search Tree
Consider the following input: 1, 2, 3, …, 10,000. What is the time for construction? Search time?

Randomly Built Binary Search Tree
Pick an item from the array at random; call it the pivot. It becomes the root of the BST.
Partition the items in the array around the pivot so that all elements to the left are ≤ the pivot and all elements to the right are greater than the pivot.
Recursively build a BST on each partition; they become the left and right subtrees of the root.
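A sketch of building a BST by inserting the keys in a uniformly random order, which produces the same distribution of trees as picking a random pivot. With n = 10,000 keys the height comes out a small multiple of log₂ n, far below the n-1 of the sorted-input tree:

```python
import random

class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    """Insert key into the BST rooted at root; return the (possibly new) root."""
    if root is None:
        return Node(key)
    if key <= root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def height(root):
    """Height of the tree; the empty tree has height -1 by convention."""
    if root is None:
        return -1
    return 1 + max(height(root.left), height(root.right))

random.seed(1)
keys = list(range(1, 10_001))
random.shuffle(keys)           # random insertion order ~ random pivot choice
root = None
for k in keys:
    root = insert(root, k)
print(height(root))  # a few dozen (O(log n)), not 9999
```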

Example Consider the input 10, 20, 30, 40, 50, 60, 70, 80, 90, 100.

Height of the RBST
WLOG, assume that the keys are distinct. (What if they are not?)
Rank(x) = number of elements < x.
Let X_i = height of the tree rooted at a node with rank i.
Let Y_i = exponential height of the tree = 2^{X_i}.
Let H = height of the entire BST; then H = max{H1, H2} + 1, where H1 is the height of the left subtree and H2 the height of the right subtree.

Y = 2^H = 2 · max{2^{H1}, 2^{H2}}.
Let E(EH(T(n))) denote the expected exponential height of a tree on n nodes. Then
E(EH(T(n))) = (2/n) Σ_{k=0}^{n-1} E( max{ EH(T(k)), EH(T(n-1-k)) } ) = O(n³).
Since 2^{E(H)} ≤ E(2^H) by Jensen’s inequality, E(H(T(n))) ≤ log E(EH(T(n))) = O(log n).

Construction Time? Search Time? What is the worst case input?

Acknowledgements Kunal Verma Nidhi Aggarwal And other students of MSc(CS) batch 2009.

Hashing
Motivation: symbol tables. A compiler uses a symbol table to relate symbols to associated data.
Symbols: variable names, procedure names, etc.
Associated data: memory location, call graph, etc.
For a symbol table (also called a dictionary), we care about search, insertion, and deletion; we typically don’t care about sorted order.

Hash Tables
More formally: given a table T and a record x, with key (= symbol) and satellite data, we need to support Insert(T, x), Delete(T, x), and Search(T, x).
We want these to be fast, but don’t care about sorting the records.
The structure we will use is a hash table; it supports all of the above in O(1) expected time!

Hash Functions
Next problem: collision. Two distinct keys may hash to the same slot: h(k2) = h(k5).
[Figure: universe of keys U, actual keys K ⊆ U, and table T with slots 0 … m-1; h maps k1, …, k5 into T.]

Resolving Collisions
How can we solve the problem of collisions? One solution is chaining; other solutions include open addressing.

Chaining
Chaining puts elements that hash to the same slot into a linked list.
[Figure: keys k1, …, k8 from K ⊆ U hashed into table T; keys colliding in a slot are chained in a linked list.]
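A minimal chained hash table sketch covering the three dictionary operations (the class name and slot count are illustrative; Python lists stand in for the linked lists):

```python
class ChainedHashTable:
    """Hash table with chaining; each slot holds a list of (key, value) pairs."""

    def __init__(self, m=8):
        self.m = m
        self.slots = [[] for _ in range(m)]

    def _h(self, key):
        return hash(key) % self.m          # any hash function mapping keys to 0..m-1

    def insert(self, key, value):
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:                   # key already present: overwrite
                chain[i] = (key, value)
                return
        chain.append((key, value))         # O(1) append to the chain

    def search(self, key):
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None                        # unsuccessful search

    def delete(self, key):
        s = self._h(key)
        self.slots[s] = [(k, v) for (k, v) in self.slots[s] if k != key]

t = ChainedHashTable()
t.insert("x", 1)
t.insert("y", 2)
print(t.search("x"), t.search("z"))  # 1 None
t.delete("x")
print(t.search("x"))  # None
```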

Chaining
How do we insert an element? (Same figure as on the previous slide.)

Chaining
How do we delete an element? (Same figure as on the previous slide.)

Chaining
How do we search for an element with a given key? (Same figure as on the previous slide.)

Analysis of Chaining
Assume simple uniform hashing: each key in the table is equally likely to be hashed to any slot.
Given n keys and m slots in the table, the load factor α = n/m = average number of keys per slot.
What is the average cost of an unsuccessful search for a key? A: O(1 + α).
What is the average cost of a successful search? A: O((1 + α)/2) = O(1 + α).
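A quick check that the load factor really is the average chain length (a sketch with a simple modular hash; an unsuccessful search probes 1 + chain-length entries on average):

```python
import random

random.seed(7)
m, n = 128, 512                      # m slots, n keys: load factor alpha = 4
table = [[] for _ in range(m)]
for key in random.sample(range(10**6), n):
    table[key % m].append(key)       # modular hash with chaining

alpha = n / m
# Cost of an unsuccessful search for a key hashing to slot s: 1 + len(table[s]).
avg_chain = sum(len(chain) for chain in table) / m
print(alpha, avg_chain)  # 4.0 4.0 -- average chain length equals alpha exactly
```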

Analysis of Chaining, Continued
So the cost of searching is O(1 + α). If the number of keys n is proportional to the number of slots m in the table, what is α? A: α = O(1). In other words, we can make the expected cost of searching constant if we make α constant.

A Final Word About Randomized Algorithms
If we could prove:
P(failure) < 1/k (we are sort of happy)
P(failure) < 1/n^k (most of the time this is true, and we’re happy)
P(failure) < 1/2^n (this is difficult, but we still want this)

END