MS 101: Algorithms
Instructor: Neelima Gupta
Table of Contents
Review of
– Sorting
– Searching
– Growth Functions
– Recurrence Relations
Iterative Techniques
– Insertion Sort
– Selection Sort
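A minimal Python sketch of insertion sort (illustrative code; the names are our own, not from the slides):

    def insertion_sort(a):
        # Maintain a sorted prefix a[0..i-1] and insert a[i] into it.
        for i in range(1, len(a)):
            key = a[i]
            j = i - 1
            while j >= 0 and a[j] > key:
                a[j + 1] = a[j]  # shift larger elements one slot right
                j -= 1
            a[j + 1] = key
        return a

This takes O(n^2) comparisons in the worst case but only O(n) on an already (or nearly) sorted input.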
Divide and Conquer
– Merge Sort
– Quick Sort
– Why quick sort performs well in practice.
– We’ll study randomized quick sort later, which sorts in O(n log n) time on average.
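A sketch of merge sort in the same spirit (returns a new sorted list; illustrative code):

    def merge_sort(a):
        # Divide the list in half, sort each half recursively, then merge.
        if len(a) <= 1:
            return a
        mid = len(a) // 2
        left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
        merged, i, j = [], 0, 0
        while i < len(left) and j < len(right):  # merge step: O(n)
            if left[i] <= right[j]:
                merged.append(left[i]); i += 1
            else:
                merged.append(right[j]); j += 1
        return merged + left[i:] + right[j:]

The merge step is linear and the recursion halves the input, which is where the O(n log n) bound reviewed later comes from.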
Searching
– Sequential Search
– What if the application requires a lot of searches? Suppose we have a university with 10,000 student records stored only once, i.e. no insertions or deletions are performed once the data is stored … can we do better than sequential search?
– What if we need to perform frequent insertions and deletions?
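When the data never changes, one answer is to sort it once and answer every query by binary search in O(log n) time. A sketch (illustrative code, our own names):

    def binary_search(a, key):
        # a must be sorted; halve the search range at each step.
        lo, hi = 0, len(a) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            if a[mid] == key:
                return mid
            if a[mid] < key:
                lo = mid + 1
            else:
                hi = mid - 1
        return -1  # not found

For 10,000 records this means about 14 comparisons per search instead of up to 10,000.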
Solution: keep the data in a dynamic data structure like a binary search tree.
– How much time do insert, delete and search take in the worst case?
– Can we do better?
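A minimal (unbalanced) BST sketch, assuming the usual textbook operations; the comments answer the worst-case question:

    class Node:
        def __init__(self, key):
            self.key, self.left, self.right = key, None, None

    def bst_insert(root, key):
        # O(h), where h is the tree height: O(log n) if the tree stays
        # balanced, but O(n) in the worst case (keys arriving in sorted order).
        if root is None:
            return Node(key)
        if key < root.key:
            root.left = bst_insert(root.left, key)
        elif key > root.key:
            root.right = bst_insert(root.right, key)
        return root

    def bst_search(root, key):
        # Also O(h): follow a single root-to-leaf path.
        while root is not None and root.key != key:
            root = root.left if key < root.key else root.right
        return root

So a plain BST gives O(n) worst-case operations, which motivates the balanced trees on the next slide.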
Balanced binary search trees
– Red-Black Trees
– AVL Trees
– Advantage: fast operations.
– Drawback: complicated rotations.
We’ll study a randomized data structure called the skip list later, which supports insertion, deletion and search in O(log n) time on average.
Review: Growth Functions
Big O Notation
In general, a function f(n) is O(g(n)) if there exist positive constants c and n₀ such that f(n) ≤ c g(n) for all n ≥ n₀.
Formally: O(g(n)) = { f(n) : ∃ positive constants c and n₀ such that 0 ≤ f(n) ≤ c g(n) ∀ n ≥ n₀ }.
Intuitively, it means f(n) grows no faster than g(n).
Examples:
– n^2 and n^2 − n are O(n^2)
– n^3 and n^3 − n^2 − n are O(n^3)
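For example, n^2 − n ≤ 1 · n^2 for all n ≥ 1, so the constants c = 1 and n₀ = 1 witness n^2 − n = O(n^2).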
Omega Notation
In general, a function f(n) is Ω(g(n)) if ∃ positive constants c and n₀ such that 0 ≤ c g(n) ≤ f(n) ∀ n ≥ n₀.
Intuitively, it means f(n) grows at least as fast as g(n).
Examples:
– n^2 and n^2 + n are Ω(n^2)
– n^3 and n^3 + n^2 − n are Ω(n^3)
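For example, n^2 + n ≥ 1 · n^2 for all n ≥ 1, so c = 1 and n₀ = 1 witness n^2 + n = Ω(n^2).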
Theta Notation
A function f(n) is Θ(g(n)) if ∃ positive constants c₁, c₂, and n₀ such that c₁ g(n) ≤ f(n) ≤ c₂ g(n) ∀ n ≥ n₀.
Intuitively, it means f(n) grows at the same rate as g(n).
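For example, n^2 ≤ n^2 + n ≤ 2 n^2 for all n ≥ 1, so c₁ = 1, c₂ = 2 and n₀ = 1 witness n^2 + n = Θ(n^2).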
Other Asymptotic Notations
A function f(n) is o(g(n)) if for every positive constant c there exists a constant n₀ > 0 such that f(n) < c g(n) ∀ n ≥ n₀.
A function f(n) is ω(g(n)) if for every positive constant c there exists a constant n₀ > 0 such that c g(n) < f(n) ∀ n ≥ n₀.
Intuitively,
– o() is like <
– O() is like ≤
– ω() is like >
– Ω() is like ≥
– Θ() is like =
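For example, n = o(n^2): given any c > 0, we have n < c n^2 whenever n > 1/c, so n₀ = ⌈1/c⌉ + 1 works; the same argument shows n^2 = ω(n).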
Why the constants ‘c’ and ‘m’?
Suppose we have two algorithms to solve the same problem, say sorting: Insertion Sort and Merge Sort, for example. Why should we have more than one algorithm to solve the same problem? Ans: efficiency. What is the measure of efficiency? Ans: system resources, for example ‘time’. How do we measure time?
Contd.
IS(n) = O(n^2) and MS(n) = O(n log n), so MS is asymptotically faster than IS. But suppose we run IS on a fast machine and MS on a slow machine and measure the wall-clock time (say, because they were developed by two different people living in different parts of the globe); we may then get less time for IS and more for MS … wrong analysis.
Solution: count the number of steps on a generic computational model.
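To see the scale involved: at n = 10^6, an O(n^2) algorithm performs on the order of 10^12 steps, while an O(n log n) algorithm performs about 2 × 10^7 (since log₂ 10^6 ≈ 20). Merge sort therefore wins even on a machine thousands of times slower.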
Computational Model: Analysis of Algorithms
Analysis is performed with respect to a computational model. We will usually use a generic uniprocessor random-access machine (RAM):
– All memory is equally expensive to access
– No concurrent operations
– All reasonable instructions take unit time (except, of course, function calls)
– Constant word size (unless we are explicitly manipulating bits)
Running Time
The running time is the number of primitive steps that are executed. Except for the time to execute a function call, in this model most statements require roughly the same (within a constant factor) amount of time:
– y = m * x + b
– c = 5 / 9 * (t - 32)
– z = f(x) + g(y)
We can be more exact if need be.
But why ‘c’ and ‘m’?
Because
– we compare two algorithms on the basis of their number of steps, and
– the actual time taken by an algorithm is (no more than, in the case of ‘O’, or no less than, in the case of ‘Ω’) ‘c’ times the number of steps.
Why ‘m’?
We need efficient algorithms and computational tools to solve problems on big data. For example, it is not very difficult to sort a pack of 52 cards manually. However, sorting all the books in a library by their accession numbers would be tedious if done manually. So we want to compare algorithms for large inputs.
Arrange some functions
Let us arrange the following functions in ascending order of growth (assume log n = o(n) is known):
– n, n^2, n^3, sqrt(n), n^ε, log n, log^2 n, n log n, n/log n, 2^n, 3^n
One consistent answer is sketched below.
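One ascending order, assuming 0 < ε < 1/2 so that n^ε grows more slowly than sqrt(n), and writing f ≺ g for f = o(g):
log n ≺ log^2 n ≺ n^ε ≺ sqrt(n) ≺ n/log n ≺ n ≺ n log n ≺ n^2 ≺ n^3 ≺ 2^n ≺ 3^n.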
What about m?
Why do we say n ≥ m and not n ≤ m? Answer: for inputs smaller than m we can perhaps solve the problem manually, for example Insertion Sort with a pack of cards (n = 52).
Assignment No 1
– Show that log^M n = o(n^ε) for all constants M > 0 and ε > 0. Assume that log n = o(n). Also prove the following corollary: log n = o(n/log n).
– Show that n^ε = o(n/log n) for every constant ε with 0 < ε < 1.
Assignment No 2
– Show that lim_{n→∞} f(n)/g(n) = 0 implies f(n) = o(g(n)).
– Show that lim_{n→∞} f(n)/g(n) = c, where c is a positive constant, implies f(n) = Θ(g(n)).
– Show that log n = o(n).
– Show that n^k = o(2^n) for every positive constant k.
Review: Recurrence Relations
Why study recurrence relations in this course?
– Merge Sort
– Quick Sort
The running time of a divide-and-conquer algorithm is naturally expressed as a recurrence, as the examples below show.
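For example, merge sort satisfies T(n) = 2T(n/2) + Θ(n), which solves to Θ(n log n). Quick sort satisfies T(n) = T(q) + T(n − q − 1) + Θ(n) when the pivot splits the input into parts of sizes q and n − q − 1; the worst case T(n) = T(n − 1) + Θ(n) gives Θ(n^2), while a balanced split again gives Θ(n log n).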
Assignment 3
Solve the following recurrence: T(n) = T(αn) + T(βn) + n, where 0 < α ≤ β < 1. Assume suitable initial conditions.
The Master Theorem
If T(n) = aT(n/b) + f(n), where a ≥ 1 and b > 1 are constants, then:
– Case 1: if f(n) = O(n^(log_b a − ε)) for some constant ε > 0, then T(n) = Θ(n^(log_b a)).
– Case 2: if f(n) = Θ(n^(log_b a)), then T(n) = Θ(n^(log_b a) log n).
– Case 3: if f(n) = Ω(n^(log_b a + ε)) for some constant ε > 0, and a f(n/b) ≤ c f(n) for some constant c < 1 and all sufficiently large n, then T(n) = Θ(f(n)).
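For example, applying the theorem to merge sort’s recurrence T(n) = 2T(n/2) + Θ(n): here a = 2 and b = 2, so n^(log_b a) = n; f(n) = Θ(n) matches Case 2, giving T(n) = Θ(n log n).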
To follow:
– Selection
– Lower Bounding Techniques
– Correctness