Published by Nora Lee. Modified over 9 years ago.
1
Computer Science Background for Biologists CSC 487/687 Computing for Bioinformatics Fall 2005
2
What is an algorithm?
An algorithm is a well-defined computational procedure that takes some values as input and produces some values as output. We are interested in the correctness and efficiency of computer algorithms. We seek to extract clean, well-defined problems from the typically messy "real" problem to gain insight into it.
3
Example of an algorithm (sorting)
Input: A sequence of n numbers $(a_1, a_2, \ldots, a_n)$.
Output: A permutation $(a'_1, a'_2, \ldots, a'_n)$ of the input sequence such that $a'_1 \le a'_2 \le \cdots \le a'_n$.
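The slide states the sorting problem but not an algorithm that solves it. As one illustration (insertion sort is our choice here, not something the slides specify), a short Python sketch that maps any input sequence to a non-decreasing permutation of it; the name `insertion_sort` is illustrative:

```python
def insertion_sort(a):
    """Return a new list holding the elements of `a` in non-decreasing order."""
    result = list(a)  # work on a copy so the input sequence is unchanged
    for i in range(1, len(result)):
        key = result[i]
        j = i - 1
        # Shift larger elements one slot to the right to open a gap for `key`.
        while j >= 0 and result[j] > key:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = key
    return result

print(insertion_sort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]
```

The output is a permutation of the input (elements are only moved, never created or dropped) and satisfies the ordering condition on the slide.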
4
Exact String Matching
Input: A text string T, where |T| = n, and a pattern string P, where |P| = m.
Output: An index i such that $T_{i+k-1} = P_k$ for all $1 \le k \le m$, i.e., showing that P is a substring of T.
Text T: abcabaabcabac
Pattern P: abaa
5
Exact String Matching
Brute force search algorithm (check j <= m before reading P[j], so the loop never reads past the end of P):
    for i = 1 to n-m+1 do
        j = 1
        while (j <= m) and (T[i+j-1] == P[j]) do
            j = j + 1
        if (j > m) then print "pattern at position ", i
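A Python sketch of the same brute-force search, assuming 0-based string indexing internally and reporting 1-based positions to match the slide; the function name `brute_force_match` is illustrative:

```python
def brute_force_match(text, pattern):
    """Return every 1-based position i where `pattern` occurs in `text`."""
    n, m = len(text), len(pattern)
    positions = []
    for i in range(n - m + 1):          # every candidate alignment (0-based)
        j = 0
        # Test the bound first so pattern[j] is never read past its end.
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:                      # all m characters matched
            positions.append(i + 1)     # report 1-based, as on the slide
    return positions

print(brute_force_match("abcabaabcabac", "abaa"))  # [4]
```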
6
Algorithm Efficiency
Time efficiency of algorithms
Space efficiency of algorithms
7
Machine-Independent Analysis
We assume that every basic operation takes constant time. Example basic operations: addition, subtraction, multiplication, memory access. The time efficiency of an algorithm is the number of basic operations it performs. We do not distinguish between the different basic operations.
8
Time Efficiency
In fact, we will not worry about the exact values, but will look at "broad classes" of values. Let there be n inputs. If one algorithm needs n basic operations and another needs 2n basic operations, we consider them to be in the same efficiency category. However, we do distinguish between exp(n), n, and log(n).
9
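Example: Time Complexity
The brute-force string matching algorithm might use only about n steps if we are lucky (the first comparison at almost every position fails). It might need about n*m steps if we are unlucky (at most positions, almost all of P matches before a mismatch).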
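To make the lucky and unlucky cases concrete, here is a sketch that counts the character comparisons made by the brute-force search. The helper `count_comparisons` and the test inputs are hypothetical choices for illustration: a pattern whose first character never matches (about n comparisons) and one that mismatches only at its last character (about n*m comparisons).

```python
def count_comparisons(text, pattern):
    """Count character comparisons made by the brute-force matcher."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for i in range(n - m + 1):
        j = 0
        while j < m:
            comparisons += 1                # one T-versus-P character comparison
            if text[i + j] != pattern[j]:
                break
            j += 1
    return comparisons

n, m = 1000, 10
lucky = count_comparisons("a" * n, "b" * m)                # first character always mismatches
unlucky = count_comparisons("a" * n, "a" * (m - 1) + "b")  # mismatch only at the last character
print(lucky, unlucky)  # 991 9910
```

The lucky count is n - m + 1 (one comparison per alignment) and the unlucky count is (n - m + 1) * m.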
10
Order of Increase
We care about how quickly the running time of our algorithms grows as the input size increases.
[Figure: growth curves for log n, n, and exp(n).]
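A quick numerical illustration of these growth rates, using 2^n as the stand-in for exponential growth (an assumption; the slide writes exp(n)) and adding n^2 for comparison. The helper `growth_row` is hypothetical:

```python
import math

def growth_row(n):
    """Return (log2 n, n, n^2, 2^n): logarithmic, linear, quadratic, exponential."""
    return math.log2(n), n, n ** 2, 2 ** n

for n in (10, 20, 40, 80):
    lg, _, quad, expo = growth_row(n)
    print(f"n={n:>3}  log2 n={lg:5.2f}  n^2={quad:>5}  2^n={expo}")
```

Even at n = 80, 2^n exceeds 10^24, while n^2 is only 6400 and log2 n is below 7.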
11
Function Orders
A function f(n) is O(g(n)) if f(n) grows no faster than g(n). Formally, f(n) is O(g(n)) if there exist a number $n_0$ and a positive constant c such that for all $n \ge n_0$, $0 \le f(n) \le c\,g(n)$. If $\lim_{n\to\infty} f(n)/g(n)$ exists and is finite, then f(n) is O(g(n)).
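A worked instance of the definition, with f, g, c, and $n_0$ chosen purely for illustration:

```latex
\begin{aligned}
&f(n) = 3n + 5,\quad g(n) = n,\quad c = 4,\quad n_0 = 5:\\
&n \ge 5 \;\Longrightarrow\; 0 \le 3n + 5 \le 3n + n = 4n = c\,g(n),\ \text{so } f(n) \text{ is } O(n).\\
&\text{Limit test: } \lim_{n\to\infty} \frac{3n + 5}{n} = 3 < \infty.
\end{aligned}
```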
12
Implication of Big-O Notation
Big-O notation gives an upper bound on the number of steps an algorithm takes in the worst case. Suppose we know that our algorithm uses O(f(n)) basic steps on any input of size n. Then, for all sufficiently large n, the algorithm terminates after executing at most a constant times f(n) basic steps. Since a basic step takes constant time on a given machine, the algorithm terminates within a constant times f(n) units of time, for all large n.
13
Algorithm Complexity
Thus the brute-force string matching algorithm is O(mn), which is quadratic time when m is proportional to n. A quadratic-time algorithm is usually fast enough for small problems, but not for big ones. An exponential-time algorithm is fast enough only for tiny problems.
14
Any improvement over brute-force search?
Some of these comparisons are wasted work! By being more clever, we can reduce the worst-case running time to O(n+m): Knuth-Morris-Pratt string matching.
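A sketch of Knuth-Morris-Pratt in Python, following the common prefix-function (failure-function) formulation; the function names are ours, and positions are 1-based to match the brute-force slide. The prefix function lets the matcher shift the pattern without re-examining text characters, so the text is scanned once:

```python
def prefix_function(pattern):
    """pi[q] = length of the longest proper prefix of pattern[:q+1] that is also its suffix."""
    pi = [0] * len(pattern)
    k = 0
    for q in range(1, len(pattern)):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]                 # fall back to the next shorter border
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi

def kmp_match(text, pattern):
    """Return all 1-based positions of `pattern` in `text` in O(n + m) time."""
    m = len(pattern)
    if m == 0:
        return list(range(1, len(text) + 2))  # empty pattern matches everywhere
    pi = prefix_function(pattern)
    positions = []
    q = 0                                 # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while q > 0 and pattern[q] != ch:
            q = pi[q - 1]                 # reuse earlier comparisons instead of restarting
        if pattern[q] == ch:
            q += 1
        if q == m:                        # full match ending at text[i]
            positions.append(i - m + 2)   # convert 0-based end index to 1-based start
            q = pi[q - 1]
    return positions

print(kmp_match("abcabaabcabac", "abaa"))  # [4]
```

Each text character is read once, and every fallback `q = pi[q - 1]` undoes an earlier increment of q, so the fallbacks cost O(n) in total; computing the prefix function costs O(m) by the same argument.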
15
NP, NP-hard, and NP-complete Problems
A problem is in the class NP if a proposed solution to it can be verified in polynomial time. A problem is NP-hard if an algorithm for solving it can be translated into one for solving any other NP-problem. NP-hard therefore means "at least as hard as any NP-problem". A problem is NP-complete if it is both in NP and NP-hard.
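To illustrate "verified in polynomial time", take the decision version of the shortest common superstring problem: is there a superstring of the given strings with length at most k? Checking a proposed answer is easy. The name `verify_superstring` is hypothetical, and this is a sketch rather than a formal certificate checker:

```python
def verify_superstring(candidate, strings, k):
    """Check that `candidate` has length <= k and contains every string in `strings`."""
    if len(candidate) > k:
        return False
    # Python's `in` performs substring search; even a brute-force search is
    # polynomial, O(len(candidate) * len(s)) per string.
    return all(s in candidate for s in strings)

print(verify_superstring("ACGTTA", ["ACG", "CGTT", "TTA"], 6))  # True
```

Finding a shortest superstring is the hard part; checking one that someone hands you takes only polynomial time.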
16
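NP-Completeness
Unfortunately, for many problems there is no known polynomial-time algorithm. Even worse, many of these problems can be proven NP-complete, which means that a polynomial-time algorithm for any one of them would yield polynomial-time algorithms for every problem in NP. It is widely believed, though not proven, that no such algorithm exists. In practice we therefore turn to heuristics and approximation algorithms.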
17
Shortest Common Superstring
Input: A set $S = \{s_1, s_2, \ldots, s_m\}$ of text strings over some alphabet $\Sigma$.
Output: The shortest possible string T such that each $s_i$ is a substring of T.
This problem arises in DNA sequencing.
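For contrast with polynomial-time algorithms, an exact answer can be found by brute force: discard strings contained in other strings, try every ordering, and merge neighbours using their maximal overlap. This relies on the standard fact that, for a set in which no string is a substring of another, some ordering merged this way is optimal. The function names are illustrative, and the m! orderings make this usable only for tiny inputs:

```python
from itertools import permutations

def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for length in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:length]):
            return length
    return 0

def shortest_superstring_exact(strings):
    """Exact shortest common superstring by trying all m! orderings."""
    unique = list(dict.fromkeys(strings))            # drop duplicates, keep order
    # Strings contained in another string never affect the answer.
    core = [s for s in unique if not any(s != t and s in t for t in unique)]
    if not core:
        return ""
    best = None
    for order in permutations(core):
        merged = order[0]
        for prev, nxt in zip(order, order[1:]):
            merged += nxt[overlap(prev, nxt):]       # append only the non-overlapping tail
        if best is None or len(merged) < len(best):
            best = merged
    return best

print(len(shortest_superstring_exact(["ACGT", "GTTA", "TTAC"])))  # 7
```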
18
Shortest common superstring
19
NP-complete problems
Can you suggest an algorithm to find the shortest common superstring? A greedy heuristic gives an approximately optimal solution.
20
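Greedy Heuristic
Always merge the two strings with the longest overlap.
Put the combined string back.
Repeat until only one string remains.
GREEDY is conjectured to find a superstring of length at most twice the optimal (the "greedy conjecture"); the proven guarantee is a somewhat larger constant factor (4 in the original analysis).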
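A minimal Python sketch of GREEDY, assuming ties are broken by taking the first pair found and that strings contained in other strings are discarded (neither detail is specified on the slide); the names are illustrative:

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for length in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:length]):
            return length
    return 0

def greedy_superstring(strings):
    """GREEDY: repeatedly merge the pair of strings with the longest overlap."""
    unique = list(dict.fromkeys(strings))             # drop duplicates, keep order
    # Assumption: strings contained in another string are removed up front.
    strs = [s for s in unique if not any(s != t and s in t for t in unique)]
    while len(strs) > 1:
        best_len, best_i, best_j = -1, 0, 1
        for i in range(len(strs)):                    # all ordered pairs in this round
            for j in range(len(strs)):
                if i != j:
                    olen = overlap(strs[i], strs[j])
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        merged = strs[best_i] + strs[best_j][best_len:]
        # Remove the merged pair (and anything now contained in the result),
        # then put the combined string back.
        strs = [s for idx, s in enumerate(strs)
                if idx not in (best_i, best_j) and s not in merged]
        strs.append(merged)
    return strs[0] if strs else ""

print(greedy_superstring(["ACGT", "GTTA", "TTAC"]))  # ACGTTAC
```

Here the first round merges GTTA and TTAC (overlap 3) into GTTAC, and the second merges ACGT with GTTAC (overlap 2).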
21
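Time Complexity of the Greedy Heuristic
Assume n strings, each of length k (treating merged strings as having length about k as well).
There are n - 1 merge rounds, since each merge reduces the number of strings by one.
Each round makes $O(n^2)$ pairwise string comparisons.
Each string comparison takes $O(k^2)$ steps.
In total, the heuristic takes $O(n \cdot n^2 \cdot k^2) = O(n^3 k^2)$ steps.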
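To see where the $O(k^2)$ cost per string comparison comes from, here is a sketch of a naive overlap computation that tries each candidate overlap length and compares characters one at a time, counting comparisons. The helper `overlap_with_count` is hypothetical:

```python
def overlap_with_count(a, b):
    """Naive overlap: try every length from longest to shortest, comparing
    characters one at a time. Returns (overlap length, character comparisons)."""
    comparisons = 0
    for length in range(min(len(a), len(b)), 0, -1):   # up to k candidate lengths
        match = True
        for t in range(length):                          # up to k comparisons each
            comparisons += 1
            if a[len(a) - length + t] != b[t]:
                match = False
                break
        if match:
            return length, comparisons
    return 0, comparisons

print(overlap_with_count("a" * 9 + "b", "a" * 10))  # (0, 55)
```

In this input every candidate length fails only at its last character, giving 10 + 9 + ... + 1 = 55 comparisons, roughly k^2/2 for k = 10.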