CS5263 Bioinformatics Lecture 9: Motif finding Biological & Statistical background
Roadmap
Review of last lecture
Intro to probability and statistics
Intro to motif finding problems
–Biological background
Multiple Sequence Alignment
Scoring functions
Ideally:
–Maximize the probability that the sequences evolved from a common ancestor
In practice:
–Sum of Pairs
[Figure: tree over nodes x, y, z, w, v, ?]
Multiple alignment:
x: AC-GCGG-C
y: AC-GC-GAG
z: GCCGC-GAG
Induced pairwise alignments:
x: ACGCGG-C    x: AC-GCGG-C    y: AC-GCGAG
y: ACGC-GAG    z: GCCGC-GAG    z: GCCGCGAG
Algorithms
MDP (multidimensional dynamic programming)
Progressive alignment
Iterative refinement
Restricted DP
MDP
Similar to pair-wise alignment
–O(2^N L^N) running time
–O(L^N) memory
For three sequences:
$$
F(i,j,k) = \max \begin{cases}
F(i-1,j-1,k-1) + S(x_i, x_j, x_k) \\
F(i-1,j-1,k) + S(x_i, x_j, -) \\
F(i-1,j,k-1) + S(x_i, -, x_k) \\
F(i,j-1,k-1) + S(-, x_j, x_k) \\
F(i-1,j,k) + S(x_i, -, -) \\
F(i,j-1,k) + S(-, x_j, -) \\
F(i,j,k-1) + S(-, -, x_k)
\end{cases}
$$
[Figure: cell (i,j,k) and its seven predecessor cells (i-1,j-1,k-1), (i-1,j-1,k), (i-1,j,k-1), (i,j-1,k-1), (i-1,j,k), (i,j-1,k), (i,j,k-1)]
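The recurrence above can be sketched for three sequences as follows. This is only an illustration under stated assumptions: the function names, the match/mismatch/gap values, and the choice of a sum-of-pairs column score for S are not from the lecture, and the three sequences are called x, y, z here.

```python
from itertools import product

def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score one pair of symbols (scoring values are assumed, not from the slides)."""
    if a == "-" and b == "-":
        return 0                      # gap against gap contributes nothing
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch

def column_score(a, b, c):
    """S(a, b, c): sum-of-pairs score of one alignment column."""
    return pair_score(a, b) + pair_score(a, c) + pair_score(b, c)

def msa3_score(x, y, z):
    """Optimal sum-of-pairs score for aligning three sequences by 3-D DP."""
    n, m, p = len(x), len(y), len(z)
    neg_inf = float("-inf")
    F = [[[neg_inf] * (p + 1) for _ in range(m + 1)] for _ in range(n + 1)]
    F[0][0][0] = 0
    # The seven predecessor moves: each sequence either advances (1) or gets a gap (0).
    moves = [d for d in product((0, 1), repeat=3) if any(d)]
    for i in range(n + 1):
        for j in range(m + 1):
            for k in range(p + 1):
                if i == j == k == 0:
                    continue
                best = neg_inf
                for di, dj, dk in moves:
                    pi, pj, pk = i - di, j - dj, k - dk
                    if pi < 0 or pj < 0 or pk < 0:
                        continue
                    a = x[i - 1] if di else "-"
                    b = y[j - 1] if dj else "-"
                    c = z[k - 1] if dk else "-"
                    best = max(best, F[pi][pj][pk] + column_score(a, b, c))
                F[i][j][k] = best
    return F[n][m][p]

print(msa3_score("ACGCGGC", "ACGCGAG", "GCCGCGAG"))
```

The table has (L+1)^3 cells and each cell looks at 2^3 - 1 = 7 predecessors, which is where the O(2^N L^N) running time comes from for N = 3.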
Progressive alignment
Most popular multiple alignment algorithm
–CLUSTALW
Main idea:
–Construct a guide tree based on pair-wise alignment scores
–Align the most similar sequences first
–Progressively add other sequences
Pros: fast (O(NL^2))
Cons: an initial bad alignment is frozen
Iterative Refinement
Basic idea:
–Do progressive alignment first
–Iteratively: remove a sequence, and realign it back while keeping the rest fixed
A note on its convergence guarantee
–Every time we realign a sequence, the alignment score does not decrease
–Therefore, the algorithm must converge to either a global or a local maximum
Restricted MDP
Similar to bounded DP in pair-wise alignment
1. Construct progressive multiple alignment m
2. Run MDP, restricted to radius R from m
Running time: O(2^N R^(N-1) L)
[Figure: axes x, y, z]
Today
Probability and statistics
Biology background for motif finding
Probability Basics
Definition (informal)
–Probabilities are numbers assigned to events that indicate “how likely” it is that the event will occur when a random experiment is performed
–A probability law for a random experiment is a rule that assigns probabilities to the events in the experiment
–The sample space S of a random experiment is the set of all possible outcomes
Example
0 ≤ P(A_i) ≤ 1
P(S) = 1
Random variable
A random variable is a function from the sample space to the space of possible values of the variable
–When we toss a coin repeatedly, the number of times that we see heads is a random variable
–Can be discrete or continuous
Discrete: the resulting number after rolling a die
Continuous: the weight of an individual
Cumulative distribution function (cdf)
The cumulative distribution function F_X(x) of a random variable X is defined as the probability of the event {X ≤ x}
F_X(x) = P(X ≤ x) for −∞ < x < +∞
Probability density function (pdf)
The probability density function of a continuous random variable X, if it exists, is defined as the derivative of F_X(x):
f_X(x) = dF_X(x) / dx
For discrete random variables, the equivalent to the pdf is the probability mass function (pmf):
p_X(x) = P(X = x)
Probability density function vs probability
What is the probability of somebody weighing 200 lb?
–The figure shows about 0.62
–What is the probability of […] lb?
The right question would be:
–What’s the probability of somebody weighing […] lb?
A density value is not a probability: for a continuous variable, probabilities come from integrating the pdf over an interval
The probability mass function gives true probabilities
–For a fair die, the chance of getting any face is 1/6
Some common distributions
Discrete:
–Binomial
–Multinomial
–Geometric
–Hypergeometric
–Poisson
Continuous:
–Normal (Gaussian)
–Uniform
–EVD (extreme value distribution)
–Gamma
–Beta
–…
Probabilistic Calculus
If A, B are mutually exclusive:
–P(A ∪ B) = P(A) + P(B)
Thus: P(not(A)) = P(A^c) = 1 – P(A)
[Figure: Venn diagram of A and B]
Probabilistic Calculus
In general: P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
Conditional probability
The joint probability of two events A and B, P(A ∩ B), or simply P(A, B), is the probability that events A and B occur at the same time.
The conditional probability P(A | B) is the probability that A occurs given that B occurred.
P(A | B) = P(A ∩ B) / P(B)
Example
Roll a die
–If I tell you the number is less than 4
–What is the probability of an even number?
P(d = even | d < 4) = P(d = even ∩ d < 4) / P(d < 4)
= P(d = 2) / P(d = 1, 2, or 3)
= (1/6) / (3/6) = 1/3
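The die example can be checked by brute-force enumeration. This is a small sketch, assuming a fair six-sided die with equally likely outcomes; the helper names `prob` and `cond_prob` are made up for illustration.

```python
from fractions import Fraction

def prob(event, sample_space):
    """P(event) when every outcome in sample_space is equally likely."""
    favorable = [o for o in sample_space if event(o)]
    return Fraction(len(favorable), len(sample_space))

def cond_prob(a, b, sample_space):
    """P(A | B) = P(A ∩ B) / P(B)."""
    p_b = prob(b, sample_space)
    p_ab = prob(lambda o: a(o) and b(o), sample_space)
    return p_ab / p_b

die = range(1, 7)                      # faces 1..6
is_even = lambda d: d % 2 == 0
less_than_4 = lambda d: d < 4

p = cond_prob(is_even, less_than_4, die)
print(p)  # 1/3
```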
Independence
P(A | B) = P(A ∩ B) / P(B)
=> P(A ∩ B) = P(B) * P(A | B)
A, B are independent iff
–P(A ∩ B) = P(A) * P(B)
–That is, P(A) = P(A | B)
Also implies that P(B) = P(B | A)
–P(A ∩ B) = P(B) * P(A | B) = P(A) * P(B | A)
Examples
Are the events d = even and d < 4 independent?
–P(d = even and d < 4) = 1/6
–P(d = even) = ½
–P(d < 4) = ½
–½ * ½ > 1/6, so they are not independent
If your die actually has 8 faces, will d = even and d < 5 be independent?
Are "even in first roll" and "even in second roll" independent?
For a playing card drawn from a deck, are the suit and rank independent?
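The first two questions can be settled by enumeration. The sketch below assumes fair dice with equally likely faces; `is_independent` is a hypothetical helper that simply tests whether P(A ∩ B) = P(A) * P(B).

```python
from fractions import Fraction

def is_independent(a, b, faces):
    """Check P(A ∩ B) == P(A) * P(B) for a fair die with the given number of faces."""
    outcomes = range(1, faces + 1)

    def p(event):
        return Fraction(sum(1 for o in outcomes if event(o)), faces)

    return p(lambda o: a(o) and b(o)) == p(a) * p(b)

even = lambda d: d % 2 == 0

six_sided = is_independent(even, lambda d: d < 4, faces=6)
eight_sided = is_independent(even, lambda d: d < 5, faces=8)
print(six_sided, eight_sided)  # False True
```

For the 8-faced die, P(even) = ½, P(d < 5) = ½ and P(even and d < 5) = 2/8 = ¼, so the product rule holds exactly.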
Theorem of total probability
Let B_1, B_2, …, B_N be mutually exclusive events whose union equals the sample space S. We refer to these sets as a partition of S.
An event A can be represented as:
A = A ∩ S = (A ∩ B_1) ∪ (A ∩ B_2) ∪ … ∪ (A ∩ B_N)
Since B_1, B_2, …, B_N are mutually exclusive,
P(A) = P(A ∩ B_1) + P(A ∩ B_2) + … + P(A ∩ B_N)
And therefore
P(A) = P(A | B_1) * P(B_1) + P(A | B_2) * P(B_2) + … + P(A | B_N) * P(B_N) = Σ_i P(A | B_i) * P(B_i)
Example
Roll a loaded die: 6 comes up 50% of the time, and each of 1 to 5 comes up 10% of the time
What’s the probability of an even number?
Prob(even) = Prob(even | d < 6) * Prob(d < 6) + Prob(even | d = 6) * Prob(d = 6)
= 2/5 * 0.5 + 1 * 0.5 = 0.7
Another example
We have a box of dice: 99% of them are fair, with probability 1/6 for each face; 1% are loaded so that six comes up 50% of the time.
We pick a die at random and roll it. What’s the probability that we get a six?
P(six) = P(six | fair) * P(fair) + P(six | loaded) * P(loaded)
= 1/6 * 0.99 + 0.5 * 0.01 = 0.17 > 1/6
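Both total-probability examples reduce to a weighted sum over a partition. This sketch uses exact fractions; the function name `total_probability` is an illustrative choice, not from the lecture.

```python
from fractions import Fraction

def total_probability(cond_probs, priors):
    """P(A) = sum_i P(A | B_i) * P(B_i) over a partition B_1..B_N."""
    assert sum(priors) == 1, "the B_i must partition the sample space"
    return sum(c * p for c, p in zip(cond_probs, priors))

# Box of dice: B = {fair, loaded}, A = "roll a six".
p_six = total_probability(
    cond_probs=[Fraction(1, 6), Fraction(1, 2)],
    priors=[Fraction(99, 100), Fraction(1, 100)],
)

# Loaded die: B = {d < 6, d = 6}, A = "even number".
p_even = total_probability(
    cond_probs=[Fraction(2, 5), Fraction(1)],
    priors=[Fraction(1, 2), Fraction(1, 2)],
)

print(p_six, p_even)  # 17/100 7/10
```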
Bayes theorem
P(A ∩ B) = P(B) * P(A | B) = P(A) * P(B | A)
=> P(B | A) = P(A | B) * P(B) / P(A)
–P(B | A): posterior probability of B given A
–P(A | B): likelihood
–P(B): prior of B
–P(A): normalizing constant
This is known as Bayes Theorem or Bayes Rule, and is (one of) the most useful relations in probability and statistics
Bayes Theorem is definitely the fundamental relation in Statistical Pattern Recognition
Bayes theorem (cont’d)
Given B_1, B_2, …, B_N, a partition of the sample space S. Suppose that event A occurs; what is the probability of event B_j?
P(B_j | A) = P(A | B_j) * P(B_j) / P(A)
= P(A | B_j) * P(B_j) / [Σ_k P(A | B_k) * P(B_k)]
B_j: different models
Given the observation A, should you choose the model that maximizes P(B_j | A) or P(A | B_j)?
It depends on how much you know about B_j!
Example
Prosecutor’s fallacy
–Some crime happened
–The criminal did not leave any evidence, except some hair
–The police got his DNA from his hair
Some expert matched the DNA with that of a suspect
–The expert said that both the false-positive and false-negative rates are 10^-6
Can this be used as evidence of guilt against the suspect?
Prosecutor’s fallacy
Prob (match | innocent) = 10^-6
Prob (no match | guilty) = 10^-6
Prob (match | guilty) = ~ 1
Prob (no match | innocent) = ~ 1
Prob (guilty | match) = ?
Prosecutor’s fallacy
P(g | m) = P(m | g) * P(g) / P(m) ~ P(g) / P(m)
P(g): the probability for someone to be guilty with no other evidence
P(m): the probability of a DNA match
How to get these two numbers?
–We don’t really care about P(m)
–We want to compare two models: P(g | m) and P(i | m)
Prosecutor’s fallacy
P(i | m) = P(m | i) * P(i) / P(m) = 10^-6 * P(i) / P(m)
Therefore P(i | m) / P(g | m) = 10^-6 * P(i) / P(g)
P(i) + P(g) = 1
It is clear, therefore, that whether we can conclude the suspect is guilty depends on the prior probability P(i)
How do you get P(i)?
Prosecutor’s fallacy
How do you get P(i)?
It depends on what other information you have on the suspect
Say the suspect has no other connection with the crime, and the overall crime rate is 10^-7
That’s a reasonable prior for P(g)
P(g) = 10^-7, P(i) ~ 1
P(i | m) / P(g | m) = 10^-6 * P(i) / P(g) = 10^-6 / 10^-7 = 10
Likelihood-ratio (LR) test: P(observation | model1) / P(observation | model2)
Often take the logarithm, e.g. log (P(m | i) / P(m | g))
–Log likelihood ratio (LLR) score
–Or log odds ratio (score)
Bayesian model selection:
log (P(model1 | observation) / P(model2 | observation)) = LLR + log P(model1) - log P(model2)
Prosecutor’s fallacy
P(i | m) / P(g | m) = 10^-6 / 10^-7 = 10
Therefore, we would say the suspect is more likely to be innocent than guilty, given only the DNA samples
We can also explicitly calculate P(i | m):
P(m) = P(m | i) * P(i) + P(m | g) * P(g) = 10^-6 * 1 + 1 * 10^-7 = 1.1 x 10^-6
P(i | m) = P(m | i) * P(i) / P(m) = 1 / 1.1 = 0.91
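The calculation can be reproduced numerically. The sketch below assumes an error rate of 10^-6 for the DNA test (inferred from the ratio of 10 on the slide, so treat it as an assumption) and the prior P(g) = 10^-7; the function and constant names are illustrative only.

```python
def posterior_innocent(p_match_given_innocent, p_match_given_guilty, prior_guilty):
    """P(innocent | match) by Bayes theorem with the total-probability denominator."""
    prior_innocent = 1.0 - prior_guilty
    p_match = (p_match_given_innocent * prior_innocent
               + p_match_given_guilty * prior_guilty)
    return p_match_given_innocent * prior_innocent / p_match

ERROR_RATE = 1e-6     # assumed false-positive and false-negative rate
PRIOR_GUILTY = 1e-7   # prior P(g): overall crime rate, no other evidence

p_i = posterior_innocent(ERROR_RATE, 1 - ERROR_RATE, PRIOR_GUILTY)
odds = p_i / (1 - p_i)   # P(i | m) / P(g | m)
print(round(p_i, 2), round(odds, 1))
```

Unlike the slide, this version does not approximate P(i) and P(m | g) by 1, yet the result still rounds to about 0.91 and odds of about 10 to 1 in favor of innocence.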
Prosecutor’s fallacy
If you have other evidence, P(g) could be much larger than the average crime rate
–In that case, the DNA test may give you higher confidence
How to decide the prior?
–Subjective?
–Important?
–Historically, there have been debates about Bayesian statistics
–Some strongly support it, some are strongly against it
–Growing interest in many fields
However, there is no question about conditional probability
If all priors are equally likely, decisions based on Bayesian inference and the likelihood test are equivalent
We use whichever is appropriate
Another example
A test for a rare disease claims that it will report a positive result for 99.5% of people with the disease, and a negative result for 99.9% of those without it.
The disease is present in the population at 1 in 100,000
What is P(disease | positive test)?
What is P(disease | negative test)?
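One way to work this exercise is sketched below. It assumes the 99.9% figure is the rate of negative results among people without the disease (the specificity); the function and parameter names are illustrative.

```python
def posterior_disease(sensitivity, specificity, prevalence, test_positive):
    """P(disease | test result) via Bayes theorem.

    sensitivity = P(positive | disease), specificity = P(negative | no disease).
    """
    if test_positive:
        like_disease, like_healthy = sensitivity, 1 - specificity
    else:
        like_disease, like_healthy = 1 - sensitivity, specificity
    numerator = like_disease * prevalence
    return numerator / (numerator + like_healthy * (1 - prevalence))

p_pos = posterior_disease(0.995, 0.999, 1e-5, test_positive=True)
p_neg = posterior_disease(0.995, 0.999, 1e-5, test_positive=False)
print(f"P(disease | positive) = {p_pos:.4f}")
print(f"P(disease | negative) = {p_neg:.2e}")
```

Under these assumptions a positive result still leaves the probability of disease below 1%, because false positives from the healthy majority far outnumber true positives.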
Yet another example
We’ve talked about the box of dice: 99% fair, 1% loaded (50% at six)
We said that if we randomly pick a die and roll it, we have a 17% chance of getting a six
If we get three sixes in a row, what’s the chance that the die is loaded?
How about five sixes in a row?
P(loaded | 3 sixes in a row) = P(3 sixes in a row | loaded) * P(loaded) / P(3 sixes in a row)
= 0.5^3 * 0.01 / (0.5^3 * 0.01 + (1/6)^3 * 0.99) = 0.21
P(loaded | 5 sixes in a row) = P(5 sixes in a row | loaded) * P(loaded) / P(5 sixes in a row)
= 0.5^5 * 0.01 / (0.5^5 * 0.01 + (1/6)^5 * 0.99) = 0.71
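The same posterior can be computed for any number of consecutive sixes. This is a minimal sketch using the numbers from the example (1% loaded dice, six at 50% when loaded); the function name and default arguments are illustrative.

```python
def posterior_loaded(n_sixes, prior_loaded=0.01, p_six_loaded=0.5, p_six_fair=1 / 6):
    """P(loaded | n sixes in a row) for a die drawn from the box."""
    joint_loaded = p_six_loaded ** n_sixes * prior_loaded
    joint_fair = p_six_fair ** n_sixes * (1 - prior_loaded)
    return joint_loaded / (joint_loaded + joint_fair)

for n in (1, 3, 5, 10):
    print(n, round(posterior_loaded(n), 2))
```

Each extra six multiplies the odds of "loaded" by the likelihood ratio 0.5 / (1/6) = 3, which is why the posterior climbs quickly despite the small prior.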
Relation to multiple testing problem
When searching a DNA sequence against a database, you get a high score, with a significant p-value
P(unrelated | high score) / P(related | high score) = [P(high score | unrelated) / P(high score | related)] * [P(unrelated) / P(related)]
The first factor is the likelihood ratio: P(high score | unrelated) is much smaller than P(high score | related)
But your database is huge, and most sequences should be unrelated, so P(unrelated) is much larger than P(related)
Question
We’ve seen that given a sequence of observations and two models, we can test which model is more likely to have generated the data
–Is the die loaded or fair?
–Either likelihood test or Bayes inference
Given a set of observations and a model, can you estimate the parameters?
–Given the results of rolling a die, how do we infer the probability of each face?
Question
You are told that there are two dice: one is loaded so that six comes up 50% of the time, and one is fair.
You are given a series of numbers resulting from rolling the two dice
Assume die switching is rare
Can you tell which number was generated by which die?
Question
You are told that there are two dice: one is loaded, and one is fair. But you don’t know how it is loaded
You are given a series of numbers resulting from rolling the two dice
Assume die switching is rare
Can you tell how the die is loaded and which number was generated by which die?