CS5263 Bioinformatics Lecture 9: Motif finding - Biological & statistical background

Roadmap
- Review of last lecture
- Intro to probability and statistics
- Intro to motif finding problems
  - Biological background

Multiple Sequence Alignment

Scoring functions
- Ideally: maximize the probability that the sequences evolved from a common ancestor (in the slide's figure, sequences x, y, z are related through ancestral sequences w and v)
- In practice: sum-of-pairs, e.g. for the multiple alignment
    x: AC-GCGG-C
    y: AC-GC-GAG
    z: GCCGC-GAG
  score each induced pairwise alignment (x/y, x/z, y/z) and add the scores up

Algorithms
- MDP
- Progressive alignment
- Iterative refinement
- Restricted DP

MDP
- Similar to pair-wise alignment, but with an N-dimensional DP table
  - O(2^N L^N) running time
  - O(L^N) memory
- Recurrence for three sequences x, y, z:

  F(i,j,k) = max {
      F(i-1, j-1, k-1) + S(x_i, y_j, z_k),
      F(i-1, j-1, k  ) + S(x_i, y_j, -  ),
      F(i-1, j,   k-1) + S(x_i, -,   z_k),
      F(i,   j-1, k-1) + S(-,   y_j, z_k),
      F(i-1, j,   k  ) + S(x_i, -,   -  ),
      F(i,   j-1, k  ) + S(-,   y_j, -  ),
      F(i,   j,   k-1) + S(-,   -,   z_k)
  }

  i.e., cell (i,j,k) is reached from one of its 2^3 - 1 = 7 neighboring cells
  (i-1,j-1,k-1), (i-1,j-1,k), (i-1,j,k-1), (i,j-1,k-1), (i-1,j,k), (i,j-1,k), (i,j,k-1)
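To make the recurrence concrete, here is a minimal Python sketch of the three-sequence case. The scoring scheme (sum-of-pairs per column: +1 match, 0 mismatch, -1 for a gap against a residue, 0 for gap-gap) and the function names are illustrative assumptions, not the scoring used in the course.

```python
# Toy 3-sequence alignment DP (the N = 3 case of MDP), for illustration only.
# Real MSA scoring would use a substitution matrix and affine gap penalties.

def column_score(a, b, c):
    """Sum-of-pairs score of one alignment column (characters or '-')."""
    def pair(p, q):
        if p == '-' and q == '-':
            return 0
        if p == '-' or q == '-':
            return -1
        return 1 if p == q else 0
    return pair(a, b) + pair(a, c) + pair(b, c)

def msa3_score(x, y, z):
    """Optimal score of aligning three sequences: O(L^3) cells, 7 predecessors each."""
    nx, ny, nz = len(x), len(y), len(z)
    NEG = float('-inf')
    # F[i][j][k] = best score for aligning prefixes x[:i], y[:j], z[:k]
    F = [[[NEG] * (nz + 1) for _ in range(ny + 1)] for _ in range(nx + 1)]
    F[0][0][0] = 0
    for i in range(nx + 1):
        for j in range(ny + 1):
            for k in range(nz + 1):
                if i == j == k == 0:
                    continue
                xi = x[i - 1] if i > 0 else None
                yj = y[j - 1] if j > 0 else None
                zk = z[k - 1] if k > 0 else None
                best = NEG
                if i and j and k:
                    best = max(best, F[i-1][j-1][k-1] + column_score(xi, yj, zk))
                if i and j:
                    best = max(best, F[i-1][j-1][k] + column_score(xi, yj, '-'))
                if i and k:
                    best = max(best, F[i-1][j][k-1] + column_score(xi, '-', zk))
                if j and k:
                    best = max(best, F[i][j-1][k-1] + column_score('-', yj, zk))
                if i:
                    best = max(best, F[i-1][j][k] + column_score(xi, '-', '-'))
                if j:
                    best = max(best, F[i][j-1][k] + column_score('-', yj, '-'))
                if k:
                    best = max(best, F[i][j][k-1] + column_score('-', '-', zk))
                F[i][j][k] = best
    return F[nx][ny][nz]

# e.g. the (ungapped) sequences from the scoring-functions slide
print(msa3_score("ACGCGGC", "ACGCGAG", "GCCGCGAG"))
```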

Progressive alignment
- Most popular multiple alignment algorithm (e.g., CLUSTALW)
- Main idea:
  - Construct a guide tree based on pair-wise alignment scores
  - Align the most similar sequences first
  - Progressively add the remaining sequences
- Pros: fast, O(N L^2)
- Cons: a bad initial alignment is frozen

Iterative Refinement
- Basic idea:
  - Do progressive alignment first
  - Iteratively: remove one sequence and realign it to the rest, keeping the rest fixed
- A note on its convergence guarantee:
  - Every time we realign a sequence, the alignment score can only improve (or stay the same)
  - Therefore, the algorithm must converge, to either a global or a local maximum

Restricted MDP
- Similar to bounded DP in pair-wise alignment:
  1. Construct a progressive multiple alignment m
  2. Run MDP, restricted to a radius of R around m
- Running time: O(2^N R^(N-1) L)

Today
- Probability and statistics
- Biology background for motif finding

Probability Basics
Definition (informal):
- Probabilities are numbers assigned to events that indicate "how likely" it is that the event will occur when a random experiment is performed
- A probability law for a random experiment is a rule that assigns probabilities to the events of the experiment
- The sample space S of a random experiment is the set of all possible outcomes

Example
- 0 ≤ P(A_i) ≤ 1 for any event A_i
- P(S) = 1

Random variable
- A random variable is a function from the sample space to the space of possible values of the variable
  - When we toss a coin repeatedly, the number of times we see heads is a random variable
- Can be discrete or continuous
  - Discrete: the resulting number after rolling a die
  - Continuous: the weight of an individual

Cumulative distribution function (cdf)
The cumulative distribution function F_X(x) of a random variable X is defined as the probability of the event {X ≤ x}:

  F_X(x) = P(X ≤ x)   for -∞ < x < +∞

Probability density function (pdf)
The probability density function of a continuous random variable X, if it exists, is defined as the derivative of F_X(x):

  f_X(x) = dF_X(x) / dx

For discrete random variables, the counterpart of the pdf is the probability mass function (pmf):

  p_X(x) = P(X = x)

Probability density function vs. probability
- What is the probability that somebody weighs 200 lb? The figure shows about 0.62, but that value is a density, not a probability
  - The probability of weighing exactly 200.00... lb is zero for a continuous variable
  - The right question would be: what is the probability of weighing within some interval around 200 lb (say, 199.5 to 200.5 lb)?
- The probability mass function, in contrast, gives true probabilities
  - For a fair die, the chance of getting any particular face is 1/6
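As a sketch of the distinction, the snippet below evaluates a normal density at a single point versus the probability of a small interval, and contrasts both with a die's pmf. The weight-distribution parameters (mean 180 lb, sd 25 lb) are made up purely for illustration and are not the ones behind the slide's figure.

```python
# Density vs. probability: a density value at a point is not a probability.
import math

MU, SIGMA = 180.0, 25.0   # illustrative parameters, not from the lecture figure

def normal_pdf(x):
    return math.exp(-0.5 * ((x - MU) / SIGMA) ** 2) / (SIGMA * math.sqrt(2 * math.pi))

def normal_cdf(x):
    return 0.5 * (1 + math.erf((x - MU) / (SIGMA * math.sqrt(2))))

# The density at exactly 200 lb is not a probability; P(X = 200) = 0 here.
print("density at 200 lb:", normal_pdf(200.0))
# A real probability needs an interval:
print("P(199.5 <= X <= 200.5):", normal_cdf(200.5) - normal_cdf(199.5))

# A pmf, by contrast, gives true probabilities directly: a fair die.
die_pmf = {face: 1 / 6 for face in range(1, 7)}
print("P(die = 3):", die_pmf[3])
```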

Some common distributions
- Discrete: binomial, multinomial, geometric, hypergeometric, Poisson
- Continuous: normal (Gaussian), uniform, extreme value distribution (EVD), gamma, beta, ...

Probabilistic Calculus
- If A and B are mutually exclusive: P(A ∪ B) = P(A) + P(B)
- Thus: P(not(A)) = P(A^c) = 1 - P(A)

Probabilistic Calculus
- In general: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

Conditional probability
- The joint probability of two events A and B, written P(A ∩ B) or simply P(A, B), is the probability that A and B occur at the same time
- The conditional probability P(A | B) is the probability that A occurs given that B has occurred:

  P(A | B) = P(A ∩ B) / P(B)

Example
- Roll a die. If I tell you the number is less than 4, what is the probability of an even number?

  P(d = even | d < 4) = P(d = even ∩ d < 4) / P(d < 4)
                      = P(d = 2) / P(d = 1, 2, or 3)
                      = (1/6) / (3/6) = 1/3
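A quick way to sanity-check such conditional probabilities is to enumerate the sample space directly; this small sketch reproduces the 1/3 above.

```python
# P(even | d < 4) for a fair die, by enumerating the six equally likely outcomes.
outcomes = range(1, 7)
p = 1 / 6
p_less4 = sum(p for d in outcomes if d < 4)                          # P(d < 4) = 1/2
p_even_and_less4 = sum(p for d in outcomes if d % 2 == 0 and d < 4)  # = 1/6
print(p_even_and_less4 / p_less4)                                    # 0.333... = 1/3
```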

Independence
- From P(A | B) = P(A ∩ B) / P(B) we get P(A ∩ B) = P(B) * P(A | B)
- A and B are independent iff P(A ∩ B) = P(A) * P(B)
  - That is, P(A) = P(A | B)
  - This also implies P(B) = P(B | A), since P(A ∩ B) = P(B) * P(A | B) = P(A) * P(B | A)

Examples
- Are the events d = even and d < 4 independent (fair six-sided die)?
  - P(d = even and d < 4) = 1/6
  - P(d = even) = 1/2, P(d < 4) = 1/2
  - 1/2 * 1/2 = 1/4 > 1/6, so they are not independent
- If your die actually has 8 faces, will d = even and d < 5 be independent?
- Are "even in the first roll" and "even in the second roll" independent?
- For a playing card, are the suit and the rank independent?
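These questions can be checked mechanically against the definition P(A ∩ B) = P(A) * P(B). The sketch below does this by enumeration for the two die examples; the helper names are made up for illustration.

```python
# Independence check by enumeration over equally likely faces.
from fractions import Fraction

def prob(event, faces):
    """Probability of an event (a predicate on the face) for a fair die."""
    hits = sum(1 for d in faces if event(d))
    return Fraction(hits, len(faces))

def independent(a, b, faces):
    return prob(lambda d: a(d) and b(d), faces) == prob(a, faces) * prob(b, faces)

even = lambda d: d % 2 == 0
print(independent(even, lambda d: d < 4, range(1, 7)))  # False: 1/6 != 1/4
print(independent(even, lambda d: d < 5, range(1, 9)))  # 8-sided die: 1/4 == 1/2 * 1/2, True
```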

Theorem of total probability
- Let B_1, B_2, ..., B_N be mutually exclusive events whose union equals the sample space S. We refer to these sets as a partition of S.
- An event A can be represented as:

  A = (A ∩ B_1) ∪ (A ∩ B_2) ∪ ... ∪ (A ∩ B_N)

- Since B_1, B_2, ..., B_N are mutually exclusive:

  P(A) = P(A ∩ B_1) + P(A ∩ B_2) + ... + P(A ∩ B_N)

- And therefore:

  P(A) = P(A | B_1) P(B_1) + P(A | B_2) P(B_2) + ... + P(A | B_N) P(B_N)
       = Σ_i P(A | B_i) P(B_i)

Example
- Roll a loaded die: 50% of the time it shows 6, and 10% of the time each of the faces 1 to 5
- What is the probability of getting an even number?

  P(even) = P(even | d < 6) * P(d < 6) + P(even | d = 6) * P(d = 6)
          = 2/5 * 0.5 + 1 * 0.5 = 0.7

Another example
- We have a box of dice: 99% of them are fair (probability 1/6 for each face), and 1% are loaded so that a six comes up 50% of the time
- We pick a die at random and roll it; what is the probability of getting a six?

  P(six) = P(six | fair) * P(fair) + P(six | loaded) * P(loaded)
         = 1/6 * 0.99 + 0.5 * 0.01 = 0.17 > 1/6
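Both of these total-probability calculations are one-liners; a minimal sketch, assuming exactly the numbers stated above:

```python
# Total probability: P(six) when a die is drawn from a box of 99% fair, 1% loaded dice.
p_fair, p_loaded = 0.99, 0.01
p_six_given_fair, p_six_given_loaded = 1 / 6, 0.5

p_six = p_six_given_fair * p_fair + p_six_given_loaded * p_loaded
print(p_six)                       # 0.17, slightly above 1/6 because of the loaded dice

# Same pattern for the single loaded die in the previous example:
p_even = (2 / 5) * 0.5 + 1.0 * 0.5
print(p_even)                      # 0.7
```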

Bayes theorem
- From P(A ∩ B) = P(B) * P(A | B) = P(A) * P(B | A), it follows that

  P(A | B) = P(B | A) * P(A) / P(B)

  where P(A | B) is the posterior probability of A, P(B | A) is the likelihood, P(A) is the prior of A, and P(B) is the normalizing constant
- This is known as Bayes Theorem or Bayes Rule, and is (one of) the most useful relations in probability and statistics
- Bayes Theorem is the fundamental relation in statistical pattern recognition

Bayes theorem (cont'd)
- Given B_1, B_2, ..., B_N, a partition of the sample space S, suppose that event A occurs; what is the probability of event B_j?

  P(B_j | A) = P(A | B_j) * P(B_j) / P(A)
             = P(A | B_j) * P(B_j) / Σ_i P(A | B_i) * P(B_i)

- The B_j can be thought of as different models
- Given the observation A, should you choose the model that maximizes P(B_j | A) or P(A | B_j)? That depends on how much you know about B_j!

Example: Prosecutor's fallacy
- A crime happened; the suspect left no evidence except some hair
- The police obtained DNA from the hair, and an expert matched it with the DNA of a suspect
- The expert said that both the false-positive and the false-negative rates of the test are 10^-6
- Can this be used as evidence of guilt against the suspect?

Prosecutor's fallacy
- P(match | innocent) = 10^-6
- P(no match | guilty) = 10^-6
- P(match | guilty) ≈ 1
- P(no match | innocent) ≈ 1
- P(guilty | match) = ?

Prosecutor's fallacy
- P(g | m) = P(m | g) * P(g) / P(m) ≈ P(g) / P(m), since P(m | g) ≈ 1
- P(g): the probability that someone is guilty given no other evidence
- P(m): the probability of a DNA match
- How do we get these two numbers?
  - We don't really care about P(m)
  - We want to compare two models: P(g | m) and P(i | m)

Prosecutor's fallacy
- P(i | m) = P(m | i) * P(i) / P(m) = 10^-6 * P(i) / P(m)
- Therefore P(i | m) / P(g | m) = 10^-6 * P(i) / P(g), with P(i) + P(g) = 1
- It is clear, therefore, that whether we can conclude the suspect is guilty depends on the prior probability P(i)
- How do you get P(i)?

Prosecutor's fallacy
- How do you get P(i)? It depends on what other information you have about the suspect
- Say the suspect has no other connection with the crime, and the overall crime rate is 10^-7; that is a reasonable prior for P(g)
- Then P(g) = 10^-7 and P(i) ≈ 1
- P(i | m) / P(g | m) = 10^-6 * P(i) / P(g) ≈ 10^-6 / 10^-7 = 10

- P(observation | model1) / P(observation | model2): a likelihood-ratio (LR) test
- Often we take the logarithm: log(P(m | i) / P(m | g))
  - called the log likelihood ratio (LLR) score, or log odds ratio (score)
- Bayesian model selection compares posteriors instead:

  log(P(model1 | observation) / P(model2 | observation)) = LLR + log P(model1) - log P(model2)

Prosecutor's fallacy
- P(i | m) / P(g | m) ≈ 10^-6 / 10^-7 = 10
- Therefore, we would say the suspect is more likely to be innocent than guilty, given only the DNA evidence
- We can also explicitly calculate P(i | m):

  P(m) = P(m | i) * P(i) + P(m | g) * P(g) = 10^-6 * 1 + 1 * 10^-7 = 1.1 x 10^-6
  P(i | m) = P(m | i) * P(i) / P(m) = 10^-6 / (1.1 x 10^-6) = 1 / 1.1 ≈ 0.91
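The same arithmetic, written out as a short sketch using the numbers assumed above (test error rate 10^-6, prior P(g) = 10^-7):

```python
# Prosecutor's fallacy, worked numerically.
p_match_given_innocent = 1e-6     # false-positive rate of the DNA match
p_match_given_guilty   = 1.0      # approximately
p_guilty = 1e-7                   # prior: overall crime rate
p_innocent = 1 - p_guilty         # ~ 1

p_match = (p_match_given_innocent * p_innocent
           + p_match_given_guilty * p_guilty)
p_innocent_given_match = p_match_given_innocent * p_innocent / p_match

print(p_match)                    # ~ 1.1e-6
print(p_innocent_given_match)     # ~ 0.91: innocence is still the more likely model
```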

Prosecutor's fallacy
- If you have other evidence, P(g) could be much larger than the average crime rate; in that case, the DNA test may give you higher confidence
- How to decide the prior?
  - Subjective? Important?
  - There have been debates about Bayesian statistics historically: some strongly support it, some are strongly against it, and there is growing interest in many fields
  - However, there is no question about conditional probability itself
  - If all priors are equal, decisions based on Bayesian inference and likelihood-ratio tests are equivalent
  - We use whichever is appropriate

Another example
- A test for a rare disease reports a positive result for 99.5% of people with the disease, and a negative result for 99.9% of people without it
- The disease is present in the population at a rate of 1 in 100,000
- What is P(disease | positive test)?
- What is P(disease | negative test)?
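A sketch of how one would answer these two questions with Bayes rule and the theorem of total probability, treating 99.5% as the sensitivity and 99.9% as the specificity of the test:

```python
# Bayes rule for the rare-disease test.
p_disease = 1e-5                        # 1 in 100,000
p_pos_given_disease = 0.995             # sensitivity
p_neg_given_healthy = 0.999             # specificity
p_pos_given_healthy = 1 - p_neg_given_healthy
p_healthy = 1 - p_disease

p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * p_healthy
p_neg = 1 - p_pos

print(p_pos_given_disease * p_disease / p_pos)        # P(disease | positive) ~ 0.01
print((1 - p_pos_given_disease) * p_disease / p_neg)  # P(disease | negative) ~ 5e-8
```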

Yet another example
- We've talked about the casino's box of dice: 99% fair, 1% loaded (a six 50% of the time)
- We said that if we randomly pick a die and roll it, we have a 17% chance of getting a six
- If we get 3 sixes in a row, what is the chance that the die is loaded?
- How about 5 sixes in a row?

- P(loaded | 3 sixes in a row)
  = P(3 sixes in a row | loaded) * P(loaded) / P(3 sixes in a row)
  = 0.5^3 * 0.01 / (0.5^3 * 0.01 + (1/6)^3 * 0.99) = 0.21

- P(loaded | 5 sixes in a row)
  = P(5 sixes in a row | loaded) * P(loaded) / P(5 sixes in a row)
  = 0.5^5 * 0.01 / (0.5^5 * 0.01 + (1/6)^5 * 0.99) = 0.71
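The same posterior, parameterized by the number of observed sixes, together with the corresponding log likelihood ratio from the earlier slide; a minimal sketch using the priors stated above (99% fair, 1% loaded):

```python
# Posterior that the die is loaded after k sixes in a row, plus the LLR.
import math

def p_loaded_given_sixes(k, prior_loaded=0.01):
    like_loaded = 0.5 ** k
    like_fair = (1 / 6) ** k
    num = like_loaded * prior_loaded
    den = num + like_fair * (1 - prior_loaded)
    return num / den

for k in (3, 5):
    post = p_loaded_given_sixes(k)
    llr = k * math.log(0.5 / (1 / 6))       # log P(data | loaded) - log P(data | fair)
    print(k, round(post, 2), round(llr, 2)) # k=3 -> 0.21, k=5 -> 0.71
```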

Relation to the multiple testing problem
- When searching a DNA sequence against a database, you get a high score with a significant p-value
- P(unrelated | high score) / P(related | high score)
  = [P(high score | unrelated) / P(high score | related)] * [P(unrelated) / P(related)]
  where the first factor is the likelihood ratio
- P(high score | unrelated) is much smaller than P(high score | related)
- But your database is huge, and most sequences in it should be unrelated, so P(unrelated) is much larger than P(related)

Question
- We've seen that, given a sequence of observations and two models, we can test which model is more likely to have generated the data
  - Is the die loaded or fair?
  - Use either a likelihood-ratio test or Bayesian inference
- Given a set of observations and a model, can you estimate the parameters?
  - Given the results of rolling a die, how do we infer the probability of each face?
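For the die, the maximum-likelihood answer is simply the relative frequency of each face; a minimal sketch with made-up roll data:

```python
# Maximum-likelihood parameter estimation for a die: the ML estimate of each
# face probability is its observed relative frequency.
from collections import Counter

def estimate_die(rolls):
    counts = Counter(rolls)
    n = len(rolls)
    return {face: counts.get(face, 0) / n for face in range(1, 7)}

rolls = [6, 2, 6, 6, 3, 6, 1, 6, 6, 5]   # illustrative data only
print(estimate_die(rolls))               # six gets 0.6 here, suggesting a loaded die
```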

Question
- You are told that there are two dice: one is loaded, with a 50% chance of showing a six, and one is fair
- You are given a series of numbers resulting from rolling the two dice
- Assume switching between the dice is rare
- Can you tell which number was generated by which die?

Question
- You are told that there are two dice: one is loaded and one is fair, but you don't know how the loaded one is biased
- You are given a series of numbers resulting from rolling the two dice
- Assume switching between the dice is rare
- Can you tell how the die is loaded, and which number was generated by which die?