
1 This presentation has been cut and slightly edited from Nir Friedman’s full course of 12 lectures, which is available at www.cs.huji.ac.il/~pmai. Changes made by Dan Geiger, Ydo Wexler and Ma’ayan Fishelson. Tutorial #3 by Ma’ayan Fishelson

2 Example: Binomial Experiment When tossed, a thumbtack can land in one of two positions: Head or Tail. We denote by θ the (unknown) probability P(H). Estimation task: Given a sequence of toss samples x[1], x[2], …, x[M] we want to estimate the probabilities P(H) = θ and P(T) = 1 − θ.
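
As a small added illustration (not part of the original slides), the following Python sketch simulates M thumbtack tosses under an assumed true value of θ; the values θ = 0.3 and M = 10 are arbitrary, chosen only for the example.

    import random

    random.seed(0)          # fixed seed so the example is reproducible
    theta_true = 0.3        # assumed (unknown to the learner) probability of Head
    M = 10                  # number of tosses

    # x[1], ..., x[M]: each toss is 'H' with probability theta_true, else 'T'
    samples = ['H' if random.random() < theta_true else 'T' for _ in range(M)]
    print(samples)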

3 Statistical Parameter Fitting Consider instances x[1], x[2], …, x[M] such that: the set of values x can take is known; each one is sampled from the same distribution; each one is sampled independently of the rest (i.i.d. samples). The task is to find the vector of parameters Θ that generated the given data. This parameter vector Θ can be used to predict future data.

4 The Likelihood Function How good is a particular θ? It depends on how likely it is to generate the observed data: L_D(θ) = P(D | θ) = Π_m P(x[m] | θ). The likelihood for the sequence H, T, T, H, H is: L(θ) = θ · (1−θ) · (1−θ) · θ · θ = θ³(1−θ)². (The original slide plots L(θ) for θ in [0, 1].)
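
A minimal Python sketch of this computation (an addition for illustration, not from the slides): it evaluates L(θ) = θ³(1−θ)² on a grid of θ values and reports where the likelihood is largest.

    # Likelihood of the sequence H, T, T, H, H as a function of theta = P(H).
    def likelihood(theta, n_heads=3, n_tails=2):
        return theta ** n_heads * (1 - theta) ** n_tails

    # Evaluate L(theta) on a grid of 101 points in [0, 1].
    grid = [i / 100 for i in range(101)]
    values = [likelihood(t) for t in grid]

    best = grid[values.index(max(values))]
    print("theta maximizing the likelihood:", best)   # prints 0.6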

5 Sufficient Statistics To compute the likelihood in the thumbtack example we only require N_H and N_T (the number of heads and the number of tails). N_H and N_T are sufficient statistics for the binomial distribution.

6 Sufficient Statistics A sufficient statistic is a function of the data that summarizes the relevant information for the likelihood. Formally, s(D) is a sufficient statistic if, for any two datasets D and D′: s(D) = s(D′) implies L_D(θ) = L_D′(θ). (The original slide illustrates this as a many-to-one mapping from datasets to statistics.)
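
To make this concrete, here is a short illustrative Python check (an added example, not from the slides): two different toss sequences with the same counts (N_H, N_T) yield identical likelihood functions.

    def counts(seq):
        """Sufficient statistics for the thumbtack data: (N_H, N_T)."""
        return seq.count('H'), seq.count('T')

    def likelihood(seq, theta):
        n_h, n_t = counts(seq)
        return theta ** n_h * (1 - theta) ** n_t

    d1 = ['H', 'T', 'T', 'H', 'H']   # two different datasets ...
    d2 = ['T', 'H', 'H', 'H', 'T']   # ... with the same counts (3, 2)

    assert counts(d1) == counts(d2)
    # Same sufficient statistics => same likelihood at every theta.
    for theta in (0.1, 0.3, 0.5, 0.7, 0.9):
        assert likelihood(d1, theta) == likelihood(d2, theta)
    print("equal likelihoods for all tested theta")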

7 Maximum Likelihood Estimation MLE Principle: Choose the parameters that maximize the likelihood function. This is one of the most commonly used estimators in statistics and is intuitively appealing. One usually maximizes the log-likelihood function, defined as l_D(θ) = log_e L_D(θ), which has the same maximizer.

8 Example: MLE in Binomial Data Applying the MLE principle to the binomial likelihood we get θ̂ = N_H / (N_H + N_T). Example: (N_H, N_T) = (3, 2), so the MLE estimate is 3/5 = 0.6 (which coincides with what one would expect). (The original slide again shows the plot of L(θ), with its maximum at 0.6.)
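
As an added numerical sanity check (not from the slides), the snippet below compares the closed-form estimate N_H/(N_H + N_T) with a brute-force grid maximization of the log-likelihood.

    import math

    n_h, n_t = 3, 2   # observed counts from the slide's example

    # Closed-form MLE for the binomial parameter.
    closed_form = n_h / (n_h + n_t)

    # Brute-force maximization of the log-likelihood over a fine grid,
    # avoiding theta = 0 and theta = 1 where the log is undefined.
    def log_likelihood(theta):
        return n_h * math.log(theta) + n_t * math.log(1 - theta)

    grid = [i / 1000 for i in range(1, 1000)]
    numeric = max(grid, key=log_likelihood)

    print(closed_form, numeric)   # both are 0.6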

9 From Binomial to Multinomial For example, suppose that X can take the values 1, 2, …, K (for example, a die has 6 faces). We want to learn the parameters θ_1, θ_2, …, θ_K. Sufficient statistics: N_1, N_2, …, N_K, the number of times each outcome is observed. Likelihood function: L_D(θ) = Π_k θ_k^{N_k}. MLE: θ̂_k = N_k / (N_1 + … + N_K).

10 Example: Multinomial Let x_1, x_2, …, x_n be a protein sequence. We want to learn the parameters q_1, q_2, …, q_20 corresponding to the frequencies of the 20 amino acids. Let N_1, N_2, …, N_20 be the number of times each amino acid is observed in the sequence. Likelihood function: L_D(q) = Π_i q_i^{N_i}. MLE: q̂_i = N_i / n.
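
A short Python sketch of this estimator (an added illustration; the sequence below is a made-up toy example, not real data): it counts each amino acid in the sequence and divides by the sequence length.

    from collections import Counter

    # Toy protein sequence (hypothetical, for illustration only).
    sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"

    n = len(sequence)
    counts = Counter(sequence)          # N_i: occurrences of each amino acid

    # MLE for the multinomial parameters: q_i = N_i / n.
    q = {aa: count / n for aa, count in counts.items()}

    for aa in sorted(q):
        print(aa, round(q[aa], 3))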

11 Is MLE all we need? Suppose that after 10 observations the ML estimate is P(H) = 0.7 for the thumbtack. Would you bet on heads for the next toss? Suppose now that after 10 observations the ML estimate is P(H) = 0.7 for a coin. Would you place the same bet? Solution: the Bayesian approach, which incorporates your subjective prior knowledge. E.g., you may know a priori that some amino acids have high frequencies and some have low frequencies. How would one use this information?

12 Bayes’ rule Bayes’ rule: P(A | B) = P(B | A) P(A) / P(B), where P(B) = Σ_A P(B | A) P(A). It holds because: P(A | B) P(B) = P(A, B) = P(B | A) P(A).

13 Example: Dishonest Casino A casino uses 2 kinds of dice: 99% are fair, and 1% are loaded: on a loaded die, a 6 comes up 50% of the time. We pick a die at random and roll it 3 times. We get 3 consecutive sixes. What is the probability that the die is loaded?

14 Dishonest Casino (2) The solution is based on Bayes’ rule and the fact that while P(loaded | 3 sixes) is not known, the other three terms in Bayes’ rule are known, namely: P(3 sixes | loaded) = (0.5)^3, P(loaded) = 0.01, and P(3 sixes) = P(3 sixes | loaded) P(loaded) + P(3 sixes | fair) P(fair), where P(3 sixes | fair) = (1/6)^3 and P(fair) = 0.99.

15 Dishonest Casino (3) Putting the numbers together: P(loaded | 3 sixes) = P(3 sixes | loaded) P(loaded) / P(3 sixes) = (0.125 × 0.01) / (0.125 × 0.01 + (1/6)^3 × 0.99) ≈ 0.21. So even after three sixes in a row, the die is still more likely to be fair.
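
For completeness, here is a small Python check of this calculation (an addition, not from the slides):

    # Quantities given on the previous slide.
    p_loaded = 0.01
    p_fair = 0.99
    p_sixes_given_loaded = 0.5 ** 3        # loaded die: P(6) = 0.5
    p_sixes_given_fair = (1 / 6) ** 3      # fair die:   P(6) = 1/6

    # Total probability of observing three sixes.
    p_sixes = p_sixes_given_loaded * p_loaded + p_sixes_given_fair * p_fair

    # Bayes' rule: posterior probability that the die is loaded.
    p_loaded_given_sixes = p_sixes_given_loaded * p_loaded / p_sixes
    print(round(p_loaded_given_sixes, 3))   # about 0.214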

16 Biological Example: Proteins Extra-cellular proteins have a slightly different amino acid composition than intra-cellular proteins. From a large enough protein database (SWISS-PROT), we can get the following: p(ext), the probability that any new sequence is extra-cellular; p(a_i | int), the frequency of amino acid a_i for intra-cellular proteins; p(a_i | ext), the frequency of amino acid a_i for extra-cellular proteins; p(int), the probability that any new sequence is intra-cellular.

17 Biological Example: Proteins (2) Q: What is the probability that a given new protein sequence x = x_1 x_2 … x_n is extra-cellular? A: Assuming that every sequence is either extra-cellular or intra-cellular (but not both), we can write: p(x) = p(ext) p(x | ext) + p(int) p(x | int). Thus, treating the positions of the sequence as independent, p(x | ext) = Π_i p(x_i | ext) and p(x | int) = Π_i p(x_i | int).

18 Biological Example: Proteins (3) Using conditional probability, by Bayes’ theorem we get: P(ext | x) = p(ext) p(x | ext) / p(x) = p(ext) Π_i p(x_i | ext) / (p(ext) Π_i p(x_i | ext) + p(int) Π_i p(x_i | int)). The probabilities p(int), p(ext) are called the prior probabilities. The probability P(ext | x) is called the posterior probability.
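
As a final added sketch (not from the slides; the priors and amino-acid frequencies below are made-up illustration values, not SWISS-PROT estimates), this Python snippet applies the formula in log space to avoid numerical underflow on long sequences.

    import math

    # Hypothetical priors and per-amino-acid frequencies (illustration only).
    p_ext, p_int = 0.4, 0.6
    freq_ext = {'A': 0.10, 'C': 0.05, 'D': 0.08, 'E': 0.07}   # p(a_i | ext)
    freq_int = {'A': 0.07, 'C': 0.02, 'D': 0.10, 'E': 0.09}   # p(a_i | int)

    x = "ACDEAC"   # toy sequence over the amino acids listed above

    # log p(x | class) = sum_i log p(x_i | class), assuming independent positions.
    log_px_ext = sum(math.log(freq_ext[a]) for a in x)
    log_px_int = sum(math.log(freq_int[a]) for a in x)

    # Posterior via Bayes' theorem, computed from the two log joint terms.
    log_joint_ext = math.log(p_ext) + log_px_ext
    log_joint_int = math.log(p_int) + log_px_int
    posterior_ext = 1 / (1 + math.exp(log_joint_int - log_joint_ext))

    print(round(posterior_ext, 3))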

