
1 This presentation has been cut and slightly edited from Nir Friedman’s full course of 12 lectures, which is available at www.cs.huji.ac.il/~pmai. Changes made by Dan Geiger, Ydo Wexler and Ma’ayan Fishelson. Tutorial #3 by Ma’ayan Fishelson

2 Example: Binomial Experiment When tossed, a thumbtack can land in one of two positions: Head or Tail. We denote by θ the (unknown) probability P(H). Estimation task: Given a sequence of toss samples x[1], x[2], …, x[M] we want to estimate the probabilities P(H) = θ and P(T) = 1 − θ.
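
As a small added illustration (not part of the original slides), the following Python sketch simulates M thumbtack tosses under an assumed true value of θ; the values θ = 0.3 and M = 10 are arbitrary, chosen only for the example.

    import random

    random.seed(0)          # fixed seed so the example is reproducible
    theta_true = 0.3        # assumed (unknown to the learner) probability of Head
    M = 10                  # number of tosses

    # x[1], ..., x[M]: each toss is 'H' with probability theta_true, else 'T'
    samples = ['H' if random.random() < theta_true else 'T' for _ in range(M)]
    print(samples)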

3 Statistical Parameter Fitting Consider instances x[1], x[2], …, x[M] such that: the set of values x can take is known; each one is sampled from the same distribution; each one is sampled independently of the rest (i.i.d. samples). The task is to find the vector of parameters Θ that generated the given data. This parameter vector Θ can be used to predict future data.

4 The Likelihood Function How good is a particular θ? It depends on how likely it is to generate the observed data: L_D(θ) = P(D | θ) = Π_m P(x[m] | θ). The likelihood for the sequence H, T, T, H, H is: L(θ) = θ · (1−θ) · (1−θ) · θ · θ = θ³(1−θ)². (The original slide plots L(θ) for θ in [0, 1].)
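
A minimal Python sketch of this computation (an addition for illustration, not from the slides): it evaluates L(θ) = θ³(1−θ)² on a grid of θ values and reports where the likelihood is largest.

    # Likelihood of the sequence H, T, T, H, H as a function of theta = P(H).
    def likelihood(theta, n_heads=3, n_tails=2):
        return theta ** n_heads * (1 - theta) ** n_tails

    # Evaluate L(theta) on a grid of 101 points in [0, 1].
    grid = [i / 100 for i in range(101)]
    values = [likelihood(t) for t in grid]

    best = grid[values.index(max(values))]
    print("theta maximizing the likelihood:", best)   # prints 0.6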

5 Sufficient Statistics To compute the likelihood in the thumbtack example we only require N_H and N_T (the number of heads and the number of tails). N_H and N_T are sufficient statistics for the binomial distribution.

6 Sufficient Statistics A sufficient statistic is a function of the data that summarizes the relevant information for the likelihood. Formally, s(D) is a sufficient statistic if, for any two datasets D and D′: s(D) = s(D′) implies L_D(θ) = L_D′(θ). (The original slide illustrates this as a many-to-one mapping from datasets to statistics.)
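
To make this concrete, here is a short illustrative Python check (an added example, not from the slides): two different toss sequences with the same counts (N_H, N_T) yield identical likelihood functions.

    def counts(seq):
        """Sufficient statistics for the thumbtack data: (N_H, N_T)."""
        return seq.count('H'), seq.count('T')

    def likelihood(seq, theta):
        n_h, n_t = counts(seq)
        return theta ** n_h * (1 - theta) ** n_t

    d1 = ['H', 'T', 'T', 'H', 'H']   # two different datasets ...
    d2 = ['T', 'H', 'H', 'H', 'T']   # ... with the same counts (3, 2)

    assert counts(d1) == counts(d2)
    # Same sufficient statistics => same likelihood at every theta.
    for theta in (0.1, 0.3, 0.5, 0.7, 0.9):
        assert likelihood(d1, theta) == likelihood(d2, theta)
    print("equal likelihoods for all tested theta")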

7 Maximum Likelihood Estimation MLE Principle: Choose the parameters that maximize the likelihood function. This is one of the most commonly used estimators in statistics and is intuitively appealing. One usually maximizes the log-likelihood function, defined as l_D(θ) = log_e L_D(θ), which has the same maximizer.

8 Example: MLE in Binomial Data Applying the MLE principle to the binomial likelihood we get θ̂ = N_H / (N_H + N_T). Example: (N_H, N_T) = (3, 2), so the MLE estimate is 3/5 = 0.6 (which coincides with what one would expect). (The original slide again shows the plot of L(θ), with its maximum at 0.6.)
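
As an added numerical sanity check (not from the slides), the snippet below compares the closed-form estimate N_H/(N_H + N_T) with a brute-force grid maximization of the log-likelihood.

    import math

    n_h, n_t = 3, 2   # observed counts from the slide's example

    # Closed-form MLE for the binomial parameter.
    closed_form = n_h / (n_h + n_t)

    # Brute-force maximization of the log-likelihood over a fine grid,
    # avoiding theta = 0 and theta = 1 where the log is undefined.
    def log_likelihood(theta):
        return n_h * math.log(theta) + n_t * math.log(1 - theta)

    grid = [i / 1000 for i in range(1, 1000)]
    numeric = max(grid, key=log_likelihood)

    print(closed_form, numeric)   # both are 0.6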

9 From Binomial to Multinomial For example, suppose that X can take the values 1, 2, …, K (for example, a die has 6 faces). We want to learn the parameters θ_1, θ_2, …, θ_K. Sufficient statistics: N_1, N_2, …, N_K, the number of times each outcome is observed. Likelihood function: L_D(θ) = Π_k θ_k^{N_k}. MLE: θ̂_k = N_k / (N_1 + … + N_K).

10 Example: Multinomial Let x_1, x_2, …, x_n be a protein sequence. We want to learn the parameters q_1, q_2, …, q_20 corresponding to the frequencies of the 20 amino acids. Let N_1, N_2, …, N_20 be the number of times each amino acid is observed in the sequence. Likelihood function: L_D(q) = Π_i q_i^{N_i}. MLE: q̂_i = N_i / n.
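
A short Python sketch of this estimator (an added illustration; the sequence below is a made-up toy example, not real data): it counts each amino acid in the sequence and divides by the sequence length.

    from collections import Counter

    # Toy protein sequence (hypothetical, for illustration only).
    sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"

    n = len(sequence)
    counts = Counter(sequence)          # N_i: occurrences of each amino acid

    # MLE for the multinomial parameters: q_i = N_i / n.
    q = {aa: count / n for aa, count in counts.items()}

    for aa in sorted(q):
        print(aa, round(q[aa], 3))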

11 Is MLE all we need? Suppose that after 10 observations the ML estimate is P(H) = 0.7 for the thumbtack. Would you bet on heads for the next toss? Suppose now that after 10 observations the ML estimate is P(H) = 0.7 for a coin. Would you place the same bet? Solution: the Bayesian approach, which incorporates your subjective prior knowledge. E.g., you may know a priori that some amino acids have high frequencies and some have low frequencies. How would one use this information?

12 Bayes’ rule Bayes’ rule: P(A | B) = P(B | A) P(A) / P(B), where P(B) = Σ_A P(B | A) P(A). It holds because: P(A | B) P(B) = P(A, B) = P(B | A) P(A).

13 Example: Dishonest Casino A casino uses 2 kinds of dice: 99% are fair, and 1% are loaded: on a loaded die, a 6 comes up 50% of the time. We pick a die at random and roll it 3 times. We get 3 consecutive sixes. What is the probability that the die is loaded?

14 Dishonest Casino (2) The solution is based on Bayes’ rule and the fact that while P(loaded | 3 sixes) is not known, the other three terms in Bayes’ rule are known, namely: P(3 sixes | loaded) = (0.5)^3, P(loaded) = 0.01, and P(3 sixes) = P(3 sixes | loaded) P(loaded) + P(3 sixes | fair) P(fair), where P(3 sixes | fair) = (1/6)^3 and P(fair) = 0.99.

15 Dishonest Casino (3) Putting the numbers together: P(loaded | 3 sixes) = P(3 sixes | loaded) P(loaded) / P(3 sixes) = (0.125 × 0.01) / (0.125 × 0.01 + (1/6)^3 × 0.99) ≈ 0.21. So even after three sixes in a row, the die is still more likely to be fair.
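
For completeness, here is a small Python check of this calculation (an addition, not from the slides):

    # Quantities given on the previous slide.
    p_loaded = 0.01
    p_fair = 0.99
    p_sixes_given_loaded = 0.5 ** 3        # loaded die: P(6) = 0.5
    p_sixes_given_fair = (1 / 6) ** 3      # fair die:   P(6) = 1/6

    # Total probability of observing three sixes.
    p_sixes = p_sixes_given_loaded * p_loaded + p_sixes_given_fair * p_fair

    # Bayes' rule: posterior probability that the die is loaded.
    p_loaded_given_sixes = p_sixes_given_loaded * p_loaded / p_sixes
    print(round(p_loaded_given_sixes, 3))   # about 0.214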

16 Biological Example: Proteins Extra-cellular proteins have a slightly different amino acid composition than intra-cellular proteins. From a large enough protein database (SWISS-PROT), we can get the following: p(ext), the probability that any new sequence is extra-cellular; p(a_i | int), the frequency of amino acid a_i for intra-cellular proteins; p(a_i | ext), the frequency of amino acid a_i for extra-cellular proteins; p(int), the probability that any new sequence is intra-cellular.

17 Biological Example: Proteins (2) Q: What is the probability that a given new protein sequence x = x_1 x_2 … x_n is extra-cellular? A: Assuming that every sequence is either extra-cellular or intra-cellular (but not both), we can write: p(x) = p(ext) p(x | ext) + p(int) p(x | int). Thus, treating the positions of the sequence as independent, p(x | ext) = Π_i p(x_i | ext) and p(x | int) = Π_i p(x_i | int).

18 Biological Example: Proteins (3) Using conditional probability, by Bayes’ theorem we get: P(ext | x) = p(ext) p(x | ext) / p(x) = p(ext) Π_i p(x_i | ext) / (p(ext) Π_i p(x_i | ext) + p(int) Π_i p(x_i | int)). The probabilities p(int), p(ext) are called the prior probabilities. The probability P(ext | x) is called the posterior probability.
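
As a final added sketch (not from the slides; the priors and amino-acid frequencies below are made-up illustration values, not SWISS-PROT estimates), this Python snippet applies the formula in log space to avoid numerical underflow on long sequences.

    import math

    # Hypothetical priors and per-amino-acid frequencies (illustration only).
    p_ext, p_int = 0.4, 0.6
    freq_ext = {'A': 0.10, 'C': 0.05, 'D': 0.08, 'E': 0.07}   # p(a_i | ext)
    freq_int = {'A': 0.07, 'C': 0.02, 'D': 0.10, 'E': 0.09}   # p(a_i | int)

    x = "ACDEAC"   # toy sequence over the amino acids listed above

    # log p(x | class) = sum_i log p(x_i | class), assuming independent positions.
    log_px_ext = sum(math.log(freq_ext[a]) for a in x)
    log_px_int = sum(math.log(freq_int[a]) for a in x)

    # Posterior via Bayes' theorem, computed from the two log joint terms.
    log_joint_ext = math.log(p_ext) + log_px_ext
    log_joint_int = math.log(p_int) + log_px_int
    posterior_ext = 1 / (1 + math.exp(log_joint_int - log_joint_ext))

    print(round(posterior_ext, 3))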

