Parameter Estimation using Likelihood Functions, Tutorial #1. This class has been cut and slightly edited from Nir Friedman's full course of 12 lectures.

Parameter Estimation using Likelihood Functions, Tutorial #1. This presentation has been cut and slightly edited from Nir Friedman's full course of 12 lectures. Changes made by Dan Geiger and Ydo Wexler.

3 Example: Binomial Experiment
When tossed, a thumbtack can land in one of two positions: Head or Tail.
We denote by θ the (unknown) probability P(H).
Estimation task: given a sequence of toss samples x[1], x[2], …, x[M], we want to estimate the probabilities P(H) = θ and P(T) = 1 − θ.

4 Statistical Parameter Fitting
Consider instances x[1], x[2], …, x[M] such that:
- The set of values that x can take is known
- Each is sampled from the same distribution
- Each is sampled independently of the rest
These are i.i.d. (independent and identically distributed) samples.
The task is to find a vector of parameters θ that has generated the given data. This parameter vector θ can then be used to predict future data.

5 The Likelihood Function
How good is a particular θ? It depends on how likely it is to generate the observed data. For i.i.d. samples, the likelihood is the product of the per-sample probabilities. The likelihood for the sequence H, T, T, H, H is:
L(θ) = θ · (1 − θ) · (1 − θ) · θ · θ = θ³(1 − θ)²
(Figure: plot of L(θ) as a function of θ.)
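The per-toss product above can be sketched in a few lines of code; the function name and toss encoding below are illustrative choices, not from the slides.

```python
# Likelihood of a Head/Tail sequence under an i.i.d. Bernoulli model
# with theta = P(H). Each toss contributes theta or (1 - theta).

def likelihood(theta, tosses):
    """Product of per-toss probabilities for a list of 'H'/'T' outcomes."""
    L = 1.0
    for t in tosses:
        L *= theta if t == "H" else (1.0 - theta)
    return L

tosses = ["H", "T", "T", "H", "H"]
# L(theta) = theta^3 * (1 - theta)^2; for instance at theta = 0.6:
L_06 = likelihood(0.6, tosses)
```

Evaluating `likelihood` over a grid of θ values reproduces the curve the slide plots.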

6 Sufficient Statistics
To compute the likelihood in the thumbtack example we only require N_H and N_T (the number of heads and the number of tails):
L(θ) = θ^(N_H) · (1 − θ)^(N_T)
N_H and N_T are sufficient statistics for the binomial distribution.

7 Sufficient Statistics
A sufficient statistic is a function of the data that summarizes the relevant information for the likelihood.
Formally, s(D) is a sufficient statistic if for any two datasets D and D′:
s(D) = s(D′) implies L_D(θ) = L_D′(θ)
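A quick sketch of this definition for the coin-toss case: two datasets with the same counts (N_H, N_T) but different orderings give the same likelihood at every θ. The helper names here are illustrative.

```python
# Demonstrate that (N_H, N_T) is a sufficient statistic for Bernoulli tosses:
# datasets with equal counts have identical likelihood functions.

def counts(tosses):
    """The sufficient statistic s(D) = (N_H, N_T)."""
    return (tosses.count("H"), tosses.count("T"))

def likelihood(theta, tosses):
    """Per-toss product; deliberately does NOT use counts()."""
    L = 1.0
    for t in tosses:
        L *= theta if t == "H" else (1.0 - theta)
    return L

D1 = ["H", "T", "T", "H", "H"]
D2 = ["T", "H", "H", "H", "T"]   # same counts, different order

same_likelihood = all(
    abs(likelihood(t / 10, D1) - likelihood(t / 10, D2)) < 1e-12
    for t in range(11)           # theta = 0.0, 0.1, ..., 1.0
)
```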

8 Maximum Likelihood Estimation
MLE Principle: choose the parameters that maximize the likelihood function.
- This is one of the most commonly used estimators in statistics
- Intuitively appealing
One usually maximizes the log-likelihood function, defined as l_D(θ) = log_e L_D(θ).

9 Example: MLE in Binomial Data
Applying the MLE principle we get:
θ̂ = N_H / (N_H + N_T)
(Figure: plot of L(θ).)
Example: (N_H, N_T) = (3, 2). The MLE estimate is 3/5 = 0.6, which coincides with what one would expect.
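A minimal sketch of the binomial MLE, with a brute-force grid check that the closed form N_H / (N_H + N_T) really maximizes the log-likelihood (function names are ours):

```python
import math

def mle(n_h, n_t):
    """Closed-form binomial MLE: theta_hat = N_H / (N_H + N_T)."""
    return n_h / (n_h + n_t)

def log_likelihood(theta, n_h, n_t):
    """l(theta) = N_H log(theta) + N_T log(1 - theta)."""
    return n_h * math.log(theta) + n_t * math.log(1.0 - theta)

theta_hat = mle(3, 2)            # the slide's example: (N_H, N_T) = (3, 2)

# Grid search over theta in (0, 1) should agree with the closed form.
best = max((t / 100 for t in range(1, 100)),
           key=lambda t: log_likelihood(t, 3, 2))
```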

10 From Binomial to Multinomial
Suppose X can take the values 1, 2, …, K (for example, a die has 6 sides).
We want to learn the parameters θ_1, θ_2, …, θ_K.
Sufficient statistics: N_1, N_2, …, N_K, the number of times each outcome is observed.
Likelihood function: L(θ) = ∏_k θ_k^(N_k)
MLE: θ̂_k = N_k / ∑_j N_j

11 Example: Multinomial
Let x_1 x_2 … x_n be a protein sequence.
We want to learn the parameters q_1, q_2, …, q_20 corresponding to the frequencies of the 20 amino acids.
N_1, N_2, …, N_20: the number of times each amino acid is observed in the sequence.
Likelihood function: L(q) = ∏_k q_k^(N_k)
MLE: q̂_k = N_k / n
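The multinomial MLE q̂_k = N_k / n is just normalized counting. A sketch, using a made-up 10-residue toy sequence rather than a real protein:

```python
from collections import Counter

def mle_frequencies(sequence):
    """Multinomial MLE: each symbol's frequency is its count divided by n."""
    n = len(sequence)
    return {symbol: c / n for symbol, c in Counter(sequence).items()}

seq = "MAAGLKLLKA"               # toy sequence, purely for illustration
freqs = mle_frequencies(seq)     # e.g. 'A' appears 3 times out of 10
```

The same function works unchanged for die rolls or any other discrete alphabet.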

12 Is MLE all we need?
Suppose that after 10 observations, ML estimates P(H) = 0.7 for the thumbtack. Would you bet on heads for the next toss?
Suppose now that after 10 observations, ML estimates P(H) = 0.7 for a coin. Would you place the same bet?
Solution: the Bayesian approach, which incorporates your subjective prior knowledge. E.g., you may know a priori that some amino acids have the same frequencies and some have low frequencies. How would one use this information?

13 Bayes' Rule
Bayes' rule: P(A | B) = P(B | A) P(A) / P(B)
It holds because: P(A | B) P(B) = P(A, B) = P(B | A) P(A)

14 Example: Dishonest Casino
A casino uses two kinds of dice:
- 99% are fair
- 1% are loaded: 6 comes up 50% of the time
We pick a die at random and roll it 3 times. We get 3 consecutive sixes.
What is the probability the die is loaded?

15 Dishonest Casino (cont.)
The solution is based on Bayes' rule and the fact that while P(loaded | 3 sixes) is not known, the other three terms in Bayes' rule are known, namely:
- P(3 sixes | loaded) = (0.5)³
- P(loaded) = 0.01
- P(3 sixes) = P(3 sixes | loaded) P(loaded) + P(3 sixes | not loaded) (1 − P(loaded))

16 Dishonest Casino (cont.)
Plugging in, with P(3 sixes | not loaded) = (1/6)³:
P(loaded | 3 sixes) = (0.5)³ · 0.01 / [(0.5)³ · 0.01 + (1/6)³ · 0.99]
≈ 0.00125 / 0.00583 ≈ 0.21
So even after three consecutive sixes, the die is still more likely to be fair than loaded.
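The casino posterior follows directly from Bayes' rule with the numbers given on the previous slide; a short sketch (variable names are ours):

```python
# Posterior probability that the die is loaded after 3 consecutive sixes,
# using P(3 sixes | loaded) = 0.5**3, P(loaded) = 0.01, fair die P(6) = 1/6.

p_loaded = 0.01
p_sixes_given_loaded = 0.5 ** 3          # loaded die: 6 comes up 50% of the time
p_sixes_given_fair = (1 / 6) ** 3        # fair die

# Total probability of the evidence.
p_sixes = (p_sixes_given_loaded * p_loaded
           + p_sixes_given_fair * (1 - p_loaded))

# Bayes' rule.
posterior = p_sixes_given_loaded * p_loaded / p_sixes   # roughly 0.21
```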

17 Biological Example: Proteins
Extracellular proteins have a slightly different amino acid composition than intracellular proteins.
From a large enough protein database (SWISS-PROT), we can get the following:
- p(ext): the probability that any new sequence is extracellular
- p(int): the probability that any new sequence is intracellular
- p(a_i | ext): the frequency of amino acid a_i in extracellular proteins
- p(a_i | int): the frequency of amino acid a_i in intracellular proteins

18 Biological Example: Proteins (cont.)
What is the probability that a given new protein sequence x = x_1 x_2 … x_n is extracellular?
Assuming that every sequence is either extracellular or intracellular (but not both), we can write p(int) = 1 − p(ext).
Thus: p(x) = p(x | ext) p(ext) + p(x | int) p(int)

19 Biological Example: Proteins (cont.)
Using conditional probability (assuming the amino acids occur independently) we get:
p(x | ext) = ∏_i p(x_i | ext), and similarly p(x | int) = ∏_i p(x_i | int)
The probabilities p(int) and p(ext) are called the prior probabilities. The probability P(ext | x) is called the posterior probability.
By Bayes' theorem:
P(ext | x) = p(ext) ∏_i p(x_i | ext) / [p(ext) ∏_i p(x_i | ext) + p(int) ∏_i p(x_i | int)]
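A hedged sketch of this posterior computation. The priors and per-letter frequency tables below are made up for a toy 2-letter alphabet; real tables would be estimated from SWISS-PROT as the slides describe.

```python
from math import prod

# Assumed (invented) priors and conditional amino acid frequencies.
p_ext, p_int = 0.4, 0.6
q_ext = {"A": 0.7, "G": 0.3}     # stand-in for p(a_i | ext)
q_int = {"A": 0.4, "G": 0.6}     # stand-in for p(a_i | int)

def posterior_ext(x):
    """P(ext | x) via Bayes' theorem with independent positions."""
    like_ext = prod(q_ext[a] for a in x)
    like_int = prod(q_int[a] for a in x)
    return p_ext * like_ext / (p_ext * like_ext + p_int * like_int)

post = posterior_ext("AAG")
```

In practice one would work with log-probabilities to avoid underflow on long sequences; the product form above mirrors the slide's formula directly.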