1 Maximum Likelihood Estimates and the EM Algorithms II. Henry Horng-Shing Lu, Institute of Statistics, National Chiao Tung University

2 Part 1 Computation Tools

3 Include Functions in R • source("file path") reads an R script and evaluates its contents, making the functions it defines available in the current session. • Example: define the functions in MME.R, then load them in R, as sketched below.
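The original screenshots of the file and console are lost; this is a minimal sketch of the pattern, and the body of MME (a toy method-of-moments estimator) is an assumption for illustration:

  ## In MME.R (file contents, assumed):
  MME <- function(x) list(mean = mean(x), var = mean((x - mean(x))^2))

  ## In R (console):
  source("MME.R")    # evaluate the file so that MME is defined
  MME(rnorm(100))    # call the sourced function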

4 Part 2 Motivation Examples

5 Example 1 in Genetics (1) • Two linked loci with alleles A and a, and B and b. A, B: dominant; a, b: recessive. • A double heterozygote AaBb will produce gametes of four types: AB, Ab, aB, ab. • The parental gametes AB and ab occur with frequency (1-r')/2 each in the female and (1-r)/2 each in the male; the recombinant gametes aB and Ab occur with frequency r'/2 each in the female and r/2 each in the male, where r' is the female and r the male recombination fraction. [Figure: chromosome diagram of the double heterozygote, omitted.]

6 Example 1 in Genetics (2) • r and r' are the recombination rates for male and female. • Suppose these heterozygotes are in the coupling phase AB/ab, as in the diagram above. The problem is to estimate r and r' from the offspring of selfed heterozygotes. • Fisher, R. A. and Balmukand, B. (1928). The estimation of linkage from the offspring of selfed heterozygotes. Journal of Genetics, 20, 79-92. • nk/handout12.pdf

7 Example 1 in Genetics (3) • Offspring genotypes and their probabilities (each cell is the product of the male and female gamete frequencies):

                   MALE
  FEMALE           AB (1-r)/2           ab (1-r)/2           aB r/2           Ab r/2
  AB (1-r')/2      AABB (1-r)(1-r')/4   aABb (1-r)(1-r')/4   aABB r(1-r')/4   AABb r(1-r')/4
  ab (1-r')/2      AaBb (1-r)(1-r')/4   aabb (1-r)(1-r')/4   aaBb r(1-r')/4   Aabb r(1-r')/4
  aB r'/2          AaBB (1-r)r'/4       aabB (1-r)r'/4       aaBB r r'/4      AabB r r'/4
  Ab r'/2          AABb (1-r)r'/4       aAbb (1-r)r'/4       aABb r r'/4      AAbb r r'/4

8 Example 1 in Genetics (4) • Four distinct phenotypes: A*B*, A*b*, a*B* and a*b*. • A*: the dominant phenotype from (Aa, AA, aA). • a*: the recessive phenotype from aa. • B*: the dominant phenotype from (Bb, BB, bB). • b*: the recessive phenotype from bb. • A*B*: 9 gametic combinations. • A*b*: 3 gametic combinations. • a*B*: 3 gametic combinations. • a*b*: 1 gametic combination. • Total: 16 combinations.

9 Example 1 in Genetics (5) • Let φ = (1-r)(1-r'). Collapsing the 16 combinations in the table by phenotype gives P(A*B*) = (2+φ)/4, P(A*b*) = (1-φ)/4, P(a*B*) = (1-φ)/4, P(a*b*) = φ/4.

10 Example 1 in Genetics (6) • Hence, a random sample of size n from the offspring of selfed heterozygotes will follow a multinomial distribution: (y1, y2, y3, y4) ~ Multinomial(n; (2+φ)/4, (1-φ)/4, (1-φ)/4, φ/4), with φ = (1-r)(1-r').

11 Example 1 in Genetics (7) • Suppose that we observe the data y = (y1, y2, y3, y4) = (125, 18, 20, 24), a random sample from Multinomial(n; (2+φ)/4, (1-φ)/4, (1-φ)/4, φ/4) with n = 187. Then the probability mass function is p(y; φ) = n!/(y1! y2! y3! y4!) ((2+φ)/4)^y1 ((1-φ)/4)^(y2+y3) (φ/4)^y4.

12 Maximum Likelihood Estimate (MLE) • Likelihood: L(θ; y) = p(y; θ), the probability (density) of the observed data regarded as a function of the parameter θ. • Maximize the likelihood: solve the score equations, which set the first derivatives of the log-likelihood to zero. • Under regularity conditions, the MLE is consistent, asymptotically efficient and asymptotically normal!

13 MLE for Example 1 (1) • Likelihood (up to the multinomial constant): L(φ) ∝ ((2+φ)/4)^y1 ((1-φ)/4)^(y2+y3) (φ/4)^y4. • MLE: setting the score y1/(2+φ) - (y2+y3)/(1-φ) + y4/φ = 0 and clearing denominators gives the quadratic Aφ² + Bφ + C = 0 with A = -n, B = y1 - 2(y2+y3) - y4, C = 2y4; the MLE is the root lying in (0, 1).

14 MLE for Example 1 (2) • Checking: (1) the selected root lies in the admissible range (0, 1); (2) the second derivative of the log-likelihood is negative there, so the root is a maximum; (3) the likelihood at the boundaries φ = 0 and φ = 1 does not exceed it.

15 Part 3 Numerical Solutions for the Score Equations of MLEs

16 A Banach Space • A Banach space B is a normed vector space over the field K that is complete: every Cauchy sequence in B converges in B.

17 Lipschitz Continuous • Let A be a closed subset of a Banach space B and F: A → A a mapping. 1. F is Lipschitz continuous on A with constant γ if ‖F(x) - F(y)‖ ≤ γ‖x - y‖ for all x, y ∈ A. 2. F is a contraction mapping on A if F is Lipschitz continuous with γ < 1.

18 Fixed Point Theorem • If F is a contraction mapping on A with constant γ < 1, then: 1. F has a unique fixed point x* ∈ A, i.e., F(x*) = x*. 2. The iteration x_k = F(x_(k-1)), k = 1, 2, …, converges to x* for any initial value x_0 ∈ A. 3. ‖x_k - x*‖ ≤ (γ^k / (1-γ)) ‖x_1 - x_0‖.

19 Applications for MLE (1) • Finding the MLE means solving the score equation f(θ) = d log L(θ)/dθ = 0. Recast it as a fixed-point problem: set G(θ) = θ + h·f(θ) for some constant h ≠ 0, so that a zero of f is exactly a fixed point of G, and iterate θ_(k+1) = G(θ_k).

20 Applications for MLE (2) • The iteration converges when G is a contraction near the root, i.e., when |G'(θ)| = |1 + h·f'(θ)| < 1. • What choice of h is optimal?

21 Parallel Chord Method (1) • The parallel chord method is also called simple iteration. • It fixes one slope and reuses it at every step, θ_(k+1) = θ_k + h·f(θ_k) with, e.g., h = -1/f'(θ_0), so each chord drawn is parallel to the tangent at the initial point.

22 Parallel Chord Method (2) • [Figure: the score function with successive parallel chord steps, omitted.]

23 Plot the Parallel Chord Method by R

24 Define Functions for Example 1 in R • We will define some functions and variables for finding the MLE in Example 1 by R, as sketched below.
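The slide's code listing is lost; the following is a minimal sketch of definitions consistent with slides 11 and 13 (the names y1…y4, n, loglik, f and df are assumptions reused by the later sketches):

  # Observed counts for Example 1
  y1 <- 125; y2 <- 18; y3 <- 20; y4 <- 24
  n  <- y1 + y2 + y3 + y4

  # Log-likelihood in phi = (1-r)(1-r'), up to an additive constant
  loglik <- function(phi) y1*log(2 + phi) + (y2 + y3)*log(1 - phi) + y4*log(phi)

  # Score function f(phi) = d loglik / d phi, and its derivative
  f  <- function(phi)  y1/(2 + phi) - (y2 + y3)/(1 - phi) + y4/phi
  df <- function(phi) -y1/(2 + phi)^2 - (y2 + y3)/(1 - phi)^2 - y4/phi^2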

25 Parallel Chord Method by R (1)

26 Parallel Chord Method by R (2)
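The two code screenshots are lost; a minimal sketch of the parallel chord iteration for the score function f above (argument names and the convergence tolerance are assumptions):

  ParallelChord <- function(initial, tol = 1e-10, maxit = 200) {
    slope <- df(initial)              # chord slope, fixed at the initial point
    phi <- initial
    iterates <- phi
    for (k in 1:maxit) {
      phi.new <- phi - f(phi)/slope   # every step reuses the same slope
      iterates <- c(iterates, phi.new)
      if (abs(phi.new - phi) < tol) break
      phi <- phi.new
    }
    list(phi = phi.new, iteration = iterates)
  }
  ParallelChord(0.5)$phi              # approaches the MLE, about 0.578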

27 Parallel Chord Method by C/C++

28 Newton-Raphson Method (1) • Newton-Raphson replaces the fixed chord slope with the tangent slope at the current iterate: θ_(k+1) = θ_k - f(θ_k)/f'(θ_k). • References: MathWorld, "Newton's Method"; Wikipedia, "Newton's method".

29 Newton-Raphson Method (2) • [Figure: tangent-line steps of the Newton-Raphson iteration, omitted.]

30 Plot the Newton-Raphson Method by R
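The plotting code is lost; a minimal sketch that draws the score function and a few tangent steps, assuming f and df from the slide 24 sketch:

  curve(f(x), from = 0.05, to = 0.95, xlab = "phi", ylab = "score f(phi)")
  abline(h = 0, lty = 2)
  phi <- 0.2
  for (k in 1:4) {
    phi.new <- phi - f(phi)/df(phi)      # Newton step
    segments(phi, f(phi), phi.new, 0)    # tangent from (phi, f(phi)) to its x-intercept
    points(phi.new, 0, pch = 16)
    phi <- phi.new
  }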

31 Newton-Raphson Method by R (1)

32 Newton-Raphson Method by R (2)
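The listing is lost; a minimal sketch consistent with the call Newton(y1, y2, y3, y4, initial) on slide 62 (the tolerance and the returned component names phi and iteration are inferred from that slide):

  Newton <- function(y1, y2, y3, y4, initial, tol = 1e-10, maxit = 100) {
    f  <- function(p)  y1/(2 + p) - (y2 + y3)/(1 - p) + y4/p         # score
    df <- function(p) -y1/(2 + p)^2 - (y2 + y3)/(1 - p)^2 - y4/p^2   # its derivative
    phi <- initial
    iterates <- phi
    for (k in 1:maxit) {
      phi.new <- phi - f(phi)/df(phi)   # tangent (Newton) step
      iterates <- c(iterates, phi.new)
      if (abs(phi.new - phi) < tol) break
      phi <- phi.new
    }
    list(phi = phi.new, iteration = iterates)
  }
  Newton(125, 18, 20, 24, 0.5)$phi      # about 0.5779 for these counts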

33 Newton-Raphson Method by C/C++

34 Halley's Method • The Newton-Raphson iteration function is F(x) = x - f(x)/f'(x). • It is possible to speed up convergence by using more expansion terms than the Newton-Raphson method does when the objective function is very smooth, as in the method of Edmond Halley (1656-1742): F(x) = x - [f(x)/f'(x)] / [1 - f(x)f''(x)/(2f'(x)²)].

35 Halley ’ s Method by R (1)

36 Halley ’ s Method by R (2)
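The listings are lost; a minimal sketch of Halley's step for Example 1, reusing f and df from the slide 24 sketch (d2f, the derivative of df, and the other names are assumptions):

  Halley <- function(initial, tol = 1e-10, maxit = 100) {
    d2f <- function(p) 2*y1/(2 + p)^3 - 2*(y2 + y3)/(1 - p)^3 + 2*y4/p^3
    phi <- initial
    for (k in 1:maxit) {
      # Newton step scaled by the Halley correction factor
      phi.new <- phi - (f(phi)/df(phi)) / (1 - f(phi)*d2f(phi)/(2*df(phi)^2))
      if (abs(phi.new - phi) < tol) break
      phi <- phi.new
    }
    phi.new
  }
  Halley(0.5)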

37 Halley ’ s Method by C/C++

38 Bisection Method (1) • Assume that f ∈ C[a, b] and that there exists a number r ∈ [a, b] such that f(r) = 0. If f(a) and f(b) have opposite signs, and {c_n} represents the sequence of midpoints generated by the bisection process, then |r - c_n| ≤ (b - a)/2^(n+1) for n = 0, 1, 2, …, and the sequence {c_n} converges to r. • That is, lim_(n→∞) c_n = r.

39 Bisection Method (2) • [Figure: successive halvings of the bracketing interval, omitted.]

40 Plot the Bisection Method by R

41 Bisection Method by R (1) > fix(Bisection)

42 Bisection Method by R (2)

43 Bisection Method by R (3)
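The three screenshots are lost; a minimal sketch of the Bisection function edited via fix(Bisection) above (bracket endpoints and tolerance are assumptions; f is the score from the slide 24 sketch):

  Bisection <- function(a, b, tol = 1e-10, maxit = 200) {
    if (f(a) * f(b) > 0) stop("f(a) and f(b) must have opposite signs")
    for (k in 1:maxit) {
      mid <- (a + b)/2
      if (f(a) * f(mid) <= 0) b <- mid else a <- mid  # keep the sign-change half
      if ((b - a)/2 < tol) break
    }
    (a + b)/2
  }
  Bisection(0.1, 0.9)   # the score changes sign inside (0, 1)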

44 Bisection Method by C/C++ (1)

45 Bisection Method by C/C++ (2)

46 Secant Method • The secant method replaces the tangent slope in Newton-Raphson by the slope of the secant through the two most recent iterates: x_(k+1) = x_k - f(x_k)(x_k - x_(k-1)) / (f(x_k) - f(x_(k-1))).

47 Secant Method by R (1) >fix(Secant)

48 Secant Method by R (2)
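The listings are lost; a minimal sketch of the secant update for the score f from the slide 24 sketch (argument names and tolerance assumed):

  Secant <- function(x0, x1, tol = 1e-10, maxit = 100) {
    for (k in 1:maxit) {
      x2 <- x1 - f(x1) * (x1 - x0) / (f(x1) - f(x0))  # secant step
      if (abs(x2 - x1) < tol) break
      x0 <- x1
      x1 <- x2
    }
    x2
  }
  Secant(0.2, 0.9)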

49 Secant Method by C/C++

50 Secant-Bracket Method • The secant-bracket method is also called the regula falsi method. Like bisection it maintains a bracket [A, B] with f(A) and f(B) of opposite signs, but the next point S is where the secant through (A, f(A)) and (B, f(B)) crosses the x-axis: S = B - f(B)(B - A)/(f(B) - f(A)). [Figure: one regula falsi step with points S, C, A, B, omitted.]

51 Secant-Bracket Method by R (1) >fix(RegularFalsi)

52 Secant-Bracket Method by R (2)

53 Secant-Bracket Method by R (3)
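The listings are lost; a minimal sketch matching the function name RegularFalsi edited above (bracket endpoints and tolerance are assumptions; f is the score from the slide 24 sketch):

  RegularFalsi <- function(a, b, tol = 1e-8, maxit = 200) {
    for (k in 1:maxit) {
      s <- b - f(b) * (b - a) / (f(b) - f(a))   # secant crossing of the axis
      if (abs(f(s)) < tol) break
      if (f(a) * f(s) <= 0) b <- s else a <- s  # keep the bracketing endpoint
    }
    s
  }
  RegularFalsi(0.1, 0.9)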

54 Secant-Bracket Method by C/C++ (1)

55 Secant-Bracket Method by C/C++ (2)

56 Fisher Scoring Method • Fisher scoring replaces the observed curvature -f'(θ) = -∂² log L/∂θ∂θᵀ used by Newton-Raphson with the Fisher information matrix I(θ) = E[-∂² log L/∂θ∂θᵀ], which applies when the parameter may be multivariate: θ_(k+1) = θ_k + I(θ_k)⁻¹ f(θ_k).

57 Fisher Scoring Method by R (1) > fix(Fisher)

58 Fisher Scoring Method by R (2)
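The listings are lost; a minimal sketch for Example 1, where the expected information works out to I(φ) = (n/4)(1/(2+φ) + 2/(1-φ) + 1/φ); names are assumptions, and f and n come from the slide 24 sketch:

  Fisher <- function(initial, tol = 1e-10, maxit = 100) {
    I <- function(p) (n/4) * (1/(2 + p) + 2/(1 - p) + 1/p)  # expected information
    phi <- initial
    for (k in 1:maxit) {
      phi.new <- phi + f(phi)/I(phi)   # scoring step: I(phi) replaces -f'(phi)
      if (abs(phi.new - phi) < tol) break
      phi <- phi.new
    }
    phi.new
  }
  Fisher(0.5)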

59 Fisher Scoring Method by C/C++

60 Order of Convergence • The order of convergence is p if lim_(k→∞) |e_(k+1)|/|e_k|^p = c for some constant c > 0, where e_k = x_k - x* (with c < 1 required when p = 1). • Note: taking logarithms, log|e_(k+1)| ≈ log c + p log|e_k|. Hence, we can use regression to estimate p.

61 Theorem for Newton-Raphson Method • If F is a contraction mapping with constant γ < 1, then p = 1 and c ≤ γ. • If f'' exists and f has a simple zero x* (so f'(x*) ≠ 0), then there is a neighborhood of x* such that the iteration function F(x) = x - f(x)/f'(x) of the Newton-Raphson method is a contraction mapping there, and p = 2.

62 Find Convergence Order by R (1)
R = Newton(y1, y2, y3, y4, initial)   # any of the other methods can be substituted for Newton
temp = log(abs(R$iteration - R$phi))  # log absolute errors log|e_k|, treating the last iterate as the limit
y = temp[2:(length(temp)-1)]          # log|e_(k+1)| (the exact-zero final term is dropped)
x = temp[1:(length(temp)-2)]          # log|e_k|
lm(y ~ x)                             # the fitted slope estimates the order p

63 Find Convergence Order by R (2)

64 Find Convergence Order by R (3)

65 Find Convergence Order by C/C++

66 Exercises • Write your own programs for the examples presented in this talk. • Write programs for the examples mentioned at the following web page: …kelihood • Write programs for the other examples that you know.

67 More Exercises (1) • Example 3 in genetics: The observed data are (nO, nA, nB, nAB) = (176, 182, 60, 17) ~ Multinomial(n; r^2, p^2 + 2pr, q^2 + 2qr, 2pq), where p, q, and r fall in [0, 1] such that p + q + r = 1. Find the MLEs for p, q, and r.

68 More Exercises (2) • Example 4 in positron emission tomography (PET): The observed data are n*(d) ~ Poisson(λ*(d)), d = 1, 2, …, D, where λ*(d) = Σ_(b=1)^B p(b, d) λ(b). • The values of p(b, d) are known and the unknown parameters are λ(b), b = 1, 2, …, B. • Find the MLEs for λ(b), b = 1, 2, …, B.

69 More Exercises (3) • Example 5 in the normal mixture: The observed data x_i, i = 1, 2, …, n, are random samples from the mixture density f(x) = Σ_(j=1)^k π_j N(x; μ_j, σ_j²) with Σ_j π_j = 1. • Find the MLEs for the parameters π_j, μ_j, σ_j², j = 1, …, k.