Maximum Likelihood Estimates and the EM Algorithms II


Maximum Likelihood Estimates and the EM Algorithms II Henry Horng-Shing Lu Institute of Statistics National Chiao Tung University hslu@stat.nctu.edu.tw http://tigpbp.iis.sinica.edu.tw/courses.htm

Part 1 Computation Tools

Include Functions in R
Use source("file path") to load a function definition.
Example
In MME.R:
MME = function(y1, y2, y3, y4) {
  n = y1+y2+y3+y4;
  phi1 = 4.0*(y1/n-0.5);
  phi2 = 1-4*y2/n;
  phi3 = 1-4*y3/n;
  phi4 = 4.0*y4/n;
  phi = (phi1+phi2+phi3+phi4)/4.0;
  print("By MME method")
  return(phi);  # print(phi); #
}
In R:
> source("C:/MME.R")
> MME(125, 18, 20, 24)
[1] "By MME method"
[1] 0.5935829

Part 2 Motivation Examples

Example 1 in Genetics (1) Two linked loci with alleles A and a, and B and b. A, B: dominant; a, b: recessive. A double heterozygote AaBb will produce gametes of four types: AB, Ab, aB, ab. (Figure: diagram of the AaBb parent's chromosomes and the gametes they produce.)

Example 1 in Genetics (2) Probabilities for genotypes in gametes.
Recombination probabilities:
            No Recombination    Recombination
  Male      1-r                 r
  Female    1-r'                r'
Gamete probabilities:
            AB, ab              aB, Ab
  Male      (1-r)/2             r/2
  Female    (1-r')/2            r'/2

Example 1 in Genetics (3) Fisher, R. A. and Balmukand, B. (1928). The estimation of linkage from the offspring of selfed heterozygotes. Journal of Genetics, 20, 79–92. More: http://en.wikipedia.org/wiki/Genetics http://www2.isye.gatech.edu/~brani/isyebayes/bank/handout12.pdf

Example 1 in Genetics (4) Offspring genotypes and probabilities from selfing the double heterozygote (male gametes in columns, female gametes in rows):

FEMALE \ MALE    AB (1-r)/2           ab (1-r)/2           aB r/2            Ab r/2
AB (1-r')/2      AABB (1-r)(1-r')/4   AaBb (1-r)(1-r')/4   AaBB r(1-r')/4    AABb r(1-r')/4
ab (1-r')/2      AaBb (1-r)(1-r')/4   aabb (1-r)(1-r')/4   aaBb r(1-r')/4    Aabb r(1-r')/4
aB r'/2          AaBB (1-r)r'/4       aaBb (1-r)r'/4       aaBB rr'/4        AaBb rr'/4
Ab r'/2          AABb (1-r)r'/4       Aabb (1-r)r'/4       AaBb rr'/4        AAbb rr'/4

Example 1 in Genetics (5) Four distinct phenotypes: A*B*, A*b*, a*B* and a*b*. A*: the dominant phenotype from (Aa, AA, aA). a*: the recessive phenotype from aa. B*: the dominant phenotype from (Bb, BB, bB). b*: the recessive phenotype from bb. A*B*: 9 gametic combinations. A*b*: 3 gametic combinations. a*B*: 3 gametic combinations. a*b*: 1 gametic combination. Total: 16 combinations.

Example 1 in Genetics (6) Let φ = (1-r)(1-r'). Then the four phenotype probabilities are
P(A*B*) = (2+φ)/4, P(A*b*) = (1-φ)/4, P(a*B*) = (1-φ)/4, P(a*b*) = φ/4.

Example 1 in Genetics (7) Hence a random sample of size n from the offspring of selfed heterozygotes will follow a multinomial distribution:
(y1, y2, y3, y4) ~ Multinomial(n; (2+φ)/4, (1-φ)/4, (1-φ)/4, φ/4).
We know that 0 ≤ r ≤ 1/2 and 0 ≤ r' ≤ 1/2, so φ = (1-r)(1-r') satisfies 1/4 ≤ φ ≤ 1.

Example 1 in Genetics (8) Suppose that we observe the data (y1, y2, y3, y4) = (125, 18, 20, 24), which is a random sample from Multinomial(n; (2+φ)/4, (1-φ)/4, (1-φ)/4, φ/4) with n = 187. Then the probability mass function is
f(y1, y2, y3, y4 | φ) = n!/(y1! y2! y3! y4!) ((2+φ)/4)^y1 ((1-φ)/4)^y2 ((1-φ)/4)^y3 (φ/4)^y4.
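A minimal numerical check of this model in R (our sketch; the helper name loglik and the use of optimize() are not from the slides): code the log likelihood for the observed counts and maximize it directly. The maximizer agrees with the value 0.5778760 obtained by the iterative methods in Part 3.
# Log likelihood of phi for the multinomial model, dropping the additive constant
loglik = function(phi, y1 = 125, y2 = 18, y3 = 20, y4 = 24) {
  y1*log((2+phi)/4) + (y2+y3)*log((1-phi)/4) + y4*log(phi/4)
}
# Maximize over the admissible range of phi
optimize(loglik, interval = c(0.25, 1), maximum = TRUE)
# $maximum is approximately 0.5778760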

Maximum Likelihood Estimate (MLE) Maximize the likelihood: solve the score equations, which set the first derivatives of the log likelihood to zero. Under regularity conditions, the MLE is consistent, asymptotically efficient, and asymptotically normal! More: http://en.wikipedia.org/wiki/Maximum_likelihood

MLE for Example 1 (1) Log likelihood (up to an additive constant):
log L(φ) = y1 log(2+φ) + (y2+y3) log(1-φ) + y4 log(φ).
Score equation:
d log L(φ)/dφ = y1/(2+φ) - (y2+y3)/(1-φ) + y4/φ = 0.
Multiplying through by φ(1-φ)(2+φ) gives the quadratic
n φ² - (y1 - 2y2 - 2y3 - y4) φ - 2y4 = 0, with n = y1+y2+y3+y4,
whose admissible root is the MLE.
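Because the score equation reduces to a quadratic, the MLE can also be written in closed form. A small R sketch of this (ours, using the quadratic above with the observed data):
# Closed-form MLE: the admissible root of n*phi^2 - (y1-2*y2-2*y3-y4)*phi - 2*y4 = 0
y1 = 125; y2 = 18; y3 = 20; y4 = 24
n = y1+y2+y3+y4
b = y1-2*y2-2*y3-y4
(b+sqrt(b^2+8*n*y4))/(2*n)   # about 0.5778760, matching the iterative methods below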

MLE for Example 1 (2) (Figure: plot for the MLE derivation, with points labeled A, B, and C.)

MLE for Example 1 (3) Checking: 1. Does the solution lie in the admissible range 1/4 ≤ φ ≤ 1? 2. Is the second derivative of the log likelihood negative at the solution, so that it is a maximum? 3. Compare: how close is the MLE to the MME value 0.5935829 obtained in Part 1?
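A minimal R sketch of such checks (ours; it reuses the observed data, the second-derivative formula that appears as f2 in the next part, and the MME value 0.5935829 from Part 1):
# Sanity checks for the MLE of Example 1
y1 = 125; y2 = 18; y3 = 20; y4 = 24
phi.hat = 0.5778760
# Second derivative of the log likelihood at phi.hat
d2 = (-1)*y1*(2+phi.hat)^(-2)-(y2+y3)*(1-phi.hat)^(-2)-y4*(phi.hat)^(-2)
d2 < 0                              # TRUE: the log likelihood is concave at phi.hat
(phi.hat >= 0.25) & (phi.hat <= 1)  # TRUE: phi.hat lies in the admissible range
abs(phi.hat-0.5935829)              # distance from the MME value found in Part 1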

Part 3 Numerical Solutions for the Score Equations of MLEs

A Banach Space A Banach space B is a normed vector space over the real or complex field such that every Cauchy sequence in B converges in B (i.e., B is complete). More: http://en.wikipedia.org/wiki/Banach_space

Lipschitz Continuous Let A be a closed subset of a Banach space B and let f: A → A be a mapping. f is Lipschitz continuous on A with constant λ if ||f(x) - f(y)|| ≤ λ ||x - y|| for all x, y in A. f is a contraction mapping on A if f is Lipschitz continuous and λ < 1. More: http://en.wikipedia.org/wiki/Lipschitz_continuous

Fixed Point Theorem (1) If f is a contraction mapping on A (that is, f is Lipschitz continuous on A with λ < 1), then f has a unique fixed point x* in A with f(x*) = x*, and for any initial value x(0) in A the iteration x(n+1) = f(x(n)) converges to x*.

Fixed Point Theorem (2) More: http://en.wikipedia.org/wiki/Fixed-point_theorem http://www.math-linux.com/spip.php?article60

Applications for MLE (1) To solve the score equation f(φ) = 0 numerically, rewrite it as a fixed-point problem φ = g(φ) with g(φ) = φ + α f(φ), α ≠ 0. If g is a contraction mapping, then starting from an initial value φ(0), the iteration φ(n+1) = g(φ(n)) = φ(n) + α f(φ(n)) converges to the root, which is the MLE.

Applications for MLE (2) How do we choose α so that g(φ) = φ + α f(φ) is a contraction mapping? What is the optimal α?
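One way to answer this (a sketch under our reading of the slides, not a formula from them): near the root, g'(φ) = 1 + α f'(φ), so g contracts when |1 + α f'(φ)| < 1, and the fastest local convergence comes from α ≈ -1/f'(φ̂); the parallel chord method below instead fixes α = -1/f'(φ(0)). A quick R illustration for Example 1:
# Effect of alpha on the contraction factor |g'(phi)| = |1 + alpha*f'(phi)| near the root
y1 = 125; y2 = 18; y3 = 20; y4 = 24
fp = function(phi) (-1)*y1*(2+phi)^(-2)-(y2+y3)*(1-phi)^(-2)-y4*(phi)^(-2)  # f'(phi)
phi.hat = 0.5778760
alpha1 = -1/fp(0.9)        # alpha fixed at the initial value 0.9 (parallel chord)
alpha2 = -1/fp(phi.hat)    # "optimal" alpha, which uses the (unknown) root
abs(1+alpha1*fp(phi.hat))  # about 0.92: a contraction, but a slow one
abs(1+alpha2*fp(phi.hat))  # 0: locally the fastest choice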

Parallel Chord Method (1) The parallel chord method is also called simple iteration: the slope is fixed at the initial point, so every chord is parallel. With α = -1/f'(φ(0)), the update is φ(n+1) = φ(n) + α f(φ(n)).

Parallel Chord Method (2)

Plot Parallel Chord Method by R (1)
### Simple iteration ###
y1 = 125; y2 = 18; y3 = 20; y4 = 24
# First and second derivatives of log likelihood #
f1 <- function(phi) {y1/(2+phi)-(y2+y3)/(1-phi)+y4/phi}
f2 <- function(phi) {(-1)*y1*(2+phi)^(-2)-(y2+y3)*(1-phi)^(-2)-y4*(phi)^(-2)}
x = c(10:80)*0.01
y = f1(x)
plot(x, y, type = 'l', main = "Parallel chord method",
     xlab = expression(varphi),
     ylab = "First derivative of log likelihood function")
abline(h = 0)

Plot Parallel Chord Method by R (2)
phi0 = 0.25   # Given the initial value 0.25 #
segments(0, f1(phi0), phi0, f1(phi0), col = "green", lty = 2)
segments(phi0, f1(phi0), phi0, -200, col = "green", lty = 2)
# Use the tangent line to find the intercept b0 #
b0 = f1(phi0)-f2(phi0)*phi0
curve(f2(phi0)*x+b0, add = T, col = "red")
phi1 = -b0/f2(phi0)   # Find the closer phi #
segments(phi1, -200, phi1, f1(phi1), col = "green", lty = 2)
segments(0, f1(phi1), phi1, f1(phi1), col = "green", lty = 2)
# Use the parallel line to find the intercept b1 #
b1 = f1(phi1)-f2(phi0)*phi1
curve(f2(phi0)*x+b1, add = T, col = "red")

Define Functions for Example 1 in R
We will define some functions and variables for finding the MLE in Example 1 by R.
# First, second and third derivatives of log likelihood #
f1 = function(y1, y2, y3, y4, phi) {y1/(2+phi)-(y2+y3)/(1-phi)+y4/phi}
f2 = function(y1, y2, y3, y4, phi) {(-1)*y1*(2+phi)^(-2)-(y2+y3)*(1-phi)^(-2)-y4*(phi)^(-2)}
f3 = function(y1, y2, y3, y4, phi) {2*y1*(2+phi)^(-3)-2*(y2+y3)*(1-phi)^(-3)+2*y4*(phi)^(-3)}
# Expected second derivative of log likelihood (minus the Fisher information) #
I = function(y1, y2, y3, y4, phi) {(-1)*(y1+y2+y3+y4)*(1/(4*(2+phi))+1/(2*(1-phi))+1/(4*phi))}
y1 = 125; y2 = 18; y3 = 20; y4 = 24; initial = 0.9

Parallel Chord Method by R (1)
> fix(SimpleIteration)
function(y1, y2, y3, y4, initial){
  phi = NULL;
  i = 0;
  alpha = -1.0/f2(y1, y2, y3, y4, initial);
  phi2 = initial;
  phi1 = initial+1;
  while(abs(phi1-phi2) >= 1.0E-5){
    i = i+1;
    phi1 = phi2;
    phi2 = alpha*f1(y1, y2, y3, y4, phi1)+phi1;
    phi[i] = phi2;
  }
  print("By parallel chord method(simple iteration)");
  return(list(phi = phi2, iteration = phi));
}

Parallel Chord Method by R (2) > SimpleIteration(y1, y2, y3, y4, initial)

Parallel Chord Method by C/C++

Newton-Raphson Method (1) The Newton-Raphson update for solving f(φ) = 0 is
φ(n+1) = φ(n) - f(φ(n))/f'(φ(n)).
http://math.fullerton.edu/mathews/n2003/Newton'sMethodMod.html http://en.wikipedia.org/wiki/Newton%27s_method

Newton-Raphson Method (2)

Plot Newton-Raphson Method by R (1)
### Newton-Raphson Method ###
y1 = 125; y2 = 18; y3 = 20; y4 = 24
# First and second derivatives of log likelihood #
f1 <- function(phi) {y1/(2+phi)-(y2+y3)/(1-phi)+y4/phi}
f2 <- function(phi) {(-1)*y1*(2+phi)^(-2)-(y2+y3)*(1-phi)^(-2)-y4*(phi)^(-2)}
x = c(10:80)*0.01
y = f1(x)
plot(x, y, type = 'l', main = "Newton-Raphson method",
     xlab = expression(varphi),
     ylab = "First derivative of log likelihood function")
abline(h = 0)

Plot Newton-Raphson Method by R (2)
# Given the initial value 0.25 #
phi0 = 0.25
segments(0, f1(phi0), phi0, f1(phi0), col = "green", lty = 2)
segments(phi0, f1(phi0), phi0, -200, col = "green", lty = 2)
# Use the tangent line to find the intercept b0 #
b0 = f1(phi0)-f2(phi0)*phi0
curve(f2(phi0)*x+b0, add = T, col = "purple", lwd = 2)
# Find the closer phi #
phi1 = -b0/f2(phi0)
segments(phi1, -200, phi1, f1(phi1), col = "green", lty = 2)
segments(0, f1(phi1), phi1, f1(phi1), col = "green", lty = 2)

Plot Newton-Raphson Method by R (3)
# Use the parallel line to find the intercept b1 #
b1 = f1(phi1)-f2(phi0)*phi1
curve(f2(phi0)*x+b1, add = T, col = "red")
curve(f2(phi1)*x-f2(phi1)*phi1+f1(phi1), add = T, col = "blue", lwd = 2)
legend(0.45, 250, c("Newton-Raphson", "Parallel chord method"),
       col = c("blue", "red"), lty = c(1, 1))

Newton-Raphson Method by R (1)
> fix(Newton)
function(y1, y2, y3, y4, initial){
  i = 0;
  phi = NULL;
  phi2 = initial;
  phi1 = initial+1;
  while(abs(phi1-phi2) >= 1.0E-6){
    i = i+1;
    phi1 = phi2;
    phi2 = phi1-f1(y1, y2, y3, y4, phi1)/f2(y1, y2, y3, y4, phi1);
    phi[i] = phi2;
  }
  print("By Newton-Raphson method");
  return (list(phi = phi2, iteration = phi));
}

Newton-Raphson Method by R (2)
> Newton(125, 18, 20, 24, 0.9)
[1] "By Newton-Raphson method"
$phi
[1] 0.577876

$iteration
[1] 0.8193054 0.7068499 0.6092827 0.5792747 0.5778784 0.5778760 0.5778760

Newton-Raphson Method by C/C++

Halley’s Method The Newton-Raphson iteration function is
g(φ) = φ - f(φ)/f'(φ).
It is possible to speed up convergence by using more expansion terms than the Newton-Raphson method does when the objective function is very smooth, as in the method of Edmond Halley (1656-1742):
g(φ) = φ - [f(φ)/f'(φ)] / [1 - f(φ) f''(φ) / (2 f'(φ)²)].
(http://math.fullerton.edu/mathews/n2003/Halley'sMethodMod.html)

Halley’s Method by R (1)
> fix(Halley)
function(y1, y2, y3, y4, initial){
  i = 0;
  phi = NULL;
  phi2 = initial;
  phi1 = initial+1;
  while(abs(phi1-phi2) >= 1.0E-6){
    i = i+1;
    phi1 = phi2;
    phi2 = phi1-f1(y1, y2, y3, y4, phi1)/f2(y1, y2, y3, y4, phi1)*
           1.0/(1.0-f1(y1, y2, y3, y4, phi1)*f3(y1, y2, y3, y4, phi1)/
           (f2(y1, y2, y3, y4, phi1)*f2(y1, y2, y3, y4, phi1)*2.0));
    phi[i] = phi2;
  }
  print("By Halley method");
  return (list(phi = phi2, iteration = phi));
}

Halley’s Method by R (2)
> Halley(125, 18, 20, 24, 0.9)
[1] "By Halley method"
$phi
[1] 0.577876

$iteration
[1] 0.5028639 0.5793692 0.5778760 0.5778760

Halley’s Method by C/C++

Bisection Method (1) Assume that f ∈ C[a, b] and that there exists a number r in [a, b] such that f(r) = 0. If f(a) and f(b) have opposite signs, and {c_n} represents the sequence of midpoints generated by the bisection process, then |r - c_n| ≤ (b - a)/2^(n+1) for n = 0, 1, 2, ..., and the sequence {c_n} converges to r; that is, lim(n→∞) c_n = r. http://en.wikipedia.org/wiki/Bisection_method

Bisection Method (2) (Figure: the first bisection steps on the score function.)

Plot the Bisection Method by R
### Bisection method ###
y1 = 125; y2 = 18; y3 = 20; y4 = 24
f <- function(phi) {y1/(2+phi)-(y2+y3)/(1-phi)+y4/phi}
x = c(1:100)*0.01
y = f(x)
plot(x, y, type = 'l', main = "Bisection method",
     xlab = expression(varphi),
     ylab = "First derivative of log likelihood function")
abline(h = 0)
abline(v = 0.5, col = "red")
abline(v = 0.75, col = "red")
text(0.49, 2200, labels = "1")
text(0.74, 2200, labels = "2")

Bisection Method by R (1)
> fix(Bisection)
function(y1, y2, y3, y4, A, B)   # A, B is the boundary of parameter #
{
  Delta = 1.0E-6;   # Tolerance for width of interval #
  Satisfied = 0;    # Condition for loop termination #
  phi = NULL;
  YA = f1(y1, y2, y3, y4, A);   # Compute function values #
  YB = f1(y1, y2, y3, y4, B);
  # Calculation of the maximum number of iterations #
  Max = as.integer(1+floor((log(B-A)-log(Delta))/log(2)));
  # Check to see if the bisection method applies #
  if(((YA >= 0) & (YB >= 0)) || ((YA < 0) & (YB < 0))){
    print("The values of function in boundary point do not differ in sign.");

Bisection Method by R (2)
    print("Therefore, this method is not appropriate here.");
    quit();   # Exit program #
  }
  for(K in 1:Max){
    if(Satisfied == 1) break;
    C = (A+B)/2;                  # Midpoint of interval #
    YC = f1(y1, y2, y3, y4, C);   # Function value at midpoint #
    if((K-1) < 100) phi[K-1] = C;
    if(YC == 0){
      A = C;   # Exact root is found #
      B = C;
    }
    else{
      if((YB*YC) >= 0){

Bisection Method by R (3)
        B = C;   # Squeeze from the right #
        YB = YC;
      }
      else{
        A = C;   # Squeeze from the left #
        YA = YC;
      }
    }
    if((B-A) < Delta) Satisfied = 1;   # Check for early convergence #
  }   # End of 'for'-loop #
  print("By Bisection Method");
  return(list(phi = C, iteration = phi));
}

Bisection Method by R (4)
> Bisection(125, 18, 20, 24, 0.25, 1)
[1] "By Bisection Method"
$phi
[1] 0.5778754

$iteration
 [1] 0.4375000 0.5312500 0.5781250 0.5546875 0.5664062 0.5722656 0.5751953
 [8] 0.5766602 0.5773926 0.5777588 0.5779419 0.5778503 0.5778961 0.5778732
[15] 0.5778847 0.5778790 0.5778761 0.5778747 0.5778754

Bisection Method by C/C++ (1)

Bisection Method by C/C++ (2)

Secant Method Approximate the derivative in Newton's update by the slope of the secant through the two most recent iterates:
φ(n+1) = φ(n) - f(φ(n)) (φ(n) - φ(n-1)) / (f(φ(n)) - f(φ(n-1))).
More: http://en.wikipedia.org/wiki/Secant_method http://math.fullerton.edu/mathews/n2003/SecantMethodMod.html

Secant Method by R (1)
> fix(Secant)
function(y1, y2, y3, y4, initial1, initial2){
  phi = NULL;
  phi1 = initial1;   # older iterate #
  phi2 = initial2;   # current iterate #
  i = 0;
  while(abs(phi1-phi2) >= 1.0E-6){
    i = i+1;
    # Slope of the secant through the two most recent iterates #
    alpha = (phi2-phi1)/(f1(y1, y2, y3, y4, phi2)-f1(y1, y2, y3, y4, phi1));

Secant Method by R (2)
    # Secant update: replace 1/f'(phi) in Newton's step by alpha #
    phinew = phi2-alpha*f1(y1, y2, y3, y4, phi2);
    phi1 = phi2;
    phi2 = phinew;
    phi[i] = phi2;
  }
  print("By Secant method");
  return (list(phi = phi2, iteration = phi));
}

Secant Method by R (3)
> Secant(125, 18, 20, 24, 0.9, 0.05)
[1] "By Secant method"
$phi
[1] 0.577876

$iteration
[1] 0.2075634 0.4008018 0.5760396 0.5778801 0.5778760 0.5778760

Secant Method by C/C++

Secant-Bracket Method The secant-bracket method is also called the regula falsi (false position) method: like bisection it keeps a bracketing interval [A, B], but the new point C is where the secant through (A, f(A)) and (B, f(B)) crosses the axis. (Figure: one step, with the secant S, bracket endpoints A and B, and new point C.)

Secant-Bracket Method by R (1)
> fix(RegularFalsi)
function(y1, y2, y3, y4, A, B){
  phi = NULL;
  i = -1;
  Delta = 1.0E-6;   # Tolerance for width of interval #
  Satisfied = 1;    # Condition for loop termination #
  # Endpoints of the interval [A,B] #
  YA = f1(y1, y2, y3, y4, A);   # Compute function values #
  YB = f1(y1, y2, y3, y4, B);
  # Check to see if the method applies #
  if(((YA >= 0) & (YB >= 0)) || ((YA < 0) & (YB < 0))){
    print("The values of function in boundary point do not differ in sign");
    print("Therefore, this method is not appropriate here");
    q();   # Exit program #
  }

Secant-Bracket Method by R (2)
  while(Satisfied){
    i = i+1;
    # Crossing point of the secant through (A, YA) and (B, YB) #
    C = (B*f1(y1, y2, y3, y4, A)-A*f1(y1, y2, y3, y4, B))/(f1(y1, y2, y3, y4, A)-f1(y1, y2, y3, y4, B));
    YC = f1(y1, y2, y3, y4, C);   # Function value at the new point #
    phi[i] = C;
    if(YC == 0){   # First 'if' #
      A = C;       # Exact root is found #
      B = C;
    }else{
      if((YB*YC) >= 0){
        B = C;   # Squeeze from the right #
        YB = YC;

Secant-Bracket Method by R (3)
      }else{
        A = C;   # Squeeze from the left #
        YA = YC;
      }
    }
    if(abs(f1(y1, y2, y3, y4, C)) < Delta) Satisfied = 0;   # Check for convergence #
  }
  print("By Regular Falsi Method")
  return(list(phi = C, iteration = phi));
}

Secant-Bracket Method by R (4)
> RegularFalsi(y1, y2, y3, y4, 0.9, 0.05)
[1] "By Regular Falsi Method"
$phi
[1] 0.577876

$iteration
 [1] 0.5758648 0.5765007 0.5769352 0.5772324 0.5774356 0.5775746 0.5776698
 [8] 0.5777348 0.5777794 0.5778099 0.5778307 0.5778450 0.5778548 0.5778615
[15] 0.5778660 0.5778692 0.5778713 0.5778728 0.5778738 0.5778745 0.5778750
[22] 0.5778753 0.5778755 0.5778756 0.5778757 0.5778758 0.5778759 0.5778759
[29] 0.5778759 0.5778759 0.5778759 0.5778760 0.5778760 0.5778760 0.5778760
[36] 0.5778760 0.5778760

Secant-Bracket Method by C/C++ (1)

Secant-Bracket Method by C/C++ (2)

Fisher Scoring Method The Fisher scoring method replaces the second derivative of the log likelihood in the Newton-Raphson update by its expectation, -I(θ), where I(θ) is the Fisher information matrix (the parameter θ may be multivariate):
θ(n+1) = θ(n) + I(θ(n))^(-1) l'(θ(n)).

Fisher Scoring Method by R (1)
> fix(Fisher)
function(y1, y2, y3, y4, initial){
  i = 0;
  phi = NULL;
  phi2 = initial;
  phi1 = initial+1;
  while(abs(phi1-phi2) >= 1.0E-6){
    i = i+1;
    phi1 = phi2;
    phi2 = phi1-f1(y1, y2, y3, y4, phi1)/I(y1, y2, y3, y4, phi1);
    phi[i] = phi2;
  }
  print("By Fisher method");
  return(list(phi = phi2, iteration = phi));
}

Fisher Scoring Method by R (2)
> Fisher(125, 18, 20, 24, 0.9)
[1] "By Fisher method"
$phi
[1] 0.577876

$iteration
[1] 0.5907181 0.5785331 0.5779100 0.5778777 0.5778761 0.5778760

Fisher Scoring Method by C/C++

Order of Convergence The order of convergence of a sequence φ(n) → φ̂ is q if
lim(n→∞) |φ(n+1) - φ̂| / |φ(n) - φ̂|^q = c
for some constant c > 0 and q ≥ 1. More: http://en.wikipedia.org/wiki/Order_of_convergence
Note: |φ(n+1) - φ̂| ≈ c |φ(n) - φ̂|^q as n → ∞, so
log|φ(n+1) - φ̂| ≈ log c + q log|φ(n) - φ̂|.
Hence we can use linear regression of log|φ(n+1) - φ̂| on log|φ(n) - φ̂| to estimate q.

Theorem for Newton-Raphson Method If g is a contraction mapping with Lipschitz constant λ < 1, then the fixed-point iteration converges at least linearly (order 1), with |φ(n) - φ̂| ≤ λ^n |φ(0) - φ̂|. If f'' exists and f has a simple zero at φ̂, then there is a neighborhood of φ̂ on which the iteration function g(φ) = φ - f(φ)/f'(φ) of the Newton-Raphson method is a contraction mapping, and the order of convergence is 2.

Find Convergence Order by R (1)
> # Convergence order #
> # The Newton method below can be replaced by any of the other methods #
> R = Newton(y1, y2, y3, y4, initial)
[1] "By Newton-Raphson method"
> temp = log(abs(R$iteration-R$phi))
> y = temp[2:(length(temp)-1)]
> x = temp[1:(length(temp)-2)]
> lm(y~x)
Call:
lm(formula = y ~ x)
Coefficients:
(Intercept)            x
     0.6727       2.0441

Find Convergence Order by R (2)
> R = SimpleIteration(y1, y2, y3, y4, initial)
[1] "By parallel chord method(simple iteration)"
> temp = log(abs(R$iteration-R$phi))
> y = temp[2:(length(temp)-1)]
> x = temp[1:(length(temp)-2)]
> lm(y~x)
Call:
lm(formula = y ~ x)
Coefficients:
(Intercept)            x
   -0.01651      1.01809

> R = Secant(y1, y2, y3, y4, initial, 0.05)
[1] "By Secant method"
> temp = log(abs(R$iteration-R$phi))
> y = temp[2:(length(temp)-1)]
> x = temp[1:(length(temp)-2)]
> lm(y~x)
Call:
lm(formula = y ~ x)
Coefficients:
(Intercept)            x
     -1.245        1.869

> R = Fisher(y1, y2, y3, y4, initial)
[1] "By Fisher method"
> temp = log(abs(R$iteration-R$phi))
> y = temp[2:(length(temp)-1)]
> x = temp[1:(length(temp)-2)]
> lm(y~x)
Call:
lm(formula = y ~ x)
Coefficients:
(Intercept)            x
     -2.942        1.004

> R = Bisection(y1, y2, y3, y4, 0.25, 1)
[1] "By Bisection Method"
> temp = log(abs(R$iteration-R$phi))
> y = temp[2:(length(temp)-1)]
> x = temp[1:(length(temp)-2)]
> lm(y~x)
Call:
lm(formula = y ~ x)
Coefficients:
(Intercept)            x
    -1.9803       0.8448

Find Convergence Order by R (3)
> R = RegularFalsi(y1, y2, y3, y4, initial, 0.05)
[1] "By Regular Falsi Method"
> temp = log(abs(R$iteration-R$phi))
> y = temp[2:(length(temp)-1)]
> x = temp[1:(length(temp)-2)]
> lm(y~x)
Call:
lm(formula = y ~ x)
Coefficients:
(Intercept)            x
    -0.2415       1.0134

Find Convergence Order by C/C++

Exercises Write your own programs for those examples presented in this talk. Write programs for those examples mentioned at the following web page: http://en.wikipedia.org/wiki/Maximum_likelihood Write programs for the other examples that you know.

More Exercises (1) Example 3 in genetics: The observed data are where , , and fall in such that Find the MLEs for , , and .

More Exercises (2) Example 4 in the positron emission tomography (PET): The observed data are and The values of are known and the unknown parameters are . Find the MLEs for .

More Exercises (3) Example 5 in the normal mixture: The observed data are random samples from the following probability density function: Find the MLEs for the following parameters: