Maximum Likelihood Estimates and the EM Algorithms III Henry Horng-Shing Lu Institute of Statistics National Chiao Tung University

hslu@stat.nctu.edu.tw http://tigpbp.iis.sinica.edu.tw/courses.htm

Part 4 The EM Algorithm

EM Algorithm (1)
• Dempster, Laird, and Rubin (1977, JRSSB) [The proof has some errors.]
• Wu (1983, AS) [Corrects the proof.]
• Books and references for the EM algorithm: McLachlan and Krishnan (1997), The EM Algorithm and Extensions, Wiley, New York. http://www2.isye.gatech.edu/~brani/isyebayes/bank/handout12.pdf

EM Algorithm (2)
• The EM algorithm is used to maximize complex likelihoods in incomplete-data problems.
• Let y be the observed data with pdf $g(y \mid \theta)$, where $\theta$ is a vector of parameters. Let x = [y, z] be the vector of complete data, where z is the augmented (unobserved) data.

EM Algorithm (3)
• The incomplete data vector y comes from the incomplete sample space Y. The mapping from the complete sample space X to the incomplete sample space Y is many-to-one: each complete-data point x determines a unique y = y(x), but not conversely.
• Let $\theta^{(0)}$ be some initial value for $\theta$. At the k-th step, the EM algorithm performs the following two steps:

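EM Algorithm (4)
1. E-step: Project an appropriate functional containing the complete data onto the space of the incomplete data. Calculate $Q(\theta \mid \theta^{(k)}) = E[\log f(x \mid \theta) \mid y, \theta^{(k)}]$, where $f(x \mid \theta)$ is the pdf of the complete data.
2. M-step: Maximize the functional evaluated in the E-step. Choose the value $\theta^{(k+1)}$ that maximizes $Q(\theta \mid \theta^{(k)})$.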

EM Algorithm (5)
• The E and M steps are iterated until the difference $L(\theta^{(k+1)}) - L(\theta^{(k)})$ becomes small, where L denotes the observed-data likelihood.

Example 1 in Genetics (1)
• Two linked loci with alleles A and a, and B and b. A, B: dominant; a, b: recessive.
• A double heterozygote AaBb will produce gametes of four types: AB, Ab, aB, ab.
• M (male): recombination fraction r, non-recombinant fraction 1 - r. F (female): recombination fraction r', non-recombinant fraction 1 - r'.
[Figure: chromosome diagram of the double heterozygote and its gametes]

Example 1 in Genetics (2)
• r and r' are the recombination rates for males and females, respectively.
• Suppose the parental origin of these heterozygotes is the mating AABB × aabb. The problem is to estimate r and r' from the offspring of selfed heterozygotes.
• Fisher, R. A. and Balmukand, B. (1928). The estimation of linkage from the offspring of selfed heterozygotes. Journal of Genetics, 20, 79–92.
• http://en.wikipedia.org/wiki/Genetics
• http://www2.isye.gatech.edu/~brani/isyebayes/bank/handout12.pdf

Example 1 in Genetics (3)
Offspring genotypes and probabilities by gamete pair (rows: female gamete; columns: male gamete):

FEMALE \ MALE  | AB (1-r)/2         | ab (1-r)/2         | aB r/2         | Ab r/2
AB (1-r')/2    | AABB (1-r)(1-r')/4 | aABb (1-r)(1-r')/4 | aABB r(1-r')/4 | AABb r(1-r')/4
ab (1-r')/2    | AaBb (1-r)(1-r')/4 | aabb (1-r)(1-r')/4 | aaBb r(1-r')/4 | Aabb r(1-r')/4
aB r'/2        | AaBB (1-r)r'/4     | aabB (1-r)r'/4     | aaBB rr'/4     | AabB rr'/4
Ab r'/2        | AABb (1-r)r'/4     | aAbb (1-r)r'/4     | aABb rr'/4     | AAbb rr'/4

Example 1 in Genetics (4)
• Four distinct phenotypes: A*B*, A*b*, a*B* and a*b*.
• A*: the dominant phenotype from (Aa, AA, aA).
• a*: the recessive phenotype from aa.
• B*: the dominant phenotype from (Bb, BB, bB).
• b*: the recessive phenotype from bb.
• A*B*: 9 gametic combinations.
• A*b*: 3 gametic combinations.
• a*B*: 3 gametic combinations.
• a*b*: 1 gametic combination.
• Total: 16 combinations.

Example 1 in Genetics (5)

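Example 1 in Genetics (6)
Hence, a random sample of n from the offspring of selfed heterozygotes will follow a multinomial distribution over the phenotypes (A*B*, A*b*, a*B*, a*b*), obtained by summing the gametic table by phenotype: $\text{Multinomial}\left(n;\ \frac{2 + (1-r)(1-r')}{4},\ \frac{1 - (1-r)(1-r')}{4},\ \frac{1 - (1-r)(1-r')}{4},\ \frac{(1-r)(1-r')}{4}\right)$.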
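The phenotype probabilities implied by the 4 × 4 gametic table can be checked numerically. The sketch below is only an illustration: the function names `gamete_probs` and `phenotype_probs`, and the shorthand `psi` for (1 - r)(1 - r'), are assumptions introduced here and do not come from the slides.

```r
# Gamete probabilities for one parent with recombination fraction rho
gamete_probs <- function(rho) {
  c(AB = (1 - rho) / 2, ab = (1 - rho) / 2, aB = rho / 2, Ab = rho / 2)
}

# Sum the 4 x 4 table of gamete pairs by offspring phenotype
phenotype_probs <- function(r, r_prime) {
  male    <- gamete_probs(r)
  female  <- gamete_probs(r_prime)
  joint   <- outer(female, male)           # rows: female gamete, columns: male gamete
  gametes <- names(male)
  has_A   <- grepl("A", gametes)           # gamete carries the dominant allele A
  has_B   <- grepl("B", gametes)           # gamete carries the dominant allele B
  dom_A   <- outer(has_A, has_A, "|")      # offspring shows A* if either gamete has A
  dom_B   <- outer(has_B, has_B, "|")      # offspring shows B* if either gamete has B
  c("A*B*" = sum(joint[dom_A & dom_B]),
    "A*b*" = sum(joint[dom_A & !dom_B]),
    "a*B*" = sum(joint[!dom_A & dom_B]),
    "a*b*" = sum(joint[!dom_A & !dom_B]))
}

# Hypothetical recombination fractions, chosen only for the check
p_cells <- phenotype_probs(r = 0.1, r_prime = 0.2)
psi <- (1 - 0.1) * (1 - 0.2)
print(rbind(from_table = p_cells,
            formula    = c((2 + psi) / 4, (1 - psi) / 4, (1 - psi) / 4, psi / 4)))
```

Because the four probabilities depend on r and r' only through the product (1 - r)(1 - r'), phenotype counts alone identify that product rather than r and r' separately.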

EM Algorithm for Example 1 (1)
• Incomplete data
• Complete data

EM Algorithm for Example 1 (2)
• Define
• Then

EM Algorithm for Example 1 (2)

EM Algorithm for Example 1 (3)

Lemma (1) Proof:

Lemma (2) Note that the above equality holds if and only if the two functions being compared are equal almost everywhere. Q.E.D.

Expectation Step
• Let $\theta^{(k)}$ = old iterate and $\theta^{(k+1)}$ = new iterate.

Maximization Step

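EM Algorithm for Example 1 (4)
1. Choose an initial value $\theta^{(0)}$ and set k = 0.
2. (E step) Compute the conditional expected complete-data log-likelihood $Q(\theta \mid \theta^{(k)})$.
3. (M step) Set $\theta^{(k+1)}$ to the maximizer of $Q(\theta \mid \theta^{(k)})$.
4. If the change between iterations is small, stop. Else, set k = k + 1 and go to step 2.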
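The four steps can be sketched in R for the linkage multinomial. This is a minimal sketch under stated assumptions, not the program shown on the slides: it writes psi for (1 - r)(1 - r'), splits the A*B* cell (probability 1/2 + psi/4) into two latent parts, and uses the counts (125, 18, 20, 34), the 197-animal illustration analysed in Dempster, Laird and Rubin (1977). The function name `em_linkage` is hypothetical.

```r
# EM for cell probabilities ((2 + psi)/4, (1 - psi)/4, (1 - psi)/4, psi/4)
em_linkage <- function(y, psi0 = 0.5, tol = 1e-10, maxit = 1000) {
  stopifnot(length(y) == 4, psi0 > 0, psi0 < 1)
  psi <- psi0                                   # step 1: initial value
  converged <- FALSE
  for (k in seq_len(maxit)) {
    # Step 2 (E step): expected share of the A*B* count in the psi/4 sub-cell
    x2 <- y[1] * psi / (2 + psi)
    # Step 3 (M step): maximizer of (x2 + y4) log(psi) + (y2 + y3) log(1 - psi)
    psi_new <- (x2 + y[4]) / (x2 + y[2] + y[3] + y[4])
    # Step 4: stop when the update is small, else iterate
    converged <- abs(psi_new - psi) < tol
    psi <- psi_new
    if (converged) break
  }
  list(psi = psi, iterations = k, converged = converged)
}

# Counts for (A*B*, A*b*, a*B*, a*b*); illustrative data, not from the slides
y <- c(125, 18, 20, 34)
fit <- em_linkage(y, psi0 = 0.5)
print(fit)
```

For this one-parameter model the score equation is a quadratic in psi, so the EM limit can be compared with the closed-form root as a sanity check.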

Question

Theorem (1) Proof: i.e.,

Theorem (2)
• Therefore, the MLE-EM will be monotonically convergent under regularity conditions.
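The standard argument behind the monotonicity claim can be sketched as follows. The notation is assumed here rather than recovered from the slides: $g(y \mid \theta)$ is the observed-data pdf, $f(x \mid \theta)$ the complete-data pdf, and $k(z \mid y, \theta) = f(x \mid \theta) / g(y \mid \theta)$ the conditional pdf of the augmented data.

```latex
\begin{aligned}
\ell(\theta) &= \log g(y \mid \theta)
  = Q(\theta \mid \theta^{(k)}) - H(\theta \mid \theta^{(k)}), \\
Q(\theta \mid \theta^{(k)}) &= E\left[\log f(x \mid \theta) \mid y, \theta^{(k)}\right], \qquad
H(\theta \mid \theta^{(k)}) = E\left[\log k(z \mid y, \theta) \mid y, \theta^{(k)}\right], \\
\ell(\theta^{(k+1)}) - \ell(\theta^{(k)})
 &= \underbrace{Q(\theta^{(k+1)} \mid \theta^{(k)}) - Q(\theta^{(k)} \mid \theta^{(k)})}_{\ge 0 \text{ by the M-step}}
  + \underbrace{H(\theta^{(k)} \mid \theta^{(k)}) - H(\theta^{(k+1)} \mid \theta^{(k)})}_{\ge 0 \text{ by Jensen's inequality}}
  \ge 0 .
\end{aligned}
```

The second bracket is a Kullback-Leibler divergence, which is why it vanishes only when the two conditional densities agree almost everywhere.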

MLE-EM by R (1)
> fix(EM)

MLE-EM by R (2)

MLE-EM by C/C++

Result: (Initial Value=0.5)

Result: (Initial Value=1) overflow

Part 5 High Dimension Cases

High Dimension Optimization (1)
1. The objective function is differentiable and local optima are sought: parallel chord, Newton-Raphson, Fisher scoring, bisection, …
2. The objective function is non-differentiable and local optima are sought: grid search, linear search, golden section search, MLE+search, …

High Dimension Optimization (2)
3. Global optima: grid search, simulated annealing (MCMC, MH algorithm), …

High Dimension Optimization in R (1)
• optim(initial value, function, method, lower, upper)
• Methods:
  "BFGS" is a quasi-Newton method. It uses function values and gradients to build up a picture of the surface to be optimized.
  "CG" is a conjugate gradients method based on that by Fletcher and Reeves.
  "L-BFGS-B" is that of Byrd et al. (1995), which allows box constraints; that is, each variable can be given a lower and/or upper bound. The initial value must satisfy the constraints.

High Dimension Optimization in R (2)
  "SANN" belongs to the class of stochastic global optimization methods. It uses only function values but is relatively slow. It will also work for non-differentiable functions.
• lower, upper: Bounds on the variables for the "L-BFGS-B" method.
• See the help page (?optim) for more details.
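As a small illustration of the "SANN" method on a non-differentiable objective, the sketch below minimizes a sum of absolute values. The objective `f_abs`, the starting point, and the seed are assumptions chosen for the example; because SANN is stochastic, the reported optimum is only approximate and varies with the seed.

```r
# Non-differentiable objective with its minimum at (1, -2)
f_abs <- function(x) abs(x[1] - 1) + abs(x[2] + 2)

set.seed(123)                 # SANN is stochastic; fix the seed for reproducibility
start <- c(5, 5)
res_sann <- optim(start, f_abs, method = "SANN")

res_sann$par                  # approximate minimizer found by simulated annealing
res_sann$value                # objective value at that point
```

Only function values are used here, so no gradient function is passed to optim().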

Demo of optim( ) in R
    fr <- function(x) {   ## Rosenbrock Banana function
      x1 <- x[1]
      x2 <- x[2]
      100 * (x2 - x1 * x1)^2 + (1 - x1)^2
    }
    optim(c(-1.2, 1), fr, method = "BFGS")
Annotations on the output:
• $par: the best set of parameters found.
• $value: the value of fn corresponding to par.
• $convergence: 0 indicates successful convergence.
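The same demo can be extended with an analytic gradient and with box constraints. This is a sketch rather than part of the original slide: the gradient function `grr` follows the usual derivative of the Rosenbrock function, and the bounds passed to "L-BFGS-B" are arbitrary values chosen so that the unconstrained minimum at (1, 1) is excluded.

```r
# Rosenbrock Banana function and its gradient
fr <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
grr <- function(x) {
  c(-400 * x[1] * (x[2] - x[1]^2) - 2 * (1 - x[1]),
     200 * (x[2] - x[1]^2))
}

# Quasi-Newton search using the analytic gradient
res_bfgs <- optim(c(-1.2, 1), fr, grr, method = "BFGS")

# Box-constrained search; the upper bound on x[1] is an illustrative choice
res_box <- optim(c(-1.2, 1), fr, grr, method = "L-BFGS-B",
                 lower = c(-2, -2), upper = c(0.5, 2))

res_bfgs$par; res_bfgs$convergence
res_box$par;  res_box$value
```

With the bound x[1] <= 0.5 active, the constrained solution sits on the boundary of the box instead of at (1, 1).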

optim() vs. optimize() in one dimension
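A one-dimensional comparison might look like the sketch below. The objective is an assumption made for illustration: the log-likelihood of a multinomial with cell probabilities ((2 + psi)/4, (1 - psi)/4, (1 - psi)/4, psi/4) and the illustrative counts (125, 18, 20, 34), which are not taken from the slides. optimize() searches an interval directly, while optim() needs method = "Brent" with finite bounds for a single parameter.

```r
# Illustrative counts and one-parameter log-likelihood (up to a constant)
y <- c(125, 18, 20, 34)
loglik <- function(psi) {
  y[1] * log(2 + psi) + (y[2] + y[3]) * log(1 - psi) + y[4] * log(psi)
}

# optimize(): one-dimensional search over an interval, here maximizing
res_optimize <- optimize(loglik, interval = c(1e-6, 1 - 1e-6), maximum = TRUE)

# optim(): minimizes by default, so pass the negative log-likelihood
res_optim <- optim(par = 0.5, fn = function(psi) -loglik(psi),
                   method = "Brent", lower = 1e-6, upper = 1 - 1e-6)

c(optimize = res_optimize$maximum, optim = res_optim$par)
```

The interval endpoints are kept slightly inside (0, 1) so that the logarithms stay finite.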

Part 6 Exercises of MLE-EMs

EM for Example 3 (1)
• Example 3 in genetics: The observed data are $(n_O, n_A, n_B, n_{AB}) = (176, 182, 60, 17) \sim \text{Multinomial}(r^2,\ p^2 + 2pr,\ q^2 + 2qr,\ 2pq)$, where p, q, and r fall in [0,1] such that p + q + r = 1. Find the likelihood function and score equations for p, q, and r.

Category (Cell) | Cell Probability | Observed Frequency
O               | r^2              | n_O = 176
A               | p^2 + 2pr        | n_A = 182
B               | q^2 + 2qr        | n_B = 60
AB              | 2pq              | n_AB = 17
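One way to write the requested likelihood and score equations is sketched below. It assumes the constraint is handled by substituting r = 1 - p - q, so that only p and q are free; the symbol $\ell$ for the log-likelihood is notation introduced here.

```latex
\begin{aligned}
\ell(p, q) &= 2 n_O \log r + n_A \log(p^2 + 2pr) + n_B \log(q^2 + 2qr) + n_{AB} \log(2pq) + \text{const},
  \qquad r = 1 - p - q, \\
\frac{\partial \ell}{\partial p} &= -\frac{2 n_O}{r} + \frac{2 r\, n_A}{p^2 + 2pr}
  - \frac{2 q\, n_B}{q^2 + 2qr} + \frac{n_{AB}}{p} = 0, \\
\frac{\partial \ell}{\partial q} &= -\frac{2 n_O}{r} - \frac{2 p\, n_A}{p^2 + 2pr}
  + \frac{2 r\, n_B}{q^2 + 2qr} + \frac{n_{AB}}{q} = 0 .
\end{aligned}
```

These equations have no closed-form solution, which is the motivation for treating the genotype counts as missing data.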

EM for Example 3 (2)
• Incomplete data
• let

EM for Example 3 (3)
• Complete data

Category (Cell) | Cell Probability | Frequency
OO              | r^2              | n_O
AA              | p^2              | n_AA
AO              | 2pr              | n_AO
BB              | q^2              | n_BB
BO              | 2qr              | n_BO
AB              | 2pq              | n_AB

EM for Example 3 (4)

EM for Example 3 (5)
• Similarly, one can obtain the current conditional expectations of
• Execution of the M-step gives
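The E- and M-steps for this example can be sketched in R as follows. The sketch assumes the usual allele-counting form of the M-step; the function name `em_abo`, the starting values, and the stopping rule are choices made here, not taken from the slides.

```r
# EM for the allele frequencies (p, q, r) from phenotype counts
em_abo <- function(n_O, n_A, n_B, n_AB, start = c(p = 1/3, q = 1/3),
                   tol = 1e-10, maxit = 1000) {
  n <- n_O + n_A + n_B + n_AB
  p <- start[["p"]]; q <- start[["q"]]; r <- 1 - p - q
  for (k in seq_len(maxit)) {
    # E-step: split phenotype counts A and B into expected genotype counts
    n_AA <- n_A * p^2 / (p^2 + 2 * p * r)
    n_AO <- n_A - n_AA
    n_BB <- n_B * q^2 / (q^2 + 2 * q * r)
    n_BO <- n_B - n_BB
    # M-step: count alleles among the 2n alleles in the sample
    p_new <- (2 * n_AA + n_AO + n_AB) / (2 * n)
    q_new <- (2 * n_BB + n_BO + n_AB) / (2 * n)
    r_new <- (2 * n_O + n_AO + n_BO) / (2 * n)
    delta <- max(abs(c(p_new - p, q_new - q, r_new - r)))
    p <- p_new; q <- q_new; r <- r_new
    if (delta < tol) break
  }
  c(p = p, q = q, r = r, iterations = k)
}

# Observed counts from the example
fit_abo <- em_abo(n_O = 176, n_A = 182, n_B = 60, n_AB = 17)
print(round(fit_abo, 4))
```

The three updates always sum to one, so the constraint p + q + r = 1 is preserved without an explicit Lagrange multiplier.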

EM for Example 3 by R (1)

EM for Example 4 (1)
• In positron emission tomography (PET): The observed data are n*(d) ~ Poisson(λ*(d)), d = 1, 2, …, D, and $\lambda^*(d) = \sum_{b=1}^{B} p(b,d)\,\lambda(b)$.
• The values of p(b,d) are known and the unknown parameters are λ(b), b = 1, 2, …, B.
• Find the likelihood function and score equations for λ(b), b = 1, 2, …, B.
• http://en.wikipedia.org/wiki/Positron_emission_tomography

EM for Example 4 (2)
• Incomplete Model:
• Complete Model:
• Complete log-likelihood function:

EM for Example 4 (3)
• E-step
• M-step
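For the PET model, the E- and M-steps combine into a multiplicative update for λ(b). The sketch below assumes the standard complete-data model in which the latent counts n(b,d) are independent Poisson with means λ(b)p(b,d); the function name `pet_em`, the matrix layout of `P`, and the small simulated system are hypothetical and serve only to exercise the update.

```r
# EM for Poisson emission intensities; P[b, d] = p(b, d) is assumed known
pet_em <- function(n_obs, P, lambda0 = rep(1, nrow(P)), maxit = 200, tol = 1e-8) {
  stopifnot(ncol(P) == length(n_obs), all(lambda0 > 0))
  lambda <- lambda0
  sens <- rowSums(P)                          # sum over d of p(b, d)
  loglik <- function(lam) {
    mu <- as.vector(lam %*% P)                # lambda*(d) = sum_b p(b, d) lambda(b)
    sum(n_obs * log(mu) - mu)                 # Poisson log-likelihood up to a constant
  }
  ll <- loglik(lambda)
  for (k in seq_len(maxit)) {
    mu <- as.vector(lambda %*% P)
    # E-step: E[n(b, d) | data] = lambda(b) p(b, d) n*(d) / lambda*(d)
    # M-step: lambda(b) = sum_d E[n(b, d) | data] / sum_d p(b, d)
    lambda <- lambda / sens * as.vector(P %*% (n_obs / mu))
    ll <- c(ll, loglik(lambda))
    if (ll[k + 1] - ll[k] < tol) break
  }
  list(lambda = lambda, loglik = ll, iterations = k)
}

# Small simulated system (hypothetical values, for illustration only)
set.seed(1)
B <- 3; D <- 6
P <- matrix(runif(B * D), nrow = B)
P <- P / rowSums(P)                           # each emission is detected somewhere
lambda_true <- c(50, 100, 20)
n_obs <- rpois(D, as.vector(lambda_true %*% P))
fit_pet <- pet_em(n_obs, P)
print(fit_pet$lambda)
```

When every row of P sums to one, each update keeps the total estimated intensity equal to the total observed count, which is a convenient check on an implementation.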

EM for Example 5 (1)
• In the normal mixture (http://en.wikipedia.org/wiki/Mixture_model): The observed data $x_i$, i = 1, 2, …, n, are random samples from the following probability density function:

EM for Example 5 (2)
• and where

EM for Example 5 (3)

EM for Example 5 (4)
• Maximize Q subject to the constraint that the mixing proportions sum to one. Applying the Lagrange multiplier method to Q, the problem becomes to maximize
• The estimators are
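The resulting updates for a normal mixture can be sketched in base R. This is a hand-written illustration rather than the slides' own code: the function name `normal_mix_em`, the starting values (sorted data split into k groups), and the simulated sample are assumptions. The names `lambda`, `mu`, and `sigma` follow the argument names of normalmixEM for the mixing proportions, means, and standard deviations.

```r
normal_mix_em <- function(x, k = 2, maxit = 1000, tol = 1e-8) {
  n <- length(x)
  # Crude starting values (an assumption): split the sorted data into k groups
  groups <- cut(rank(x, ties.method = "first"), breaks = k, labels = FALSE)
  lambda <- as.vector(table(groups)) / n
  mu     <- as.vector(tapply(x, groups, mean))
  sigma  <- as.vector(tapply(x, groups, sd))
  loglik_old <- -Inf
  for (iter in seq_len(maxit)) {
    # E-step: posterior probability that x[i] comes from component j (n x k matrix)
    dens <- sapply(seq_len(k), function(j) lambda[j] * dnorm(x, mu[j], sigma[j]))
    mix  <- rowSums(dens)
    w    <- dens / mix
    loglik <- sum(log(mix))     # log-likelihood at the parameters entering this iteration
    # M-step: weighted proportions, means, and standard deviations
    nk     <- colSums(w)
    lambda <- nk / n            # Lagrange solution: proportions sum to one
    mu     <- colSums(w * x) / nk
    sigma  <- sqrt(colSums(w * outer(x, mu, "-")^2) / nk)
    if (loglik - loglik_old < tol) break
    loglik_old <- loglik
  }
  list(lambda = lambda, mu = mu, sigma = sigma, loglik = loglik, iterations = iter)
}

# Simulated two-component sample (hypothetical values, for illustration only)
set.seed(42)
x <- c(rnorm(200, mean = 0, sd = 1), rnorm(300, mean = 5, sd = 1))
fit_mix <- normal_mix_em(x, k = 2)
print(fit_mix[c("lambda", "mu", "sigma")])
```

The update for `lambda` is what the Lagrange multiplier argument yields: each mixing proportion is the average posterior membership probability of its component.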

EM for Example 5 by R (1)
• normalmixEM(x, lambda = NULL, mu = NULL, sigma = NULL, k = 2, arbmean = TRUE, arbvar = TRUE, epsilon = 1e-08, maxit = 10000, verb = FALSE)
• Arguments
  x: A vector of length n consisting of the data.
  lambda: Initial value of mixing proportions.
  mu: A k-vector of initial values for the mean parameters.
  sigma: A k-vector of initial values for the standard deviation parameters.

EM for Example 5 by R (2)
  k: Number of components.
  arbmean: If TRUE, then the component densities are allowed to have different mus. If FALSE, then a scale mixture will be fit.
  arbvar: If TRUE, then the component densities are allowed to have different sigmas. If FALSE, then a location mixture will be fit.
  epsilon: The convergence criterion.
  maxit: The maximum number of iterations.
  verb: If TRUE, then various updates are printed during each iteration of the algorithm.

EM for Example 5 by R (3)
• Example

Exercises
• Write your own programs similar to the examples presented in this talk.
• Write programs for the examples mentioned on the reference web pages.
• Write programs for other examples that you know.
