Published by Ellen Banks. Modified over 9 years ago.
1
Maximum Likelihood Estimates and the EM Algorithms III
Henry Horng-Shing Lu, Institute of Statistics, National Chiao Tung University
hslu@stat.nctu.edu.tw
http://tigpbp.iis.sinica.edu.tw/courses.htm
2
Part 4: The EM Algorithm
3
EM Algorithm (1)
Dempster, Laird, and Rubin (1977, JRSSB) [The proof has some errors.]
Wu (1983, AS) [Corrects the proof.]
Books and references for the EM algorithm:
McLachlan and Krishnan (1997), The EM Algorithm and Extensions, Wiley, New York.
http://www2.isye.gatech.edu/~brani/isyebayes/bank/handout12.pdf
4
EM Algorithm (2) The EM algorithm is used to maximize complex likelihoods in incomplete-data problems. Let y be the observed data from a pdf $f(y \mid \theta)$, where $\theta$ is a vector of parameters. Let x = [y, z] be a vector of complete data with the augmented data z.
5
EM Algorithm (3) The incomplete data vector y comes from the incomplete sample space $\mathcal{Y}$. Each point x of the complete sample space $\mathcal{X}$ determines a unique incomplete observation y = y(x) in $\mathcal{Y}$; in general this mapping is many-to-one. Let $\theta^{(0)}$ be some initial value for $\theta$. At the k-th step, the EM algorithm performs the following two steps:
6
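EM Algorithm (4)
1. E-step: Project an appropriate functional containing the complete data onto the space of the incomplete data. Calculate $Q(\theta \mid \theta^{(k)}) = E[\log f(x \mid \theta) \mid y, \theta^{(k)}]$.
2. M-step: Maximize the functional evaluated in the E-step. Choose the value $\theta^{(k+1)}$ that maximizes $Q(\theta \mid \theta^{(k)})$.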
7
EM Algorithm (5) The E and M steps are iterated until the difference between successive values of the observed-data log-likelihood, $L(\theta^{(k+1)}) - L(\theta^{(k)})$, becomes small.
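The iteration described above can be sketched as a small generic driver in R. This is only an illustration under stated assumptions: the names `run_em`, `em_step` and `loglik` are hypothetical, and the toy problem (exponential lifetimes with right censoring, using made-up data) is not from the slides. It was chosen because its E-step and M-step each fit on one line.

```r
# Generic EM driver (sketch). `em_step` is a hypothetical user-supplied function
# that performs one E-step followed by one M-step and returns the new estimate.
run_em <- function(theta0, em_step, loglik, tol = 1e-12, max_iter = 1000) {
  theta <- theta0
  ll <- loglik(theta)
  for (k in seq_len(max_iter)) {
    theta <- em_step(theta)
    ll_new <- loglik(theta)
    converged <- abs(ll_new - ll) < tol   # stop when the log-likelihood stalls
    ll <- ll_new
    if (converged) break
  }
  list(theta = theta, loglik = ll, iterations = k)
}

# Toy problem (illustrative data): exponential lifetimes, some right-censored.
times    <- c(1.2, 0.7, 3.0, 2.5, 3.0)
censored <- c(FALSE, FALSE, TRUE, FALSE, TRUE)

em_step <- function(rate) {
  # E-step: a censored lifetime has conditional mean (censoring time + 1/rate).
  expected_time <- times + censored / rate
  # M-step: complete-data MLE of the rate of an exponential sample.
  length(times) / sum(expected_time)
}

# Observed-data log-likelihood, used only for the stopping rule.
loglik <- function(rate) sum(!censored) * log(rate) - rate * sum(times)

fit <- run_em(theta0 = 1, em_step = em_step, loglik = loglik)
fit$theta
```

For this toy model the observed-data MLE has the closed form (number of uncensored lifetimes) / (total time), so `fit$theta` can be compared with `sum(!censored) / sum(times)`.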
8
Example 1 in Genetics (1)
Two linked loci with alleles A and a, and B and b.
A, B: dominant; a, b: recessive.
A double heterozygote AaBb will produce gametes of four types: AB, Ab, aB, ab.
[Figure: chromosome diagrams for the female (F) parent, with probabilities 1-r' and r' (female recombination fraction), and for the male (M) parent, with probabilities 1-r and r (male recombination fraction).]
9
Example 1 in Genetics (2)
r and r' are the recombination rates for male and female, respectively.
Suppose the parental origin of these heterozygotes is the mating AABB × aabb, so that AB and ab are the non-recombinant gametes.
The problem is to estimate r and r' from the offspring of selfed heterozygotes.
Fisher, R. A. and Balmukand, B. (1928). The estimation of linkage from the offspring of selfed heterozygotes. Journal of Genetics, 20, 79-92.
http://en.wikipedia.org/wiki/Genetics
http://www2.isye.gatech.edu/~brani/isyebayes/bank/handout12.pdf
10
Example 1 in Genetics (3)
Rows: female gametes. Columns: male gametes. Each cell gives the offspring genotype and its probability.
FEMALE \ MALE | AB (1-r)/2 | ab (1-r)/2 | aB r/2 | Ab r/2
AB (1-r')/2 | AABB (1-r)(1-r')/4 | aABb (1-r)(1-r')/4 | aABB r(1-r')/4 | AABb r(1-r')/4
ab (1-r')/2 | AaBb (1-r)(1-r')/4 | aabb (1-r)(1-r')/4 | aaBb r(1-r')/4 | Aabb r(1-r')/4
aB r'/2 | AaBB (1-r)r'/4 | aabB (1-r)r'/4 | aaBB rr'/4 | AabB rr'/4
Ab r'/2 | AABb (1-r)r'/4 | aAbb (1-r)r'/4 | aABb rr'/4 | AAbb rr'/4
11
Example 1 in Genetics (4)
Four distinct phenotypes: A*B*, A*b*, a*B* and a*b*.
A*: the dominant phenotype from (Aa, AA, aA).
a*: the recessive phenotype from aa.
B*: the dominant phenotype from (Bb, BB, bB).
b*: the recessive phenotype from bb.
A*B*: 9 gametic combinations.
A*b*: 3 gametic combinations.
a*B*: 3 gametic combinations.
a*b*: 1 gametic combination.
Total: 16 combinations.
12
Example 1 in Genetics (5)
13
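Example 1 in Genetics (6) Hence, a random sample of n from the offspring of selfed heterozygotes will follow a multinomial distribution over the phenotypes (A*B*, A*b*, a*B*, a*b*). Summing the cell probabilities of the 4 × 4 gamete table by phenotype gives $\text{Multinomial}\left(n;\ \frac{2+\psi}{4},\ \frac{1-\psi}{4},\ \frac{1-\psi}{4},\ \frac{\psi}{4}\right)$ with $\psi = (1-r)(1-r')$.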
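As a check on the phenotype counts and the multinomial cell probabilities, the 4 × 4 gamete table can be enumerated directly. The sketch below is illustrative only: the values of `r` and `rp` (standing for r') are arbitrary, and the helper `phenotype` is a hypothetical name introduced here. It sums the table probabilities by phenotype and compares them with the closed form in psi = (1-r)(1-r').

```r
r  <- 0.2   # male recombination fraction (arbitrary illustrative value)
rp <- 0.3   # female recombination fraction r' (arbitrary illustrative value)

gametes     <- c("AB", "ab", "aB", "Ab")
male_prob   <- c((1 - r) / 2,  (1 - r) / 2,  r / 2,  r / 2)
female_prob <- c((1 - rp) / 2, (1 - rp) / 2, rp / 2, rp / 2)

# Phenotype of an offspring built from two gametes: a dominant allele on
# either gamete gives the dominant phenotype at that locus.
phenotype <- function(g1, g2) {
  a <- if (substr(g1, 1, 1) == "A" || substr(g2, 1, 1) == "A") "A*" else "a*"
  b <- if (substr(g1, 2, 2) == "B" || substr(g2, 2, 2) == "B") "B*" else "b*"
  paste0(a, b)
}

# All 16 (female gamete, male gamete) combinations.
grid <- expand.grid(f = seq_along(gametes), m = seq_along(gametes))
grid$pheno <- mapply(function(f, m) phenotype(gametes[f], gametes[m]),
                     grid$f, grid$m)
grid$prob <- female_prob[grid$f] * male_prob[grid$m]

combo_counts <- table(grid$pheno)                 # number of combinations
cell_prob    <- tapply(grid$prob, grid$pheno, sum) # probability per phenotype

psi <- (1 - r) * (1 - rp)
closed_form <- c("A*B*" = (2 + psi) / 4, "A*b*" = (1 - psi) / 4,
                 "a*B*" = (1 - psi) / 4, "a*b*" = psi / 4)

combo_counts
rbind(enumerated = cell_prob[names(closed_form)], closed_form = closed_form)
```

Because the four probabilities depend on r and r' only through psi, the phenotype counts identify psi but not r and r' separately.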
14
EM Algorithm for Example 1 (1) Incomplete data. Complete data.
15
EM Algorithm for Example 1 (2) Define. Then.
16
EM Algorithm for Example 1 (2)
17
EM Algorithm for Example 1 (3)
18
Lemma (1) Proof:
19
Lemma (2) Note that the above equality holds if and only if the two densities being compared are equal almost everywhere. Q.E.D.
20
Expectation Step. Let $\theta^{(k)}$ = old iteration, $\theta^{(k+1)}$ = new iteration.
21
Maximization Step
22
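EM Algorithm for Example 1 (4)
1. Set an initial value for the parameter.
2. (E step) Compute the conditional expectation of the complete data given the observed counts and the current estimate.
3. (M step) Update the estimate by maximizing the expected complete-data log-likelihood.
4. If the change in the estimate is small enough, stop. Else, increase the iteration counter and go to step 2.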
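The four steps can be sketched in R for the linkage multinomial with cell probabilities ((2+psi)/4, (1-psi)/4, (1-psi)/4, psi/4). The data used on the slides are not recoverable from this transcript, so the counts below are an assumption: they are the classic (125, 18, 20, 34) counts analysed in Dempster, Laird and Rubin (1977), which have the same cell structure. The function name `em_linkage` and the choice to split the first cell into a part with probability 1/2 and a part with probability psi/4 are likewise illustrative, not necessarily the deck's own code.

```r
# Assumed data: classic counts from Dempster, Laird and Rubin (1977),
# ordered as (A*B*, A*b*, a*B*, a*b*). Not taken from the slides.
y <- c(125, 18, 20, 34)

em_linkage <- function(y, psi0 = 0.5, tol = 1e-10, max_iter = 500) {
  psi <- psi0                                        # step 1: initial value
  for (k in seq_len(max_iter)) {
    # Step 2 (E step): expected part of the A*B* count that belongs to the
    # latent sub-cell with probability psi/4 (the other sub-cell has 1/2).
    x2 <- y[1] * (psi / 4) / (1 / 2 + psi / 4)
    # Step 3 (M step): binomial-type MLE of psi from the completed counts.
    psi_new <- (x2 + y[4]) / (x2 + y[2] + y[3] + y[4])
    # Step 4: stop when the estimate no longer changes.
    done <- abs(psi_new - psi) < tol
    psi <- psi_new
    if (done) break
  }
  list(psi = psi, iterations = k)
}

fit <- em_linkage(y)
fit$psi
```

For these counts the score equation reduces to the quadratic 197 psi^2 - 15 psi - 68 = 0, whose positive root gives a direct check on the EM limit.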
23
Question
24
Theorem (1) Proof: i.e.,
25
Theorem (2) Therefore, the likelihood is non-decreasing along the EM iterations, and the MLE-EM sequence converges monotonically under regularity conditions.
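The monotonicity stated in the theorem can be observed numerically. The sketch below is a check, not a proof, and it rests on assumptions: it reuses the classic Dempster, Laird and Rubin (1977) counts (125, 18, 20, 34) for the linkage-type multinomial, which are not taken from the slides, and it starts from an arbitrary value psi = 0.1.

```r
y <- c(125, 18, 20, 34)   # assumed data (DLR 1977), not from the slides

# Observed-data log-likelihood up to an additive constant.
loglik <- function(psi) {
  y[1] * log(2 + psi) + (y[2] + y[3]) * log(1 - psi) + y[4] * log(psi)
}

psi <- 0.1                # arbitrary starting value
ll  <- numeric(0)
for (k in 1:15) {
  ll  <- c(ll, loglik(psi))                     # record before updating
  x2  <- y[1] * psi / (2 + psi)                 # E step
  psi <- (x2 + y[4]) / (x2 + y[2] + y[3] + y[4])  # M step
}

increments <- diff(ll)    # should all be non-negative up to rounding
increments
```

The increments shrink quickly toward zero, which is also why a stopping rule based on the change in log-likelihood works for this example.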
26
MLE-EM by R (1)
> fix(EM)
27
MLE-EM by R (2)
28
MLE-EM by C/C++
29
Result: (Initial Value=0.5)
30
Result: (Initial Value=0.25)
31
Result: (Initial Value=0.9)
32
Result: (Initial Value=1) overflow
33
Part 5: High Dimension Cases
34
High Dimension Optimization (1)
1. The objective function is differentiable, local optima: parallel chord, Newton-Raphson, Fisher scoring, bisection, …
2. The objective function is non-differentiable, local optima: grid search, linear search, golden section search, MLE + search, …
35
High Dimension Optimization (2)
3. Global optima: grid search, simulated annealing (MCMC, MH algorithm), …
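Of the derivative-free methods listed above, golden section search is short enough to sketch in full. The function name `golden_section_min` and the test function are illustrative choices, and the method assumes the objective has a single minimum (is unimodal) on the search interval.

```r
# Golden section search for the minimum of a unimodal function on [lower, upper].
golden_section_min <- function(f, lower, upper, tol = 1e-6) {
  phi <- (sqrt(5) - 1) / 2            # inverse golden ratio, about 0.618
  a <- lower
  b <- upper
  x1 <- b - phi * (b - a)             # left interior point
  x2 <- a + phi * (b - a)             # right interior point
  f1 <- f(x1)
  f2 <- f(x2)
  while (b - a > tol) {
    if (f1 < f2) {
      # Minimum lies in [a, x2]; the old x1 becomes the new right point.
      b <- x2; x2 <- x1; f2 <- f1
      x1 <- b - phi * (b - a); f1 <- f(x1)
    } else {
      # Minimum lies in [x1, b]; the old x2 becomes the new left point.
      a <- x1; x1 <- x2; f1 <- f2
      x2 <- a + phi * (b - a); f2 <- f(x2)
    }
  }
  (a + b) / 2
}

x_min <- golden_section_min(function(x) (x - 2)^2, lower = 0, upper = 5)
x_min
```

Each iteration reuses one of the two interior function values, so only one new evaluation of f is needed per step.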
36
High Dimension Optimization in R (1)
optim(initial value, function, method, lower, upper)
Methods:
"BFGS" is a quasi-Newton method. It uses function values and gradients to build up a picture of the surface to be optimized.
"CG" is a conjugate gradients method based on that by Fletcher and Reeves.
"L-BFGS-B" is that of Byrd et al. (1995), which allows box constraints; that is, each variable can be given a lower and/or upper bound. The initial value must satisfy the constraints.
37
High Dimension Optimization in R (2)
"SANN" belongs to the class of stochastic global optimization methods. It uses only function values but is relatively slow. It will also work for non-differentiable functions.
lower, upper: bounds on the variables for the "L-BFGS-B" method.
See more details in the help document.
38
Demo of optim( ) in R
fr <- function(x) {   ## Rosenbrock Banana function
  x1 <- x[1]
  x2 <- x[2]
  100 * (x2 - x1 * x1)^2 + (1 - x1)^2
}
optim(c(-1.2, 1), fr, method = "BFGS")
Output components:
$par: the best set of parameters found.
$value: the value of fn corresponding to par.
$convergence: 0 indicates successful convergence.
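The demo above lets optim() approximate the gradient numerically. As a hedged variation, an analytical gradient can be supplied through the `gr` argument of optim(); the gradient function name `grr` is an illustrative choice, and its two components are the partial derivatives of the Rosenbrock function worked out by hand.

```r
# Rosenbrock function and its analytical gradient.
fr <- function(x) {
  100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
}
grr <- function(x) {
  c(-400 * x[1] * (x[2] - x[1]^2) - 2 * (1 - x[1]),  # d/dx1
    200 * (x[2] - x[1]^2))                            # d/dx2
}

res <- optim(c(-1.2, 1), fr, gr = grr, method = "BFGS")

res$par          # best parameters found; the true minimizer is (1, 1)
res$value        # function value at res$par
res$convergence  # 0 indicates successful convergence
```

Supplying the gradient usually reduces the number of function evaluations and improves accuracy for quasi-Newton methods such as "BFGS".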
39
optim() vs. optimize() in one dimension
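For a one-dimensional problem, the two functions can be compared side by side. In this minimal sketch the objective `f` is an arbitrary illustrative function; optimize() searches an interval directly, while optim() needs method = "Brent" together with finite `lower` and `upper` bounds to handle a single parameter.

```r
f <- function(x) (x - 1/3)^2   # illustrative objective with minimum at 1/3

# optimize(): one-dimensional search over an interval.
opt1 <- optimize(f, interval = c(0, 1))
opt1$minimum     # location of the minimum
opt1$objective   # function value there

# optim(): general-purpose optimizer; "Brent" is its one-dimensional method.
opt2 <- optim(par = 0.5, fn = f, method = "Brent", lower = 0, upper = 1)
opt2$par
opt2$value
```

Note the different names of the returned components: `minimum` and `objective` for optimize(), `par` and `value` for optim().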
40
Part 6: Exercises of MLE-EMs
41
EM for Example 3 (1)
Example 3 in genetics: The observed data are $(n_O, n_A, n_B, n_{AB}) = (176, 182, 60, 17) \sim \text{Multinomial}(r^2,\ p^2+2pr,\ q^2+2qr,\ 2pq)$, where p, q, and r fall in [0,1] such that p+q+r = 1. Find the likelihood function and score equations for p, q, and r.
Category (Cell) | Cell Probability | Observed Frequency
O | r^2 | n_O = 176
A | p^2 + 2pr | n_A = 182
B | q^2 + 2qr | n_B = 60
AB | 2pq | n_AB = 17
42
EM for Example 3 (2) Incomplete data. Let
43
EM for Example 3 (3) Complete data
Category (Cell) | Cell Probability | Frequency
OO | r^2 | n_O
AA | p^2 | n_AA
AO | 2pr | n_AO
BB | q^2 | n_BB
BO | 2qr | n_BO
AB | 2pq | n_AB
44
EM for Example 3 (4) where
45
EM for Example 3 (5) Similarly, one can obtain the current conditional expectations of the remaining complete-data counts. Execution of the M-step gives
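A minimal R sketch of this EM, using the observed counts (176, 182, 60, 17) from the example, is given below. The function name `em_abo`, the starting values and the stopping rule are assumptions made for illustration. The E-step splits n_A into expected AA and AO counts (and n_B into BB and BO) in proportion to the genotype probabilities, and the M-step is the usual allele-counting estimate from the completed data.

```r
n_obs <- c(O = 176, A = 182, B = 60, AB = 17)   # observed phenotype counts

em_abo <- function(n_obs, p = 1/3, q = 1/3, tol = 1e-10, max_iter = 1000) {
  n <- sum(n_obs)
  for (k in seq_len(max_iter)) {
    r <- 1 - p - q
    # E-step: expected genotype counts given the phenotype counts.
    n_AA <- n_obs[["A"]] * p^2 / (p^2 + 2 * p * r)
    n_AO <- n_obs[["A"]] - n_AA
    n_BB <- n_obs[["B"]] * q^2 / (q^2 + 2 * q * r)
    n_BO <- n_obs[["B"]] - n_BB
    # M-step: allele counting among the 2n alleles of the completed data.
    p_new <- (2 * n_AA + n_AO + n_obs[["AB"]]) / (2 * n)
    q_new <- (2 * n_BB + n_BO + n_obs[["AB"]]) / (2 * n)
    done <- max(abs(c(p_new - p, q_new - q))) < tol
    p <- p_new
    q <- q_new
    if (done) break
  }
  c(p = p, q = q, r = 1 - p - q)
}

fit <- em_abo(n_obs)
fit
```

Because r is always computed as 1 - p - q, the constraint p + q + r = 1 holds at every iteration without a Lagrange multiplier.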
46
EM for Example 3 by R (1)
47
EM for Example 3 by R (2)
48
EM for Example 4 (1) In positron emission tomography (PET): The observed data are n*(d) ~ Poisson(λ*(d)), d = 1, 2, …, D, and $\lambda^*(d) = \sum_{b=1}^{B} p(b,d)\,\lambda(b)$. The values of p(b,d) are known and the unknown parameters are λ(b), b = 1, 2, …, B. Find the likelihood function and score equations for λ(b), b = 1, 2, …, B. http://en.wikipedia.org/wiki/Positron_emission_tomography
49
EM for Example 4 (2) Incomplete model. Complete model. Complete log-likelihood function.
50
EM for Example 4 (3) E-step. M-step.
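The slide formulas for the E-step and M-step are not recoverable from this transcript, so the sketch below uses the standard multiplicative EM update for this Poisson model (often attributed to Shepp and Vardi), which may differ in notation from the deck. All data are simulated: the matrix `p`, the vector `lambda_true`, the sizes B and D and the number of iterations are assumptions for illustration only.

```r
set.seed(1)
B <- 4                                  # number of source pixels (assumed)
D <- 6                                  # number of detectors (assumed)
p <- matrix(runif(B * D), nrow = B)     # p[b, d]: known detection probabilities
p <- p / rowSums(p)                     # here each emission is detected somewhere
lambda_true <- c(50, 100, 20, 80)       # simulated truth, illustration only
n_star <- rpois(D, lambda = as.vector(t(p) %*% lambda_true))

# Observed-data log-likelihood: independent Poisson counts with
# means lambda*(d) = sum_b p(b, d) * lambda(b).
loglik <- function(lambda) {
  mu <- as.vector(t(p) %*% lambda)
  sum(dpois(n_star, mu, log = TRUE))
}

lambda <- rep(mean(n_star), B)          # positive starting values
ll <- loglik(lambda)
for (k in 1:200) {
  mu <- as.vector(t(p) %*% lambda)      # current lambda*(d)
  # Combined E- and M-step: expected emissions from pixel b, summed over
  # detectors, divided by the total detection probability of pixel b.
  lambda <- lambda * as.vector(p %*% (n_star / mu)) / rowSums(p)
  ll <- c(ll, loglik(lambda))
}

lambda
```

The update keeps every λ(b) non-negative automatically, and when the rows of `p` sum to 1 the estimated intensities add up to the total observed count after each iteration.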
51
EM for Example 5 (1) In the normal mixture (http://en.wikipedia.org/wiki/Mixture_model): The observed data x_i, i = 1, 2, …, n, are random samples from the following probability density function:
52
EM for Example 5 (2) and where
53
EM for Example 5 (3)
54
EM for Example 5 (4) Maximize Q subject to the constraint that the mixing proportions sum to 1. Applying the Lagrange multiplier method to Q, the problem becomes one of maximizing the Lagrangian. The estimators of the parameters are
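The slide formulas for the estimators are lost in this transcript, so the sketch below uses the standard EM updates for a univariate normal mixture: posterior membership weights in the E-step, then weighted proportions, means and standard deviations in the M-step. The function name `em_normal_mixture`, the simulated data and the starting values are assumptions; the parameter names `lambda`, `mu` and `sigma` follow the normalmixEM() arguments.

```r
set.seed(42)
# Simulated two-component data (illustration only).
x <- c(rnorm(150, mean = 0, sd = 1), rnorm(100, mean = 4, sd = 1.5))

em_normal_mixture <- function(x, lambda, mu, sigma, tol = 1e-8, max_iter = 1000) {
  # n x k matrix of lambda_j * N(x_i; mu_j, sigma_j^2).
  weighted_dens <- function(lambda, mu, sigma) {
    sapply(seq_along(lambda), function(j) lambda[j] * dnorm(x, mu[j], sigma[j]))
  }
  ll <- sum(log(rowSums(weighted_dens(lambda, mu, sigma))))
  for (k in seq_len(max_iter)) {
    # E-step: posterior probability that x_i came from component j.
    dens <- weighted_dens(lambda, mu, sigma)
    w <- dens / rowSums(dens)
    # M-step: weighted estimates; dividing by n enforces sum(lambda) = 1,
    # which is what the Lagrange multiplier argument yields.
    nj <- colSums(w)
    lambda <- nj / length(x)
    mu <- colSums(w * x) / nj
    sigma <- sqrt(colSums(w * outer(x, mu, "-")^2) / nj)
    ll <- c(ll, sum(log(rowSums(weighted_dens(lambda, mu, sigma)))))
    if (diff(tail(ll, 2)) < tol) break
  }
  list(lambda = lambda, mu = mu, sigma = sigma, loglik = ll)
}

fit <- em_normal_mixture(x, lambda = c(0.5, 0.5), mu = c(-1, 5), sigma = c(1, 1))
fit[c("lambda", "mu", "sigma")]
```

The recorded `fit$loglik` values can be plotted against the iteration number to see the monotone increase of the observed-data log-likelihood.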
55
EM for Example 5 by R (1)
normalmixEM(x, lambda = NULL, mu = NULL, sigma = NULL, k = 2, arbmean = TRUE, arbvar = TRUE, epsilon = 1e-08, maxit = 10000, verb = FALSE)
Arguments:
x: A vector of length n consisting of the data.
lambda: Initial value of mixing proportions.
mu: A k-vector of initial values for the mean parameters.
sigma: A k-vector of initial values for the standard deviation parameters.
56
EM for Example 5 by R (2)
k: Number of components.
arbmean: If TRUE, then the component densities are allowed to have different mus. If FALSE, then a scale mixture will be fit.
arbvar: If TRUE, then the component densities are allowed to have different sigmas. If FALSE, then a location mixture will be fit.
epsilon: The convergence criterion.
maxit: The maximum number of iterations.
verb: If TRUE, then various updates are printed during each iteration of the algorithm.
57
EM for Example 5 by R (3) Example
58
EM for Example 5 by R (4)
59
Exercises
Write your own programs similar to the examples presented in this talk.
Write programs for the examples mentioned on the reference web pages.
Write programs for other examples that you know.