Recitation on EM slides taken from: Computational Genomics Recitation #6
All EM questions are in the format: 1. Write the likelihood function. 2. Write the Q function. 3. Derive the update rule.
Estimation problems
What is the unobserved data in this case?
EM question
Let $G = (G_1, \dots, G_n)$ be $n$ contiguous DNA regions representing genes. For each $G_i$ we define the mRNA concentration of the gene as $P_i$, such that $\sum_{i=1}^{n} P_i = 1$. $P = (P_1, \dots, P_n)$ can be interpreted as the normalized expression levels of the regions in $G$.
Our model assumes that reads are generated by randomly picking a region from $G$ according to the distribution $P$ and then copying this region. The copying process is error-prone. This process is repeated until we have a set of $m$ reads $R = (r_1, \dots, r_m)$ generated according to the model described above.
For each region $G_i$ and read $r_j$, we have a probability $p_{ij} = P(r_j \mid G_i)$, the probability of observing $r_j$ given that the locus of the read was gene $G_i$. In practice, for each read $r_j$, this probability will be close to zero for all but a few regions.
Likelihood function
Write the likelihood of observing the $m$ reads.
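One way to write this out, assuming the reads are generated independently of each other and introducing a hidden variable $z_j$ for the gene that read $r_j$ was copied from (the name $z_j$ is chosen here for illustration and does not appear in the slides):

$$
L(P) = P(r_1, \dots, r_m \mid P)
     = \prod_{j=1}^{m} \sum_{i=1}^{n} P(z_j = i)\, P(r_j \mid G_i)
     = \prod_{j=1}^{m} \sum_{i=1}^{n} P_i \, p_{ij}
$$

The inner sum marginalizes over the unknown source gene of each read, which is why the log-likelihood has a sum inside the logarithm and cannot be maximized in closed form directly.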
Q function
Write the $Q(P \mid P^{(t)})$ term.
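A sketch of the derivation, assuming the hidden data are the source genes $z_j$ of the reads (with $z_j = i$ meaning that read $r_j$ was copied from $G_i$). The symbols $z_j$ and $w_{ij}^{(t)}$ are notation introduced here, not taken from the slides. First, the posterior probability that read $r_j$ came from gene $G_i$ under the current estimate $P^{(t)}$ follows from Bayes' rule:

$$
w_{ij}^{(t)} = P(z_j = i \mid r_j, P^{(t)}) = \frac{P_i^{(t)} \, p_{ij}}{\sum_{k=1}^{n} P_k^{(t)} \, p_{kj}}
$$

The Q function is then the expected complete-data log-likelihood under these posteriors:

$$
Q(P \mid P^{(t)}) = \mathbb{E}_{z \mid R, P^{(t)}}\big[\log P(R, z \mid P)\big]
                  = \sum_{j=1}^{m} \sum_{i=1}^{n} w_{ij}^{(t)} \big(\log P_i + \log p_{ij}\big)
$$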
M-step
Write the M-step term using the argmax function.
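A possible form of the answer, writing $w_{ij}^{(t)} = P_i^{(t)} p_{ij} / \sum_{k} P_k^{(t)} p_{kj}$ for the posterior probability that read $r_j$ came from gene $G_i$ (notation assumed here for illustration):

$$
P^{(t+1)} = \arg\max_{P \,:\, \sum_i P_i = 1,\; P_i \ge 0} Q(P \mid P^{(t)})
          = \arg\max_{P \,:\, \sum_i P_i = 1,\; P_i \ge 0} \sum_{i=1}^{n} \Big( \sum_{j=1}^{m} w_{ij}^{(t)} \Big) \log P_i
$$

The $\log p_{ij}$ terms of $Q$ do not depend on $P$, so they drop out of the maximization.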
Update rule
Infer from part (c) the update step for $P$. When we maximize $\sum_i a_i \log(P_i)$ over $P$ subject to $\sum_i P_i = 1$, the maximum is attained at $P_i = a_i / \sum_k a_k$.
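Taking $a_i$ to be the sum over reads of the posterior probability that each read came from gene $G_i$, the $a_i$ sum to $m$ (each read's posteriors sum to 1), so the update becomes $P_i^{(t+1)} = \frac{1}{m} \sum_{j} P_i^{(t)} p_{ij} / \sum_k P_k^{(t)} p_{kj}$. The following is a minimal Python sketch of that iteration, not the recitation's own code. The function names (`em_step`, `log_likelihood`, `run_em`), the uniform initialization, the fixed iteration count, and the toy matrix `p_toy` are all assumptions made for illustration. It also assumes every read has nonzero probability under at least one gene.

```python
import math


def em_step(P, p):
    """One EM iteration.

    P: list of n current expression estimates (sums to 1).
    p: n x m nested list with p[i][j] = P(r_j | G_i).
    """
    n, m = len(P), len(p[0])
    a = [0.0] * n
    for j in range(m):
        # E-step: posterior that read j came from gene i, via Bayes' rule.
        denom = sum(P[k] * p[k][j] for k in range(n))
        for i in range(n):
            a[i] += P[i] * p[i][j] / denom
    # M-step: P_i = a_i / sum_k a_k, and sum_k a_k equals m.
    return [a_i / m for a_i in a]


def log_likelihood(P, p):
    """log L(P) = sum_j log sum_i P_i * p_ij."""
    n, m = len(P), len(p[0])
    return sum(
        math.log(sum(P[i] * p[i][j] for i in range(n))) for j in range(m)
    )


def run_em(p, iters=200):
    """Run EM from a uniform start for a fixed number of iterations."""
    n = len(p)
    P = [1.0 / n] * n
    for _ in range(iters):
        P = em_step(P, p)
    return P


# Hypothetical toy input: 2 genes (rows), 4 reads (columns).
p_toy = [
    [0.9, 0.8, 0.1, 0.5],
    [0.1, 0.2, 0.9, 0.5],
]
P_hat = run_em(p_toy)
```

Each call to `em_step` should not decrease `log_likelihood`, which is a convenient sanity check when experimenting with other inputs.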