Lyle Ungar, University of Pennsylvania Hidden Markov Models
Markov Model
  Sequence of states, e.g., exon, intron, …
  Sequence of observations, e.g., AATCGGCGT, called "emissions"
  Probability of transition: the Markov matrix M_ij = p(S_j | S_i)
  Probability of emission: P(O_k | S_j)
Markov Matrix properties
  With M_ij = p(S_j | S_i), each row of M sums to one: you must transition somewhere
  Multiplying by M gives the probabilities of the state of the next item in the sequence
  P(S_j) = sum_i M_ij P(S_i)
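The update rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming the index convention M_ij = p(S_j | S_i), under which each row of M sums to one. The two states and all probabilities are made-up values for illustration, not numbers from the slides, and the helper names `check_row_stochastic` and `step` are hypothetical.

```python
# Hypothetical two-state chain; the numbers are illustrative only.
STATES = ["exon", "intron"]

# M[i][j] = p(S_j | S_i): probability of moving from state i to state j.
M = [
    [0.9, 0.1],  # transitions out of "exon"
    [0.2, 0.8],  # transitions out of "intron"
]


def check_row_stochastic(matrix, tol=1e-9):
    """Return True if every row sums to one (a transition must go somewhere)."""
    return all(abs(sum(row) - 1.0) <= tol for row in matrix)


def step(dist, matrix):
    """Advance the state distribution one position:
    new[j] = sum_i matrix[i][j] * dist[i]."""
    n = len(matrix)
    return [sum(matrix[i][j] * dist[i] for i in range(n)) for j in range(n)]


current = [1.0, 0.0]          # start in "exon" with certainty
next_dist = step(current, M)  # distribution over states at the next position
```

Starting from a certain state, one step simply reads off that state's row of M; repeated calls to `step` propagate the distribution further along the sequence.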
Prokaryotic HMM
Eukaryotic HMM
Hidden Markov Model
  Can't observe the states
  Need to estimate the HMM using an EM algorithm: "Baum-Welch" or "forward-backward"
  Given an HMM, for a new sequence, find the most likely states
  Done using dynamic programming: the "Viterbi algorithm"
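The decoding step can be sketched as follows. This is a minimal log-space version of the Viterbi recursion, not a gene finder: the two states, the start, transition, and emission probabilities, and the test sequence are all hypothetical values chosen for illustration.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely state path for `obs` (dynamic programming in log space)."""

    def log(x):
        return math.log(x) if x > 0 else float("-inf")

    # best[t][s]: log-probability of the best path that ends in state s at position t
    best = [{s: log(start_p[s]) + log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path

    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, best[t - 1][r] + log(trans_p[r][s])) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log(emit_p[s][obs[t]])
            back[t][s] = prev

    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Hypothetical toy model: a GC-rich "exon" state and an AT-rich "intron" state.
states = ["exon", "intron"]
start_p = {"exon": 0.5, "intron": 0.5}
trans_p = {
    "exon": {"exon": 0.9, "intron": 0.1},
    "intron": {"exon": 0.1, "intron": 0.9},
}
emit_p = {
    "exon": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "intron": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}

path = viterbi("GGCCAATT", states, start_p, trans_p, emit_p)
```

Working in log space avoids numerical underflow on long sequences, which matters for genome-scale inputs. Estimating the probabilities themselves (the Baum-Welch step) is not shown here.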
More Realistic HMMs
  Frame shifts need more states
  Generalized HMMs (GHMMs)
  Distribution of exon lengths is not geometric
  Example gene finders: Genscan
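To see why the geometric assumption matters: in a plain HMM, a state with self-transition probability p is occupied for exactly d consecutive positions with probability p^(d-1) (1 - p), so the most likely length is always 1. The sketch below computes that distribution; the function names and the value 0.99 are illustrative assumptions, not taken from the slides.

```python
def duration_pmf(p_stay, d):
    """Probability of staying exactly d consecutive positions in a state
    whose self-transition probability is p_stay (geometric distribution)."""
    if d < 1:
        raise ValueError("duration must be at least 1")
    return p_stay ** (d - 1) * (1.0 - p_stay)


def mean_duration(p_stay):
    """Expected number of consecutive positions spent in the state."""
    return 1.0 / (1.0 - p_stay)


# Hypothetical exon state with self-transition probability 0.99.
p = 0.99
shortest = duration_pmf(p, 1)    # probability of a length-1 exon
typical = duration_pmf(p, 100)   # probability of a length-100 exon
average = mean_duration(p)       # mean length implied by p
```

Whatever value of p is chosen, `duration_pmf` decreases monotonically with d. A generalized HMM replaces this implicit distribution with an explicit length distribution per state.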
How well do they work?
  Define criteria for working well: base level, exon level, or entire gene?
  Sn: Sensitivity = fraction of correct exons over actual exons
  Sp: Specificity = fraction of correct exons over predicted exons
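The two exon-level measures can be computed as in the sketch below. It assumes that a predicted exon counts as "correct" only when both of its boundaries match an actual exon exactly; that matching rule, the function name, and the coordinates are assumptions made for illustration.

```python
def exon_level_accuracy(actual, predicted):
    """Return (Sn, Sp) at the exon level.

    actual, predicted: iterables of (start, end) exon coordinates.
    Assumption: an exon is correct only if both boundaries match exactly.
    """
    actual_set, predicted_set = set(actual), set(predicted)
    correct = len(actual_set & predicted_set)
    sn = correct / len(actual_set) if actual_set else 0.0
    sp = correct / len(predicted_set) if predicted_set else 0.0
    return sn, sp


# Hypothetical annotation and prediction for one gene.
actual_exons = [(100, 200), (300, 400), (500, 650)]
predicted_exons = [(100, 200), (300, 410), (500, 650), (700, 800)]

sn, sp = exon_level_accuracy(actual_exons, predicted_exons)
```

Here two of the three actual exons are recovered exactly, and two of the four predictions are correct, so Sn is 2/3 and Sp is 1/2. Base-level or gene-level scoring would use the same ratios with a different unit of comparison.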
HMM accuracies
  sis/GeneIdentification/Evaluation.html
Combined methods
  HMM plus sequence similarity
  Twinscan
Align using an HMM
  ACCGGA__TTTG
  __CGGACGTAT_
  DDMMMMIIMMMD
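The third row of the example labels each alignment column with a state. The sketch below reproduces that labeling from the two aligned sequences, assuming the convention the example suggests: M when both rows have a base, D when only the first row has a base, and I when only the second row has one. The function name and the `_` gap symbol handling are illustrative choices.

```python
def alignment_states(top, bottom, gap="_"):
    """Label each column of a pairwise alignment as M (match), D (delete), or I (insert).

    Assumed convention: D = base in `top` against a gap in `bottom`,
    I = gap in `top` against a base in `bottom`.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    states = []
    for a, b in zip(top, bottom):
        if a == gap and b == gap:
            raise ValueError("a column cannot be a gap in both rows")
        if b == gap:
            states.append("D")
        elif a == gap:
            states.append("I")
        else:
            states.append("M")
    return "".join(states)


labels = alignment_states("ACCGGA__TTTG", "__CGGACGTAT_")
```

Note that M marks an aligned pair of bases, not necessarily identical ones: the column pairing T with A is still labeled M. An alignment HMM runs in the other direction, choosing the state path (for example with Viterbi) and reading the alignment off it.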
Combined HMM