Published by Ezra Perkins. Modified over 9 years ago.
1 Applications of Hidden Markov Models
(Lecture for CS498-CXZ Algorithms in Bioinformatics)
Nov. 12, 2005
ChengXiang Zhai
Department of Computer Science
University of Illinois, Urbana-Champaign
2 Today’s Lecture
HMM Applications
–Profile HMMs (Classification)
–HMMs for Multiple Sequence Alignment (Pattern discovery)
–HMMs for Gene Finding (Segmentation)
Special issues in HMMs
–Local Maxima
–Model construction
–Weighting training sequences
3 HMM Applications
Classification (e.g., Profile HMMs)
–Build an HMM for each class (profile HMMs)
–Classify a sequence using Bayes rule
Multiple sequence alignment
–Build an HMM based on a set of sequences
–Decode each sequence to find a multiple alignment
Segmentation (e.g., gene finding)
–Use different states to model different regions
–Decode a sequence to reveal the region boundaries
4 HMMs for Classification
Classify a sequence X by Bayes rule: C* = argmax_C p(C|X) = argmax_C p(X|C) p(C)
p(X|C) is modeled by a profile HMM built specifically for C
–Assuming example sequences are available for C
–E.g., protein families: assign a family to X
(Profile HMMs will be covered in the next lecture)
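As a rough illustration of the classification idea, the sketch below scores a sequence under one HMM per class with the forward algorithm and picks the class with the highest p(X|C) p(C). The two tiny models, their state names, and the priors are made-up assumptions for illustration only; they are not profile HMMs and do not come from the lecture.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of log values."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_likelihood(seq, hmm):
    """Forward algorithm in log space: returns log p(seq | hmm)."""
    states = list(hmm["start"])
    # alpha[s] = log p(x_1..x_t, state at t is s)
    alpha = {s: math.log(hmm["start"][s]) + math.log(hmm["emit"][s][seq[0]])
             for s in states}
    for x in seq[1:]:
        alpha = {
            s: math.log(hmm["emit"][s][x])
            + log_sum_exp([alpha[r] + math.log(hmm["trans"][r][s]) for r in states])
            for s in states
        }
    return log_sum_exp(list(alpha.values()))

def classify(seq, models, priors):
    """Bayes rule: choose the class C maximizing log p(X|C) + log p(C)."""
    scores = {c: log_likelihood(seq, models[c]) + math.log(priors[c])
              for c in models}
    return max(scores, key=scores.get), scores

# Hypothetical two-state models (all numbers invented for the example).
UNIFORM = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
TRANS = {"a": {"a": 0.9, "b": 0.1}, "b": {"a": 0.1, "b": 0.9}}
MODELS = {
    "gc_rich": {"start": {"a": 0.5, "b": 0.5}, "trans": TRANS,
                "emit": {"a": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}, "b": UNIFORM}},
    "at_rich": {"start": {"a": 0.5, "b": 0.5}, "trans": TRANS,
                "emit": {"a": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}, "b": UNIFORM}},
}
PRIORS = {"gc_rich": 0.5, "at_rich": 0.5}

label, scores = classify("GCGCGGCC", MODELS, PRIORS)
print(label, scores)
```

Working in log space avoids numerical underflow on long sequences; a real classifier would use models trained on example sequences for each class.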
5 HMMs for Motif Finding
Given a set of sequences S={X1, …,Xk}
Design an HMM with two kinds of states
–Background states: for positions outside a motif
–Motif states: for modeling a motif
Train the HMM, e.g., using Baum-Welch (searching for the HMM that maximizes the probability of S; only a local maximum is guaranteed)
The “motif part” of the HMM gives a motif model (e.g., a PWM)
The HMM can be used to scan any sequence (including Xi) to locate the motif
We may also decode each sequence Xi to obtain a set of subsequences matched by the motif (e.g., a multiset of k-mers)
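To make the scanning step concrete, here is a minimal sketch that slides a PWM along a sequence and scores each window by its log-odds against a background distribution. This is a simplification of scanning with the full HMM (it ignores transition probabilities), and the PWM values, the background, and the function names `scan` and `best_hit` are all assumptions made for the example.

```python
import math

# Hypothetical background and a hypothetical 4-column PWM resembling "TATA".
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
PWM = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
]

def scan(seq, pwm, background):
    """Return (start, log-odds score) for every window of motif width."""
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        score = sum(
            math.log(pwm[j][seq[i + j]] / background[seq[i + j]])
            for j in range(width)
        )
        hits.append((i, score))
    return hits

def best_hit(seq, pwm, background):
    """Return the highest-scoring window as (start, score)."""
    return max(scan(seq, pwm, background), key=lambda hit: hit[1])

start, score = best_hit("GGCGTATACCG", PWM, BACKGROUND)
print(start, round(score, 3))  # the exact match "TATA" starts at index 4
```

A positive score means the window is better explained by the motif model than by the background; in practice a threshold on the score decides which windows count as motif occurrences.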
6 HMMs for Multiple Alignment
Given a set of sequences S={X1, …,Xk}
Train an HMM, e.g., using Baum-Welch (searching for the HMM that maximizes the probability of S)
Decode each sequence Xi
Assemble the Viterbi paths to form a multiple alignment
–The symbols emitted by the same state are aligned to each other
To be covered in the next lecture…
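The assembly step can be sketched as follows, under strong simplifying assumptions: each Viterbi path is given as a list of match-state indices (one per symbol, strictly increasing), a skipped state becomes a gap, and there are no insert states. The example paths are invented rather than decoded from a trained model, and `assemble_alignment` is a hypothetical helper name.

```python
def assemble_alignment(seqs, paths, n_columns):
    """Place each symbol in the column of the state that emitted it.

    seqs      -- list of sequences (strings)
    paths     -- list of state paths; paths[i][t] is the match-state index
                 that emitted seqs[i][t] (assumed strictly increasing)
    n_columns -- number of match states in the model
    """
    rows = []
    for seq, path in zip(seqs, paths):
        row = ["-"] * n_columns          # start with gaps everywhere
        for symbol, state in zip(seq, path):
            row[state] = symbol          # same state => same column
        rows.append("".join(row))
    return rows

# Hypothetical decoded paths for three short sequences.
seqs = ["ACGT", "AGT", "ACT"]
paths = [[0, 1, 2, 3], [0, 2, 3], [0, 1, 3]]
for row in assemble_alignment(seqs, paths, n_columns=4):
    print(row)
# ACGT
# A-GT
# AC-T
```

Handling insert states would need extra columns between match columns; that detail is left out here to keep the idea visible.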
7 HMM-based Gene Finding
Design two types of states
–“Within Gene” States
–“Outside Gene” States
Use known genes to estimate the HMM
Decode a new sequence to reveal which parts are genes
Example software:
–GENSCAN (Burge 1997)
–FGENESH (Solovyev 1997)
–HMMgene (Krogh 1997)
–GENIE (Kulp 1996)
–GENMARK (Borodovsky & McIninch 1993)
–VEIL (Henderson, Salzberg, & Fasman 1997)
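A toy version of the segmentation idea is sketched below: a two-state HMM ("outside" and "gene") is decoded with the Viterbi algorithm, and the resulting state path is collapsed into segments. The parameters are hand-set assumptions (GC-rich emissions standing in for "gene"), not estimates from known genes, and real gene finders use far richer state structures.

```python
import math

# Hypothetical hand-set parameters, for illustration only.
STATES = ("outside", "gene")
START = {"outside": 0.9, "gene": 0.1}
TRANS = {"outside": {"outside": 0.9, "gene": 0.1},
         "gene": {"outside": 0.1, "gene": 0.9}}
EMIT = {"outside": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
        "gene": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}}

def viterbi(seq):
    """Return the most probable state path for seq (log-space Viterbi)."""
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for x in seq[1:]:
        prev, score, pointers = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda r: prev[r] + math.log(TRANS[r][s]))
            pointers[s] = best
            score[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][x])
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

def segments(path):
    """Collapse a state path into (state, start, end) runs; end is exclusive."""
    runs, start = [], 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            runs.append((path[start], start, i))
            start = i
    return runs

seq = "ATATATAT" + "GCGCGCGC" + "ATATATAT"
print(segments(viterbi(seq)))
```

The region boundaries fall where the decoded path switches state, which is exactly what "decode a sequence to reveal the region boundaries" refers to.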
8 VEIL: Viterbi Exon-Intron Locator
[Figure: VEIL architecture. Modules: Upstream, Start Codon, Exon, Stop Codon, Downstream, 3’ Splice Site, Intron, 5’ Splice Site, Poly-A Site]
Exon HMM model
–Enter: start codon or intron (3’ Splice Site)
–Exit: 5’ Splice Site or one of three stop codons (taa, tag, tga)
(Slide from N. F. Samatova’s lecture)
9 GenScan Architecture
Based on a Generalized HMM (GHMM)
–Each state may output a string of symbols (according to some probability distribution)
Models both strands at once
–Other models: predict on one strand first, then on the other strand
–Avoids prediction of overlapping genes on the two strands (rare)
Explicit intron/exon length modeling
Special sensors for Cap-site and TATA-box
Advanced splice site sensors
(Fig. 3, Burge and Karlin 1997)
10 Special Issues
Local maxima
Optimal model construction
Weighting training sequences
11 Solutions to the Local Maxima Problem
Repeat with different initializations
Start with the most reasonable initial model
Simulated annealing (slow down the convergence speed)
12 Local Maxima: Illustration
[Figure labels: global maximum, local maximum, good starting point, bad starting point]
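The effect of the starting point, and the "repeat with different initializations" remedy, can be tried on a toy problem. The sketch below does not run Baum-Welch; it swaps in a simple hill climber on an invented one-dimensional objective with a local maximum near x = 1 and a global maximum near x = 5. The objective, the search interval, and the function names are all assumptions for illustration.

```python
import math
import random

def objective(x):
    """Toy stand-in for a likelihood surface: local peak near 1, global near 5."""
    return math.exp(-(x - 1) ** 2) + 2 * math.exp(-(x - 5) ** 2)

def hill_climb(f, x, step=0.5, tol=1e-6):
    """Greedy local search: move to a better neighbor, else halve the step."""
    while step > tol:
        best = max((x, x - step, x + step), key=f)  # ties keep the current x
        if best == x:
            step /= 2
        else:
            x = best
    return x

def random_restarts(f, n_starts, low, high, seed=0):
    """Run the local search from several random starts and keep the best."""
    rng = random.Random(seed)
    results = [hill_climb(f, rng.uniform(low, high)) for _ in range(n_starts)]
    return max(results, key=f)

bad = hill_climb(objective, 0.0)    # bad starting point: stuck near x = 1
good = hill_climb(objective, 4.0)   # good starting point: reaches x = 5
best = random_restarts(objective, n_starts=20, low=-2.0, high=8.0)
print(round(bad, 3), round(good, 3), round(best, 3))
```

With an HMM, each restart would be a full training run from a different initial parameter setting, and the run with the highest training likelihood would be kept.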
13 Optimal Model Construction
Bayesian model selection: choose the HMM that maximizes P(HMM|Data) ∝ P(Data|HMM) P(HMM)
–P(HMM) should prefer simpler models (i.e., more constrained, fewer states, fewer transitions)
–P(HMM) could reflect our prior on the parameters
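One simple way to see the trade-off is to add a log prior that penalizes model size to each candidate's log-likelihood and pick the highest total. The sketch below assumes a fully connected HMM, a prior of the form log P(HMM) = -penalty × (number of free parameters), and placeholder log-likelihood values; none of these choices come from the lecture.

```python
def n_free_parameters(n_states, n_symbols):
    """Free parameters of a fully connected discrete HMM."""
    start = n_states - 1                   # initial distribution
    trans = n_states * (n_states - 1)      # each row sums to 1
    emit = n_states * (n_symbols - 1)      # each emission row sums to 1
    return start + trans + emit

def log_prior(n_states, n_symbols, penalty=1.0):
    """Hypothetical prior that prefers simpler models (assumption)."""
    return -penalty * n_free_parameters(n_states, n_symbols)

def select_model(log_likelihoods, n_symbols, penalty=1.0):
    """Pick the state count maximizing log P(Data|HMM) + log P(HMM)."""
    scores = {k: ll + log_prior(k, n_symbols, penalty)
              for k, ll in log_likelihoods.items()}
    return max(scores, key=scores.get), scores

# Placeholder training log-likelihoods keyed by number of states (made up).
candidates = {2: -120.0, 3: -105.0, 4: -103.0}
best, scores = select_model(candidates, n_symbols=4)
print(best, scores)  # 3 {2: -129.0, 3: -122.0, 4: -130.0}
```

The 4-state model fits slightly better than the 3-state model, but not by enough to pay for its extra parameters under this prior.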
14 Sequence Weighting
Avoid over-counting similar sequences from the same organisms
Typically compute a weight for each sequence based on an evolutionary tree
Many ways to incorporate the weights, e.g.,
–Unequal likelihood
–Unequal weight contribution in parameter estimation
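The "unequal weight contribution in parameter estimation" option can be sketched for a single emission distribution: each sequence adds its weight, rather than a count of 1, to the symbol it shows. The weights below are invented (three near-identical sequences share one unit of weight) and are not derived from an evolutionary tree; `weighted_emissions` is a hypothetical helper name.

```python
def weighted_emissions(symbols, weights, alphabet="ACGT", pseudocount=0.0):
    """Estimate emission probabilities from weighted symbol counts.

    symbols -- the symbol each training sequence shows at this state
    weights -- one weight per training sequence
    """
    counts = {a: pseudocount for a in alphabet}
    for symbol, weight in zip(symbols, weights):
        counts[symbol] += weight           # weight replaces the usual count of 1
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()}

symbols = ["A", "A", "A", "G"]

# Unweighted: the three similar sequences dominate the estimate.
print(weighted_emissions(symbols, [1.0, 1.0, 1.0, 1.0]))

# Weighted: the three similar sequences together count as one.
print(weighted_emissions(symbols, [1 / 3, 1 / 3, 1 / 3, 1.0]))
```

With equal weights the estimate for A is 0.75; with the down-weighted group it drops to about 0.5, so the single divergent sequence has as much influence as the cluster of similar ones.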
15 HMMs in Real Applications
SAM-T98 Tutorial:
–http://www.cse.ucsc.edu/research/compbio/ismb99.tutorial.html
Pfam
–http://www.sanger.ac.uk/Software/Pfam/
16 What You Should Know
How an HMM can be used to classify sequences
How an HMM can be used to align sequences and discover motifs
How an HMM can be used to segment sequences (e.g., gene finding)
The problem of local maxima and possible solutions