Published by Louisa Brooks. Modified over 9 years ago.
1
Hidden Markov Modeling, Multiple Alignments and Structure
Bioinformatic Modeling Techniques
Student: Patricia Pearl
2
The basic notion of a hidden Markov model was covered during the class lectures and in our midterm. Tonight we'll discuss further issues concerning its history, development, and future.
3
At some point, scientists began to consider using hidden Markov models for multiple protein alignments. When was that? Which professional field was already using them?
4
This is the bibliographic reference for the article that protein scientists used when they got started: Rabiner, L. R. “A tutorial on hidden Markov models and selected applications in speech recognition.” Proceedings of the IEEE, 77 (2), 257-286. 1989. From this detailed tutorial, a group of scientists at the University of California at Santa Cruz was able to draw an analogy between computer speech recognition and multiple protein alignments.
5
How did they make the analogy between speech recognition and multiple protein and DNA alignments?

                        Speech Recognition             Multiple Alignments
Alphabet                phonemes                       amino acids
Observation             words or strings of phonemes   primary sequence
Good: assigns high      sounds that are real words     sequences in the set
probability to
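The "Good" row of the analogy, a model that assigns high probability to real members of the set, can be sketched with the forward algorithm described in Rabiner's tutorial. This is only a minimal illustration: the two-state model, its state names, and every number in it are invented for this example and are not taken from Krogh et al., SAM, or HMMER.

```python
def forward_probability(obs, states, start_p, trans_p, emit_p):
    """Return P(obs | model), summed over all hidden state paths."""
    # Initialise with the first observed symbol.
    alpha = {s: start_p[s] * emit_p[s].get(obs[0], 0.0) for s in states}
    # Recurse over the remaining symbols.
    for symbol in obs[1:]:
        alpha = {
            s: emit_p[s].get(symbol, 0.0)
            * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical toy model over a two-letter alphabet (illustrative values only).
STATES = ("conserved", "variable")
START = {"conserved": 0.5, "variable": 0.5}
TRANS = {
    "conserved": {"conserved": 0.9, "variable": 0.1},
    "variable": {"conserved": 0.1, "variable": 0.9},
}
EMIT = {
    "conserved": {"A": 0.9, "G": 0.1},
    "variable": {"A": 0.5, "G": 0.5},
}

for seq in ("AAAA", "GGGG"):
    print(seq, forward_probability(seq, STATES, START, TRANS, EMIT))
```

Because the "conserved" state strongly prefers A, this toy model scores AAAA higher than GGGG, in the same way a family model should score family members higher than unrelated sequences.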
6
The paper they published is: Krogh, A., Brown, M., Mian, I.S., Sjölander, K., and Haussler, D. “Hidden Markov Models in Computational Biology: Applications to Protein Modeling.” Journal of Molecular Biology, 1994, 235:1501-1531. In a 1996 article, Sean Eddy describes the paper referenced above as: “The paper that introduced the use of HMM methods for protein and DNA sequence profiles.”
7
The software was then developed separately by two groups of scientists and graduate students: one at the University of California at Santa Cruz, and one at Washington University in St. Louis, Missouri, led by Sean Eddy. Many other researchers in the subject work outside these two labs. Two suites of software have been developed, and their differences are non-trivial:
SAM (Sequence Alignment and Modeling System) at UCSC.
HMMER at Washington University.
Both suites can be downloaded. SAM needs UNIX. HMMER runs on many systems.
8
As has been emphasized in lecture, the advantage of the HMM approach is that it does not guess about gap penalties, amino acid scores, or states. It bases those values on actual data, as Bayesian probabilities estimated from observed sequences. SAM (Sequence Alignment and Modeling System) at UCSC: the software is based on HMMs. It also uses a mathematical approach called Dirichlet mixtures to improve detection of weak homologies and to derive hidden Markov models for protein families.
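As a rough illustration of basing emission probabilities on data, the sketch below estimates them for one alignment column by combining observed counts with Dirichlet pseudocounts. It assumes a single Dirichlet component with made-up pseudocount values and a hypothetical function name; the Dirichlet mixtures used by SAM combine several such components, which this sketch does not attempt.

```python
from collections import Counter


def posterior_mean_emissions(column, alphabet, alpha):
    """Posterior mean under one Dirichlet prior: (count + alpha) / (N + sum(alpha))."""
    # Count only symbols in the alphabet, so gap characters are ignored.
    counts = Counter(ch for ch in column if ch in alphabet)
    total = sum(counts.values()) + sum(alpha[a] for a in alphabet)
    return {a: (counts[a] + alpha[a]) / total for a in alphabet}


# One alignment column with four L, one I and one gap (illustrative data).
column = "LL-LLI"
alphabet = "LIV"
pseudocounts = {"L": 1.0, "I": 1.0, "V": 1.0}  # assumed prior, not SAM's values

print(posterior_mean_emissions(column, alphabet, pseudocounts))
```

The unseen residue V still receives a small nonzero probability, which is what lets a model trained on few sequences recognize more distant family members.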
9
HMMER at Washington University
Sean Eddy’s Lab Home Page: http://www.genetics.wustl.edu/eddy/publications/
This page and related pages have many articles that are available to download.
URL for User’s Guide: http://www.psc.edu/general/software/packages/hmmer/manual/main.html
If we had HMMER installed at Brandeis, we could all use it with the help of this manual.
10
HMMER
One of the approaches that Sean Eddy has taken to improve HMMER is simulated annealing, a technique borrowed from computational physical chemistry and X-ray protein crystallography. The probability values of the fundamental recursive HMM algorithm are varied by an exponential factor taken from the Boltzmann formula for physical entropy: S = k_B ln Ω. The Boltzmann constant, k_B, is multiplied by a temperature, T. The run starts at a high T, which is then decreased. The product kT is used as an exponent: P^(1/kT). Eddy reports that it improves accuracy (Eddy, S., 1995).
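The effect of the exponent can be seen on a small probability distribution. The sketch below raises each probability to 1/kT and renormalizes; the function name, the example distribution, and the cooling schedule are assumptions made for illustration and are not HMMER's implementation.

```python
def temper(probs, kt):
    """Raise each probability to the power 1/kt, then renormalize to sum to 1."""
    powered = [p ** (1.0 / kt) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


# Illustrative distribution over three candidate choices.
probs = [0.7, 0.2, 0.1]

# Assumed cooling schedule: start hot, end cold.
for kt in (100.0, 10.0, 1.0, 0.1):
    print(kt, [round(p, 4) for p in temper(probs, kt)])
```

At high kT the distribution is nearly uniform, so sampling from it explores many alternatives; at kT = 1 it is unchanged; as kT falls below 1 it concentrates on the most probable choice.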
11
Many people are extending the HMM approach to RNA sequences. It is worth briefly describing a recent paper that relies largely on hand-made RNA alignments built from both primary sequence and RNA secondary structure. It produces evidence toward resolving a problem in systematic (evolutionary) biology. With HMMER, or any similar software for RNA alignments, much of this work may become easier in the future and yield measurable probabilistic statistics.
12
“However, accurate alignment is only possible for proteins of known structure – at least for an identifiable core of residues that comprises the secondary structure elements and active site of the molecule.” S. Eddy (1995), quoting Chothia and Lesk (1986)
13
[Diagram: two alternative groupings of Crocodile, Bird, and Mammal from a common ancestor. Labels: “Anatomical evidence and more” OR “rRNA multiple alignments without secondary structure”.]
14
             10        20        30        40
     ----|----|----|----|----|----|----|----|
Seq1 A-CC-----GC--------GA--CUUG--GA-CC-CG--G
Seq2 A-CC-----GU--------GA--CUUG--GA-CC-CG--G
Seq3 AACCCCGGUGUAGGGGGAAGAACCUUGAUGAACCUCGAUG
Seq4 AACCCCGGUGCAGGGGGAAGAACCUUCAUGAACCUCGAUG
Figure 1. The problem of aligning short and long sequences. Sequences 1 and 2 are like the reptilian and bird 18S ribosomal RNA. Sequences 3 and 4 are like mammals.
Reference: Xia, X., Xie, Z., Kjer, K.M. “18S ribosomal RNA and tetrapod phylogeny.” Systematic Biology, Jun 2003; 52(3): 283.
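The figure's point can be put in numbers with two simple measures: the fraction of gap characters in each aligned row, and the identity between two rows over columns where neither has a gap. The helper names below are hypothetical, and this is only a sketch; it is not the scoring used by Xia et al.

```python
def gap_fraction(row):
    """Fraction of alignment columns in which this row has a gap."""
    return row.count("-") / len(row)


def identity(row_a, row_b):
    """Fraction of matching residues over columns where neither row is gapped."""
    pairs = [(a, b) for a, b in zip(row_a, row_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


# The four aligned rows from Figure 1.
ALIGNMENT = {
    "Seq1": "A-CC-----GC--------GA--CUUG--GA-CC-CG--G",
    "Seq2": "A-CC-----GU--------GA--CUUG--GA-CC-CG--G",
    "Seq3": "AACCCCGGUGUAGGGGGAAGAACCUUGAUGAACCUCGAUG",
    "Seq4": "AACCCCGGUGCAGGGGGAAGAACCUUCAUGAACCUCGAUG",
}

for name, row in ALIGNMENT.items():
    print(name, "gap fraction:", gap_fraction(row))
print("Seq1 vs Seq3 identity:", identity(ALIGNMENT["Seq1"], ALIGNMENT["Seq3"]))
```

More than half of the columns in the short sequences are gaps, so where those gaps are placed largely determines which residues get compared with the long sequences.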
15
Phylogenetic tree. From: Xia et al., 2003
16
They produced several phylogenetic trees, using different methods, from careful manual alignments that took secondary structure into account. In all of them, the birds are closer to the crocodiles than to the mammals.
“Our research indicates that the previous discrepancy of phylogenetic results between the 18S rRNA gene and other genes is caused mainly by: 1) misalignment of sequences, 2) the inappropriate use of the frequency parameters, 3) poor sequence quality. When the sequences are aligned with the aid of the secondary structure of the 18S rRNA molecule and when the frequency parameters are estimated either from all sites or from the variable domains where substitutions have occurred, the 18S rRNA sequences no longer support the grouping of the avian species with the mammalian species.” Xia, X., et al., 2003
17
If there were more time, this presentation would also include discussions of PSI-BLAST and of SuperFam. PSI-BLAST is a BLAST program at NCBI that iteratively builds position-specific profiles, an approach closely related to profile HMMs, from multiple alignments of its hits.
A tutorial: http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/psi1.html
The site: http://www.ncbi.nlm.nih.gov/BLAST/
18
SuperFam (SUPERFAMILY) is a relatively new website. It applies the HMM approach to 59 genomes and to all the publicly available solved structures from those genomes. The head scientist of SuperFam, Prof. Cyrus Chothia, also supervised a web site called SCOP, or Structural Classification of Proteins. You might find it interesting that all of the protein structures that are “solved” are actually organized and classified.
19
Bibliography
Eddy, S.R. “Multiple alignment using hidden Markov models.” Proc. Int. Conf. Intell. Syst. Mol. Biol., 1995; 3: 114-120.
Eddy, S.R. “Hidden Markov models.” Curr. Opin. Struct. Biol., 1996 Jun; 6(3): 361-365. Review.
Eddy, S.R. “Profile hidden Markov models.” Bioinformatics, 1998; 14(9): 755-763. Review.
Gough, J., and Chothia, C. “SUPERFAMILY: HMMs representing all proteins of known structure. SCOP sequence searches, alignments and genome assignments.” Nucleic Acids Research, 2002; 30(1).
Krogh, A., Brown, M., Mian, I.S., Sjölander, K., and Haussler, D. “Hidden Markov models in computational biology: Applications to protein modeling.” Journal of Molecular Biology, February 1994; 235: 1501-1531.
20
Rabiner, L. R. “A tutorial on hidden Markov models and selected applications in speech recognition.” Proceedings of the IEEE, 1989; 77(2): 257-286.
Xia, X., Xie, Z., Kjer, K.M. “18S ribosomal RNA and tetrapod phylogeny.” Systematic Biology, Jun 2003; 52(3): 283.