CISC 667 Intro to Bioinformatics (Fall 2005) Hidden Markov Models (IV)

Presentation transcript:

CISC 667 Intro to Bioinformatics (Fall 2005)
Hidden Markov Models (IV)
  Profile HMMs
  GENSCAN
  TMMOD
CISC667, F05, Lec13, Liao


GENSCAN (generalized HMMs)
Chris Burge, PhD Thesis ’97, Stanford
http://genes.mit.edu/GENSCAN.html
Four components:
  A vector π of initial probabilities
  A matrix T of state transition probabilities
  A set of length distributions f
  A set of sequence generating models P
Generalized HMMs: each state emits a fragment of sequence rather than a single symbol (or residue).
Decoding uses a modified Viterbi algorithm.
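The four components (π, T, f, P) and the segment-based Viterbi recursion can be sketched on a toy model. This is a minimal illustration, not GENSCAN itself: the two states ("N" for intergenic, "E" for an exon-like region), the uniform length distribution, the per-base emission probabilities, and the names `ghmm_viterbi`, `length_prob`, and `segment_prob` are all assumptions made for the example.

```python
import math

NEG_INF = float("-inf")

def safe_log(p):
    """log(p), with log(0) mapped to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF

def ghmm_viterbi(seq, states, pi, T, length_prob, segment_prob, max_len):
    """Most probable parse of seq into (state, start, end) segments.

    best[end][s] is the best log-probability of any parse of seq[:end]
    whose last segment was emitted by state s.
    """
    n = len(seq)
    best = [{s: NEG_INF for s in states} for _ in range(n + 1)]
    back = [{s: None for s in states} for _ in range(n + 1)]
    for end in range(1, n + 1):
        for s in states:
            # Unlike the standard Viterbi, also maximize over segment length d.
            for d in range(1, min(max_len, end) + 1):
                start = end - d
                local = (safe_log(length_prob(s, d))
                         + safe_log(segment_prob(s, seq[start:end])))
                if start == 0:
                    candidates = [(safe_log(pi[s]) + local, None)]
                else:
                    candidates = [
                        (best[start][r] + safe_log(T[r].get(s, 0.0)) + local, r)
                        for r in states
                    ]
                for score, prev in candidates:
                    if score > best[end][s]:
                        best[end][s] = score
                        back[end][s] = (start, prev)
    # Trace back from the best final state (assumes some parse is reachable).
    state = max(states, key=lambda s: best[n][s])
    parse, end = [], n
    while end > 0:
        start, prev = back[end][state]
        parse.append((state, start, end))
        state, end = prev, start
    return parse[::-1]

# Toy parameters (hypothetical, for illustration only).
BASE_P = {
    "N": {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1},  # AT-rich intergenic
    "E": {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4},  # GC-rich exon-like
}

def segment_prob(state, segment):
    """P: probability that `state` generates the whole fragment."""
    p = 1.0
    for base in segment:
        p *= BASE_P[state][base]
    return p

def length_prob(state, d):
    """f: uniform length distribution over 1..8 for both states."""
    return 1.0 / 8 if 1 <= d <= 8 else 0.0

pi = {"N": 0.8, "E": 0.2}                 # initial probabilities
T = {"N": {"E": 1.0}, "E": {"N": 1.0}}    # no self-transitions: f models length

parse = ghmm_viterbi("ATATATGCGCGCATAT", ["N", "E"], pi, T,
                     length_prob, segment_prob, max_len=8)
print(parse)
```

Because segment lengths come from f, the toy transition matrix has no self-transitions; the extra loop over d is what makes the recursion cost more than the standard Viterbi algorithm.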


Initial state probabilities
  Set to the frequency with which each functional unit occurs in actual genomic data. E.g., since ~80% of the sequence is non-coding intergenic region, the initial probability for state N is 0.80.
Transition probabilities
State length distributions
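As a rough sketch of the frequency estimate, assume the training sequence carries one state label per base (an assumption of this example; the labels "N", "E", "I" and the function name `initial_probabilities` are hypothetical). The initial probability of each state is then its share of the labeled bases:

```python
from collections import Counter

def initial_probabilities(labels):
    """Estimate initial state probabilities as per-base label frequencies."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {state: count / total for state, count in counts.items()}

# Toy annotation: 80 intergenic bases (N), 8 exon bases (E), 12 intron bases (I).
labels = "N" * 80 + "E" * 8 + "I" * 12
pi = initial_probabilities(labels)
print(pi["N"])  # share of bases labeled intergenic
```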

Training data
  2.5 Mb of human genomic sequence
  380 genes, 142 single-exon genes, 1492 exons and 1254 introns
  1619 cDNAs

Open areas for research
  Model building
    Integration of domain knowledge, such as structural information, into profile HMMs
    Meta learning?
  Biological mechanism
    DNA replication
  Hybrid models
    Generalized HMM
    …

TMMOD: An improved hidden Markov model for predicting transmembrane topology

TMHMM by Krogh, A. et al., JMB 305 (2001) 567-580
[Figure: TMHMM model architecture. Regions: non-cytoplasmic side, membrane, cytoplasmic side. States: loop cyt, cap cyt, helix core, cap non-cyt, long loop non-cyt, globular.]
Accuracy of prediction for topology: 78%


Columns: Mod., Reg. (a) (b) (c), Data set, Correct topology, Correct location, Sensitivity, Specificity
S-83
TMMOD 1 65 (78.3%) 51 (61.4%) 64 (77.1%) 67 (80.7%) 52 (62.7%) 97.4% 71.3% 97.1%
TMMOD 2 61 (73.5%) 54 (65.1%) 66 (79.5%) 99.4% 93.8% 99.7%
TMMOD 3 70 (84.3%) 74 (89.2%) 71 (85.5%) 98.2% 95.3% 99.1%
TMHMM 69 (83.1%) 96.2%
PHDtm (85.5%) (88.0%) 98.8% 95.2%
S-160
117 (73.1%) 92 (57.5%) 128 (80.0%) 103 (64.4%) 126 (78.8%) 77.4% 96.1% 97.0% 80.8% 96.7% 120 (75.0%) 97 (60.6%) 118 (73.8%) 132 (82.5%) 121 (75.6%) 135 (84.4%) 98.4% 97.7% 97.2% 95.6% 110 (68.8%) 135 (84.4%) 133 (83.1%) 124 (77.5%) 143 (89.4%) 97.8% 94.5% 98.3% 97.6% 98.1% 123 (76.9%) 134 (83.8%)
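The "count (percent)" entries can be read as the number of correctly predicted proteins over the size of the data set. The sketch below assumes S-83 and S-160 contain 83 and 160 proteins respectively, which is consistent with the percentages shown; the helper name `percent_correct` is hypothetical.

```python
def percent_correct(n_correct, n_total):
    """Percentage of correctly predicted proteins, rounded to one decimal."""
    return round(100.0 * n_correct / n_total, 1)

# Assumed data set sizes: 83 proteins in S-83, 160 in S-160.
print(percent_correct(65, 83))    # entry "65 (78.3%)" on S-83
print(percent_correct(92, 160))   # entry "92 (57.5%)" on S-160
```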