Bioinformatics, Lecture 5: Hidden Markov Model. Dr. Aladdin Hamwieh, Khalid Al-shamaa, Abdulqader Jighly. 2010-2011. Aleppo University, Faculty of Technical Engineering, Department of Biotechnology.

Gene Prediction: Methods
Gene prediction can be based upon:
- Coding statistics (statistical approach)
- Gene structure (statistical approach)
- Comparison (similarity-based approach)

Gene Prediction: Coding Statistics
Coding regions of the sequence have different properties than non-coding regions (coding regions are non-random):
- CG content
- Codon bias (codon usage)
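As a minimal sketch, the two statistics above can be computed in Python (the sequences here are made up for illustration):

```python
# Two coding statistics: GC content and codon usage counts.
from collections import Counter

def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def codon_usage(seq):
    """Count in-frame codons (triplets) starting at position 0."""
    seq = seq.upper()
    return Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))

print(gc_content("ATGCGCGCTA"))      # 0.6
print(codon_usage("ATGGCGGCGTAA"))   # ATG once, GCG twice, TAA once
```

In a real gene finder these statistics would be compared between candidate coding and non-coding windows.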

Markov Model

A Markov model is a process which moves from state to state, depending only on the previous n states. For example, consider the probability of getting this sequence of weather states in one week in March: Sunny, Sunny, Cloudy, Rainy, Rainy, Sunny, Cloudy. If today is Cloudy, it is more likely to be Rainy tomorrow; in March, a week is more likely to start with a Sunny day than with the other states; and so on.

Markov Model
[Table: transition probabilities from today's weather (Sunny, Cloudy, Rainy) to tomorrow's weather (Sunny, Cloudy, Rainy).]

Example:
P(Sunny, Sunny, Cloudy, Rainy | Model) = Π(Sunny) × P(Sunny | Sunny) × P(Cloudy | Sunny) × P(Rainy | Cloudy) = 0.6 × 0.5 × 0.25 × …
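The chain-rule computation above can be sketched in Python. Π(Sunny) = 0.6 and the two transition values 0.5 and 0.25 come from the slide; the value 0.4 for P(Rainy | Cloudy) is a hypothetical stand-in, since the slide's transition table did not survive transcription:

```python
# Probability of a state sequence under a first-order Markov chain:
# pi(first state) times the product of one-step transition probabilities.
pi = {"Sunny": 0.6}
P = {("Sunny", "Sunny"): 0.5,
     ("Sunny", "Cloudy"): 0.25,
     ("Cloudy", "Rainy"): 0.4}   # hypothetical value, not from the slide

def chain_prob(seq):
    p = pi[seq[0]]
    for prev, nxt in zip(seq, seq[1:]):
        p *= P[(prev, nxt)]
    return p

print(chain_prob(["Sunny", "Sunny", "Cloudy", "Rainy"]))  # 0.6*0.5*0.25*0.4 = 0.03
```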

Hidden Markov Models
- States are not observable
- Observations are probabilistic functions of state
- State transitions are still probabilistic

CG Islands and the "Fair Bet Casino"
The CG islands problem can be modeled after a problem named "The Fair Bet Casino". The game is to flip coins, with only two possible outcomes: Head or Tail. The Fair coin gives Heads and Tails with the same probability ½. The Biased coin gives Heads with probability ¾.

The "Fair Bet Casino" (cont'd)
Thus, we define the probabilities: P(H|F) = P(T|F) = ½; P(H|B) = ¾, P(T|B) = ¼. The crooked dealer changes between the Fair and Biased coins with probability 10%.
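A sketch of the dealer's process, assuming play starts with the Fair coin (the slides do not say which coin starts):

```python
import random

def deal(n_flips, seed=1):
    """Simulate the crooked dealer: emit H/T, switching between the
    Fair (P(H)=1/2) and Biased (P(H)=3/4) coins with probability 0.1."""
    rng = random.Random(seed)
    state, flips, coin_seq = "F", [], []
    for _ in range(n_flips):
        coin_seq.append(state)
        p_heads = 0.5 if state == "F" else 0.75
        flips.append("H" if rng.random() < p_heads else "T")
        if rng.random() < 0.1:            # dealer switches coins
            state = "B" if state == "F" else "F"
    return "".join(flips), "".join(coin_seq)

flips, coins = deal(20)
print(flips)   # the observable sequence
print(coins)   # the hidden state sequence
```

Only the flips are visible to the player; which coin produced each flip is the hidden state.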

HMM for the Fair Bet Casino (cont'd)
[Figure: HMM model for the Fair Bet Casino problem.]

HMM Parameters
Σ: set of emission characters, e.g.:
Σ = {H, T} for coin tossing
Σ = {1, 2, 3, 4, 5, 6} for dice tossing
Σ = {A, C, G, T} for DNA sequences
Q: set of hidden states, each emitting symbols from Σ:
Q = {F, B} for coin tossing
Q = {Non-coding, Coding, Regulatory} for DNA sequences

HMM Parameters (cont'd)
A = (a_kl): a |Q| × |Q| matrix of the probabilities of changing from state k to state l:
a_FF = 0.9, a_FB = 0.1, a_BF = 0.1, a_BB = 0.9
E = (e_k(b)): a |Q| × |Σ| matrix of the probabilities of emitting symbol b while in state k (writing Heads as 1 and Tails as 0):
e_F(0) = ½, e_F(1) = ½, e_B(0) = ¼, e_B(1) = ¾
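Written out as nested dictionaries (keeping the H/T symbols rather than 1/0), both matrices are row-stochastic:

```python
# Transition matrix A and emission matrix E from the slide.
A = {"F": {"F": 0.9, "B": 0.1},
     "B": {"F": 0.1, "B": 0.9}}
E = {"F": {"H": 0.5, "T": 0.5},
     "B": {"H": 0.75, "T": 0.25}}

# Every row of a stochastic matrix must sum to 1.
for row in list(A.values()) + list(E.values()):
    assert abs(sum(row.values()) - 1.0) < 1e-12
print("all rows sum to 1")
```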

HMM
[Figure: a three-state HMM (Q1, Q2, Q3); at each turn (i, i+1, …) the current state emits one of Yellow, Red, Green, Blue.]

The Three Basic Problems of HMMs
Problem 1: Given an observation sequence O = O_1 O_2 … O_T and model M = (Π, A, E), compute P(O | M).
Problem 2: Given an observation sequence O = O_1 O_2 … O_T and model M = (Π, A, E), how do we choose a corresponding state sequence Q = q_1 q_2 … q_T which best "explains" the observations?
Problem 3: How do we adjust the model parameters Π, A, E to maximize P(O | {Π, A, E})?

The Three Basic Problems of HMMs
Problem 1: Given an observation sequence O = O_1 O_2 … O_T and model M = (Π, A, E), compute P(O | M). For example: P(a colored-ball sequence | M).

Problem 1: Probability of an Observation Sequence
What is P(O | M)? The probability of an observation sequence is the sum of the probabilities of all possible state sequences in the HMM. Naive computation is very expensive: given T observations and N states, there are N^T possible state sequences. Even a small HMM, e.g. T = 10 and N = 10, gives 10 billion different paths. The solution to this (and to Problem 2) is dynamic programming.

Problem 1: Given an observation sequence O = O_1 O_2 … O_T and model M = (Π, A, E), compute P(O | M). Solution: the Forward algorithm.
Example: P(a colored-ball sequence | M).
[Figure: forward trellis over states Q1, Q2, Q3; at each step the incoming path probabilities (products such as 0.1 × 0.25, 0.4 × 0.1, 0.2 × 0.65) are summed into each state.]
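The forward recursion can be sketched on the Fair Bet Casino HMM from the earlier slides (the colored-ball numbers above are partly illegible, so the coin parameters are used instead; the uniform initial distribution Π is an assumption, since the slides do not give it):

```python
# Forward algorithm: P(obs | M) as a dynamic program, checked against
# the naive sum over all |Q|^T state paths.
from itertools import product

states = ["F", "B"]
pi = {"F": 0.5, "B": 0.5}                    # assumed initial distribution
A = {("F", "F"): 0.9, ("F", "B"): 0.1,
     ("B", "F"): 0.1, ("B", "B"): 0.9}
E = {("F", "H"): 0.5, ("F", "T"): 0.5,
     ("B", "H"): 0.75, ("B", "T"): 0.25}

def forward(obs):
    """P(obs | M) by dynamic programming, O(T * |Q|^2)."""
    f = {k: pi[k] * E[(k, obs[0])] for k in states}
    for o in obs[1:]:
        f = {l: E[(l, o)] * sum(f[k] * A[(k, l)] for k in states)
             for l in states}
    return sum(f.values())

def brute_force(obs):
    """P(obs | M) by summing over every state path (for checking only)."""
    total = 0.0
    for path in product(states, repeat=len(obs)):
        p = pi[path[0]] * E[(path[0], obs[0])]
        for t in range(1, len(obs)):
            p *= A[(path[t - 1], path[t])] * E[(path[t], obs[t])]
        total += p
    return total

print(forward("H"))                                        # 0.5*0.5 + 0.5*0.75 = 0.625
print(abs(forward("HHTH") - brute_force("HHTH")) < 1e-9)   # True
```

The brute-force check makes the N^T blow-up from the previous slide concrete: it enumerates every path, while the forward recursion reuses partial sums.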

The Three Basic Problems of HMMs
Problem 2: Given an observation sequence O = O_1 O_2 … O_T and model M = (Π, A, E), how do we choose a corresponding state sequence Q = q_1 q_2 … q_T which best "explains" the observations? For example: what is the most probable state sequence q_1 q_2 q_3 q_4 given the observed sequence?

Problem 2: Decoding
The solution to Problem 1 efficiently gives us the sum over all paths through an HMM. For Problem 2, we want to find the single path with the highest probability (the Viterbi algorithm).

Example: P(a colored-ball sequence | M).
[Figure: the same trellis as in the forward example, but at each step the largest incoming path probability is kept instead of the sum.]

Hidden Markov Models and Gene Prediction

How is it connected to gene prediction?
[Figure: the colored-ball HMM again: hidden states Q1, Q2, Q3 emitting Yellow, Red, Green, Blue at turns i and i+1.]

How is it connected to gene prediction?
[Figure: the same picture with biological labels: hidden states Exon, Intron, and UTR emitting the DNA letters A, C, G, T.]

Hidden Markov Models (HMM) for Gene Prediction
A basic probabilistic model of gene structure.
Signals: B: begin sequence; S: start translation; A: acceptor site (AG); D: donor site (GT); T: stop translation; F: end sequence.
Hidden states: 5': 5' UTR; EI: initial exon; E: exon; I: intron; FE: final exon; SE: single exon; 3': 3' UTR.
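The state diagram can be written down as an adjacency list. The exact topology is not spelled out in the transcript, so the transitions below are one plausible reading (an assumption): a gene is either a single exon, or an initial exon followed by alternating introns and internal exons and a final exon, with the signals (S, D, A, T) sitting on the transitions between states.

```python
# An assumed topology for the gene-structure HMM sketched above.
transitions = {
    "B":  ["5'"],
    "5'": ["SE", "EI"],     # via start codon (S)
    "SE": ["3'"],           # via stop codon (T)
    "EI": ["I"],            # via donor site (D)
    "I":  ["E", "FE"],      # via acceptor site (A)
    "E":  ["I"],            # via donor site (D)
    "FE": ["3'"],           # via stop codon (T)
    "3'": ["F"],
    "F":  [],
}

def valid_path(path):
    """True if every consecutive pair of states is an allowed transition."""
    return all(b in transitions[a] for a, b in zip(path, path[1:]))

print(valid_path(["B", "5'", "EI", "I", "E", "I", "FE", "3'", "F"]))  # True
```

A full gene finder would attach transition and emission probabilities to this skeleton and run Viterbi over the DNA sequence to label exons and introns.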

Eukaryotic Genes Features: Hand-Over
[Figure: gene layout marking the signal sequences ATG (start codon), GT (donor site), AG (acceptor site), and TAG (stop codon).]

Thank You