Bioinformatics
Dr. Aladdin Hamwieh, Khalid Al-shamaa, Abdulqader Jighly
2010-2011
Lecture 5: Hidden Markov Model
Aleppo University, Faculty of Technical Engineering, Department of Biotechnology
Gene Prediction: Methods
Gene prediction can be based upon:
- Coding statistics (a statistical approach)
- Gene structure
- Comparison (a similarity-based approach)
Gene Prediction: Coding Statistics
Coding regions of a sequence have different properties than non-coding regions; their composition is non-random:
- GC content
- Codon bias (codon usage)
Markov Model
A Markov model is a process that moves from state to state, where the next state depends only on the previous n states. For example, consider the probability of observing this sequence of weather states over one week in March: Sunny, Sunny, Cloudy, Rainy, Rainy, Sunny, Cloudy.
- If today is Cloudy, tomorrow is more likely to be Rainy.
- In March, the week is more likely to start with a Sunny day than with any other state.
- And so on.
Markov Model
Transition probabilities (weather today → weather tomorrow):

Today \ Tomorrow   Sunny   Cloudy   Rainy
Sunny              0.5     0.25     0.25
Cloudy             0.25    0.375    0.375
Rainy              0.125   0.625    0.25
Example:
P(Sunny, Sunny, Cloudy, Rainy | Model)
= Π(Sunny) × P(Sunny | Sunny) × P(Cloudy | Sunny) × P(Rainy | Cloudy)
= 0.6 × 0.5 × 0.25 × 0.375
= 0.0281
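The chain-rule calculation above is easy to automate. A minimal sketch follows; the initial probability Π(Sunny) = 0.6 comes from the worked example, while Π(Cloudy), Π(Rainy), and the Cloudy/Rainy transition rows are illustrative assumptions chosen so each distribution sums to 1.

```python
# Initial distribution: pi(Sunny) = 0.6 is from the slide's example;
# the other two values are assumptions.
pi = {"Sunny": 0.6, "Cloudy": 0.3, "Rainy": 0.1}

# Transition matrix: the Sunny row and P(Rainy | Cloudy) = 0.375 are from
# the slides; the remaining entries are assumed for illustration.
A = {
    "Sunny":  {"Sunny": 0.5,   "Cloudy": 0.25,  "Rainy": 0.25},
    "Cloudy": {"Sunny": 0.25,  "Cloudy": 0.375, "Rainy": 0.375},
    "Rainy":  {"Sunny": 0.125, "Cloudy": 0.625, "Rainy": 0.25},
}

def sequence_probability(states):
    """P(s1..sn) = pi(s1) * product of A[s_{i-1}][s_i] (first-order chain)."""
    p = pi[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= A[prev][cur]
    return p

print(sequence_probability(["Sunny", "Sunny", "Cloudy", "Rainy"]))
# 0.6 * 0.5 * 0.25 * 0.375 = 0.028125, which the slide rounds to 0.0281
```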
Hidden Markov Models
- States are not observable.
- Observations are probabilistic functions of the state.
- State transitions are still probabilistic.
CG Islands and the “Fair Bet Casino”
The CG islands problem can be modeled after a problem named “The Fair Bet Casino”:
- The game is to flip a coin, which gives only two possible outcomes: Head (H) or Tail (T).
- The Fair coin gives Heads and Tails with the same probability, ½.
- The Biased coin gives Heads with probability ¾.
The “Fair Bet Casino” (cont’d)
Thus, we define the probabilities:
- P(H|F) = P(T|F) = ½
- P(H|B) = ¾, P(T|B) = ¼
- The crooked dealer switches between the Fair and Biased coins with probability 0.1.
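Before building the full HMM, note that the emission probabilities alone already let us ask which single coin better explains a run of flips. A small sketch using the slide's values; the flip sequence "HHHT" is an illustrative assumption, and it ignores coin switching:

```python
from math import prod

# Emission probabilities from the slides:
# P(H|F) = P(T|F) = 1/2, P(H|B) = 3/4, P(T|B) = 1/4.
emit = {
    "F": {"H": 0.5,  "T": 0.5},
    "B": {"H": 0.75, "T": 0.25},
}

def likelihood(flips, coin):
    """P(flips | coin), assuming the dealer never switches coins."""
    return prod(emit[coin][f] for f in flips)

flips = "HHHT"  # illustrative sequence, not from the slides
print(likelihood(flips, "F"))  # (1/2)^4 = 0.0625
print(likelihood(flips, "B"))  # (3/4)^3 * (1/4) = 0.10546875
```

A Heads-heavy run is already more likely under the Biased coin; the HMM on the next slides adds the hidden switching between coins.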
HMM for the Fair Bet Casino (cont’d)
[Figure: HMM model for the Fair Bet Casino problem.]
HMM Parameters
Σ: set of emission characters, e.g.:
- Σ = {H, T} for coin tossing
- Σ = {1, 2, 3, 4, 5, 6} for dice tossing
- Σ = {A, C, G, T} for DNA sequences
Q: set of hidden states, each emitting symbols from Σ, e.g.:
- Q = {F, B} for coin tossing
- Q = {Non-coding, Coding, Regulatory} for sequences
HMM Parameters (cont’d)
A = (a_kl): a |Q| × |Q| matrix of the probability of changing from state k to state l:
- a_FF = 0.9, a_FB = 0.1
- a_BF = 0.1, a_BB = 0.9
E = (e_k(b)): a |Q| × |Σ| matrix of the probability of emitting symbol b while in state k:
- e_F(H) = ½, e_F(T) = ½
- e_B(H) = ¾, e_B(T) = ¼
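With A and E in hand, the joint probability of one particular hidden path together with the observed flips is a direct product. A sketch with the slide's parameters; the uniform initial distribution and the example path "FFB" are assumptions, since the slides do not give them:

```python
# Transition and emission parameters from the slides
# (emissions written as H/T rather than 1/0).
a = {"F": {"F": 0.9, "B": 0.1}, "B": {"F": 0.1, "B": 0.9}}
e = {"F": {"H": 0.5, "T": 0.5}, "B": {"H": 0.75, "T": 0.25}}
# Initial distribution: assumed uniform (not stated on the slides).
pi = {"F": 0.5, "B": 0.5}

def joint_probability(path, flips):
    """P(path, flips) = pi(q1) e_q1(o1) * product of a_{q_{i-1} q_i} e_{q_i}(o_i)."""
    p = pi[path[0]] * e[path[0]][flips[0]]
    for i in range(1, len(path)):
        p *= a[path[i - 1]][path[i]] * e[path[i]][flips[i]]
    return p

print(joint_probability("FFB", "HHH"))
# 0.5 * 0.5 * 0.9 * 0.5 * 0.1 * 0.75 = 0.0084375
```

Summing this quantity over all possible paths is exactly Problem 1 below; maximizing it over paths is Problem 2.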
HMM
[Figure: a three-state HMM with states Q1, Q2, Q3, each emitting one of four colors (Yellow, Red, Green, Blue); the diagram shows the transitions from turn i to turn i+1.]
The Three Basic Problems of HMMs
Problem 1: Given an observation sequence Σ = O1 O2 … OT and a model M = (Π, A, E), compute P(Σ | M).
Problem 2: Given an observation sequence Σ = O1 O2 … OT and a model M = (Π, A, E), how do we choose a corresponding state sequence Q = q1 q2 … qT which best “explains” the observation?
Problem 3: How do we adjust the model parameters Π, A, E to maximize P(Σ | Π, A, E)?
The Three Basic Problems of HMMs
Problem 1: Given an observation sequence Σ = O1 O2 … OT and a model M = (Π, A, E), compute P(Σ | M).
For example: P(observed sequence of colored balls | M).
Problem 1: Probability of an Observation Sequence
What is P(Σ | M)? The probability of an observation sequence is the sum of the probabilities of all possible state sequences in the HMM.
- Naive computation is very expensive: given T observations and N states, there are N^T possible state sequences.
- Even small HMMs, e.g. T = 10 and N = 10, contain 10 billion different paths.
- The solution to this problem (and to Problem 2) is dynamic programming.
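The dynamic-programming solution is the forward algorithm: keep, for each state, the probability of all paths ending there, and extend one observation at a time. A sketch using the Fair Bet Casino parameters from the earlier slides; the uniform initial distribution and the flip sequence "HHT" are assumptions:

```python
def forward(obs, states, pi, a, e):
    """Forward algorithm: P(obs | model) in O(T * N^2) time instead of O(N^T)."""
    # alpha[s] = P(o1..ot, q_t = s), initialized with the first observation
    alpha = {s: pi[s] * e[s][obs[0]] for s in states}
    for o in obs[1:]:
        # Sum over all predecessor states r, then emit the next observation
        alpha = {s: sum(alpha[r] * a[r][s] for r in states) * e[s][o]
                 for s in states}
    return sum(alpha.values())

# Fair Bet Casino parameters; uniform initial distribution assumed.
pi = {"F": 0.5, "B": 0.5}
a = {"F": {"F": 0.9, "B": 0.1}, "B": {"F": 0.1, "B": 0.9}}
e = {"F": {"H": 0.5, "T": 0.5}, "B": {"H": 0.75, "T": 0.25}}
print(forward("HHT", ["F", "B"], pi, a, e))
```

Each step touches every pair of states once, which is where the N^T blow-up collapses to T × N².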
Problem 1: Given an observation sequence Σ = O1 O2 … OT and a model M = (Π, A, E), compute P(Σ | M).
Solution: the forward algorithm.
Example (two observations, three states Q1, Q2, Q3, initial probabilities 0.6, 0.3, 0.1):
- Step 1: α1(Q1) = 0.6 × 0.25 = 0.15; α1(Q2) = 0.3 × 0.1 = 0.03; α1(Q3) = 0.1 × 0.65 = 0.065
- Step 2 (along the paths drawn on the slide):
  0.15 × 0.1 × 0.25 = 0.00375
  0.03 × 0.4 × 0.1 = 0.0012
  0.065 × 0.2 × 0.65 = 0.00845
- Sum = 0.0134
The Three Basic Problems of HMMs
Problem 2: Given an observation sequence Σ = O1 O2 … OT and a model M = (Π, A, E), how do we choose a corresponding state sequence Q = q1 q2 … qT which best “explains” the observation?
For example: what is the most probable state sequence q1 q2 q3 q4, given the observation sequence?
Problem 2: Decoding
The solution to Problem 1 gives us the sum over all paths through an HMM efficiently. For Problem 2, we want to find the single path with the highest probability.
Example (same model and observations as before):
0.15 × 0.1 × 0.25 = 0.00375
0.03 × 0.4 × 0.1 = 0.0012
0.065 × 0.2 × 0.65 = 0.00845 ← the largest
Instead of summing the path probabilities, decoding keeps the largest one: 0.00845.
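Replacing the forward algorithm's sum with a max (and remembering which predecessor achieved it) gives the Viterbi decoding algorithm. A sketch on the Fair Bet Casino model; the uniform initial distribution and the flip sequence "HHHH" are assumptions:

```python
def viterbi(obs, states, pi, a, e):
    """Most probable state path: like the forward algorithm, with max in place of sum."""
    # v[s] = probability of the best path ending in state s; path[s] = that path
    v = {s: pi[s] * e[s][obs[0]] for s in states}
    path = {s: [s] for s in states}
    for o in obs[1:]:
        v_new, path_new = {}, {}
        for s in states:
            best_prev = max(states, key=lambda r: v[r] * a[r][s])
            v_new[s] = v[best_prev] * a[best_prev][s] * e[s][o]
            path_new[s] = path[best_prev] + [s]
        v, path = v_new, path_new
    best = max(states, key=lambda s: v[s])
    return path[best], v[best]

# Fair Bet Casino parameters; uniform initial distribution assumed.
pi = {"F": 0.5, "B": 0.5}
a = {"F": {"F": 0.9, "B": 0.1}, "B": {"F": 0.1, "B": 0.9}}
e = {"F": {"H": 0.5, "T": 0.5}, "B": {"H": 0.75, "T": 0.25}}
print(viterbi("HHHH", ["F", "B"], pi, a, e))
# A run of four Heads decodes as the Biased coin throughout.
```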
Hidden Markov Model and Gene Prediction
How is it connected to gene prediction?
[Figure: the same three-state HMM (Q1, Q2, Q3) emitting colored balls, shown again as an analogy.]
How is it connected to gene prediction?
[Figure: a DNA sequence annotated with hidden states. The observed symbols are the nucleotides A, C, G, T; the hidden states are genomic features such as Exon, Intron, and UTR.]
Hidden Markov Models (HMM) for Gene Prediction
Basic probabilistic model of gene structure.
Signals:
- B: Begin sequence
- S: Start translation
- D: Donor site (GT)
- A: Acceptor site (AG)
- T: Stop translation
- F: End sequence
Hidden states:
- 5′: 5′ UTR
- EI: Initial exon
- E: Exon
- I: Intron
- FE: Final exon
- SE: Single exon
- 3′: 3′ UTR
Eukaryotic Gene Features
[Figure: gene features along the sequence — translation start (ATG), donor sites (GT), acceptor sites (AG), and stop codon (TAG).]
Thank you