BIOINFORMATICS, Lecture 5: Hidden Markov Model. Dr. Aladdin Hamwieh, Khalid Al-shamaa, Abdulqader Jighly. 2010-2011. Aleppo University, Faculty of Technical Engineering.


1 BIOINFORMATICS. Dr. Aladdin Hamwieh, Khalid Al-shamaa, Abdulqader Jighly. 2010-2011. Lecture 5: Hidden Markov Model. Aleppo University, Faculty of Technical Engineering, Department of Biotechnology.

2 GENE PREDICTION: METHODS
Gene prediction can be based upon:
- Coding statistics and gene structure (statistical approach)
- Comparison (similarity-based approach)


4 GENE PREDICTION: CODING STATISTICS
Coding regions of a sequence have different, non-random properties compared with non-coding regions, for example:
- GC content
- Codon bias (codon usage)

5 MARKOV MODEL

6 A Markov model is a process that moves from state to state, where each transition depends only on the previous n states (for a first-order model, only on the current state). For example, consider computing the probability of observing this sequence of weather states over one week in March: Sunny, Sunny, Cloudy, Rainy, Rainy, Sunny, Cloudy.
- If today is Cloudy, tomorrow is more likely to be Rainy.
- In March, a week is more likely to start with a Sunny day than with any other state.
- And so on.

7 MARKOV MODEL
Transition probabilities P(weather tomorrow | weather today):

Weather today    Weather tomorrow
                 Sunny    Cloudy   Rainy
Sunny            0.5      0.25     0.25
Cloudy           0.375    0.125    0.375
Rainy            0.125    0.625    0.25

(each row sums to 1)

8 EXAMPLE:
P(Sunny, Sunny, Cloudy, Rainy | Model)
= Π(Sunny) * P(Sunny | Sunny) * P(Cloudy | Sunny) * P(Rainy | Cloudy)
= 0.6 * 0.5 * 0.25 * 0.375
≈ 0.0281
Here Π(Sunny) = 0.6 is the initial probability of starting in the Sunny state.

9 HIDDEN MARKOV MODELS
- States are not observable.
- Observations are probabilistic functions of the state.
- State transitions are still probabilistic.

10 CG ISLANDS AND THE "FAIR BET CASINO"
The CG-islands problem can be modeled after a problem named "The Fair Bet Casino". The game is to flip coins, which results in only two possible outcomes: Head or Tail. The Fair coin gives Heads and Tails with the same probability ½. The Biased coin gives Heads with probability ¾.

11 THE "FAIR BET CASINO" (CONT'D)
Thus, we define the probabilities:
P(H|F) = P(T|F) = ½
P(H|B) = ¾, P(T|B) = ¼
The crooked dealer changes between the Fair and Biased coins with probability 10% at each flip.

12 HMM FOR THE FAIR BET CASINO (CONT'D)
HMM model for the Fair Bet Casino problem.

13 HMM PARAMETERS
Σ: set of emission characters. Examples:
- Σ = {H, T} for coin tossing
- Σ = {1, 2, 3, 4, 5, 6} for dice tossing
- Σ = {A, C, G, T} for DNA sequences
Q: set of hidden states, each emitting symbols from Σ. Examples:
- Q = {F, B} for coin tossing
- Q = {Non-coding, Coding, Regulatory} for sequences

14 HMM PARAMETERS (CONT'D)
A = (a_kl): a |Q| x |Q| matrix of the probability of changing from state k to state l.
a_FF = 0.9, a_FB = 0.1, a_BF = 0.1, a_BB = 0.9
E = (e_k(b)): a |Q| x |Σ| matrix of the probability of emitting symbol b while in state k (writing 0 for Tails and 1 for Heads):
e_F(0) = ½, e_F(1) = ½
e_B(0) = ¼, e_B(1) = ¾
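The Fair Bet Casino parameters can be written out directly as Python data; this is a minimal sketch using the slide's values (0 = Tails, 1 = Heads). The uniform initial distribution Π is an assumption, since the slides do not state one.

```python
# Fair Bet Casino HMM parameters, following the slides.
states = ["F", "B"]                 # Q: Fair and Biased coin
alphabet = [0, 1]                   # Σ: 0 = Tails, 1 = Heads
initial = {"F": 0.5, "B": 0.5}      # Π (assumed uniform; not given on the slide)
transitions = {                     # A = (a_kl)
    "F": {"F": 0.9, "B": 0.1},
    "B": {"F": 0.1, "B": 0.9},
}
emissions = {                       # E = (e_k(b))
    "F": {0: 0.5,  1: 0.5},
    "B": {0: 0.25, 1: 0.75},
}

# Sanity check: every probability row sums to 1.
for row in [initial, *transitions.values(), *emissions.values()]:
    assert abs(sum(row.values()) - 1.0) < 1e-12
print("parameters OK")
```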

15 HMM
[Figure: ball-and-urn HMM diagram. Three hidden states Q1, Q2, Q3, each emitting one of four colors (Yellow, Red, Green, Blue); arrows show transitions from the i-th turn to the (i+1)-th turn.]

16 THE THREE BASIC PROBLEMS OF HMMS
Problem 1: Given an observation sequence Σ = O1 O2 ... OT and a model M = (Π, A, E), compute P(Σ | M).
Problem 2: Given an observation sequence Σ = O1 O2 ... OT and a model M = (Π, A, E), how do we choose a corresponding state sequence Q = q1 q2 ... qT which best "explains" the observations?
Problem 3: How do we adjust the model parameters Π, A, E to maximize P(Σ | Π, A, E)?

17 THE THREE BASIC PROBLEMS OF HMMS
Problem 1: Given an observation sequence Σ = O1 O2 ... OT and a model M = (Π, A, E), compute P(Σ | M).
For example: P( | M), where the observation sequence (a run of colored balls) appears only as an image on the slide.

18 PROBLEM 1: PROBABILITY OF AN OBSERVATION SEQUENCE
What is P(Σ | M)? The probability of an observation sequence is the sum of the probabilities of all possible state sequences in the HMM. Naive computation is very expensive: given T observations and N states, there are N^T possible state sequences. Even small HMMs, e.g. T = 10 and N = 10, contain 10 billion different paths. The solution to this problem (and to Problem 2) is to use dynamic programming.

19 SOLUTION: FORWARD ALGORITHM
Problem 1: Given an observation sequence Σ = O1 O2 ... OT and a model M = (Π, A, E), compute P(Σ | M).
Example (three states Q1, Q2, Q3 with initial probabilities 0.6, 0.3, 0.1; the observations are colored balls shown in the slide figure).
Initialization, α1(k) = Π(k) * e_k(O1):
α1(Q1) = 0.6 * 0.25 = 0.15
α1(Q2) = 0.3 * 0.1 = 0.03
α1(Q3) = 0.1 * 0.65 = 0.065
Second observation: each path extends by one transition and one emission, α1(k) * a_kl * e_l(O2). The slide's terms are:
0.15 * 0.1 * 0.25 = 0.00375
0.03 * 0.4 * 0.1 = 0.0012
0.065 * 0.2 * 0.65 = 0.00845
Sum = 0.00375 + 0.0012 + 0.00845 = 0.0134
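The forward algorithm itself fits in a few lines. This is a minimal sketch: the recursion is the standard one (O(T * N^2) instead of O(N^T)), but the demonstration model is the Fair Bet Casino from the earlier slides rather than the colored-ball model, whose full parameters are not recoverable from the figure.

```python
# Forward algorithm: P(obs | model) by dynamic programming.
def forward(obs, states, initial, trans, emit):
    """alpha[l] = P(O1..Ot, q_t = l); summing the final alphas gives P(obs)."""
    alpha = {s: initial[s] * emit[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        alpha = {
            l: emit[l][symbol] * sum(alpha[k] * trans[k][l] for k in states)
            for l in states
        }
    return sum(alpha.values())

# Fair Bet Casino model from the earlier slides (uniform Π assumed).
states = ["F", "B"]
initial = {"F": 0.5, "B": 0.5}
trans = {"F": {"F": 0.9, "B": 0.1}, "B": {"F": 0.1, "B": 0.9}}
emit = {"F": {"H": 0.5, "T": 0.5}, "B": {"H": 0.75, "T": 0.25}}

p = forward("HHT", states, initial, trans, emit)
print(round(p, 4))  # 0.1371
```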

20 THE THREE BASIC PROBLEMS OF HMMS
Problem 2: Given an observation sequence Σ = O1 O2 ... OT and a model M = (Π, A, E), how do we choose a corresponding state sequence Q = q1 q2 ... qT which best "explains" the observations?
For example: what is the most probable state sequence q1 q2 q3 q4 given the observation sequence Σ?

21 PROBLEM 2: DECODING
The solution to Problem 1 efficiently gives us the sum over all paths through an HMM. For Problem 2, we instead want to find the single path with the highest probability.

22 Example: same model and observations as slide 19, but instead of summing the path terms we take the largest:
0.15 * 0.1 * 0.25 = 0.00375
0.03 * 0.4 * 0.1 = 0.0012
0.065 * 0.2 * 0.65 = 0.00845  <- the largest
The most probable path has probability 0.00845.

23 HIDDEN MARKOV MODEL AND GENE PREDICTION

24 How is it connected to gene prediction?
[Figure: the ball-and-urn HMM diagram from slide 15, repeated.]

25 How is it connected to gene prediction?
[Figure: a DNA sequence of A, C, G, T letters annotated with the region labels Exon, Intron, and UTR. The nucleotides are the observed symbols; the functional regions are the hidden states.]

26 HIDDEN MARKOV MODELS (HMM) FOR GENE PREDICTION
Basic probabilistic model of gene structure.
Signals:
B: Begin sequence
S: Start translation
D: Donor site (GT)
A: Acceptor site (AG)
T: Stop translation
F: End sequence
Hidden states:
5': 5' UTR
EI: Initial exon
E: Exon
I: Intron
FE: Final exon
SE: Single exon
3': 3' UTR
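The slide's state diagram can be sketched as an adjacency map of allowed transitions, structure only and without probabilities. The state names follow the slide, but the exact edge set is an assumption reconstructed from a standard gene-structure model (exons alternate with introns; a single-exon gene skips introns entirely):

```python
# Hypothetical skeleton of the gene-structure HMM: which hidden state may
# follow which. Signals (S, D, A, T) sit on the corresponding edges.
gene_hmm_edges = {
    "B":  ["5'"],        # begin sequence -> 5' UTR
    "5'": ["EI", "SE"],  # start translation (S): initial or single exon
    "EI": ["I"],         # initial exon -> intron, via donor site D (GT)
    "I":  ["E", "FE"],   # intron -> next exon, via acceptor site A (AG)
    "E":  ["I"],         # internal exon -> another intron
    "FE": ["3'"],        # final exon -> 3' UTR, via stop translation T
    "SE": ["3'"],        # single-exon gene: straight to the 3' UTR
    "3'": ["F"],         # 3' UTR -> end of sequence
}

# Every state referenced as a successor is also defined (except the end F).
targets = {t for succ in gene_hmm_edges.values() for t in succ}
print(sorted(targets - set(gene_hmm_edges) - {"F"}))  # []
```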

27 EUKARYOTIC GENE FEATURES HAND OVER
Key sequence signals: translation starts at ATG; each intron begins with GT (the donor site) and ends with AG (the acceptor site); translation ends at a stop codon such as TAG.

28 THANK YOU

