BIOINFORMATICS, Lecture 5: Hidden Markov Model. Dr. Aladdin Hamwieh, Khalid Al-shamaa, Abdulqader Jighly. 2010-2011. Aleppo University, Faculty of Technical Engineering.


1 BIOINFORMATICS. Dr. Aladdin Hamwieh, Khalid Al-shamaa, Abdulqader Jighly. 2010-2011. Lecture 5: Hidden Markov Model. Aleppo University, Faculty of Technical Engineering, Department of Biotechnology.

2 GENE PREDICTION: METHODS
Gene prediction can be based upon:
- Coding statistics and gene structure (statistical approach)
- Comparison (similarity-based approach)


4 GENE PREDICTION: CODING STATISTICS
Coding regions of a sequence have different, non-random properties compared with non-coding regions, for example:
- GC content
- Codon bias (codon usage)

5 MARKOV MODEL

6 A Markov model is a process that moves from state to state, where each transition depends only on the previous n states (for a first-order model, only on the current state). For example, consider computing the probability of observing this sequence of weather states over one week in March: Sunny, Sunny, Cloudy, Rainy, Rainy, Sunny, Cloudy.
- If today is Cloudy, tomorrow is more likely to be Rainy.
- In March, a week is more likely to start with a Sunny day than with any other state.
- And so on.

7 MARKOV MODEL
Transition probabilities P(weather tomorrow | weather today):

Weather today    Weather tomorrow
                 Sunny    Cloudy   Rainy
Sunny            0.5      0.25     0.25
Cloudy           0.375    0.125    0.375
Rainy            0.125    0.625    0.25

(each row sums to 1)

8 EXAMPLE:
P(Sunny, Sunny, Cloudy, Rainy | Model)
= Π(Sunny) * P(Sunny | Sunny) * P(Cloudy | Sunny) * P(Rainy | Cloudy)
= 0.6 * 0.5 * 0.25 * 0.375
≈ 0.0281
Here Π(Sunny) = 0.6 is the initial probability of starting in the Sunny state.

9 HIDDEN MARKOV MODELS
- States are not observable.
- Observations are probabilistic functions of the state.
- State transitions are still probabilistic.

10 CG ISLANDS AND THE "FAIR BET CASINO"
The CG-islands problem can be modeled after a problem named "The Fair Bet Casino". The game is to flip coins, which results in only two possible outcomes: Head or Tail. The Fair coin gives Heads and Tails with the same probability ½. The Biased coin gives Heads with probability ¾.

11 THE "FAIR BET CASINO" (CONT'D)
Thus, we define the probabilities:
P(H|F) = P(T|F) = ½
P(H|B) = ¾, P(T|B) = ¼
The crooked dealer changes between the Fair and Biased coins with probability 10% at each flip.

12 HMM FOR THE FAIR BET CASINO (CONT'D)
HMM model for the Fair Bet Casino problem.

13 HMM PARAMETERS
Σ: set of emission characters. Examples:
- Σ = {H, T} for coin tossing
- Σ = {1, 2, 3, 4, 5, 6} for dice tossing
- Σ = {A, C, G, T} for DNA sequences
Q: set of hidden states, each emitting symbols from Σ. Examples:
- Q = {F, B} for coin tossing
- Q = {Non-coding, Coding, Regulatory} for sequences

14 HMM PARAMETERS (CONT'D)
A = (a_kl): a |Q| x |Q| matrix of the probability of changing from state k to state l.
a_FF = 0.9, a_FB = 0.1, a_BF = 0.1, a_BB = 0.9
E = (e_k(b)): a |Q| x |Σ| matrix of the probability of emitting symbol b while in state k (writing 0 for Tails and 1 for Heads):
e_F(0) = ½, e_F(1) = ½
e_B(0) = ¼, e_B(1) = ¾
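The Fair Bet Casino parameters can be written out directly as Python data; this is a minimal sketch using the slide's values (0 = Tails, 1 = Heads). The uniform initial distribution Π is an assumption, since the slides do not state one.

```python
# Fair Bet Casino HMM parameters, following the slides.
states = ["F", "B"]                 # Q: Fair and Biased coin
alphabet = [0, 1]                   # Σ: 0 = Tails, 1 = Heads
initial = {"F": 0.5, "B": 0.5}      # Π (assumed uniform; not given on the slide)
transitions = {                     # A = (a_kl)
    "F": {"F": 0.9, "B": 0.1},
    "B": {"F": 0.1, "B": 0.9},
}
emissions = {                       # E = (e_k(b))
    "F": {0: 0.5,  1: 0.5},
    "B": {0: 0.25, 1: 0.75},
}

# Sanity check: every probability row sums to 1.
for row in [initial, *transitions.values(), *emissions.values()]:
    assert abs(sum(row.values()) - 1.0) < 1e-12
print("parameters OK")
```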

15 HMM
[Figure: ball-and-urn HMM diagram. Three hidden states Q1, Q2, Q3, each emitting one of four colors (Yellow, Red, Green, Blue); arrows show transitions from the i-th turn to the (i+1)-th turn.]

16 THE THREE BASIC PROBLEMS OF HMMS
Problem 1: Given an observation sequence Σ = O1 O2 ... OT and a model M = (Π, A, E), compute P(Σ | M).
Problem 2: Given an observation sequence Σ = O1 O2 ... OT and a model M = (Π, A, E), how do we choose a corresponding state sequence Q = q1 q2 ... qT which best "explains" the observations?
Problem 3: How do we adjust the model parameters Π, A, E to maximize P(Σ | Π, A, E)?

17 THE THREE BASIC PROBLEMS OF HMMS
Problem 1: Given an observation sequence Σ = O1 O2 ... OT and a model M = (Π, A, E), compute P(Σ | M).
For example: P( | M), where the observation sequence (a run of colored balls) appears only as an image on the slide.

18 PROBLEM 1: PROBABILITY OF AN OBSERVATION SEQUENCE
What is P(Σ | M)? The probability of an observation sequence is the sum of the probabilities of all possible state sequences in the HMM. Naive computation is very expensive: given T observations and N states, there are N^T possible state sequences. Even small HMMs, e.g. T = 10 and N = 10, contain 10 billion different paths. The solution to this problem (and to Problem 2) is to use dynamic programming.

19 SOLUTION: FORWARD ALGORITHM
Problem 1: Given an observation sequence Σ = O1 O2 ... OT and a model M = (Π, A, E), compute P(Σ | M).
Example (three states Q1, Q2, Q3 with initial probabilities 0.6, 0.3, 0.1; the observations are colored balls shown in the slide figure).
Initialization, α1(k) = Π(k) * e_k(O1):
α1(Q1) = 0.6 * 0.25 = 0.15
α1(Q2) = 0.3 * 0.1 = 0.03
α1(Q3) = 0.1 * 0.65 = 0.065
Second observation: each path extends by one transition and one emission, α1(k) * a_kl * e_l(O2). The slide's terms are:
0.15 * 0.1 * 0.25 = 0.00375
0.03 * 0.4 * 0.1 = 0.0012
0.065 * 0.2 * 0.65 = 0.00845
Sum = 0.00375 + 0.0012 + 0.00845 = 0.0134
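The forward algorithm itself fits in a few lines. This is a minimal sketch: the recursion is the standard one (O(T * N^2) instead of O(N^T)), but the demonstration model is the Fair Bet Casino from the earlier slides rather than the colored-ball model, whose full parameters are not recoverable from the figure.

```python
# Forward algorithm: P(obs | model) by dynamic programming.
def forward(obs, states, initial, trans, emit):
    """alpha[l] = P(O1..Ot, q_t = l); summing the final alphas gives P(obs)."""
    alpha = {s: initial[s] * emit[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        alpha = {
            l: emit[l][symbol] * sum(alpha[k] * trans[k][l] for k in states)
            for l in states
        }
    return sum(alpha.values())

# Fair Bet Casino model from the earlier slides (uniform Π assumed).
states = ["F", "B"]
initial = {"F": 0.5, "B": 0.5}
trans = {"F": {"F": 0.9, "B": 0.1}, "B": {"F": 0.1, "B": 0.9}}
emit = {"F": {"H": 0.5, "T": 0.5}, "B": {"H": 0.75, "T": 0.25}}

p = forward("HHT", states, initial, trans, emit)
print(round(p, 4))  # 0.1371
```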

20 THE THREE BASIC PROBLEMS OF HMMS
Problem 2: Given an observation sequence Σ = O1 O2 ... OT and a model M = (Π, A, E), how do we choose a corresponding state sequence Q = q1 q2 ... qT which best "explains" the observations?
For example: what is the most probable state sequence q1 q2 q3 q4 given the observation sequence Σ?

21 PROBLEM 2: DECODING
The solution to Problem 1 efficiently gives us the sum over all paths through an HMM. For Problem 2, we instead want to find the single path with the highest probability.

22 Example: same model and observations as slide 19, but instead of summing the path terms we take the largest:
0.15 * 0.1 * 0.25 = 0.00375
0.03 * 0.4 * 0.1 = 0.0012
0.065 * 0.2 * 0.65 = 0.00845  <- the largest
The most probable path has probability 0.00845.

23 HIDDEN MARKOV MODEL AND GENE PREDICTION

24 How is it connected to gene prediction?
[Figure: the ball-and-urn HMM diagram from slide 15, repeated.]

25 How is it connected to gene prediction?
[Figure: a DNA sequence of A, C, G, T letters annotated with the region labels Exon, Intron, and UTR. The nucleotides are the observed symbols; the functional regions are the hidden states.]

26 HIDDEN MARKOV MODELS (HMM) FOR GENE PREDICTION
Basic probabilistic model of gene structure.
Signals:
B: Begin sequence
S: Start translation
D: Donor site (GT)
A: Acceptor site (AG)
T: Stop translation
F: End sequence
Hidden states:
5': 5' UTR
EI: Initial exon
E: Exon
I: Intron
FE: Final exon
SE: Single exon
3': 3' UTR
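The slide's state diagram can be sketched as an adjacency map of allowed transitions, structure only and without probabilities. The state names follow the slide, but the exact edge set is an assumption reconstructed from a standard gene-structure model (exons alternate with introns; a single-exon gene skips introns entirely):

```python
# Hypothetical skeleton of the gene-structure HMM: which hidden state may
# follow which. Signals (S, D, A, T) sit on the corresponding edges.
gene_hmm_edges = {
    "B":  ["5'"],        # begin sequence -> 5' UTR
    "5'": ["EI", "SE"],  # start translation (S): initial or single exon
    "EI": ["I"],         # initial exon -> intron, via donor site D (GT)
    "I":  ["E", "FE"],   # intron -> next exon, via acceptor site A (AG)
    "E":  ["I"],         # internal exon -> another intron
    "FE": ["3'"],        # final exon -> 3' UTR, via stop translation T
    "SE": ["3'"],        # single-exon gene: straight to the 3' UTR
    "3'": ["F"],         # 3' UTR -> end of sequence
}

# Every state referenced as a successor is also defined (except the end F).
targets = {t for succ in gene_hmm_edges.values() for t in succ}
print(sorted(targets - set(gene_hmm_edges) - {"F"}))  # []
```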

27 EUKARYOTIC GENE FEATURES HAND OVER
Key sequence signals: translation starts at ATG; each intron begins with GT (the donor site) and ends with AG (the acceptor site); translation ends at a stop codon such as TAG.

28 THANK YOU

