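How to find foreign genes? Markov Models

Building the model: Training Set → 3-mer contexts (AAA, AAC, AAG, AAT, ACA, ..., TTG, TTT) → probability of the next nucleotide after each context. Example for the context AAA: AAAA: 10%, AAAC: 15%, AAAG: 40%, AAAT: 35%.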

How to find foreign genes? Markov Models: 3rd order Markov model

Context   A     C     G     T
AAA       0.10  0.15  0.40  0.35
AAC       0.25  0.45  0.25  0.05
AAG       0.25  0.20  0.30  0.25
AAT       0.25  0.20  0.30  0.25
ACA       0.15  0.20  0.25  0.40
...
TTG       0.20  0.50  0.05  0.25
TTT       0.10  0.55  0.25  0.10

Candidate gene: AAAACAA… The first transition (context AAA followed by A) has probability 0.10.
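As a rough illustration of how the table is used, the sketch below scores the visible prefix of the candidate gene by looking up one table entry per position. Only the rows shown on the slide are included; the elided rows are left out rather than guessed. The function name `transition_probs` is hypothetical, and the probability of the initial 3-mer itself is ignored, which is a simplifying assumption.

```python
import math

ORDER = 3  # 3rd order: each base is conditioned on the previous three bases

# Rows copied from the slide; rows hidden behind "..." are deliberately absent.
TRANSITIONS = {
    "AAA": {"A": 0.10, "C": 0.15, "G": 0.40, "T": 0.35},
    "AAC": {"A": 0.25, "C": 0.45, "G": 0.25, "T": 0.05},
    "AAG": {"A": 0.25, "C": 0.20, "G": 0.30, "T": 0.25},
    "AAT": {"A": 0.25, "C": 0.20, "G": 0.30, "T": 0.25},
    "ACA": {"A": 0.15, "C": 0.20, "G": 0.25, "T": 0.40},
    "TTG": {"A": 0.20, "C": 0.50, "G": 0.05, "T": 0.25},
    "TTT": {"A": 0.10, "C": 0.55, "G": 0.25, "T": 0.10},
}


def transition_probs(seq, table=TRANSITIONS, order=ORDER):
    """Return P(next base | previous `order` bases) for each position in seq."""
    probs = []
    for i in range(order, len(seq)):
        context, nxt = seq[i - order:i], seq[i]
        probs.append(table[context][nxt])  # raises KeyError for an elided row
    return probs


prefix = "AAAACAA"  # only the part of the candidate gene visible on the slide
steps = transition_probs(prefix)  # lookups: AAA->A, AAA->C, AAC->A, ACA->A
score = math.prod(steps)          # product of the four transition probabilities
print(steps, score)
```

The first lookup reproduces the 0.10 highlighted on the slide. For long sequences the product becomes very small, so summing logarithms is usually preferable to multiplying raw probabilities.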

Markov Chains: A traffic light considered as a sequence of states. This is a trivial Markov chain: the transition probability between the states is always 1. $P_{gy} = 1$, $P_{yr} = 1$, $P_{rg} = 1$

Markov Chains: A traffic light considered as a sequence of states. If we watch our traffic light, it will emit a string of states. In the case of a simple Markov model, the state labels (e.g. green, red, yellow) are the observable outputs of the process.

Markov Chains: An occasionally malfunctioning traffic light! The Markov property is that the probability of the next state depends only on the current state. $P_{gy} = 1$, $P_{yr} = 0.90$, $P_{yg} = 0.10$, $P_{rg} = 0.85$, $P_{ry} = 0.15$
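A minimal sketch of the malfunctioning traffic light as a transition table, using the probabilities from the slide. The function name `simulate`, the starting state, and the random seed are assumptions made for illustration.

```python
import random

# Transition probabilities from the slide: g = green, y = yellow, r = red.
TRAFFIC = {
    "g": {"y": 1.0},
    "y": {"r": 0.90, "g": 0.10},
    "r": {"g": 0.85, "y": 0.15},
}


def simulate(start, n_steps, rng):
    """Emit a string of states; each step depends only on the current state."""
    state = start
    path = [state]
    for _ in range(n_steps):
        targets = list(TRAFFIC[state])
        weights = [TRAFFIC[state][t] for t in targets]
        state = rng.choices(targets, weights=weights, k=1)[0]
        path.append(state)
    return "".join(path)


rng = random.Random(0)        # fixed seed so repeated runs give the same string
path = simulate("g", 20, rng)  # starting at green is an arbitrary choice
print(path)
```

Most of the emitted string follows the normal g, y, r cycle, with an occasional y to g or r to y step where the light malfunctions.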

Markov Chains: The Markov Property

$a_{st} = P(x_i = t \mid x_{i-1} = s)$

English translation: the transition probability $a_{st}$ from state s to state t is equal to the probability that the ith state is t, given that the immediately preceding state ($x_{i-1}$) was s. This is a form of conditional probability.

Markov Chains: An occasionally malfunctioning traffic light! Now we can consider the probability of an observed sequence.

Markov Chains: What is the probability of a chain of events x?

$P(x) = P(x_L, x_{L-1}, \ldots, x_1)$

English translation: the probability of observing the sequence of states x is equal to the probability that the Lth state ($x_L$) was whatever was observed there AND the (L-1)th state ($x_{L-1}$) was whatever was observed there, and so on down to $x_1$. This is a form of joint probability.

Markov Chains: What is the probability of a chain of events x?

$P(x) = P(x_L, x_{L-1}, \ldots, x_1) = P(x_L \mid x_{L-1}, \ldots, x_1)\, P(x_{L-1} \mid x_{L-2}, \ldots, x_1) \cdots P(x_1)$

This is because $P(X, Y) = P(X \mid Y)\, P(Y)$.

English translation: the probability of events X AND Y happening is equal to the probability of X happening given that Y has already happened, times the probability of event Y.

Markov Chains: What is the probability of a chain of events x?

$P(x) = P(x_L \mid x_{L-1}, \ldots, x_1)\, P(x_{L-1} \mid x_{L-2}, \ldots, x_1) \cdots P(x_1)$

But remember: the key property of a Markov chain is that the probability of symbol $x_i$ depends ONLY on the value of the preceding symbol $x_{i-1}$. Therefore:

$P(x) = P(x_L \mid x_{L-1})\, P(x_{L-1} \mid x_{L-2}) \cdots P(x_2 \mid x_1)\, P(x_1)$

$P(x) = P(x_1) \prod_{i=2}^{L} a_{x_{i-1} x_i}$
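The product formula can be tried out on the malfunctioning traffic light. The sketch below multiplies the initial probability by one transition probability per step. The slides do not give an initial distribution, so `INITIAL` (the light always starts green) is an assumption, and `chain_probability` is a hypothetical helper name.

```python
# Transition probabilities of the malfunctioning traffic light (from the slides).
TRAFFIC = {
    "g": {"y": 1.0},
    "y": {"r": 0.90, "g": 0.10},
    "r": {"g": 0.85, "y": 0.15},
}

# ASSUMPTION: the light always starts on green; the slides give no P(x_1).
INITIAL = {"g": 1.0}


def chain_probability(states, transitions, initial):
    """P(x) = P(x_1) * product over i = 2..L of a[x_{i-1}][x_i]."""
    p = initial.get(states[0], 0.0)
    for prev, cur in zip(states, states[1:]):
        p *= transitions[prev].get(cur, 0.0)  # missing entry = impossible step
    return p


p_normal = chain_probability("gyrg", TRAFFIC, INITIAL)    # 1 * 1 * 0.90 * 0.85
p_glitch = chain_probability("gygy", TRAFFIC, INITIAL)    # 1 * 1 * 0.10 * 1
p_impossible = chain_probability("gr", TRAFFIC, INITIAL)  # g -> r never occurs
print(p_normal, p_glitch, p_impossible)
```

Under these assumptions the normal cycle g, y, r, g has probability 0.765, the sequence containing a yellow-to-green glitch has probability 0.10, and a sequence containing a transition absent from the model has probability 0.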

Markov Chains: How about nucleic acid sequences? There is no reason why nucleic acid sequences found in an organism cannot be modeled using Markov chains, with the four nucleotides A, C, G, and T as the states.

Markov Model: What do we need to probabilistically model DNA sequences? States (A, C, G, T) and transition probabilities. The states are the same for all organisms, so the transition probabilities are the model parameters we need to estimate.

Parameter estimation: Building the Markov Model

Training Set → 3-mer contexts (AAA, AAC, AAG, AAT, ACA, ..., TTG, TTT) → probability of the next nucleotide after each context, e.g. AAAA: 10%, AAAC: 15%, AAAG: 40%, AAAT: 35%.

This is a maximum likelihood approach to parameter estimation. Such procedures choose the parameter values that maximize the overall probability of the training set data.
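For a Markov chain, the maximum likelihood estimate of a transition probability is the observed count of that transition divided by the total count for its context. The sketch below shows this counting on a tiny made-up training set; the sequences and the function name `estimate_transitions` are assumptions for illustration, not data from the slides.

```python
from collections import Counter, defaultdict


def estimate_transitions(sequences, order=3):
    """Maximum likelihood estimate: count(context, next) / count(context)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1
    model = {}
    for context, next_counts in counts.items():
        total = sum(next_counts.values())
        model[context] = {base: n / total for base, n in next_counts.items()}
    return model


# ASSUMPTION: a toy training set, far smaller than any real one.
training_set = ["AAAAG", "AAAGT", "AAATAAAG"]
model = estimate_transitions(training_set)
print(model["AAA"])  # AAA is followed by A once, G three times, T once
```

Transitions never seen in the training set get no entry, which amounts to probability zero. Real training sets are large enough that most contexts are observed, and pseudocounts are a common way to avoid zeros; the slides do not say whether they are used here.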

Markov Model: Which model best explains a newly observed sequence? Each organism (Organism A, Organism B) has the same states A, C, G, T but different transition probabilities, so you can ask: "was the sequence more likely to be generated by model A or model B?"

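Markov Model: Which model best explains a newly observed sequence? A commonly used metric for discrimination using Markov chains is the log-odds ratio:

$S(x) = \log \frac{P(x \mid \text{model A})}{P(x \mid \text{model B})} = \sum_{i=1}^{L} \log \frac{a^{A}_{x_{i-1} x_i}}{a^{B}_{x_{i-1} x_i}}$

Here the $i = 1$ term, $a_{x_0 x_1}$, stands for the probability of the first symbol $x_1$ (treating $x_0$ as a begin state). A positive score favors model A; a negative score favors model B.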
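A minimal sketch of the log-odds score for two first-order models. The transition values in `MODEL_A` and `MODEL_B` are invented for illustration, not estimates from any organism. The sketch also assumes both models share the same initial distribution, so the first-symbol term cancels and is omitted; the natural logarithm is an arbitrary choice of base.

```python
import math

BASES = "ACGT"

# ASSUMPTION: made-up first-order models. Model A favors C and G after any
# base; model B treats all four bases as equally likely.
MODEL_A = {prev: {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1} for prev in BASES}
MODEL_B = {prev: {base: 0.25 for base in BASES} for prev in BASES}


def log_odds(seq, model_a, model_b):
    """Sum of log(a^A / a^B) over consecutive pairs (x_{i-1}, x_i) in seq."""
    return sum(
        math.log(model_a[prev][cur] / model_b[prev][cur])
        for prev, cur in zip(seq, seq[1:])
    )


s_gc = log_odds("GCGC", MODEL_A, MODEL_B)  # three steps, each log(0.4 / 0.25)
s_at = log_odds("ATAT", MODEL_A, MODEL_B)  # three steps, each log(0.1 / 0.25)
print(s_gc, s_at)
```

With these toy models the GC-rich sequence scores positive (more likely under model A) and the AT-rich sequence scores negative (more likely under model B).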
