Download presentation
Presentation is loading. Please wait.
1
The Human Genome (Harding & Sanger) *50.000 *20 globin (chromosome 11) 6*10 4 bp 3*10 9 bp *10 3 Exon 2 Exon 1 Exon 3 5’ flanking 3’ flanking 3*10 3 bp Myoglobin globin ATTGCCATGTCGATAATTGGACTATTTGGA30 bp aa DNA: Protein:
2
Schedule Projects in substitution models, phylogenies, networks, grammars, RNA structure, signals, your choice Course should appeal to combinatorics, probability theory, statistics, algorithmics, software design, EMAILS: lyngsoe@stats.ox.ac.uk hein@stats.ox.ac.uk mitchell@stats.ox.ac.uk@stats.ox.ac.ukhein@stats.ox.ac.uk mitchell@stats.ox.ac.uk Models of substitution I : Basic Models 15.10 Models of substitution II : Complex Models 16.10 Models of substitution III : Advanced Questions 22.10 A T Finding Recombinations in Sequences 4.12 Phylogenies I: Combinatorics 23.10 Phylogenies II: Parsimony 29.10 Phylogenies III: Likelihood 30.10 Phylogenies IV: Inference 5.11 Alignment Algorithms I Optimisation 19.11 (Rune) Alignment Algorithms II Statistical Inference 20.11 (Rune) ACT-T -GTCT Stochastic Grammars and their Biological Applications: Hidden Markov Models 26.11 Finding Signals in Sequences 27.11 RNA structures (Rune) 3.12 Summer projects: http://www.stats.ox.ac.uk/research/genome/projects Networks I: Dynamics 6.11 Networks II: Inference 12.11 Networks III: Evolution 13.11 2 1 3 45
3
ACGTC Central Problems: History cannot be observed, only end products. Even if History could be observed, the underlying process couldn’t !! ACGCC AGGCC AGGCT AGGTT ACGTC ACGCC AGGCC AGGCT AGGGC AGGCT AGGTT AGTGC
4
Some Definitions State space – a set often corresponding of possible observations ie {A,C,G,T}. A random variable, X can take values in the state space with probabilities ie P{X=A} = ¼. The value taken often indicated by small letters - x Stochastic Process is a set of time labeled stochastic variables X t ie P{X 0 =A, X 1 =C,.., X 5 =G} =.00122 Time can be discrete or continuous, in our context it will almost always mean natural numbers, N {0,1,2,3,4..}, or an interval on the real line, R. Time Homogeneity – the process is the same for all t. Markov Property: ie
5
2) Processes in different positions of the molecule are independent, so the probability for the whole alignment will be the product of the probabilities of the individual patterns. Simplifying Assumptions I Data: s1=TCGGTA,s2=TGGTT 1) Only substitutions. s1 TCGGTA s1 TCGGA s2 TGGT-T s2 TGGTT TGGTT TCGGTA a - unknown Biological setup T T a1a1 a2a2 a3a3 a4a4 a5a5 G G T T C G G A Probability of Data
6
Simplifying Assumptions II 3) The evolutionary process is the same in all positions 4) Time reversibility: Virtually all models of sequence evolution are time reversible. I.e. π i P i,j (t) = π j P j,i (t), where π i is the stationary distribution of i and P t (i->j) the probability that state i has changed into state j after t time. This implies that = a N1N1 N2N2 l 2 +l 1 l1l1 l2l2 N2N2 N1N1
7
Simplifying assumptions III 6) The rate matrix, Q, for the continuous time Markov Chain is the same at all times (and often all positions). However, it is possible to let the rate of events, r i, vary from site to site, then the term for passed time, t, will be substituted by r i *t. 5) The nucleotide at any position evolves following a continuous time Markov Chain. T O A C G T F A -(q A,C +q A,G +q A,T ) q A,C q A,G q A,T R C q C,A -(q C,A +q C,G +q C,T ) q C, G q C,T O G q G,A q G,C -(q G,A +q G,C +q G,T ) q G,T M T q T,A q T,C q T,G -(q T,A +q T,C +q T,G ) P i,j (t) continuous time markov chain on the state space {A,C,G,T}. Q - rate matrix: t1t1 t2t2 C C A
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.