
1 Genome evolution: a computational approach Lecture 1: Modern challenges in evolution. Markov processes. Amos Tanay, Ziskind 204, ext 3579 עמוס תנאי amos.tanay@weizmann.ac.il http://www.wisdom.weizmann.ac.il/~atanay/GenomeEvo/

2 The Genome [figure: genome anatomy showing exons, introns and intergenic regions; the DNA alphabet A, C, G, T; the triplet code]

3 Humans and Chimps: ~5-7 million years of divergence; two genomes of 3×10^9 {A,C,G,T} letters each, compared by genome alignment. Where are the “important” differences? How did they happen?

4 Primate phylogeny: marmoset, macaque, orangutan, chimp, human, baboon, gibbon, gorilla [figure: tree with sequence divergences of 0.5%, 0.8%, 1.2%, 1.5%, 3% and 9% on its branches]. Where are the “important” differences? How were new features gained?

5 Antibiotic resistance: Staphylococcus aureus [figure: timeline for the evolution of bacterial resistance in an S. aureus patient (Mwangi et al., PNAS 2007)]. Skin-based and other S. aureus infections killed 19,000 people in the US during 2005 (more than AIDS). Resistance to penicillin: 50% in 1950, 80% in 1960, ~98% today. The strain carries a 2.9 Mb genome and a 30 kb plasmid. How do bacteria become resistant to antibiotics? Can we eliminate resistance with better treatment protocols, given an understanding of the evolutionary process?

6 Resistance to Antibiotics [table: daptomycin, oxacillin, rifampicin and vancomycin values for a series of sequential patient isolates, alongside the accumulating mutation counts (1, 2, 3, 4-6, 7, 8, 9, 10, 11, 12, 13, 14, 15…18)]. S. aureus needed to find only a few “right” mutations to survive multiple antibiotics. Ultimate experiment: sequence the entire genome of the evolving S. aureus.

7 Yeast genome duplication. The budding yeast S. cerevisiae genome has extensive duplicates. We can trace a whole genome duplication by looking at yeast species that lack the duplicates (K. waltii, A. gossypii). Only a small fraction (5%) of the yeast genome remains duplicated.

8 How can an organism tolerate genome duplication and massive gene loss? Is this critical in evolving new functionality?

9 “Junk” and ultraconservation: baker’s yeast – 12 Mb, ~6,000 genes, 1 cell; the worm C. elegans – 100 Mb, ~20,000 genes, ~1,000 cells; humans – 3 Gb, ~27,000 genes, ~50 trillion cells.

10 [figure from Lynch 2007]

11 ENCODE data [figure: annotation over exon, intron and intergenic regions]

12 Grand unifying theory of everything Biology (phenotype) Genomes (genotype) Strings of A,C,G,T (Total DNA on earth: A lot, but only that much)

13 Evolution: a bird’s-eye view [figure: species A and species B diverging under mutation, recombination, selection and fitness; geography (communication barriers), ecology (many species), environment (changing fitness)]

14 Course outline: probabilistic models, inference, parameter estimation, genome structure, mutations, population, inferring selection. (Probability, calculus/matrix theory, some graph theory, some statistics.)

15 Probabilistic models – Models: Markov chains (discrete and continuous), Bayesian networks, factor graphs. Inference: dynamic programming, sampling, variational methods, generalized belief propagation. Parameter estimation: EM, function optimization. Genome structure: introduction to the human genome. Mutations: point mutations, insertions/deletions, repeats. Population: basic population genetics, drift/fitness/selection. Selection: protein coding genes, transcription factor binding sites, RNA, networks.

16 Things you need to know or catch up with: Graph theory – basic definitions, trees, cycles; Matrix algebra – basic definitions, eigenvalues; Probability – basic discrete probability, standard distributions. What you’ll learn: modern methods for inference in complex probabilistic models in general; intro to genome organization and key concepts in evolution; inferring selection using comparative genomics. Books: Graur and Li, Molecular Evolution; Lynch, The Origins of Genome Architecture; Hartl and Clark, Population Genetics; Durbin et al., Biological Sequence Analysis; Karlin and Taylor, Markov Processes; N. Friedman and D. Koller, draft textbook on Bayesian networks and beyond (handouts); papers as we go along.

17 Course duties. 5 exercises, 40% of the grade – mainly theoretical math questions, usually ~120 points to collect; trade 1 exercise for ppt annotations (extensive in-line notes). 1 genomic exercise (in pairs) for 10% of the grade – compare two genomes of your choice: mammals, worms, flies, yeasts, bacteria, plants. Exam: 60% (110% in total).

18 [figure: three extant genome sequences related through two ancestral genome sequences, with model parameters along the branches] (0) Modeling the genome sequences. Probabilistic modeling: P(data | θ); using few parameters to explain/regenerate most of the data; hidden variables make the model explicit and mechanistic. (1) Inferring ancestral genomes: based on some model, compute the distribution of ancestral genomes. (2) Learning an evolutionary model: using extant genomes, learn a “reasonable” model.

19 [figure: the same tree of extant and ancestral genome sequences with model parameters] (1) Decoding the genome: genomic regions with different function evolve differently; learn to read the genome through evolutionary modelling. (2) Understanding the evolutionary process: the model parameters describe evolution. (3) Inferring phylogenies: which tree structure explains the data best? Is it a tree?

20 Probabilities. Our probability space: DNA/protein sequences {A,C,G,T}; time/populations. Queries: if a locus has an A at time t, what is the chance it will be C at time t+1? If a locus has an A in an individual from a population, what is the chance it will be C in another individual from the same population? What is the chance to find the motif ACGCGT anywhere in a random individual of the population? What is the chance it will remain the same after 2M years? Conditional probability: P(A|B) = P(A,B) / P(B). Chain rule: P(X_1,...,X_n) = P(X_1) P(X_2|X_1) ... P(X_n|X_1,...,X_{n-1}). Bayes rule: P(A|B) = P(B|A) P(A) / P(B).
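One of the queries above (the chance of finding the motif ACGCGT in a random sequence) can be made concrete with a minimal sketch, assuming a uniform, independent-base background model; the genome length and helper names below are illustrative, not from the slides.

```python
# A minimal sketch (not from the lecture): probability of seeing the motif
# ACGCGT at one position under an assumed uniform, independent-base model,
# and the expected number of occurrences in a genome of a given length.
def motif_probability(motif, base_prob=0.25):
    """Probability of the exact motif at one fixed position (i.i.d. bases)."""
    return base_prob ** len(motif)

def expected_occurrences(motif, genome_length):
    """Expected number of motif occurrences in an i.i.d. random genome."""
    positions = genome_length - len(motif) + 1
    return positions * motif_probability(motif)

if __name__ == "__main__":
    p = motif_probability("ACGCGT")                    # (1/4)**6 ~= 2.4e-4
    n = expected_occurrences("ACGCGT", 3_000_000_000)  # human-sized genome
    print(f"per-position probability: {p:.2e}, expected occurrences: {n:,.0f}")
```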

21 Random Variables & Notation Val(X) – set of possible values of RV X Upper case letters denote RVs (e.g., X, Y, Z) Upper case bold letters denote set of RVs (e.g., X, Y) Lower case letters denote RV values (e.g., x, y, z) Lower case bold letters denote RV set values (e.g., x)

22 Stochastic processes and stationary distributions [figure: a process model evolving over time t versus a stationary model]

23 [figure: examples of stochastic processes over time t – a discrete-time random walk / Markov chain (states A, B, C, D; T=1, T=2, T=3, T=4, T=5) and continuous-time processes: the Poisson process and Brownian motion]

24 The Poisson process. Events occur independently in disjoint time intervals. N(t): an r.v. that counts the number of events up to time t. Assume: the probability of one event in a short interval of length h is λh + o(h), and the probability of two or more events in time h is o(h). Now: P_0(t+h) = P_0(t)(1 - λh) + o(h), so dP_0/dt = -λP_0 and P_0(t) = e^(-λt).

25 The Poisson process. Probability of m events at time t: P_m(t) = P(N(t) = m). Conditioning on the last short interval, P_m(t+h) = P_m(t)(1 - λh) + P_{m-1}(t)λh + o(h), which gives the recurrence dP_m/dt = -λP_m(t) + λP_{m-1}(t).

26 The Poisson process. Solving the recurrence: P_m(t) = e^(-λt) (λt)^m / m!, i.e., N(t) is Poisson distributed with mean λt.
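The closed form above can be checked by simulation, as in the following sketch (not part of the lecture): waiting times between events of a Poisson process are exponential with rate λ, so we draw them, count events up to time t, and compare the empirical distribution of N(t) with e^(-λt)(λt)^m/m!. The rate and time values are arbitrary.

```python
# Simulation check of the Poisson counting distribution derived above.
import math
import random

def simulate_counts(rate, t, n_runs=50_000):
    """Count events up to time t in n_runs independent Poisson processes."""
    counts = []
    for _ in range(n_runs):
        time, m = 0.0, 0
        while True:
            time += random.expovariate(rate)   # exponential waiting time
            if time > t:
                break
            m += 1
        counts.append(m)
    return counts

def poisson_pmf(rate, t, m):
    return math.exp(-rate * t) * (rate * t) ** m / math.factorial(m)

if __name__ == "__main__":
    rate, t = 2.0, 1.5
    counts = simulate_counts(rate, t)
    for m in range(6):
        empirical = counts.count(m) / len(counts)
        print(f"m={m}: simulated {empirical:.4f}, Poisson {poisson_pmf(rate, t, m):.4f}")
```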

27 Markov chains. General stochastic process: a collection of random variables X_0, X_1, X_2, ... The Markov property: P(X_{T+1} = x | X_0, ..., X_T) = P(X_{T+1} = x | X_T). A set of states: finite or countable (e.g., integers, {A,C,G,T}). Discrete time: T = 0, 1, 2, 3, .... Transition probability: p_ab = P(X_{T+1} = b | X_T = a). Stationary transition probabilities: p_ab does not depend on T. Stationary process: the distribution of X_T itself does not change with T. One-step transitions: the probabilities p_ab for a single time step.

28 Markov chains [figure: example chains unrolled over time T=1..4 – a two-state chain (states A, B with transition probabilities p_ab, p_ba and self-transition probabilities 1-p_ab, 1-p_ba; the loaded coin), the 4 nucleotides {A,C,G,T}, and the 20 amino acids (A R N D C E Q G H I L K M F P S T W Y V)]

29 Markov chains. Transition matrix P: a discrete time Markov chain is completely defined given an initial condition and a probability matrix. The Markov chain graph G is defined on the states; we connect (a,b) whenever P_ab > 0. Distribution after T time steps given x as an initial condition: x P^T (a matrix power).
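As a sketch of the matrix-power point, the snippet below propagates an initial distribution over {A,C,G,T} through T steps by computing x P^T; the transition matrix values are made up for illustration and are not an evolutionary model from the lecture.

```python
# Propagating a distribution over {A,C,G,T} with x P^T (illustrative P).
import numpy as np

STATES = ["A", "C", "G", "T"]
P = np.array([
    [0.97, 0.01, 0.01, 0.01],
    [0.01, 0.97, 0.01, 0.01],
    [0.01, 0.01, 0.97, 0.01],
    [0.01, 0.01, 0.01, 0.97],
])  # rows sum to 1

def distribution_after(x0, P, T):
    """Row vector x0 times P^T, i.e. the state distribution after T steps."""
    return x0 @ np.linalg.matrix_power(P, T)

if __name__ == "__main__":
    x0 = np.array([1.0, 0.0, 0.0, 0.0])   # start at state A
    for T in (1, 10, 100):
        print(T, dict(zip(STATES, np.round(distribution_after(x0, P, T), 3))))
```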

30 Spectral decomposition. Right and left eigenvectors: P r = λ r, l P = λ l. When an eigen-basis exists we can find right eigenvectors r_1, ..., r_n and left eigenvectors l_1, ..., l_n with the eigenvalue spectrum λ_1, ..., λ_n, which are bi-orthogonal: l_i r_j = δ_ij. These define the spectral decomposition P = Σ_i λ_i r_i l_i, so that P^T = Σ_i λ_i^T r_i l_i.

31 Spectral decomposition. To compute transition probabilities directly: O(|E|)·T ~ O(N^2)·T per initial condition, or T matrix multiplications to preprocess P^T for time T. Using spectral decomposition: one spectral pre-processing step plus 2 matrix multiplications per condition.
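A sketch of the spectral shortcut, assuming P has a full eigen-basis: the eigendecomposition is computed once, after which P^T needs only a power of the eigenvalues and two matrix products. The matrix below is illustrative, not course data.

```python
# Spectral computation of P^T: P^T = R diag(lambda^T) L, with L = R^{-1}.
import numpy as np

P = np.array([
    [0.90, 0.04, 0.03, 0.03],
    [0.02, 0.92, 0.03, 0.03],
    [0.03, 0.03, 0.90, 0.04],
    [0.02, 0.03, 0.05, 0.90],
])  # illustrative transition matrix, rows sum to 1

eigvals, R = np.linalg.eig(P)     # columns of R are right eigenvectors
L = np.linalg.inv(R)              # rows of L are the matching left eigenvectors

def power_via_spectrum(T):
    """One diagonal power plus two matrix products instead of T multiplications."""
    return (R * eigvals**T) @ L

if __name__ == "__main__":
    T = 50
    direct = np.linalg.matrix_power(P, T)
    spectral = power_via_spectrum(T).real
    print("max abs difference:", np.abs(direct - spectral).max())
```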

32 Spec(P) = P’s eigenvalues λ_1, λ_2, ..., λ_n, ordered by magnitude; λ_1 = largest, always = 1. A Markov chain is irreducible if its underlying graph is connected. In that case there is a single eigenvalue that equals 1. What does the left eigenvector corresponding to 1 represent? Convergence: the distribution x P^T approaches the fixed point π, where π P = π. λ_2 = second largest eigenvalue, controlling the rate of the process’s convergence.
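The question about the left eigenvector can be answered numerically with the sketch below (illustrative matrix, not course code): the left eigenvector of P for eigenvalue 1, normalized to sum to 1, is the stationary distribution π = πP, and |λ_2| indicates how fast the chain converges to it.

```python
# Stationary distribution as the leading left eigenvector, plus |lambda_2|.
import numpy as np

P = np.array([
    [0.90, 0.04, 0.03, 0.03],
    [0.02, 0.92, 0.03, 0.03],
    [0.03, 0.03, 0.90, 0.04],
    [0.02, 0.03, 0.05, 0.90],
])  # illustrative, rows sum to 1

eigvals, left_vecs = np.linalg.eig(P.T)   # eigenvectors of P.T = left eigenvectors of P
order = np.argsort(-np.abs(eigvals))      # sort by |eigenvalue|, largest first
pi = np.real(left_vecs[:, order[0]])
pi = pi / pi.sum()                        # normalize into a probability distribution

print("stationary distribution pi:", np.round(pi, 4))
print("pi P - pi (should be ~0):", np.round(pi @ P - pi, 10))
print("|lambda_2| (controls convergence rate):", abs(eigvals[order[1]]))
```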

33 Continuous time. Conditions on transitions: P_ab(t) ≥ 0, Σ_b P_ab(t) = 1, and P_ab(t+s) = Σ_c P_ac(t) P_cb(s) (Chapman-Kolmogorov). Theorem: q_a = lim_{h→0} (1 - P_aa(h)) / h exists (may be infinite), and for a ≠ b, q_ab = lim_{h→0} P_ab(h) / h exists and is finite. Think of time steps that become smaller and smaller. (Markov; Kolmogorov)

34 Rates and transition probabilities. The process’s rate matrix Q: q_ab (a ≠ b) is the instantaneous rate of a→b transitions, and q_aa = -Σ_{b≠a} q_ab, so every row sums to zero. Transition differential equations (backward form): dP_ab(t)/dt = Σ_c q_ac P_cb(t), i.e., P'(t) = Q P(t) with P(0) = I.
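A small numerical sketch of the rate/probability relationship, using an assumed Jukes-Cantor-style rate matrix (equal rate for all substitutions; not a matrix from the lecture): rows of Q sum to zero, and for small h, P(h) = e^(Qh) is approximately I + Qh, recovering Q as the limit of (P(h) - I)/h.

```python
# Rate matrix vs. short-time transition probabilities.
import numpy as np
from scipy.linalg import expm

alpha = 1.0
Q = alpha * np.array([
    [-3,  1,  1,  1],
    [ 1, -3,  1,  1],
    [ 1,  1, -3,  1],
    [ 1,  1,  1, -3],
], dtype=float)                        # illustrative Jukes-Cantor-style rates

h = 1e-4
P_h = expm(Q * h)                      # transition probabilities over time h
Q_est = (P_h - np.eye(4)) / h          # finite-difference estimate of the rates

print("row sums of Q:", Q.sum(axis=1))              # ~0
print("max |Q_est - Q|:", np.abs(Q_est - Q).max())  # small, of order h
```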

35 Matrix exponential. The differential equation: P'(t) = Q P(t), P(0) = I. Series solution: P(t) = e^{Qt} = Σ_{k≥0} (Qt)^k / k!. Summing over different path lengths: the k-th term collects the contributions of k-step transition paths (1-path, 2-path, 3-path, 4-path, 5-path, ...).

36 Computing the matrix exponential

37 Series methods: just take the first k summands; reasonable when ||A|| <= 1; if the terms are converging, you are ok; can do scaling/squaring: e^A = (e^{A/2^s})^{2^s}. Eigenvalue decomposition: good when the matrix is symmetric; problems when eigenvalues are similar. Multiple other methods based on decompositions with other types of B (e.g., triangular).
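The scaling/squaring idea can be sketched as follows (a toy implementation, not the method used in practice or in the course): truncate the Taylor series of e^(A/2^s) and square the result s times, then compare with scipy.linalg.expm as a reference.

```python
# Toy scaling-and-squaring matrix exponential vs. scipy's expm.
import numpy as np
from scipy.linalg import expm

def expm_scaling_squaring(A, terms=10, s=6):
    """Truncated Taylor series of exp(A / 2^s), squared s times."""
    B = A / (2 ** s)
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms + 1):
        term = term @ B / k          # accumulates B^k / k!
        result = result + term
    for _ in range(s):               # undo the scaling by repeated squaring
        result = result @ result
    return result

if __name__ == "__main__":
    Q = np.array([[-3.0, 1, 1, 1], [1, -3, 1, 1], [1, 1, -3, 1], [1, 1, 1, -3]])
    t = 0.7
    approx = expm_scaling_squaring(Q * t)
    exact = expm(Q * t)
    print("max abs difference:", np.abs(approx - exact).max())
```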

38 Modeling: simple case (modeling, inference, learning). Maximum likelihood model: align Genome 1 (AGCAACAAGTAAGGGAAACTACCCAGAAAA....) to Genome 2 (AGCCACATGTAACGGTAATAACGCAGAAAA....) and collect the alignment statistics into a 4x4 table over {A,C,G,T}.

39 Modeling: simple case, continued. The same pipeline, with the 4x4 alignment statistics read as the substitution matrix for a single time step (t=1).
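A sketch of the maximum-likelihood counting step suggested by these two slides, using the short example sequences shown on the slide: count aligned base pairs into a 4x4 table and normalize each row to get an estimated substitution matrix for t=1. With such short snippets the counts are tiny, so this is purely illustrative.

```python
# Count-and-normalize estimate of a substitution matrix from an alignment.
import numpy as np

BASES = "ACGT"
IDX = {b: i for i, b in enumerate(BASES)}

def estimate_transition_matrix(seq1, seq2):
    """Row-normalized 4x4 counts of aligned (seq1, seq2) base pairs."""
    counts = np.zeros((4, 4))
    for a, b in zip(seq1, seq2):
        if a in IDX and b in IDX:          # skip gaps/ambiguous characters
            counts[IDX[a], IDX[b]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return counts / np.where(row_sums == 0, 1, row_sums)

if __name__ == "__main__":
    g1 = "AGCAACAAGTAAGGGAAACTACCCAGAAAA"   # the slide's example snippets
    g2 = "AGCCACATGTAACGGTAATAACGCAGAAAA"
    print(np.round(estimate_transition_matrix(g1, g2), 2))
```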

40 Modeling: but is it kosher? [figure: evolving under rate matrix Q for time t and then for time t' versus evolving under Q for a single period of length t+t']
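The question on this slide can be probed numerically with the sketch below (assumed Jukes-Cantor-style Q, not from the lecture): evolving for time t and then t' under the same rate matrix, P(t)P(t'), should match a single period of length t+t'.

```python
# Semigroup check: expm(Q t) expm(Q t') == expm(Q (t + t')).
import numpy as np
from scipy.linalg import expm

Q = np.array([[-3.0, 1, 1, 1], [1, -3, 1, 1], [1, 1, -3, 1], [1, 1, 1, -3]])
t, t_prime = 0.3, 0.8

composed = expm(Q * t) @ expm(Q * t_prime)   # two consecutive branches
single = expm(Q * (t + t_prime))             # one longer branch
print("max abs difference:", np.abs(composed - single).max())  # ~1e-16
```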

41 Symmetric processes. Definition: we call a Markov process symmetric if its rate matrix is symmetric: q_ab = q_ba for all a, b. What would a symmetric process converge to? Reversing time: whiteboard/exercise.

42 Reversibility. Definition: a reversible Markov process is one that has the same law when time is reversed: for times s < t (at stationarity), P(X_s = i, X_t = j) = P(X_s = j, X_t = i). Claim: a Markov process is reversible iff there exists a distribution π such that π_i q_ij = π_j q_ji for all i, j. If this holds, we say the process is in detailed balance. whiteboard/exercise.

43 Reversibility. Claim: a Markov process is reversible iff we can write q_ij = s_ij π_j (entrywise, Q = S Π with Π = diag(π)), where S is a symmetric matrix. whiteboard/exercise. [figure: the branch compositions (Q,t)(Q,t') and (Q,t+t') from slide 40]
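The two reversibility slides can be checked together with this numerical sketch (the stationary distribution π and the symmetric matrix S below are made-up numbers): build Q with q_ij = s_ij π_j, then verify detailed balance π_i q_ij = π_j q_ji and that π Q = 0.

```python
# Detailed balance check for a reversible rate matrix built as Q = S diag(pi).
import numpy as np

pi = np.array([0.1, 0.2, 0.3, 0.4])   # assumed stationary distribution
S = np.array([
    [0.0, 1.0, 0.5, 0.2],
    [1.0, 0.0, 0.8, 0.3],
    [0.5, 0.8, 0.0, 0.6],
    [0.2, 0.3, 0.6, 0.0],
])                                    # symmetric exchangeabilities, zero diagonal

Q = S * pi[None, :]                   # off-diagonal rates q_ij = s_ij * pi_j
np.fill_diagonal(Q, -Q.sum(axis=1))   # diagonal makes every row sum to zero

flux = pi[:, None] * Q                # matrix of pi_i * q_ij
print("detailed balance, max |pi_i q_ij - pi_j q_ji|:",
      np.abs(flux - flux.T).max())    # ~0
print("pi Q (should be ~0):", np.round(pi @ Q, 12))
```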

