Published by Bertram Fields. Modified over 8 years ago.
1
Genome Evolution. Amos Tanay 2009 Genome evolution Lecture 6: Mutations and variational inference
2
Bayesian inference vs. maximum likelihood: Maximum likelihood estimator (MLE): no prior beliefs. Bayesian inference: introducing prior beliefs on the process (alternatively: think of virtual evidence) and computing posterior probabilities on the parameters. [Figure: two views of the parameter space, one marking the MLE with no prior beliefs, the other marking the beliefs, the PME and the MAP]
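The contrast between the estimators on this slide can be sketched on a toy Bernoulli problem. This is a minimal illustration, not the lecture's own example: the counts, the Beta prior and the function name `estimates` are all assumptions, and "PME" is read here as the posterior mean estimate. The prior hyperparameters act as virtual evidence (pseudo-counts) added to the observed counts.

```python
def estimates(successes, failures, alpha=1.0, beta=1.0):
    """Return (MLE, MAP, posterior mean) for a Bernoulli parameter.

    alpha and beta are hyperparameters of an assumed Beta prior; they behave
    like virtual counts added to the observed data.
    """
    n = successes + failures
    mle = successes / n  # maximum likelihood: no prior beliefs
    # With a Beta prior the posterior is Beta(alpha + successes, beta + failures).
    a, b = alpha + successes, beta + failures
    map_est = (a - 1) / (a + b - 2)  # posterior mode, valid when a > 1 and b > 1
    pme = a / (a + b)                # posterior mean
    return mle, map_est, pme


# Hypothetical data: 3 substitutions observed at 3 sites, Beta(2, 2) prior.
mle, map_est, pme = estimates(3, 0, alpha=2.0, beta=2.0)
print(mle, map_est, pme)
```

With so little data the MLE sits at the boundary of the parameter space, while the prior pulls both the MAP and the posterior mean toward the interior.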
3
KL-divergence: Entropy (Shannon). Kullback-Leibler divergence (KL). Not a metric!
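The slide's formulas were lost in extraction, so the following is a minimal sketch using the standard textbook definitions of Shannon entropy and Kullback-Leibler divergence for discrete distributions. The two example distributions are hypothetical. It shows why KL is "not a metric": it is not symmetric.

```python
import math


def entropy(p):
    """Shannon entropy H(p) = -sum_x p(x) log2 p(x), in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def kl(p, q):
    """Kullback-Leibler divergence D(p || q) = sum_x p(x) log2(p(x) / q(x))."""
    return sum(x * math.log2(x / y) for x, y in zip(p, q) if x > 0)


# Hypothetical distributions over two states.
p = [0.5, 0.5]
q = [0.9, 0.1]

print(entropy(p))          # entropy of a fair coin
print(kl(p, q), kl(q, p))  # the two directions differ
```

The divergence is zero only when the two distributions coincide, and swapping the arguments changes its value.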
4
Expectation-Maximization (Dempster): Relative entropy >= 0. EM maximization.
5
Expectation-Maximization: Decompose over alignment positions. Group terms with the same free parameter (the weights are essentially the posterior of the parent-child pair; prove it!).
6
Sampling is a natural way to do approximate inference: Marginal probability (integration over all space) versus marginal probability (integration over a sample). Terminology: do you know how to define these by now? Inference; parameter learning; likelihood; total probability / marginal probability; exact inference / approximate inference.
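The two marginal probabilities named on this slide can be compared on a toy model. This is a sketch under stated assumptions: the prior over a hidden nucleotide, the emission probabilities, the sample size and all names are hypothetical. The exact marginal sums over the whole hidden space; the approximate one averages over a sample drawn from the prior.

```python
import random

# Hypothetical model: hidden nucleotide h, observed nucleotide s.
prior = {"A": 0.4, "C": 0.3, "G": 0.2, "T": 0.1}


def emission(s, h):
    """Assumed p(s | h): the observation usually matches the hidden state."""
    return 0.91 if s == h else 0.03


def exact_marginal(s):
    """p(s) = sum_h p(h) p(s | h): integration over all of the hidden space."""
    return sum(p_h * emission(s, h) for h, p_h in prior.items())


def sampled_marginal(s, n_samples, rng):
    """Estimate p(s) by averaging p(s | h) over samples h drawn from p(h)."""
    states = list(prior)
    weights = [prior[h] for h in states]
    samples = rng.choices(states, weights=weights, k=n_samples)
    return sum(emission(s, h) for h in samples) / n_samples


rng = random.Random(0)  # fixed seed so the run is reproducible
exact = exact_marginal("A")
approx = sampled_marginal("A", 20000, rng)
print(exact, approx)
```

Enumeration is only feasible here because the hidden space has four states; the sampling estimate has the same form whatever the size of the space.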
7
Sources of mutations: Mistakes: replication errors (point mutations); recombination errors (mainly indels). Endogenous DNA damage: spontaneous base damage (deaminations, depurinations); byproducts of metabolism (oxygen radicals that damage DNA). Exogenous DNA damage: UV; chemicals. All of these mechanisms cross-talk with the surrounding sequence.
8
DNA polymerases replicating DNA: A good polymerase domain has a misincorporation rate of 10^-5 (1/100,000). Any misincorporations are clipped off with 99% efficiency by the “proofreading” activity of the polymerase. Further mismatch repair, which works in 99.9% of cases, brings the fidelity of the main polymerases to 10^-10. Some dedicated polymerases are not as accurate!
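The fidelity figure on this slide follows from multiplying the three numbers it quotes, assuming the three steps act independently (an assumption of this sketch; the variable names are illustrative).

```python
import math

misincorporation_rate = 1e-5        # errors per base from the polymerase domain
proofreading_efficiency = 0.99      # fraction of errors removed by proofreading
mismatch_repair_efficiency = 0.999  # fraction of remaining errors repaired

# Each step lets through only the fraction of errors it fails to correct.
after_proofreading = misincorporation_rate * (1 - proofreading_efficiency)
final_error_rate = after_proofreading * (1 - mismatch_repair_efficiency)

print(after_proofreading)  # about 1e-7
print(final_error_rate)    # about 1e-10
```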
9
Recombination errors: A consequence of partial homology between different chromosomal loci. Can introduce translocations if the matching sequences are on different chromosomes. Can introduce inversions or deletions if the matching sequences are on the same chromosome. Can generate duplications or deletions if the matching sequences are in tandem.
10
Endogenous DNA damage: Deamination of cytosines. [Figure: chemical structures showing deamination of cytosine to uracil; at the position marked *, thymine has CH3]
11
Deamination of cytosine creates a G-U mismatch: easy to tell that U is wrong. Deamination of methylated cytosine (5-methylcytosine) creates a G-T mismatch: not easy to tell which base is the mutation. About 50% of the time the G is “corrected” to A, resulting in a mutation.
12
Exogenous DNA damage: UV irradiation generates primarily thymine dimers. Sources: chemicals (food; benzopyrene in smoke); UV radiation (sunlight); ionizing radiation (radon, cosmic rays, X-rays).
13
Repairing DNA damage: Direct repair.
14
Thymine dimers can be corrected by a direct repair mechanism. [Figure label: photon]
15
Deaminated bases are repaired by a base excision repair (BER) mechanism.
16
Spontaneously occurring abasic sites are repaired by the same mechanism (BER).
17
Dimeric bases and bulky lesions, e.g., large chemical adducts, are repaired by nucleotide excision repair (NER).
18
Adaptive mutations (Cairns et al. 88): Experimental system: lacZ frameshift. Luria-Delbrück’s observation. The experiment suggests adaptive mutations.
19
The “Mutator” paradigm: The ability to switch to the mutator phenotype depends on particular DNA repair mechanisms (double-strand break repair in E. coli). The mutator phenotype is suggested to be important in pathogenesis, antibiotic resistance, and cancer. Species occasionally change (adaptively or even by drift) their repair policy/efficiency. The resulting substitution landscape must be very complex.
20
Dynamic Bayesian Networks: Synchronous discrete time process. Conditional probabilities. [Figure: variables 1, 2, 3, 4 replicated at successive time steps T=1, T=2, T=3, T=4, T=5]
21
Context dependent Markov processes: Context determines a Markov process rate matrix. Any dependency structure makes sense, including loops. When context is changing, computing probabilities is difficult. Think of the hidden variables as the trajectories. Continuous time Bayesian networks (Koller-Nodelman 2002). [Figure labels: AAACAAGAA; AAA, C; nodes 1, 2, 3, 4]
22
Modeling simple context in the tree: PhyloHMM (Siepel-Haussler 2003). Heuristically approximating the Markov process? Where exactly does it fail? [Figure: graphical model over hidden variables h_i^j (tree node i at alignment position j), their parents h_pai^j, and the corresponding variables at positions j-1 and j+1]
23
Log-likelihood to Free Energy: We have so far worked on computing the likelihood. Computing the likelihood is hard. We can reformulate the problem by adding parameters and transforming it into an optimization problem. Given a trial function q, we define the free energy of the model. Better: when q is a distribution, the free energy bounds the likelihood. The free energy is exactly the likelihood when q is the posterior. [Figure: the gap between the free energy and the likelihood is D(q || p(h|s))]
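The equations on this slide did not survive extraction. The following is a reconstruction under a stated assumption: the free energy is written in its lower-bound form, with sums over hidden assignments h for a fixed observation s. The slide's own sign convention may differ; the later slides speak of minimizing the free energy, which corresponds to the negative of F(q) below.

```latex
\begin{align*}
\log p(s) &= \log \sum_h p(h, s) \\
F(q) &= \sum_h q(h) \log \frac{p(h, s)}{q(h)} \\
\log p(s) &= F(q) + D\big(q \,\|\, p(h \mid s)\big), \qquad
D\big(q \,\|\, p(h \mid s)\big) = \sum_h q(h) \log \frac{q(h)}{p(h \mid s)} \ge 0
\end{align*}
```

The last line follows by substituting p(h, s) = p(h | s) p(s) into F(q). Since the divergence is non-negative, F(q) never exceeds log p(s), and the two are equal exactly when q(h) = p(h | s).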
24
Energy?? What energy? In statistical mechanics, a system at temperature T with states x and an energy function E(x) is characterized by Boltzmann’s law, in which Z is the partition function. If we think of P(h|s, ): given a model p(h,s|T) (a BN), we can define the energy using Boltzmann’s law.
25
Free Energy and Variational Free Energy: The Helmholtz free energy is defined in physics in terms of the partition function Z. This free energy is important in statistical mechanics, but it is difficult to compute, as is our probabilistic Z (= p(s)). The variational transformation introduces trial functions q(h) and defines the variational free energy (or Gibbs free energy) from two quantities: the average energy and the variational entropy. As before, the variational free energy differs from the Helmholtz free energy by the divergence between q and the posterior.
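The relations named on this slide can be checked numerically on a tiny model. This is a sketch using the standard definitions (energy E(h) = -log p(h, s), temperature 1, Helmholtz free energy -log Z, variational free energy U(q) - H(q)); the slide's own formulas were lost, and the joint probabilities and names below are hypothetical.

```python
import math

# Hypothetical joint p(h, s) over three hidden states h for one fixed s.
joint = [0.10, 0.05, 0.01]

energy = [-math.log(p) for p in joint]  # E(h) = -log p(h, s), temperature 1
Z = sum(math.exp(-e) for e in energy)   # partition function, equal to p(s)
posterior = [p / Z for p in joint]      # Boltzmann distribution = p(h | s)
helmholtz = -math.log(Z)                # Helmholtz free energy


def variational_free_energy(q):
    """F(q) = U(q) - H(q): average energy minus variational entropy."""
    U = sum(qi * ei for qi, ei in zip(q, energy))
    H = -sum(qi * math.log(qi) for qi in q if qi > 0)
    return U - H


def kl(q, p):
    """D(q || p) in nats."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


q = [1 / 3, 1 / 3, 1 / 3]  # an arbitrary trial distribution
print(variational_free_energy(q), helmholtz + kl(q, posterior))
print(variational_free_energy(posterior), helmholtz)
```

For any trial q the variational free energy exceeds the Helmholtz free energy by exactly D(q || p(h|s)), so it is minimized, and equals -log p(s), when q is the posterior.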
26
Solving the variational optimization problem: So instead of computing p(s), we can search for the q that optimizes the free energy. This is still as hard as before, but we can simplify the problem by restricting q (this is where the additional degrees of freedom become important). Maximizing U? Focus on max configurations. Maximizing H? Spread out the distribution.
27
Simplest variational approximation: Mean Field. Let’s assume complete independence among the random variables’ posteriors, so that q factorizes into a product of per-variable distributions q_i. Under this assumption we can try optimizing the q_i (looking for minimal energy!). Maximizing U? Focus on max configurations. Maximizing H? Spread out the distribution.
28
Mean Field Inference: We optimize iteratively. Select i (sequentially, or using any method). Optimize q_i to minimize F_MF(q_1, ..., q_i, ..., q_n) while fixing all other q’s. Terminate when F_MF cannot be improved further. Remember: F_MF always bounds the likelihood. The q_i optimization can usually be done efficiently.
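The iterative scheme above can be sketched on a toy model. Everything specific here is an assumption made for illustration: three binary hidden variables on a chain, arbitrary log-potentials, and the convention F_MF = U - H, under which F_MF is minimized and stays at or above -log Z. The model is small enough to compute log Z by enumeration, so the bound can be checked directly.

```python
import itertools
import math

# Hypothetical toy model: 3 binary hidden variables on a chain 0 - 1 - 2.
unary = [[0.0, 0.5], [0.2, 0.0], [0.0, -0.3]]  # log-potentials theta_i(x)
edges = {(0, 1): [[0.8, -0.8], [-0.8, 0.8]],
         (1, 2): [[0.5, -0.5], [-0.5, 0.5]]}    # log-potentials theta_ij(x, y)
n = len(unary)


def log_joint(h):
    """Unnormalized log-probability of a full assignment h."""
    total = sum(unary[i][h[i]] for i in range(n))
    return total + sum(t[h[i]][h[j]] for (i, j), t in edges.items())


def free_energy(q):
    """F_MF = U - H for a fully factorized q, with energy = -log_joint."""
    expected = sum(q[i][x] * unary[i][x] for i in range(n) for x in (0, 1))
    expected += sum(q[i][x] * q[j][y] * t[x][y]
                    for (i, j), t in edges.items()
                    for x in (0, 1) for y in (0, 1))
    entropy = -sum(p * math.log(p) for qi in q for p in qi if p > 0)
    return -expected - entropy


def update(q, i):
    """Minimize F_MF over q_i alone: q_i(x) is proportional to the exponent
    of the expected log-potentials touching variable i."""
    scores = []
    for x in (0, 1):
        s = unary[i][x]
        for (a, b), t in edges.items():
            if a == i:
                s += sum(q[b][y] * t[x][y] for y in (0, 1))
            elif b == i:
                s += sum(q[a][y] * t[y][x] for y in (0, 1))
        scores.append(s)
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    q[i] = [v / sum(w) for v in w]


q = [[0.5, 0.5] for _ in range(n)]
history = [free_energy(q)]
for sweep in range(50):
    for i in range(n):  # select i sequentially
        update(q, i)
    history.append(free_energy(q))
    if history[-2] - history[-1] < 1e-12:  # F_MF cannot be improved further
        break

log_Z = math.log(sum(math.exp(log_joint(h))
                     for h in itertools.product((0, 1), repeat=n)))
print(history[-1], -log_Z)
```

Each single-variable update is an exact minimization with the other factors held fixed, so F_MF never increases from one sweep to the next, and the gap to -log Z that remains at convergence reflects how crude the factorized trial function is.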
29
Mean field for a simple-tree model: Just for illustration, since we know how to solve this one exactly. We select a node and optimize its q_i while making sure it is a distribution. To ease notation, assume the left (l) and right (r) children are hidden. The energy decomposes, and only a few terms are affected.
30
Mean field for a simple-tree model: Just for illustration, since we know how to solve this one exactly. We select a node and optimize its q_i while making sure it is a distribution.
31
Mean field for a phylo-hmm model: Now we don’t know how to solve this exactly, but MF is still simple. [Figure: node h_i^j with its parent h_pai^j, its children h_l^j and h_r^j, and the corresponding nodes at alignment positions j-1 and j+1]
32
Mean field for a phylo-hmm model: Now we don’t know how to solve this exactly, but MF is still simple. As before, the optimal solution is derived by setting log q_i equal to the sum of the affected terms. [Figure: node h_i^j with its parent h_pai^j, its children h_l^j and h_r^j, and the corresponding nodes at alignment positions j-1 and j+1]
33
Simple Mean Field is usually not a good idea. Why? Because the MF trial function is very crude. For example, we said before that the joint posteriors cannot be approximated by an independent product of the hidden variables’ posteriors. [Figure labels: ACA, C, A/C]
34
Exploiting additional structure: We can greatly improve accuracy by generalizing the mean field algorithm using larger building blocks. The approximation specifies independent distributions for each locus, but maintains the tree dependencies. We now optimize each tree q separately, given the current potentials of the other trees. The key point is that optimizing for any given tree is efficient: we just use a modified up-down algorithm.
35
Tree based variational inference: Each tree is only affected by the tree before and the tree after.
36
Tree based variational inference: We got the same functional form as we had for the simple tree, so we can use the up-down algorithm to optimize q_j.
37
Chain cluster variational inference: We can use any partition of a BN into trees and derive a similar MF algorithm. For example, instead of trees we can use the Markov chains in each species. What will work better for us? It depends on the strength of the dependencies along each dimension: we should try to capture as much “dependency” as possible.