Lecture 6: Mutations and variational inference

1 Lecture 6: Mutations and variational inference
Genome evolution

2 Bayesian inference vs. Maximum likelihood
Maximum likelihood estimator: no prior beliefs on the parameters.
Bayesian inference: introduce prior beliefs on the process (alternatively: think of virtual evidence) and compute posterior probabilities on the parameters.
[Figure: the likelihood over parameter space with the MLE, versus prior beliefs combined with the likelihood giving the MAP and posterior mean (PME) estimates.]
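A minimal numerical sketch (my own toy example, not from the slides) contrasting the MLE, MAP, and posterior-mean (PME) estimators for a Bernoulli parameter under an assumed Beta prior:

```python
# Hypothetical illustration: estimating a Bernoulli parameter from k successes in n trials
# with a Beta(alpha, beta) prior on the parameter.
def bernoulli_estimates(k, n, alpha=2.0, beta=2.0):
    mle = k / n                                          # no prior beliefs
    map_est = (k + alpha - 1) / (n + alpha + beta - 2)   # posterior mode (valid when the posterior is unimodal)
    pme = (k + alpha) / (n + alpha + beta)               # posterior mean estimate (PME)
    return mle, map_est, pme

print(bernoulli_estimates(k=3, n=10))  # e.g. (0.3, 0.333..., 0.357...)
```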

3 KL-divergence
Entropy (Shannon) and the Kullback-Leibler divergence.
The KL divergence is not a metric (in particular, it is not symmetric).
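A small sketch (not from the slides) of the two quantities for discrete distributions; note that the two directions of the KL divergence differ:

```python
import math

def entropy(p):
    """Shannon entropy H(p) = -sum p_i log p_i (0 log 0 treated as 0)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def kl(p, q):
    """Kullback-Leibler divergence D(p || q); not symmetric, hence not a metric."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p, q = [0.5, 0.5], [0.9, 0.1]
print(entropy(p), kl(p, q), kl(q, p))  # the two KL directions give different values
```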

4 Expectation-Maximization
The derivation uses the fact that the relative entropy is >= 0; maximizing the resulting bound gives the EM updates (Dempster et al. 1977).

5 Expectation-Maximization
Decompose over alignment positions, then group terms with the same free parameter (the weights are essentially the posteriors of the parent-child pairs – prove it!).
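One way to write the grouping described above, in my own notation (assuming substitution parameters θ_ab = P(child = b | parent = a) shared across positions j and edges (pa(i), i)):

$$
Q(\theta \mid \theta^{\text{old}}) \;=\; \sum_{j}\sum_{(pa(i),\,i)}\sum_{a,b}
\underbrace{P\!\left(h_{pa(i)j}=a,\; h_{ij}=b \mid s, \theta^{\text{old}}\right)}_{\text{weight } w^{j,i}_{ab}}\,
\log \theta_{ab}
$$

Maximizing under the constraint that each row of θ sums to 1 makes θ_ab proportional to the summed posterior parent-child weights.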

6 Terminology: Do you know how to define these by now?
Inference
Parameter learning
Likelihood
Total probability / marginal probability
Exact inference / approximate inference
Sampling is a natural way to do approximate inference: marginal probability by integration over a sample, versus marginal probability by integration over all space.
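A toy illustration (assumed model, not from the lecture) of estimating a marginal probability exactly versus by sampling:

```python
import random

# P(s) = sum_h P(h) P(s | h): integration over all space vs. over a sample of h.
p_h = {0: 0.7, 1: 0.3}            # prior over a single hidden state
p_s_given_h = {0: 0.1, 1: 0.8}    # likelihood of the observed symbol s

exact = sum(p_h[h] * p_s_given_h[h] for h in p_h)           # integration over all space

draws = random.choices(list(p_h), weights=list(p_h.values()), k=10_000)
approx = sum(p_s_given_h[h] for h in draws) / len(draws)    # integration over a sample

print(exact, approx)   # the sampled estimate converges to the exact marginal
```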

7 Sources of mutations
Mistakes: replication errors (point mutations), recombination errors (mainly indels)
Endogenous DNA damage: spontaneous base damage (deaminations, depurinations) and byproducts of metabolism (oxygen radicals that damage DNA)
Exogenous DNA damage: UV, chemicals
All of these mechanisms cross-talk with the surrounding sequence.

8 DNA polymerases replicating DNA
A good polymerase domain has a misincorporation rate of 10^-5 (1/100,000). Misincorporations are clipped off with 99% efficiency by the "proofreading" activity of the polymerase. Further mismatch repair, which works in 99.9% of cases, brings the fidelity of the main polymerases to 10^-10. Some dedicated polymerases are not as accurate!
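A quick back-of-the-envelope check of the figures quoted above:

```python
# Combining the three fidelity steps quoted on this slide:
misincorporation = 1e-5                 # polymerase domain error rate per base
escape_proofreading = 1 - 0.99          # fraction of errors that escape proofreading
escape_mismatch_repair = 1 - 0.999      # fraction that then escape mismatch repair

print(misincorporation * escape_proofreading * escape_mismatch_repair)  # ~1e-10
```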

9 Recombination errors
A consequence of partial homology between different chromosomal loci.
Can introduce translocations if the matching sequences are on different chromosomes.
Can introduce inversions or deletions if the matching sequences are on the same chromosome.
Can generate duplications or deletions if the matching sequences are in tandem.

10 Endogenous DNA damage: Deamination of Cytosines
[Chemical structures: deamination replaces the amine group of cytosine with a keto group, converting cytosine to uracil; thymine has a CH3 group at the position marked.]

11 Deamination of Cytosine creates a G-U mismatch
Easy to tell that the U is wrong. Deamination of 5-methylcytosine creates a G-T mismatch: here it is not easy to tell which base is the mutation, and about 50% of the time the G is "corrected" to A, resulting in a mutation.

12 Exogenous DNA damage
UV irradiation generates primarily thymine dimers.
Chemicals: food, benzopyrene (smoke)
UV radiation (sunlight)
Ionizing radiation: radon, cosmic rays, X-rays

13 Repairing DNA damage Direct repair

14 Thymine Dimers can be corrected by a direct repair mechanism
[Figure: a photon drives the direct reversal of the thymine dimer.]

15 BER Deaminated bases are repaired by a base excision mechanism.

16 BER Spontaneously occurring abasic sites are repaired by the same mechanism.

17 NER Dimeric bases and bulky lesions, e.g. large chemical adducts, are repaired by nucleotide excision repair.

18 Adaptive mutations: Cairns et al. 88
Experimental system: a lacZ frameshift. The experiment suggests adaptive mutations (compare with the Luria-Delbrück observation).

19 The “Mutator” paradigm:
Ability to switch to the mutator phenotype depends on particular DNA repair mechanisms (double-strand break repair in E. coli).
The mutator phenotype is suggested to be important in pathogenesis, antibiotic resistance, and in cancer.
Species occasionally change (adaptively or even by drift) their repair policy/efficiency.
The resulting substitution landscape must be very complex.

20 Dynamic Bayesian Networks
A synchronous discrete-time process: conditional probabilities link each time slice to the next.
[Figure: the template network over variables 1-4, unrolled over time slices T=1,...,T=5.]
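A minimal sketch (assumed toy model, not the lecture's) of a synchronous discrete-time DBN: every variable at slice t is drawn given the previous slice, and all variables advance together:

```python
import random

# Two binary variables; each value at slice t depends on the whole slice at t-1.
def step(prev, cpds):
    """Sample one new time slice given the previous slice and per-variable CPDs."""
    return tuple(1 if random.random() < cpds[i](prev) else 0 for i in range(len(prev)))

# CPDs give P(X_i^t = 1 | previous slice): here each variable noisily copies the other.
cpds = [
    lambda prev: 0.9 if prev[1] == 1 else 0.1,
    lambda prev: 0.9 if prev[0] == 1 else 0.1,
]

slices = [(0, 1)]                 # slice at T=1
for _ in range(4):                # unroll T=2..5
    slices.append(step(slices[-1], cpds))
print(slices)
```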

21 Context dependent Markov Processes
The context determines a Markov process rate matrix. Any dependency structure makes sense, including loops. When the context is changing, computing probabilities is difficult: think of the hidden variables as the trajectories.
[Figure: sites 1-4 and example sequences (e.g. AAAC, AAGA) whose rates depend on the neighboring bases.]
Continuous time Bayesian Networks, Koller-Nodelman 2002

22 Modeling simple context in the tree: PhyloHMM
Heuristically approximating the Markov process? Where exactly does it fail?
[Figure: the phylo-HMM graph, with the parent chain h_pa(i),j-1, h_pa(i),j, h_pa(i),j+1 and the child chains h_i,j-1, h_i,j, h_i,j+1 and h_k,j-1, h_k,j, h_k,j+1 coupled along the alignment.]
Siepel-Haussler 2003

23 Log-likelihood to Free Energy
We have so far worked on computing the likelihood. Computing the likelihood is hard, but we can reformulate the problem by adding parameters and transforming it into an optimization problem. Given a trial function q, define the free energy of the model (reconstructed below). The free energy is exactly the likelihood when q is the posterior; better, whenever q is a distribution the free energy bounds the likelihood, and the gap is D(q || p(h|s)).
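A reconstruction of the bound in standard variational notation (the slide's exact symbols may differ):

$$
F(q) \;=\; \sum_h q(h)\,\log\frac{P(h,s\mid\theta)}{q(h)}
\;=\; \log P(s\mid\theta) \;-\; D\!\left(q(h)\,\big\|\,P(h\mid s,\theta)\right)
\;\le\; \log P(s\mid\theta),
$$

with equality exactly when q(h) = P(h | s, θ).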

24 Energy?? What energy? In statistical mechanics, a system at temperature T with states x and an energy function E(x) is characterized by Boltzmann's law, where Z is the partition function. Given a model p(h,s|T) (a BN), we can define the energy using Boltzmann's law if we think of P(h|s,q) as such a Boltzmann distribution (see the sketch below).
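A sketch of the correspondence (standard form, not necessarily the slide's exact notation):

$$
P(x) \;=\; \frac{1}{Z}\,e^{-E(x)/T}, \qquad Z \;=\; \sum_x e^{-E(x)/T};
$$

taking T = 1 and defining E(h) = -log P(h, s) gives P(h | s) = e^(-E(h)) / Z with Z = Σ_h P(h, s) = P(s).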

25 Free Energy and Variational Free Energy
The Helmholtz free energy is defined in physics as -T log Z. This free energy is important in statistical mechanics, but it is difficult to compute, just as our probabilistic Z (= p(s)) is. The variational transformation introduces trial functions q(h) and sets the variational free energy (or Gibbs free energy) to the average energy minus the variational entropy; and, as before, the result relates to the likelihood through a KL term (see below).
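A reconstruction of these definitions (note the physics sign convention: this free energy is the negative of the likelihood bound two slides back):

$$
F_{\text{Helmholtz}} \;=\; -T\log Z, \qquad
U(q) \;=\; \sum_h q(h)\,E(h), \qquad
H(q) \;=\; -\sum_h q(h)\log q(h),
$$
$$
F(q) \;=\; U(q) - H(q) \;=\; -\log P(s) \;+\; D\!\left(q\,\big\|\,P(h\mid s)\right) \;\ge\; -\log P(s) \;=\; F_{\text{Helmholtz}}\big|_{T=1}.
$$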

26 Solving the variational optimization problem
Maximizing U? Focus on the maximal configurations. Maximizing H? Spread out the distribution. So instead of computing p(s), we can search for a q that optimizes the free energy. This is still as hard as before, but we can simplify the problem by restricting q (this is where the additional degrees of freedom become important).

27 Simplest variational approximation: Mean Field
Maximizing U? Focus on the maximal configurations. Maximizing H? Spread out the distribution. Let's assume complete independence among the posteriors of the random variables (the factorization below). Under this assumption we can try optimizing the qi (looking for minimal energy!).
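The factorized trial distribution described above, in standard mean-field notation (my symbols, not necessarily the slide's):

$$
q(h) \;=\; \prod_i q_i(h_i).
$$

Plugging this q into the free energy and optimizing one factor at a time gives the iterative scheme on the next slide.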

28 Mean Field Inference We optimize iteratively:
Select i (sequentially, or using any method).
Optimize qi to minimize FMF(q1,..,qi,...,qn) while fixing all other q's.
Terminate when FMF cannot be improved further.
Remember: FMF always bounds the likelihood, and the qi optimization can usually be done efficiently (a toy sketch follows).
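A minimal coordinate-ascent sketch on an assumed toy model with two binary hidden variables (my own example; joint[a][b] stands for P(h1=a, h2=b, s) for the fixed observed s):

```python
import math

# joint[a][b] = P(h1=a, h2=b, s) for fixed observed data s (assumed toy numbers),
# so log(sum over the table) is the exact log-likelihood we are bounding.
joint = [[0.30, 0.05],
         [0.05, 0.60]]

q1, q2 = [0.5, 0.5], [0.5, 0.5]          # fully factorized trial distribution

def normalize(logits):
    m = max(logits)
    w = [math.exp(x - m) for x in logits]
    return [x / sum(w) for x in w]

for _ in range(50):
    # optimize q1 with q2 fixed: log q1(a) = sum_b q2(b) log joint[a][b] + const
    q1 = normalize([sum(q2[b] * math.log(joint[a][b]) for b in range(2)) for a in range(2)])
    # optimize q2 with q1 fixed, symmetrically
    q2 = normalize([sum(q1[a] * math.log(joint[a][b]) for a in range(2)) for b in range(2)])

# The mean-field bound on the log-likelihood: E_q[log joint] + H(q1) + H(q2)
bound = (sum(q1[a] * q2[b] * math.log(joint[a][b]) for a in range(2) for b in range(2))
         - sum(p * math.log(p) for p in q1 if p > 0)
         - sum(p * math.log(p) for p in q2 if p > 0))
print(q1, q2, bound, math.log(sum(map(sum, joint))))   # bound <= exact log-likelihood
```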

29 Mean field for a simple-tree model
Just for illustration, since we know how to solve this one exactly: we select a node and optimize its qi while making sure it is a distribution. The energy decomposes, and only a few terms are affected. To ease notation, assume the left (l) and right (r) children are hidden.

30 Mean field for a simple-tree model
Just for illustration, since we know how to solve this one exactly: we select a node and optimize its qi while making sure it is a distribution (one way to write the resulting update is sketched below).
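One way to write the node update sketched above, in my own notation (assuming node i has parent pa(i) and hidden children l and r, with branch substitution matrices P):

$$
\log q_i(a) \;=\; \sum_{b} q_{pa(i)}(b)\,\log P\!\left(h_i=a \mid h_{pa(i)}=b\right)
\;+\; \sum_{c\in\{l,r\}}\sum_{b} q_c(b)\,\log P\!\left(h_c=b \mid h_i=a\right)
\;+\; \text{const},
$$

followed by normalization so that q_i is a distribution.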

31 Mean field for a phylo-hmm model
Now we don't know how to solve this exactly, but MF is still simple.
[Figure: the phylo-HMM neighborhood of node i at position j: its parent chain h_pa(i),j-1..j+1, its own chain h_i,j-1..j+1, and the children chains h_l and h_r.]

32 Mean field for a phylo-hmm model
Now we don't know how to solve this exactly, but MF is still simple. As before, the optimal solution is derived by setting log qi equal to the sum of the affected terms (up to normalization).
[Figure: the same phylo-HMM neighborhood as on the previous slide.]

33 Simple Mean Field is usually not a good idea
Why? Because the MF trial function is very crude. For example, we said before that the joint posteriors cannot be approximated by an independent product of the hidden variables' posteriors.
[Figure: a small tree with internal nodes that could each be A or C and observed leaves A, C, A, C; the true posterior couples the internal states.]

34 Exploiting additional structure
We can greatly improve accuracy by generalizing the mean field algorithm using larger building blocks. The approximation specifies an independent distribution for each locus, but maintains the tree dependencies within it (see the factorization below). We now optimize each tree q separately, given the current potentials of the other trees. The key point is that optimizing any given tree is efficient: we just use a modified up-down algorithm.
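A sketch of the structured (tree-per-column) factorization described above (my notation):

$$
q(h) \;=\; \prod_{j} q_j\!\left(h_{1j},\ldots,h_{Nj}\right),
$$

where each factor q_j keeps the full tree dependency among the species at alignment column j, and only the dependencies between columns are broken.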

35 Tree based variational inference
Each tree is only affected by the tree before and the tree after:

36 Tree based variational inference
We got the same functional form as we had for the simple tree, so we can use the up-down algorithm to optimize qj.

37 Chain cluster variational inference
We can use any partition of a BN into trees and derive a similar MF algorithm. For example, instead of trees we can use the Markov chain within each species. What will work better for us? It depends on the strength of the dependencies along each dimension – we should try to capture as much "dependency" as possible.

