
Genome evolution: a sequence-centric approach Lecture 4: Beyond Trees. Inference by sampling Pre-lecture draft – update your copy after the lecture!


1 Genome evolution: a sequence-centric approach Lecture 4: Beyond Trees. Inference by sampling Pre-lecture draft – update your copy after the lecture!

2 Course outline
(Prerequisites: probability, calculus/matrix theory, some graph theory, some statistics)
Themes: probabilistic models, inference, parameter estimation, genome structure, mutations, population, inferring selection
Techniques so far: CT Markov chains, simple tree models, HMMs and variants, dynamic programming, EM

3 What we can do so far (not much..)
Given a set of genomes (sequences) and a phylogeny:
- align them to generate a set of loci (not covered)
- estimate a ML simple tree model (EM)
- infer posteriors over ancestral sequences
Inferring a phylogeny is generally hard, but quite trivial given entire genomes that are evolutionarily close to each other. Multiple alignment is quite difficult when ambiguous; again, easy when the genomes are similar.
EM is improving and converging, but tends to get stuck at local maxima, so the initial condition is critical (for the simple tree this is not a real problem). Inference is easy and accurate for trees.

4 Loci independence does not make sense
Flanking effects (e.g. selection on codons, CpG deamination): the hidden state h_i^j of locus j in species i depends on its neighbors h_i^(j-1), h_i^(j+1), and together with them on the parental states h_pa(i)^(j-1), h_pa(i)^j, h_pa(i)^(j+1) (and beyond: h_i^(j+2), h_pa(i)^(j+2), ...).
Regional effects (e.g. G+C content, transcription factor binding sites): dependencies that span whole regions rather than adjacent loci.

5 Bayesian Networks
Defining the joint probability for a set of random variables given:
1) a directed acyclic graph
2) conditional probabilities (CPDs)
Definition: the descendants of a node X are those accessible from it via a directed path.
Claim/Definition: in a Bayesian net, a node is independent of its non-descendants given its parents (the Markov property for BNs).
Claim: the joint probability factors into the product of the CPDs. Proof: we use a topological order on the graph (what is this?). [whiteboard/exercise]
Claim: the Up-Down algorithm is correct for trees. Proof: given a node, the distributions of the evidence on the two subtrees are independent...
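The factorization claim can be made concrete with a tiny sketch. The two-node network A -> B and its CPD tables below are invented for illustration; the point is only that the joint is the product of the conditionals taken in a topological order:

```python
# Hypothetical two-node network A -> B with binary variables.
# CPD tables map a tuple of parent values to a distribution over the node.
cpds = {
    "A": {(): {0: 0.6, 1: 0.4}},                 # P(A)
    "B": {(0,): {0: 0.9, 1: 0.1},                # P(B | A=0)
          (1,): {0: 0.2, 1: 0.8}},               # P(B | A=1)
}
parents = {"A": (), "B": ("A",)}
order = ["A", "B"]  # a topological order of the DAG

def joint(assignment):
    """P(x) = product over nodes of P(x_i | parents(x_i))."""
    p = 1.0
    for node in order:
        pa_vals = tuple(assignment[q] for q in parents[node])
        p *= cpds[node][pa_vals][assignment[node]]
    return p

# e.g. joint({"A": 1, "B": 0}) = 0.4 * 0.2
```

Because the order is topological, every parent value is available when its child's CPD is looked up; the same loop works for any DAG.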

6 Stochastic Processes and Stationary Distributions
[Figure: a process model evolving over time t alongside the corresponding stationary model]
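As a minimal illustration of the stationary-distribution idea (the 2-state transition matrix below is invented), the stationary vector pi satisfying pi = pi T can be found by power iteration:

```python
# Invented 2-state transition matrix T; each row sums to 1.
T = [[0.9, 0.1],
     [0.3, 0.7]]

# Power iteration: repeatedly apply pi <- pi * T until it stops changing.
pi = [0.5, 0.5]
for _ in range(200):
    pi = [sum(pi[i] * T[i][j] for i in range(2)) for j in range(2)]

# For this T the exact stationary distribution is (0.75, 0.25),
# since 0.75 * 0.1 = 0.25 * 0.3 (detailed balance holds here).
```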

7 Dynamic Bayesian Networks
A synchronous discrete-time process: the same variables (here numbered 1-4) are replicated at every time slice T=1,...,T=5, with conditional probabilities connecting consecutive slices.
[Figure: five replicated four-node slices]

8 Context-dependent Markov Processes
The context determines the rate matrix of a locus's Markov process (e.g. the C in AAACAAGAA evolves in an AAA...AAG context). Any dependency structure makes sense, including loops.
When the context is changing, computing probabilities is difficult: think of the hidden variables as whole trajectories.
Continuous Time Bayesian Networks (Koller and Nodelman, 2002)

9 Modeling simple context in the tree: PhyloHMM (Siepel and Haussler, 2003)
Hidden states are chained along the sequence as well as down the tree: h_i^j depends on h_i^(j-1) and on h_pa(i)^j (and similarly at positions j-1 and j+1, and for other nodes k: h_k^(j-1), h_k^j, h_k^(j+1)).
Heuristically approximating a CTBN? Where exactly does it fail? [whiteboard/exercise]

10 So why does inference become hard (for real, not just in the worst case, and even in a crude heuristic like PhyloHMM)?
We know how to work out the chains or the trees in isolation. Taken together, the dependencies cannot be controlled: even given its parents, a path can be found from each node to everywhere.

11 General approaches to approximate inference
Exact algorithms (see Pearl 1988 and beyond): out of the question in our case.
Sampling: replace the marginal probability P(h|s) (an integration over all space) by an integration over a sample.
Variational methods: optimize approximating distributions q_i.
Generalized message passing: pass messages between approximating factors Q_1, Q_2, Q_3, Q_4.

12 Sampling from a BN
Naively: if we could sample from Pr(h,s), then Pr(s) ~ (#samples with s)/(#samples).
Forward sampling: use a topological order on the network; repeatedly select a node whose parents are already determined and sample from its conditional distribution.
How to sample from the CPD? [whiteboard/exercise]
[Figure: a nine-node example network, nodes 1-9]
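Forward sampling can be sketched as follows. The two-node network and its CPDs are invented for illustration; a larger network only adds more table entries, since the topological order guarantees parents are sampled before their children:

```python
import random

# CPD rows give [P(node=0 | parents), P(node=1 | parents)]; invented values.
cpds = {
    "A": {(): [0.6, 0.4]},
    "B": {(0,): [0.9, 0.1], (1,): [0.2, 0.8]},
}
parents = {"A": (), "B": ("A",)}
order = ["A", "B"]  # topological order

def forward_sample(rng):
    sample = {}
    for node in order:              # parents are always determined first
        pa = tuple(sample[q] for q in parents[node])
        # sample from the CPD row by inverting its CDF
        sample[node] = 0 if rng.random() < cpds[node][pa][0] else 1
    return sample

rng = random.Random(0)
samples = [forward_sample(rng) for _ in range(20000)]
# Pr(s) ~ (#samples with s) / (#samples); take s to be the event B=1.
est = sum(s["B"] == 1 for s in samples) / len(samples)
# True value here: P(B=1) = 0.6*0.1 + 0.4*0.8 = 0.38.
```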

13 Focus on the observations
Naive sampling is terribly inefficient. Why? [whiteboard/exercise] (A word on sampling error.)
Why don't we constrain the sampling to fit the evidence s? This can be done, but then we no longer sample from P(h,s), nor from P(h|s) (why?).
Two tasks: P(s) and P(f(h)|s). How should we approach each, or both?

14 Likelihood weighting
weight = 1; use a topological order on the network, and repeatedly select a node whose parents are already determined:
- if it carries no evidence: sample from its conditional distribution
- else: weight *= P(x_i | pa_i), and add the evidence to the sample
Report (weight, sample). Then Pr(h|s) ~ (total weight of samples with h)/(total weight).
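A runnable sketch of this loop, again on an invented two-node network A -> B with evidence on the child. Evidence nodes are fixed rather than sampled, and contribute their CPD value to the weight:

```python
import random

# Invented CPDs: rows are [P(node=0 | parents), P(node=1 | parents)].
cpds = {
    "A": {(): [0.6, 0.4]},
    "B": {(0,): [0.9, 0.1], (1,): [0.2, 0.8]},
}
parents = {"A": (), "B": ("A",)}
order = ["A", "B"]
evidence = {"B": 1}                       # observed value of B

def weighted_sample(rng):
    sample, weight = {}, 1.0
    for node in order:
        pa = tuple(sample[q] for q in parents[node])
        probs = cpds[node][pa]
        if node in evidence:
            sample[node] = evidence[node]
            weight *= probs[evidence[node]]   # weight *= P(x_i | pa_i)
        else:
            sample[node] = 0 if rng.random() < probs[0] else 1
    return sample, weight

rng = random.Random(1)
pairs = [weighted_sample(rng) for _ in range(20000)]
total = sum(w for _, w in pairs)
# Pr(A=1 | B=1) ~ (weights of samples with A=1) / (total weights)
est = sum(w for s, w in pairs if s["A"] == 1) / total
# Exact posterior: 0.4*0.8 / (0.6*0.1 + 0.4*0.8) = 0.32/0.38.
```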

15 Generalizing likelihood weighting: Importance sampling
f is any function (think 1(h_i)). We will use a proposal distribution Q, and we must have Q(x) > 0 whenever P(x) > 0. Q should combine or approximate P and f, even if we cannot sample from P (imagine that you would like to sample from P(h|s) in order to recover Pr(h_i|s)).

16 Correctness of likelihood weighting: Importance sampling
f is any function (think 1(h_i)). Sample x^1, ..., x^M from Q.
Unnormalized importance sampling: E_P[f] = sum_x f(x)P(x) = sum_x f(x) (P(x)/Q(x)) Q(x) = E_Q[f w], so we sample with a weight w(x) = P(x)/Q(x) and estimate E_P[f] ~ (1/M) sum_m f(x^m) w(x^m). [whiteboard/exercise]
To minimize the variance, use a Q distribution that is proportional to the target function. (Think of the variance for f = 1: we are left with the variance of w.) [whiteboard/exercise]
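A minimal numeric sketch of the unnormalized estimator, using invented target and proposal distributions over three states (here f is an indicator, so E_P[f] is just a target probability):

```python
import random

# Invented target P and proposal Q over states {0, 1, 2}.
P = [0.2, 0.5, 0.3]            # target distribution
Q = [1/3, 1/3, 1/3]            # proposal; Q(x) > 0 wherever P(x) > 0
f = lambda x: float(x == 1)    # f = 1(x=1), so E_P[f] = P(1) = 0.5

rng = random.Random(2)
n, acc = 50000, 0.0
for _ in range(n):
    x = rng.randrange(3)       # sample x ~ Q (uniform)
    acc += f(x) * P[x] / Q[x]  # accumulate f(x) * w(x), w = P/Q
est = acc / n                  # (1/M) sum_m f(x^m) w(x^m)
```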

17 Normalized Importance sampling
When sampling from P(h|s) we don't know P, so we cannot compute w = P/Q. We do know P(h,s) = P(h|s)P(s) = P(h|s)·alpha = P'(h). So we will use sampling to estimate both terms:
E_P[f] ~ (sum_m f(x^m) w'(x^m)) / (sum_m w'(x^m)), with w' = P'/Q; the unknown constant alpha cancels in the ratio. [whiteboard/exercise]
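The same numeric sketch, normalized: we only evaluate an invented unnormalized target P' (its normalizer is never used), and estimate E_P[f] as a ratio of weighted sums:

```python
import random

# Invented unnormalized target P' (true normalizer alpha = 10) and proposal Q.
Pprime = [2.0, 5.0, 3.0]       # proportional to P = [0.2, 0.5, 0.3]
Q = [1/3, 1/3, 1/3]
f = lambda x: float(x == 1)    # E_P[f] = 5/10 = 0.5

rng = random.Random(3)
num = den = 0.0
for _ in range(50000):
    x = rng.randrange(3)       # sample x ~ Q
    w = Pprime[x] / Q[x]       # w' = P'/Q; alpha is absorbed into w'
    num += f(x) * w
    den += w
est = num / den                # alpha cancels in the ratio
```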

18 Normalized Importance sampling
How many samples? The normalized estimator is biased (unlike the unnormalized one), and its variance (not proved here) grows with the variance of the weights.
Compare to an ideal sampler from P(h): the weight variance ratio represents how effective your sample was so far, so if you estimate Var(w) you know how close your sample is to this ideal. Sampling from P(h|s) itself could generate posteriors quite rapidly. [whiteboard/exercise]

19 Back to likelihood weighting
Our proposal distribution Q is defined by fixing the evidence and ignoring the CPDs of the variables that carry evidence. It is like forward sampling from a network in which all edges going into evidence nodes were eliminated. The weights are the product of the CPDs of the evidence nodes.
The importance sampling machinery now translates to likelihood weighting:
- an unnormalized version to estimate P(s): Q forces s
- an unnormalized version to estimate P(h_i|s): Q forces s and h_i
- a normalized version to estimate P(h|s)

20 Limitations of forward sampling
[Figure: two small networks of observed and unobserved nodes. Likelihood weighting is effective when the observed nodes sit upstream of the hidden ones, but not when the evidence lies downstream: forward samples are then generated blindly to it and the weights degenerate.]

21 Markov Chain Monte Carlo (MCMC)
We don't know how to sample from P(h) = P(h|s) (or from any complex distribution, for that matter). The idea: think of P(h|s) as the stationary distribution of a reversible Markov chain. Find a process with transition probabilities for which P(h|s) is stationary, then sample a trajectory from it.
Theorem (with C a counter of visits): C(h)/N converges to P(h|s), provided the process is irreducible (you can reach from anywhere to anywhere with p > 0). So you can start from anywhere!

22 The Metropolis(-Hastings) Algorithm
Why reversible? Because detailed balance makes it easy to define the stationary distribution in terms of the transitions: we want P(x)T(x->y) = P(y)T(y->x).
So how can we find appropriate transition probabilities? Define a proposal distribution F(x->y) and an acceptance probability A(x->y) = min(1, P(y)/P(x)) (for a symmetric F).
What is the big deal? We reduce the problem to computing ratios between P(x) and P(y).
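A minimal sketch of the Metropolis step over an invented discrete target. The proposal is uniform over all states (hence symmetric), and only ratios of the unnormalized target are ever computed, which is the whole point:

```python
import random

# Invented unnormalized target over 4 states (normalizer = 10).
Pprime = [1.0, 4.0, 2.0, 3.0]

rng = random.Random(4)
x = 0                                   # start from anywhere
counts = [0, 0, 0, 0]
for t in range(200000):
    y = rng.randrange(4)                # symmetric proposal F(x -> y)
    # accept with probability min(1, P(y)/P(x)); the normalizer cancels
    if rng.random() < min(1.0, Pprime[y] / Pprime[x]):
        x = y                           # accept the move
    counts[x] += 1                      # a rejection keeps the current state

freq1 = counts[1] / sum(counts)         # visit frequency of state 1
# The chain's stationary distribution is Pprime normalized,
# so freq1 should approach 4/10 = 0.4.
```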

23 Acceptance ratio for a BN
What is a Markov blanket? Definition: the minimal Markov blanket of a node in a BN includes its children, its parents, and its children's parents.
We must compute min(1, P(Y)/P(X)), e.g. min(1, Pr(h'|s)/Pr(h|s)). This is usually quite easy: since Pr(h'|s) = Pr(h',s)/Pr(s), the normalizer Pr(s) cancels in the ratio, and a local change affects only the CPDs of h_i and its children. To compute the ratio we care only about the values of h_i and its Markov blanket.
For example, if the proposal distribution changes only one variable h_i, what would be the ratio? [whiteboard/exercise]

24 Gibbs sampling
A very similar algorithm (in fact, a special case of the Metropolis algorithm):
Start from any state h. Iterate:
- choose a variable H_i
- form h^(t+1) by sampling a new h_i from Pr(h_i | h^t), with all other variables fixed
This is a reversible process with our target stationary distribution. Gibbs sampling is easy to implement for BNs: the conditional of h_i depends only on its Markov blanket.
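A sketch of the Gibbs update on the invented two-node network A -> B with B = 1 observed. With a single hidden variable the sampler is degenerate (each sweep resamples A exactly from its posterior), but it shows the mechanics: the conditional for A is proportional to the product of the CPDs touching A, i.e. its Markov blanket:

```python
import random

# Invented CPDs for A -> B.
pA = [0.6, 0.4]                                # P(A)
pB_given_A = {0: [0.9, 0.1], 1: [0.2, 0.8]}    # P(B | A)

rng = random.Random(5)
b = 1                     # evidence: B = 1 is fixed throughout
a = 0                     # arbitrary start state
count_a1, n = 0, 30000
for t in range(n + 1000):
    # unnormalized conditional Pr(A=v | B=b) ~ P(A=v) * P(B=b | A=v)
    u = [pA[v] * pB_given_A[v][b] for v in (0, 1)]
    a = 0 if rng.random() < u[0] / (u[0] + u[1]) else 1
    if t >= 1000:         # discard a burn-in prefix before collecting
        count_a1 += (a == 1)

est = count_a1 / n        # estimates P(A=1 | B=1) = 0.32/0.38
```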

25 Sampling in practice
How much time until convergence to P? (the burn-in time). We sample while fixing the evidence, starting from anywhere but waiting some time before starting to collect data.
Consecutive samples are still correlated! Should we sample only every n steps?
A problematic space would be loosely connected. Examples of bad spaces: [whiteboard/exercise]
[Figure: a trajectory divided into burn-in, mixing, and sampling phases]

