Algebraic Statistics for Computational Biology, Lior Pachter and Bernd Sturmfels. Ch. 5: Parametric Inference, R. Mihaescu. Presentation: Aγγελίνα Βιδάλη (Angelina Vidali)

1 Algebraic Statistics for Computational Biology, Lior Pachter and Bernd Sturmfels. Ch. 5: Parametric Inference, R. Mihaescu. Presentation: Aγγελίνα Βιδάλη (Angelina Vidali). Course: Algebraic & Geometric Algorithms in Molecular Biology. Instructor: Ι. Εμίρης (I. Emiris)

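2 Convenient algebraic structure for stating dynamic programming algorithms: the tropical semiring. Tropical arithmetic: $x \oplus y = \min(x, y)$, $x \odot y = x + y$. Its natural higher-dimensional generalization is the polytope algebra $(\mathcal{P}_d, \oplus, \odot)$: $P \oplus Q = \operatorname{conv}(P \cup Q)$ (convex hull), $P \odot Q = P + Q$ (Minkowski sum).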
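The tropical operations on this slide can be sketched in a few lines of Python. This is only an illustration under the min-plus convention (the one that matches the later substitution w_i = -log p_i); the function names `trop_add`, `trop_mul` and `tropicalize` are hypothetical and not taken from the slides.

```python
import math

# Neutral elements of the min-plus (tropical) semiring.
TROPICAL_ZERO = math.inf  # x (+) inf = x
TROPICAL_ONE = 0.0        # x (.) 0 = x


def trop_add(x, y):
    """Tropical sum: the minimum of the two arguments."""
    return min(x, y)


def trop_mul(x, y):
    """Tropical product: the ordinary sum of the two arguments."""
    return x + y


def tropicalize(p):
    """Map a probability p to the tropical weight -log(p); p = 0 maps to infinity."""
    return -math.log(p) if p > 0 else math.inf
```

Ordinary products of probabilities become tropical products of weights, so `trop_mul(tropicalize(0.5), tropicalize(0.25))` agrees with `tropicalize(0.125)` up to rounding, and tropical multiplication distributes over tropical addition just as in ordinary arithmetic.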

3 Inference. From the observed random variables $Y_1 = \sigma_1, \dots, Y_n = \sigma_n$ (known biological data) we want to infer values for the hidden random variables $X_1, \dots, X_m$ (unknown biological data, e.g. how do two sequences align?). MAP estimation: given an observation $\sigma_1, \dots, \sigma_n$, which is the most probable explanation $X_1 = h_1, \dots, X_m = h_m$? The model parameters give the transition probabilities $p_{h\sigma}$ from hidden state $h$ to observed state $\sigma$.

4 Hidden Markov Model (HMM). Observation: $\sigma_1, \dots, \sigma_n$. We want to compute an explanation for the observation: the sequence $h_1, \dots, h_m$ which yields the maximum a posteriori probability (MAP). We can efficiently compute the marginal probabilities.

5 Computation of the marginal probabilities: $p_\sigma$ has a decomposition which gives the “Forward algorithm”. Markov chain: independent probabilities.
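A minimal sketch of the forward recursion may help here. It assumes a standard HMM parametrization that the transcript does not spell out: an initial distribution `init`, a transition matrix `trans` and an emission matrix `emit` (all three names and the numeric values below are hypothetical). The brute-force function sums over every hidden path and is included only to check the recursion.

```python
from itertools import product


def forward(init, trans, emit, obs):
    """Marginal probability of the observation, computed by the forward recursion."""
    states = range(len(init))
    # alpha[h]: probability of the observed prefix with the chain ending in state h
    alpha = [init[h] * emit[h][obs[0]] for h in states]
    for sym in obs[1:]:
        alpha = [
            sum(alpha[g] * trans[g][h] for g in states) * emit[h][sym]
            for h in states
        ]
    return sum(alpha)


def brute_force(init, trans, emit, obs):
    """Same quantity, summing the joint probability over all hidden paths."""
    total = 0.0
    for path in product(range(len(init)), repeat=len(obs)):
        p = init[path[0]] * emit[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= trans[path[t - 1]][path[t]] * emit[path[t]][obs[t]]
        total += p
    return total


# Hypothetical two-state, three-symbol model used only for illustration.
INIT = [0.6, 0.4]
TRANS = [[0.7, 0.3], [0.4, 0.6]]
EMIT = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
OBS = [0, 1, 2]
```

The recursion needs a number of operations linear in the length of the observation, whereas the brute-force sum has one term per hidden path, i.e. exponentially many.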

6 Viterbi algorithm: the tropicalization of the problem of computing $p_\sigma$. Tropicalization: $u_{ij} = -\log(p'_{ij})$, $v_{ij} = -\log(p_{ij})$. We can now efficiently find an explanation $h_1, \dots, h_m$ for the observation $\sigma_1, \dots, \sigma_n$ using the same recursion in tropical arithmetic: it is again the Forward algorithm.
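The tropicalized recursion can be sketched as follows: sums become minima and products become sums of the weights $u = -\log p'$ and $v = -\log p$. The parametrization (`init`, `trans`, `emit`) and the numeric values are assumptions made for illustration, and all probabilities are assumed strictly positive so that the logarithms exist.

```python
import math


def viterbi(init, trans, emit, obs):
    """Most probable hidden path, found by the forward recursion in min-plus arithmetic."""
    states = range(len(init))
    w0 = [-math.log(x) for x in init]                   # weights of the start states
    v = [[-math.log(x) for x in row] for row in trans]  # v_ij = -log p_ij
    u = [[-math.log(x) for x in row] for row in emit]   # u_ij = -log p'_ij
    cost = [w0[h] + u[h][obs[0]] for h in states]
    backpointers = []
    for sym in obs[1:]:
        new_cost, pointers = [], []
        for h in states:
            # tropical sum over predecessors = minimum over predecessors
            prev = min(states, key=lambda g: cost[g] + v[g][h])
            pointers.append(prev)
            new_cost.append(cost[prev] + v[prev][h] + u[h][sym])
        cost = new_cost
        backpointers.append(pointers)
    last = min(states, key=lambda h: cost[h])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, math.exp(-cost[last])


# Hypothetical two-state, three-symbol model used only for illustration.
INIT = [0.6, 0.4]
TRANS = [[0.7, 0.3], [0.4, 0.6]]
EMIT = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
```

For the observation `[0, 1, 2]` this model yields the explanation `[0, 0, 1]` with probability 0.01512, which can be confirmed by enumerating all eight hidden paths by hand.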

7 Pair Hidden Markov Model (pHMM). The algebraic statistical model for sequence alignment, known as the pair hidden Markov model, is the image of the map, where $\mathcal{A}_{n,m}$ is the set of all alignments of the sequences $\sigma^1$, $\sigma^2$.

8 The Needleman-Wunsch algorithm for finding the shortest path in the alignment graph is the tropicalization of the pair hidden Markov model for sequence alignment. Example (n=5, m=4): the sequences gttta and gtgc, with the alignment gttta- over gt--gc drawn as a path in the alignment graph.
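As a rough sketch, the shortest-path computation can be written as a min-plus dynamic program over the alignment graph. The unit costs for a mismatch and a gap are assumptions chosen for illustration; in the tropicalized pHMM the edge weights would instead be negative logarithms of the model parameters.

```python
def needleman_wunsch(a, b, mismatch=1.0, gap=1.0):
    """Minimal-cost global alignment of a and b; returns (cost, row for a, row for b)."""
    n, m = len(a), len(b)
    # D[i][j] = minimal cost of aligning the prefixes a[:i] and b[:j]
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + gap
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if a[i - 1] == b[j - 1] else mismatch
            D[i][j] = min(D[i - 1][j - 1] + sub,  # match or mismatch
                          D[i - 1][j] + gap,      # gap in b
                          D[i][j - 1] + gap)      # gap in a
    # Trace one optimal path back from the corner (n, m) to (0, 0).
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0.0 if a[i - 1] == b[j - 1] else mismatch
            diagonal = D[i][j] == D[i - 1][j - 1] + sub
        else:
            diagonal = False
        if diagonal:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and D[i][j] == D[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return D[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

With these unit costs, `needleman_wunsch("gttta", "gtgc")` returns cost 3, whereas the alignment shown on the slide (gttta- over gt--gc) has three gaps and one mismatch and would cost 4. Which alignment is optimal depends on the weights, which is the point of the parametric analysis later in the deck.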

9 The polytope propagation algorithm: the tropical sum-product algorithm in general fashion. $f$ is the density function for a statistical model. From the d monomials, find the one with maximal value. Solution: tropicalization, $w_i = -\log p_i$, and computation in the polytope algebra.
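The idea of computing in the polytope algebra can be illustrated in two dimensions. In the sketch below each variable is replaced by a single lattice point, sums become convex hulls of unions and products become Minkowski sums; the factored polynomial $g = (p_1 + p_2)^2 (1 + p_1)$ is a hypothetical example, not one from the slides, and the hull routine is a plain monotone-chain implementation.

```python
def cross(o, a, b):
    """Signed area test: positive if o -> a -> b turns counter-clockwise."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def hull(points):
    """Vertices of the convex hull of 2D points (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def poly_add(P, Q):
    """Polytope sum: convex hull of the union of the two vertex lists."""
    return hull(P + Q)


def poly_mul(P, Q):
    """Polytope product: Minkowski sum of the two polytopes."""
    return hull([(a[0] + b[0], a[1] + b[1]) for a in P for b in Q])


P1 = [(1, 0)]   # polytope standing in for the variable p1
P2 = [(0, 1)]   # polytope standing in for the variable p2
ONE = [(0, 0)]  # polytope standing in for the constant 1

# Evaluate the hypothetical g = (p1 + p2)^2 * (1 + p1) without expanding it.
S = poly_add(P1, P2)
G = poly_mul(poly_mul(S, S), poly_add(ONE, P1))
```

Here `G` is the parallelogram with vertices (0, 2), (2, 0), (3, 0) and (1, 2), which is the Newton polytope of the expanded polynomial, although the six monomials of the expansion were never listed.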

10 Density function for a statistical model: $f(p_1, p_2) = p_1^3 + p_1^2 p_2^2 + p_1 p_2^2 + p_1 + p_2^4$. Find an explanation: find the index $j$ of the monomial with maximal value. Tropicalization, $w_i = -\log p_i$: find the index $j$ of the monomial that minimizes the linear function $e_j \cdot w$, where $e_j$ is the exponent vector of the $j$-th monomial.
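The equivalence between the two formulations can be checked numerically for the polynomial on this slide. The sketch below is illustrative only: the function names are hypothetical, the parameter values in the usage note are arbitrary positive numbers (not necessarily probabilities), and indices are 0-based in the order the monomials are written.

```python
import math

# Exponent vectors e_j of f = p1^3 + p1^2 p2^2 + p1 p2^2 + p1 + p2^4.
EXPONENTS = [(3, 0), (2, 2), (1, 2), (1, 0), (0, 4)]


def best_monomial_by_value(p):
    """Index of the monomial with the largest value at the parameters p."""
    values = [math.prod(pi ** e for pi, e in zip(p, exps)) for exps in EXPONENTS]
    return max(range(len(values)), key=values.__getitem__)


def best_monomial_tropical(p):
    """Index of the monomial minimizing e_j . w, where w_i = -log(p_i)."""
    w = [-math.log(pi) for pi in p]
    costs = [sum(e * wi for e, wi in zip(exps, w)) for exps in EXPONENTS]
    return min(range(len(costs)), key=costs.__getitem__)
```

For example, the parameters (4, 1), (3, 2), (0.5, 0.5) and (2, 3) select the monomials with indices 0, 1, 3 and 4 respectively under both functions; no choice of positive parameters selects index 2, the monomial $p_1 p_2^2$.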

11 Explanations are vertices of the Newton polytope of $f$. For $f(p_1, p_2) = p_1^3 + p_1^2 p_2^2 + p_1 p_2^2 + p_1 + p_2^4$ we find a point for each exponent vector of a monomial, e.g. $(3, 0)$ for $p_1^3$ and $(1, 0)$ for $p_1$.
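One crude way to see which exponent vectors can ever be explanations is to sweep over weight directions and record the minimizer of $e_j \cdot w$ for each. This is a sampling sketch under assumed settings (360 evenly spaced directions, a hypothetical function name), not an exact polytope computation, and it lets $w$ range over all directions rather than only the positive ones that arise from probabilities.

```python
import math

# Exponent vectors of f = p1^3 + p1^2 p2^2 + p1 p2^2 + p1 + p2^4.
EXPONENTS = [(3, 0), (2, 2), (1, 2), (1, 0), (0, 4)]


def optimal_exponents(num_directions=360):
    """Exponent vectors that minimize e . w for at least one sampled direction w."""
    found = set()
    for k in range(num_directions):
        theta = 2 * math.pi * k / num_directions
        w = (math.cos(theta), math.sin(theta))
        best = min(EXPONENTS, key=lambda e: e[0] * w[0] + e[1] * w[1])
        found.add(best)
    return found
```

The sweep returns (3, 0), (2, 2), (1, 0) and (0, 4), the four vertices of the Newton polytope. The point (1, 2) lies in the interior of the polytope and is never optimal.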

12 Normal fan. The normal fan partitions the parameter space into regions such that the explanation(s) for all sets of parameters in a given region are given by the polytope vertex (face) associated with that region.

13 Parametric MAP estimation problem. Local: given a choice of parameters, determine the set of all parameters with the same MAP estimate. Solution: computation of the normal cone of the Newton polytope. Global: asks for a partition of the space of parameters such that any two parameters lie in the same part iff they yield the same MAP estimate. Solution: computation of the normal fan of the Newton polytope.
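For the local problem, the normal cone of a vertex can be described by one linear inequality per other vertex. The sketch below assumes the min convention used with $w_i = -\log p_i$ and takes as given the four vertices of the Newton polytope of the example $f(p_1, p_2) = p_1^3 + p_1^2 p_2^2 + p_1 p_2^2 + p_1 + p_2^4$; the function names are hypothetical.

```python
import math

# Vertices of the Newton polytope of the example polynomial (assumed precomputed).
VERTICES = [(1, 0), (3, 0), (2, 2), (0, 4)]


def normal_cone_inequalities(vertex, vertices=VERTICES):
    """Vectors d such that w lies in the cone of `vertex` iff d . w >= 0 for every d."""
    return [tuple(ui - vi for ui, vi in zip(u, vertex))
            for u in vertices if u != vertex]


def in_normal_cone(w, vertex, vertices=VERTICES):
    """True if `vertex` minimizes e . w over all vertices of the polytope."""
    return all(sum(di * wi for di, wi in zip(d, w)) >= 0
               for d in normal_cone_inequalities(vertex, vertices))


def map_vertex(w, vertices=VERTICES):
    """Global view: the vertex whose normal cone contains w (first one on ties)."""
    return min(vertices, key=lambda e: sum(ei * wi for ei, wi in zip(e, w)))
```

For the parameters $(p_1, p_2) = (3, 2)$, i.e. $w = (-\log 3, -\log 2)$, the weight vector lies in the normal cone of the vertex (2, 2), so every parameter choice whose weight vector satisfies the same three inequalities has the same MAP estimate. The collection of all such cones, one per vertex, is the normal fan asked for in the global problem.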

