Finding the optimal pairwise alignment
We are interested in finding the alignment of two sequences that maximizes the similarity score given an arbitrary cost matrix. This is also called the sum-of-pairs optimization. Given two sequences of lengths m and n, a dynamic programming algorithm (Needleman-Wunsch for global alignment, Smith-Waterman for local alignment) finds the optimal alignment in O(mn) time and space.
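The O(mn) dynamic program can be sketched as follows for the global case (Needleman-Wunsch). This is a minimal illustration, not the code behind these slides: it assumes a linear gap penalty (one fixed score per gap column) instead of affine gaps, and a caller-supplied `score(a, b)` function standing in for the cost matrix. The function name `global_align` is hypothetical.

```python
from typing import Callable, Tuple

def global_align(x: str, y: str, score: Callable[[str, str], float],
                 gap: float) -> Tuple[float, str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty.

    score(a, b) plays the role of the cost matrix and gap is the (negative)
    score of each gap column. Fills an (m+1) x (n+1) table in O(mn) time and
    space, then traces back one optimal alignment ("_" marks a gap).
    """
    m, n = len(x), len(y)
    F = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        F[i][0] = F[i - 1][0] + gap          # x[:i] aligned entirely against gaps
    for j in range(1, n + 1):
        F[0][j] = F[0][j - 1] + gap          # y[:j] aligned entirely against gaps
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(F[i - 1][j - 1] + score(x[i - 1], y[j - 1]),  # x_i with y_j
                          F[i - 1][j] + gap,                            # x_i with a gap
                          F[i][j - 1] + gap)                            # y_j with a gap
    # Trace back from (m, n), re-deriving which case produced each cell.
    ax, ay = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(x[i - 1], y[j - 1]):
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            ax.append(x[i - 1])
            ay.append("_")
            i -= 1
        else:
            ax.append("_")
            ay.append(y[j - 1])
            j -= 1
    return F[m][n], "".join(reversed(ax)), "".join(reversed(ay))

best, ax, ay = global_align("ACCG", "ACCCA", lambda a, b: 1 if a == b else -1, gap=-1)
print(best)  # 1.0
print(ax)
print(ay)
```

With match = +1, mismatch = -1 and gap = -1 (illustrative values), several alignments tie at the optimal score; the traceback returns one of them.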

Expected accuracy of alignment
The dynamic programming formulation finds the optimal alignment as defined by a scoring matrix and gap penalties. We now look at a different formulation of alignment that finds the alignment with the highest expected accuracy instead of the one with the highest score.

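Posterior probability of x_i aligned to y_j
Let A be the set of all alignments of sequences x and y, and define P(a | x, y) to be the probability that alignment a (of x and y) is the true alignment a*. We define the posterior probability of the i-th residue of x (x_i) aligning to the j-th residue of y (y_j) in the true alignment a* of x and y as

$$P(x_i \sim y_j \in a^* \mid x, y) = \sum_{a \in A} P(a \mid x, y)\, \mathbf{1}\{x_i \sim y_j \in a\}$$

where the indicator 1{x_i ~ y_j ∈ a} is 1 if alignment a aligns x_i to y_j and 0 otherwise.
Do et al., Genome Research, 2005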
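The definition can be read directly as code: the posterior of a residue pair is the total probability of the alignments that contain it. This sketch represents each alignment as a set of 1-based (i, j) pairs and takes an explicit, hypothetical distribution over two alignments; enumerating A like this is only feasible for toy inputs, since the number of alignments grows exponentially. The name `posterior` is illustrative.

```python
from typing import FrozenSet, List, Tuple

Pair = Tuple[int, int]          # (i, j), 1-based: x_i aligned to y_j
Alignment = FrozenSet[Pair]     # an alignment represented by its aligned pairs

def posterior(dist: List[Tuple[Alignment, float]], i: int, j: int) -> float:
    """P(x_i ~ y_j | x, y): sum of P(a | x, y) over alignments a that align x_i to y_j."""
    return sum(p for a, p in dist if (i, j) in a)

# Hypothetical distribution over two alignments of x = ACCG and y = ACCCA.
a1 = frozenset({(1, 1), (2, 2), (3, 4), (4, 5)})   # AC_CG / ACCCA
a2 = frozenset({(1, 1), (2, 2), (3, 3), (4, 5)})   # ACC_G / ACCCA
dist = [(a1, 0.9), (a2, 0.1)]
print(posterior(dist, 3, 4), posterior(dist, 3, 3))  # 0.9 0.1
```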

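Expected accuracy of alignment
We can define the expected accuracy of an alignment a as the expected fraction of residue pairs that a aligns correctly:

$$E_{a^*}[\mathrm{accuracy}(a, a^*) \mid x, y] = \frac{1}{\min(|x|, |y|)} \sum_{x_i \sim y_j \in a} P(x_i \sim y_j \in a^* \mid x, y)$$

The maximum expected accuracy (MEA) alignment can be obtained by the same dynamic programming algorithm, using the posterior probabilities as match scores and zero gap penalties:

$$A(i, j) = \max\{A(i-1, j-1) + P(x_i \sim y_j \in a^* \mid x, y),\; A(i-1, j),\; A(i, j-1)\}$$

Do et al., Genome Research, 2005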
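A minimal sketch of the MEA recurrence, assuming the posterior matrix is already available as a list of lists with P[i-1][j-1] = P(x_i ~ y_j | x, y). The function name `mea_align` and the posterior values in the usage example are hypothetical.

```python
from typing import List, Set, Tuple

def mea_align(P: List[List[float]]) -> Tuple[float, Set[Tuple[int, int]]]:
    """Maximum expected accuracy alignment from a posterior matrix.

    Matches score their posterior and gaps score 0, so the DP maximizes the
    summed posterior of the aligned pairs. Returns that sum and the aligned
    pairs as 1-based (i, j) tuples.
    """
    m = len(P)
    n = len(P[0]) if m else 0
    A = [[0.0] * (n + 1) for _ in range(m + 1)]  # row 0 and column 0 stay 0: gaps are free
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            A[i][j] = max(A[i - 1][j - 1] + P[i - 1][j - 1],  # align x_i with y_j
                          A[i - 1][j],                        # x_i against a gap
                          A[i][j - 1])                        # y_j against a gap
    # Trace back, recording a pair only when it contributes a positive posterior.
    pairs: Set[Tuple[int, int]] = set()
    i, j = m, n
    while i > 0 and j > 0:
        if P[i - 1][j - 1] > 0 and A[i][j] == A[i - 1][j - 1] + P[i - 1][j - 1]:
            pairs.add((i, j))
            i, j = i - 1, j - 1
        elif A[i][j] == A[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return A[m][n], pairs

# Hypothetical posteriors for x = ACCG, y = ACCCA (all other entries 0).
P = [[0.0] * 5 for _ in range(4)]
for (i, j), p in {(1, 1): 1.0, (2, 2): 1.0, (3, 3): 0.1, (3, 4): 0.9, (4, 5): 1.0}.items():
    P[i - 1][j - 1] = p
total, pairs = mea_align(P)
print(sorted(pairs), total / min(4, 5))  # pairs (1,1), (2,2), (3,4), (4,5); about 0.975
```

Dividing the returned sum by min(|x|, |y|) gives the expected accuracy of the MEA alignment.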

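Example for expected accuracy
Here x = ACCG and y = ACCCA, so min(|x|, |y|) = 4. Each column contributes the posterior probability of the residue pair it aligns; a gap column contributes 0.
True alignment
AC_CG
ACCCA
Expected accuracy = (1 + 1 + 0 + 1 + 1)/4 = 1
Estimated alignment
ACC_G
ACCCA
Expected accuracy = (1 + 1 + 0.1 + 0 + 1)/4 = 3.1/4 ≈ 0.78
Only 3 of the 4 residue pairs in the estimated alignment (x_1 ~ y_1, x_2 ~ y_2, x_4 ~ y_5) appear in the true alignment, an accuracy of 3/4 = 0.75; the misaligned pair x_3 ~ y_3 adds only its small posterior of 0.1.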
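The arithmetic in this example can be checked with a short sketch that converts gapped rows into sets of aligned residue pairs. The helper names are hypothetical; `accuracy` compares an alignment against a known true alignment, and `expected_accuracy` sums posteriors from a table whose values here are illustrative, mirroring the 0.1 used above.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]  # (i, j), 1-based: x_i is aligned to y_j

def aligned_pairs(ax: str, ay: str, gap: str = "_") -> Set[Pair]:
    """Collect the residue pairs (i, j) aligned in a gapped two-row alignment."""
    if len(ax) != len(ay):
        raise ValueError("alignment rows must have equal length")
    pairs: Set[Pair] = set()
    i = j = 0
    for a, b in zip(ax, ay):
        if a != gap:
            i += 1          # this column consumes a residue of x
        if b != gap:
            j += 1          # this column consumes a residue of y
        if a != gap and b != gap:
            pairs.add((i, j))
    return pairs

def accuracy(a: Set[Pair], true: Set[Pair], len_x: int, len_y: int) -> float:
    """Fraction of the min(|x|, |y|) pairs that a aligns as the true alignment does."""
    return len(a & true) / min(len_x, len_y)

def expected_accuracy(a: Set[Pair], post: Dict[Pair, float], len_x: int, len_y: int) -> float:
    """Sum of posteriors P(x_i ~ y_j) over the pairs aligned in a, divided by min(|x|, |y|)."""
    return sum(post.get(p, 0.0) for p in a) / min(len_x, len_y)

x, y = "ACCG", "ACCCA"
true = aligned_pairs("AC_CG", "ACCCA")
est = aligned_pairs("ACC_G", "ACCCA")
print(accuracy(true, true, len(x), len(y)))  # 1.0
print(accuracy(est, true, len(x), len(y)))   # 0.75
# Illustrative (hypothetical) posteriors for the pairs in the estimated alignment.
post = {(1, 1): 1.0, (2, 2): 1.0, (3, 3): 0.1, (4, 5): 1.0}
print(expected_accuracy(est, post, len(x), len(y)))  # about 0.775
```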

Estimating posterior probabilities
If the correct posterior probabilities can be computed, then we can compute the alignment of maximum expected accuracy. What remains is to estimate these probabilities from the data:
ProbCons: estimates the probabilities from a pair HMM using the forward and backward recursions.
Probalign: uses posterior probabilities derived from the partition function.
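As a rough illustration of the partition-function route, the sketch below gives every alignment the weight exp(score / T), sums these weights over prefix alignments (forward) and suffix alignments (backward), and combines them into P(x_i ~ y_j). It is a simplification, not Probalign's implementation: it assumes a linear gap penalty rather than affine gaps, and the temperature `T`, the scores, and the name `partition_posteriors` are illustrative. Real implementations work in log space; this version can overflow for long sequences or small T.

```python
import math
from typing import Callable, List

def partition_posteriors(x: str, y: str, score: Callable[[str, str], float],
                         gap: float, T: float = 1.0) -> List[List[float]]:
    """Posterior P(x_i ~ y_j) from forward/backward partition functions.

    Every alignment gets weight exp(total score / T), with a linear gap
    penalty. Returns P with P[i][j] for 0-based i, j.
    """
    m, n = len(x), len(y)
    g = math.exp(gap / T)                  # weight of one gap column

    def w(i: int, j: int) -> float:        # weight of aligning x[i] with y[j] (0-based)
        return math.exp(score(x[i], y[j]) / T)

    # Forward: Zf[i][j] = sum of weights of all alignments of x[:i] and y[:j].
    Zf = [[0.0] * (n + 1) for _ in range(m + 1)]
    Zf[0][0] = 1.0
    for i in range(m + 1):
        for j in range(n + 1):
            if i > 0 and j > 0:
                Zf[i][j] += Zf[i - 1][j - 1] * w(i - 1, j - 1)
            if i > 0:
                Zf[i][j] += Zf[i - 1][j] * g
            if j > 0:
                Zf[i][j] += Zf[i][j - 1] * g

    # Backward: Zb[i][j] = sum of weights of all alignments of x[i:] and y[j:].
    Zb = [[0.0] * (n + 1) for _ in range(m + 1)]
    Zb[m][n] = 1.0
    for i in range(m, -1, -1):
        for j in range(n, -1, -1):
            if i < m and j < n:
                Zb[i][j] += Zb[i + 1][j + 1] * w(i, j)
            if i < m:
                Zb[i][j] += Zb[i + 1][j] * g
            if j < n:
                Zb[i][j] += Zb[i][j + 1] * g

    Z = Zf[m][n]                           # total partition function (equals Zb[0][0])
    # x[i] ~ y[j]: any alignment of the prefixes, then the pair, then any alignment of the suffixes.
    return [[Zf[i][j] * w(i, j) * Zb[i + 1][j + 1] / Z for j in range(n)] for i in range(m)]

P = partition_posteriors("ACCG", "ACCCA", lambda a, b: 1.0 if a == b else -1.0, gap=-1.0)
```

Lowering T concentrates the posterior mass on the highest-scoring alignments; raising it spreads the mass over suboptimal ones.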

Estimating posterior probabilities
We are interested in estimating posterior probabilities for two sequences x and y. By generating an ensemble A(n, x, y) of n alignments of x and y, we can estimate P(x_i ~ y_j) as the fraction of alignments in the ensemble in which x_i is aligned to y_j:

$$\hat{P}(x_i \sim y_j) = \frac{1}{n} \sum_{a \in A(n, x, y)} \mathbf{1}\{x_i \sim y_j \in a\}$$

Note that this assigns equal weight to every alignment in the ensemble.
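The counting estimate is a few lines of code. This sketch assumes each alignment in the ensemble is given as a pair of gapped rows with "_" as the gap character; the ensemble in the usage example is made up, and `estimate_posteriors` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Pair = Tuple[int, int]  # (i, j), 1-based: x_i aligned to y_j

def aligned_pairs(ax: str, ay: str, gap: str = "_") -> Set[Pair]:
    """Residue pairs (i, j) aligned in a gapped two-row alignment."""
    pairs: Set[Pair] = set()
    i = j = 0
    for a, b in zip(ax, ay):
        if a != gap:
            i += 1
        if b != gap:
            j += 1
        if a != gap and b != gap:
            pairs.add((i, j))
    return pairs

def estimate_posteriors(ensemble: List[Tuple[str, str]], gap: str = "_") -> Dict[Pair, float]:
    """Estimate P(x_i ~ y_j) as the fraction of ensemble alignments that align x_i to y_j."""
    counts: Counter = Counter()
    for ax, ay in ensemble:
        counts.update(aligned_pairs(ax, ay, gap))  # each pair counted once per alignment
    n = len(ensemble)
    return {pair: c / n for pair, c in counts.items()}  # equal weight 1/n per alignment

# Hypothetical ensemble of n = 4 alignments of x = ACCG and y = ACCCA.
ensemble = [("AC_CG", "ACCCA")] * 3 + [("ACC_G", "ACCCA")]
est = estimate_posteriors(ensemble)
print(est[(3, 4)], est[(3, 3)], est[(1, 1)])  # 0.75 0.25 1.0
```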

Generating an ensemble of alignments
We use stochastic backtracking to generate a given number of optimal and suboptimal alignments. At every step of the traceback, we assign a probability to each of the three possible predecessor cells (diagonal, up, and left) and sample the next step according to these probabilities.
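The slide does not say how the step probabilities are chosen. One common choice, assumed in this sketch, is Boltzmann-weighted stochastic traceback: each candidate predecessor cell is weighted by its forward partition function times the weight of the step into the current cell, so whole alignments are sampled in proportion to exp(score / T). The linear gap penalty, the temperature `T`, the scores, and the name `sample_ensemble` are all illustrative assumptions.

```python
import math
import random
from typing import Callable, List, Tuple

def sample_ensemble(x: str, y: str, score: Callable[[str, str], float], gap: float,
                    n: int, T: float = 1.0, seed: int = 0) -> List[Tuple[str, str]]:
    """Sample n alignments of x and y by stochastic backtracking ("_" marks a gap).

    Assumes Boltzmann weights exp(score / T) with a linear gap penalty; each
    traceback move is chosen with probability proportional to its share of the
    forward partition function, so suboptimal alignments are sampled too.
    """
    m, k = len(x), len(y)
    g = math.exp(gap / T)                  # weight of one gap column

    def w(i: int, j: int) -> float:        # weight of aligning x_i with y_j (1-based)
        return math.exp(score(x[i - 1], y[j - 1]) / T)

    # Forward pass: Z[i][j] sums the weights of all alignments of x[:i] and y[:j].
    Z = [[0.0] * (k + 1) for _ in range(m + 1)]
    Z[0][0] = 1.0
    for i in range(m + 1):
        for j in range(k + 1):
            if i > 0 and j > 0:
                Z[i][j] += Z[i - 1][j - 1] * w(i, j)
            if i > 0:
                Z[i][j] += Z[i - 1][j] * g
            if j > 0:
                Z[i][j] += Z[i][j - 1] * g

    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        ax, ay = [], []
        i, j = m, k
        while i > 0 or j > 0:
            # Up to three predecessor cells; their weights sum to Z[i][j],
            # so rng.choices picks each move with probability weight / Z[i][j].
            moves, weights = [], []
            if i > 0 and j > 0:
                moves.append("diag")
                weights.append(Z[i - 1][j - 1] * w(i, j))
            if i > 0:
                moves.append("up")
                weights.append(Z[i - 1][j] * g)
            if j > 0:
                moves.append("left")
                weights.append(Z[i][j - 1] * g)
            move = rng.choices(moves, weights=weights)[0]
            if move == "diag":
                ax.append(x[i - 1])
                ay.append(y[j - 1])
                i, j = i - 1, j - 1
            elif move == "up":
                ax.append(x[i - 1])
                ay.append("_")
                i -= 1
            else:
                ax.append("_")
                ay.append(y[j - 1])
                j -= 1
        samples.append(("".join(reversed(ax)), "".join(reversed(ay))))
    return samples

ensemble = sample_ensemble("ACCG", "ACCCA", lambda a, b: 1.0 if a == b else -1.0,
                           gap=-1.0, n=100, seed=1)
```

When one index reaches 0, fewer than three moves remain, which the sketch handles by only offering the valid ones. The sampled ensemble can then be fed into the counting estimate described on the previous slide.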
