
Multiple Sequence Alignment (I)


1 Multiple Sequence Alignment (I)
(Lecture for CS498-CXZ Algorithms in Bioinformatics)
Oct. 4, 2005
ChengXiang Zhai
Department of Computer Science, University of Illinois, Urbana-Champaign

2 Outline
Motivation
Scoring of multiple sequence alignments
Algorithms: dynamic programming; progressive alignment (next class)

3 Why Multiple Alignments?
Characterize protein families: identify shared regions of homology in a multiple sequence alignment.
Determine the consensus sequence of several aligned sequences.
Help predict the secondary and tertiary structures of new sequences.
Help predict the function of new sequences.
Serve as a preliminary step in molecular evolution analysis using phylogenetic trees.

4 Example of Multiple Alignment
Multiple sequence alignment of 7 neuroglobins using ClustalX. The selected region is highly conserved with a generic globin. (Slide from Craig A. Struble)

5 4 Basic Questions in Multiple Alignment
Given sequences $X_1 = x_{11},\dots,x_{1m_1}$, $X_2 = x_{21},\dots,x_{2m_2}$, ..., $X_N = x_{N1},\dots,x_{Nm_N}$, let $A = \{a_1,\dots,a_k\}$ be the set of possible alignments of all the $X_i$. Model: a scoring function $s: A \to \mathbb{R}$. Find the best alignment(s) $a^*$, i.e., the alignment(s) with the best score (in the figure, $s(a^*) = 21$).
Q1: How should we define s?
Q2: How should we define A?
Q3: How can we find a* quickly?
Q4: Is the alignment biologically meaningful?

6 Defining Multi-Sequence Alignment
We can generalize our definition of pairwise sequence alignment. An alignment of 2 sequences is represented as a 2-row matrix; in a similar way, we represent an alignment of 3 sequences as a 3-row matrix:
A T _ G C G _
A _ C G T _ A
A T C A C _ A
A column must have at least one nucleotide (no all-gap columns).
Question: How many possible global alignments are there for 3 sequences, each of length 2?
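The counting question can be answered with the same recurrence that drives the dynamic programming algorithm later in the lecture: every column consumes one symbol from a nonempty subset of the sequences. A minimal Python sketch, assuming only the sequence lengths matter (the function name `count_alignments` is hypothetical):

```python
from functools import lru_cache
from itertools import product


def count_alignments(*lengths: int) -> int:
    """Count global alignments of sequences with the given lengths.

    Each column takes one symbol from a nonempty subset of the sequences
    (an all-gap column is not allowed), so the count satisfies
    N(pos) = sum over nonzero 0/1 step vectors of N(pos - step).
    """
    k = len(lengths)
    # All 0/1 step vectors except the all-zero one: 2^k - 1 of them.
    steps = [s for s in product((0, 1), repeat=k) if any(s)]

    @lru_cache(maxsize=None)
    def count(pos: tuple[int, ...]) -> int:
        if not any(pos):
            return 1  # all prefixes empty: exactly one (empty) alignment
        total = 0
        for step in steps:
            prev = tuple(p - s for p, s in zip(pos, step))
            if min(prev) >= 0:  # the step must stay inside the grid
                total += count(prev)
        return total

    return count(tuple(lengths))


print(count_alignments(2, 2))     # two sequences of length 2
print(count_alignments(2, 2, 2))  # three sequences of length 2
```

Under this definition there are 13 alignments of two length-2 sequences and 409 alignments of three, which already hints at how quickly the search space grows with the number of sequences.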

7 How do we score a multiple alignment?

8 Scoring a Multiple Alignment
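Ideally, scoring should be based on evolutionary models. In practice:
We often assume that columns are independent, so the score of an alignment is the sum of its column scores.
We use "Sum of Pairs" (SP) scores: the score of a column $(m_1,\dots,m_N)$ is $SP(m_1,\dots,m_N) = \sum_{i<j} s(m_i, m_j)$, where $s(x, \_) = s(\_, x) = G$ and, conventionally, $s(\_, \_) = 0$.
G is the gap score.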
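To make the sum-of-pairs idea concrete, here is a minimal Python sketch. The pair scores (match +1, mismatch -1, gap score G = -2) are illustrative assumptions rather than values from the lecture, and gaps are written as "-" instead of "_".

```python
from itertools import combinations

GAP = "-"
# Illustrative scores (assumptions): match +1, mismatch -1, gap score G = -2.
MATCH, MISMATCH, G = 1, -1, -2


def pair_score(a: str, b: str) -> int:
    """Score of one pair of symbols taken from the same column."""
    if a == GAP and b == GAP:
        return 0  # conventional: a gap aligned with a gap costs nothing
    if a == GAP or b == GAP:
        return G
    return MATCH if a == b else MISMATCH


def sp_column(column: str) -> int:
    """Sum-of-pairs score of one column: every pair of rows, counted once."""
    return sum(pair_score(a, b) for a, b in combinations(column, 2))


def sp_alignment(rows: list[str]) -> int:
    """Columns are treated as independent, so the score is a sum over columns."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all rows must have the same length")
    return sum(sp_column("".join(col)) for col in zip(*rows))


print(sp_alignment(["AT-GCG-", "A-CGT-A", "ATCAC-A"]))
```

With these assumed scores, the 3-row example alignment of ATGCG, ACGTA and ATCACA scores -12. A column of N rows contributes N(N-1)/2 pair scores, so SP scoring costs quadratic time per column.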

9 Minimum Entropy Scoring
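Intuition: a perfectly aligned column has one single symbol (least uncertainty); a poorly aligned column has many distinct symbols (high uncertainty).
With $c_{i,a}$ the count of symbol $a$ in column $i$, let $p_{i,a} = c_{i,a} / \sum_{a'} c_{i,a'}$. The entropy of column $i$ is $E_i = -\sum_a p_{i,a} \log p_{i,a}$, and the score of the alignment is $\sum_i E_i$, which we want to minimize.
This is related to the HMM formulation of the alignment problem, which we will cover later.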

10 Entropy: Example (figure: a best-case column and a worst-case column)

11 Entropy of an Alignment: Example
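Column entropy (logarithms base 2, with $0 \log 0 = 0$): $-(p_A \log p_A + p_C \log p_C + p_G \log p_G + p_T \log p_T)$
Column 1: $-[1 \cdot \log 1 + 0 \log 0 + 0 \log 0 + 0 \log 0] = 0$
Column 2: $-[\tfrac14 \log \tfrac14 + \tfrac34 \log \tfrac34 + 0 \log 0 + 0 \log 0] = -[\tfrac14 (-2) + \tfrac34 (-0.415)] \approx 0.811$
Column 3: $-[\tfrac14 \log \tfrac14 + \tfrac14 \log \tfrac14 + \tfrac14 \log \tfrac14 + \tfrac14 \log \tfrac14] = 4 \cdot \left(-\tfrac14 \cdot (-2)\right) = 2$
Alignment entropy $= 0 + 0.811 + 2 = 2.811$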
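The column calculations above can be reproduced with a short Python sketch. It assumes base-2 logarithms (consistent with $\log \tfrac14 = -2$ above); the columns "AAAA", "ACCC" and "ACGT" are hypothetical examples chosen only to match the stated symbol frequencies.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Entropy in bits of one column: -sum_a p_a * log2(p_a).

    Symbols that do not occur never appear in the Counter, which
    implements the 0 * log 0 = 0 convention.
    """
    counts = Counter(column)
    n = len(column)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def alignment_entropy(columns: list[str]) -> float:
    """Alignment entropy: sum of column entropies (lower is better)."""
    return sum(column_entropy(col) for col in columns)


columns = ["AAAA", "ACCC", "ACGT"]  # hypothetical columns
for col in columns:
    print(col, round(column_entropy(col), 3))
print("alignment:", round(alignment_entropy(columns), 3))
```

Unlike SP scoring, entropy depends only on the symbol counts in a column, so it costs linear time per column regardless of how the symbols are arranged.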

12 How can we find a multiple alignment quickly?
Can we generalize the dynamic programming algorithm used for pairwise alignment?

13 Alignments = Paths in…
Align 3 sequences: ATGC, AATC, ATGC

14 Alignment Paths (figure: the first row of the alignment, A - T G C, annotated with its x coordinate 1 2 3 4)

15 Alignment Paths
Align the following 3 sequences: ATGC, AATC, ATGC (figure: the first two rows of the alignment, annotated with their x and y coordinates)

16 Alignment Paths
The alignment, with the x, y and z coordinates counting the residues used so far in each row:
x: A - T G C
y: A A T - C
z: - A T G C
Resulting path in (x,y,z) space: (0,0,0) → (1,1,0) → (1,2,1) → (2,3,2) → (3,3,3) → (4,4,4)
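The correspondence between an alignment and a path can be sketched in a few lines of Python: start at the origin and, for each column, advance the coordinate of every row that has a residue rather than a gap. The function name `alignment_path` is hypothetical; the example rows are the alignment of ATGC, AATC and ATGC that produces the path (0,0,0), (1,1,0), (1,2,1), (2,3,2), (3,3,3), (4,4,4).

```python
def alignment_path(rows: list[str], gap: str = "-") -> list[tuple[int, ...]]:
    """Return the lattice path traced by a multiple alignment.

    Coordinate r counts how many residues of row r have been used;
    each column moves along a nonzero 0/1 step vector.
    """
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all rows must have the same length")
    point = [0] * len(rows)
    path = [tuple(point)]
    for column in zip(*rows):
        for r, symbol in enumerate(column):
            if symbol != gap:
                point[r] += 1  # row r contributes a residue to this column
        path.append(tuple(point))
    return path


print(alignment_path(["A-TGC", "AAT-C", "-ATGC"]))
```

The mapping also runs in reverse: each step vector tells which rows get a residue in the next column, so paths from the origin to (4,4,4) and alignments are in one-to-one correspondence.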

17 2-D vs 3-D Alignment Grid (figure: the 2-D edit graph for sequences V and W, and the question of what its 3-D analogue looks like)

18 Architecture of 3-D Alignment Grid
In 2-D, there are 3 edges in each unit square; in 3-D, there are 7 edges in each unit cube.

19 A Cell of 3-D Alignment Grid
(figure: a unit cube with corners (i-1,j-1,k-1), (i-1,j-1,k), (i-1,j,k-1), (i,j-1,k-1), (i-1,j,k), (i,j-1,k), (i,j,k-1) and (i,j,k))
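The edge counts (3 per square, 7 per cube) come from the nonzero 0/1 step vectors: in k dimensions there are $2^k - 1$ of them. A minimal Python sketch that lists the predecessors of a cell, assuming the grid starts at the origin (the function name `predecessors` is hypothetical):

```python
from itertools import product


def predecessors(cell: tuple[int, ...]) -> list[tuple[int, ...]]:
    """Cells with an edge into `cell` in the k-dimensional alignment grid.

    Each edge is a nonzero 0/1 step vector (which sequences contribute a
    symbol to the column), so an interior cell has 2^k - 1 incoming edges.
    """
    result = []
    for step in product((0, 1), repeat=len(cell)):
        if not any(step):
            continue  # the all-zero step would be an all-gap column
        prev = tuple(c - s for c, s in zip(cell, step))
        if min(prev) >= 0:  # stay inside the grid
            result.append(prev)
    return result


print(len(predecessors((3, 3))))     # interior cell in 2-D
print(len(predecessors((3, 3, 3))))  # interior cell in 3-D
```

Cells on a face or edge of the grid have fewer predecessors, because some steps would leave the grid.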

20 Multiple Alignment: Dynamic Programming
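$$
s_{i,j,k} = \max \begin{cases}
s_{i-1,j-1,k-1} + \delta(v_i, w_j, u_k) & \text{cube diagonal: no indels} \\
s_{i-1,j-1,k} + \delta(v_i, w_j, \_) & \text{face diagonal: one indel} \\
s_{i-1,j,k-1} + \delta(v_i, \_, u_k) & \text{face diagonal: one indel} \\
s_{i,j-1,k-1} + \delta(\_, w_j, u_k) & \text{face diagonal: one indel} \\
s_{i-1,j,k} + \delta(v_i, \_, \_) & \text{edge diagonal: two indels} \\
s_{i,j-1,k} + \delta(\_, w_j, \_) & \text{edge diagonal: two indels} \\
s_{i,j,k-1} + \delta(\_, \_, u_k) & \text{edge diagonal: two indels}
\end{cases}
$$
where $\delta(x, y, z)$ is an entry in the 3-D scoring matrix and can be computed using sum of pairs or entropy.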
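A minimal Python sketch of this recurrence with traceback. It assumes δ is the sum-of-pairs score with illustrative values (match +1, mismatch -1, gap score -2; these are assumptions, not values from the lecture), and the names `delta` and `align3` are hypothetical. Cells are filled in increasing (i, j, k) order so that all seven predecessors are available.

```python
from itertools import combinations, product

GAP = "-"
# Illustrative scores (assumptions, not from the lecture).
MATCH, MISMATCH, GAP_SCORE = 1, -1, -2


def delta(column: tuple[str, str, str]) -> int:
    """Sum-of-pairs score of one 3-symbol column (one possible choice of delta)."""
    total = 0
    for a, b in combinations(column, 2):
        if a == GAP and b == GAP:
            continue  # gap-gap pairs contribute 0
        if a == GAP or b == GAP:
            total += GAP_SCORE
        else:
            total += MATCH if a == b else MISMATCH
    return total


def align3(v: str, w: str, u: str) -> tuple[int, list[str]]:
    """Global alignment of three sequences by 3-D dynamic programming.

    s[i, j, k] is the best score for aligning v[:i], w[:j] and u[:k];
    each of the 7 step vectors is one edge of the unit cube.
    """
    steps = [d for d in product((0, 1), repeat=3) if any(d)]
    s = {(0, 0, 0): 0}
    back = {}
    # Lexicographic (i, j, k) order fills every predecessor first.
    for i in range(len(v) + 1):
        for j in range(len(w) + 1):
            for k in range(len(u) + 1):
                if (i, j, k) == (0, 0, 0):
                    continue
                best, best_step = None, None
                for di, dj, dk in steps:
                    prev = (i - di, j - dj, k - dk)
                    if min(prev) < 0:
                        continue  # step would leave the grid
                    column = (
                        v[i - 1] if di else GAP,
                        w[j - 1] if dj else GAP,
                        u[k - 1] if dk else GAP,
                    )
                    score = s[prev] + delta(column)
                    if best is None or score > best:
                        best, best_step = score, (di, dj, dk)
                s[i, j, k] = best
                back[i, j, k] = best_step

    # Trace back from the far corner to recover the alignment columns.
    columns = []
    i, j, k = len(v), len(w), len(u)
    while (i, j, k) != (0, 0, 0):
        di, dj, dk = back[i, j, k]
        columns.append((v[i - 1] if di else GAP, w[j - 1] if dj else GAP, u[k - 1] if dk else GAP))
        i, j, k = i - di, j - dj, k - dk
    columns.reverse()
    rows = ["".join(col[r] for col in columns) for r in range(3)]
    return s[len(v), len(w), len(u)], rows


score, rows = align3("ATGC", "AATC", "ATGC")
print(score)
for row in rows:
    print(row)
```

With these assumed scores the optimal SP score for ATGC, AATC, ATGC is 4, achieved by the gap-free alignment; a different δ (for example, negated entropy) could favor a different alignment.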

21 Multiple Alignment: Running Time
For 3 sequences of length n, the run time is $7n^3$, i.e., $O(n^3)$.
For k sequences, building a k-dimensional edit graph has run time $(2^k - 1)\,n^k$, i.e., $O(2^k n^k)$.
Conclusion: the dynamic programming approach for aligning two sequences extends easily to k sequences, but it is impractical because its running time is exponential in k.
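The bound $(2^k - 1)\,n^k$ is simply (number of grid cells) × (incoming edges per cell). A tiny Python sketch evaluating it shows how fast it grows; the sequence length n = 100 is an arbitrary illustrative choice.

```python
def dp_work(n: int, k: int) -> int:
    """Edge evaluations for k sequences of length n: (2^k - 1) * n^k."""
    return (2**k - 1) * n**k


for k in range(2, 6):
    print(k, dp_work(100, k))
```

For n = 100 this gives 30,000 edge evaluations for k = 2, 7,000,000 for k = 3, 1.5 × 10^9 for k = 4 and 3.1 × 10^11 for k = 5; each additional sequence multiplies the work by roughly 2n.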

22 In the next class, we will cover more efficient algorithms: progressive alignment.

23 What You Should Know
How to score a multi-sequence alignment
How the dynamic programming algorithm works
The computational complexity of dynamic programming algorithms
