Multiple Sequence Alignment. Department of Computer Science and Information Engineering, National Chi Nan University. 黃光璿, 2004/05/31
What is a multiple alignment?
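A minimal sketch of the definition, assuming the usual convention: a multiple alignment of sequences s1, ..., sk is a set of k rows of equal length. Each row is its sequence with gap characters ('-') inserted, and no column consists only of gaps. The helper names `is_valid_msa` and `columns` are hypothetical and exist only for illustration.

```python
GAP = "-"

def is_valid_msa(rows, seqs):
    """Hypothetical helper: check that `rows` is a multiple alignment of `seqs`."""
    if len(rows) != len(seqs) or not rows:
        return False
    width = len(rows[0])
    # Every row must have the same number of columns.
    if any(len(r) != width for r in rows):
        return False
    # Deleting the gaps from each row must give back the original sequence.
    if any(r.replace(GAP, "") != s for r, s in zip(rows, seqs)):
        return False
    # By convention, no column may consist of gaps only.
    return all(any(r[j] != GAP for r in rows) for j in range(width))

def columns(rows):
    """Hypothetical helper: view the alignment column by column."""
    return ["".join(col) for col in zip(*rows)]

seqs = ["GATTACA", "GATCA", "GTTACA"]
rows = ["GATTACA", "GA--TCA", "G-TTACA"]
print(is_valid_msa(rows, seqs))  # True
print(columns(rows))  # ['GGG', 'AA-', 'T-T', 'T-T', 'ATA', 'CCC', 'AAA']
```

Reading the alignment column by column matters because most scoring schemes for multiple alignments treat each column as a unit.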
An alignment of ten I-set immunoglobulin superfamily domains (figure).
Motivation: a multiple alignment may suggest that the proteins share a common structure, a common function, or a common evolutionary origin.
Issues: (1) How should a meaningful scoring function for an alignment be defined? It may aim at an evolutionarily correct alignment (more difficult!) or at a structurally correct alignment. (2) How do we find the best alignment under that score? By algorithms.
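One common scoring function, and the one behind the sum-of-pairs (SP) dynamic programming discussed in these slides, scores each column independently and adds up a pairwise score over every pair of rows in that column. The sketch below assumes illustrative values (match +1, mismatch -1, gap -2, gap-gap 0). A real protein aligner would use a substitution matrix and usually affine gap costs, so `pair_score` and `sp_score` are hypothetical helpers, not a specific tool's scoring.

```python
from itertools import combinations

GAP = "-"

def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    # Illustrative linear scores (assumed); gap-gap pairs are conventionally scored 0.
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return gap
    return match if a == b else mismatch

def sp_score(rows):
    """Sum-of-pairs: add pair_score over every column and every pair of rows."""
    total = 0
    for col in zip(*rows):
        for a, b in combinations(col, 2):
            total += pair_score(a, b)
    return total

print(sp_score(["GATTACA", "GA--TCA", "G-TTACA"]))  # -1
```

The SP score is easy to compute, but it counts every pair equally. With many closely related sequences, this can overweight the evolutionary signal they share, which is one reason defining a "meaningful" score is listed as an issue.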
Three types of alignment problems: DNA; protein (residues may be joined by disulfide bonds); RNA (more difficult because of long-range correlations). We focus on aligning DNA or protein sequences.
To prove that a computational problem is NP-hard, we give a polynomial-time reduction from a known NP-complete (or NP-hard) problem to this problem.
When a computational problem is NP-hard, we can deal with it by: heuristics (convince other people through experiments); approximation (how do we analyze the performance?); randomization (how do we design a reasonable algorithm?).
Branch-and-bound heuristic for the dynamic programming (DP) algorithm for the sum-of-pairs (SP) score: Carrillo & Lipman (1988). The idea was implemented in the well-known program MSA (Lipman, Altschul & Kececioglu, 1989). MSA can align 6 sequences of length about 200 in reasonable time.
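For orientation, here is the plain DP for the SP score that the Carrillo & Lipman bound speeds up. It generalizes pairwise DP to a k-dimensional lattice: cell (i1, ..., ik) is reached by one of 2^k - 1 steps, and each step decides which sequences advance and which receive a gap. This is a minimal sketch, assuming illustrative linear gap scores (match +1, mismatch -1, gap -2, gap-gap 0). The names `pair_score`, `column_score`, and `sp_align` are hypothetical. It does not implement the Carrillo-Lipman pruning itself.

```python
from itertools import combinations, product

GAP = "-"

def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    # Illustrative linear scores (assumed); gap-gap pairs score 0.
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return gap
    return match if a == b else mismatch

def column_score(col):
    # SP score of a single column: sum over all pairs of rows.
    return sum(pair_score(a, b) for a, b in combinations(col, 2))

def sp_align(seqs):
    """Exact SP-optimal alignment by k-dimensional DP (no pruning)."""
    k = len(seqs)
    dims = [len(s) + 1 for s in seqs]
    origin = (0,) * k
    # Nonzero 0/1 step vectors: 1 = take the next residue, 0 = insert a gap.
    steps = [d for d in product((0, 1), repeat=k) if any(d)]
    best = {origin: (0, None)}  # cell -> (best score, step used to reach it)
    # product() yields cells in lexicographic order, so every predecessor
    # (cell minus a nonzero step) is filled in before the cell itself.
    for cell in product(*(range(n) for n in dims)):
        if cell == origin:
            continue
        cand = None
        for d in steps:
            prev = tuple(c - x for c, x in zip(cell, d))
            if min(prev) < 0:
                continue
            col = [seqs[i][cell[i] - 1] if d[i] else GAP for i in range(k)]
            score = best[prev][0] + column_score(col)
            if cand is None or score > cand[0]:
                cand = (score, d)
        best[cell] = cand
    # Trace back from the far corner to rebuild the gapped rows.
    end = tuple(n - 1 for n in dims)
    cell, rows = end, [[] for _ in range(k)]
    while any(cell):
        d = best[cell][1]
        for i in range(k):
            rows[i].append(seqs[i][cell[i] - 1] if d[i] else GAP)
        cell = tuple(c - x for c, x in zip(cell, d))
    return best[end][0], ["".join(reversed(r)) for r in rows]

print(sp_align(["AC", "A"]))  # (-1, ['AC', 'A-'])
```

For k sequences of length about n, the lattice has roughly n^k cells, and each cell tries 2^k - 1 steps. This exponential growth is why a bound that skips cells which cannot lie on an optimal path makes a practical difference.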
References and figure sources: 1. R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press.