Published by Ada Griffith. Modified over 9 years ago.
1 Chapter 2 Data Searches and Pairwise Alignments
Department of Computer Science and Information Engineering, National Chi Nan University
黃光璿, 2004/03/08
2 Introduction
What is the difference between acctga and agcta?
a c c t g a
a g c t g a
a g c t - a
3 Nomenclature
4 2.1 Dot Plots
5 2.2 Simple Alignments (no gaps)
6 mutation (substitution): common
insertion, deletion: gap or indel (rare)
scoring scheme: match score, mismatch score
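A minimal sketch of scoring a simple (gap-free) alignment with a match score and a mismatch score. The function name and the default score values are assumptions chosen for illustration; the slide does not fix particular numbers.

```python
def score_ungapped(seq_a, seq_b, match=1, mismatch=0):
    """Score two equal-length sequences position by position, with no gaps.

    The default scores (match=1, mismatch=0) are illustrative assumptions.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("an ungapped alignment needs equal-length sequences")
    return sum(match if a == b else mismatch for a, b in zip(seq_a, seq_b))


# The first two rows of the introductory example differ at one position.
print(score_ungapped("acctga", "agctga"))                     # 5 matches, 1 mismatch
print(score_ungapped("acctga", "agctga", match=1, mismatch=-1))
```

A negative mismatch score penalizes substitutions instead of merely ignoring them, which changes how alignments rank against each other.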
7 2.3 Gaps
8 2.3.1 Gap Penalty
uniform gap penalty
affine gap penalty: origination penalty + length penalty
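The two gap penalty schemes can be sketched as follows. The function names and the numeric defaults are hypothetical; only the split into an origination penalty and a length penalty comes from the slide.

```python
def uniform_gap_penalty(length, per_position=1):
    """Uniform scheme: every gapped position costs the same amount."""
    return per_position * length


def affine_gap_penalty(length, origination=2, per_position=1):
    """Affine scheme: a one-time origination penalty plus a length penalty.

    The values 2 and 1 are illustrative assumptions, not taken from the slides.
    """
    if length == 0:
        return 0
    return origination + per_position * length


# One gap of length 4 versus two separate gaps of length 2.
print(affine_gap_penalty(4), 2 * affine_gap_penalty(2))
print(uniform_gap_penalty(4), 2 * uniform_gap_penalty(2))
```

Under the affine scheme a single long gap is cheaper than several short ones of the same total length, whereas the uniform scheme cannot tell them apart.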
9 2.4 Scoring Matrices
10 The problem with modeling: does nature really operate according to these rules?
11 Modeling
13 Define the odds ratio as
14 2.4.1 PAM Matrices
Dayhoff, Schwartz, Orcutt (1978)
PAM: Point Accepted Mutation
Based on observed substitution rates (Box 2.1)
Input: a set of observed substitution rates
Output: PAM-1 matrix (log-odds matrix)
15 Multiple Alignment (1) Group the sequences with high similarity (> 85% identity).
16 Phylogenetic Tree (2) For each group, build the corresponding phylogenetic tree.
17 Mutation Frequency (3)
Observed substitutions: A->G, I->L, A->G, A->L, C->S, G->A
Counting both directions together, F_{G,A} = 3
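A small sketch of the counting step, assuming that a substitution and its reverse are tallied together (which is what gives 3 for the G/A pair in the example). The function name is hypothetical.

```python
from collections import Counter


def mutation_frequencies(substitutions):
    """Count substitutions per unordered residue pair (A->G and G->A together)."""
    counts = Counter()
    for old, new in substitutions:
        counts[frozenset((old, new))] += 1
    return counts


# The six substitutions listed on the slide.
observed = [("A", "G"), ("I", "L"), ("A", "G"), ("A", "L"), ("C", "S"), ("G", "A")]
freq = mutation_frequencies(observed)
f_ga = freq[frozenset(("G", "A"))]
print(f_ga)
```

Using an unordered pair as the key makes the resulting frequency table symmetric by construction.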
18 Relative Mutability (4)
19 Mutation Probability (5)
20 Odds Ratio (6)
21 Log-Odds Ratio (7)
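A hedged sketch of the final step only: turning an odds ratio into an integer log-odds score. It assumes scores are expressed in units of (ln 2)/2 nats, the unit quoted for the PAM 120 table in these slides; the function name is hypothetical and the odds ratios passed in are made-up inputs, not values derived from real substitution data.

```python
import math

HALF_BIT = math.log(2) / 2  # (ln 2)/2 nats, assumed scoring unit


def log_odds_score(odds_ratio, unit=HALF_BIT):
    """Convert an odds ratio into a rounded log-odds score in the given unit."""
    return round(math.log(odds_ratio) / unit)


# Illustrative odds ratios: twice as likely as chance, chance, half as likely.
for odds in (2.0, 1.0, 0.5):
    print(odds, log_odds_score(odds))
```

Working with logarithms means the score of an alignment is a sum over columns instead of a product, and pairs seen less often than chance get negative scores.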
22 Which PAM matrix is the most appropriate? It depends on:
the length of the sequences
how closely the sequences are believed to be related
PAM 120 for database searches
PAM 200 for comparing two specific proteins
23 2.4.2 BLOSUM Matrices
Henikoff & Henikoff (1992)
PAM-k: the larger k is, the less similar the sequences
BLOSUM-k: the larger k is, the more similar the sequences
BLOSUM62: for ungapped matching
BLOSUM50: for gapped matching
24 2.5 Dynamic Programming The Needleman and Wunsch Algorithm (Global Alignment)
26 Alignment Graph
28
A C - - T C G
A C A G T A G
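A minimal sketch of Needleman-Wunsch global alignment applied to the two sequences in the alignment above. The scoring values (match 1, mismatch 0, gap -1 per position) are assumptions for illustration, and when several alignments tie for the best score the traceback may return a different one from the slide.

```python
def needleman_wunsch(a, b, match=1, mismatch=0, gap=-1):
    """Global alignment by dynamic programming; returns (score, row_a, row_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # a prefix of `a` aligned to gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap          # a prefix of `b` aligned to gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            row_a.append(a[i - 1]); row_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append("-"); i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


best, top, bottom = needleman_wunsch("ACTCG", "ACAGTAG")
print(best)
print(top)
print(bottom)
```

The table has (n+1) x (m+1) cells and each is filled in constant time, so both time and memory grow with the product of the two sequence lengths.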
29 Complexity
30 2.6 Global and Local Alignments Semi-global alignment Local alignment
31 2.6.1 Semi-global Alignments
A A C A C G T G T C T
- - - A C G T - - - -
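A sketch of semi-global scoring for the example above, assuming that gaps at either end of either sequence are free while internal gaps cost -1, with match 1 and mismatch 0. The function name and scores are illustrative assumptions.

```python
def semiglobal_score(a, b, match=1, mismatch=0, gap=-1):
    """Best alignment score when leading and trailing gaps are not penalized."""
    n, m = len(a), len(b)
    # Row 0 and column 0 stay at zero: leading gaps are free.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trailing gaps are free: take the best value in the last row or column.
    last_column = [row[m] for row in score]
    return max(max(score[n]), max(last_column))


print(semiglobal_score("AACACGTGTCT", "ACGT"))
```

With the same scores, a fully global alignment of these two sequences would pay for all seven end gaps, which is why the semi-global variant suits a short sequence contained in a longer one.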
33 2.6.2 Local Alignment The Smith-Waterman Alignment
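A compact sketch of Smith-Waterman local alignment. The scores (match 1, mismatch -1, gap -1) and the two example sequences are assumptions made up for illustration; local alignment needs negative mismatch and gap scores so that poor regions drop out.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Local alignment; returns (score, aligned part of a, aligned part of b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            # Scores never drop below zero, so an alignment can start anywhere.
            score[i][j] = max(0,
                              score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_i, best_j = score[i][j], i, j
    # Trace back from the highest-scoring cell until a zero is reached.
    row_a, row_b = [], []
    i, j = best_i, best_j
    while i > 0 and j > 0 and score[i][j] > 0:
        pair = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + pair:
            row_a.append(a[i - 1]); row_b.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append("-"); i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(row_a)), "".join(reversed(row_b))


local_score, part_a, part_b = smith_waterman("AACACGTGTCT", "GGACGTCC")
print(local_score, part_a, part_b)
```

The only changes from the global algorithm are the zero floor in the recurrence and starting the traceback at the maximum cell instead of the corner.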
35 2.7 Database Searches BLAST and its relatives FASTA and related algorithms
36 2.7.1 BLAST and Its Relatives
Program   Database                Query
BLASTN    Nucleotide              Nucleotide
BLASTP    Protein                 Protein
BLASTX    Protein                 Nucleotide -> Protein
TBLASTN   Nucleotide -> Protein   Protein
TBLASTX   Nucleotide -> Protein   Nucleotide -> Protein
37 BLASTP Using PAM or BLOSUM matrices
38 2.7.2 FASTA and Related Algorithms
Improves on the dot plot and band search.
1. Preprocess the target sequence: identify the positions of each word (for amino acids with word length 1, a 20-entry array).
2. Scan the query sequence: compute the shifts of the query that align each word with the target.
3. Find the mode (most frequent value) of the shifts.
4. Join the possible shifts into one new target sequence, then perform the full local alignment algorithm.
39
Target: FAMLGFIKYLPGCM
Query:  TGFIKYLPGACT
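The first three FASTA steps can be sketched on this target and query as follows. The function names are hypothetical, a dictionary stands in for the 20-entry array, and a shift is defined here (by assumption) as target position minus query position.

```python
from collections import Counter, defaultdict


def word_positions(target, k=1):
    """Step 1: record every position at which each length-k word occurs."""
    positions = defaultdict(list)
    for i in range(len(target) - k + 1):
        positions[target[i:i + k]].append(i)
    return positions


def shift_histogram(target, query, k=1):
    """Step 2: count the shift (target index - query index) of every word hit."""
    positions = word_positions(target, k)
    shifts = Counter()
    for j in range(len(query) - k + 1):
        for i in positions.get(query[j:j + k], ()):
            shifts[i - j] += 1
    return shifts


target = "FAMLGFIKYLPGCM"
query = "TGFIKYLPGACT"
shifts = shift_histogram(target, query)
best_shift, support = shifts.most_common(1)[0]   # Step 3: the mode
print(best_shift, support)
```

The dominant shift identifies the diagonal of the dot plot where the two sequences agree, so the expensive local alignment only has to be run in a band around it.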
40 2.7.3 Alignment Scores and Statistical Significance of Database Searches
related model vs. random model
S-score: the alignment score
E-score: the expected number of sequences with score >= S by random chance
P-score: the probability that one or more sequences with score >= S would be found by chance
Lower E and P values are better.
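A short sketch of how the E-score and P-score relate, under the assumption that the number of chance hits scoring at least S follows a Poisson distribution with mean E (the usual assumption in BLAST-style statistics). The function name is hypothetical.

```python
import math


def p_from_e(e_value):
    """Probability of at least one chance hit, assuming Poisson-distributed hits."""
    # 1 - exp(-E), computed with expm1 for accuracy when E is tiny.
    return -math.expm1(-e_value)


for e in (0.001, 0.1, 1.0, 10.0):
    print(e, p_from_e(e))
```

For small values the two numbers are nearly equal, while a large E pushes P toward 1, so E is the more informative figure for weak hits.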
41 length correction Scores
42 PAM 120, in units of (ln 2)/2 nats
    A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V   B   Z   X   *
A   3  -3  -1   0  -3  -1   0   1  -3  -1  -3  -2  -2  -4   1   1   1  -7  -4   0   0  -1  -1  -8
R  -3   6  -1  -3  -4   1  -3  -4   1  -2  -4   2  -1  -5  -1  -1  -2   1  -5  -3  -2  -1  -2  -8
N  -1  -1   4   2  -5   0   1   0   2  -2  -4   1  -3  -4  -2   1   0  -4  -2  -3   3   0  -1  -8
D   0  -3   2   5  -7   1   3   0   0  -3  -5  -1  -4  -7  -3   0  -1  -8  -5  -3   4   3  -2  -8
C  -3  -4  -5  -7   9  -7  -7  -4  -4  -3  -7  -7  -6  -6  -4   0  -3  -8  -1  -3  -6  -7  -4  -8
Q  -1   1   0   1  -7   6   2  -3   3  -3  -2   0  -1  -6   0  -2  -2  -6  -5  -3   0   4  -1  -8
E   0  -3   1   3  -7   2   5  -1  -1  -3  -4  -1  -3  -7  -2  -1  -2  -8  -5  -3   3   4  -1  -8
G   1  -4   0   0  -4  -3  -1   5  -4  -4  -5  -3  -4  -5  -2   1  -1  -8  -6  -2   0  -2  -2  -8
H  -3   1   2   0  -4   3  -1  -4   7  -4  -3  -2  -4  -3  -1  -2  -3  -3  -1  -3   1   1  -2  -8
I  -1  -2  -2  -3  -3  -3  -3  -4  -4   6   1  -3   1   0  -3  -2   0  -6  -2   3  -3  -3  -1  -8
L  -3  -4  -4  -5  -7  -2  -4  -5  -3   1   5  -4   3   0  -3  -4  -3  -3  -2   1  -4  -3  -2  -8
K  -2   2   1  -1  -7   0  -1  -3  -2  -3  -4   5   0  -7  -2  -1  -1  -5  -5  -4   0  -1  -2  -8
M  -2  -1  -3  -4  -6  -1  -3  -4  -4   1   3   0   8  -1  -3  -2  -1  -6  -4   1  -4  -2  -2  -8
F  -4  -5  -4  -7  -6  -6  -7  -5  -3   0   0  -7  -1   8  -5  -3  -4  -1   4  -3  -5  -6  -3  -8
P   1  -1  -2  -3  -4   0  -2  -2  -1  -3  -3  -2  -3  -5   6   1  -1  -7  -6  -2  -2  -1  -2  -8
S   1  -1   1   0   0  -2  -1   1  -2  -2  -4  -1  -2  -3   1   3   2  -2  -3  -2   0  -1  -1  -8
T   1  -2   0  -1  -3  -2  -2  -1  -3   0  -3  -1  -1  -4  -1   2   4  -6  -3   0   0  -2  -1  -8
W  -7   1  -4  -8  -8  -6  -8  -8  -3  -6  -3  -5  -6  -1  -7  -2  -6  12  -2  -8  -6  -7  -5  -8
Y  -4  -5  -2  -5  -1  -5  -5  -6  -1  -2  -2  -5  -4   4  -6  -3  -3  -2   8  -3  -3  -5  -3  -8
V   0  -3  -3  -3  -3  -3  -3  -2  -3   3   1  -4   1  -3  -2  -2   0  -8  -3   5  -3  -3  -1  -8
B   0  -2   3   4  -6   0   3   0   1  -3  -4   0  -4  -5  -2   0   0  -6  -3  -3   4   2  -1  -8
Z  -1  -1   0   3  -7   4   4  -2   1  -3  -3  -1  -2  -6  -1  -1  -2  -7  -5  -3   2   4  -1  -8
X  -1  -2  -1  -2  -4  -1  -1  -2  -2  -1  -2  -2  -2  -3  -2  -1  -1  -5  -3  -1  -1  -1  -2  -8
*  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8
43 Applications
Reconstructing long sequences of DNA from overlapping sequence fragments
Determining physical and genetic maps from probe data under various experimental protocols
Database searching
Comparing two or more sequences for similarities
44 Applications (continued)
Protein structure prediction (building profiles)
Comparing the same gene sequenced by two different labs
45 2.8 Multiple Sequence Alignments
CLUSTAL: D. G. Higgins & P. M. Sharp, 1988
CLUSTALW: sequences are weighted according to how divergent they are from the most closely related pair of sequences. Gaps are weighted differently for different sequences.
46 Summary
the notion of similarity
the scoring system used to rank alignments
the algorithms used to find optimal-scoring alignments
the statistical methods used to evaluate the significance of an alignment score
47 References and figure sources
1. Dan E. Krane and Michael L. Raymer, Fundamental Concepts of Bioinformatics, Benjamin/Cummings, 2003.
2. I. Korf, M. Yandell, and J. Bedell, BLAST, O'Reilly & Associates, 2003. (Distributed by 天瓏.)
3. R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, 1998.
4. J. M. Berg, J. L. Tymoczko, and L. Stryer, Biochemistry, Fifth Edition, 2001.