Presentation is loading. Please wait.

Presentation is loading. Please wait.

Introduction to bioinformatics Lecture 5 Pair-wise sequence alignment

Similar presentations


Presentation on theme: "Introduction to bioinformatics Lecture 5 Pair-wise sequence alignment"— Presentation transcript:

1 Introduction to bioinformatics Lecture 5 Pair-wise sequence alignment

2 Bioinformatics “Nothing in Biology makes sense except in the light of evolution” (Theodosius Dobzhansky ( )) “Nothing in bioinformatics makes sense except in the light of Biology”

3 Divergent evolution Ancestral sequence: ABCD ACCD (B C) ABD (C ø)
ACCD or ACCD Pairwise Alignment AB─D A─BD mutation deletion

4 Divergent evolution Ancestral sequence: ABCD ACCD (B C) ABD (C ø)
ACCD or ACCD Pairwise Alignment AB─D A─BD mutation deletion true alignment

5 A bit on divergent evolution
Ancestral sequence G C A C One substitution - one visible Two substitutions - one visible Sequence 1 Sequence 2 (c) G (d) G 1: ACCTGTAATC 2: ACGTGCGATC * ** D = 3/10 (fraction different sites (nucleotides)) G A A A Back mutation - not visible Two substitutions - none visible G

6 Convergent evolution Often with shorter motifs (e.g. active sites)
Motif (function) has evolved more than once independently, e.g. starting with two very different sequences adopting different folds Sequences and associated structures remain different, but (functional) motif can become identical Classical example: serine proteinase and chymotrypsin

7 Serine proteinase (subtilisin) and chymotrypsin
Different evolutionary origins, there are Similarities in the reaction mechanisms. Chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base. The geometric orientations of the catalytic residues are similar between families, despite different protein folds. The linear arrangements of the catalytic residues reflect different family relationships. For example the catalytic triad in the chymotrypsin clan (SA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC).

8 A protein sequence alignment
MSTGAVLIY--TSILIKECHAMPAGNE----- ---GGILLFHRTHELIKESHAMANDEGGSNNS * * * **** *** A DNA sequence alignment attcgttggcaaatcgcccctatccggccttaa att---tggcggatcg-cctctacgggcc---- *** **** **** ** ******

9 Searching for similarities
What is the function of the new gene? The “lazy” investigation (i.e., no biologial experiments, just bioinformatics techniques): – Find a set of similar protein sequences to the unknown sequence – Identify similarities and differences – For long proteins: identify domains

10 Evolutionary and functional relationships
Reconstruct evolutionary relation: Based on sequence -Identity (simplest method) -Similarity Homology (common ancestry: the ultimate goal) Other (e.g., 3D structure) Functional relation: Sequence Structure Function

11 Searching for similarities
Common ancestry is more interesting: Makes it more likely that genes share the same function Homology: sharing a common ancestor – a binary property (yes/no) – it’s a nice tool: When (an unknown) gene X is homologous to (a known) gene G it means that we gain a lot of information on X: what we know about G can be transferred to X as a good suggestion.

12 How to go from DNA to protein sequence
A piece of double stranded DNA: 5’ attcgttggcaaatcgcccctatccggc 3’ 3’ taagcaaccgtttagcggggataggccg 5’ DNA direction is from 5’ to 3’

13 How to go from DNA to protein sequence
6-frame translation using the codon table (last lecture): 5’ attcgttggcaaatcgcccctatccggc 3’ 3’ taagcaaccgtttagcggggataggccg 5’

14 Evolution and three-dimensional protein structure information
Isocitrate dehydrogenase: The distance from the active site (in yellow) determines the rate of evolution (red = fast evolution, blue = slow evolution) Dean, A. M. and G. B. Golding: Pacific Symposium on Bioinformatics 2000

15 Bioinformatics tool tool Algorithm Data Biological Interpretation
(model)

16 Example today: Pairwise sequence alignment needs sense of evolution
Global dynamic programming MDAGSTVILCFVG Evolution M D A S T I L C G Amino Acid Exchange Matrix Search matrix MDAGSTVILCFVG- Gap penalties (open,extension) MDAAST-ILC--GS

17 How to determine similarity
Frequent evolutionary events at the DNA level: 1. Substitution 2. Insertion, deletion 3. Duplication 4. Inversion We will restrict ourselves to these events

18 A protein sequence alignment
MSTGAVLIY--TSILIKECHAMPAGNE----- ---GGILLFHRTHELIKESHAMANDEGGSNNS * * * **** *** A DNA sequence alignment attcgttggcaaatcgcccctatccggccttaa att---tggcggatcg-cctctacgggcc---- *** **** **** ** ******

19 Dynamic programming Scoring alignments
– Substitution (or match/mismatch) • DNA • proteins – Gap penalty • Linear: gp(k)=ak • Affine: gp(k)=b+ak • Concave, e.g.: gp(k)=log(k) The score for an alignment is the sum of the scores of all alignment columns

20 Dynamic programming Scoring alignments
Sa,b = gp(k) = gapinit + kgapextension affine gap penalties

21 DNA: define a score for match/mismatch of letters
Simple: Used in genome alignments: A C G T 1 -1 A C G T 91 -114 -31 -123 100 -125

22 Dynamic programming Scoring alignments
T D W V T A L K T D W L - - I K 2020 10 1 Affine gap penalties (open, extension) Amino Acid Exchange Matrix Score: s(T,T)+s(D,D)+s(W,W)+s(V,L)-Po-2Px + +s(L,I)+s(K,K)

23 Amino acid exchange matrices
2020 How do we get one? And how do we get associated gap penalties? First systematic method to derive a.a. exchange matrices by Margaret Dayhoff et al. (1968) – Atlas of Protein Structure

24 amino acid exchange matrix (log odds)
Q E G H I L K M F P S T W Y V B Z A R N D C Q E G H I L K M F P S T W Y V B Z PAM250 matrix amino acid exchange matrix (log odds) Positive exchange values denote mutations that are more likely than randomly expected, while negative numbers correspond to avoided mutations compared to the randomly expected situation

25 Amino acid exchange matrices
Amino acids are not equal: 1. Some are easily substituted because they have similar: • physico-chemical properties • structure 2. Some mutations between amino acids occur more often due to similar codons The two above observations give us ways to define substitution matrices

26 Pair-wise alignment T D W V T A L K T D W L - - I K
Combinatorial explosion - 1 gap in 1 sequence: n+1 possibilities - 2 gaps in 1 sequence: (n+1)n - 3 gaps in 1 sequence: (n+1)n(n-1), etc. 2n (2n)! n = ~ n (n!) n 2 sequences of a.a.: ~1088 alignments 2 sequences of 1000 a.a.: ~10600 alignments!


Download ppt "Introduction to bioinformatics Lecture 5 Pair-wise sequence alignment"

Similar presentations


Ads by Google