©CMBI 2001 Alignment Most alignment programs create an alignment that represents what happened during evolution at the DNA level. To carry over information.


©CMBI 2001 Alignment Most alignment programs create an alignment that represents what happened during evolution at the DNA level. To transfer information from a well-studied sequence to a newly determined one, we need an alignment that represents the protein structures of today.

©CMBI 2005 Sequence Alignment In phylogeny, one wants to line up residues that came from a common ancestor. For information transfer, one wants to line up residues at similar positions in the structure. A gap represents an insertion or a deletion.

©CMBI 2005 Global versus Local Alignment [Figure: global alignment versus local alignment]

©CMBI 2005 Global Alignment Align two sequences from “head to toe”, i.e. from 5’ end to 3’ end (nucleotide sequences) or from N-terminus to C-terminus (protein sequences). Algorithm published by: Needleman, S.B. and Wunsch, C.D. (1970) “A general method applicable to the search for similarities in the amino acid sequence of two proteins”. J. Mol. Biol. 48:

©CMBI 2005 Global Alignment
Sequences: aacttgagc and ctgagt
aacttgagc
--c-tgagt
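A minimal sketch of the Needleman-Wunsch dynamic programming recursion, to show how a global alignment like the one above is computed. The scores (match +1, mismatch -1, linear gap penalty -2) are illustrative assumptions, not the values behind the slide, so the optimal alignment it returns for these two sequences need not be identical to the one shown.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b). Scoring values are
    illustrative assumptions.
    """
    def s(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # F[i][j] = best score for aligning a[:i] with b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        F[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(a[i - 1], b[j - 1]),  # (mis)match
                          F[i - 1][j] + gap,                         # gap in b
                          F[i][j - 1] + gap)                         # gap in a
    # Trace back from the bottom-right cell, so both sequences are used end to end
    top, bottom, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))


score, top, bottom = needleman_wunsch("aacttgagc", "ctgagt")
print(score)
print(top)
print(bottom)
```

Because the traceback starts in the bottom-right cell and ends in the top-left cell, every residue of both sequences appears in the result, which is exactly what "head to toe" means.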

©CMBI 2005 Local Alignment Locate the region(s) with a high degree of similarity in two sequences. Algorithm published by: Smith, T.F. and Waterman, M.S. (1981) “Identification of common molecular subsequences”. J. Mol. Biol. 147:

©CMBI 2005 Local Alignment
Sequences: aacttgagc and ctgagt
cttgag
ct-gag
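A minimal Smith-Waterman sketch, assuming the same illustrative scores as before (match +1, mismatch -1, linear gap -2). The two differences from global alignment are that no cell may drop below zero, and the traceback starts at the highest-scoring cell and stops at the first zero. With these particular scores the best local region found for the slide's sequences is the ungapped `tgag`; the slide's `cttgag`/`ct-gag` would win under a milder gap penalty.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-2):
    """Local alignment with a linear gap penalty.

    Returns (best_score, aligned_a, aligned_b). Scoring values are
    illustrative assumptions.
    """
    def s(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]  # first row/column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(0,  # start a new local alignment here
                          H[i - 1][j - 1] + s(a[i - 1], b[j - 1]),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until the score drops to zero
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + s(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("aacttgagc", "ctgagt"))
```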

©CMBI 2005 Gap Penalty Functions
Linear: the penalty rises linearly with the length of the gap.
Affine: the penalty has a gap-opening component and a separate length component.
Probabilistic: the penalties may depend upon the character of the residues involved.
Other functions: for example, the penalty first rises fast, but levels off at greater length values.
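The three length-dependent shapes listed above can be sketched as plain functions of gap length. All parameter values are assumptions chosen for illustration, and the logarithmic form is just one hypothetical way to get a penalty that rises fast and then flattens. The residue-dependent (probabilistic) variant is not shown.

```python
import math


def linear_gap(length, per_residue=2.0):
    """Linear: penalty proportional to gap length."""
    return per_residue * length


def affine_gap(length, gap_open=10.0, gap_extend=0.5):
    """Affine: one opening cost plus a cost per gap position.

    Programs differ in convention (some charge the extension only from
    the second gap position on); this is one common variant.
    """
    return 0.0 if length == 0 else gap_open + gap_extend * length


def log_gap(length, gap_open=10.0, scale=2.0):
    """Hypothetical logarithmic penalty: grows ever more slowly with length."""
    return 0.0 if length == 0 else gap_open + scale * math.log(length)


for n in (1, 2, 5, 20):
    print(n, linear_gap(n), affine_gap(n), round(log_gap(n), 2))
```

The affine form is popular because one long gap (a single insertion or deletion event) is penalised much less than many short gaps of the same total length.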

©CMBI 2005 Significance of Alignment How significant is the alignment that we have found? Or, put differently: how much does the alignment score that we found differ from the scores obtained by aligning random sequences to our sequence?

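©CMBI 2005 Calculating Significance Repeat N times (N > 100): randomise sequence A by shuffling its residues in a random fashion, align the randomised sequence A with sequence B, and record the alignment score. Calculate the mean $\bar{S}_{\text{random}}$ and the standard deviation $\sigma_{\text{random}}$ of these random scores. Then calculate the Z-score of the genuine alignment score $S_{\text{genuine}}$:
$$Z = \frac{S_{\text{genuine}} - \bar{S}_{\text{random}}}{\sigma_{\text{random}}}$$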
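A sketch of the shuffle test described above, assuming a global alignment score with illustrative scoring values (match +1, mismatch -1, linear gap -2) and N = 200 shuffles. The slide does not prescribe a particular alignment method or scoring scheme, so both are assumptions here. Shuffling keeps the amino acid composition of sequence A while destroying its order.

```python
import random
import statistics


def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score only, keeping two rows of the DP matrix."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


def shuffle_z_score(a, b, n=200, seed=0):
    """Z = (S_genuine - mean of shuffled scores) / s.d. of shuffled scores."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    genuine = global_score(a, b)
    random_scores = []
    for _ in range(n):
        residues = list(a)
        rng.shuffle(residues)  # same composition, random order
        random_scores.append(global_score("".join(residues), b))
    mean = statistics.mean(random_scores)
    sd = statistics.stdev(random_scores)
    if sd == 0:
        raise ValueError("all shuffled scores are equal; Z is undefined")
    return (genuine - mean) / sd


print(shuffle_z_score("HEAGAWGHEE", "HEAGAWGHEE"))
```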

©CMBI 2005 Significance of Alignment [Figure: distribution of alignment scores for random matches compared with a genuine match; x-axis: alignment score]

©CMBI 2005 Significance of Alignment [Figure: distribution of alignment scores for random matches compared with a single random match; x-axis: alignment score]

©CMBI 2001 The amino acids Most information that enters the alignment procedure comes from the physicochemical properties of the amino acids. Example: which is the better alignment, left or right?
Left:
CPISRTWASIFRCW
CPISRT---LFRCW
Right:
CPISRTWASIFRCW
CPISRTL---FRCW
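One way to answer the question numerically is to score both alignments column by column with a substitution matrix. The sketch below hard-codes only the BLOSUM62 entries these two alignments need (check them against an official copy of the matrix) and assumes an affine gap penalty of 10 to open and 1 per extra gap position. Both alignments contain one gap of length 3, so the gap cost cancels and the difference comes from aligning L with I (+2, both hydrophobic) versus L with W (-2).

```python
# Excerpt of BLOSUM62: only the pairs that occur in this example
BLOSUM62_EXCERPT = {
    ("C", "C"): 9, ("P", "P"): 7, ("I", "I"): 4, ("S", "S"): 4,
    ("R", "R"): 5, ("T", "T"): 5, ("F", "F"): 6, ("W", "W"): 11,
    ("I", "L"): 2, ("W", "L"): -2,
}


def pair_score(x, y):
    """Symmetric lookup; raises KeyError for pairs not in the excerpt."""
    if (x, y) in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[(x, y)]
    return BLOSUM62_EXCERPT[(y, x)]


def alignment_score(top, bottom, gap_open=10, gap_extend=1):
    """Sum of column scores with an (assumed) affine gap penalty."""
    score, in_gap = 0, False
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            score -= gap_extend if in_gap else gap_open
            in_gap = True
        else:
            score += pair_score(x, y)
            in_gap = False
    return score


left = alignment_score("CPISRTWASIFRCW", "CPISRT---LFRCW")
right = alignment_score("CPISRTWASIFRCW", "CPISRTL---FRCW")
print(left, right)
```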

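©CMBI 2001 A difficult alignment problem
AYAYAYAYSY
LGLPLPLPLP
These two sequences have no identical residues at any position, so they are hard to align directly. Now add a third sequence:
AGAPAPAPSP
So, in an alignment of more than 2 sequences you can find more information than from just the 2 sequences you are interested in. How do we make these multiple sequence alignments?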

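©CMBI 2001 A difficult alignment problem solved
AYAYAYAYSY
AGAPAPAPSP
LGLPLPLPLP
The middle sequence shares its odd positions (A, A, A, A, S) with the top sequence and its even positions (G, P, P, P, P) with the bottom sequence, so it bridges the two.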
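The bridging effect can be checked by listing, for each pair of these ungapped, equal-length sequences, the positions (1-based, as in the text) where they carry the same residue. This is only an identity count, not an alignment algorithm.

```python
from itertools import combinations


def shared_positions(s, t):
    """1-based positions where two equal-length sequences agree."""
    if len(s) != len(t):
        raise ValueError("sequences must have equal length")
    return [i + 1 for i, (x, y) in enumerate(zip(s, t)) if x == y]


SEQS = ["AYAYAYAYSY", "AGAPAPAPSP", "LGLPLPLPLP"]

for s, t in combinations(SEQS, 2):
    print(s, t, shared_positions(s, t))
```

The outer pair shares nothing, whereas each of them shares five positions with the middle sequence.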

©CMBI 2001 Alignment order
Sequences: MIESAYTDSW and QFEKSYVTDY
-MIESAYTDSW
QFEKSYVTDY-

©CMBI 2001 Alignment order
Sequences: MIESAYTDSW, QFEKSYVTDY and QWERTYASNF
-MIESAYTDSW
QFEKSYVTDY-
QWERTYASNF-

©CMBI 2001 Conclusion Align the sequences that look most like each other first. That way you ‘build up information’ from the alignments that are most likely to be correct.

©CMBI 2001 Alignment order To know which sequences look most like each other, you need to do all pairwise alignments first. This is exactly what CLUSTAL does. CLUSTAL uses the pairwise results to build a guide tree, and that tree determines the order in which the multiple sequence alignment is built up.

©CMBI 2001 MSA and trees Take, for example, the three sequences:
1 ASWTFGHK
2 GTWSFANR
3 ATWAFADR
You see immediately that 2 and 3 are close, while 1 is further away. So the tree will look roughly like this: [Figure: tree with leaves 3, 2 and 1, in which 2 and 3 are nearest neighbours]
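A minimal way to make "you see immediately" quantitative is to count mismatching positions between each pair of these ungapped, equal-length sequences. Using this count as the distance is an assumption, but it reproduces the values d(1,2) = 6 and d(1,3) = 5 given later in the deck.

```python
from itertools import combinations


def mismatches(s, t):
    """Number of positions at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(s, t))


SEQS = {1: "ASWTFGHK", 2: "GTWSFANR", 3: "ATWAFADR"}

d = {(i, j): mismatches(SEQS[i], SEQS[j]) for i, j in combinations(SEQS, 2)}
closest = min(d, key=d.get)
print(d)
print("closest pair:", closest)
```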

©CMBI 2001 Aligning sequences; start with distances [Figure: matrix of pair-wise distances between five sequences, A to E] D and E are the closest pair. Take them, and collapse the matrix by one row/column.
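The "take the closest pair, collapse the matrix by one row/column" loop can be sketched as average-linkage (UPGMA-style) clustering. This is a simplification chosen for illustration: CLUSTAL's own guide-tree method (neighbour-joining in CLUSTAL W) computes the new distances differently. Since the five-sequence matrix is only in the figure, the usage example applies the loop to the three sequences from the previous slide, with d(1,2) = 6, d(1,3) = 5 and d(2,3) = 3 counted as mismatching positions.

```python
def upgma_merge_order(labels, dist):
    """Return the clusters in the order they are formed.

    labels: list of string names; dist: dict mapping (name, name) -> distance.
    """
    def key(a, b):
        return (a, b) if a < b else (b, a)

    d = {key(a, b): v for (a, b), v in dist.items()}
    size = {lab: 1 for lab in labels}  # cluster name -> number of sequences
    merges = []
    while len(size) > 1:
        a, b = min(d, key=d.get)  # closest pair of clusters
        new = f"({a},{b})"
        merges.append(new)
        # Average-linkage distance from the new cluster to every other cluster
        for c in size:
            if c not in (a, b):
                d[key(new, c)] = (size[a] * d[key(a, c)]
                                  + size[b] * d[key(b, c)]) / (size[a] + size[b])
        size[new] = size.pop(a) + size.pop(b)
        # Collapse the matrix: drop every row/column that involved a or b
        d = {k: v for k, v in d.items() if a not in k and b not in k}
    return merges


print(upgma_merge_order(["1", "2", "3"],
                        {("1", "2"): 6, ("1", "3"): 5, ("2", "3"): 3}))
```

In a progressive aligner, the merge order is the alignment order: the closest pair is aligned first and the remaining sequences or sub-alignments are added as the tree is climbed.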

©CMBI 2001 Aligning sequences [Figure: tree with leaves D, E, A and B]

©CMBI 2001 Aligning sequences [Figure: tree with leaves D, E, C, A and B]


©CMBI 2001 The problem is actually bigger
1 ASWTFGHK
2 GTWSFANR
3 ATWAFADR
d(i,j) is the distance between sequences i and j, here the number of positions at which they differ: d(1,2)=6; d(1,3)=5; d(2,3)=3. So a perfect representation would be: [Figure: perfect representation of the three pair-wise distances] But what if a 4th sequence is added with d(1,4)=4, d(2,4)=5, d(3,4)=4? Where would that sequence sit?

©CMBI 2001 So, nice tree, but what did we actually do?
1) We determined a distance measure.
2) We measured all pair-wise distances.
3) We reduced the dimensionality of the space of the problem.
4) We used an algorithm to visualize the result.
In a way, we projected the hyperspace in which we can perfectly describe all pair-wise distances onto a 1-dimensional line. What does this sentence mean?

©CMBI 2001 Back to sequences: if we have N sequences, we can in general only draw their distance matrix exactly in an (N-1)-dimensional space. By the time it is a tree, how many dimensions, and how much information, have we lost? Perhaps we should cluster in a different way?
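A small worked case of the N-1 dimensions claim, for N = 3: place sequence 1 at the origin and sequence 2 on the x-axis, then solve for the position of sequence 3. With d(1,2) = 6, d(1,3) = 5 and d(2,3) = 3 (counted as mismatching positions), the third point needs a non-zero y-coordinate, so three sequences need two dimensions, and squeezing them onto a single line necessarily distorts at least one distance.

```python
import math


def place_three_points(d12, d13, d23):
    """Coordinates in the plane that reproduce the three distances exactly."""
    # Point 1 at (0, 0), point 2 at (d12, 0); point 3 solves
    # x^2 + y^2 = d13^2 and (x - d12)^2 + y^2 = d23^2.
    x = (d12 ** 2 + d13 ** 2 - d23 ** 2) / (2 * d12)
    y_squared = d13 ** 2 - x ** 2
    if y_squared < 0:
        raise ValueError("distances violate the triangle inequality")
    return (0.0, 0.0), (float(d12), 0.0), (x, math.sqrt(y_squared))


p1, p2, p3 = place_three_points(6, 5, 3)
print(p3)  # y > 0: a second dimension is needed
```

The points would fit on a line only if one distance equalled the sum of the other two; here 6 is less than 5 + 3.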

©CMBI 2001 Other algorithms Multiple sequence alignment can also be done with an iterative ‘profile’ alignment:
A) Make an alignment of a few sequences that align well.
B) Align all sequences using the profile derived from this alignment.

©CMBI What is a profile? Normally, we use a PAM-like matrix to determine the score for each possible match in an alignment. This assumes that a match between I and E scores the same at every position in the alignment. But it doesn't.

©CMBI What is a profile?
QWERTYIPASEF
QWEKSFIPGSEY
NWERTMVPVSEM
QFEKTYLPSSEY
NFIKTLMPATEF
QYIRSLIPAGEM
NYIQSLIPSTEL
QFIRSLFPSSEI
At position 1 (column 3), E and I are both OK. At position 2 (column 7), I is OK, but E surely not. At position 3 (column 11), E is OK, but I surely not.

©CMBI What is a profile? The knowledge about which residue types are good at a certain position in the multiple sequence alignment can be expressed in a profile. A profile holds, for each position, 20 scores for the 20 residue types, and sometimes also two values for position-specific gap-opening and gap-elongation penalties.
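A minimal profile sketch built from the eight sequences above: for each column it stores 20 log-odds scores, log2(observed frequency / background frequency). The pseudocount of 1 and the uniform background of 1/20 are assumptions made for brevity (real profile methods use measured background frequencies and more elaborate pseudocounts), and the position-specific gap penalties are left out.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

MSA = [
    "QWERTYIPASEF", "QWEKSFIPGSEY", "NWERTMVPVSEM", "QFEKTYLPSSEY",
    "NFIKTLMPATEF", "QYIRSLIPAGEM", "NYIQSLIPSTEL", "QFIRSLFPSSEI",
]


def build_profile(alignment, pseudocount=1.0):
    """One dict of 20 log-odds scores per column of an ungapped alignment."""
    n_seqs = len(alignment)
    background = 1 / 20  # assumed uniform background frequency
    profile = []
    for column in zip(*alignment):
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (column.count(aa) + pseudocount) / (n_seqs + 20 * pseudocount)
            scores[aa] = math.log2(freq / background)
        profile.append(scores)
    return profile


profile = build_profile(MSA)
# Columns 3, 7 and 11 (1-based) are indices 2, 6 and 10 here
for col in (2, 6, 10):
    print(col + 1, round(profile[col]["E"], 2), round(profile[col]["I"], 2))
```

Unlike a single PAM-like matrix, the score for E or I now depends on the column: both are rewarded in column 3, only I in column 7, and only E in column 11.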

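©CMBI 2001 Conserved, variable, or in-between
QWERTYASDFGRGH
QWERTYASDTHRPM
QWERTNMKDFGRKC
QWERTNMKDTHRVW
In the original figure, gray marks conserved positions, black marks variable positions, and green marks correlated mutations. Here, columns 1 to 5, 9 and 12 are conserved; columns 13 and 14 are variable; columns 6 to 8 (YAS/NMK) and 10 to 11 (FG/TH) change together, i.e. they carry correlated mutations.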
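The three categories can be approximated with a simple, hypothetical rule: a column is conserved if all sequences agree, and correlated if it splits the sequences into the same groups as at least one other column (with at least one group of two or more). Everything else is variable. Real correlated-mutation analysis uses statistics over many sequences; this rule only works for small, clean examples like the one above.

```python
def column_partition(alignment, col):
    """Group sequence indices by the residue they carry in column col."""
    groups = {}
    for idx, seq in enumerate(alignment):
        groups.setdefault(seq[col], []).append(idx)
    return frozenset(frozenset(g) for g in groups.values())


def classify_columns(alignment):
    parts = [column_partition(alignment, c) for c in range(len(alignment[0]))]
    labels = []
    for part in parts:
        if len(part) == 1:
            labels.append("conserved")   # one residue type in the column
        elif all(len(g) == 1 for g in part):
            labels.append("variable")    # every sequence differs
        elif sum(p == part for p in parts) > 1:
            labels.append("correlated")  # same split as another column
        else:
            labels.append("variable")
    return labels


SEQS = ["QWERTYASDFGRGH", "QWERTYASDTHRPM", "QWERTNMKDFGRKC", "QWERTNMKDTHRVW"]
for i, label in enumerate(classify_columns(SEQS), start=1):
    print(i, label)
```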

©CMBI 2001 Correlated mutations determine the tree shape
1 AGASDFDFGHKM
2 AGASDFDFRRRL
3 AGLPDFMNGHSI
4 AGLPDFMNRRRV
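To see why correlated mutations drive the tree shape, count how many columns divide the four sequences into two groups of at least two, and which split each column supports. This counting rule is an illustrative assumption, not a tree-building method. For these sequences, columns 3, 4, 7 and 8 support the split {1,2} | {3,4}, while columns 9 and 10 support {1,3} | {2,4}. The two sets of correlated mutations therefore point to different tree topologies.

```python
from collections import Counter


def split_support(alignment):
    """Count the columns that support each two-group split of the sequences."""
    support = Counter()
    for column in zip(*alignment):
        groups = {}
        for idx, residue in enumerate(column):
            groups.setdefault(residue, set()).add(idx + 1)  # 1-based labels
        # Only columns with exactly two residue types, each shared by 2+ sequences
        if len(groups) == 2 and all(len(g) >= 2 for g in groups.values()):
            split = frozenset(frozenset(g) for g in groups.values())
            support[split] += 1
    return support


SEQS = ["AGASDFDFGHKM", "AGASDFDFRRRL", "AGLPDFMNGHSI", "AGLPDFMNRRRV"]
for split, count in split_support(SEQS).items():
    print(sorted(sorted(g) for g in split), count)
```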