Presented by Liu Qi Pairwise Sequence Alignment

Why align sequences? Functional predictions are based on identifying homologues. This assumes that conservation of sequence goes together with conservation of function. BUT: function is carried out at the level of proteins, i.e. 3-D structure, while sequence conservation is carried out at the level of the 1-D DNA sequence.

Relation of sequences. Homologous sequences: orthologs and paralogs are two types of homologous sequences. Orthology describes genes in different species that derive from a common ancestor. Orthologous genes may or may not have the same function. Paralogy describes homologous genes within a single species that diverged by gene duplication.

Some Definitions. An alignment is a mutual arrangement of two sequences that shows where the two sequences are similar and where they differ. An optimal alignment is one that exhibits the most correspondences and the fewest differences. It is the alignment with the highest score. It may or may not be biologically meaningful.

Methods: dot matrix; dynamic programming; word or k-tuple (heuristic based).

Brief intro of methods

Dot matrix: all possible matches between sequence residues are found. It is used to compare two sequences to look for regions where they may align, is very useful for finding indels and repeats in sequences, and can be used as a first pass to see if there is any similarity between sequences.

Dynamic programming: mathematically guaranteed to find the optimal alignment (global or local) between pairs of sequences. It is computationally expensive, since the number of steps grows with the product of the two sequence lengths.

k-tuple (word) methods: used by FASTA and BLAST (previously described). They are much faster than dynamic programming and ideal for database searches. They use heuristics that do not guarantee an optimal alignment but are nevertheless very reliable.

Dot matrix
1 - One sequence is listed along the top of the page and the second sequence along the side.
2 - Move across a row and put a dot in any column where the character is the same.
3 - Continue for each row until all possible character matches between the sequences are represented by dots.
4 - Diagonal rows of dots reveal sequence similarity (repeats and inverted repeats can also be found off the main diagonal).
5 - Isolated dots represent random similarity unrelated to the alignment.
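
The five steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `dot_matrix` and `render` are made up for this example, and the text rendering with `*` and `.` is an arbitrary choice rather than the output of any of the programs named in these slides.

```python
def dot_matrix(top, side):
    """Return a grid where grid[r][c] is True when side[r] == top[c]."""
    return [[s == t for t in top] for s in side]


def render(top, side):
    """Draw the grid as text: '*' marks a dot, '.' marks no match."""
    grid = dot_matrix(top, side)
    lines = ["  " + " ".join(top)]
    for s, row in zip(side, grid):
        lines.append(s + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


if __name__ == "__main__":
    # The palindrome used later in the slides, compared against itself.
    print(render("ATOYOTA", "ATOYOTA"))
```

For a palindrome compared against itself, both the main diagonal and the perpendicular diagonal are filled with dots.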

Dot matrix with noise reduction

Dot matrix: sliding windows. To improve visualisation of identical regions among sequences we use sliding windows. Instead of writing down a dot for every character that is common to both sequences, we compare a number of positions (the window size) and write down a dot whenever there is a minimum number (the stringency) of identical characters.

Dot matrix: choosing parameters. Caution is necessary regarding the window size and the stringency value. Generally, they take different values for different problems. The optimal values will accentuate the regions of similarity of the two sequences. For DNA sequences, usually sliding window = 15 and stringency = 10. For protein sequences, sliding window = 2 or 3 and stringency = 2.
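
A minimal sketch of the sliding-window filter follows, assuming a dot is placed at the start position of each window pair. The function name `windowed_dot_matrix` is hypothetical, and the window of 3 with stringency 3 in the usage lines is chosen only to keep the example small.

```python
def windowed_dot_matrix(top, side, window, stringency):
    """Place a dot at (r, c) when the windows starting at side[r] and top[c]
    share at least `stringency` identical positions out of `window`."""
    rows = max(len(side) - window + 1, 0)
    cols = max(len(top) - window + 1, 0)
    grid = []
    for r in range(rows):
        row = []
        for c in range(cols):
            matches = sum(1 for k in range(window) if side[r + k] == top[c + k])
            row.append(matches >= stringency)
        grid.append(row)
    return grid


if __name__ == "__main__":
    # The insertion/deletion pair used later in the slides.
    grid = windowed_dot_matrix("DOROTHYCROWFOOTHODGKIN", "DOROTHYHODGKIN", 3, 3)
    for row in grid:
        print("".join("*" if hit else "." for hit in row))
```

With these sequences the diagonal breaks after DOROTHY and resumes further to the right at HODGKIN, which is how a deletion shows up in a dot plot.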

Things to be considered: scoring matrix for distance correction; window size; threshold.

Uses of the dot plot. Regions of similarity: diagonals. Insertions/deletions: gaps. Can determine intron/exon structure. Repeats: parallel diagonals. Inverted repeats: perpendicular diagonals. Can be used to determine regions of base pairing of RNA molecules.

Intra-sequence comparison: repeats; inverted repeats; low complexity.

Examples: ABRACADABRACAD plotted against ABRACADABRACAD.

Palindrome. Sequence: ATOYOTA

Repeats: Drosophila melanogaster SLIT protein against itself

Low complexity

Inter-sequence comparison: conserved domains; insertion and deletion.

Insertion and deletion. Seq1: DOROTHYCROWFOOTHODGKIN Seq2: DOROTHYHODGKIN

Conserved domains

Translated DNA and protein comparison: exons and introns

Even more can be done with RNA. RNA comparisons of the reverse complement of a sequence to itself can often be very informative. Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from Baker's yeast. The sequence and structure of this molecule are also known; the illustration will show how simple dot-matrix procedures can quickly lead to functional and structural insights (even without complex folding algorithms).

Structures of tRNA-Phe

RNA comparisons of the reverse complement of a sequence to itself
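
The reverse-complement comparison can be sketched as follows. The hairpin sequence `GGGAAAUCCC` is a made-up toy example, not the tRNA-Phe sequence, and the function names are hypothetical. Only Watson-Crick pairs are considered, so G-U wobble pairs are not counted in this sketch.

```python
# Watson-Crick complement table for RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Complement every base, then reverse the sequence."""
    return rna.translate(COMPLEMENT)[::-1]


def self_complement_dots(rna):
    """Dot matrix of a sequence (rows) against its reverse complement (columns).
    Diagonal runs of dots mark stretches that could base-pair with each other."""
    rc = reverse_complement(rna)
    return [[a == b for b in rc] for a in rna]


if __name__ == "__main__":
    seq = "GGGAAAUCCC"  # hypothetical hairpin: GGG can pair with CCC
    print(reverse_complement(seq))
    for row in self_complement_dots(seq):
        print("".join("*" if hit else "." for hit in row))
```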

Programs for dot matrix: Dotlet; SIGNAL (s_inf_sig.html); Dotter (otter.html); COMPARE and DOTPLOT in GCG.

Conclusion. Advantages: readily reveals the presence of insertions/deletions and direct and inverted repeats that are more difficult to find by the other, more automated methods; lets your eyes/brain do the work, which is very efficient. Disadvantages: most dot matrix computer programs do not show an actual alignment, and they do not return a score to indicate how 'optimal' a given alignment is.

Dynamic Programming. It answers the question: what is the optimal alignment of two sequences (the one with the best score)? How many different alignments are there?

Alignment methods with DP. Global alignment: Needleman-Wunsch (1970) maximizes the number of matches between the sequences along the entire length of the sequences. Local alignment: Smith-Waterman (1981) is a modification of the dynamic programming algorithm that gives the highest-scoring local match between two sequences.

Dynamic Programming. A simple example [figure with nodes A, B, C, D, E, F and values 8, 7, 9]

Exercise

Conditions for applying dynamic programming. Optimal substructure: every sub-policy of an optimal policy is itself optimal. No aftereffect: the states of earlier stages cannot directly influence future decisions. Trading space for time (overlapping subproblems).

Dynamic Programming [figure slides]

DP Algorithm for Global Alignment. Given two sequences $X = x_1 \ldots x_n$ and $Y = y_1 \ldots y_m$, let $F(i, j)$ be the optimal alignment score of $x_1 \ldots x_i$ and $y_1 \ldots y_j$ ($0 \le i \le n$, $0 \le j \le m$).

DP in equation form
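
The equation itself does not appear in this transcript. What follows is the standard Needleman-Wunsch recurrence with a linear gap penalty, written to agree with the notation of the surrounding slides: $s$ is the substitution score and $d$ is the gap penalty, a negative number that is added, as in the $d=-5$ example and in the end-space free recurrence later on. Treat it as a reconstruction rather than the slide's exact wording.

```latex
\[
F(0,0) = 0, \qquad F(i,0) = i\,d, \qquad F(0,j) = j\,d,
\]
\[
F(i,j) = \max
\begin{cases}
F(i-1,j-1) + s(x_i, y_j) \\
F(i-1,j) + d \\
F(i,j-1) + d
\end{cases}
\qquad 1 \le i \le n,\ 1 \le j \le m.
\]
```

The optimal global alignment score is $F(n,m)$.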

A simple example. Find the optimal alignment of AAG and AGC. Use a gap penalty of d=-5. [substitution matrix over A, C, G, T; values not shown]

Step 1: set the top-left cell of the DP matrix to 0.

Step 2: fill the first row and first column with multiples of the gap penalty. The first column, for AGC, reads A: -5, G: -10, C: -15.

Step 3: fill the remaining cells using the recurrence.

Traceback. Start from the lower right corner and trace back to the upper left. Each arrow introduces one character at the end of each aligned sequence. A horizontal move puts a gap in the left sequence. A vertical move puts a gap in the top sequence. A diagonal move uses one character from each sequence.

A simple example, continued. Find the optimal alignment of AAG and AGC. Use a gap penalty of d=-5. The visible entries of the completed DP matrix include 0 and -5 in the first row, 2 and -3 in the row for A, and -6 in the lower right corner, which is the optimal score. The traceback gives two optimal alignments:

AAG-    AAG-
-AGC    A-GC
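
The fill and traceback can be sketched in Python as below. The substitution values on the slide are not in this transcript, so the sketch assumes a simple scheme: match = 2, which agrees with the visible matrix entries, and mismatch = -7, which is purely an assumption. The function name `needleman_wunsch` is just a label for this example, and the traceback returns only one of the optimal alignments.

```python
def needleman_wunsch(x, y, match=2, mismatch=-7, d=-5):
    """Global alignment with a linear gap penalty d (negative, added per gap).
    match/mismatch are assumed values standing in for the slide's matrix."""
    n, m = len(x), len(y)

    def s(a, b):
        return match if a == b else mismatch

    # F[i][j] = best score for aligning x[:i] with y[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * d
    for j in range(1, m + 1):
        F[0][j] = j * d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)

    # Traceback from the lower right corner to the upper left.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + d:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))


if __name__ == "__main__":
    score, ax, ay = needleman_wunsch("AAG", "AGC")
    print(score)
    print(ax)
    print(ay)
```

Under these assumed scores the sketch reproduces the score of -6 from the slide and returns one of the two alignments listed there. Which one is returned depends only on the order in which the three moves are tried.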

Exercise. Find the global alignment of X = catgt and Y = acgctg. Score: d = -1, mismatch = -1, match = 2.

Answer

Local alignment. A single-domain protein may be homologous to a region within a multi-domain protein. Usually, an alignment that spans the complete length of both sequences is not required.

Local alignment DP. Align sequences x and y. F is the DP matrix; s is the substitution matrix; d is the linear gap penalty.

Local DP in equation form
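
Only a stray 0 survives from this slide's equation. The standard Smith-Waterman recurrence, written in the same notation as the global case ($s$ the substitution score, $d$ the negative linear gap penalty), is given below as a reconstruction. The extra 0 in the maximum is what keeps every score non-negative.

```latex
\[
F(i,0) = 0, \qquad F(0,j) = 0,
\]
\[
F(i,j) = \max
\begin{cases}
0 \\
F(i-1,j-1) + s(x_i, y_j) \\
F(i-1,j) + d \\
F(i,j-1) + d
\end{cases}
\qquad 1 \le i \le n,\ 1 \le j \le m.
\]
```

The optimal local alignment score is the largest entry $\max_{i,j} F(i,j)$ anywhere in the matrix.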

Local alignment. Two differences with respect to global alignment: no score is negative, and traceback begins at the highest score in the matrix and continues until you reach 0. Global alignment algorithm: Needleman-Wunsch. Local alignment algorithm: Smith-Waterman.

A simple example. Find the optimal local alignment of AAG and AGC. Use a gap penalty of d=-5. [substitution matrix over A, C, G, T; values not shown]

The first row and first column of the DP matrix are set to 0, and the remaining cells are then filled in:

        A   A   G
    0   0   0   0
A   0   2   2   0
G   0   0   0   4
C   0   0   0   0

The highest score is 4, and the traceback from that cell gives the local alignment AG.

Local alignment. Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d=-5. [substitution matrix over A, C, G, T; values not shown]

The first row and first column are set to 0, and the completed DP matrix is:

        A   A   G
    0   0   0   0
G   0   0   0   2
A   0   2   2   0
A   0   2   4   0
G   0   0   0   6
G   0   0   0   2
C   0   0   0   0

The highest score is 6. Tracing back along the diagonal (6, 4, 2, 0) aligns AAG with the AAG inside GAAGGC.
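
Both local examples can be reproduced with a short Smith-Waterman sketch. As with the global case, the slide's substitution matrix is not in the transcript, so match = 2 is inferred from the matrices above and mismatch = -7 is an assumption. The function name `smith_waterman` is a label for this example only.

```python
def smith_waterman(x, y, match=2, mismatch=-7, d=-5):
    """Local alignment with a linear gap penalty d (negative).
    Returns (best score, aligned part of x, aligned part of y)."""
    n, m = len(x), len(y)

    def s(a, b):
        return match if a == b else mismatch

    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          H[i - 1][j] + d,
                          H[i][j - 1] + d)
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j

    # Traceback starts at the highest score and stops at the first 0.
    ax, ay = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + d:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))


if __name__ == "__main__":
    print(smith_waterman("AAG", "AGC"))
    print(smith_waterman("AAG", "GAAGGC"))
```

Under these assumed scores the two calls give the scores 4 and 6 shown in the matrices above.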

End-Space Free Alignment. Any number of indel operations at the beginning or at the end of the alignment contribute zero weight.

X = - - c a c - t g t a c
Y = g a c a c t t g - - -

End-Space Free Alignment.

Base conditions: $F(i, 0) = 0$ for all $i$ and $F(0, j) = 0$ for all $j$.

Recurrence relation: $F(i, j) = \max\{\, F(i-1, j-1) + s(x_i, y_j),\ F(i-1, j) + d,\ F(i, j-1) + d \,\}$.

Search for $i^*$ such that $F(i^*, m) = \max_{1 \le i \le n} F(i, m)$, and for $j^*$ such that $F(n, j^*) = \max_{1 \le j \le m} F(n, j)$.

Define the alignment score as $\max\{\, F(n, j^*),\ F(i^*, m) \,\}$.
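
The base conditions, recurrence and final search translate directly into code. The sketch below computes only the score, assumes both sequences are non-empty, and uses the match = 1, mismatch = -1, gap = -1 scheme of the exercise that follows as its defaults. The function name `end_space_free_score` is made up for this example.

```python
def end_space_free_score(x, y, match=1, mismatch=-1, d=-1):
    """End-space free alignment score for two non-empty sequences.
    Leading and trailing gaps cost nothing; interior gaps cost d each."""
    n, m = len(x), len(y)
    # Base conditions: first row and first column stay at 0.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)
    # Best entry in the last column (i*) and in the last row (j*).
    best_last_column = max(F[i][m] for i in range(1, n + 1))
    best_last_row = max(F[n][j] for j in range(1, m + 1))
    return max(best_last_column, best_last_row)


if __name__ == "__main__":
    # One sequence contained in the other: the flanking characters are free.
    print(end_space_free_score("ACGT", "TTACGTTT"))
    # Suffix of one sequence overlapping the prefix of the other.
    print(end_space_free_score("AAACGT", "CGTTTT"))
```

The two usage lines cover the typical cases for this variant: one sequence contained in the other, and a suffix-prefix overlap.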

Exercise. Align the two sequences (match = 1, mismatch = -1, gap = -1): X = c a c t g t a c and Y = g a c a c t t g.

Questions for thought. Does a local alignment program always produce a local alignment and a global alignment program always produce a global alignment? Develop an algorithm to find the longest common subsequence (LCS) of two given sequences.

Affine gap penalty

LETVGY
W----L

Separate penalties for gap opening and gap extension. This requires modifying the DP algorithm.

Affine gap penalty. A gap of length k is more probable than k gaps of length 1: a gap may be due to a single mutational event that inserted or deleted a stretch of characters, whereas separated gaps are probably due to distinct mutational events. A linear gap penalty function treats these cases the same. It is more common to use gap penalty functions involving two terms: a penalty h associated with opening a gap, and a smaller penalty g for extending the gap.

Gap penalty functions

Dynamic Programming for the Affine Gap Penalty Case: need 3 matrices instead of 1
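
The slide's recurrences are not in this transcript, so the following is a sketch of one common three-matrix formulation for global alignment, not necessarily the one shown in the lecture. It assumes that a gap of length k costs h + g*k, with h and g given as positive numbers; other conventions charge h + g*(k-1). The defaults h = 3 and g = 1 are arbitrary, while match = 1 and mismatch = -1 follow the scoring shown on the next slide. The names `affine_global_score`, `M`, `Ix` and `Iy` are illustrative.

```python
def affine_global_score(x, y, match=1, mismatch=-1, h=3, g=1):
    """Global alignment score with affine gaps: a gap of length k costs h + g*k.
    M:  best score with x[i] aligned to y[j]
    Ix: best score with x[i] aligned to a gap
    Iy: best score with y[j] aligned to a gap"""
    NEG = float("-inf")
    n, m = len(x), len(y)
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    Ix = [[NEG] * (m + 1) for _ in range(n + 1)]
    Iy = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        Ix[i][0] = -(h + g * i)   # one leading gap of length i
    for j in range(1, m + 1):
        Iy[0][j] = -(h + g * j)   # one leading gap of length j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1]) + s
            # Open a new gap (pay h + g) or extend the current one (pay g).
            Ix[i][j] = max(M[i - 1][j] - h - g,
                           Ix[i - 1][j] - g,
                           Iy[i - 1][j] - h - g)
            Iy[i][j] = max(M[i][j - 1] - h - g,
                           Iy[i][j - 1] - g,
                           Ix[i][j - 1] - h - g)
    return max(M[n][m], Ix[n][m], Iy[n][m])


if __name__ == "__main__":
    # Four matches plus one gap of length 4: 4 - (3 + 4*1)
    print(affine_global_score("ACGTACGT", "ACGT"))
```

Keeping three matrices lets the recurrence tell whether a gap is being opened or extended, which a single matrix cannot record.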

match=1, mismatch=-1

Exercise Write the formula for “Local Alignment DP for the Affine Gap Penalty Case”

Word, k-tuple methods: FASTA, BLAST